Operating On Google News

Pavarr

Posted 28 January 2006 - 01:53 PM

First of all, we have to store the Google News page in a variable. file() returns it as an array of lines:

$googlenews = file("http://news.google.com/news/en/us/world.html");
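
file() returns false when the page cannot be read, and fetching a URL this way only works if allow_url_fopen is enabled on the server. Below is a minimal sketch of a guarded fetch; the helper name fetch_lines is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: returns the source as an array of lines,
// or an empty array when it cannot be read.
function fetch_lines($source)
{
    // @ silences the warning file() raises on failure; the result is checked instead.
    $lines = @file($source);
    return ($lines === false) ? array() : $lines;
}

// Usage, with the URL from the post:
// $googlenews = fetch_lines("http://news.google.com/news/en/us/world.html");
```

With an empty array the parsing loop simply finds nothing, instead of failing on a false value.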

Then we build a table (an array) that holds, for each news item, its link with title, its popularity on Google News (used as the table index), how old the news is, and the source where Google found it:

$news_array = array(); // result table
$popularity = 0; // table index

for ($i = 46; $i < count($googlenews); $i++) { // real news start at line 46
    $all = explode("<font size=", $googlenews[$i]); // it makes it easier to retrieve headers

    for ($j = 1; $j < count($all); $j++) { // start at 1: the title sits in the previous chunk
        $act = $all[$j]; // current chunk

        // a bit of cleaning up (replacements are applied in the listed order)
        $act = str_replace(array("</tr>", "</td>", "</table>", "</b>", "</font>", "<nobr>", "&nbsp;", "<br>"), "", $act);

        if (stristr($act, "-1>") && stristr($act, "<font color=#6f6f6f>")) { // checking for markers of _real_ news
            $where_time = str_replace("-1><font color=#6f6f6f><b>", "", $act); // source and time as one string
            $where_time = str_replace("</nobr>", "", $where_time); // another cleaning routine

            $where_time_arr = explode("- ", $where_time); // split into source and time
            $where = $where_time_arr[0];
            $time = isset($where_time_arr[1]) ? $where_time_arr[1] : "";

            // we now know where the news was found, let's get the news title & link
            $news = explode('<td valign=top>', $all[$j - 1]); // the previous chunk holds them
            $true_news = isset($news[1]) ? $news[1] : "";

            $popularity++; // next table index
            $news_array[$popularity] = $where . '|' . $time . '|' . $true_news; // table entry
        }
    }
}

Now that we have an array with pure information, we can do virtually anything with it, for example:

foreach($news_array as $value){
       $values_arr = explode("|",$value);
       $where = $values_arr[0];
       $time = $values_arr[1];
       $news = $values_arr[2];
       echo "$news - found in $where $time<br/>";
}
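
One caveat with the "|" separator: a title that itself contains "|" would be cut short by a plain explode(). A small sketch of a workaround, assuming the title is always the last field, is to pass a limit of 3. The record below is made up for illustration.

```php
<?php
// Made-up record in the "where|time|news" layout used above;
// the title deliberately contains a "|" character.
$value = "Example Times|2 hours ago|Markets | Stocks rally";

// Limit of 3: everything after the second "|" stays in the last element.
list($where, $time, $news) = explode("|", $value, 3);

echo "$news - found in $where $time<br/>";
```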

We can also search it for keywords, or do something similar with it.
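
A keyword search could look roughly like this. It is only a sketch: the function name search_news and the sample records are hypothetical, and because the stored title still contains the link markup, a keyword may also match inside the URL.

```php
<?php
// Hypothetical helper: returns the records whose news field contains
// $keyword (case-insensitive). Records use the "where|time|news" layout.
function search_news(array $news_array, $keyword)
{
    $matches = array();
    foreach ($news_array as $value) {
        $parts = explode("|", $value, 3);
        if (count($parts) < 3) {
            continue; // skip malformed records
        }
        if (stripos($parts[2], $keyword) !== false) {
            $matches[] = $value;
        }
    }
    return $matches;
}

// Made-up sample data, only for illustration.
$sample = array(
    1 => "Example Times|2 hours ago|Election results announced",
    2 => "Sample Post|5 hours ago|Storm hits the coast",
);
$found = search_news($sample, "storm");
```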

I have to add that I haven't yet developed a simple, always-working way to separate the link from the news title. Sometimes an unknown bug leaves residual characters from the link :/ But even without it, the script is fully functional.
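
One possible way to separate the two is sketched below. It assumes the stored fragment contains complete tags and a single <a href=...> link; the real Google News markup may differ, and a fragment that starts in the middle of a tag would still leave residue. The helper name split_link_title is hypothetical.

```php
<?php
// Hypothetical helper: splits an HTML fragment into the link URL and the plain title.
function split_link_title($fragment)
{
    $url = '';
    // The href value may be double-quoted, single-quoted or unquoted.
    // (?| ... ) is a branch-reset group, so the value always lands in $m[1].
    $pattern = '/<a\s[^>]*href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    if (preg_match($pattern, $fragment, $m)) {
        $url = $m[1];
    }
    // strip_tags() drops every complete tag and keeps the text between them.
    $title = trim(strip_tags($fragment));
    return array('url' => $url, 'title' => $title);
}

// Made-up fragment, only for illustration.
$parts = split_link_title('<a href="http://example.com/story"><b>Some title</b></a>');
echo $parts['title'] . ' (' . $parts['url'] . ')';
```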
