Screen Scraping Your Way Into RSS - For PHPit, the...
(Page 2 of 6)
For PHPit, the pattern that matches the content is <div class="contentitem">[Content Here]</div>. You can verify this yourself by going to the main page of PHPit and viewing the source.
Now that we have a pattern, we can get all the content items. The next step is to retrieve the individual pieces of information from each one, i.e. URL, title, author and text. This can be done by applying some more regular expressions and str_replace() to each content item.
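As a rough illustration of that step, the sketch below pulls the four fields out of a single content item. The markup inside $item, the "Written by " prefix and the example.com URL are all assumptions made for the example. The real PHPit markup may well differ, so the patterns would need adjusting against the actual page source.

```php
<?php
// Hypothetical content item; the real PHPit markup may differ.
$item = '<h2><a href="http://www.example.com/article/">Example Title</a></h2>
<p class="author">Written by Some Author</p>
<p>Example text.</p>';

$link = $title = $author = '';

// URL and title: taken from the first link found in the item.
if (preg_match('#<a href="([^"]*)">(.*?)</a>#', $item, $m)) {
    $link  = $m[1];
    $title = $m[2];
}

// Author: capture the (assumed) author paragraph, then use
// str_replace() to strip the fixed "Written by " prefix.
if (preg_match('#<p class="author">(.*?)</p>#', $item, $m)) {
    $author = str_replace('Written by ', '', $m[1]);
}

// Text: remove the heading and author paragraph, then drop remaining tags.
$text = preg_replace('#<h2>.*?</h2>|<p class="author">.*?</p>#s', '', $item);
$text = trim(strip_tags($text));
```

Using # as the pattern delimiter avoids having to escape the slashes in closing tags.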
By now we have the following code:
// Get page
$url = "http://www.phpit.net/";
$data = implode("", file($url));
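// Get content items
// The pattern is single-quoted so the inner double quotes stay literal,
// and the "/" in </div> is escaped because "/" is the pattern delimiter.
// [^`]*? lazily matches any character except a backtick, newlines included.
preg_match_all('/<div class="contentitem">([^`]*?)<\/div>/', $data, $matches);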
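To see what preg_match_all() leaves in $matches, here is a small self-contained sketch that runs the content-item pattern against a hard-coded string instead of the live page. The sample markup is invented for the example, and # is used as the delimiter purely for readability.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$data = '<div class="contentitem"><h2>First</h2>
<p>Text one</p></div>
<div class="contentitem"><h2>Second</h2>
<p>Text two</p></div>';

// Content-item pattern; with "#" as delimiter, "/" needs no escaping.
preg_match_all('#<div class="contentitem">([^`]*?)</div>#', $data, $matches);

// $matches[0] holds the full matches (including the <div> tags),
// $matches[1] holds only the captured inner content of each item.
$items = $matches[1];

foreach ($items as $i => $item) {
    echo "Item $i: " . trim(strip_tags($item)) . "\n";
}
```

Because the match is lazy, each item ends at the first </div> that follows its opening tag. If a content item contained a nested <div>, the match would stop too early.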
More By Jase Dow