A long time ago, on a domain not my own, I wrote a PHP parser that handled pretty much every version of RSS/RDF XML feed. I released it for free, and a few people used it. It wasn’t pretty code, but it was small, it was easy to use, and it didn’t care what sort of feeds you threw at it: it just chewed them up and spat out links. In fact, it didn’t even care whether those feeds validated perfectly.

Today I’m releasing the first ever upgrade of that script.

HuddledParser is still written in PHP (tested in PHP 4.x), but now it is object oriented and, new in this version, it handles Atom feeds as well (tada!). Now, just to be clear: this is still not a fancy-shmancy feed formatter. I made it object oriented because that saves me some code and a bunch of nasty global variables.

HuddledParser doesn’t try to present all the data in a feed; that’s not its purpose. Rather, it builds a headline summary: it grabs the title, link and summary of the feed itself and of each entry (regardless of feed format), and puts them into a list for you. You can specify a maximum number of entries to parse, and even a maximum length for the “summaries”, so you can cut feeds like LockerGnome’s full-content feeds down to a size you’d feel comfortable putting in a sidebar on your web page. And you never have to worry about other sites’ feeds breaking your layout, because all HTML is stripped from their content (even when it is wrapped in a CDATA section).
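To make “regardless of feed format” concrete, here is a minimal sketch of how the title, link and summary could be pulled out of RSS 0.9x/2.0, RSS 1.0 (RDF) and Atom documents. It is not HuddledParser’s actual code: it assumes PHP 5’s SimpleXML extension (which PHP 4 lacks), and unlike the original script it needs well-formed XML. The function names extract_headlines, first_text and find_link are made up for illustration.

```php
<?php
// Hypothetical sketch (not HuddledParser's code): read the title, link and
// summary of a feed and of its first $maxEntries entries, whether the feed
// is RSS 0.9x/2.0, RSS 1.0 (RDF) or Atom. Requires PHP 5+ with SimpleXML.

// Return the trimmed text of the first child element in $names that exists.
function first_text($node, $names)
{
    foreach ($names as $name) {
        if (isset($node->$name)) {
            return trim((string) $node->$name);
        }
    }
    return '';
}

// RSS keeps the URL in <link>'s text; Atom uses <link href="..."> and may
// carry several links, of which the alternate (or rel-less) one is wanted.
function find_link($node)
{
    foreach ($node->link as $link) {
        $rel = (string) $link['rel'];
        if (isset($link['href']) && ($rel === '' || $rel === 'alternate')) {
            return (string) $link['href'];
        }
        if (trim((string) $link) !== '') {
            return trim((string) $link);
        }
    }
    return '';
}

function extract_headlines($xmlString, $maxEntries)
{
    $xml = @simplexml_load_string($xmlString);
    if ($xml === false) {
        return null; // not well-formed XML
    }
    if ($xml->getName() === 'feed') {        // Atom: entries sit under <feed>
        $channel = $xml;
        $items = $xml->entry;
    } elseif ($xml->getName() === 'RDF') {   // RSS 1.0: items are siblings of <channel>
        $channel = $xml->channel;
        $items = $xml->item;
    } else {                                 // RSS 0.9x/2.0: items inside <channel>
        $channel = $xml->channel;
        $items = $xml->channel->item;
    }

    $result = array(
        'title'   => first_text($channel, array('title')),
        'link'    => find_link($channel),
        'summary' => first_text($channel, array('description', 'subtitle', 'tagline')),
        'entries' => array(),
    );
    foreach ($items as $item) {
        if (count($result['entries']) >= $maxEntries) {
            break;
        }
        $result['entries'][] = array(
            'title'   => first_text($item, array('title')),
            'link'    => find_link($item),
            'summary' => first_text($item, array('description', 'summary', 'content')),
        );
    }
    return $result;
}
```

The main differences between the formats are where things live: RSS 1.0 puts its items beside the channel element rather than inside it, and Atom keeps the URL in a link element’s href attribute instead of its text.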

Feel free to grab the source and play with it. Please let me know if you have any problems or encounter any feeds that it cannot parse.

Before I go into the details of how to use this script, one caveat: there’s still one thing I’d like it to do that it doesn’t do yet. Currently, when you call ParseFeed(...) you get back text containing the selected information. What I’d like is for you to be able to call ParseFeed(...) multiple times and have it nest all that data inside a list of lists. Then, when you wanted to output the feeds, you could call ToHTML(...), or you could enumerate through the nested lists yourself (say, if you wanted to build DHTML menus or combo boxes). But that will have to wait until the next version.
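Here is a minimal sketch of how that planned “list of lists” output step might look, assuming each feed is stored as a nested array (title, link, summary and a list of entries). It is not how the current script works, and the names headlines_to_html, html_link and html_escape are hypothetical. It mirrors the showSummary behavior described below: the summary always goes in each link’s title attribute, and is also printed in a div when requested.

```php
<?php
// Hypothetical sketch of the planned design, not the current script.
// $feeds is a list of feeds; each feed is an array with 'title', 'link',
// 'summary' and 'entries', and each entry has 'title', 'link', 'summary'.

// Escape text for use in HTML element content and attribute values.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES);
}

// Render one link; its summary always goes in the title attribute (tooltip).
function html_link($item)
{
    return '<a href="' . html_escape($item['link']) . '" title="'
        . html_escape($item['summary']) . '">' . html_escape($item['title']) . '</a>';
}

function headlines_to_html($feeds, $showSummary)
{
    $out = "<ul>\n";
    foreach ($feeds as $feed) {
        $out .= '<li>' . html_link($feed) . "\n<ul>\n";
        foreach ($feed['entries'] as $entry) {
            $out .= '<li>' . html_link($entry);
            // With showSummary on, the summary is also printed under the link.
            if ($showSummary) {
                $out .= '<div>' . html_escape($entry['summary']) . '</div>';
            }
            $out .= "</li>\n";
        }
        $out .= "</ul></li>\n";
    }
    return $out . "</ul>\n";
}
```

Keeping the data in plain nested arrays is what would let you skip the HTML step entirely and walk the lists yourself to build menus or combo boxes.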

Basically, it works like this:


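<?php
include "HuddledParser2.php";
// At most 10 entries per feed; summaries go only into each link's title attribute
$feedParser = new HuddledParser(10, false);
// "HMO" is the unique name used for this feed's cache file
print $feedParser->ParseFeed("http://www.HuddledMasses.org/feed/rss2/", "HMO");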


The HuddledParser constructor takes two required arguments: the maximum number of entries to fetch per feed, and whether or not to show the “summary” of each entry. (If you turn summaries off, each summary is still inserted as the “title” attribute of its link, so it appears as a tooltip without cluttering up your page.) ParseFeed also takes two required arguments: the URL of the feed, and a unique name for that feed (used for the cache file). Each accepts several optional arguments:

HuddledParser( $maxEntries, $showSummary, $cacheFolder, $cacheTime, $titleLimit, $summaryLimit )

  • maxEntries – A number. The maximum number of entries to retrieve per feed.
  • showSummary – A boolean (true or false). Whether or not to print out the summaries in divs under each entry title.
  • cacheFolder – A path. Defaults to an “xmlcache/” subfolder of the folder it’s called from. This folder must be writable, so that the cache files can be stored there.
  • cacheTime – A number. The number of seconds before we consider our cache “old” and refetch the feed. Defaults to 3600 (one hour).
  • titleLimit – A number. The maximum length (in characters) of titles. Note: we do not cut words off, so this is approximate. Defaults to -1, no limit.
  • summaryLimit – A number. The maximum length (in characters) of summaries. Note: we do not cut words off, so this is approximate. Defaults to -1, no limit.
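The titleLimit and summaryLimit options, together with the HTML stripping mentioned earlier, could be sketched as a single cleanup step. This is an assumption about the approach, not the script’s code: clean_feed_text is a hypothetical name, and where the original only promises not to cut words, this sketch backs up to the last space before the limit (the real script may instead finish the current word). Any CDATA wrapper is removed by the XML parser before this point, so strip_tags() only sees ordinary HTML.

```php
<?php
// Hypothetical helper (not part of HuddledParser): turn a feed title or
// summary into plain text of at most $limit bytes. A negative limit (the
// default, -1) means no limit.
function clean_feed_text($text, $limit = -1)
{
    // Remove all HTML tags, then decode entities such as &amp; and &quot;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES);
    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if ($limit < 0 || strlen($text) <= $limit) {
        return $text;
    }
    // Cut at the limit, then back up to the last space so no word is split.
    // strlen()/substr() count bytes; use the mb_* functions for UTF-8 text.
    $cut = substr($text, 0, $limit);
    $space = strrpos($cut, ' ');
    if ($space !== false) {
        $cut = substr($cut, 0, $space);
    }
    return rtrim($cut);
}
```

Because the result is plain text, it should go through htmlspecialchars() when it is written back into a page; otherwise a decoded “&lt;” in a feed would turn into live markup.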

ParseFeed( $url, $cacheName, $title, $summary, $maxEntries, $cacheTime )

  • url – A URL. The URL of the XML feed to parse.
  • cacheName – A file name. The name of the file to store the cache in; this must be unique to each feed.
  • title – Text. Allows you to override the title of the feed.
  • summary – Text. Allows you to override the summary of the feed contents.
  • maxEntries – A number. Overrides (for this feed) the maximum number of entries to fetch.
  • cacheTime – A number. Overrides (for this feed) the number of seconds to consider our cache “fresh”.
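The caching options could work along these lines. This is a hypothetical sketch (fetch_feed_cached is not a HuddledParser function): the cache file is cacheFolder plus the feed’s unique cacheName, a file younger than cacheTime seconds is served as-is, and anything older triggers a refetch. A per-feed cacheTime passed to ParseFeed would simply replace the constructor’s default in this call. Falling back to a stale copy when the fetch fails is an assumption of the sketch, not documented behavior.

```php
<?php
// Hypothetical sketch of the cache behaviour described above; this is not
// HuddledParser's actual code. Returns the raw feed text, or false.
function fetch_feed_cached($url, $cacheFolder, $cacheName, $cacheTime = 3600)
{
    $cacheFile = rtrim($cacheFolder, '/') . '/' . $cacheName;

    // A cache file younger than $cacheTime seconds is "fresh": use it as-is.
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $cacheTime) {
        return file_get_contents($cacheFile);
    }

    // Otherwise refetch. file_get_contents() accepts http:// URLs when
    // allow_url_fopen is enabled.
    $data = @file_get_contents($url);
    if ($data === false) {
        // Assumption: if the fetch fails, fall back to a stale cache (if any).
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }
    // The cache folder must be writable for this to succeed.
    file_put_contents($cacheFile, $data);
    return $data;
}
```

For the example feed above, the equivalent call would be fetch_feed_cached("http://www.HuddledMasses.org/feed/rss2/", "xmlcache/", "HMO", 3600).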

Edit: May 11, 2006

This script is now also available as a WordPress plugin.