RSS Explained

While drafting a response to an email I received, it seemed like a good idea to do a complete write-up of RSS for everyone who reads MMH, whether or not you have a web site or blog. I’ve tried to reference information beyond my limited knowledge of the subject, but cannot guarantee full citation. Some of what lies below reiterates my July 2003 post about RSS; if you find it redundant, I apologize.

Subtopics of this post include:

  • The Web Page vs. RSS Feed Analogy
  • Convenience: RSS Feeds as Bookmarks
  • Convenience 2: Web Logging (Blogging) vs. Being a Webmaster
  • Convenience 3: Custom RSS Feeds
  • Trusting the Source

The acronym RSS stands for many things: RDF (Resource Description Framework) Site Summary, Rich Site Summary, or, more colloquially, Really Simple Syndication [Computerworld]. But what it stands for is ultimately insignificant; it’s how RSS works in practice that’s really cool and useful. It may be helpful to draw an analogy here.

The Web Page vs. RSS Feed Analogy

First, think about the Web. What you see is a web page (the final product). The thing that lets you see it is a web browser (called a “client”). The thing that defines the final product, and tells the client how to show it, is a simple text file written in a particular language, HTML (or some variation/combination of HTML). HTML is written in a particular way that says, “make these words appear bold”, “put this picture here”, or “link this text to a destination”. The browser takes all the HTML (content + appearance instructions), and makes everything look like how the HTML language says. What you get is a web page with bolded text, a picture, and a link.

RSS is really no different. You get a final product, presented by a “client” (called a “news reader” or “news aggregator”), based on the instructions in a simple text file written in a special language, XML. (Note that distinguishing between a Usenet news reader and an RSS client is beyond the scope of this article.) The major difference between RSS and a web page is that web pages are written primarily to format/layout information, whereas RSS is more concerned with the information itself. In an RSS text file (the “RSS feed”), the XML language tells the “client” many things: this is the title, here is a link to a web page with this information, this is a description of the information contained there, here’s the author’s name, this is his email address, and the information was written on this date.
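To make that concrete, here’s a sketch in Python (the scripting language I mention later) that parses a tiny, made-up RSS feed and pulls out exactly the pieces an aggregator cares about. The feed text, titles, and URLs are invented for illustration; it’s just the shape of the “simple text file” described above.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up RSS feed -- the "simple text file" described above,
# with a title, link, description, author, and date for each entry.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>MMH</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <item>
      <title>RSS Explained</title>
      <link>http://example.com/rss-explained</link>
      <description>A complete write-up of RSS.</description>
      <author>editor@example.com</author>
      <pubDate>Mon, 05 Jan 2004 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

# An aggregator reads each tagged piece of information by name.
root = ET.fromstring(FEED)
entries = [(item.findtext("title"), item.findtext("link"))
           for item in root.iter("item")]
print(entries)  # each entry pairs a title with the page it points to
```

Notice there’s nothing in there about fonts, colors, or layout — the feed is all information, no presentation, which is the whole point of the analogy.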

So, basically, an RSS feed can contain information about many different web pages… and a “news aggregator” can present information about many different RSS feeds. What you end up with is a single place (the aggregator program) to find information from multiple feeds, each providing information about multiple web pages. You can think of RSS feeds as a list of “bookmarked” lists of web pages… a list of lists of info.

Here are some web sites where you can find a “news aggregator” for your computer: Blogspace, DMOZ, Weblog Consortium.

Convenience: RSS Feeds as Bookmarks

There are a bunch of different web sites I want to check every day. The effort it takes for me to open every single web site in my browser is enormous: open a new window (or tab), pull down the bookmark, click, it loads; open a new window, pull down the next bookmark, click, it loads; repeat as necessary. As the number of web sites I want to check increases, so does my individual effort… since performing all of those actions requires me to be “in the loop”, initiating every site load; I have to tell the browser to do it.

Recent developments in some web browsers (e.g., Camino, Safari, bookmark grouping) make this a tad easier, but I still have to wade through everything on those web sites (pop-ups, banner ads, “featured” stories) to get to the information I want. But not any more.

Now, in my aggregator (I use NetNewsWire), if I want to keep track of the info provided by a web site, I simply “subscribe” to that web site’s RSS feed. When I launch NetNewsWire, it downloads the RSS feeds from all of the sites I subscribe to, and gives me the titles and descriptions of all the “stories” or “entries” on that site. I can quickly scan over that list of lists (with none of the other crap on the site), and click on only the stories/entries that interest me. When I click on a story, it opens up my web browser and loads only the page with the information I want. So easy!

Convenience 2: Web Logging (Blogging) vs. Being a Webmaster

Some of us are lucky. Some blogging programs we use (like Movable Type and LiveJournal) not only update our web site (the HTML file), but also automatically generate the RSS feed (the XML file) for us. Others aren’t so lucky… like people who hand-write and/or manually maintain their own web site(s).

I code all the HTML for my work’s web site in a text editor, and manually upload the HTML files to our server. There is no built-in way for me to produce an RSS feed using this method. But, we do offer two separate RSS feeds from our site; more on this later.

Convenience 3: Custom RSS Feeds

The convenience offered by my aggregator collecting all these RSS feeds is what’s led me to write my own custom feeds for web sites I want to check every day, including FARK, The Register, and New Scientist. While these web sites may now offer their own RSS feeds, either I hadn’t discovered them at the time, or I didn’t like what they offered.

Without an RSS feed, to get the information from a web site, you have to visit the web site. I use the Python scripting language to download the HTML from that web site, analyze the HTML code (based upon how they write it), and pick out certain important pieces of information… that I later re-format into an RSS feed using the XML language. This is referred to as “screen scraping”, because I’m getting the same info that your browser is by visiting the site; I’m just plucking out what I want and putting it into an RSS feed so I can subscribe to it in my aggregator.
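A bare-bones version of that screen-scraping process looks something like this in Python. The HTML snippet and the pattern that matches it are hypothetical — the real pattern depends entirely on how a given site happens to write its markup, which is exactly why these scripts are fragile.

```python
import re

# Hypothetical downloaded HTML; a real site's markup will differ, and
# the pattern below depends entirely on how they happen to write it.
HTML = """
<div class="headline"><a href="/story/1">First story</a></div>
<div class="headline"><a href="/story/2">Second story</a></div>
"""

# "Screen scraping": pluck out only the links and titles we care about.
LINK = re.compile(r'<a href="([^"]+)">([^<]+)</a>')
items = [{"link": href, "title": text} for href, text in LINK.findall(HTML)]

# Re-format the scraped pieces as RSS <item> elements for the feed.
feed_body = "\n".join(
    "<item><title>%s</title><link>%s</link></item>" % (i["title"], i["link"])
    for i in items
)
print(feed_body)
```

In a real script, the `HTML` string would come from downloading the site’s page rather than being pasted in, but the pluck-and-re-format step is the same.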

One major problem with this method is that if any of those sites change the way they do HTML, it will break the script that I wrote to analyze it, and my XML-based RSS feed will also be broken. Since I check them every day, though, I can fix any broken custom feeds by looking for changes in the HTML.

At work, I know exactly how the HTML is formatted… so I know exactly where to get the information for our RSS feeds. I’ve written XML generators and local “screen scrapers” to update our RSS feeds. Since I know the HTML format, I also know that if I alter it, I’ll have to pre-emptively alter my RSS-feed-generating code.
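When you control the source, a generator is even simpler than a scraper: you can build the feed’s XML directly from the data. Here’s a hedged sketch in Python; the channel title, URLs, and entry data are all invented for illustration — a real generator would read them from wherever the site’s content lives.

```python
import xml.etree.ElementTree as ET

# Entry data is made up for illustration; a real generator would pull it
# from the site's actual content.
entries = [{"title": "New page posted", "link": "http://example.com/new-page"}]

# Build the feed's XML structure element by element.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Work site updates"
ET.SubElement(channel, "link").text = "http://example.com/"
ET.SubElement(channel, "description").text = "Recent changes to the site"
for entry in entries:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = entry["title"]
    ET.SubElement(item, "link").text = entry["link"]

feed_xml = ET.tostring(rss, encoding="unicode")
print(feed_xml)  # the finished XML, ready to upload alongside the HTML
```

No scraping, no fragile patterns — which is why generating the feed from the source data is the sturdier approach when you have that option.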

Trusting the Source

Gathering a lot of information from many different sources brings to mind another Internet phenomenon: email. You have one email program, and it gets your email from a single spot, your mailbox on the server. But the server receives individual emails from all over the place; in effect, it’s collecting that information just like an RSS feed news reader. Except for one thing: your mail server receives information, whereas your aggregator retrieves information. It’s a contrast between “take because I have to” versus “get because I want to” situations.

Email has become polluted. Not only are you receiving tons of spam from people you’ve never heard of… but you’re also receiving tons of spam and/or email viruses from people you “know”. I put the word “know” in quotes, because that “From” header in email is too easily (and too readily) faked.

While the aggregation aspect of email is much more convenient than the “visit every web site” scenario described earlier, the “visit every web site” scenario is inherently much more trustworthy, because you specify the information you want.

RSS feeds, in conjunction with your news aggregator, feature the best benefits of both email and the web: 1) You get the information you want in one place, and 2) you can be fairly sure that the information hasn’t been tampered with by a third party.

While it’s not impossible for a third party to alter a web site or RSS feed, it’s orders of magnitude more difficult to do so than to simply fake an email “From” header. With that in mind, you can better rely on the information supplied by an inconvenient web site than what you conveniently get in your email “Inbox”.

… unless you can conveniently get the information from the web site… which is what RSS feeds are all about.

If you have any questions, please feel free to contact me.