Re-Re-Distribution is a No-No: Scraping the Scraper

There’s an interesting post over on Boing Boing. It goes like this: Google News gleans its published content from hundreds of other news websites; it’s an aggregator of sorts. A webdev guy [blog] [post] decided to publish an RSS feed on the ‘net for that same Google News content… only he’s getting it directly from Google News, not by scraping info from other news sites.

What seems obvious (to me) is that Google must have obtained permission from each website whose content gets republished.

Google: Can we have permission to publish capsules of news stories you release with links directly to your site?
News Site: Hrm… the world’s most popular search website actually wants to drive traffic to our site. I dunno… maybe.

They’d be idiots not to grant Google the rights to redistribute content. The poor webdev guy, however, didn’t ask Google if it was okay to make his RSS feed publicly available, so they told him to stop… which he did, I guess. This is the same type of thing that Derek has posted about [post 1] [post 2]: namely, someone redistributing something you own without your permission.

I’m doing the same thing as webdev guy. I wrote the scrapers for Fark, Register and New Scientist because I wanted RSS feeds for those sites for personal use. Later, I thought it’d be cool to make them available here in case anyone in my limited audience found them useful. Okay, and to show off the Python skills a tad, I must admit.

Thing is: Nobody’s complained about it. If Fark, Register or New Scientist told me to knock it off, I’d rip the RSS feeds off the site without blinking. No doubt I’d keep them around at some highly obfuscated URL so I could continue to read them for personal, non-commercial and educational use only.

But the question remains: Are the RSS feeds offered here (under the pretense of helping regular people) hurting traffic or otherwise damaging the websites they scrape? Or is MMH (or NB as a whole) too small to register a blip on the radar, so they’re not experiencing or noticing any ill effects? Or do they know and just not care?

Afterthought: To ensure that I’m not being a complete dickhead by offering those RSS feeds, I’m going to contact each website and ask their explicit permission to do so. More later…
