Regarding Google/Feedburner, and Spam Blogs
by Mike on May 23, 2007

I was just reading some of the commentary about the Google/Feedburner story from earlier today. We confirmed the previous rumor, and the story was soon up on Digg and Slashdot, although Slashdot called us “TechDirt”. That’s ok, though, they got the link right.

The spam blogs were out in force, with 10-12 blogs lifting the article wholesale and without attribution. That’s pretty normal, but what isn’t normal is for a large blogger to get tricked into thinking a spam blog was the actual source, and linking to it.

Randy Charles Morin, a respected blogger with nearly 10,000 RSS subscribers, seems to have fallen for it though. He linked to a blog that lifted our article and reposted it word-for-word, without any attribution at all.

That’s not good. I don’t mind losing the link that much, but I do mind if spam blogs start to see a real return on their infringement because big blogs mistakenly link to them. That can give others an incentive to start their own, and the problem gets worse. It’s also embarrassing for the blogosphere when a respected blog makes an error like this.

I’ve copied both posts below. Ours on top, the spam blog on bottom.

Comments

Mike, can you help me understand why “any” lifting is ok? I don’t get how it benefits TC at all. I wrote about this in January (http://www.centernetworks.com/copyright-how-much-lifting-is-ok) because frankly it’s wrong I believe. What’s the difference between this and just stealing a book from Borders? You spend a ton of time writing the content only for some other site to steal it completely and profit from it.

 

There is one difference: 105 comments on “TechDirt”, 0 (zero) comments on the splog.
This tells a lot to anyone who cares to look at it. Zero comments on a good post ~= splog.

I wonder if search engines look at the number of comments for their ranking.

 

Number of comments would not be a good signal, Ugo. Some sites have comments turned off.

 

I doubt he is saying that it’s OK, more like it happens and there’s not a lot that can be done to stop it.

 

Nice catch. Link removed. I make an effort to link to c-listers, which leads to these problems. Did you lift me from the blacklist just to take a shot at me? Respected blogger? Wow, where did you source that? My wife and kids don’t even respect me :-p

 

I suspect that Randy heard about the story and then found the post he linked to using one of the blog search engines.

The rest of the blogosphere is a link trust relationship, so the only ways that splogs can tap in are through the search engines and through trackbacks.

I have no doubt as to where the responsibility for splogs lies: it is with the search engines (particularly Technorati) and also the blog providers (Blogger) who make it too easy to set up a splog.

I just can’t believe that Technorati isn’t smart enough to say ‘well, this post from a blog with zero trackbacks is word-for-word the same as the post this other guy who has 17,000 trackbacks just posted - perhaps I shouldn’t include it in my results?’
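To make that idea concrete, here is a rough sketch of the check being described: compare posts for word-for-word overlap and, when two are near-identical, keep only the one with more trackbacks. This is not how Technorati or any other search engine actually works; the function names, the `trackbacks` field, and the 0.9 threshold are all assumptions made up for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_probable_copies(posts, threshold=0.9):
    """Drop posts that nearly duplicate a post with more trackbacks.

    `posts` is a list of dicts with hypothetical keys 'url', 'text' and
    'trackbacks'. Trackback count is only a stand-in for "who is the
    likely original"; a real system would need more signals.
    """
    kept = []
    # Visit the most-linked posts first so the likely original is kept.
    for post in sorted(posts, key=lambda p: p["trackbacks"], reverse=True):
        signature = shingles(post["text"])
        if any(jaccard(signature, shingles(k["text"])) >= threshold for k in kept):
            continue  # near-identical to a better-linked post: treat as a copy
        kept.append(post)
    return kept
```

Comparing every post against every kept post is fine for a handful of results but would not scale to a full index; it only illustrates the "zero trackbacks versus 17,000 trackbacks" comparison.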

The reason Google kicks ass is that their development is driven by providing accurate and clean search results for users. It is not driven by the number of blogs or posts that they index.

 

I just can’t believe that Technorati isn’t smart enough to say ‘well, this post from a blog with zero trackbacks is word-for-word the same as the post this other guy who has 17,000 trackbacks just posted - perhaps I shouldn’t include it in my results?’

Yeah, but Technorati can’t even handle 301 redirects.

I buy a new domain for my blog and my Technorati authority goes from 1400 to 0.

What I can’t believe is that the rssweblog has over 9000 subscribers. I’m adding them to my comment blacklist now (I’m pretty sure they’re scraping me as well, I’ve seen referrers)

 

Whoa, I take back my last comment completely. I misread the original post and thought the rssweblog was the splog, not the person who linked to the splog.

Sorry Randy, and thanks for linking me (he did not scrape my content — I saw an rssweblog trackback because of a genuine link).

Time to go have more coffee… or maybe, less coffee.

 

More coffee. Once you cut back on coffee, you turn into a 60s flower child.

 

To the corporate cutthroatedness of the 80s I go then.

 

I hate to say it but if more of these search engines would use Spinn3r data they wouldn’t have this problem.

We can find duplicate content and remove it from our system.

We might actually be exposing a “duplicate or not” style API to hopefully solve this problem moving forward.

Kevin

 

@Allen Stern
I’m only saying that the number of comments is an additional piece of evidence that people and algorithms can consider.
As Nick wrote, search engines could try to be smarter, and properly rate blogs with comments disabled as well as blogs with spammed comments.

 
