Wednesday, February 15, 2006

The Good, the Bad and the Ugly

It's been a while since I posted, so here is an update on what's happening behind the scenes. I've been busy fighting splogs on many fronts, but mostly with analysis of collected data. So far my blog archive has grown to 1.3 million blogs. My newly purchased 300 GB hard drive is about half full at the moment. My analysis script has identified about 275,000 blogs as splogs. The script currently identifies about 50% of new blogs created daily as splogs. I expect this percentage to eventually rise to about 70% when I implement two more layers of filtering. So far I've made pretty good progress at identifying splogs through mostly automated means.

There are a couple of things spammers are doing to make their splogs tricky to detect. More and more I'm seeing hidden text and links in splogs, masked via CSS. I've also noticed some splogs that are completely disguised as a normal-looking blog when seen through the browser, yet the page source reveals very different motives. Then again, I've seen other splogs doing completely the opposite: the content of the splog is innocuous, but the whole page is overwritten by JavaScript, turning the blog into a splog. These spammers are getting much more sophisticated in order to elude detection. Currently I'm making efforts to counteract these anti-detection measures.
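To give a rough idea of what checking for these two tricks might look like, here is a minimal sketch in Python. It is not my actual analysis script. The names (`SplogSignals`, `scan`) and the pattern lists are made up for illustration, it only looks at inline `style` attributes (not external stylesheets or class rules), and it assumes reasonably well-nested HTML.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns that hide content from a human reader.
# Assumed, non-exhaustive list chosen for this illustration.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)

# Script calls that can replace what the visitor sees after the page loads.
# Also an assumed, non-exhaustive list.
REWRITE_SCRIPT = re.compile(
    r"document\.write\s*\(|location\.replace\s*\(|\.innerHTML\s*=", re.I
)

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class SplogSignals(HTMLParser):
    """Collects text and links hidden by inline CSS, plus page-rewriting scripts."""

    def __init__(self):
        super().__init__()
        self.stack = []            # one bool per open element: hidden or not
        self.in_script = False
        self.hidden_text = []
        self.hidden_links = []
        self.rewrites_page = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = bool(self.stack and self.stack[-1])
        hidden = inherited or bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        if hidden and tag == "a" and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag == "script":
            self.in_script = True
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_script:
            if REWRITE_SCRIPT.search(data):
                self.rewrites_page = True
        elif self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def scan(html):
    """Parse one page and return the collected signals."""
    parser = SplogSignals()
    parser.feed(html)
    parser.close()
    return parser
```

Calling `scan(page_source)` gives back an object whose `hidden_text`, `hidden_links` and `rewrites_page` fields can feed into a score. None of these signals is proof on its own, since plenty of legitimate blogs hide menus with CSS or build parts of the page with JavaScript, so something like this would only make sense as one layer among several.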

As expected, the splog situation is constantly in flux. Up until about a month ago, the majority of splogs being created were gambling and porn related. Now all of a sudden there is a huge surge in splogs funded by AdSense. To make matters worse, Google has pretty much stopped shutting down spammers' AdSense accounts no matter what they do. I'm suddenly seeing thousands of splogs pointing to AdSense-plastered webpages with little or no real content. I'm starting to see some really heavyweight, well-funded spammers entering the splog fray. A few days ago I identified one splogger who registered about 900 .com domains for the sole purpose of creating splogs funded by AdSense. When I report these situations to Google AdSense, the splogs get deleted eventually, but for some unknown reason Google doesn't do anything about the blatant AdSense policy violation. Google is simply looking the other way. I wonder if this has to do with Google not meeting Wall Street's expectations last quarter. I'm beginning to seriously doubt whether Google is still sticking to its corporate philosophy of "Don't Be Evil".
