SPLOG = SPAM + BLOG

Monday, October 30, 2006

Big Batch of Splogs

Here is a big batch of splogs I've managed to extract using some new code I've been working on. The combined total splog count is over 150,000. Some of these splogs are quite tricky. Many of them are cloaked, meaning they appear to be a dead 404 blog page, but underneath is a link farm built in an attempt to jack up Google search engine rankings. I'm also seeing a rise in redirected splogs, and I'm working on some code to identify those in bulk.

I see that Google has been deleting quite a large number of splogs, but even then its cleanup is only about 20% effective on average. That means if a single spammer creates 1000 splogs, Google will eventually delete only about 200 of them, leaving 800 alone. Obviously this is a rather poor percentage, and hopefully my efforts will bump that figure up to 90% or above.
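To make the arithmetic concrete, here is a tiny Python sketch of the numbers above. The function name and the rounding are my own illustration; the only figures taken from the post are the 1000 splogs, the roughly 20% deletion rate, and the 90% target.

```python
def surviving_splogs(created: int, deletion_rate: float) -> int:
    """Return how many splogs remain after a given fraction is deleted."""
    deleted = round(created * deletion_rate)
    return created - deleted


# At roughly 20% effectiveness, most of a spammer's blogs survive.
print(surviving_splogs(1000, 0.20))  # 800

# At the 90% target, only a small remainder is left.
print(surviving_splogs(1000, 0.90))  # 100
```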

20061030_1.txt - 19401 splogs
20061030_2.txt - 4332 splogs
20061030_3.txt - 8936 splogs
20061030_4.txt - 8794 splogs
20061030_5.txt - 18912 splogs
20061030_6.txt - 5158 splogs
20061030_7.txt - 70755 splogs
20061030_8.txt - 1182 splogs
20061030_9.txt - 11410 splogs
20061030_10.txt - 968 splogs
20061030_11.txt - 1584 splogs

Here is a tarball of all splog list files listed above: 20061030.tar.gz
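As a quick sanity check on the combined total, the per-file counts can be summed with a few lines of Python. This is just a sketch: the listing is copied from the lines above, and the regular expression assumes every line follows the same "filename - N splogs" pattern.

```python
import re

# Per-file counts copied from the listing above.
LISTING = """\
20061030_1.txt - 19401 splogs
20061030_2.txt - 4332 splogs
20061030_3.txt - 8936 splogs
20061030_4.txt - 8794 splogs
20061030_5.txt - 18912 splogs
20061030_6.txt - 5158 splogs
20061030_7.txt - 70755 splogs
20061030_8.txt - 1182 splogs
20061030_9.txt - 11410 splogs
20061030_10.txt - 968 splogs
20061030_11.txt - 1584 splogs
"""


def total_splogs(listing: str) -> int:
    """Sum the N in every '- N splogs' entry of the listing."""
    return sum(int(n) for n in re.findall(r"- (\d+) splogs", listing))


print(total_splogs(LISTING))  # 151432
```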

Monday, October 16, 2006

Visualizing Splogs

I decided to have some fun with the GD library and generate a visual representation to gauge daily splog activity. Even though the images below look like corrupted image files, each one is really a graph of one day's splog activity. Every pixel represents a blog ping during that day. Blogs are sorted alphabetically from top left to bottom right. A black background pixel represents a fairly normal blog. A red pixel represents a potential splog; the brighter the red, the more likely it's a splog. Horizontal streaks represent a block of splogs generated by one spammer. White pixels are blogs that show excessive splog characteristics, far beyond bright red. Of course, this is a visual representation of just one algorithm I'm working on.
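For anyone who wants to try something similar, here is a rough Python sketch of the layout described above. It is not the GD code behind these images. It writes a plain-text PPM image instead of using GD, and the score scale is an assumption made up for illustration: 0.0 is a normal blog, values up to 1.0 get brighter red, and anything above 1.0 is drawn white.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def score_to_pixel(score: float) -> Pixel:
    """Map a hypothetical splog score to a colour.

    0.0 is black (normal blog), values up to 1.0 are brighter red,
    and anything above 1.0 is white (far beyond bright red).
    """
    if score > 1.0:
        return (255, 255, 255)
    red = int(round(max(score, 0.0) * 255))
    return (red, 0, 0)


def render_day(scores: Dict[str, float], width: int) -> List[List[Pixel]]:
    """Lay out one pixel per pinged blog, sorted alphabetically,
    filling rows from top left to bottom right.

    Assumes at least one blog and a positive width.
    """
    pixels = [score_to_pixel(scores[url]) for url in sorted(scores)]
    # Pad the last row with black so every row has the same width.
    while len(pixels) % width:
        pixels.append((0, 0, 0))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def to_ppm(rows: List[List[Pixel]]) -> str:
    """Serialize the pixel grid as a plain-text PPM (P3) image."""
    header = f"P3\n{len(rows[0])} {len(rows)}\n255\n"
    body = "\n".join(
        " ".join(f"{r} {g} {b}" for r, g, b in row) for row in rows
    )
    return header + body + "\n"


# Hypothetical blog names and scores, purely for illustration.
sample = {
    "a.example": 0.0,
    "b.example": 0.4,
    "c.example": 1.0,
    "d.example": 2.0,
    "e.example": 0.8,
}
grid = render_day(sample, width=3)
ppm_text = to_ppm(grid)
```

Because the blogs are sorted by name before being laid out, a batch of similarly named blogs lands on adjacent pixels, which is presumably why one spammer's batch shows up as a horizontal streak.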


Sunday, Oct. 15, 2006
2006-10-15

Saturday, Oct. 14, 2006
2006-10-14

Friday, Oct. 13, 2006
2006-10-13

Thursday, Oct. 12, 2006
2006-10-12

Wednesday, Oct. 11, 2006
2006-10-11


I can see that Friday was a relatively light day compared to Saturday. On Friday, I see exceptionally long red streaks, which means one spammer just went all out that day. I also see that Thursday's splogs were much more scattered than on other days. The sheer amount of data is way too much to make sense of just by looking at pages of numbers, so this sort of visualization is really the only way to do it effectively.

Friday, October 13, 2006

AdSense Spammers #304 through #351

Here are some AdSense spammers I found recently. It's been slow going for a while, but I expect that to change soon. I've finally caught up with some data collection efforts I've been working on, which means I can start testing some code against the data.

#304 - 20061013_pub-0086569112426277.txt - 64 splogs
#305 - 20061013_pub-0374718219910878.txt - 188 splogs
#306 - 20061013_pub-0491670909572501.txt - 52 splogs
#307 - 20061013_pub-0826479338578281.txt - 141 splogs
#308 - 20061013_pub-1027226234026401.txt - 320 splogs
#309 - 20061013_pub-1235184352456025.txt - 42 splogs
#310 - 20061013_pub-1417869016985498.txt - 45 splogs
#311 - 20061013_pub-1540371791985906.txt - 48 splogs
#312 - 20061013_pub-1623768256260872.txt - 51 splogs
#313 - 20061013_pub-2179102579615022.txt - 48 splogs
#314 - 20061013_pub-3579364374947771.txt - 49 splogs
#315 - 20061013_pub-3956252550794831.txt - 113 splogs
#316 - 20061013_pub-4202015514897439.txt - 47 splogs
#317 - 20061013_pub-4295825832277380.txt - 53 splogs
#318 - 20061013_pub-4377593677584626.txt - 75 splogs
#319 - 20061013_pub-4629783411726898.txt - 63 splogs
#320 - 20061013_pub-4713763737975972.txt - 51 splogs
#321 - 20061013_pub-4736888333359447.txt - 46 splogs
#322 - 20061013_pub-5547795822349903.txt - 54 splogs
#323 - 20061013_pub-5951616262328939.txt - 164 splogs
#324 - 20061013_pub-6059581191391719.txt - 43 splogs
#325 - 20061013_pub-6199069860145127.txt - 226 splogs
#326 - 20061013_pub-6397672854399843.txt - 93 splogs
#327 - 20061013_pub-6665656872345453.txt - 51 splogs
#328 - 20061013_pub-6807512809326601.txt - 54 splogs
#329 - 20061013_pub-7052194275931465.txt - 58 splogs
#330 - 20061013_pub-7181340478359149.txt - 45 splogs
#331 - 20061013_pub-7296641527999074.txt - 59 splogs
#332 - 20061013_pub-7387889257879318.txt - 48 splogs
#333 - 20061013_pub-7438526762148519.txt - 51 splogs
#334 - 20061013_pub-7718058126221187.txt - 56 splogs
#335 - 20061013_pub-7736561109795979.txt - 50 splogs
#336 - 20061013_pub-7817704833552130.txt - 50 splogs
#337 - 20061013_pub-5813722382236767.txt - 34 splogs
#338 - 20061013_pub-8170626305881347.txt - 65 splogs
#339 - 20061013_pub-8350191921163884.txt - 49 splogs
#340 - 20061013_pub-9644089385489523.txt - 35 splogs
#341 - 20061013_pub-8899510804032437.txt - 48 splogs
#342 - 20061013_pub-1834658899198017.txt - 61 splogs
#343 - 20061013_pub-9204054190522966.txt - 48 splogs
#344 - 20061013_pub-9805358228556659.txt - 42 splogs
#345 - 20061013_pub-4700850531117777.txt - 75 splogs
#346 - 20061013_pub-7041538720244066.txt - 38 splogs
#347 - 20061013_pub-9046673571027200.txt - 60 splogs
#348 - 20061013_pub-3054181973973346.txt - 47 splogs
#349 - 20061013_pub-2145030168387916.txt - 38 splogs
#350 - 20061013_pub-4311428648076373.txt - 37 splogs
#351 - 20061013_pub-9483962462431328.txt - 58 splogs

Here is a tarball of all splog list files listed above: 20061013.tar.gz
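Each list file above is named after an AdSense publisher ID, so one way to picture the grouping is the Python sketch below. It is only an illustration under assumptions, not the code used to build these lists: it assumes the publisher ID appears in the page source as "pub-" followed by 16 digits, and the blog URLs and HTML snippets are invented placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed ID shape: "pub-" followed by 16 digits, as in the file names above.
PUB_ID = re.compile(r"pub-\d{16}")


def group_by_publisher(pages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group blog URLs by every AdSense publisher ID found in their HTML."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages:
        # Deduplicate so a page with several ad units is counted once per ID.
        for pub_id in sorted(set(PUB_ID.findall(html))):
            groups[pub_id].append(url)
    return dict(groups)


# Placeholder pages; the URLs and markup are hypothetical.
pages = [
    ("http://blog-one.example/", 'google_ad_client = "pub-0086569112426277";'),
    ("http://blog-two.example/", 'google_ad_client = "pub-0086569112426277";'),
    ("http://blog-three.example/", 'google_ad_client = "pub-0374718219910878";'),
    ("http://blog-four.example/", "<p>No ads here.</p>"),
]

groups = group_by_publisher(pages)
for pub_id, urls in sorted(groups.items()):
    print(f"{pub_id} - {len(urls)} splogs")
```

Sharing a publisher ID only shows that the blogs pay out to the same account. Whether they are actually splogs still has to be decided some other way.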