I noticed some strange referrers in my HTTP logs. Entries like this one:
80.175.64.69 - - [08/Aug/2005:01:30:59 -0700] "GET /imgs/rss20_logo.gif HTTP/1.1" 200 989 "http://lisa.duckdriver.com:3420/cache.px?id=22823" "Mozilla/5.0 (compatible; Konqueror/3.1; Linux)"
The referrer points to a web site called DuckDriver, which describes itself as a Personal Internet Manager, and it appears to be caching my site. The caching is not the bad part: the cached copy has parts of my HTML stripped out, including my Google AdSense JavaScript.
I have blocked access from clients with this referrer, and from the host that is doing the caching itself (80.175.64.72).
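To illustrate the kind of check involved, here is a minimal Python sketch that flags log lines matching either rule. It assumes the Apache "combined" log format shown in the entry above; the names `LOG_PATTERN`, `BLOCKED_REFERER_HOSTS`, `BLOCKED_CLIENTS`, and `should_block` are made up for this example, and it is not the actual server configuration used for the block.

```python
import re
from urllib.parse import urlsplit

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical block lists built from the values mentioned in the post.
BLOCKED_REFERER_HOSTS = {"lisa.duckdriver.com"}
BLOCKED_CLIENTS = {"80.175.64.72"}


def should_block(line):
    """Return True if a log line comes from a blocked client or referrer host."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        # Lines that do not parse are left alone rather than blocked.
        return False
    if match.group("host") in BLOCKED_CLIENTS:
        return True
    # hostname is None for an empty referrer such as "-".
    referer_host = urlsplit(match.group("referer")).hostname
    return referer_host in BLOCKED_REFERER_HOSTS
```

Matching on the referrer's host name rather than the full URL means the rule still applies when the `id` query parameter changes from request to request.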
Update: It looks like the site that held a cache of my content is no longer accessible from the outside, and requests are now redirected to this page. I found something interesting on that page:
Because the bots do not 'spider' your website (ie. they do not recursively grab pages) they do not check robots.txt before scanning. The ability to regularly scan the submitted page is essential for the upkeep of the Blogwise database and we require this action to avoid delisting blogs unnecessarily.
So if I wanted to prevent the indexing from their spider, I wouldn't be able to use the standard robots.txt mechanism. This seems wrong.
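For comparison, this is roughly what a client that does honor robots.txt would do before fetching a page. It is only a sketch using Python's standard `urllib.robotparser`; the user agent `ExampleCacheBot`, the `example.com` URL, and the rules in `ROBOTS_TXT` are all hypothetical and do not come from Blogwise or from my site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleCacheBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A bot that skips this check, as the quoted page says theirs does, never sees the `Disallow` rule, so the only remaining option on my side is blocking at the server.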
Update #2: Sven from Blogwise added a comment describing that this is the Blogwise cache. It is not intended to be a public cache. I will be removing the blocks that I added before.
Technorati Tags: Blogging, DuckDriver, system admin
Paul - this is the Blogwise cache. Presumably you've either submitted your site to Blogwise or you're already listed.
It's not indexing your site - merely taking a cache of the page submitted so that I can do further analysis of it without taking unnecessary bandwidth.
The cache is designed to be private - ie. it's not intended as a public copy of your site. I realised this was open earlier this week and hence put the bounce in to an explanation page.
I've also changed the user agent and referer so that it's explicitly coming from Blogwise -- the activity has nothing to do with DuckDriver (my employers, who kindly let me host my hobby site on their connection).
Sorry for the confusion, and I am very open to people's views on this, so your feedback is valuable.