Tim notes that Googlebot is frequenting his server, and costing him real money.

A quick investigation of an old page from his log reveals:

HTTP/1.1 200 OK
Date: Wed, 03 Mar 2004 14:20:02 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux
Last-Modified: Wed, 03 Mar 2004 08:00:54 GMT
ETag: "1b404e-f1d-404590b6"
Accept-Ranges: bytes
Content-Length: 3869
Keep-Alive: timeout=15, max=20
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8

Like many other agents and caches, Googlebot presumably uses some “freshness” heuristic based on Last-Modified. As you can see above, the Last-Modified on this old page is barely six hours older than the Date of the response, so Tim’s server is telling the world that even his archived content changes frequently. Ergo, Google hits him frequently. Conclusion: don’t do that! 8-)
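To make the heuristic concrete, here is a minimal sketch of the kind of calculation a cache might do. It assumes the "10% of the time since Last-Modified" rule that the HTTP/1.1 spec offers as an example; what Googlebot actually does isn't public, and the function name and the `fraction` parameter are mine, purely for illustration.

```python
from email.utils import parsedate_to_datetime

def heuristic_freshness_seconds(date_header, last_modified_header, fraction=0.1):
    """Guess how long a response stays fresh when it carries no explicit
    expiry information. Assumption: lifetime = fraction * (Date - Last-Modified)."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    age_at_fetch = (date - last_modified).total_seconds()
    # A Last-Modified later than Date makes no sense; treat it as "not fresh".
    return max(0.0, age_at_fetch * fraction)

# The headers from Tim's response above.
lifetime = heuristic_freshness_seconds(
    "Wed, 03 Mar 2004 14:20:02 GMT",
    "Wed, 03 Mar 2004 08:00:54 GMT",
)
print(f"fresh for about {lifetime / 60:.0f} minutes")
```

With Tim's headers this comes out at roughly 38 minutes, so an agent following such a rule would consider the page stale almost immediately. The same page with a Last-Modified a year in the past would be treated as fresh for over a month.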

Full disclosure: my weblog isn’t cacheable at all (it doesn’t even send Last-Modified headers), and I have little motivation to fix it because my bandwidth isn’t metered.
