2004/03/03
Tim notes that Googlebot is frequenting his server, and costing him real money.
A quick investigation of an old page from his log reveals:
HTTP/1.1 200 OK
Date: Wed, 03 Mar 2004 14:20:02 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux
Last-Modified: Wed, 03 Mar 2004 08:00:54 GMT
ETag: "1b404e-f1d-404590b6"
Accept-Ranges: bytes
Content-Length: 3869
Keep-Alive: timeout=15, max=20
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
Like many other agents and caches, Googlebot presumably uses some “freshness” heuristic based on Last-Modified. As you can see above, Tim’s server is telling the world that even this archived page was modified only a few hours before it was fetched, so his old content looks like it changes frequently. Ergo, Google hits him frequently. Conclusion: don’t do that! 8-)
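To make the heuristic concrete, here is a minimal sketch of the rule of thumb suggested in RFC 2616 (section 13.2.4): with no explicit expiry, treat a response as fresh for some fraction, typically 10%, of the time since it was last modified. I have no idea what Googlebot actually does; the function name `heuristic_freshness` and the `fraction` parameter are mine, purely for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header, last_modified_header, fraction=0.1):
    """Estimate a freshness lifetime from the Date and Last-Modified headers.

    Assumption: the cache uses the "fraction of the time since last
    modification" rule of thumb from RFC 2616. Real crawlers may differ.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    since_modified = date - last_modified
    # A Last-Modified in the future makes no sense; treat it as "not fresh".
    if since_modified < timedelta(0):
        return timedelta(0)
    return since_modified * fraction


if __name__ == "__main__":
    # Header values copied from the response above.
    lifetime = heuristic_freshness(
        "Wed, 03 Mar 2004 14:20:02 GMT",
        "Wed, 03 Mar 2004 08:00:54 GMT",
    )
    print(lifetime)
```

For the headers above that works out to a bit under 38 minutes, so a client following this rule would consider the page stale, and worth refetching, within the hour. Had Last-Modified reflected when the archived entry was really written, months earlier, the same rule would give a lifetime measured in days.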
Full disclosure: my weblog isn’t cacheable at all (it doesn’t even send Last-Modified headers), and I have little motivation to fix it because my bandwidth isn’t metered.