Monthly Archives: June 2003

Rogue AV Spider Referer Goo

Yesterday, I got a whack load of Altavista “goo” in my referer log. It all came from a single machine. The requests were for valid URIs on my web site, and some of the referer URIs were valid, but they had nothing to do with this site or the URIs being requested. I’ve seen referer spam before, but this didn’t appear to be it (hence “Goo”), as the referer sites weren’t obviously commercially oriented or associated in any way.

Here’s some of the log entries;

-> /2002/09/Blog/2003/02/06
-> /2002/09/SemanticWebHumour
-> /2001/03/James/2002/OneYearOld/112-1276_img.html
-> /2001/08/Tremblant/105-0518_IMG.html
-> /2001/11/37Charles/2002/Renovation/113-1365_img.html
-> /2002/09/Blog/2002/12
-> /2002/09/Blog/2003/06
-> /2001/03/James/2002/OneYearOld/112-1249_img.html
-> /2001/08/Tremblant/dirindex.html
-> /2001/03/James/AfterWeek6
-> /2001/03/James/2002/OneYearOld/112-1231_img.html
-> /2001/03/James/2003-Two
-> /2001/03/James/2002/OneYearOld/113-1331_img.html
-> /2002/09/Blog/2002/10/25
-> /2001/11/37Charles/2002/Renovation/114-1417_img.html

Has anybody else seen this?

So that’s what an XML Catalog does

I’ve heard about XML Catalogs before, but never in a context that piqued my interest enough to make me want to learn what they were. Thanks to Norm Walsh’s description of them today in his weblog, I now know.

The idea, it seems, is that you need different identifiers in different contexts. So, for example, an http URL for some document won’t be usable when you’re offline, so you need a way to package that identifier with the local one on the file system.

My view is that while I agree this is a problem, I don’t think new standards are required to fix it. I suggest that better technology is what is required.

“” is an identifier for a DocBook DTD, and independent of the online status of your notebook, it remains an identifier for that DocBook DTD. What’s needed are operating systems, browsers, and network libraries that, when offline and asked for a representation of the resource identified by that URI, return a cached representation.
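To make that concrete, here is a minimal sketch of the kind of library behaviour I have in mind. It is only an illustration under assumptions: the class name `CachingResolver`, the injected `fetch` callable, and the hash-based cache file layout are all made up for this example, and a real implementation would also need to honour HTTP cache validators and expiry.

```python
import hashlib
from pathlib import Path
from typing import Callable


class CachingResolver:
    """Look up a representation by URI, falling back to a local cache.

    Hypothetical sketch: `fetch` is any callable that takes a URI and
    returns bytes, and raises OSError when the network is unavailable
    (urllib.error.URLError is a subclass of OSError, for example).
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, uri: str) -> Path:
        # The URI stays the identifier; the file name is an internal detail.
        return self.cache_dir / hashlib.sha256(uri.encode("utf-8")).hexdigest()

    def get(self, uri: str) -> bytes:
        path = self._path_for(uri)
        try:
            body = self.fetch(uri)
        except OSError:
            # Offline: serve the last stored representation, if there is one.
            if path.exists():
                return path.read_bytes()
            raise
        # Online: refresh the cache so a later offline request can succeed.
        path.write_bytes(body)
        return body
```

The caller only ever asks for the URI. Whether the bytes came from the network or from disk is the library’s business, which is the point: no second identifier is needed.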

Another consequence of this is that “File->Save As” in a browser should be de-emphasized. I’d prefer it be just “Save” or “Store” or something like that where the user isn’t prompted for a file name. The implication being that the file already has an identifier, so why does it need a different name on my computer? Obviously you’d still want access to “File->Save As” in some cases, but I don’t believe it’s what most people need most of the time.


Simon St. Laurent reports on Norm Walsh’s XML is not Object Oriented essay.

Simon writes;

The only thing I can think to add is that XML is pretty explicitly a rejection of an aspect of OO practice that Norm touches on only briefly: encapsulation. Everything accessible all the time is pretty clearly a hallmark of XML work. You can hide things if you want to, but it takes a lot more effort.

I’m pretty sure that Simon meant to say “data hiding” instead of encapsulation there, as the last sentence suggests. Encapsulation refers to the binding of associated data and behaviour into an identifiable whole. Data hiding refers to, well, hiding that data by not exposing it via the interface. There are many OO fanatics, myself included, who believe that you don’t need data hiding to be OO.
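A small sketch of the distinction, assuming nothing beyond plain Python. Both classes are hypothetical and exist only for this illustration. Both are encapsulated, since the data and the behaviour that goes with it are bound into one object, but only the second hides its data.

```python
class Account:
    """Encapsulation without data hiding: state and behaviour together,
    with the state readable by anyone holding the object."""

    def __init__(self, owner: str, balance: int = 0) -> None:
        self.owner = owner
        self.balance = balance  # deliberately public

    def deposit(self, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance


class HiddenAccount:
    """Encapsulation plus data hiding: the same behaviour, but the balance
    is only reachable through the methods the class chooses to expose."""

    def __init__(self, owner: str, balance: int = 0) -> None:
        self.__owner = owner      # name-mangled by Python
        self.__balance = balance  # name-mangled by Python

    def deposit(self, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.__balance += amount
        return self.__balance

    def statement(self) -> str:
        return f"{self.__owner}: {self.__balance}"
```

Python’s name mangling is a convention rather than real access control, so `HiddenAccount` is only an approximation of what a language with `private` members would give you.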

FWIW, I consider the Web to be the epitome of the anti-data-hiding view; resources as objects, URIs as object identifiers, GET as “give me your data”, POST as “process this data”, etc.

Don Park likes Tuple spaces

While noting that Roguewave has terminated its XML/tuple-space project, Ruple, Don Park wrote;

I am getting a dangerous itch to apply tuplespaces to web services workflow problems. TupleSpaces are extremely powerful as coordination infrastures so tuplespaces and web services go very well together IMHO.

Don, do you realize that REST’s uniform interface (GET/POST, etc.) defines a coordination language very similar to a tuple space?
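Here is a rough sketch of what I mean, with the caveat that the mapping in the comments is my own loose reading and not something either Linda or REST defines. The `TupleSpace` class is hypothetical and in-memory only, and its reads are non-blocking, unlike the classic blocking `rd` and `in` operations.

```python
from typing import Any, List, Optional, Tuple

Template = Tuple[Any, ...]  # None in a position acts as a wildcard


class TupleSpace:
    """A toy tuple space with a small, fixed set of generic operations."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []

    @staticmethod
    def _matches(template: Template, tup: Tuple[Any, ...]) -> bool:
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def out(self, tup: Tuple[Any, ...]) -> None:
        # Hand data to the shared space; loosely comparable to POST.
        self._tuples.append(tup)

    def rd(self, template: Template) -> Optional[Tuple[Any, ...]]:
        # Read without side effects; loosely comparable to GET.
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def in_(self, template: Template) -> Optional[Tuple[Any, ...]]:
        # Read and remove; loosely comparable to GET followed by DELETE.
        found = self.rd(template)
        if found is not None:
            self._tuples.remove(found)
        return found
```

In both cases the participants coordinate through a handful of operations that mean the same thing for every piece of data, rather than through an interface specific to each application.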

And for enabling workflow, there’s the additional REST constraint of using hypermedia as the engine of application state.

TimBL on the Semantic Web and Integration

Jeremy Allaire posts a transcript of a “conversation”(?) with Tim Berners-Lee on the Semantic Web at PC Forum.

Here’s a snippet which includes some of Tim’s words, plus Jeremy’s commentary;

TBL: business model for semantic web is the biz model of the web. it’s how apps interoperate, it’s how apps talk. short answer: dramatically reduce cost of enterprise app integration.

(My side conversation with Adam Bosworth, BEA chief architect and ex-Microsoft, Adam helped shape many of the XML standards. We both agree that this RDF thing is a big joke and TBL is on another planet. Adam helped drive the creation of XML Schema and XML Namespaces, as well as Web Services standards that uses these, and these are the things that are actually driving the semantic web. Virtualy no one uses RDF, but nearly everyone is moving to these other standards).

I’m a big believer in the technology behind the Semantic Web, but am skeptical that it will see wide-scale deployment anytime soon, due mainly to the (current) lack of a killer app. But that doesn’t reduce its value for application integration by very much. As we’ve seen, any form of exposure of a system in a machine-processable manner is an improvement over the alternative of having no access. It sounds to me like Jeremy and maybe Adam don’t even see the Semantic Web as a solution to the same problem that they’re tackling in their Web services work. Well, it is, and it’s worth investigating further before dismissing it so easily.

I’d recommend reading an earlier blog entry about the value of the Semantic Web for integration.