Ah, that’s better.

My poor little P133 Redhat 6.2 box gave up the ghost on Friday, after about 9 years of faithful service. It had been temperamental for a couple of years, but not so much so that I had to consider replacing it. Its original replacement as my desktop machine a few years ago – a smokin’ 128 (count ’em!) megabyte PII-400 – once again performs replacement duties.

I also took the opportunity to switch distributions. I started out with Yggdrasil back in 1994 (IIRC), switched to Redhat in ’97, and now have gone with Debian after a lot of other developer types recommended it.

Except for some really odd behaviour with a bad bash, and some corrupted binaries (a bad /bin/mount makes rebooting kinda hard) which required a re-install, it’s looking nice. apt is sweet, and it makes me wonder why I ever felt the need to fork over money for rhn.

I stumbled upon an “old” paper by Dan Larner yesterday that I first read when it was published back in ’98, but had forgotten all about. I find it poignant today not because I agree with its conclusions (I don’t), but because it so well describes the tension between specific and generic interfaces, albeit without actually acknowledging the tension 8-O

I liked this image in particular;

At the top you see the generic objects/interfaces, while at the bottom are the specific interfaces; Printer, Scanner, Copier (this is Xerox, after all). But why do those services require specific interfaces? Check out the methods on Printer; Print, CancelJob, Status. Why is that needed? Why can’t you just call GET on the printer to retrieve its status, POST to the printer to print a document, and DELETE on a job resource (which is subordinate to the printer) to cancel a job? Simple.
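To make that concrete, here's a rough sketch of a printer that exposes only generic operations. It's an in-memory toy, not anything from the paper; the class name, the paths (`/printer`, `/printer/jobs/<n>`) and the status fields are all my own assumptions for illustration.

```python
# Hypothetical in-memory model of a printer behind a uniform interface.
# Paths, status fields and status codes are illustrative assumptions.

class PrinterResource:
    def __init__(self):
        self.jobs = {}    # job path -> submitted document
        self.next_id = 1

    def get(self, path):
        """GET on the printer returns its status; GET on a job returns the job."""
        if path == "/printer":
            state = "printing" if self.jobs else "idle"
            return 200, {"state": state, "queued": len(self.jobs)}
        if path in self.jobs:
            return 200, {"document": self.jobs[path]}
        return 404, None

    def post(self, path, document):
        """POST a document to the printer; a subordinate job resource is created."""
        if path != "/printer":
            return 405, None
        job_path = f"/printer/jobs/{self.next_id}"
        self.next_id += 1
        self.jobs[job_path] = document
        return 201, job_path

    def delete(self, path):
        """DELETE on a job resource cancels that job."""
        if path in self.jobs:
            del self.jobs[path]
            return 204, None
        return 404, None
```

Print, CancelJob and Status are all still there; they've just been mapped onto POST, DELETE and GET, so a client that has never heard of printers can still ask one for its state.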

Many of the folks behind HTTP-NG were from PARC where ILU, a CORBA ORB with some funky extensions, provided the impetus for their W3C contributions. Like Web services proponents, their backgrounds were with systems which didn’t constrain interfaces, and so it was pretty much an implicit requirement that HTTP-NG would need to support specific interfaces by basically being a messaging layer à la SOAP. It’s too bad they didn’t take the time to study what was possible with the HTTP interface specifically, or even constrained interfaces in general. I think that’s a big part of the reason why HTTP-NG flopped.

Steve’s next article for his “Towards Integration” column is titled “Web Services Notifications”. It’s the usual high quality stuff you can count on from Steve, but I’d like to respond to a couple of comments made regarding my forte, Web architecture. Something tells me Steve probably anticipated this response. 8-)

A URL specifies only a single protocol, but a service could be reachable via multiple protocols. For example, a service might accept messages over both HTTP and SMTP, but any URL for the service can specify only one of those access methods.

It’s very commonly believed that URI scheme == protocol, but that really isn’t the case. A URI scheme defines a few things, but most important are the properties of the namespace it forms, in particular scope and persistence of uniqueness, whether it’s hierarchical, and probably some other things I’m forgetting. Defining an algorithm for mapping an identifier in that namespace to a connection to a remote server someplace on the Internet is independent of those properties. Consider that;

  • interactions with mailto URIs don’t necessarily begin or end with SMTP
  • an http URI can be minted and used successfully before a Web server is installed, or even while the Internet connection is down
  • RFC 2817 describes how to interact using HTTPS with a resource identified by a http URI using HTTP only as a bootstrap mechanism via Upgrade. Upgrade isn’t specific to HTTPS either.

There is certainly a relationship – as defined by the mapping algorithm mentioned above – between a URI scheme and a protocol in the context of dereferencing and sending messages, but as those last two points describe, it’s not quite as clear-cut as “URI scheme == protocol”.
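A small sketch of the distinction, using Python's standard `urllib.parse`. Splitting and comparing identifiers needs no network at all, and the step that picks an access mechanism is a separate lookup. The `ACCESS_PATHS` table is purely my own illustration of that second step, not something defined by any spec, and the example URIs are made up.

```python
from urllib.parse import urlsplit

# Illustrative assumption: a scheme may be reachable in more than one way,
# so the lookup is kept apart from the identifier itself.
ACCESS_PATHS = {
    "http": ["HTTP", "HTTP with Upgrade to TLS (RFC 2817)", "local cache"],
    "mailto": ["hand-off to a mail client", "SMTP"],
}

def describe(uri):
    """Split a URI into scheme, authority and path; no network access involved."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

def access_paths(uri):
    """Return the (hypothetical) ways an identifier of this scheme might be reached."""
    return ACCESS_PATHS.get(urlsplit(uri).scheme, [])
```

`describe()` works just as well with the network cable unplugged, which is roughly the point of the second bullet above.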

Steve also adds;

URLs can’t adequately describe some transport mechanism types. For example, message queues typically are defined by a series of parameters that describe the queue name, queue manager, get and put options, message-expiration settings, and message characteristics. It isn’t practical to describe all of this information in some form of message-queue URL.

I’ve had to tackle this exact problem recently, and I figure there are two ways to approach it. One is to suck up the ugly URI and embed all that information in one; I’m confident that could be done in general, because I’ve done something very similar. I would highly recommend this solution if you can do it because it’s efficient. But, if you can’t, you can always use a URI which doesn’t include that information, but which is minted at runtime as a result of POSTing the endpoint descriptive data to a resource which hands out URIs for that purpose; that requires an additional coordination step, but you get nice, short, crisp looking URIs.
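Here's a sketch of both approaches, under loud assumptions: the URI layout (`/<manager>/<queue>?options`), the option names, the host name and the `QueueUriFactory` class are all hypothetical, and this is not the scheme I actually used.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

# Approach 1: suck up the ugly URI and embed the queue description in it.
def queue_uri(host, manager, queue, **options):
    path = f"/{quote(manager, safe='')}/{quote(queue, safe='')}"
    query = urlencode(sorted(options.items()))
    return f"http://{host}{path}" + (f"?{query}" if query else "")

def queue_params(uri):
    """Recover the queue description from a URI built by queue_uri()."""
    parts = urlsplit(uri)
    manager, queue = (unquote(s) for s in parts.path.strip("/").split("/"))
    options = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return manager, queue, options

# Approach 2: POST the description to a resource that mints a short URI.
class QueueUriFactory:
    def __init__(self, base):
        self.base = base
        self.descriptions = {}    # minted URI -> queue description

    def post(self, description):
        uri = f"{self.base}/{len(self.descriptions) + 1}"
        self.descriptions[uri] = dict(description)
        return uri
```

The first needs no extra round trip but gives you URIs like `http://mq.example.org/QM1/orders?expiry=3600&persistence=yes`; the second costs one POST up front and hands back something like `http://mq.example.org/queues/1`.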

Moreover, not only do I believe that URIs are great for identifying message queues, I believe (surprise!) that http URIs are. Consider what it means to invoke GET on a message queue; what’s the state of a queue (that will be returned on the GET)? Why, the queued messages of course. This, plus POSTing into the same queue, is the fundamental innovation of mod-pubsub, IMO.
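A toy model of that idea, to show how little is needed. This is not mod-pubsub's API; the class and the status codes are assumptions of mine for illustration.

```python
# Hypothetical queue resource: its state is simply the messages it holds.

class QueueResource:
    def __init__(self):
        self._messages = []

    def post(self, message):
        """POST appends a message to the queue."""
        self._messages.append(message)
        return 204, None

    def get(self):
        """GET returns the queue's state, i.e. the queued messages.

        GET is safe: reading the state does not consume anything.
        """
        return 200, list(self._messages)
```

Two GETs in a row return the same messages, because retrieving a representation of the queue isn't the same thing as dequeuing from it.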

Next up…

URLs do not necessarily convey interface information. This is especially true of HTTP URLs, because Web services generally tunnel SOAP over HTTP.

Wait a sec, you’re blaming http URIs for the problems caused by tunneling?! 8-O 8-) Most (good) URIs do convey interface information, at least in the context of dereferencing (which is the only place that an interface is needed). So if I see a http URI, I can try to invoke HTTP GET on it (keeping in mind some of the considerations mentioned above).

Savas writes;

Mark Baker talks about the WSDL WG’s decision not to require the name of an operation in the body of a SOAP message

Just to be clear, the issue wasn’t about placing the operation name anyplace in particular. It was just that I wanted a self-descriptive path to find it, no matter where it’s located. That could be in the body, the headers, the underlying transfer protocol, in a spec, or in a WSDL document someplace.

Of course, I think having the method name in the SOAP message is harmful. I’d much rather it were inherited from the underlying transfer (not transport!) protocol, at least when used with a transfer protocol.

And to respond to this comment of his;

Web Services are all about exchanging information and not identifying methods, operations, functions, procedures that must be called. What services do with the information they receive, through message exchanges, it’s up to those services.

I’d just say that, well, at some layer you have to worry about operations. If Web services aren’t that layer, then whatever goes on top of them will have to worry about it. And my understanding was that Web services wanted to tackle this layer. FWIW, I think Jim and I agreed on this in a recent private exchange.

I’ve talked about my nose for self-description problems before in the context of RDF and media types. Now, with the publication of -04 of the RDF/XML media type registration draft, there’s another one.

Comment submitted.

The best summation of the issue, as I wrote in the ensuing thread, is probably;

If somebody on the Web can’t distinguish between an RDF message which says “Mark hates bananas” versus one that says “Mark hates bananas (but not really)” (aka unasserted), then there is a failure to communicate. The “but not really” part must be part of the message. It can either be done through mechanisms in the RDF specs themselves (e.g. parseType=”literal”), or it can be done in an encapsulating spec or registry, such as the media type registration.

Joshua writes;

Mark Baker is asking for Orkut + De.licio.us. I would actually rather see FOAF + de.licio.us, and better integration with the browser.

Ah, yes, definitely. But I’d also like a decent interface to the FOAF Web. So what I really want is …

  • Orkut, Friendster, Linked-In etc. to be value-added aggregators of FOAF (and other, e.g. bookmark) data
  • All of those, and others, to compete based on who presents the better interface to that data rather than who “owns” it

I’m not sure what kind of browser integration would be needed, beyond something generic like mod-pubsub. Certainly an RDF viewer. But in general I like to avoid application-specific (e.g. FOAF) functionality in the browser, because that doesn’t scale. But there are exceptions (which bookmarklets are perfect for). And perhaps there’s even a way to generalize what Joshua wants from browser integration, but I can’t say because he doesn’t describe what he has in mind.

If Atom is going to consider merging with RSS, it should merge with RSS 1.0. I consider it a mistake that this wasn’t done in the first place, primarily as a means to hook into the installed base of RSS 1.0 processors, but also to gain the self-descriptive extensible goodness of RDF.

It seems the Web Service Description WG has opted not to bother trying to fix a glaring architectural flaw in Web services, instead maintaining the status quo in which it’s impossible to determine which operation is being requested by any particular SOAP message (or even a reasonably sized subset of them). Bah.

The issue behind this problem is that the so-called “document exchange” model used by Web services is little more than wishful thinking; If only we hid the operation from the message, then we’d be more loosely coupled, yeah, that’s the ticket!. Sorry folks, it just don’t work like that. What you’ve done by removing the operation is just to make your message less self-descriptive. That’s it. The only gain is saving a few bytes.

If there’s anything worse than RPC, it’s less self-descriptive RPC.

The easiest way that we know to go about “doing away with operations” is to just give every component the same set of operations. Then you don’t really need to think about them much of the time.

you agree with Chris Ferris? 8-)

(though I still wish he’d stop referring to HTTP as a “transport protocol”)

I subscribed to public-webarch-comments because I like to hear what folks have to say about the AWWW document.

Since Saturday afternoon, the comments list has been absolutely bombarded by spam (to the tune of about 600 messages). It’s like it’s become the Internet’s sweetest honeypot, despite presumably having a handful of subscribers. Spammers work in mysterious ways.

Luckily my new still-being-trained spam filter caught all of them after I trained it on the first one, but still, yikes.

Over 900 messages as of late Sunday. Had to unsubscribe to lessen the load on my little P133 gateway box.