Steve’s next article for his “Towards Integration” column is titled “Web Services Notifications”. It’s the usual high quality stuff you can count on from Steve, but I’d like to respond to a couple of comments made regarding my forte, Web architecture. Something tells me Steve probably anticipated this response. 8-)

A URL specifies only a single protocol, but a service could be reachable via multiple protocols. For example, a service might accept messages over both HTTP and SMTP, but any URL for the service can specify only one of those access methods.

It’s very commonly believed that URI scheme == protocol, but that really isn’t the case. A URI scheme defines a few things, but most important are the properties of the namespace it forms, in particular the scope and persistence of uniqueness, whether it’s hierarchical, and probably some other things I’m forgetting. Defining an algorithm for mapping an identifier in that namespace to a connection with a remote server on the Internet someplace is independent of those properties. Consider that;

  • interactions with mailto URIs don’t necessarily begin or end with SMTP
  • an http URI can be minted and used successfully before a Web server is installed, or even while the Internet connection is down
  • RFC 2817 describes how to use TLS when interacting with a resource identified by an http URI, with plain HTTP serving only as a bootstrap mechanism via its Upgrade header. Upgrade isn’t specific to TLS either.

There is certainly a relationship – defined by the mapping algorithm mentioned above – between a URI scheme and a protocol in the context of dereferencing and sending messages, but as the last two points above describe, it’s not quite as clear-cut as “URI scheme == protocol”.
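To make that split concrete, here’s a minimal Python sketch (the URI itself is invented) of an http URI doing useful work as a pure identifier, with no server installed and no connection opened;

```python
from urllib.parse import urlsplit

# An http URI minted right now; no web server need exist for it to
# function as a globally unique, hierarchical identifier.
uri = "http://example.org/queues/orders"

parts = urlsplit(uri)
assert parts.scheme == "http"          # names the URI scheme...
assert parts.netloc == "example.org"   # ...whose uniqueness is delegated via DNS
assert parts.path == "/queues/orders"  # hierarchical, so relative references work

# Using the URI purely as a key in some local metadata store: no
# HTTP connection is ever opened, yet the identifier is fully useful.
descriptions = {uri: "the queue of pending orders"}
print(descriptions[uri])
```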

Steve also adds;

URLs can’t adequately describe some transport mechanism types. For example, message queues typically are defined by a series of parameters that describe the queue name, queue manager, get and put options, message-expiration settings, and message characteristics. It isn’t practical to describe all of this information in some form of message-queue URL.

I’ve had to tackle this exact problem recently, and I figure there are two ways to approach it. One is to suck it up and embed all that information in one ugly URI; I’m confident that could be done in general, because I’ve done something very similar. I would highly recommend this solution if you can do it, because it’s efficient. But if you can’t, you can always use a URI which doesn’t include that information, but which is minted at runtime by POSTing the endpoint descriptive data to a resource which hands out URIs for that purpose; that requires an additional coordination step, but you get nice, short, crisp looking URIs.
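A sketch of the first approach, with every parameter name invented for illustration; the second approach would instead POST this same description to a URI-minting resource and get back something short, at the cost of one extra round trip;

```python
from urllib.parse import urlencode

# Approach one: embed the whole endpoint description in one (ugly) URI.
# All parameter names and values here are invented for illustration.
params = {
    "qmgr": "QM1",
    "queue": "ORDERS.IN",
    "expiry": "30000",
    "persistent": "true",
}
ugly = "http://example.org/mq?" + urlencode(params)
print(ugly)

# Approach two would POST that same description to a URI-minting
# resource and receive back something like http://example.org/mq/q42.
```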

Moreover, not only do I believe that URIs are great for identifying message queues, I believe (surprise!) that http URIs are. Consider what it means to invoke GET on a message queue; what’s the state of a queue, i.e. what would be returned on a GET? Why, the queued messages, of course. This, plus POSTing into the same queue, is the fundamental innovation of mod_pubsub, IMO.
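Here’s a toy sketch of that queue-as-resource idea; nothing here is real mod_pubsub code, just the GET/POST semantics in miniature;

```python
# A toy in-memory message queue resource: GET returns the queue's
# current state (its queued messages), POST enqueues a new one.
class QueueResource:
    def __init__(self):
        self._messages = []

    def get(self):
        # The representation of a queue's state is its pending messages.
        return list(self._messages)

    def post(self, message):
        self._messages.append(message)

q = QueueResource()
q.post("order #1")
q.post("order #2")
print(q.get())  # ['order #1', 'order #2']
```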

Next up…

URLs do not necessarily convey interface information. This is especially true of HTTP URLs, because Web services generally tunnel SOAP over HTTP.

Wait a sec, you’re blaming http URIs for the problems caused by tunneling?! 8-O 8-) Most (good) URIs do convey interface information, at least in the context of dereferencing (which is the only place an interface is needed). So if I see an http URI, I can try to invoke HTTP GET on it (keeping in mind some of the considerations mentioned above).

Savas writes;

Mark Baker talks about the WSDL WG’s decision not to require the name of an operation in the body of a SOAP message

Just to be clear, the issue wasn’t about placing the operation name anyplace in particular. It was just that I wanted a self-descriptive path to find it, no matter where it’s located. That could be in the body, the headers, the underlying transfer protocol, in a spec, or in a WSDL document someplace.

Of course, I think having the method name in the SOAP message is harmful. I’d much rather it were inherited from the underlying transfer (not transport!) protocol, at least when used with a transfer protocol.

And to respond to this comment of his;

Web Services are all about exchanging information and not identifying methods, operations, functions, procedures that must be called. What services do with the information they receive, through message exchanges, it’s up to those services.

I’d just say that, well, at some layer you have to worry about operations. If Web services aren’t that layer, then whatever goes on top of them will have to worry about it. And my understanding was that Web services wanted to tackle this layer. FWIW, I think Jim and I agreed on this in a recent private exchange.

It seems the Web Service Description WG has opted not to bother fixing a glaring architectural flaw in Web services, maintaining the status quo in which it’s impossible to determine which operation is being requested for any particular SOAP message (or even for a reasonably sized subset of them). Bah.

The issue behind this problem is that the so-called “document exchange” model used by Web services is little more than wishful thinking; “if only we hid the operation from the message, then we’d be more loosely coupled… yeah, that’s the ticket!”. Sorry folks, it just don’t work like that. What you’ve done by removing the operation is just to make your message less self-descriptive. That’s it. The only gain is saving a few bytes.

If there’s anything worse than RPC, it’s less self-descriptive RPC.

The easiest way that we know to go about “doing away with operations”, is to just give every component the same set of operations. Then you don’t really need to think about them much of the time.
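A toy sketch of what a uniform interface buys you; the component types here are invented, but note that the generic transfer function never needs to know anything about them;

```python
# Every component exposes the same small set of operations; a generic
# client can then interact with any of them without per-component APIs.
class Counter:
    def __init__(self):
        self.n = 0
    def get(self):
        return self.n
    def put(self, state):
        self.n = state

class Note:
    def __init__(self):
        self.text = ""
    def get(self):
        return self.text
    def put(self, state):
        self.text = state

def transfer(src, dst):
    # Works on ANY pair of components, because the interface is uniform.
    dst.put(src.get())

a, b = Note(), Note()
a.put("hello")
transfer(a, b)
print(b.get())  # hello
```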

Yet another WS-* specification with a bunch of Get* methods that fails to use the SOAP 1.2 WebMethod feature, which supports HTTP GET and would thereby give important resources URIs.

WS-* effect considered harmful? It appears so.

In my ample spare time, I often fire up one of a handful of multiplayer first-person shooters that I’m familiar with, and play against opponents across the Internet. Perhaps you’ve done this too.

I frequently use a tool called Gamespy3D to help me locate the servers playing the game/mod/map I’m interested in. Of course, sometimes pinging this list of servers takes quite some time, leaving the ever-growing possibility that the ones pinged first are no longer playing the map they said they were N seconds ago. As a result of this, sometimes I double-click on a server, only to end up playing a different map than I intended, wasting up to a minute of my time.

What would be nice is if part of the “join” message sent from my PC to the server contained information declaring “Here is the map I’m interested in playing, and if you’re not playing it right now, I don’t want to join”.

Of course, sometimes you don’t have the expectation that any particular map is being played, such as when you just want to join one where you know your friends hang out. So it should be optional. But when present, its value must be understood.

You know, something like SOAPAction (well, mostly); something that form languages should support, because that server list inside Gamespy is a form.
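A sketch of what I mean, with an invented “expect-map” field; absent, it’s simply ignored, but when present its value must be understood and honoured;

```python
# A hypothetical "join" message: the expect-map field is optional,
# but if present the server must understand and honour it.
def handle_join(server_map, message):
    expected = message.get("expect-map")    # invented field name
    if expected is not None and expected != server_map:
        return "417 Expectation Failed"     # borrowing HTTP's status phrase
    return "200 joined"

# No expectation declared: join whatever is being played.
print(handle_join("de_dust", {"player": "mb"}))
# Expectation declared and not met: the join is refused up front.
print(handle_join("de_dust", {"player": "mb", "expect-map": "aztec"}))
```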

Oh my, WS-Discovery is a Web service spec I might actually use! Horror! 8-)

When I heard what it was, and that it was written by BEA, I was sure that Yaron Goland would be involved, given all his related work on UPnP. He wasn’t, nor was he even acknowledged. Odd.

But there’s not really too much to say about it (at least until I do a detailed review). Link local discovery is a pretty well understood domain, and the authors of this spec seem to grok it at least as well as I do. The use of SOAP/XML is unfortunate, I’d say, because of its bloat; you really need to keep things lean for multicast discovery so as to fit everything in a single datagram. Some kind of binary-encoded SOAP would be useful here.

I sort of wonder why Rendezvous or LLMNR weren’t adopted; the former has a whole lot of support and running code behind it, while the latter has MS behind it and should be published as an RFC shortly. But I suppose that nothing’s really close to critical mass in this space, so I can’t blame them for starting from scratch.

There’s also mention of a “SOAP/UDP” spec, which is “To be published”. That’ll be interesting to see, especially if there’s a compact (but still extensible) binary encoding. The spec’s mentions of “UNICAST_UDP_REPEAT” and “APP_MAX_DELAY”, and comments such as “waiting for timers”, suggest that it might be more a case of trying to reinvent parts of TCP than of embracing the message-per-datagram model which seems to work so well. But my experience there is rather limited, so I’d be happy to be proven wrong.

Chris Ferris writes in response to my suggestion that processing an XML document is an all-or-nothing proposition;

I don’t see it that way. Understanding an XML document is not an all-or-nothing proposition by any stretch of the imagination. For instance, I can have a generic SOAP processor that understands the SOAP namespace but is oblivious to the content of the soap:Body element (amongst other things such as certain SOAP headers).[…]

I see the disconnect. I’m referring to any/all XML document(s). No fair saying that some specific kinds of XML documents are partially understandable, because clearly you can design one to be, and SOAP, as an envelope, is one as you correctly point out.

So, consider this XML document;

<iwoejaf xmlns="http://example.org/oijerwer">
  <ijrwer>inm4jvxc</ijrwer>
</iwoejaf>

That’s the kind of document I’m talking about. Wouldn’t you say that understanding that document is all or nothing? You either recognize the namespace or you don’t, right? Well, that’s not the case with RDF/XML, since it gives you “partial understanding”; if that document above were known to be RDF/XML (and it is valid RDF/XML), then an RDF/XML processor could extract information from it piecemeal (in triples). Now, maybe none of the terms in any of the triples will be recognizable, but if you dereference the URI for each of the terms in those triples, you may find that the terms you don’t know are related to ones you do.
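To illustrate, here’s a toy stdlib sketch of that piecemeal, “striped” reading of the document above; a real RDF/XML parser does far more (and handles far more syntax), but the shape of the output is the point;

```python
import xml.etree.ElementTree as ET

doc = """<iwoejaf xmlns="http://example.org/oijerwer">
  <ijrwer>inm4jvxc</ijrwer>
</iwoejaf>"""

# A (grossly simplified) reading under RDF/XML's striped syntax:
# the root is a typed node, each child element a property of it,
# each text value a literal object.
root = ET.fromstring(doc)
subject = "_:b0"  # anonymous node standing in for the thing described
triples = [(subject, "rdf:type", root.tag)]
for child in root:
    triples.append((subject, child.tag, child.text))

# Even knowing none of the vocabulary, we've extracted two triples
# whose terms could each be chased down independently.
for t in triples:
    print(t)
```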

Now can you see why TimBL is so keen to see folks use RDF/XML? It’s the answer to the schema evolution problem.

HTTP is a great application protocol, for the application for which it was designed… the Web.

Finally, something we can agree on! 8-) Now, if only you understood what the Web was, and was capable of, we’d be all set.

Chris responds to an earlier comment of mine.

My point remains that HTTP is not suited to extension of its methods because it requires centralized administration of the method names. You can’t simply make up a new method like MONITOR and deploy it unless you go through the IETF to revise the HTTP specification. Unless you do, then there’s no way that anyone could tell the difference between Mark’s MONITOR method and mine (should I devise one) and yet they might be very different animals.

His point about decentralized method definition is very well taken; HTTP does not permit method names to be URIs. PEP attempted to remedy this, but was never deployed.
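For what it’s worth, deploying a made-up method takes surprisingly little machinery; here’s a toy sketch using Python’s stdlib HTTP server, which happily dispatches on any method name (the MONITOR semantics here are entirely invented);

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# http.server dispatches on the method name in the request line, so a
# new method like MONITOR needs no change to HTTP itself -- just a
# handler for it.
class Handler(BaseHTTPRequestHandler):
    def do_MONITOR(self):
        body = b"monitoring " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("MONITOR", "/printer")   # an unregistered, made-up method
resp = conn.getresponse()
status, body = resp.status, resp.read().decode()
print(status, body)
server.shutdown()
```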

But in order to use MONITOR, the HTTP spec doesn’t need revision. There are a multitude of HTTP extensions defined standalone which required no revision of HTTP itself; consider WebDAV. And as Dave Orchard noted, HTTP is rife with extensibility points; this is no accident, because HTTP was explicitly designed to be extended. Which brings us to this comment;

HTTP wasn’t designed to support pubsub. Just because some sharp people can take the protocol and tweak it here and there to enable pubsub doesn’t change that fact.

HTTP was not designed to support pub/sub, but so? Was SOAP? Nope. But that doesn’t prevent one from using it that way. What’s important is that it wasn’t designed in such a way that prevented (even by being merely a “poor fit”) its use for pub/sub, and IMO, neither HTTP nor SOAP were. HTTP was designed for document transfer, and pub/sub fits there perfectly.

Chris then asks a very good mod_pubsub question;

However, all that aside, I am curious about something else related to mod_pubsub. Sure, it makes use of HTTP GET as well as POST, but are GET and POST really the methods? I mean really… how is this any different than the way in which SOAP uses HTTP POST? do_method?

This is largely what I was referring to when I said “well, parts sure are, but the bulk of it?” in reference to mod_pubsub and hackery. The client portion of mod_pubsub – the Javascript Web server and library – had to resort to the lowest common denominator; AIUI, they couldn’t present access to HTTP internals to developers. AFAIK, that’s why “route” is a parameter rather than a new method. It’s actually semantically quite close to WebDAV’s COPY method, both in that it is essentially a copy action, and in that it’s an “interaction at a distance”, i.e. two URIs are used as arguments, rather than just one, which would have required the data being routed to flow to the client. But doing this RESTfully, I could easily imagine a ROUTE method.
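Here’s a sketch of what a hypothetical ROUTE might look like, borrowing COPY’s real Destination header; the in-memory “resources” dict obviously stands in for actual HTTP machinery;

```python
# Sketch of "interaction at a distance": like WebDAV's COPY, a
# hypothetical ROUTE request names two resources -- the request-URI as
# source and a Destination header as target -- so the data being
# routed never has to flow to the client.
def route(resources, request_uri, headers):
    destination = headers["Destination"]   # the same header COPY uses
    resources[destination] = resources[request_uri]
    return "201 Created"

resources = {"http://example.org/queues/in": ["msg1", "msg2"]}
status = route(resources, "http://example.org/queues/in",
               {"Destination": "http://example.org/queues/out"})
print(status, resources["http://example.org/queues/out"])
```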

This is different to a typical Web services approach (I won’t say “SOAP approach”, because SOAP can be used in so many ways) for two reasons, IMO: first, the semantics being tunneled are uniform, and second, they’re tunneled because there was no other way to do it. In my observation, Web services developers use tunneling primarily because they don’t know how to solve their problems without it, and because they’ve been led to believe that “protocol independence” is a feature rather than a bug.

When the Web Services Architecture WG closed down, I took the opportunity to ask working group members what their reasons were for not using REST as a base for Web services. I continue to hear, on an almost daily basis, about how the Web is for humans, so that’s what I expected to hear. Instead, to my surprise and elation, I heard comments such as this from Roger Cutler;

Although I have not put the time and effort into studying it enough to be very sure, what I have seen of the REST-like solutions you have proposed or described to problems addressed by Web services indicates to me that it COULD have been done that way and that it would have worked. In fact, it’s even possible that it would have worked better and that it would have been better had it been done that way. I don’t really know that this is the case, but I think it’s possible it might be. I also think it’s utterly irrelevant. What’s done is done, and the world ain’t goin that way. In hindsight there are many, many places in the way all sorts of things have developed in the world that might have been done better or more directly. The progress of human affairs is imperfect at best. I personally participate in those imperfections.

A couple of other people responded, basically agreeing with Roger.

This is an important milestone, I’d say. It seems to signify the end of the “REST Wars”, as some Web services folks now accept that there are RESTful solutions to application-to-application integration over the Internet. Stage one is complete.

Stage two – which I’ve been arguing alongside stage one, but can now apparently focus upon more intently – is about software architecture; that unless your architecture has the properties your environment requires, you will fail. Even pervasive agreement on an architecture lacking in those properties is an insufficient condition for success.

Onward to stage two! Let’s hope this one doesn’t take another four years of effort (that message was my first message critiquing Web services, AFAICT).

Dave Orchard wrote, and Don Box concurred, that it’s a good thing to avoid registration at the likes of IANA and IETF. I also concur, as my hopefully-soon-to-be-BCP Internet Draft with Dan Connolly describes.

Where I disagree with Dave and Don, is summed up by Dave;

XML changes the landscape completely. Instead of having a small number of types that are registered through a centralized authority, authors can create arbitrary vocabularies and even application protocols through XML and Schema. In the same way a client has to be programmed for media types, a client must be programmed for xml types and wsdl operations.

IMO, XML doesn’t change the landscape in that way at all. It’s always been possible to have an explosion of data formats and protocols; 10 years ago you could have done it with ASCII and ONC or DCE. The fact of the matter is that we don’t see these things on a large scale on the Internet because most people don’t want them. Not only is it expensive to develop new ones – even with a fine framework for their development, such as SOAP & XML Schema – but you’re very typically left amortizing that expense over a very narrowly focused application, such as stock quotes or shoe ordering, or what-have-you. The Web and Semantic Web efforts are an attempt to build a supremely generic application around a single application protocol (HTTP) and a single data model (RDF). Now that’s landscape-changing.