Thought for the day; how different would the Web services architecture be, if the Web didn’t exist?
Via Savas, a pointer to a paper by Jeff Schneider titled The World Wide Grid. It includes some incorrect assumptions about the Web that I’d like to address. Luckily, they’re summed up in this statement;
The focus for the web was to describe pages that could be linked together and read by people.
Bzzt. Can this be read by people? Nope.
The same mechanisms that permit a browser to pull in HTML from any HTML-serving site around the world can also enable an automaton to pull in any kind of data from anywhere around the world (namely, GET + URIs). You are using data, right? 8-)
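To make that concrete, here’s a minimal sketch (the feed URI is hypothetical) of an automaton using exactly the browser’s mechanism – GET on an http URI – to pull in machine-readable data rather than a page;

```python
# A minimal sketch; the URI is hypothetical. The same GET + URI mechanism
# a browser uses for HTML pulls in machine-readable data for an automaton.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

with urlopen("http://example.org/devices/status.xml") as resp:
    doc = ET.parse(resp)          # data for machines, not a page for people

print(doc.getroot().tag)
```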
Even if you accept that all of these different approaches (Grid, WS, Web) are workable solutions to the over-the-Internet application-to-application integration problem, do you really want to bet against the world’s most successful distributed application?
Patrick and Stefan pick up on my “Linda did win” meme
Obviously the Web isn’t Linda; clearly, there are important differences. But IMO, there are more important similarities. Both Stefan and Patrick claim, in effect, that the way the Web is used today doesn’t leverage those similarities. While I agree that we haven’t yet taken full advantage of it, the most common use of the Web today is browsing in a Web browser, and I see that as very similar to just rd()-ing (reading) a space – or at least far more similar to it than to a Web services approach of invoking getStockQuote or getPurchaseOrder.
How different from the human-centric Web is a tuple space returning HTML on a rd()?
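The analogy, roughly (my loose sketch, not Gelernter’s actual API);

```python
# A loose analogy, not Gelernter's actual API: Linda's rd() reads a tuple
# from the shared space without consuming it, much as GET retrieves a
# resource's current representation without changing it.
from urllib.request import urlopen

def rd(uri):
    """'Read' the space at this name, non-destructively."""
    with urlopen(uri) as resp:
        return resp.read()

page = rd("http://example.org/")  # a rd() that happens to return HTML
```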
… you’d have to negotiate much of the hard stuff that TCP accomplishes (e.g. flow control), for each different party you wanted to interact with.
When building an Internet-scale machine-to-machine stack, your objective is to imbue it with sufficient information to enable ad-hoc integration between parties which implement it. Agreeing on only a “messaging layer” while not agreeing on an interface prevents two parties from integrating in an ad-hoc manner, just as agreeing on IP alone (or IP plus part of the TCP spec) is insufficient to permit an ad-hoc TCP connection to be established.
Congratulations to Tim on his new position at Sun.
In an interview he gave about the move, he said something interesting that I’d like to comment on. When asked whether he’d looked into peer-to-peer technologies, he said;
Only trivially. That has some roots of its thinking in the old Linda [parallel programming coordination language] technology that [David] Gelernter did years and years ago, which I thought was wonderful and which I was astounded never changed the world. But I have been working so hard on search and user interface and things like that for the last couple of years that I haven’t had time to go deep on JXTA.
Linda – or something very Linda-like – did change the world; the World Wide Web.
I’d really love to get Tim’s views on Web services. He’s said a handful of things on REST/SOAP etc., almost all of which suggest that he totally gets the value of the Web (unsurprisingly). But he’s also said some things which have me wondering whether he appreciates the extent of the mistakes being made with Web services.
BTW, I wonder what’s going to happen on the TAG now that both Tim and Norm are at Sun, given that the W3C process document doesn’t allow two members from the same company. Either way, it will be a big loss to the TAG. Bummer.
Update; Tim resigns
Jim writes, regarding a diagram by Savas;
It shows that managing virtualised resources across organisations isn’t scalable, whereas composition of services is. Why the difference in terms of scalability? In the service-oriented view, services can manage whatever backend resources they have for themselves, therefore the complexity of the application driving the services increases linearly with the number of services. In the resource-oriented view, the consuming application must deal with each resource directly and so complexity increases as the sum of a multiple (the number of resources) of each service.
Aside; I think Jim’s using the WS-RF notion of “resource”, which is cool, since it jibes so closely with the Web’s notion of one (stateful resource, stateless interaction).
I think the scalability claim above is only correct if you ignore a whole class of useful resources; containers which contain other resources. So I could lay out a resource-centric view of the network in that diagram to look exactly like the service-centric view Savas draws. For example, I might define a container called “the aggregate log ‘file’ of all devices in this building”, which might be dynamically constructed in basically the same way that aggregate RSS feeds are constructed. And, of course, it would be given an http URI so that I could snarf data from it. Each log entry could also provide the URI of the more granular “device” resource it came from, so that I, or an automaton, could visit there to find its current status.
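Here’s a rough sketch of that container; all the URIs are invented for illustration, of course;

```python
# A rough sketch of the hypothetical "aggregate log" container; all URIs
# are invented for illustration. The container GETs each device's log and
# merges the entries, each tagged with the URI of its source device so a
# client (human or automaton) can follow it to check the device's status.
from urllib.request import urlopen

DEVICE_LOG_URIS = [
    "http://building.example.org/devices/printer-3/log",
    "http://building.example.org/devices/router-1/log",
]

def aggregate_log():
    entries = []
    for device_uri in DEVICE_LOG_URIS:
        with urlopen(device_uri) as resp:
            for line in resp.read().decode().splitlines():
                entries.append({"source": device_uri, "entry": line})
    return entries
```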
Ah, that’s better.
My poor little P133 Redhat 6.2 box gave up the ghost on Friday, after about 9 years of faithful service. It had been temperamental for a couple of years, but not so much so that I had to consider replacing it. Its original replacement as my desktop machine a few years ago – a smokin’ 128 (count ’em!) megabyte PII-400 – once again performs replacement duties.
I also took the opportunity to switch distributions. I started out with Yggdrasil back in 1994 (IIRC), switched to Redhat in ’97, and now have gone with Debian after a lot of other developer types recommended it.
Except for some really odd behaviour from a bad bash and some corrupted binaries (a bad /bin/mount makes rebooting kinda hard), which required a re-install, it’s looking nice. apt is sweet, and it makes me wonder why I ever felt the need to fork over money for RHN.
I stumbled upon an “old” paper by Dan Larner yesterday that I first read when it was published back in ’98, but had forgotten all about. I find it poignant today not because I agree with its conclusions (I don’t), but because it so well describes the tension between specific and generic interfaces, albeit without actually acknowledging the tension 8-O
I liked this image in particular;
At the top you see the generic objects/interfaces, while at the bottom are the specific interfaces; Printer, Scanner, Copier (this is Xerox, after all). But why do those services require specific interfaces? Check out the methods on Printer; Print, CancelJob, Status. Why are those needed? Why can you not just call GET on the printer to retrieve its status, POST to the printer to print a document, and DELETE on a job resource (which is subordinate to the printer) to cancel a job? Simple.
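In HTTP terms, that mapping might look like this (the printer and job URIs are hypothetical, naturally);

```python
# A sketch of the uniform-interface alternative; the printer URI is
# hypothetical. Status, printing, and cancellation map onto the generic
# GET, POST, and DELETE rather than Print/Status/CancelJob.
from urllib.request import Request, urlopen

PRINTER = "http://example.org/printers/42"

# "Status": just GET the printer resource
status = urlopen(PRINTER).read()

# "Print": POST the document to the printer; the Location header of the
# response identifies the new job resource, subordinate to the printer
with open("doc.ps", "rb") as f:
    req = Request(PRINTER, data=f.read(),
                  headers={"Content-Type": "application/postscript"})
job_uri = urlopen(req).headers["Location"]

# "CancelJob": DELETE the job resource
urlopen(Request(job_uri, method="DELETE"))
```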
Many of the folks behind HTTP-NG were from PARC, where ILU, a CORBA ORB with some funky extensions, provided the impetus for their W3C contributions. Like Web services proponents, their backgrounds were with systems which didn’t constrain interfaces, and so it was pretty much an implicit requirement that HTTP-NG support specific interfaces by basically being a messaging layer à la SOAP. It’s too bad they didn’t take the time to study what was possible with the HTTP interface specifically, or even with constrained interfaces in general. I think that’s a big part of the reason why HTTP-NG flopped.
Steve’s next article for his “Towards Integration” column is titled “Web Services Notifications”. It’s the usual high quality stuff you can count on from Steve, but I’d like to respond to a couple of comments made regarding my forte, Web architecture. Something tells me Steve probably anticipated this response. 8-)
A URL specifies only a single protocol, but a service could be reachable via multiple protocols. For example, a service might accept messages over both HTTP and SMTP, but any URL for the service can specify only one of those access methods.
It’s very commonly believed that URI scheme == protocol, but that really isn’t the case. A URI scheme defines a few things, but most important are the properties of the namespace it forms; in particular, the scope and persistence of its uniqueness, whether it’s hierarchical, and probably some other things I’m forgetting. Defining an algorithm for mapping an identifier in that namespace to a connection to a remote server on the Internet someplace is independent of those properties. Consider that;
- interactions with mailto URIs don’t necessarily begin or end with SMTP
- an http URI can be minted and used successfully before a Web server is installed, or even while the Internet connection is down
- RFC 2817 describes how to layer TLS onto an interaction with a resource identified by an http URI, using plain HTTP only as a bootstrap mechanism via the Upgrade header (sketched below). Upgrade isn’t specific to TLS, either.
There is certainly a relationship – the mapping algorithm mentioned above – between a URI scheme and a protocol in the context of dereferencing and sending messages, but as those last two points describe, it’s not quite as clear-cut as “URI scheme == protocol”.
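For the curious, the RFC 2817 bootstrap looks roughly like this; the host is hypothetical, and a real client would have to carry on the TLS handshake itself on the same connection;

```python
# A rough sketch of the RFC 2817 bootstrap; the host is hypothetical. The
# client asks, over plain HTTP, that this very connection be upgraded to
# TLS; a willing server answers "101 Switching Protocols", after which the
# TLS handshake proceeds on the same connection (not shown here).
import http.client

conn = http.client.HTTPConnection("example.org")
conn.request("OPTIONS", "*", headers={
    "Upgrade": "TLS/1.0",        # RFC 2817: please switch this connection
    "Connection": "Upgrade",
})
resp = conn.getresponse()
print(resp.status, resp.reason)  # expect: 101 Switching Protocols
```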
Steve also adds;
URLs can’t adequately describe some transport mechanism types. For example, message queues typically are defined by a series of parameters that describe the queue name, queue manager, get and put options, message-expiration settings, and message characteristics. It isn’t practical to describe all of this information in some form of message-queue URL.
I’ve had to tackle this exact problem recently, and I figure there are two ways to approach it. One is to suck it up and embed all that information in one ugly URI; I’m confident that could be done in general, because I’ve done something very similar. I would highly recommend this solution if you can do it, because it’s efficient. But if you can’t, you can always use a URI which doesn’t include that information, but which is minted at runtime by POSTing the endpoint-descriptive data to a resource which hands out URIs for that purpose; that requires an additional coordination step, but you get nice, short, crisp-looking URIs.
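The second approach might look something like this; the minting resource, the URIs, and the descriptor format are all hypothetical;

```python
# A sketch of the second approach; the minting resource, URIs, and the
# descriptor format are all hypothetical. POST the queue's endpoint
# description to a URI-minting resource, and use the short URI that comes
# back in the Location header from then on.
import json
from urllib.request import Request, urlopen

descriptor = {
    "queue_name": "ORDERS.IN",
    "queue_manager": "QM1",
    "message_expiry": 3600,
}
req = Request("http://example.org/queues",
              data=json.dumps(descriptor).encode(),
              headers={"Content-Type": "application/json"})
queue_uri = urlopen(req).headers["Location"]  # e.g. http://example.org/queues/8fe2
```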
Moreover, not only do I believe that URIs are great for identifying message queues, I believe (surprise!) that http URIs are. Consider what it means to invoke GET on a message queue; what’s the state of the queue that would be returned? Why, the queued messages, of course. This, plus POSTing into the same queue, is the fundamental innovation of mod_pubsub, IMO.
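Continuing the sketch above;

```python
# Continuing the sketch above: with the queue identified by an http URI,
# its "state" is just the set of queued messages.
from urllib.request import Request, urlopen

queue_uri = "http://example.org/queues/8fe2"  # minted above; hypothetical

messages = urlopen(queue_uri).read()   # GET: read the queued messages

urlopen(Request(queue_uri,             # POST: enqueue a new message
                data=b"<order/>",
                headers={"Content-Type": "application/xml"}))
```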
Next up…
URLs do not necessarily convey interface information. This is especially true of HTTP URLs, because Web services generally tunnel SOAP over HTTP.
Wait a sec, you’re blaming http URIs for the problems caused by tunneling?! 8-O 8-) Most (good) URIs do convey interface information, at least in the context of dereferencing (which is the only place an interface is needed). So if I see an http URI, I can try to invoke HTTP GET on it (keeping in mind some of the considerations mentioned above).
Savas writes;
Mark Baker talks about the WSDL WG’s decision not to require the name of an operation in the body of a SOAP message
Just to be clear, the issue wasn’t about placing the operation name anyplace in particular. It was just that I wanted a self-descriptive path to find it, no matter where it’s located. That could be in the body, the headers, the underlying transfer protocol, in a spec, or in a WSDL document someplace.
Of course, I think having the method name in the SOAP message is harmful. I’d much rather it were inherited from the underlying transfer (not transport!) protocol, at least when used with a transfer protocol.
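To illustrate the difference (hypothetical URIs, and a deliberately simplified SOAP envelope);

```python
# A sketch contrasting the two placements of the operation; the URIs are
# hypothetical and the SOAP envelope is deliberately simplified. In the
# tunneled style, the method name hides in the body behind a generic POST;
# in the Web style, the method is inherited from HTTP itself.
from urllib.request import Request, urlopen

# Operation buried in the SOAP body:
soap = b"""<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body><getStockQuote symbol="SUNW"/></env:Body>
</env:Envelope>"""
urlopen(Request("http://example.org/soap/endpoint", data=soap,
                headers={"Content-Type": "application/soap+xml"}))

# Operation inherited from the transfer protocol; it's just GET:
quote = urlopen("http://example.org/stocks/SUNW").read()
```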
And to respond to this comment of his;
Web Services are all about exchanging information and not identifying methods, operations, functions, procedures that must be called. What services do with the information they receive, through message exchanges, it’s up to those services.
I’d just say that, well, at some layer you have to worry about operations. If Web services aren’t that layer, then whatever goes on top of them will have to worry about it. And my understanding was that Web services wanted to tackle this layer. FWIW, I think Jim and I agreed on this in a recent private exchange.