Clemens has some doubts and questions about REST, which I’m more than happy to respond to.

What I don’t get together is the basic idea that “every thing is a uniquely identifiable resource” with the bare fact that the seemingly limitless scalability of the largest websites is indeed an illusion created by folks like those at Akamai, who take replicated content around the globe and bring it as close as possible to the clients.

I’ve heard this objection several times. What I always point out is that REST is explicitly layered, and that the Web is scalable precisely because Akamai and other solutions like it can exist.

Taken to the last consequence of every data facet being uniquely identifiable through a URI (which it then should be and is) this model can’t properly deal with any modification concurrency.

Not true. There are many known ways of dealing with concurrent access to data, and the Web has a good one: see the lost update problem (though other solutions may be used too). The one described in that document is very well designed, as other solutions suffer from “hand of god” centralization issues (e.g. round identifiers).

While that solution addresses only concurrent access to an individual resource, WebDAV offers some useful extensions for dealing with collections of resources, and with concurrent access to them.
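To make the lost update solution concrete, here’s a minimal sketch of conditional update with entity tags. The `Resource` class and handler function are illustrative stand-ins, not any real server’s API; the point is the protocol logic: a stale `If-Match` gets a 412 instead of silently clobbering someone else’s write.

```python
# Sketch of the Web's answer to the lost update problem: conditional
# requests using entity tags (ETags). Names here are illustrative.
import hashlib

class Resource:
    def __init__(self, body: bytes):
        self.body = body

    @property
    def etag(self) -> str:
        # A strong ETag derived from the current representation.
        return '"%s"' % hashlib.sha1(self.body).hexdigest()

def conditional_put(resource: Resource, new_body: bytes, if_match: str):
    """Apply a PUT only if the client's If-Match ETag is still current."""
    if if_match != resource.etag:
        # Someone else updated the resource since this client's GET:
        # refuse with 412 so the client can re-GET, merge, and retry.
        return 412, "Precondition Failed"
    resource.body = new_body
    return 200, "OK"

# Clients A and B both GET the resource and remember its ETag.
doc = Resource(b"v1")
tag_a = tag_b = doc.etag

# A updates first; B's stale write is then refused rather than lost.
status_a, _ = conditional_put(doc, b"A's edit", tag_a)
status_b, _ = conditional_put(doc, b"B's edit", tag_b)
```

B’s 412 tells it exactly what happened, with no central lock manager in sight.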

So, what are the limits of data granularity to which REST applies? How do you define boundaries of data sets in a way that concurrency could work in some way […]

REST’s level of data granularity is the resource, but a resource can be a collection of other resources, as WebDAV recognizes.

and how do you define the relationship between the data and the URI that identifies it?

By use: the data returned by GET over time determines what the URI identifies. Though in the case of a resource created by PUT, the one doing the PUTting already knows, since they directly determine what GET returns.

So to sum up, I too believe that REST has places where it’s useful, and places where it’s not, and that there will be other useful systems deployed on the Internet which are not built around REST or REST extensions. But I don’t believe “Web services” will be amongst them, because the current approach fails to recognize that being built around a coordination language is essential to any Internet scale system.

A few people have already commented on Don’s comments about there being enough specs already. FWIW, and not too surprisingly I expect, I saw those comments as a direct jab at BEA, who earlier this week released three more WS-* specs.

It was good to finally see BEA go out on its own, lord knows that’s been long overdue. But those three specs were very disappointing as a first attempt. I mean, they’re OK work (though MessageData seems to be a pretty weak attempt at addressing Roy’s issue about the different types of HTTP headers being all munged together), but they’re over-specified (note to authors, leave some room in the margins for the standardization process 8-), and don’t stand alone very well. They need to be bundled together under some kind of catch-all “SOAP Extension Suite” or something. Or perhaps in WS-I, separating them out as three is the best way to get them into some new-fangled profile, who knows. Ah politics, gotta love ’em.

A flurry of “It’s not REST vs. SOAP” comments in response to Bob McMillan’s latest REST article.

I helped Bob with that story, and I know I’m always careful to frame the argument as “REST vs. Web services” rather than “REST vs. SOAP”, so I apologize for not communicating that point to Bob well enough. Heck, I spent all that time on the XML Protocol WG for a reason, you know; to make sure SOAP became the best darned PEP replacement it could be.

But I suppose there’s an inevitability to this confusion, since the word “SOAP” brings along with it an implied use, which is contrary to REST. Unfortunate, but whad’ya gonna do?

If you blinked last week, you might have missed an ordinary article on double digit growth in IBM’s content management software division. The gist of the article appears to be that content management is hot in large part due to record keeping systems in the post-Enron world. That probably has something to do with it, but I think it’s much more than that. I think it has something to do with the more general trend of a wider variety of data being dealt with under the content management umbrella. Certainly from a Web centric point of view (which I’ve been known to promote 8-), everything is content.

So here’s a bold prediction. By the end of 2005, IBM’s content management software division will have absorbed their enterprise software group.

Werner responds. I don’t think we’re that out of synch, but I maintain that from what I’ve read of the techniques he’s talking about, they are not suited for Internet scale use. And by that, I mean a few orders of magnitude larger than the 10K/100K numbers he quotes. More like 10^8-10^11.

I know that Werner is anti-transparency, as am I. It was really interesting to watch the evolution of GCS research and tools in this regard. Sometime during this transition, “group communications” stopped being referred to as such, perhaps due to the reduced degree of coupling between members of a group; a result of the movement away from transparency (or perhaps because of the bad rep that it got due to the early highly-transparent commercial toolkits being quite brittle 8-). I guess I never bought into that terminology switchover though, as I always considered “group communications” to refer to any multi-party state-alignment based approach to consensus problems, which I’d say even Werner’s group’s more recent work falls under. Hopefully that explains my seemingly outdated view of the work of his group.

Anyhow, to see what a RESTful “GCS” might look like, I point to KnowNow and their mod_pubsub project. The principal means of managing reliability is the stateless approach that’s part of REST, which KnowNow reused. That is, the client maintains application state, and so is responsible for dealing with partial failures and getting up-to-state (using GET, of course).
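Here’s a toy sketch of that recovery pattern (not mod_pubsub’s actual API; all names are made up for illustration): because the client owns its application state, recovering from any partial failure is just one GET, with no server-side session to reconstruct.

```python
# Illustrative model of stateless recovery: the client holds the
# application state, so after a dropped connection it resynchronizes
# with a single GET rather than relying on the server replaying events.

class Client:
    def __init__(self, fetch):
        self.fetch = fetch      # callable standing in for HTTP GET
        self.state = None

    def on_notification(self, update):
        self.state = update     # normal push path

    def on_reconnect(self):
        # No session to recover; just GET the current representation.
        self.state = self.fetch()

server_state = {"value": 41}
c = Client(lambda: dict(server_state))
c.on_notification({"value": 41})

# Connection drops; server state moves on while the client is away.
server_state["value"] = 42
c.on_reconnect()
```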

Werner Vogels comments on my argument against reliable messaging. I’m not sure he read it in its entirety though, as he leads off by saying:

I was surprised to read Mark Baker’s statement that he feels there is no need for reliable communication provisions in web-services runtimes.

Which isn’t the case, because I said that HTTP could do with some reliability help. What I’m against is the specific solution of an application-independent reliable messaging layer. There are other ways of achieving the same goals, though at the expense of application protocol neutrality (see below).

I fully understand that some of Werner’s work, and pretty much the whole group communications style of distributed computation, builds upon the reliability-as-a-layer approach. I studied his work, and the work of his group at Cornell under Birman, and even developed code with the Isis toolkit. But GCS doesn’t scale up to the size of system I’m interested in, or that Web services are struggling to be. Perhaps it has a role in the LAN, or in other small group environments though. It’s definitely cool tech.

I’m for “doing reliability” in the application layer, by coordination, with a coordination language in the guise of an application protocol. As I mentioned above, that means disposing of the notion of doing it in a protocol-neutral manner, in so far as application protocols define the application. So basically that means ensuring that when you design an application protocol, it’s able to give you the reliability semantics you need (within the realm of possibility, as we seem to agree on 8-), or can be extended to do so. HTTP is such a protocol for the hypermedia application model. The interesting question is, how general is the hypermedia application model, and is it relevant to your problem? I say, yes, there’s a good chance it is relevant, and it’s certainly relevant for Web services.

Jorgen responds to my comments on a presentation he gave last week.

Re my comment that architectural styles are pattern languages, not patterns, I can only point to Roy’s dissertation on this, where he suggests the association between an Alexander “pattern language” and an “architectural style” by indirectly suggesting that both are systems of patterns that constrain the form of the resultant system. “Stateless”, “Uniform interface”, “Layered Client Server”, etc. are constraints, and when coordinated together, they form the REST architectural style.

Re the stateful/stateless point (both parts), I don’t see how whether the targeted endpoint does additional dispatch or not matters to this issue, unless of course that dispatch operation uses some state (which is not required of an OO style, AFAICT). You suggest that all object oriented styles are stateful, yet REST is stateless, and it’s object oriented in that messages are targeted at identifiable objects.

Re SQL, and to add to my last blog, it’s true that a SQL row may be a resource, but it’s not the case that all resources (as defined in 2396) can be manipulated via SQL.

Re POST, perhaps we miscommunicated, but by “partial update” I thought you meant that the meaning of POST permitted partial state changes to be communicated to the client, which it doesn’t. It is true that the effect of a POST may be a “partial update” of the state of the resource, but the issue is that a client will not know that. All a successful response to a POST can mean to a client is “thanks, I accept this data”. So I’d say that your comparison to SQL UPDATE is inaccurate, because after a successful UPDATE, the client knows the state of some part of some table. UPDATE is more like PUT for this reason, since a successful PUT informs the client that the state of the resource is what they asked it to be.
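The distinction can be sketched with a toy resource (the handler functions are illustrative, not any real framework’s API): after a successful PUT the client may infer the resource’s state is exactly what it sent, while a successful POST tells it only that the data was accepted.

```python
# What a client may infer from a 2xx response to PUT vs. POST.
state = {"count": 0}

def put(new_state):
    # Full replacement: the result is exactly what the client sent.
    state.clear()
    state.update(new_state)
    return 200

def post(data):
    # Server-defined processing (here, an increment): the 200 alone
    # does not tell the client what the resulting state is.
    state["count"] += data
    return 200

put({"count": 5})
client_known_state = {"count": 5}   # safe inference from PUT's 200

post(3)
# After POST, the client must GET to learn the state; it cannot infer
# it from the response status, even though the server knows it's 8.
```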

Jorgen writes:

And there I was thinking this statement would be an olive branch for you, Mark ;-)

Heh, yah, I appreciated the attempt, but I felt it was pretty early to propose a synergy existed before you really understood REST. Oh, and please don’t take that the wrong way 8-); understanding REST isn’t a matter of smarts, it’s just a matter of recognizing what it is (or more specifically, what an application protocol is, at least in my experience as an ex-CORBA lover). I’m confident that you’ll understand soon enough because of your broad experience, and your eagerness to learn.

I still don’t see any real conflict – you can still have requests returning XML data, the only question would be whether the request data must be in XML format or whether it can be encoded into a URL/URI.

Depends what you mean by “request data”. A typical Web services centric solution, because it’s normally early bound, requires that the request data include a method name. REST, because it’s late bound, requires only that you provide an identifier. From a cost-of-coordination perspective, the latter is vastly superior.
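The shapes of the two requests make the coordination cost visible (the endpoint, URI, and method name below are all invented for illustration):

```python
# Early bound (RPC style): the request carries a service-specific
# method name that every client must know in advance -- per-service
# coordination cost.
rpc_request = {
    "endpoint": "http://example.org/soap",
    "body": {
        "method": "getStockQuote",   # agreed out-of-band, per service
        "args": {"symbol": "IBM"},
    },
}

# Late bound (REST): the client names only a resource; the method comes
# from the uniform interface, known a priori for every resource on the
# Web.
rest_request = {
    "method": "GET",                 # uniform, not service-specific
    "uri": "http://example.org/stocks/IBM",
}
```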

You can use Web Services standards and do pure REST. Equally you can use Web Services standards and _not_ do REST.

Of course. As I said before, I consider this a bug, not a feature.

Roy defined the null architectural style which could be said to be another style in which you can do REST or not. The way it does this is by being entirely devoid of constraints, which has the “disadvantage” (cough 8-) of also being entirely devoid of any desirable properties.

I’m not suggesting that you believe that the null style is a useful thing, but I’ve heard from a lot of Web services folks who feel that architectural constraints are a bad thing. From a software architecture point of view, this is madness. Have they never heard of entropy? 8-/

Fundamentally, cacheability IS a big factor as the client implicitly caches a local copy of the resource data at least for a time.

Sure, it is important, but as a side effect of the client maintaining the state (i.e. the interaction being stateless). If the style were said to “revolve around” anything, this (statelessness) could be one of the big things, sure.

I am not sure Roy Fielding’s dissertation would agree with your assertions here, Mark – see bottom of page 14:

Roy’s comment about combining styles doesn’t suggest how styles are combined. As I see it, if you’re using an architectural style with constraints A, B, and C which yield desirable properties X and Y, and then you want to add property Z which is known to be obtained via constraints D and E, then in order to get Z without giving up X and Y, your new style needs to have constraints A, B, C, D, and E.

FWIW, this model of constraints and properties has been around for some time, since at least Perry and Wolf’s “Foundations for the study of software architecture” (using that terminology, anyhow). Some of the folks you quote, like Shaw, Garlan, Kazman, etc.. have accepted this model. I don’t see how what I’m saying is controversial in this regard.

I’m glad to hear you’re giving the presentation again, and I look forward to following up on this with you.

Jorgen asks, Is REST the SQL of the Internet?

There are definitely some similarities between REST’s uniform interface and the SQL language, most importantly that they are both coordination languages, a priori deployed application interfaces that defer component binding (i.e. late binding), which are ideal for deployment on a network between untrusted parties (hence the use of the word “coordination”). But “Resource oriented”, a term that Jorgen defines, doesn’t apply to SQL since its coordination semantics are not specific to “resources” as defined in RFC 2396 (i.e. they’re not uniform), just as it doesn’t apply to other coordination languages like Linda, SMTP, or IMAP. If I knew more about what he was trying to achieve with such a categorization, I might be able to recommend a better name.

Apparently OASIS has decided to tackle reliable messaging, with help from the usual non-IBM/MS Web services suspects.

I think “reliable messaging” is a huge waste of time. It’s akin to saying that the network is unreliable, so let’s just make a reliable network on top (which is different than “reliable data stream” ala TCP). Sorry, it just doesn’t work that way. “reliable network” is an oxymoron, for any number of reliability layers you might try to build on top.

As with most problems over the Internet, reliability is a coordination problem. That is, how do two or more independent pieces of software distributed on an unreliable network coordinate to achieve some goal in a reliable manner (such that both know that the goal has been achieved or failed, etc.)? Unfortunately, you can’t coordinate “reliability” in a vacuum, like the typical reliable messaging approach of store/forward/timeout; you have to look at what task is being coordinated in the first place, and then figure out how to augment your coordination semantics such that the necessary degree of reliability can be achieved. In the context of the Web, that means using the uniform coordination semantics that are made available through HTTP.

Simple example. I want to turn on a lightbulb, and do it reliably such that I know if my attempt succeeded or not. I would use PUT. If I got back a 2xx, I would know the lightbulb was on. If I didn’t get back a response at all – say if the connection died after the request was sent – then I don’t know. But if I needed to know, I could do a GET. Perfectly reliable, no reliable messaging solution in sight.
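The recovery logic in that example can be sketched as follows. The transport calls are simulated stand-ins, not a real HTTP client; the point is the protocol-level reasoning: because PUT is idempotent, an ambiguous failure is resolved by a GET, and a retry is always safe.

```python
# Simulated lightbulb resource; a lost response leaves the outcome
# ambiguous, and the idempotent GET/PUT pair resolves it.
bulb = {"on": False}

def put_light(on, lose_response=False):
    bulb["on"] = on                  # the request may well have been
    if lose_response:                # applied even though the response
        raise ConnectionError("response lost")   # never arrived
    return 200

def get_light():
    return 200, bulb["on"]

def reliably_turn_on(lose_response=False):
    try:
        put_light(True, lose_response)
    except ConnectionError:
        # Ambiguous outcome: GET tells us the true state.
        _, is_on = get_light()
        if not is_on:
            put_light(True)          # safe to retry: PUT is idempotent
    return get_light()[1]

# Works whether or not the first response is lost.
result_ok = reliably_turn_on(lose_response=False)
bulb["on"] = False
result_lost = reliably_turn_on(lose_response=True)
```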

That example doesn’t work for everything of course, because PUT is idempotent and not all operations you might want to perform are idempotent. POST is different, but the requirements on it are different too, since if you use POST, you accept that you won’t know the state of the resource after a successful submission (getting deeper into that is a topic too large for a blog, sorry).

Anyhow, I acknowledge that some work needs to be done to HTTP to help with reliability (as Paul describes). But that is in no way “reliable messaging”.

Bob DuCharme gets back to basics about RDF, and in doing so clearly highlights the value of partial understanding. Notice how the integration problem he undertakes scales linearly with the number of documents, rather than combinatorially as it would if his code had to have full knowledge of all those schemas. By using RDF’s data model (note that each file uses a different serialization of the same basic RDF triples), this scaling problem is averted.
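The scaling argument can be modelled in a few lines of plain Python (no RDF library; the triples are invented for illustration): once every document, whatever its serialization, reduces to the same kind of triples, integrating N documents is N parses plus one union, with no pairwise schema mapping.

```python
# Each "document" stands in for a differently-serialized RDF file that
# has already been parsed down to subject/predicate/object triples.
doc1 = {("bob", "wrote", "article1")}
doc2 = {("article1", "about", "RDF")}
doc3 = {("bob", "worksFor", "acme")}

# Integration is a single pass over the documents -- linear in N --
# because the shared data model does the schema-level work for us.
graph = set()
for triples in (doc1, doc2, doc3):
    graph |= triples
```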