Monthly Archives: February 2004

Architectural property of the day; reusability

As longtime readers of mine know, I’ve talked a lot about the value of visibility, but have had little success convincing Web services proponents that WS/SOA has significantly less of it than Internet-scale architectural styles do.

With that in mind, I thought I’d talk a bit about a couple of related properties: reusability (sometimes called “substitutability” when relating to components) and configurability. Combined, these properties refer to the ability to swap components in and out at runtime, as you can with the Web (your browser can request data from any Web server), or with pipe and filter.

I wonder, are there any Web services proponents who’d claim that this isn’t much more loosely coupled than with the unconstrained-interface SOA approach?
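To make the substitutability point concrete, here is a minimal pipe-and-filter sketch in Python. The filter names (`upper`, `non_empty`, `numbered`) and the `pipeline` helper are made up for illustration; the only thing that matters is that every filter shares one interface (an iterable of strings in, an iterable of strings out), so any filter can be swapped for another at runtime.

```python
from functools import reduce


def upper(lines):
    # Filter: upper-case every line.
    for line in lines:
        yield line.upper()


def non_empty(lines):
    # Filter: drop blank lines.
    for line in lines:
        if line.strip():
            yield line


def numbered(lines):
    # Filter: prefix each line with its position.
    for i, line in enumerate(lines, start=1):
        yield f"{i}: {line}"


def pipeline(source, *filters):
    # Because all filters expose the same interface, the chain can be
    # assembled (and reassembled) from any of them without special cases.
    return reduce(lambda stream, f: f(stream), filters, source)


if __name__ == "__main__":
    data = ["a", "", "b"]
    print(list(pipeline(data, non_empty, upper)))
    # Swap one component for another; nothing else changes.
    print(list(pipeline(data, non_empty, numbered)))
```

With an unconstrained interface, each component would expose its own operations, and swapping one for another would mean rewriting the caller.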

And again, and again …

Chris Ferris writes in response to my suggestion that processing an XML document is an all-or-nothing proposition;

I don’t see it that way. Understanding an XML document is not an all-or-nothing proposition by any stretch of the imagination. For instance, I can have a generic SOAP processor that understands the SOAP namespace but is oblivious to the content of the soap:Body element (amongst other things such as certain SOAP headers).[…]

I see the disconnect. I’m referring to any/all XML document(s). No fair saying that some specific kinds of XML documents are partially understandable, because clearly you can design one to be, and SOAP, as an envelope, is one as you correctly point out.

So, consider this XML document;

<iwoejaf xmlns="http://example.org/oijerwer">
  <ijrwer>inm4jvxc</ijrwer>
</iwoejaf>

That’s the kind of document I’m talking about. Wouldn’t you say that understanding that document is all or nothing? You either recognize the namespace or you don’t, right? Well, that’s not the case with RDF/XML since it gives you “partial understanding”; if that document above were known to be RDF/XML (and it is valid RDF/XML), then an RDF/XML processor can extract information from it piece-meal (in triples). Now, maybe none of the terms in any of the triples will be recognizable, but perhaps if you dereference the URI for each of the terms in those triples, you’ll find that the terms you don’t know are related to ones you do.
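As a rough illustration of that piece-meal extraction, the sketch below pulls triples out of the document above using only Python's standard library. It is not a real RDF/XML parser: it assumes the simplest reading of the syntax (a typed node element whose children are literal-valued property elements), and the blank node label `_:b0` and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DOC = """<iwoejaf xmlns="http://example.org/oijerwer">
  <ijrwer>inm4jvxc</ijrwer>
</iwoejaf>"""


def qname_to_uri(tag):
    # ElementTree spells qualified names as "{namespace}local"; RDF/XML
    # forms the term URI by concatenating namespace name and local name.
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace + local
    return tag


def extract_triples(xml_text):
    # Assumption: root is a typed node, children are literal properties.
    root = ET.fromstring(xml_text)
    subject = "_:b0"  # hypothetical blank node label
    triples = [(subject, RDF_TYPE, qname_to_uri(root.tag))]
    for child in root:
        value = (child.text or "").strip()
        triples.append((subject, qname_to_uri(child.tag), value))
    return triples


def understood(triples, known_predicates):
    # Keep the triples whose predicate we recognize; ignore the rest.
    return [t for t in triples if t[1] in known_predicates]


if __name__ == "__main__":
    for triple in extract_triples(DOC):
        print(triple)
```

A processor that knows only `rdf:type` can still keep that one triple and skip the other, which is the "partial understanding" in question.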

Now can you see why TimBL is so keen to see folks use RDF/XML? It’s the answer to the schema evolution problem.

HTTP is a great application protocol, for the application for which it was designed… the Web.

Finally, something we can agree on! 8-) Now, if only you understood what the Web was, and was capable of, we’d be all set.

Savas on REST again

More good insight from Savas on REST.

He writes;

The human factor is involved. If a resource (e.g., a web page) has moved, applications don’t break. It’s just that there is nothing to see. We are frustrated that it’s not there. If an application depends on that resource being there, that application breaks.

Yep. But how is that any different than a service which you depend on not being there? At least HTTP response codes are well-defined, and handle a lot of common cases, including redirection, retiring, retry-later, etc. I don’t see how this is human-centric at all; it’s just dealing with inevitable errors in distribution across trust boundaries.
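For what it's worth, here is a small sketch of how a program, with no human in the loop, might act on those well-defined codes. The `next_step` function and its action labels are hypothetical; the status codes and the `Location` and `Retry-After` headers are the standard HTTP ones.

```python
from http import HTTPStatus

REDIRECTS = (
    HTTPStatus.MOVED_PERMANENTLY,   # 301
    HTTPStatus.FOUND,               # 302
    HTTPStatus.SEE_OTHER,           # 303
    HTTPStatus.TEMPORARY_REDIRECT,  # 307
)


def next_step(status, headers):
    """Map a response status to a (hypothetical) action for the client."""
    if status in REDIRECTS:
        # Redirection: the resource moved, and the server says where.
        return ("follow", headers.get("Location"))
    if status == HTTPStatus.GONE:
        # Retiring: the resource was deliberately removed.
        return ("stop", None)
    if status == HTTPStatus.SERVICE_UNAVAILABLE:
        # Retry-later: the server may say how long to wait.
        return ("retry", headers.get("Retry-After"))
    if 200 <= status < 300:
        return ("ok", None)
    return ("fail", None)


if __name__ == "__main__":
    print(next_step(301, {"Location": "http://example.org/new"}))
    print(next_step(503, {"Retry-After": "120"}))
```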

I’m not sure what he means by using HTML for “interfaces”, but he then later speaks my language again when he describes HTML as a format for describing resource state;

If a resource’s representation is described in HTML, all is fine. Everyone knows how to read HTML. How about an arbitrary XML document though? Did we have a way of specifying to the recipient of the resource’s representation about the structure of the document? Perhaps they wouldn’t have requested it if they knew about it.

XML is fine and dandy, and I use it whenever I can, but it’s just a syntax. As such, it doesn’t do anything to alleviate the issue that understanding an XML document is an all-or-nothing proposition. That’s why when I use XML, I almost always use RDF. It enables a machine to extract triples from an arbitrary RDF/XML document, and triples are much finer grained pieces of information than a whole document. It allows me to process the triples I understand, and ignore the ones I don’t, which is another way of saying that it provides a self-descriptive extensibility model. See this example.

If we are going to glue applications/organisations together when building large scale applications, we need to make sure that contracts for the interactions are in place. We need to define message formats. That’s what WSDL is all about.

Agreed, but that’s also an important part of HTTP. It just defines message formats in a more self-descriptive way (i.e. that doesn’t require a separate description document to understand what the message means).

Also, we talk about exchanging messages between applications and/or organisations. Do we care how these are transferred? Do we care about the underlying infrastructure? I say that we don’t; at least, not at the application level.

I’m not sure we’ll get past this nomenclature problem, but in my world, documents are transferred while messages are transported. I do agree that how message transport occurs doesn’t matter, but I don’t agree that how document transfer occurs doesn’t matter. As an example, consider a document transferred with HTTP PUT, versus that same document transferred with HTTP POST. Both messages mean entirely different things (more below).
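A toy sketch of the PUT/POST difference, under loud assumptions: the `Store` class is an in-memory stand-in for a server, and having POST create a subordinate resource is just one common behaviour chosen for illustration (HTTP leaves the effect of POST up to the target resource).

```python
class Store:
    """In-memory stand-in for a server's resources (hypothetical)."""

    def __init__(self):
        self.resources = {}
        self._next_id = 1

    def put(self, uri, document):
        # PUT: "store this document as the state of the resource at uri".
        # Repeating the request leaves the same single resource.
        self.resources[uri] = document
        return uri

    def post(self, uri, document):
        # POST: "hand this document to the resource at uri to process".
        # Assumption: this resource reacts by creating a new subordinate.
        new_uri = f"{uri.rstrip('/')}/{self._next_id}"
        self._next_id += 1
        self.resources[new_uri] = document
        return new_uri


if __name__ == "__main__":
    store = Store()
    order = "<order><isbn>123412341234</isbn></order>"
    store.put("/orders/7", order)
    store.put("/orders/7", order)  # same request again, same outcome
    store.post("/orders", order)
    store.post("/orders", order)   # same request again, a second order
    print(sorted(store.resources))
```

The document is byte-for-byte identical in all four requests; only the method changes what the message means.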

If there is a suggestion that constrained interfaces are necessary for loose-coupling and internet-scale computing, then here’s a suggestion… What if the only assumption we made was that we had only one operation available, called "SEND"? Here are some examples:

TCP/IP:SEND, CORBA:SEND, HTTP:POST, EMAIL:SEND, FTP:PUT, SNAIL:POST (for letters), etc.

Ah, this one again. 8-)

You can’t compare TCP/IP “SEND” with HTTP POST or SMTP DATA. TCP/IP is a transport protocol and therefore defines no operations. You can put operations in the TCP/IP envelope yourself (e.g. by sending “buyBook isbn:123412341234”), or you can make them implicit in the port number (by registering your “Book Buying” protocol with IANA, only ever using that one operation, “buyBook”, and sending just “isbn:123412341234”). On the other hand, HTTP, SMTP, and FTP all define their own very generic operations.
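The two ways of getting an operation onto a bare transport can be sketched as follows. No sockets are involved; the functions only show where the operation name comes from. The port number 9999 and the `PORT_OPERATIONS` table are invented for the example and do not correspond to any real IANA registration.

```python
# Hypothetical registration: one port, one protocol, one operation.
PORT_OPERATIONS = {9999: "buyBook"}


def parse_explicit(payload):
    # The operation travels inside the payload, ahead of its argument.
    operation, _, argument = payload.partition(" ")
    return operation, argument


def parse_implicit(port, payload):
    # The operation is implied by the port; the payload is only the argument.
    return PORT_OPERATIONS[port], payload


if __name__ == "__main__":
    print(parse_explicit("buyBook isbn:123412341234"))
    print(parse_implicit(9999, "isbn:123412341234"))
```

Either way, the operation is something the application brought along; the transport itself contributed none.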

Service types

Jeff Schneider asks;

I’m looking for some common vocabulary to describe the various nomenclatures found in service operations.

In IETF-land, a “Pass-All Service” would be called a “transport protocol”, and a “Verb-Only Service” an “application protocol”. The last one, “Verb-Noun Service” is just a not-as-generic application protocol, but still an application protocol.

Ask the expert; REST benefits

Here’s the answer I gave to a question that came in on Ask The Experts, asking “What are the most important benefits a company can realize by following REST?”;

In general terms, ease of integration. Implementing RESTful services opens up your data to machines as easily as the HTML based Web does to humans, in effect turning your application integration problem into a data integration one. And almost as easily as you can integrate your data together (using data integration technologies like RDF or Topic Maps), you can also integrate other third party data sources together too.

The value of using http URIs to identify your business objects, and having them answer HTTP GET requests by responding with their state in document form, is immense; it is the fundamental technique that enables this form of universally accessible data integration.

Content vs data?

From the it-just-keeps-getting-wackier file, this;

The goal of XDI is to do for controlled data sharing what the Web did for open content sharing

Wow, where to start on this one?

So, how do “content” and “data” differ exactly? Can the same data not be returned from an HTTP GET on a URI as from an XDI GET on an XRI? And meanwhile, it’s doing the same thing that HTTP does two layers up from HTTP (presumably for reasons of protocol independence), yet is itself a system which is dependent upon the XDI protocol (which unapologetically copies the bulk of HTTP application semantics). Yet another case of those who don’t understand the Web trying to reinvent it, I suppose. Stop the insanity!

mod_pubsub and hackery

Chris responds to an earlier comment of mine.

My point remains that HTTP is not suited to extension of its methods because it requires centralized administration of the method names. You can’t simply make up a new method like MONITOR and deploy it unless you go through the IETF to revise the HTTP specification. Unless you do, then there’s no way that anyone could tell the difference between Mark’s MONITOR method and mine (should I devise one) and yet they might be very different animals.

His point about decentralized method definition is very well taken; HTTP does not permit method names to be URIs. PEP attempted to remedy this, but was never deployed.

But in order to use MONITOR, the HTTP spec doesn’t need revision. There are a multitude of HTTP extensions which are defined as standalone extensions which required no revision of HTTP itself. Consider WebDAV. And as Dave Orchard noted, HTTP is rife with extensibility points; this is no accident, because HTTP was explicitly designed to be extended. Which brings us to this comment;

HTTP wasn’t designed to support pubsub. Just because some sharp people can take the protocol and tweak it here and there to enable pubsub doesn’t change that fact.

HTTP was not designed to support pub/sub, but so? Was SOAP? Nope. But that doesn’t prevent one from using it that way. What’s important is that it wasn’t designed in such a way that prevented (even by being merely a “poor fit”) its use for pub/sub, and IMO, neither HTTP nor SOAP was. HTTP was designed for document transfer, and pub/sub fits there perfectly.
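On the earlier point that using a method like MONITOR requires no revision of the HTTP spec, here is a minimal sketch with Python's standard library, which routes any request method to a handler named `do_<METHOD>`. The semantics given to MONITOR here (echoing the path) are purely made up; nothing below reflects how mod_pubsub or any real MONITOR proposal works.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_MONITOR(self):
        # Invented semantics: acknowledge which resource is being monitored.
        body = b"monitoring " + self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def send_monitor(path):
    # Start a throwaway server on a free local port, send one request.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("MONITOR", path)  # the method token is simply a string
        response = conn.getresponse()
        result = (response.status, response.read().decode("ascii"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(send_monitor("/feed"))
```

What the sketch does not solve is the naming problem raised above: two parties could each define a MONITOR with different meanings, and the method token alone would not tell them apart.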

Chris then asks a very good mod_pubsub question;

However, all that aside, I am curious about something else related to mod_pubsub. Sure, it makes use of HTTP GET as well as POST, but are GET and POST really the methods? I mean really… how is this any different than the way in which SOAP uses HTTP POST? do_method?

This is largely what I was referring to when I said “well, parts sure are, but the bulk of it?” in reference to mod_pubsub and hackery. The client portion of mod_pubsub – the JavaScript Web server and library – had to resort to lowest common denominator; AIUI, they couldn’t present access to HTTP internals to developers. AFAIK, that’s why “route” is a parameter rather than a new method. It’s actually semantically quite close to WebDAV’s COPY method, both in that it is essentially a copy action, and in that it’s an “interaction at a distance”, i.e. two URIs are used as arguments, rather than just one, which would have required the data being routed to flow to the client. But doing this RESTfully, I could easily imagine a ROUTE method.

This is different to a typical Web services approach (I won’t say “SOAP approach”, because SOAP can be used in so many ways), for two reasons IMO; first, the semantics being tunneled are uniform, and second, they’re tunneled because there was no other way to do it. In my observation, Web services developers use tunneling primarily because they don’t know how to solve their problems without tunneling, and because they’ve been led to believe that “protocol independence” is a feature rather than a bug.

Persistence; messages and documents

Mark talks about document persistence and message transience. I would have agreed with that a few years ago, but as I’ve come to understand the value in self-description, I see that it is possible to have messages persist in meaning for as long as the documents which they encapsulate. A RESTful message is purely self-descriptive, while “SOA” messages are not (they don’t use a constrained interface, nor are they necessarily stateless), so perhaps that’s where the different viewpoint comes from. But I think that persistent messages are an absolute requirement in the asynchronous future we all want to get to, because if you’re doing asynch – which is, of course, “without a clock” – then your message should mean the same thing whether it arrives now or ten years from now.

General case

Sean McGrath writes;

If you want to look at a cheap, solid, scalable way to do distributed computing, look no further than the combination of HTTP and asynchronous messaging using business level XML documents. The beauty of this, is that it is both Intranet and Internet class at the same time. Work with the web – not against it. Its resources + names for resources + an application protocol (not a transport protocol) that make it work.

+1. The Internet, not the Intranet, is the general case.