If you want to make an omelette… you have to shoot some ducks!
A counterpoint to Arjen Poutsma’s WS-DuckTyping
Author’s note: I originally wrote this article for a different publication some time ago; this is the first time it has been published.
In his blog post on WS-DuckTyping, Arjen Poutsma gives some tips on using duck typing (essentially dynamic typing) in the implementation of web services. In this response to his post, I explain why I fundamentally disagree with everything he says.
Don’t validate incoming messages!
The first tip Arjen gives in his blog post is not to validate incoming messages. He has two reasons for this: first, schema validation is slow; second, validation violates Postel’s law, “be conservative in what you do, be liberal in what you accept from others.”
Schema validation may be slow (or at least slower than, say, no validation) and intolerant of other people’s mistakes, but it does buy you one major advantage: the certainty that answering an incoming request is within the capabilities of your (web) service.
As we were all taught by the likes of Tony Hoare and Edsger Dijkstra (see Hoare logic), the basic premise of program correctness is that a program S is guaranteed to terminate in a well-defined state Q if it starts in a state satisfying a precondition P of S. That is to say, a program can only function correctly if it is started in a state that matches certain assumptions. It is not possible, in general, to write a program that will always carry out its task no matter what. This is why we do things like input validation and parameter checking: to ensure that our programs can work at all.
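For reference, the premise in the paragraph above can be written in Hoare’s notation. A common convention writes partial correctness as {P} S {Q} and total correctness (termination included) as [P] S [Q]. The second formula is a sketch, assuming a hypothetical rejecting branch that establishes a condition called rejected, of how checking P up front fits into the picture: it follows from the conditional rule and the rule of consequence.

```latex
\[
  [P]\; S\; [Q]
  \qquad\text{(started in any state satisfying $P$, $S$ terminates in a state satisfying $Q$)}
\]
\[
  \frac{[P]\; S\; [Q] \qquad [\neg P]\; \mathit{reject}\; [\mathit{rejected}]}
       {[\mathit{true}]\;\; \textbf{if } P \textbf{ then } S \textbf{ else } \mathit{reject}\;\; [Q \lor \mathit{rejected}]}
\]
```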
In a web service, incoming message validation is precondition verification. Message validation is what tells us whether we are dealing with an answerable question or with syntactic (and sometimes even semantic) junk. Moreover, message validation is what tells us that our service implementation can process the message without choking on its contents (and throwing an exception because the data is unmanageable). It is also a security measure: in web services, as anywhere else, we validate input to prevent SQL injection.
Poutsma has a point in that validation may be slow. However, he seems to forget that, for the reasons above, if you do not validate automatically against a schema, you will still have to validate manually (i.e. with hand-written code) to prevent program errors. He also ignores that the cost of program errors (ranging from transaction rollback to catastrophic service failure) and of recovery (e.g. queueing and manual processing in a high-volume environment like KLM) may be significantly higher than the cost of rejecting a request up front.
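To make this concrete, here is a minimal Python sketch of validation as a precondition guard. The message layout (`AvailabilityRequest`, `Origin`, `Destination`) and the function names are hypothetical, and because Python’s standard library has no XML Schema validator, the checks are written by hand. Pattern checks like this complement, rather than replace, parameterized queries as a defense against SQL injection.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: airport codes are exactly three upper-case letters.
AIRPORT_CODE = re.compile(r"[A-Z]{3}")


def check_preconditions(root: ET.Element) -> list[str]:
    """Return the violated preconditions; an empty list means the request is answerable."""
    errors = []
    if root.tag != "AvailabilityRequest":
        errors.append(f"unexpected root element {root.tag!r}")
    for field in ("Origin", "Destination"):
        value = root.findtext(field)
        if value is None:
            errors.append(f"missing {field}")
        elif not AIRPORT_CODE.fullmatch(value):
            errors.append(f"{field} is not a three-letter airport code")
    return errors


def handle(xml_text: str) -> str:
    """Reject junk up front; only run the service body when the precondition holds."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return f"Fault: malformed XML ({exc})"
    errors = check_preconditions(root)
    if errors:
        return "Fault: " + "; ".join(errors)
    # From here on, the implementation may rely on the precondition.
    return f"Searching flights {root.findtext('Origin')} -> {root.findtext('Destination')}"


print(handle("<AvailabilityRequest><Origin>AMS</Origin><Destination>DTW</Destination></AvailabilityRequest>"))
# Searching flights AMS -> DTW
print(handle("<AvailabilityRequest><Origin>AMS'; DROP TABLE flights;--</Origin><Destination>DTW</Destination></AvailabilityRequest>"))
# Fault: Origin is not a three-letter airport code
```

The code after the guard never has to ask whether the origin really is an airport code; that question is answered before any work is done.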
Arjen’s next tip is to use XPath to query an XML document for data rather than rely on its structure to find the data. His main argument here is that XPath is more forgiving of structural errors in non-conforming documents than traditional parsers and XML marshalling (which, by the way, is also a parsing technique). XPath allows you to find the data even if the client put it in the wrong place: just skip over the XML document tree and go directly to the leaf with the right name.
That’s a very nice idea, except that (unless your project architect is a complete idiot) you are usually interested in the intermediate document structure as well. The structure often carries as much meaning as the data in the leaves themselves. Some typical examples from the airline domain:
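A small Python sketch of the technique Arjen describes, using the subset of XPath supported by the standard library’s `xml.etree.ElementTree` (not full XPath 1.0). The booking message and its element names are hypothetical; one `Name` has been put in the wrong place on purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical booking in which the client has put one Name in the wrong place.
MESSAGE = """
<Booking>
  <Passengers>
    <Passenger><Name>Jansen</Name></Passenger>
    <Passenger><Name>De Vries</Name></Passenger>
  </Passengers>
  <Name>Bakker</Name>
</Booking>
"""

root = ET.fromstring(MESSAGE.strip())

# Structure-following lookup: only names where the grammar says they belong.
strict = [n.text for n in root.findall("./Passengers/Passenger/Name")]

# "Skip the tree" lookup: every Name element at any depth, wherever it ended up.
forgiving = [n.text for n in root.findall(".//Name")]

print(strict)     # ['Jansen', 'De Vries']
print(forgiving)  # ['Jansen', 'De Vries', 'Bakker']
```

The descendant query does find the misplaced name. What it cannot tell you is whether `Bakker` is a passenger, a contact person or simply a mistake.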
- It’s very nice that you can use an XPath expression to loop quickly over the names in a travel party, independently of whether the passengers are adults, children or infants. But very often, whether a passenger is an adult, child or infant is actually interesting – it makes a difference in all sorts of business rules.
- Again, very nice that you can use XPath to retrieve all the segments of a flight quickly, no matter how those segments are organized. But the ordering of those segments is significant (Amsterdam – Detroit – Anchorage is not the same as Amsterdam – Anchorage – Detroit). Also, how the segments are grouped into Origin-Destination blocks is significant (on a return flight, some segments belong to the outgoing flight and others to the flight home).
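Both examples can be sketched in Python with `xml.etree.ElementTree`. The itinerary layout is hypothetical, and `ADT` and `INF` are assumed here to be passenger type codes for adult and infant. The flat queries return every name and every segment, but only the structure-aware reading can tell you who is an infant and which segments belong to which part of the journey.

```python
import xml.etree.ElementTree as ET

# Hypothetical itinerary; ADT/INF are assumed codes for adult/infant passengers.
ITINERARY = """
<Itinerary>
  <Passengers>
    <Passenger type="ADT"><Name>Jansen</Name></Passenger>
    <Passenger type="INF"><Name>Jansen Jr</Name></Passenger>
  </Passengers>
  <OriginDestination direction="outbound">
    <Segment from="AMS" to="DTW"/>
    <Segment from="DTW" to="ANC"/>
  </OriginDestination>
  <OriginDestination direction="inbound">
    <Segment from="ANC" to="DTW"/>
    <Segment from="DTW" to="AMS"/>
  </OriginDestination>
</Itinerary>
"""

root = ET.fromstring(ITINERARY.strip())

# Flat queries: quick, but passenger type and journey grouping are gone.
names = [n.text for n in root.iter("Name")]
flat_segments = [(s.get("from"), s.get("to")) for s in root.iter("Segment")]

# Structure-aware reading: keeps the information the business rules need.
passengers = [(p.get("type"), p.findtext("Name")) for p in root.findall("./Passengers/Passenger")]


def route(od: ET.Element) -> list[str]:
    """Airports in travel order; relies on segments appearing in document order."""
    segments = od.findall("Segment")
    return [segments[0].get("from")] + [s.get("to") for s in segments]


journeys = {od.get("direction"): route(od) for od in root.findall("OriginDestination")}
infants = sum(1 for ptype, _ in passengers if ptype == "INF")

print(names)          # ['Jansen', 'Jansen Jr']
print(flat_segments)  # [('AMS', 'DTW'), ('DTW', 'ANC'), ('ANC', 'DTW'), ('DTW', 'AMS')]
print(passengers)     # [('ADT', 'Jansen'), ('INF', 'Jansen Jr')]
print(journeys)       # {'outbound': ['AMS', 'DTW', 'ANC'], 'inbound': ['ANC', 'DTW', 'AMS']}
```

Nothing in `flat_segments` says where the outbound trip ends, and nothing in `names` says that one of the two travellers is an infant.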
That aside, what if you want to use a name (element or attribute name) twice in your message, with a different meaning depending on context? A simple XPath expression will capture too much data, including the wrong data. Of course, you can still use XPath; you simply make the expressions more specific. However, if you have to specify the whole path from root to leaf in your XPath expression (or a significant part of it), it raises the question of why you should bother with XPath at all.
A completely different point is that if you use XPath, you persist in seeing your service request as an XML document, and you might not want that. You might want to define your service independently of the messaging infrastructure. Using XPath in your service implementation binds you to a single view of your data.
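In the same hypothetical Python setting, here is a name used twice with different meanings: `Code` as an airport code and as a made-up fare code. The broad query mixes them up; the precise queries work only because they spell out the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical segment in which <Code> means two different things.
MESSAGE = """
<Segment>
  <Origin><Code>AMS</Code></Origin>
  <Destination><Code>DTW</Code></Destination>
  <Fare><Code>YLOWNL</Code></Fare>
</Segment>
"""

root = ET.fromstring(MESSAGE.strip())

# Too broad: airport codes and the fare code come back mixed together.
all_codes = [c.text for c in root.findall(".//Code")]

# Correct, but only because each path now spells out the structure.
origin = root.findtext("./Origin/Code")
fare_basis = root.findtext("./Fare/Code")

print(all_codes)           # ['AMS', 'DTW', 'YLOWNL']
print(origin, fare_basis)  # AMS YLOWNL
```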
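One way to keep a service independent of its message format, sketched in Python: the service logic works on a plain domain type, and thin adapters translate each wire format into it. All names here are hypothetical, and in a real service the validation discussed earlier would live in the adapters.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityQuery:
    """Domain view of the request; it knows nothing about XML."""
    origin: str
    destination: str


def find_flights(query: AvailabilityQuery) -> str:
    """Service logic: works on the domain type only."""
    return f"Searching flights {query.origin} -> {query.destination}"


def query_from_xml(xml_text: str) -> AvailabilityQuery:
    """XML adapter: the only code that knows the (hypothetical) message layout."""
    root = ET.fromstring(xml_text)
    return AvailabilityQuery(root.findtext("Origin"), root.findtext("Destination"))


def query_from_mapping(data: dict) -> AvailabilityQuery:
    """A second adapter, e.g. behind a JSON endpoint, reusing the same service."""
    return AvailabilityQuery(data["origin"], data["destination"])


xml_query = query_from_xml(
    "<AvailabilityRequest><Origin>AMS</Origin><Destination>DTW</Destination></AvailabilityRequest>"
)
mapping_query = query_from_mapping({"origin": "AMS", "destination": "DTW"})
print(find_flights(xml_query))     # Searching flights AMS -> DTW
print(xml_query == mapping_query)  # True
```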
Don’t create stubs or skeletons!
Arjen’s final tip is against using stubs and skeletons to represent parts of an XML document (in other words, don’t use object marshalling). This is an extension of his flexibility argument, since stubs rely on an exact XML grammar for their definition and adherence to that grammar for marshalling and unmarshalling.
The counter-argument is much the same as well. Stubs rely on an exact grammar in the same way that the service implementation itself does. Certainly, not using stubs or validation and simply considering a message to be a flexible mass of accumulated data allows for far more flexibility. But how far do you think you can go before your service allows for so much flexibility that it simply cannot assume anything about an incoming message anymore? Before it becomes impossible to assign meaning to that message and therefore impossible to have a working service?
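The contrast can be sketched in Python with a hand-written stand-in for a generated stub (a tool such as JAXB would generate the equivalent class from the schema). The class, the exception and the passenger type codes are hypothetical. The point is where failure happens: a message that does not fit the grammar is rejected at the boundary, instead of surfacing later as a missing value somewhere inside the business logic.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed passenger type codes: adult, child, infant.
PASSENGER_TYPES = {"ADT", "CHD", "INF"}


class UnmarshalError(ValueError):
    """Raised when a message does not match the grammar the stub was built from."""


@dataclass(frozen=True)
class Passenger:
    type: str
    name: str


def unmarshal_passenger(elem: ET.Element) -> Passenger:
    """Stub-style unmarshalling: accept exactly the expected grammar or fail at the boundary."""
    ptype = elem.get("type")
    name = elem.findtext("Name")
    if elem.tag != "Passenger" or ptype not in PASSENGER_TYPES or not name:
        raise UnmarshalError("not a valid Passenger: " + ET.tostring(elem, encoding="unicode"))
    return Passenger(ptype, name)


print(unmarshal_passenger(ET.fromstring('<Passenger type="INF"><Name>Jansen Jr</Name></Passenger>')))
# Passenger(type='INF', name='Jansen Jr')

try:
    # No type attribute: the stub refuses to guess what kind of passenger this is.
    unmarshal_passenger(ET.fromstring("<Passenger><Name>Jansen</Name></Passenger>"))
except UnmarshalError as exc:
    print(exc)
```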
And yes, it is also true that a duck-typing approach makes a service implementation more flexible in the face of change. But so what? If the service undergoes a widening change to its interface, the change will be transparent to the clients. If it is a non-widening change, then no matter how many ducks you have, the clients will have to be informed. And why would you want to make it possible for a client to send over data that you never expected in the first place? A service is not something you spout data at in the hope that it will somehow reply to a useful subset; clients that do not know how to use a service have no business trying.
Arjen Poutsma presented some tips on web service implementation and I didn’t believe him. In fact, I still don’t. It seems to me that his tips result in unwanted binding at best and make it impossible to implement reliable web services at worst. I’m scheduled to attend a training given by Arjen on web services, and perhaps we will discuss each other’s points of view. Perhaps he will be more convincing then. But he has a lot of quacking left to do. In the meantime, my motto is this:
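The difference between the two kinds of change can be sketched in Python with a hypothetical version 2 of a request that introduces a `Cabin` element; the element name and its default value are assumptions for illustration. Making the new element optional keeps every version 1 message valid, while making it mandatory is exactly the kind of change clients have to be told about.

```python
import xml.etree.ElementTree as ET

# A request as version 1 clients send it (hypothetical layout).
V1_REQUEST = "<AvailabilityRequest><Origin>AMS</Origin><Destination>DTW</Destination></AvailabilityRequest>"


def parse_v2_optional_cabin(xml_text: str) -> dict:
    """Widening change: the new <Cabin> element is optional, with a default."""
    root = ET.fromstring(xml_text)
    return {
        "origin": root.findtext("Origin"),
        "destination": root.findtext("Destination"),
        # Every version 1 message is still valid, so old clients notice nothing.
        "cabin": root.findtext("Cabin", default="ECONOMY"),
    }


def parse_v2_required_cabin(xml_text: str) -> dict:
    """Non-widening change: the new <Cabin> element is mandatory."""
    root = ET.fromstring(xml_text)
    cabin = root.findtext("Cabin")
    if cabin is None:
        # Version 1 messages are now rejected; their senders must be told.
        raise ValueError("Cabin is required as of version 2")
    return {
        "origin": root.findtext("Origin"),
        "destination": root.findtext("Destination"),
        "cabin": cabin,
    }


print(parse_v2_optional_cabin(V1_REQUEST))
# {'origin': 'AMS', 'destination': 'DTW', 'cabin': 'ECONOMY'}
```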
Fly away, ducks — leave the web services to the spiders.