If you want to make an omelette… you have to shoot some ducks!

A counterpoint to Arjen Poutsma’s WS-DuckTyping

Author’s note: this was actually an article written by me for a different publication some time back. However, this is its first publication.

Introduction

In his blog on WS-DuckTyping, Arjen Poutsma gives some tips on using duck typing (essentially dynamic typing) in the implementation of web services. In this response to his blog, I explain why I disagree fundamentally with everything he says.

Don’t validate incoming messages!

The first tip Arjen gives in his blog is not to validate incoming messages. He has two reasons for this: the first is that schema validation is slow, and the second is that validation violates Postel’s law: “be conservative in what you do; be liberal in what you accept from others.”

Schema validation may be slow (or at least slower than, say, no validation) and unaccepting of faults in others, but it does tend to buy you one major advantage: the absolute knowledge that answering an incoming request is within the capabilities of your (web) service.

As we were all taught by the likes of Tony Hoare and Edsger Dijkstra (see Hoare Logic), the premise of all programs is that a program S is guaranteed to terminate in a well-defined state Q if it starts in a state conforming to precondition P of S; if it starts outside P, nothing is guaranteed. That is to say, a program can only be relied upon to function correctly if it is started in a state that matches certain assumptions. It is not possible, in general, to write a program that will always carry out its task no matter what. This is why we do things like input validation and parameter checking: to ensure that our programs can work at all.

In a web service, incoming message validation is precondition verification. Message validation is what tells us whether we are dealing with an answerable question or whether we are dealing with syntactic (and sometimes even semantic) junk. Moreover, message validation is what tells us that we can have our service implementation process the message without choking on the message contents (and throwing an exception because the data is unmanageable). It is also a security aspect – in web services too, we validate to prevent SQL injection.

Poutsma has a point in that the processing involved in validation may be slow. However, he seems to forget that the considerations mentioned above mean that if you do not validate automatically against a schema, you will still have to validate manually (i.e. with hand-written code) to prevent program errors. He also ignores that the cost of program errors (ranging from transaction rollback to catastrophic service failure) and recovery (e.g. queueing and manual processing in a high-volume environment like KLM) may be significantly higher than the cost of upfront service denial.
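To make the “validate manually” case concrete, here is a minimal sketch of a hand-written precondition check that denies service upfront. It is only an illustration: the `BookingRequest` message, its `Passenger` and `Name` elements, the `type` attribute and the function names are all invented for this example. Python’s standard library has no XSD validator, so the hand-written check stands in for schema validation here.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for the invented BookingRequest message.
ALLOWED_PASSENGER_TYPES = {"adult", "child", "infant"}


class InvalidMessage(Exception):
    """Raised when a request does not meet the service's precondition."""


def check_precondition(xml_text):
    """Return the parsed request, or raise InvalidMessage before any work is done."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise InvalidMessage(f"not well-formed XML: {exc}") from exc
    if root.tag != "BookingRequest":
        raise InvalidMessage(f"unexpected root element: {root.tag}")
    passengers = root.findall("Passenger")
    if not passengers:
        raise InvalidMessage("at least one Passenger is required")
    for passenger in passengers:
        if passenger.get("type") not in ALLOWED_PASSENGER_TYPES:
            raise InvalidMessage("Passenger has a missing or unknown type")
        if not passenger.findtext("Name", default="").strip():
            raise InvalidMessage("Passenger has no Name")
    return root


def handle(xml_text):
    """Deny service upfront instead of failing halfway through processing."""
    try:
        request = check_precondition(xml_text)
    except InvalidMessage as exc:
        return f"REJECTED: {exc}"
    return f"ACCEPTED: {len(request.findall('Passenger'))} passenger(s)"
```

Every rule in `check_precondition` is one that a schema could have stated declaratively. Skipping schema validation does not make these checks go away; it only moves them into code like this.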

Use XPath!

Arjen’s next tip is to use XPath to query an XML document for data rather than rely upon its structure to find the data. His main argument here is that using XPath is more forgiving of structural errors in non-conforming documents than traditional parsers and XML marshalling (which is also a parser technique, by the way). Using XPath allows you to find the data even if the client put it in the wrong place: just skip over the XML document tree and go directly to the leaf of the right name.

That’s a very nice idea, except that (unless your project architect is a complete idiot), you are usually actually interested in the intermediate document structure. The structure often carries as much meaning as the attribute leaves themselves. Some typical examples from the airline domain:

It’s very nice that you can use an XPath expression to loop quickly over the names in a travel party, independently of whether the passengers are adults, children or infants. But very often, whether a passenger is an adult, child or infant is actually interesting – it makes a difference in all sorts of business rules.
Again, very nice that you can use XPath to retrieve all the segments of a flight quickly, no matter how those segments are organized. But the ordering of those segments is significant (Amsterdam – Detroit – Anchorage is not the same as Amsterdam – Anchorage – Detroit). Also, how the segments are grouped into Origin-Destination blocks is significant (on a return flight, some segments belong to the outgoing flight and others to the flight home).
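The travel party example can be sketched in a few lines. The message layout below (`TravelParty`, `Adults`, `Children`, `Infants`) is made up for illustration, and the queries use the limited XPath subset in Python’s `xml.etree.ElementTree` rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the grouping elements carry the passenger category.
PARTY = """
<TravelParty>
  <Adults><Passenger><Name>Ann</Name></Passenger></Adults>
  <Children><Passenger><Name>Bob</Name></Passenger></Children>
  <Infants><Passenger><Name>Cas</Name></Passenger></Infants>
</TravelParty>
"""

root = ET.fromstring(PARTY)

# Structure-agnostic query: every Name, wherever it sits in the tree.
all_names = [n.text for n in root.findall(".//Name")]

# Structure-aware query: the parent element says which business rules apply.
names_by_category = {
    group.tag: [n.text for n in group.findall("Passenger/Name")]
    for group in root
}
```

The first query finds all three names, but it has thrown away the one piece of information the business rules need. The second query keeps it, and to do so it has to rely on the document structure after all.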

That aside, what if you want to use a name (element name or attribute name) twice in your message, with a different meaning based on context? A simple XPath expression will capture too much data, including the wrong data. Of course, you can still use XPath; you simply make the expressions more specific. However, if you have to specify the whole path from root to leaf in your XPath expression (or a significant part of it), it raises the question of why you should bother with XPath at all.
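A small sketch of the name clash, again with an invented message in which `Name` means a passenger name in one context and an airport name in another:

```python
import xml.etree.ElementTree as ET

# Hypothetical booking: Name is reused with a different meaning per context.
BOOKING = """
<Booking>
  <Passenger><Name>Ann</Name></Passenger>
  <Segment>
    <Origin><Name>Amsterdam</Name></Origin>
    <Destination><Name>Detroit</Name></Destination>
  </Segment>
</Booking>
"""

root = ET.fromstring(BOOKING)

# Loose expression: captures passenger names and airport names alike.
loose = [n.text for n in root.findall(".//Name")]

# Specific expression: correct, but it now spells out the path from the root.
specific = [n.text for n in root.findall("./Passenger/Name")]
```

The loose expression returns the airports along with the passenger. The specific one returns only the passenger, at the price of encoding exactly the structure that the loose expression was supposed to let you ignore.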

A completely different point is that if you use XPath, you persist in seeing your service request as an XML document. And you might not want that. You might want to define your service independently of messaging infrastructure. Using XPath in your service implementation binds you to a single view of your data.
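One way to picture the alternative is a service defined on plain data types, with the XML view confined to a thin binding layer. The sketch below is an assumption-laden illustration, not a recommended design: the `Passenger` type, the `type` attribute, both function names and the “infants do not occupy a seat” rule are all invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Passenger:
    name: str
    category: str  # "adult", "child" or "infant" (hypothetical vocabulary)


def count_seats(passengers):
    """Service logic: knows nothing about XML.

    Assumes, purely for illustration, that infants do not occupy a seat.
    """
    return sum(1 for p in passengers if p.category != "infant")


def passengers_from_xml(xml_text):
    """Binding layer: the only place that sees the request as an XML document."""
    root = ET.fromstring(xml_text)
    return [
        Passenger(name=p.findtext("Name", default=""), category=p.get("type", ""))
        for p in root.findall("Passenger")
    ]
```

With this split, `count_seats` could just as well be fed from a queue message or a unit test. Put XPath expressions inside `count_seats` and that option is gone.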

Don’t create stubs or skeletons!

Arjen’s final tip is against using stubs and skeletons to represent parts of an XML document (in other words, don’t use object marshalling). This is an extension of his flexibility argument, since stubs rely on an exact XML grammar for their definition and adherence to that grammar for marshalling and unmarshalling.

The counter-argument is much the same as well. Stubs rely on an exact grammar in the same way that the service implementation itself does. Certainly, not using stubs or validation and simply considering a message to be a flexible mass of accumulated data allows for far more flexibility. But how far do you think you can go before your service allows for so much flexibility that it simply cannot assume anything about an incoming message anymore? Before it becomes impossible to assign meaning to that message and therefore impossible to have a working service?

And yes, it is also true that a more flexible, duck-typing approach makes a service implementation more flexible in the face of change. But so what? If the service changes with a widening change to its interface, this change will be transparent to the clients. If it is a non-widening change, no matter how many ducks, the client will have to be informed. And why would you want to make it possible for a client to send over data that you were never expecting in the first place? A service is not something you spout data at in the hopes that it will somehow reply to a useful subset — clients that do not know how to use a service have no business trying.

Conclusion

Arjen Poutsma presented some tips on web service implementation and I didn’t believe him. In fact, I still don’t. It seems to me his tips result in unwanted binding at best and make it impossible to implement reliable web services at worst. I’m scheduled to follow a training given by Arjen on web services and perhaps we will discuss each other’s points of view. And perhaps he will be more convincing to me then. But he has a lot of quacking left to do. In the meantime, my motto is this:

Fly away, ducks — leave the web services to the spiders.


3 thoughts on “Shooting ducks”

Freddie van Rijswijk
March 22, 2008 at 8:58 pm

Hi All,
Ben, I agree with you, and in fact it’s probably tips like these from Poutsma and Jon Postel that are the reason why the “web standards” are also so frigging messed up. If you have enough time to read a long epistle from Joel on Software, then read this article: http://www.joelonsoftware.com/items/2008/03/17.html

Best Regards
Freddie
Ben (Post author)
March 21, 2008 at 11:19 pm

Hi, Jettro.

As you said, I have since followed the training. Unfortunately there was very little time to get into any sort of real discussion with Arjen about these matters. He did say that, to provoke a response, he was a little more confrontational in his blog than he actually feels, but that is as far as the discussion went.
jettro
March 21, 2008 at 8:03 pm

Hey Ben, nice viewpoint. I am not really joining the discussion. I think you should take the right approach for the right situation. On some occasions that might be the flexible ducky thing. For most enterprise applications we have experience with, we’d prefer strong XSD-based messages and stuff like JAXB. I know by now you have had the training; did you change your mind, maybe a little bit? Did you have a chance to have a good discussion about this with Arjen?

greez Jettro
