SOA component design: thinking about error handling

When designing components for a SOA landscape (or any multiprocess system), the primary concern is the communication behavior of the component: how messages are passed to and from the component and in what order, what those messages are, and what constitutes a valid message and what doesn’t. When the time comes to implement the component, related concerns come into play: how messages are projected from the communication language into the domain model and into the implementation language, how communication patterns are met and enforced, et cetera. In addition, the project’s technical architect has to consider how to implement the component’s domain without hardlinking it to any other components whose domains are known or to the communication medium du jour (unless the component’s purpose is linked to that medium).

Now here’s the strange thing: with all the concerns that go into the design of components at all levels (from the enterprise architect down to the developers of the different components), one of the most overlooked things in SOA component building is cross-component error handling.

Error handling concerns

The fact that error handling is often the lowest-priority concern is doubly weird if you consider that cross-component error handling is essentially the same concern as core functionality messaging. In both cases the same sets of concerns arise, with regard to communication with external components as well as interaction between the component’s internal implementation and the communication layer. Some typical concerns that development teams have to deal with are:

Error message definition: Just as SOA components require clear definitions of the messages that will be exchanged with client components for mainline communication, clear definitions must also be given of messages that carry error information.
Error communication behavior definition: Just as mainline communication behavior between SOA components must be clearly and formally defined, so must similar definitions be given for when a component can send an error message in response to a request for operation.
Projecting exception types from the domain language onto error types from the communication language: In the case of error handling, this is one half of a problem that also exists for mainline communication. In mainline communication, request messages must be projected onto domain model types, and domain model types must be projected onto communication language types when the component returns a response. In the case of error handling, of course, there is no concept of request projection, since nobody requests an error; however, the analogy with projecting a domain model type onto a response communication message remains.
Maintaining component independence: This concern actually affects a component more as a consumer of other services than as a publisher of services. Maintaining component independence is related to avoiding domain models leaking over into foreign components (as Eric Evans puts it). In the case of messaging it means not building your domain model so that it is a mere copy of a foreign component’s communication model. In the more specific case of error handling it relates to not linking error handling too closely to the definition of errors used by foreign components. Instead, as with mainline messages, foreign component-generated errors should be projected onto the local domain model’s concept of errors or exceptions.
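
The last concern can be made concrete with a small sketch. It assumes a hypothetical foreign component that reports errors as a code plus a detail text; the class names and the codes (ForeignError, CUST-404, SUPP-503 and so on) are invented for illustration and do not come from any real service. The point is only that a single translator knows the foreign vocabulary, while the rest of the component sees local domain exceptions.

```java
// Hypothetical error payload as received from a foreign component.
final class ForeignError {
    final String code;
    final String detail;

    ForeignError(String code, String detail) {
        this.code = code;
        this.detail = detail;
    }
}

// Local domain exceptions, named in the local domain's own vocabulary.
class OrderException extends Exception {
    OrderException(String message) {
        super(message);
    }
}

class CustomerUnknownException extends OrderException {
    CustomerUnknownException(String message) {
        super(message);
    }
}

class SupplierUnavailableException extends OrderException {
    SupplierUnavailableException(String message) {
        super(message);
    }
}

// The only place that knows the foreign component's (assumed) error codes.
final class ForeignErrorTranslator {
    private ForeignErrorTranslator() {
    }

    static OrderException toDomain(ForeignError error) {
        switch (error.code) {
            case "CUST-404":
                return new CustomerUnknownException(error.detail);
            case "SUPP-503":
                return new SupplierUnavailableException(error.detail);
            default:
                // Unknown codes still become a domain exception instead of leaking through.
                return new OrderException(
                        "Unmapped foreign error " + error.code + ": " + error.detail);
        }
    }
}
```

If the foreign component later renames or adds error codes, only ForeignErrorTranslator should need to change; the domain model keeps its own exception hierarchy.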

Balancing error handling strategies

Another concern in error handling (in addition to the ones mentioned above, which occur in all message processing systems) arises when communication layers introduce their own concept of error handling. A typical example of this is the WSDL/SOAP combination with its SOAP Fault concept of error messaging. These built-in mechanisms can be very convenient in that they dictate an error handling strategy and leave you nothing to think about in your design. Standardization is also very valuable in these systems: it introduces an error handling strategy that everybody can agree to. However, there is also an inherent risk that these standardized error handling mechanisms will hardlink your component to a particular communication layer or framework (especially if the mechanism is allowed to leak through into the component’s inner domain model).

It is clear that each project must find a way to balance concerns in cross-component error handling strategy. Several strategies exist and must be considered:

Ignoring the communication layer’s mechanism

Undoubtedly the simplest strategy is to ignore the communication layer’s error handling mechanism. One of several possible alternatives must then be chosen instead. A simple alternative might be never communicating errors across components (this might for example be an enterprise policy) but allowing “empty” responses instead. Or else a project might define its own error reporting as part of regular messaging, creating either/or responses (i.e. responses that contain either a regular response or an error).
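
An either/or response could be sketched as follows. This is a minimal illustration, not a prescribed design: the names Response and ErrorInfo and the shape of the error (a code plus a message) are assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical error description carried inside a regular response message.
final class ErrorInfo {
    final String code;
    final String message;

    ErrorInfo(String code, String message) {
        this.code = code;
        this.message = message;
    }
}

// Either/or response: holds exactly one of a payload or an error.
final class Response<T> {
    private final T payload;
    private final ErrorInfo error;

    private Response(T payload, ErrorInfo error) {
        this.payload = payload;
        this.error = error;
    }

    static <T> Response<T> success(T payload) {
        return new Response<T>(Objects.requireNonNull(payload, "payload"), null);
    }

    static <T> Response<T> failure(ErrorInfo error) {
        return new Response<T>(null, Objects.requireNonNull(error, "error"));
    }

    boolean isError() {
        return error != null;
    }

    // Fails fast if a caller forgets to check isError() first.
    T payload() {
        if (isError()) {
            throw new IllegalStateException("response carries an error: " + error.code);
        }
        return payload;
    }

    ErrorInfo error() {
        if (!isError()) {
            throw new IllegalStateException("response carries a payload, not an error");
        }
        return error;
    }
}
```

A client would call isError() before reading the payload, for example on a Response<String> created with Response.failure(new ErrorInfo("E1", "stock lookup failed")). Because the error travels as ordinary message content, nothing in this type depends on the communication layer.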

The upside of this strategy is its simplicity, plus the fact that it is guaranteed to keep the component implementation separate from the communication layer. However, client components (especially third-party ones) are not likely to appreciate the divergence from standardized mechanisms, which makes this a strategy that is only suitable for components intended for internal use.

Using the communication layer’s mechanism

This strategy is more or less the opposite extreme of ignoring the communication layer’s mechanism. The choice here is to take the mechanism (and its reflection in the implementation language) and use it throughout the component’s implementation. For example, a Java project might elect to use AxisFaults throughout the project instead of project-specific exception types.

Again, the upside of this strategy is its simplicity, plus its guaranteed interoperability with the communication layer. However, the component is now hardlinked to the communication layer. This is not necessarily a bad thing: there are projects and components whose purpose in life is linked to a single communication layer. A custom integration layer for different SOAP web services for instance might want to use SOAP Faults and related types for error handling (since it is technically part of the domain). In general though, this strategy will limit the component implementation’s future flexibility.

Mapping internal error handling to communication layer error handling

The final option seeks to combine the best of both worlds, separating the internal domain model from the specifics of the communication layer and allowing the two to touch only through a translation layer. This strategy also allows the internal component implementation to be linked to several different communication layers. For example, a component implementation might report errors using subclasses of java.lang.Exception. Different translation layers might then turn these exceptions into SOAP Faults for a SOAP web service, a JMS exception for JMS, and a specific error page for a RESTful service.
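
A translation layer of this kind might be sketched as below. Everything here is an assumption made for illustration: DomainException, its Kind enum, the ErrorTranslator interface and the HttpError stand-in are hypothetical, and the SOAP output is a simplified SOAP 1.1 style fault fragment built as a string rather than through a real SOAP stack. A JMS translator is left out to keep the sketch free of external libraries.

```java
// Hypothetical domain exception; it knows nothing about SOAP, JMS or HTTP.
class DomainException extends Exception {
    enum Kind { INVALID_REQUEST, NOT_FOUND, INTERNAL }

    private final Kind kind;

    DomainException(Kind kind, String message) {
        super(message);
        this.kind = kind;
    }

    Kind kind() {
        return kind;
    }
}

// One translator per communication layer; T is that layer's error representation.
interface ErrorTranslator<T> {
    T translate(DomainException e);
}

// Simplified stand-in for an HTTP error response of a RESTful service.
final class HttpError {
    final int status;
    final String body;

    HttpError(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

final class RestErrorTranslator implements ErrorTranslator<HttpError> {
    @Override
    public HttpError translate(DomainException e) {
        switch (e.kind()) {
            case INVALID_REQUEST:
                return new HttpError(400, e.getMessage());
            case NOT_FOUND:
                return new HttpError(404, e.getMessage());
            default:
                // Internal details are not exposed to the client.
                return new HttpError(500, "Internal error");
        }
    }
}

final class SoapFaultTranslator implements ErrorTranslator<String> {
    @Override
    public String translate(DomainException e) {
        // SOAP 1.1 distinguishes caller errors (Client) from provider errors (Server).
        String faultCode = e.kind() == DomainException.Kind.INTERNAL
                ? "soap:Server" : "soap:Client";
        return "<soap:Fault>"
                + "<faultcode>" + faultCode + "</faultcode>"
                + "<faultstring>" + escape(e.getMessage()) + "</faultstring>"
                + "</soap:Fault>";
    }

    // Minimal XML escaping, sufficient for element text in this sketch.
    private static String escape(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The domain code only ever throws DomainException; each endpoint picks the translator that matches its communication layer, so adding a new layer means adding a translator rather than touching the domain model.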

Architecturally this probably sounds like the go-to strategy for all situations. And in practice it is probably the option that most projects will want to use, since relatively few projects want to be hardlinked to the communication layer. However, it is more work to implement, especially if the projection of domain errors onto the communication layer is not straightforward.

Conclusion

Error handling is one of the most overlooked topics in the design and implementation of SOA components. However, it is at least as important as the design and implementation of mainline messaging and functionality, for the same reasons. And like mainline functionality, error handling requires some careful forethought and weighing of options. Different strategies are available, each with its own characteristics with regard to implementation difficulty and ease of use. Projects must consider very carefully which strategy fits project goals and expected future usage patterns.
