After a good night with drinks and fun, we started off the second day with a keynote about Google V8, Google's JavaScript engine. Hmm, not my thing, although it does look incredibly fast, so as a user I am interested. The next presentation is what this blog post is about: Failure comes in flavors. The other presentations I attended about RIAs were not very interesting, and based on the feedback, I did not miss anything by skipping the last few presentations.
Now for the good news: the extra-long presentation (two sessions) about failures in Java was very interesting. I got a lot of ideas out of it, and if you keep reading, I'll share some of them.
The presentation was given by Michael T. Nygard. The first thing he introduced was a new term: FOM, the Failure Oriented Mindset. He talked about the causes of application breakdowns: most of them (around 96%) are due to problems in the application itself. Having seen many of these problems, he was able to categorize them. It all boils down to the difference between feature complete and production ready. Usually we test for the things we expect to happen, and if you are doing a good job you also test for things that might go wrong. We never test for the things we do not know about.
There are two things you can do. First, build a special harness that simulates these very strange situations and throw them at your application, concentrating on the integration points with other applications. Michael calls this the Test Harness. Second, and to me more important: fail gracefully. Of course, fail as fast as possible, but do it gracefully. Try to take down just a part of the application. Make it fail fast rather than end up in some kind of vicious circle.
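To make the Test Harness and fail-fast ideas concrete, here is a minimal Java sketch under stated assumptions. It has two parts: a fake endpoint that accepts connections but never answers (one of the "strange things" a harness can throw at an integration point), and a client that uses a socket read timeout so it fails fast instead of hanging. `HangingServer`, `IntegrationClient` and the 200 ms timeout are hypothetical names and values, not something shown in the talk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical test-harness endpoint: it accepts TCP connections but never
// sends a single byte back, simulating a remote system that hangs.
class HangingServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final List<Socket> held = Collections.synchronizedList(new ArrayList<>());

    HangingServer() {
        try {
            // Port 0 lets the OS pick a free port on the loopback interface
            serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    held.add(serverSocket.accept()); // keep it open, never answer
                }
            } catch (IOException closed) {
                // accept() fails once close() has shut the server socket
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() { return serverSocket.getLocalPort(); }

    @Override
    public void close() {
        try {
            serverSocket.close();
            synchronized (held) {
                for (Socket s : held) s.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical client call with a read timeout: instead of blocking forever
// on a hung integration point, it gives up after readTimeoutMillis.
class IntegrationClient {
    static boolean callTimesOut(int port, int readTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1_000);
            socket.setSoTimeout(readTimeoutMillis);
            socket.getInputStream().read(); // waits for data, EOF or the timeout
            return false;                   // got data or EOF: no timeout
        } catch (SocketTimeoutException e) {
            return true;                    // failed fast instead of hanging
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a test you would start a `HangingServer`, point the integration code at `server.port()` and check that the call gives up quickly rather than tying up a request thread indefinitely.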
A good example is an unresponsive website: users tend to keep pushing the button because the site does not seem to respond. Usually the website is just very busy trying to respond (for various reasons), and making more requests does not help the system get out of that state.
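One way to stop extra clicks from piling more work onto an already busy system is to reject requests as soon as it is saturated. This is another form of failing fast. The sketch below assumes a simple in-process limit on concurrent requests. `LoadShedder` and its parameters are hypothetical. `Semaphore.tryAcquire()` is the standard `java.util.concurrent` call that returns immediately instead of waiting for a permit.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard: allows at most maxInFlight concurrent requests and
// answers the rest immediately instead of queueing them behind a busy system.
class LoadShedder {
    private final Semaphore permits;

    LoadShedder(int maxInFlight) {
        permits = new Semaphore(maxInFlight);
    }

    <T> T handle(Supplier<T> work, Supplier<T> busyResponse) {
        if (!permits.tryAcquire()) {
            return busyResponse.get(); // fail fast: "busy, try again later"
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always give the slot back
        }
    }
}
```

The busy response should be cheap to produce (for example a short "please try again later" page), so that rejecting a request costs far less than serving it.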
Another nice example is a horizontally scaled application. If one server gets too busy and breaks down, all of its traffic is transferred to the other servers. Now they carry the extra burden and become even busier. One by one they fail, and your whole cluster breaks down.
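The cascade is easy to see with a little arithmetic. The model below is a deliberately crude, hypothetical one. It spreads traffic evenly over the servers that are still up and assumes that any server pushed past its capacity fails.

```java
// Hypothetical model of a cascading failure in a horizontally scaled cluster.
class CascadeModel {
    // Returns how many servers are still up once the load has settled.
    static int survivors(int servers, double totalLoad, double capacityPerServer, int initialFailures) {
        int alive = servers - initialFailures;
        // While each remaining server gets more than it can handle, another one fails
        while (alive > 0 && totalLoad / alive > capacityPerServer) {
            alive--;
        }
        return alive;
    }
}
```

Take hypothetical numbers of 4 servers, each with capacity 100, together carrying 320 requests per second (80 each). Losing one server puts about 107 on each of the other three, and the loop runs all the way down to zero survivors. At a total of 240, the three survivors get 80 each and the cluster holds.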
A lot of these problems are related to sessions being created for every type of user. Try not to use sessions if you do not need them. As I said before, the integration points with other applications are a frequent source of problems.
So, what can you do about it? It comes down to a few words: timeouts, asynchronous calls and the circuit breaker. The first two are pretty obvious; the third maybe not, so let's focus on that one for now.
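Timeouts and asynchronous calls may be obvious, but they combine naturally, so a small sketch may still help. The idea is to run the remote call on a worker thread and wait for it only for a bounded time. `TimedCall`, `callWithTimeout` and the fallback value are hypothetical. `ExecutorService.submit` and `Future.get(long, TimeUnit)` are standard `java.util.concurrent` APIs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCall {
    // Daemon worker threads so a hung call can never keep the JVM alive
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread t = new Thread(runnable);
        t.setDaemon(true);
        return t;
    });

    // Hypothetical helper: returns the call's result, or the fallback if the
    // call fails or does not finish within timeoutMillis.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(call); // run the remote call asynchronously
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting and interrupt the worker
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

Note that `future.cancel(true)` only interrupts the worker thread. A call blocked in I/O may not react to the interrupt, so it still benefits from its own socket timeout.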
Let me stress that this might go by another name; this is what I got out of it. What does it do? It is a mechanism for failing fast but gracefully. Imagine you have an integration point with a web service. Suddenly the web service changes its contract without you knowing. Your integration starts producing errors, and every call that needs this service takes a long time before concluding that it does not work. A circuit breaker is a sort of wrapper around the integration point. It passes requests through and does very little while everything goes well. Now the contract has changed, and within a very short amount of time three calls fail. Then the circuit breaker steps in: it cuts the connection and immediately throws an exception to the client. You can then implement logic in the circuit breaker that waits, for example, 5 minutes and then tries again with a single client. If that call fails, the connection stays cut; if it succeeds, the connection is restored.
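The behaviour described above can be sketched as a small state machine with three states. A closed breaker lets calls pass through. An open breaker makes calls fail immediately. A half-open breaker allows one trial call after the wait. This is a minimal, single-JVM sketch under stated assumptions, not the implementation from the talk. It counts consecutive failures rather than failures within a short time window, and it treats any `RuntimeException` from the wrapped call as a failure. The class names, the threshold of 3 and the 5-minute wait are illustrative. The clock is passed in as a `LongSupplier` so tests can move time forward.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Thrown instead of calling the integration point while the breaker is open
class CircuitOpenException extends RuntimeException {
    CircuitOpenException() { super("circuit open: failing fast"); }
}

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // e.g. 3 failures in a row
    private final long retryAfterMillis;    // e.g. 5 minutes
    private final LongSupplier clockMillis; // injectable clock, handy for tests

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;

    CircuitBreaker(int failureThreshold, long retryAfterMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clockMillis = clockMillis;
    }

    synchronized State state() { return state; }

    <T> T call(Supplier<T> integrationPoint) {
        synchronized (this) {
            if (state == State.OPEN) {
                if (clockMillis.getAsLong() - openedAtMillis < retryAfterMillis) {
                    throw new CircuitOpenException(); // still cut: fail immediately
                }
                state = State.HALF_OPEN; // wait is over: let one trial call through
            } else if (state == State.HALF_OPEN) {
                throw new CircuitOpenException(); // a trial call is already running
            }
        }
        try {
            T result = integrationPoint.get();
            synchronized (this) { // success: (re)connect and reset the count
                state = State.CLOSED;
                consecutiveFailures = 0;
            }
            return result;
        } catch (RuntimeException e) {
            synchronized (this) {
                consecutiveFailures++;
                // A failed trial, or too many failures in a row, cuts the connection
                if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                    state = State.OPEN;
                    openedAtMillis = clockMillis.getAsLong();
                }
            }
            throw e;
        }
    }
}
```

In production you would pass `System::currentTimeMillis` as the clock. Note the naming: in circuit-breaker terms a closed breaker lets calls through and an open one cuts them off, just like an electrical circuit.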
In the example above, we might need a new deployment to actually fix the problem. But what if the service was slow to respond because of heavy load? The circuit breaker could use timeouts and keep track of how many timeouts occur. That way you help the called web service recover and, eventually, start using it again without an administrator having to intervene.
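To sketch the idea of tracking how many timeouts occur, here is a small helper that counts timeouts in a sliding time window. The class name, the threshold of 3 and the 1-second window are hypothetical. A breaker could call `recordTimeout()` whenever a call times out and open when `shouldTrip()` returns true. This is one possible way to count, not necessarily what Nygard showed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical helper: remembers when recent timeouts happened and reports
// whether too many of them fall inside a sliding time window.
class TimeoutTracker {
    private final int maxTimeouts;
    private final long windowMillis;
    private final LongSupplier clockMillis; // injectable clock, handy for tests
    private final Deque<Long> timeouts = new ArrayDeque<>();

    TimeoutTracker(int maxTimeouts, long windowMillis, LongSupplier clockMillis) {
        this.maxTimeouts = maxTimeouts;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    synchronized void recordTimeout() {
        timeouts.addLast(clockMillis.getAsLong());
        evictOld();
    }

    synchronized boolean shouldTrip() {
        evictOld();
        return timeouts.size() >= maxTimeouts;
    }

    // Drop timeouts that are older than the window
    private void evictOld() {
        long cutoff = clockMillis.getAsLong() - windowMillis;
        while (!timeouts.isEmpty() && timeouts.peekFirst() <= cutoff) {
            timeouts.removeFirst();
        }
    }
}
```

Using a window instead of a plain counter means a few timeouts spread over a long period do not trip the breaker, while a burst of them does.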
It was a very interesting talk, and I expect there will be more about it on this blog in the coming weeks.
The night was shorter and had fewer drinks than the other two, so I am ready for day three and the drive back.