How many of your projects have had a possible performance issue? How many times was the word cache used in the same context? I have a very simple rule: DON’T use it. Of course, like every rule, this one has exceptions. For one of our projects at JTeam we were asked to boost the performance of a large data import. It is a pretty big project with a lot of different frameworks and a lot of AspectJ aspects. Before diving into the code we decided to look for performance bottlenecks using a unit test and a profiler. The customer had done this as well, so we could reuse his work. With the profiler it is not hard to find the points in the application where the threads spend a lot of time. The question is what to do to improve performance. Finally a good opportunity to use caching? This article is about caching. I’ll show you a way to use it within a springframework/jpa application. For now I’ll focus on caching at the back-end. For the code samples I use EHCache, not because I think it is the best, but mostly because most of the samples I could find use it and hibernate has very tight integration with EHCache. I will also give my conclusions about working with caching in general.
Should you cache?
In general, only if you really need to. But when do you need it? Most of the time it is performance related. This can be due to high traffic, meaning many calls for the same information. It can be due to the high cost of creating the data. Nowadays more and more front-end technologies generate images to show to visitors. These images are the same for all users, so the results of these processes can and should be cached. Another reason can be data that is hard to get. Communicating with other systems like databases or content management systems entails network latency, which can be a disaster for the performance and responsiveness of applications. A side effect is that your usage of the other system decreases as well, so it has more CPU cycles left for its other tasks.
In short, caching helps if data is reused a lot and getting to this data is expensive. A very easy example: take an order entry system with orders for certain products. The products themselves do not change every minute and are therefore an excellent candidate for caching. The orders, however, are read no more than a few times and are changed almost as often. Therefore there is no point in caching the orders. If you want a more mathematical example, I suggest the ehcache reference manual, which is a good read.
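To put rough numbers on the products versus orders example, here is a minimal sketch of the arithmetic. The hit ratios and the costs per lookup are made-up assumptions, not measurements from the project, and the class name CacheBenefit is only there for illustration.

```java
/** Back-of-the-envelope estimate of what a cache buys you. All numbers are assumptions. */
final class CacheBenefit {

    /** Average cost of one read, given the share of reads that is served from the cache. */
    static double averageCostMillis(double hitRatio, double cacheCostMillis, double sourceCostMillis) {
        return hitRatio * cacheCostMillis + (1.0 - hitRatio) * sourceCostMillis;
    }

    public static void main(String[] args) {
        double cacheCost = 0.1;   // assumed cost of a cache lookup
        double sourceCost = 20.0; // assumed cost of a database round trip

        // Products: read often and rarely changed, so most reads are hits.
        System.out.printf("products: %.2f ms per read%n", averageCostMillis(0.95, cacheCost, sourceCost));
        // Orders: read a few times and changed about as often, so hits are rare.
        System.out.printf("orders:   %.2f ms per read%n", averageCostMillis(0.10, cacheCost, sourceCost));
    }
}
```

With a low hit ratio the average read cost stays close to the cost of going to the source, while you still pay for the memory and the bookkeeping of the cache.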
I cannot stress enough that you must be careful when implementing a cache. Using the cache costs CPU cycles and memory as well. There are a lot of things to think about, like:
- When to invalidate the cache. The cache cannot live forever, so you must think about strategies such as invalidating after an update, limiting the number of items in the cache, or limiting the amount of storage available to the cache. You also need to think about which items to push out of the cache first.
- The Pareto principle: 20% of the data is used 80% of the time. If this holds for your data, chances are good that you can use a cache effectively.
- In a clustered environment you must choose between one cache per server in the cluster and a clustered cache (which is supported by ehcache, by the way).
- Do you actually get better scalability or performance by using a cache?
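To make the first point concrete, limiting the number of items and deciding which item to push out can be sketched with nothing but the JDK. This is not how ehcache does it internally; it is a small illustration of a least recently used policy, and the class name LruCache and the keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A tiny size-bounded cache that evicts the least recently used entry. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order, which gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after every put; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("isbn-1", "first book");
        cache.put("isbn-2", "second book");
        cache.get("isbn-1");               // touch isbn-1, so isbn-2 becomes the eldest
        cache.put("isbn-3", "third book"); // exceeds the limit, so isbn-2 is evicted
        System.out.println(cache.keySet());
    }
}
```

Note that this sketch is not thread safe and never invalidates on update, two of the things a real cache has to deal with.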
Still want to cache?
Good, then it is time to share some code. There is no complete example this time, but I do have some code snippets. Luckily it is not very hard to start using a cache. The hard part is understanding where you can get the most out of a caching solution. Let’s have a look at configuring second level caching of (hibernate/jpa) entities and caching complete queries.
Configuring the cache
You need to provide some properties in the persistence.xml configuration file:
<property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider"/>
<property name="hibernate.cache.use_query_cache" value="true"/>
<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.generate_statistics" value="true"/>
As you can see we enable the query cache as well as the second level cache. We also enable the generation of statistics, which we will talk about later on. EHCache comes with some sensible defaults out of the box when you configure caching, but you can create your own configuration. You can easily do this by providing a file named ehcache.xml on the class path.
<cache name="nl.gridshore.samples.books.domain.Author" maxElementsInMemory="2000" eternal="false" timeToIdleSeconds="1800" timeToLiveSeconds="3600" overflowToDisk="false" />
This way we can configure the maximum number of authors kept in memory and the time they should stay there. Does not look very difficult, does it? Beware, allowing too many elements can result in a lot of memory consumption. That memory can then no longer be used by the application server.
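The two time settings are easy to mix up: timeToLiveSeconds counts from the moment an element is created, timeToIdleSeconds counts from the last time it was accessed. The sketch below is a simplified model of that rule using plain numbers as timestamps. It is not ehcache code, the class name ExpiryRule is hypothetical, and it ignores details such as the special meaning of 0 and the exact behaviour on the boundary.

```java
/** Toy model of the two expiry settings, using explicit timestamps in seconds. */
final class ExpiryRule {
    private final long timeToIdleSeconds;
    private final long timeToLiveSeconds;

    ExpiryRule(long timeToIdleSeconds, long timeToLiveSeconds) {
        this.timeToIdleSeconds = timeToIdleSeconds;
        this.timeToLiveSeconds = timeToLiveSeconds;
    }

    /** An element expires when it is too old, or when it has not been touched for too long. */
    boolean isExpired(long createdAt, long lastAccessedAt, long now) {
        boolean tooOld = now - createdAt > timeToLiveSeconds;
        boolean idleTooLong = now - lastAccessedAt > timeToIdleSeconds;
        return tooOld || idleTooLong;
    }

    public static void main(String[] args) {
        ExpiryRule rule = new ExpiryRule(1800, 3600); // same values as the Author cache above
        System.out.println(rule.isExpired(0, 0, 1000));    // young and recently used
        System.out.println(rule.isExpired(0, 0, 2000));    // idle for more than 1800 seconds
        System.out.println(rule.isExpired(0, 3000, 3700)); // recently used, but older than 3600 seconds
    }
}
```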
Second level cache
In the previous configuration block I already showed the configuration of a cache item named nl.gridshore.samples.books.domain.Author. But how do you configure that entity to be cached? One of the ways to do it is using annotations.
@org.hibernate.annotations.Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
That is about it. Now second level cache is configured. But what did you actually configure? You have now configured the cache for loading a single object at a time, for example with the loadById kind of methods. Iterating over result sets with the hibernate iterate method also makes use of this cache. When you use finder queries, query by example or Criteria queries, you are not using the second level cache. So if we have queries that we execute a lot, we need something else. Let’s talk about the query cache.
The query cache does exactly what its name suggests: it caches queries. More precisely, it caches the complete query including the parameter values, so it only works if you execute the exact same query multiple times. It is very easy to enable query caching, but beware: the more parameters you use, the more variations there are. When using a jpa Query object with the hibernate implementation, you switch it on for each individual query.
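With hibernate behind jpa, the usual way to mark a single query as cacheable is, as far as I know, the query hint org.hibernate.cacheable set to true on the Query object. The sketch below does not use hibernate at all. It is a toy model, with a hypothetical class ToyQueryCache and a made-up query, that only illustrates why the parameters matter: the cache key is the query text plus the parameter value, so every new value is a miss.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Toy stand-in for a query cache: results are keyed by query text plus parameter value. */
final class ToyQueryCache {
    private final Map<List<Object>, List<String>> results = new HashMap<>();
    int hits;
    int misses;

    List<String> find(String query, Object parameter, Supplier<List<String>> database) {
        List<Object> key = Arrays.asList(query, parameter); // same query, other value = other key
        List<String> cached = results.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        List<String> loaded = database.get(); // only reached when the cache has no answer
        results.put(key, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        ToyQueryCache cache = new ToyQueryCache();
        String query = "select b.title from Book b where b.author = :author"; // hypothetical query
        cache.find(query, "Tolkien", () -> Collections.singletonList("The Hobbit"));
        cache.find(query, "Tolkien", () -> Collections.singletonList("The Hobbit")); // exact repeat
        cache.find(query, "Asimov", () -> Collections.singletonList("Foundation"));  // new value
        System.out.println("hits=" + cache.hits + " misses=" + cache.misses);
    }
}
```

A query that is almost always called with a different parameter value fills the cache with entries that are never read again, which is worth keeping in mind before enabling it everywhere.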
Does your cache work?
When you have finally configured your cache, you want to know if it works. Of course you can track the number of queries that are executed, but this is hard in unit tests and even harder in system tests. There is a better alternative when you are using ehcache. Beware, the following solution works for the combination of springframework, jpa, hibernate and ehcache. The following code comes from a test extending AbstractJpaTests. The EntityManagerFactory is injected by spring and is created by org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean. Jpa does some advanced class loading, which you can read about in the javadoc of the spring class. A proxy is created that implements the interface org.springframework.orm.jpa.EntityManagerFactoryInfo. You can cast the provided entity manager factory to this interface. Through this interface you have access to the actual implementation of the EntityManagerFactory interface, which in our case is a hibernate specific implementation: org.hibernate.ejb.EntityManagerFactoryImpl. Using this factory you finally have access to the right SessionFactory, which in its turn gives access to the Statistics object. The Statistics object has a rich API to obtain all sorts of statistics. You can also print them all using the code below. Do not forget to enable the generation of statistics in your persistence.xml as described earlier.
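// The injected factory is a spring proxy that also implements EntityManagerFactoryInfo
EntityManagerFactoryInfo emfi = (EntityManagerFactoryInfo) batchProcessEntityManagerFactory;
// Unwrap the native factory, which in our case is the hibernate implementation
EntityManagerFactory emf = emfi.getNativeEntityManagerFactory();
EntityManagerFactoryImpl emfImpl = (EntityManagerFactoryImpl) emf;
// Print all hibernate statistics, including the cache counters
System.out.println(emfImpl.getSessionFactory().getStatistics());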
Printing the statistics will result in something like this:
Statistics[start time=1209041587295, sessions opened=7, sessions closed=3, transactions=6, successful transactions=3, optimistic lock failures=0, flushes=1319, connections obtained=7, statements prepared=37699, statements closed=37699, second level cache puts=439, second level cache hits=20, second level cache misses=0, entities loaded=1390, entities updated=3841, entities inserted=5063, entities deleted=0, entities fetched=3, collections loaded=1145, collections updated=1608, collections removed=0, collections recreated=2572, collections fetched=1145, queries executed to database=20515, query cache puts=8420, query cache hits=3589, query cache misses=8420, max query time=18]
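The raw counters say more once you turn them into a hit ratio. The sketch below does that for the numbers printed above. The class name HitRatio is hypothetical. In a real test you would probably read the counters from the Statistics object through getters such as getQueryCacheHitCount() and getQueryCacheMissCount() instead of copying them by hand.

```java
/** Turns raw hit and miss counters into a hit ratio between 0 and 1. */
final class HitRatio {

    static double of(long hits, long misses) {
        long lookups = hits + misses;
        return lookups == 0 ? 0.0 : (double) hits / lookups; // no lookups yet means no hits either
    }

    public static void main(String[] args) {
        // Counters copied from the statistics output above.
        System.out.printf("second level cache: %.1f%%%n", 100 * of(20, 0));
        System.out.printf("query cache:        %.1f%%%n", 100 * of(3589, 8420));
    }
}
```

For this run the query cache answers roughly three out of every ten lookups and every miss results in a put, which fits the earlier warning about queries with many parameter variations.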
That is about it, the implementation is not that hard. I did not cover jsr 107. There is some information about it in the ehcache reference manual, which I do not want to repeat here. For now the jsr is not finished yet. Try to keep the caching code as unintrusive as possible (which is hard in the query cache example). A final recommendation: performance test your application before and after using a cache. Find the optimal usage and speed up your applications.
ehcache reference manual