Blocking Cache

I just came back from the 2009 retreat, and it was really fun: a lot of old faces and new faces, a lot of fun conversations and brilliant ideas. I am glad I was there, and I had a good time.

At the retreat a few people told me they read my blog, which surprised me a lot; I have only written two posts, and those were almost a year ago, so I feel sorry about that. I had planned to write a series on performance tuning, and I still want to, but I may not have enough time. I do want people to know how much effort Liferay has put into performance: Liferay is fast and getting better and better. I will write about the things we have done to improve performance. People may already know about them, but perhaps not the details of how we made them work, so I will explain some of those details (so you will trust that we really did improve it). Today's topic: the blocking cache.

We use caching (Ehcache) a lot in our code to improve I/O performance (database access), and in most cases it works perfectly. But there is a special case where we can still waste I/O: multiple threads concurrently hitting an absent cache entry.

Two conditions must hold to trigger this:
1) More than one thread hits the cache with the same key.
2) These threads hit the cache at the same time.
    (This is not a strict condition; it does not have to be exactly the same time. It happens whenever the threads' cache-miss-and-populate windows overlap.)

When this happens we waste I/O, because multiple threads are doing the same job that actually only needs to be done once. So why not ask one thread to do it and have the others wait for the result that the "unlucky" working thread populates into the cache? This is where Ehcache's BlockingCache comes in. When multiple threads hit the cache with the same key within an overlapping window, only the first to arrive passes through the cache with a cache miss; it then does its job of hitting the database and populating the cache, while the other threads are blocked at the cache until the cached element is ready. This seems like a good fix, but it can cause a more serious problem: deadlock!
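The behavior can be sketched roughly like this: a minimal toy in plain Java, not Ehcache's actual implementation, and with made-up class and method names. The first thread to miss on a key gets null back and becomes responsible for loading the value; later threads asking for the same key block until that thread calls put().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Toy sketch of the blocking-cache idea (not Ehcache's real code).
public class SimpleBlockingCache<K, V> {

	private final Map<K, V> values = new ConcurrentHashMap<>();
	private final Map<K, CountDownLatch> latches = new ConcurrentHashMap<>();

	public V get(K key) throws InterruptedException {
		while (true) {
			V value = values.get(key);

			if (value != null) {
				return value;
			}

			CountDownLatch newLatch = new CountDownLatch(1);
			CountDownLatch existing = latches.putIfAbsent(key, newLatch);

			if (existing == null) {
				// First thread to miss: null means "go load it yourself".
				return null;
			}

			// Another thread is already loading; block until its put().
			existing.await();
		}
	}

	public void put(K key, V value) {
		values.put(key, value);

		// Release every thread blocked in get() for this key.
		CountDownLatch latch = latches.remove(key);

		if (latch != null) {
			latch.countDown();
		}
	}
}
```

So only the thread that got null hits the database; everyone else pays one wait instead of one database query.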

Because our cache system is complex, it is very common to look up multiple cache elements in one request, which means we may have multiple cache misses for one request (one thread). When multiple threads look up multiple cache elements in different orders, BlockingCache can easily cause a deadlock.

Say we have two BlockingCaches, 1 and 2, and two threads, A and B. Both threads need an element from 1 and from 2, with the same keys, but A needs 1 first and then 2, while B needs 2 first and then 1.
Thread A locks BlockingCache 1 first, then tries to lock BlockingCache 2, but right before it does, thread B locks BlockingCache 2. Now thread A has to wait for thread B to release BlockingCache 2, which will never happen, because thread B cannot lock BlockingCache 1 (thread A holds it). Deadlock!
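The inversion above can be reproduced with two plain ReentrantLocks standing in for the two BlockingCaches (a toy demo, not Liferay or Ehcache code). Latches force the deadly interleaving, and tryLock is used where a real BlockingCache get() would block forever, so the demo can report the deadlock instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Reproduces the A/B lock-order inversion deterministically.
public class DeadlockDemo {

	public static boolean[] run() throws InterruptedException {
		ReentrantLock cache1 = new ReentrantLock();
		ReentrantLock cache2 = new ReentrantLock();
		CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
		CountDownLatch bothTriedSecondLock = new CountDownLatch(2);
		boolean[] gotSecondLock = new boolean[2];

		Thread a = new Thread(() -> {
			cache1.lock();                        // A "misses" on cache 1
			try {
				bothHoldFirstLock.countDown();
				bothHoldFirstLock.await();        // now B holds cache 2
				gotSecondLock[0] = cache2.tryLock();  // blocking get() would hang here
				if (gotSecondLock[0]) {
					cache2.unlock();
				}
				bothTriedSecondLock.countDown();
				bothTriedSecondLock.await();      // keep cache 1 held until B tried
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			} finally {
				cache1.unlock();
			}
		});

		Thread b = new Thread(() -> {
			cache2.lock();                        // B "misses" on cache 2
			try {
				bothHoldFirstLock.countDown();
				bothHoldFirstLock.await();        // now A holds cache 1
				gotSecondLock[1] = cache1.tryLock();
				if (gotSecondLock[1]) {
					cache1.unlock();
				}
				bothTriedSecondLock.countDown();
				bothTriedSecondLock.await();
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			} finally {
				cache2.unlock();
			}
		});

		a.start();
		b.start();
		a.join();
		b.join();

		return gotSecondLock;
	}
}
```

Neither thread can take its second lock while both hold their first one; with a blocking get() instead of tryLock, both threads would be stuck there forever.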

This is a classic deadlock scenario, and there are two traditional ways to fix it:
1) Use a global lock to protect all caches.
    This kills all cache concurrency; we would never use it.
2) Force all threads to access the caches in the same order.
    This does not hurt concurrency, but it adds very complex logic to all cache-accessing code and is too difficult to implement.

The traditional ways cannot help us!
The JVM does not support deadlock recovery the way a database server does, because there are no transactions in Java code execution (that would be too heavy). But in this special case we do not need deadlock recovery: if we can make sure each thread locks no more than one BlockingCache at a time, we are deadlock free. We do have to make a compromise here, because there is no way to achieve this without changing the caching strategy. The new strategy is: when a thread tries to lock its second BlockingCache, it must first give up the BlockingCache it already holds.

This may look like a big performance loss, but it actually is not. Consider a very common case: we have two threads, one with a complex requirement, meaning it has to lock many BlockingCaches to finish its job, while the other thread's requirement is quite simple, maybe only needing to lock one or two BlockingCaches; they just happen to need the same key in one cache. If we want to save as much I/O as possible, we should ask one of them to wait for the other, but if we happen to make the simple thread wait, it may take a long time to get its result, maybe even longer than hitting the database directly, because the complex thread really does need a long time to finish. We would save overall system overhead but ruin that thread's response time. The compromise is that we pay some overhead for responsiveness, and we get deadlock freedom along the way: we kill two birds with one stone!

We implemented this in BlockingPortalCache; see LPS-5468.

BlockingPortalCache uses a ThreadLocal to record each thread's last lock. When a thread tries to take a new lock, it has to give up its last one first, so any threads waiting on that first lock can move on, either hitting the database or locking the next BlockingCache.
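The rule can be sketched like this (hypothetical names, not the real BlockingPortalCache code). A ThreadLocal remembers the lock a thread took on its last cache miss; before the thread can take a lock for a new key, it releases the previous one, so no thread ever holds two cache locks and the lock graph can never form a cycle:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the "hold at most one cache lock per thread" rule.
public class SingleLockGuard {

	// Remembers the lock this thread took on its last cache miss.
	private static final ThreadLocal<ReentrantLock> lastLock =
		new ThreadLocal<>();

	private final Map<String, ReentrantLock> locks =
		new ConcurrentHashMap<>();

	// Called on a cache miss: release whatever lock this thread already
	// holds before taking the lock for the new key, so waiters on the
	// old key can move on and no lock cycle can ever form.
	public void lockForMiss(String key) {
		ReentrantLock previous = lastLock.get();

		if ((previous != null) && previous.isHeldByCurrentThread()) {
			previous.unlock();
		}

		ReentrantLock lock =
			locks.computeIfAbsent(key, k -> new ReentrantLock());

		lock.lock();
		lastLock.set(lock);
	}

	// Called after the thread has populated the cache entry.
	public void unlockAfterPut() {
		ReentrantLock lock = lastLock.get();

		if ((lock != null) && lock.isHeldByCurrentThread()) {
			lock.unlock();
		}

		lastLock.remove();
	}

	// Helper: does the current thread hold the lock for this key?
	public boolean holds(String key) {
		ReentrantLock lock = locks.get(key);

		return (lock != null) && lock.isHeldByCurrentThread();
	}
}
```

Since every thread holds at most one lock at any moment, a cycle of threads each waiting for another's lock is impossible, which is exactly the condition a deadlock needs.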

BlockingPortalCache also uses CompeteLatch to improve concurrent performance on multi-core machines (4+ cores); see LPS-3744. CompeteLatch scales very well on multi-core compared to the synchronized keyword, and it is reusable (if used properly), unlike CountDownLatch.

So now you can safely turn on BlockingPortalCache for performance, with no need to worry about deadlock.

I don't see why this complexity is needed. With a BlockingCache, the acquisition and release of the lock is done within the get(..) methods. In your example, thread A cannot even attempt to acquire a lock on BlockingCache2 until it gets a result from BlockingCache1 (thus releasing the lock on BlockingCache1). Am I wrong?
In pic 2, thread A has locked BlockingCache 1, which means it got a null from get(), so it has to go ahead and populate the cache element; until it finishes, all other threads hitting BlockingCache 1 with the same key will block there. But to populate BlockingCache 1, thread A needs an element from BlockingCache 2, which just happens to be locked by thread B. On the other side, thread B cannot move further because it is waiting for thread A to release BlockingCache 1, so a deadlock happens. I think you may not understand how Ehcache's BlockingCache works: BlockingCache returns null to the first thread to arrive, and that thread is responsible for populating the missed cache element; all other threads block until the element has been populated.
I see. It wasn't clear to me that an item from cache 2 was needed to populate cache 1.