Why Load Balancing != Clustering in Liferay

Just a quick post on a couple of misunderstandings I've seen out in the field lately... Simply setting up Liferay Portal to be load balanced does *not* mean it is clustered. 
 

First, let's be clear on what "load balancing" is. Load balancing is simply a technique to distribute workload across resources. It is but one component in a high-availability cluster. In Liferay's case, we are load balancing the volume of requests across multiple app servers, which may or may not be on physically separate hardware. Initially, this may seem sufficient, until you consider some of the other components the portal relies on.

 

Clustering is not just pointing a load balancer to two (or more) Liferay nodes. You're not done. Why? Because there are certain components that need to be either centrally managed or synchronized. Here is the basic checklist of components that need to be addressed in a Liferay cluster (rough configuration sketches for each follow the list):

 

1. Load Balancer - it can be software (e.g., Apache), hardware (e.g., an F5), or whatever you wish, really. All it is doing is distributing incoming requests across the nodes.

2. Centralized Database - Hopefully, you have gotten off of HSQL and are using a real DB server. This is a JDBC connection, and it is abstracted from Liferay's point of view. Any level of redundancy you have behind that JDBC connection is up to you and your DBA. Just as an example, you may choose to configure a MySQL cluster, or Oracle RAC, for DB high availability.

3. Ehcache - This is what Liferay uses out-of-the-box for its Hibernate level 2 cache. It needs to be configured to replicate across the nodes; otherwise you will most definitely see inconsistencies, because end users will hit stale caches depending on which node the load balancer sends them to. You are not forced to use Ehcache; it is simply what the portal ships with. You could use something like Terracotta, for example.

4. Lucene - This needs to be centralized. This can be done: a) via JDBC (can work, but there may be issues with speed and table locks), b) by swapping Lucene out for something like SOLR (which runs as a separate web application, e.g. in Tomcat), or c) starting with Liferay 5.2 SP1, via the ClusterLink feature, which can be turned on so that each node maintains its own copy of the index with writes replicated across the cluster. If you do not do this, you will see inconsistencies in search results and other indexed data returned from the DB.

5. Document Library & Image Gallery - This needs to be centralized, because each node keeps the assets locally on its filesystem by default. While the metadata is in the DB, the files serve up faster this way (vs. BLOBs in a DB). So, you need to either a) point the content repository at the DB (can work, but performance may suffer) via the JCRHook in portal properties, or b) mount a path on shared storage (e.g., a SAN or other shared file system) and configure each node to use this common location via the AdvancedFileSystemHook in portal properties. If you do not do this, the metadata about your documents will be in the DB, but when you try to retrieve one, the node handling the request may or may not physically have the file.
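
To make item 1 concrete, here is a minimal sketch of a software load balancer using Apache httpd's mod_proxy_balancer in front of two Liferay/Tomcat nodes. The hostnames, ports, and route names are placeholders, and the sticky-session part assumes each Tomcat's server.xml sets a matching jvmRoute:

    # Minimal mod_proxy_balancer setup (requires mod_proxy, mod_proxy_http,
    # and mod_proxy_balancer to be loaded); hostnames, ports, and routes are placeholders
    <Proxy balancer://liferay>
        BalancerMember http://liferay-node1:8080 route=node1
        BalancerMember http://liferay-node2:8080 route=node2
        # Sticky sessions: each Tomcat's jvmRoute in server.xml must match its route
        ProxySet stickysession=JSESSIONID|jsessionid
    </Proxy>

    ProxyPass / balancer://liferay/
    ProxyPassReverse / balancer://liferay/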
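
For item 2, pointing every node at the same database is just a matter of the standard JDBC properties in each node's portal-ext.properties. A minimal sketch for MySQL; the host, schema, user, and password are placeholders:

    jdbc.default.driverClassName=com.mysql.jdbc.Driver
    jdbc.default.url=jdbc:mysql://dbhost/lportal?useUnicode=true&characterEncoding=UTF-8
    jdbc.default.username=lportal
    jdbc.default.password=changeme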
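
For item 3, Liferay ships clustered Ehcache configuration files that can be switched on from portal-ext.properties. A sketch assuming the 5.x/6.x property names (check the portal.properties of your exact version):

    # Switch to the replicated Ehcache configs that ship with the portal
    net.sf.ehcache.configurationResourceName=/ehcache/hibernate-clustered.xml
    ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm-clustered.xml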
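
For item 4, the Lucene store is also controlled from portal-ext.properties. A sketch of option (a), storing the index in the shared database, with the ClusterLink switch from option (c) shown as an alternative; these property names are from the 5.2/6.x line, so verify them against your version:

    # Option a) keep the index in the shared database
    lucene.store.type=jdbc

    # Option c) or enable ClusterLink so each node keeps its own copy of the
    # index and index writes are replicated to the other nodes
    #cluster.link.enabled=true
    #lucene.replicate.write=true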
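
For item 5, the Document Library hook is likewise chosen in portal-ext.properties. A sketch of option (b), a shared mount, with option (a) shown for comparison; the class names match the 5.x/6.0 document library hooks and the mount path is a placeholder:

    # Option b) shared storage mounted at the same path on every node
    dl.hook.impl=com.liferay.documentlibrary.util.AdvancedFileSystemHook
    dl.hook.file.system.root.dir=/mnt/liferay/document_library

    # Option a) or use the JCRHook, with Jackrabbit's repository.xml pointed
    # at a database persistence manager
    #dl.hook.impl=com.liferay.documentlibrary.util.JCRHook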

 

This is just an outline of what needs to be done. To find out more details, consult the Liferay Portal Administrator's Guide or contact Liferay Professional Services.

Hi James,
Quick question: if documents and images are stored on a SAN to make them centralized, is it advisable to use the same SAN to centralize the Lucene indexes?
Mitesh,

First of all, you don't necessarily need an actual SAN for the DocLib; any comparable shared storage will do.

Second, for Lucene, that might not work, because indexing needs something pretty fast. That is why, in portal properties, there are three options for the index store: file system, JDBC, and RAM (i.e., a RAM drive). But not everyone has a RAM drive. You also have the option of swapping Lucene out for something like the SOLR index server; SOLR has performed faster in certain very high-volume situations.
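
To illustrate the reply above, those three options map to a single setting in portal-ext.properties; the property name is from the 5.x portal.properties, so check your version, and only one value should be active at a time:

    lucene.store.type=file
    #lucene.store.type=jdbc
    #lucene.store.type=ram
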
Thanks for making these misunderstandings clear. It was a nice article.
On a related note, here is a nice explanation of why load balancing without clustering will always result in data corruption:

========
Hibernate thinks its cache is always right. I.e., if one node changes an entry in the DB, and another node happens to have that entry already cached, then whether a change was made or not, Hibernate will always do sanity checks before removing the entity from its cache. It will see: "Hey, this entity has been changed in the DB! There must be an error there. Good thing I have a cached copy! Overwrite the 'corrupt' data in the DB!"
========

(Thanks to Ray Auge for this explanation)