Ask Questions and Find Answers
Important:
Ask is now read-only. You can review any existing questions and answers, but not add anything new.
But - don't panic! While Ask is no more, we've replaced it with Discuss - the new Liferay Discussion Forum! Read more here, or just visit the site:
discuss.liferay.com
Hardware load balancing and session management
I am planning out the addition of a second server and load balancer. How will this affect session management? Will I need to rely on setting something like "sticky IPs" on the load balancer or will a logged in user's session be preserved from server to server?
My setup will be: 2 (or more) tomcat servers (no apache front end web servers) and 1 database server. Basically, I would make a copy of the current server...that server doesn't know load balancing is going on.
Suppose a user logs in. Then, the server that he was hitting during the login process is taken out of service. His next click on the site will be directed to the second tomcat server. Will he still be logged in? If not, how can this be handled to avoid a disruption to the user's logged in session?
We are running Liferay Portal Enterprise 4.2.1 (Machen / Build 3501 / January 10, 2007).
If anyone has worked through this kind of thing...the next question will be how you handle uploaded file attachments from users? rsync? shared storage array?
For load balancing, the suggested approach is sticky sessions in what's known as a "Layer 7" configuration.
Basically, a load balancer with this L7 capability can inspect the content of requests to decide how to associate a client with a cluster node. In this case it should associate a client's JSESSIONID cookie with a particular node in the cluster, namely the node that handed out that cookie in the first place.
Sticky sessions are a good way to go, because they ensure that a client stays with one node throughout the entire session. There is no need to configure session replication, which is complex and normally incurs a non-linearly increasing processing cost as the cluster grows (see the Tomcat clustering docs for an explanation). Sticky sessions incur virtually no cost as the cluster grows, hence the reason they are so widely adopted.
You can do this fairly easily with either an L7-capable hardware load balancer, or with Apache 2.
Here is an article which basically describes the Apache 2 setup. It should get you started and/or give you ideas.
http://altuure.blogspot.com/2006/11/load-balance-and-cluster-setup-for.html
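To make the Apache 2 option concrete, here is a minimal mod_proxy_balancer sketch (Apache 2.2+). The node hostnames, ports, and route names are placeholders; the key points are that stickiness is keyed on the JSESSIONID cookie, and each `route` must match the `jvmRoute` attribute in the corresponding Tomcat's server.xml:

```apache
# httpd.conf sketch: sticky sessions keyed on the JSESSIONID cookie.
<Proxy balancer://liferaycluster>
    # 'route' must match the jvmRoute in each Tomcat's server.xml, e.g.
    # <Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
    BalancerMember ajp://node1.example.com:8009 route=node1
    BalancerMember ajp://node2.example.com:8009 route=node2
    ProxySet stickysession=JSESSIONID
</Proxy>
ProxyPass / balancer://liferaycluster/
```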
Next, you mention cluster nodes going out of service. I recently set up a client to use the autologin feature, which does exactly as the name implies: based on cookies, the client can be logged back in if for some reason he loses his session.
The only point of contention would be an application making wide use of session objects; if the node goes down, those session objects are lost, as the other nodes are not aware of them. For precisely this reason, Liferay itself does not rely on session objects to such an extent that running in this scenario would be a problem, i.e. Liferay should be safe to run in a cluster without session replication.
If your applications store uploaded files on the file system, then I would suggest using shared storage, such as NFS. Liferay itself uses Lucene for its document library and other such data. Recently we upgraded to a version of Lucene that is capable of operating on shared storage. Or you could use a remote or standalone Lucene server and configure all nodes to use it rather than a local store.
We're in need of some better load balancing/clustering documents, and if I can, I'd like to do that soon. These will likely focus on open source tools like Apache 2/Tomcat/LVS as the load balancer.
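For the shared-Lucene option mentioned above: later Liferay versions expose a store-type property for this. The sketch below assumes that property exists in your release; verify it against your version's portal.properties before relying on it:

```properties
# portal-ext.properties sketch (property from later Liferay versions):
# keep the Lucene index in the shared database instead of each
# node's local file system.
lucene.store.type=jdbc
```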
Thanks, this is helpful. It sounds like auto login could help. Can you point me to more info on that? I searched, but could not find the documentation of how it works.
Look in portal.properties for the autologin settings.
Defaults:
#
# Set the following to true to allow users to select the "remember me"
# feature to automatically login to the portal.
#
company.security.auto.login=true
#
# Set the following to the maximum age (in number of seconds) of the browser
# cookie that enables the "remember me" feature. A value of 31536000
# signifies a lifespan of one year. A value of -1 signifies a lifespan of a
# browser session.
#
# Rather than setting this to 0, set the property
# "company.security.auto.login" to false to disable the "remember me"
# feature.
#
company.security.auto.login.max.age=31536000
The comments should be pretty clear what they do.
Another question... what do you do with Hibernate?
I've been proceeding on the assumption that we will need to turn caching off. Already, we have weird things happen because of caching--if I update something in the database directly, it may end up reverting to a prior value...or requiring a restart of the service to take effect, etc.
But I was curious if that is the only choice...when you go to a load balanced application, do you lose the ability to cache database objects?
There are two scenarios:
1) Turn off Hibernate caching (not recommended, though it has been done successfully)
2) Use OSCache to set up cluster-wide cache management. See the wiki article Clustering.
That should help.
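As a concrete sketch of option 2: OSCache's cluster support is enabled in oscache.properties via its JavaGroups broadcasting listener. The multicast address below is the example value discussed later in this thread; adjust it for your network:

```properties
# oscache.properties sketch: broadcast cache flush events so every
# node in the cluster invalidates its copy of a changed object.
cache.event.listeners=com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener
cache.cluster.multicast.ip=231.12.21.100
```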
Thanks...
The wiki page says:
# The multicast ip is a unique namespace for a set of cached objects.
# Set it to 231.12.21.100 to keep it unique from the multicast ip set in
# cache-multi-vm.properties.
What is the significance of that 231.12.21.100 address? Is it the ip address of the cluster--the virtual IP that is being load balanced?
So you might have two servers:
192.168.1.21
192.168.1.22
with a virtual of
192.168.1.100
that translates into a public ip that end users can hit. Is 192.168.1.100 the multicast ip? What is the multicast ip referenced in cache-multi-vm.properties??
No! Actually, you really do use 231.12.21.100 on each computer.
The 231.12.x.x range falls within the multicast address pool. You can have 1000+ cluster nodes all configured to listen on the same multicast address... that's precisely why it's called multicast. In other words, all the information sent from a node to that address is delivered simultaneously to everyone listening on that address, which is why it's so great for clustering.
Here is a quote from Wikipedia:
Multicast: A multicast address is associated with a group of interested receivers. According to RFC 3171, addresses 224.0.0.0 to 239.255.255.255 are designated as multicast addresses. This range was formerly called "Class D." The sender sends a single datagram (from the sender's unicast address) to the multicast address, and the routers take care of making copies and sending them to all receivers that have registered their interest in data from that sender.
Thanks, Ray
So, to summarize...key issues for load balancing
1) session management
Use sticky sessions on the load balancer. However, if a node is restarted, the users stuck to that node will lose their sessions. If they selected autologin (the "remember me" checkbox at login) they will be fine; otherwise, they will have to log back in.
I don't think we can force users to use autologin--it would be perceived as less secure by users. And we probably can't rely on them to check the checkbox. So middle-of-the-day server node restarts are always problematic.
The place where this is most problematic is for users who are in the middle of writing a post. When they submit, they lose what they typed. People get very annoyed about that. It's always the CEO's neighbor it happens to.
It sounds like there is no good way to avoid dropping some users when we restart the application.
2) Hibernate - just configure the files as described in the wiki page.
3) local files - when users upload attachments, etc., they should be put on a shared filesystem or synced in some way
Most file attachments, to MB posts, etc., reside in the DB, so you're good there. The main concern is files stored in the JCR (Jackrabbit) repository, which by default is a file-system-level store. Luckily there is a brand new wiki article on how to configure Jackrabbit to store its files in the db.
http://wiki.liferay.com/index.php/Jackrabbit
So, that one is solved as well.
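For reference, a DB-backed Jackrabbit store is typically configured through a PersistenceManager entry in repository.xml, along these lines. This is a sketch only: the driver class, JDBC URL, credentials, and prefix are placeholders, and the exact parameters depend on your Jackrabbit version:

```xml
<!-- repository.xml sketch: per-workspace persistence backed by MySQL.
     URL, user, and password below are placeholders. -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
  <param name="driver" value="com.mysql.jdbc.Driver"/>
  <param name="url" value="jdbc:mysql://dbhost/jackrabbit"/>
  <param name="user" value="liferay"/>
  <param name="password" value="secret"/>
  <param name="schema" value="mysql"/>
  <param name="schemaObjectPrefix" value="jcr_"/>
</PersistenceManager>
```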
As for the session, you can always use a DB store for those as well. See the Tomcat docs for details. It scales better than session replication. The best-case scenario, I think, is the following:
- load balancing using sticky sessions (Apache 2, or a Layer 7 load balancer; IP stickiness works too)
- DB session persistence (for those times when the CEO's neighbor is online)
- clustered DB caching via OSCache
- DB-based JCR (Jackrabbit) file store
That's likely your best bet for scalability.
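A sketch of the Tomcat DB session store mentioned above, configured per web application context. The table and column names are placeholders; check the Tomcat Manager component documentation for your version's exact schema and attributes:

```xml
<!-- context.xml sketch: persist sessions to a shared database so another
     node can pick them up if a node goes down. -->
<Manager className="org.apache.catalina.session.PersistentManager" saveOnRestart="true">
  <Store className="org.apache.catalina.session.JDBCStore"
         driverName="com.mysql.jdbc.Driver"
         connectionURL="jdbc:mysql://dbhost/tomcat_sessions?user=tomcat&amp;password=secret"
         sessionTable="tomcat_sessions"
         sessionIdCol="session_id"
         sessionDataCol="session_data"
         sessionValidCol="valid_session"
         sessionMaxInactiveCol="max_inactive"
         sessionLastAccessedCol="last_access"
         sessionAppCol="app_name"/>
</Manager>
```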
Yes, Jackrabbit is solved; how about Lucene?
Andy
I have been busy with other issues but need to return to this ASAP...
We also use Lucene, so I need to know how to handle that...
As for files being stored in the database...I don't know where they would be. I think all the attachments are in
/Liferay_Jboss/downloads
with a folder for each user
When I have a "cookbook" I'll post it here...but any help in the meantime is appreciated!
Disregard my question about the /downloads folder--that is custom code on our end.
Here's another random question...
I'm trying to work out a process to deploy our code to these servers. As I was testing, on a test server, I accidentally wiped out the whole /Liferay_Jboss directory. Is there anything in there that is machine-specific? Can I simply do a command like:
rsync -au user@production_server:/Liferay_Jboss /Liferay_Jboss
to get things back?
I wanted to check back here and see if there is any new information on load balancing. I've cobbled together an approach based on the advice here, but it seems very much a hacked together set of random things. I am hoping someone has created a step-by-step, best practices approach.
One approach I've seen is having multiple worker threads from various Tomcat servers going into an Apache web server. Frankly, I don't see any logic in that. I will have a hardware device, a NetScaler, that is designed to do Layer 7 HTTP load balancing of web servers. So I will just run multiple app servers serving the app on port 80.
I've been testing this approach by using an Apache web server to do proxy redirects to two test servers...this allowed me to configure the OSCache and JVM properties for sticky IPs. However, I notice that when I go to the Enterprise administrative site, I only see the live sessions of users attached to that physical server. That makes me wonder what will happen when I go to edit CMS articles...is it all going to remain in sync automagically? I can live with the separate sessions, but I just want to be sure the admin site works OK like this. Obviously, I can test, but I thought I'd throw this up and see if anyone has any new ideas since I started this thread...
David Atkins:
My setup will be: 2 (or more) tomcat servers (no apache front end web servers) and 1 database server. Basically, I would make a copy of the current server...that server doesn't know load balancing is going on.
We have a similar setup as above and are using Liferay 5.2.2. I am seeing an issue where we are getting Calendar Event reminder emails from both servers. Is there a setup option we are missing for this to work without getting flooded with duplicate reminder emails?
To fix this in the meantime I have added this line to the portal-ext.properties file of one of the servers so that only one sends out event reminder emails.
calendar.email.event.reminder.enabled=false
The only problem with doing this is that if for some reason the server that can send reminders goes down, no reminders will be sent.
Any ideas for a good fix would be appreciated!