Avoid session storage at all costs.
Introduction
Those who know me know that one of my favorite soapbox rants is about HTTP and/or Portlet session storage.
Recently my friend Kyle Stiemann wrote a blog post about Session Storage in Liferay, and he reached out to me to proofread it before it was published.
While it was really well written and informative, I must admit I didn't want it published. Providing a convenient place showing how to use sessions, even with all of the warnings, to me seemed almost like promoting session storage too much. I am very much against session storage usage, but could only find one other reference that shared my opinion: https://dzone.com/articles/rules-thumb-dont-use-session
Since I hadn't really made my rant public before, and since I've been getting questions lately about session usage, I thought it was about time that I make my case publicly so it's available for all to see (or trash, as the case may be).
Session Storage is Evil
There, I said it. Session storage is evil.
I'll go even further - if a developer uses session storage of any kind, it demonstrates that the developer either doesn't care about the runtime impact or doesn't know about the runtime impact, and I'm not really sure which is worse.
Session Storage as the Siren's Song
For developers, session storage is akin to a siren's song. Sailors, hearing the siren's song, would steer their ships onto the rocks, leading to their destruction and death.
Session storage is the same for developers. It is so darn easy to use. It has been reachable through the javax.servlet.http.HttpServletRequest interface since the very first release. There are tons of examples online for developers to reference. It is presented to new Java developers who are learning to build servlets. And compared with other options for temporary, unstructured data storage, it is so simple.
So it definitely has its allure.
So how can it be evil?
How Session Storage is Evil
Session storage is not evil from a developer's perspective, but it is absolutely evil from deployment and runtime perspectives.
Here's a short list of why session storage is evil:
1. Session Storage Consumes Server Resources.
Although this may sound obvious, it may surprise developers if load and capacity were not considered during development.
As a developer, it might seem trivial to store a list of objects retrieved for the user into their session. The code is easy and unit and system testing will not reveal any obvious defects.
Problems may only surface during load testing.
Let's consider a calendar implementation. Imagine a system where, when a user logs in, their list of upcoming calendar events is retrieved and stored in their session. The idea was that this would offer a significant performance boost by not retrieving the data from the database every time the user refreshes the page.
Such a system would be easy to code and easy to unit and system test. After all, we're going to do our testing with a relatively small number of events, so the session storage aspect will work out fine and performance will be great.
Now consider that an event is, say, 250 bytes on average. Then say that a user has, on average, 20 events on their calendar at any given time. Rough math gives us about 5k per user.
Now given that session storage is in-memory only, further rough math says that this system will accommodate about 200 users per MB of memory.
These kinds of numbers define what our capacity is going to be for mostly concurrent users at any given time. If the numbers used for the average increase, e.g. you add a description string to the event and events grow to an average of 500 bytes each, this will cut your capacity in half.
And it is "mostly concurrent" because session storage is only reclaimed if a) the user logs out or b) the user's session times out. You cannot expect that every user will always log out; in fact, you should plan for the worst-case scenario where users never log out and their 5k of calendar events remains in memory until the session expiration timeout.
Factoring these things together, you can start to see how session data can actually start to consume system resources and negatively affect node capacity.
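To make the rough math above concrete, here is a minimal sketch of the same arithmetic. The class and method names are mine, purely for illustration, and I'm using a decimal megabyte (1,000,000 bytes) to match the rounding in the text. It also ignores JVM object overhead, so real numbers will be worse.

```java
// Back-of-the-envelope session capacity math using the example figures from the text.
// SessionCapacity and its methods are hypothetical names, not part of any library.
final class SessionCapacity {

    /** Bytes of session data held for one user. */
    static long bytesPerUser(long avgEventBytes, long avgEventsPerUser) {
        return avgEventBytes * avgEventsPerUser;
    }

    /** How many users' session data fits in the given amount of memory. */
    static long usersThatFit(long memoryBytes, long avgEventBytes, long avgEventsPerUser) {
        return memoryBytes / bytesPerUser(avgEventBytes, avgEventsPerUser);
    }

    public static void main(String[] args) {
        long oneMegabyte = 1_000_000L; // decimal MB, to match the rough math

        System.out.println(bytesPerUser(250, 20));              // 5000
        System.out.println(usersThatFit(oneMegabyte, 250, 20)); // 200
        // Doubling the average event size halves the capacity.
        System.out.println(usersThatFit(oneMegabyte, 500, 20)); // 100
    }
}
```

Plug in your own averages and your session timeout, and remember that sessions from users who never log out still count until they expire.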
2. Session Storage is Implemented using Object Serialization.
All objects that will be stored to the session must implement java.io.Serializable whenever the container needs to persist or replicate that session.
So on the surface, that might not seem like a big deal. And maybe for some use cases, it isn't a big deal. If you control the classes that you will be pushing to the session, serializability is easy to include. The problem comes when you do not control the classes that you want to push to the session. Maybe these classes come from another team in your organization, or maybe the classes come from a third party library or Liferay itself.
When you don't have control over the classes, you may not be able to make the classes serializable so they might not be compatible with session storage.
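Here is a small sketch of how that bites. ThirdPartyEvent is a made-up stand-in for a class you don't control, and CalendarEntry is a made-up class of your own that wraps it. Your class is Serializable, but the object graph as a whole is not, so writing it out fails. This uses plain java.io serialization directly, which I'm assuming is what your container uses when it persists or replicates sessions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// All class names here are hypothetical, for illustration only.
final class SerializationCheck {

    /** Stand-in for a class owned by another team or library; NOT Serializable. */
    static final class ThirdPartyEvent {
        final String title;
        ThirdPartyEvent(String title) { this.title = title; }
    }

    /** A class we control; it is Serializable, but it wraps the third-party type. */
    static final class CalendarEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final ThirdPartyEvent event;
        CalendarEntry(ThirdPartyEvent event) { this.event = event; }
    }

    /** Returns true if the whole object graph can be written with Java serialization. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph does not implement Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize("plain string")); // true
        System.out.println(canSerialize(new CalendarEntry(new ThirdPartyEvent("standup")))); // false
    }
}
```

A check like this in a unit test will at least tell you early that a session attribute is going to cause trouble, but it doesn't fix the underlying problem of depending on classes you can't change.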
And honestly, developers are really, really bad about implementing serialization. I guarantee that few developers actually follow the best practices for using Serializable. If you think you're the exception, check out http://www.javapractices.com/topic/TopicAction.do?Id=45 or https://www.javacodegeeks.com/2010/07/java-best-practices-high-performance.html or https://howtodoinjava.com/java/serialization/a-mini-guide-for-implementing-serializable-interface-in-java/ for good serialization usage. Compare that to your code or Liferay's to see if you can find an instance where serialization is implemented according to best practices.
Did you know that serialized data is not really secure? Serialized objects capture the data the instances contained, but by default it is not going to be encrypted at rest. Look for the .ser files from Tomcat after session storage to determine if your data is exposed.
Serialized data also has issues. OWASP defines a vulnerability inherent in deserializing untrusted data: https://www.owasp.org/index.php/Deserialization_of_untrusted_data. If you are set up to persist session data across restarts, the reality is that this data must be considered untrusted because there is no guarantee that the serialized data has not been tampered with.
Finally, as serialization is seen as the source of many Java security defects, there are reports that Oracle plans on dropping support for serialization: https://www.bleepingcomputer.com/news/security/oracle-plans-to-drop-java-serialization-support-the-source-of-most-security-bugs/. When and if this happens, this will likely force changes on how session storage is handled.
3. Session Data is Bound to One Node.
When using session data, it is normally stored only on a single node. If you only have a single production application server, then no problem because the stored data is where it needs to be.
But if you have a cluster, data stored in a session on a node is not normally available across the cluster. Without some kind of intervention in the deployment, data stored to a session for a user is only available to the user if their request ends up back on the same node. If they end up on a different node, the session data is not available.
Switching nodes can happen automatically if the node the user stored the data on either crashed or was shut down.
In an OOTB session configuration in Tomcat, sessions can be used to store data, but this data is not persisted across restarts. So any data stored in the session is lost if the node is stopped or crashes; even if you can restart a failed node, the session data is gone.
You can configure Tomcat to persist session data across restarts, but even in this configuration when the node is down, the session data is not available since it is bound to that specific node. Plus you still have all of the issues with serialized data from #2 above to deal with.
The first wave of Java shopping cart implementations used session storage for all of the cart items. It was super easy to use as a temporary, disposable store of data not worthy of longer-term persistence. But customers had a hard time using these carts because they would sometimes see their cart items disappear. This would happen if the customer got switched to a new instance because the node crashed, the load balancer redistributed traffic or the node was taken down for maintenance.
4. Session Replication is Costly and Consumes Resources.
One solution for the loss of the node with all of the session data was to introduce session replication.
Session replication copies data stored in session to all of the other nodes in the cluster. Since this is not an OOTB solution for most app servers, the solution requires additional server(s) and software. There is no standard for session replication, so each offering is custom and leads to lock-in. Oftentimes these additional costs are not planned for at the start of a project; they usually crop up at the end, when the operations team is trying to figure out how to fix an implementation that was broken due to use of session storage, so you get a surprise implementation cost at the end of the project.
Once the replication stuff is in place, there is still the ongoing overhead to deal with. Every session data update will result in additional network overhead (to share the update). In some cases, replication is done by copying to all nodes (in a mesh), which has a large amount of overhead; in other cases, session storage is centralized to minimize network overhead.
In either case, operationally you are adding another possible point of failure in the infrastructure. What happens if your session data container crashes? How will your application recover? Is it even operable at that point? What are the disaster recovery concerns? How do you fail over gracefully? In a distributed cluster, how do you handle latency issues or network issues between the regions? How do you monitor availability of your session replication infrastructure? How do you debug issues arising from session replication? Will existing session data be available to a new node coming online or is there some amount of syncing that needs to be done?
As developers, we often don't have to think about any of these kinds of issues. But I guarantee that these issues exist and must be planned on from an operations perspective.
Remember the math example from item #1 above? For session replication, all of the math gets multiplied by the number of nodes deployed in the cluster. The replication solution needs to be able to store as much session data as is generated by X number of nodes, each under peak load; anything less could potentially lead to data loss. And the "mostly concurrent" effect of users not logging out manually is an additional factor that affects the sizing of your replication solution.
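As a rough sketch of that multiplication: the node count and peak sessions per node below are numbers I made up for illustration, and the 5,000 bytes per session is the calendar example from item #1. The class and method names are hypothetical too.

```java
// Rough sizing for a session replication tier. All inputs are illustrative assumptions.
final class ReplicationSizing {

    /** Total session bytes the replication tier must hold when every node is at peak. */
    static long clusterSessionBytes(int nodes, long peakSessionsPerNode, long bytesPerSession) {
        return nodes * peakSessionsPerNode * bytesPerSession;
    }

    /** Copies sent over the network for one session update in an all-to-all mesh. */
    static int meshCopiesPerUpdate(int nodes) {
        return Math.max(nodes - 1, 0);
    }

    public static void main(String[] args) {
        int nodes = 4;                     // assumed cluster size
        long peakSessionsPerNode = 10_000; // assumed peak, including sessions waiting to time out
        long bytesPerSession = 5_000;      // 20 events * 250 bytes, from item #1

        System.out.println(clusterSessionBytes(nodes, peakSessionsPerNode, bytesPerSession)); // 200000000
        System.out.println(meshCopiesPerUpdate(nodes)); // 3
    }
}
```

In a mesh, every node ends up holding that full 200 MB, and every update is sent three times. Add a fifth node and both numbers go up again.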
5. Sticky Sessions Unbalance Resource Consumption.
Another option often used with session storage is sticky sessions. With sticky sessions, the load balancer is configured to send traffic originating from the same host to the same target node. This ensures that a user will have access to the data stored in their session.
It is the lightest-weight solution for stored session data use since it won't require additional hardware/software, but it does have its own serious drawbacks.
If the node crashes or is taken out, the user loses access to the session data and the UX will not be good. The load balancer in this situation will switch the user to another node, but the data in the session is still not available.
However, while the node is up, all traffic originating from the same origin will always go to the same node.
In an autoscaling scenario, sticky sessions work against being able to distribute traffic amongst the nodes. If you have a two node cluster and both nodes are saturated, when you spin up a third node it will only receive requests from new origins; the two saturated nodes will remain saturated because the sticky session logic binds origins to nodes.
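The autoscaling problem is easy to see in a toy simulation. This is not how any particular load balancer is implemented; StickyBalancer is a made-up class that pins each origin to the least-loaded node the first time it is seen and never moves it afterwards, which is the behavior described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of sticky routing; names and policy are illustrative assumptions.
final class StickyBalancer {
    private final List<String> nodes = new ArrayList<>();
    private final Map<String, String> pinned = new HashMap<>(); // origin -> node
    private final Map<String, Integer> load = new HashMap<>();  // node -> number of pinned origins

    void addNode(String node) {
        nodes.add(node);
        load.put(node, 0);
    }

    /** A known origin stays on its node; a new origin is pinned to the least-loaded node. */
    String route(String origin) {
        String node = pinned.get(origin);
        if (node == null) {
            node = leastLoaded();
            pinned.put(origin, node);
            load.merge(node, 1, Integer::sum);
        }
        return node;
    }

    int pinnedCount(String node) {
        return load.getOrDefault(node, 0);
    }

    private String leastLoaded() {
        if (nodes.isEmpty()) throw new IllegalStateException("no nodes registered");
        String best = nodes.get(0);
        for (String n : nodes) {
            if (load.get(n) < load.get(best)) best = n;
        }
        return best;
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer();
        lb.addNode("node-1");
        lb.addNode("node-2");
        for (int i = 0; i < 1000; i++) lb.route("origin-" + i); // saturate two nodes

        lb.addNode("node-3");                                   // autoscale
        for (int i = 0; i < 1000; i++) lb.route("origin-" + i); // the same users come back

        // node-1=500 node-2=500 node-3=0
        System.out.println("node-1=" + lb.pinnedCount("node-1")
                + " node-2=" + lb.pinnedCount("node-2")
                + " node-3=" + lb.pinnedCount("node-3"));
    }
}
```

The new node sits idle until brand new origins show up, while the two original nodes keep all of their existing users.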
Ideal Solution
So what's the ideal solution? Avoid session storage altogether.
Seriously, I mean that.
The benefits are tremendous:
- No resource consumption for session data.
- No additional network chatter to broadcast session data for replication.
- Load balancing able to shift load across the cluster based on capacity.
- No additional costs in sizing the nodes or session replication solutions.
- No autoscaling issues.
- No security concerns.
- No lingering data waiting for session timeouts.
- No developer impact to correctly implement Serializable interface for internal and external code.
Most often the pushback I get on this is a developer who needs to stash temporary, unstructured data for a short period of time. If session storage is taken off the table, what is left?
The database, of course. If your data is a Java object graph, you can marshal that into JSON or XML and store the string in the database if it must be completely unstructured. Or you could use a NoSQL solution like MongoDB or even Elasticsearch to hold the data. For wizard-like forms, you can carry forward form field values into hidden fields, allowing the temporary data storage to occur in the client's browser instead of your application server.
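For the wizard case, here is a minimal sketch of carrying earlier steps forward as hidden fields. WizardHiddenFields is a made-up helper, not a Liferay or servlet API, and in a real portlet you would likely use your view technology's own tags and escaping instead. The important parts are that the values are escaped on the way out and, since they come back from the browser, validated again on the way in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any framework.
final class WizardHiddenFields {

    /** Escapes the characters that are unsafe inside an HTML attribute value. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders one hidden input per previously submitted field, in map iteration order. */
    static String render(Map<String, String> earlierSteps) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, String> e : earlierSteps.entrySet()) {
            html.append("<input type=\"hidden\" name=\"")
                .append(escapeHtml(e.getKey()))
                .append("\" value=\"")
                .append(escapeHtml(e.getValue()))
                .append("\"/>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> stepOne = new LinkedHashMap<>();
        stepOne.put("firstName", "Ann");
        stepOne.put("company", "Smith & Sons");
        System.out.print(render(stepOne));
    }
}
```

Nothing is held on the server between steps, so any node in the cluster can handle the next request.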
There are just so many solid, cluster-friendly ways to carry this data around. All it takes is a good architecture and design.
And the general desire to avoid the evil that comes with session storage...
If you are advocating using session storage, consider the items above. If a coworker is using session storage, call them out on it as soon as possible. If a potential candidate proposes using sessions to store data, question whether the candidate understands the runtime issues that plague session storage. If a contractor wants to use session storage, get a new contractor.
Follow the example of Odysseus's crew; fill your ears with beeswax and avoid the Siren's Song sung by Session Storage...

