Goodbye MySQL, Hello PostgreSQL!

Well, that title sounds a little harsh since Liferay's not ditching MySQL, they are just moving cloud systems to PostgreSQL...

Liferay is Moving from MySQL to PostgreSQL

Introduction

Goodbye MySQL? No, don't worry, Liferay isn't dropping support for MySQL, don't start panicking just yet.

Liferay has been transitioning new PaaS and new/existing SaaS systems from MySQL to PostgreSQL 16, so the cloud-based systems are saying goodbye to MySQL and saying hello to PostgreSQL...

This blog post isn't really about announcing the change, I believe that has already been done in other channels.

I have heard, although haven't seen any announcements, Liferay is planning on releasing tools to help on-prem clients migrate from MySQL to PostgreSQL also. When I hear anything on this, I'll let you know...

This blog post is all about the why it is being done.

Because the why is important, right? I mean, if you're about to stand up an on-prem Liferay environment or your own hosted cloud environment, one of the choices you're going to have to make is what database should you host Liferay on.

And if you're already running Liferay on one database (such as MySQL), you might not have given any thought to moving to another database and whether there might be advantages in undertaking that effort...

So to me, the why behind the move is interesting and definitely blog-worthy.

Our only hope, our only peace is to understand it, to understand the `why.’ `Why’ is what separates us from them, you from me. `Why’ is the only real social power, without it you are powerless. And this is how you come to me, without `why,’ without power. Another link in the chain. The Matrix Reloaded

Performance

The why boils down to one word: Performance.

Well, actually there's a second word: Scalability.

Oh, and a third phrase: Resource (CPU & Memory) Management.

Those big words hide a lot of the details, so let's start peeling away the layers of this onion and see what we find.

Testing

To evaluate MySQL vs PostgreSQL, Liferay had to create a load test to get some numbers.

The methodology of the tests is similar with what Liferay does for the performance whitepapers. A throughput-oriented testing tool was used to emulate tens of thousands of virtual online users. These online users are emulated to have "sleep" time to reflect the real-world users reading the articles/posts on the page, typing in the comment or new-post text box, during which no request will be made. Since the sleep time is fixed between each operation, as defined by the test case, the more online users can exist, the more throughput the portal can hold. The number of the online users are used as the result number.

In the tests, "SampleSQLBuilder" was used to pre-generate SQL files to be inserted into the database. This is because it was too slow to generate e.g. 100k users or 10k MB posts by calling portal's API...

Normally this is not something I would recommend doing because it can be easy to get the data wrong, but these were Liferay Engineers so they get an exception. To everyone else, I'd recommend using the Batch APIs because they don't have the performance issues individual portal API calls would have...

In the first round of the comparison tests, Liferay found that

  • There was almost no difference in the results when running database-light test cases, like Login;
  • There was almost no difference in the results in a relatively database-heavy test case like Document Library, but the database CPU % with MySQL was 3x the value of PSQL: 13% vs 4%
  • There was a big difference in the results in the very database-heavy test case Message Boards, as the MySQL database CPU often jumped between 20% and 90%, causing the portal response times to be unstable, while the PSQL was able to constantly consume ~10% of DB server CPU.

Regarding the bad result with MySQL, Liferay noticed that the generated test data using SampleSQLBuilder had several issues:

  • all MB posts were created by the same user;
  • all MB posts were created and published on the same date and time.

These issues effectively confused MySQL's query optimizer, as the indexes on the user or date columns can't reduce any data, causing MySQL to create a really bad query plan involving irrelevant indexes and back-to-table-queries. On the other hand, PSQL just uses in-memory full-table scanning, which was much faster and lighter.

To resolve this, Liferay made a portal side optimization about how ServiceBuilder generates indexes for finders and also modified SampleSQLBuilder to fix the 2 issues. With all the optimizations, in the Message Boards test case,

  • MySQL was stably consuming less than 20% database CPU while holding ~27000 online users;
  • PSQL was stably consuming 3% database CPU while holding ~30000 online users.

In general, PSQL did show better CPU consumption in the tests compared to MySQL.

It's important to note that in the database setups, the MySQL server's configuration was fine-tuned, while PSQL was using the out-of-box defaults. This implies that if the team had spent some time on the PSQL tuning, the results could have been [perhaps significantly] improved.

To go back to the Why though, the test showed in general that:

  • Performance using PostgreSQL for Liferay loads were better than those for MySQL.
  • The better performance numbers reduced scalability demands while increasing the effects of scaling up.
  • The CPU and memory usage under PostgreSQL were better than those for MySQL which would affect cloud environment sizing decisions.

So these improvements, when multiplied by the number of environments Liferay is managing in PaaS and SaaS, translate into significant cost savings for Liferay while at the same time slightly improving performance for individual environments.

Conclusion

So, after this testing and the experience from monitoring all of the Liferay SaaS environments (which were exclusively MySQL), Liferay decided to make its default database in PaaS and SaaS to be PostgreSQL.

Had they asked me originally, I would have pointed them towards PostgreSQL over MySQL. I've always found PostgreSQL to be better for Liferay.

And if you research PostgreSQL vs MySQL performance and have some understanding about how Liferay data is structured (I'm not guiding you to tear into the Liferay data model here), it kind of makes sense.

Pretty much every online review will make a general recommendation like the one I found here:

PostgreSQL vs MySQL: Performance Comparison

  1. PostgreSQL Performance
    • PostgreSQL is great for handling complex queries and large datasets.
    • It excels in read-write-heavy operations and supports advanced features like concurrency control.
    • However, it may be slower for simple queries compared to MySQL due to its additional features and overhead.
  2. MySQL Performance
    • MySQL is known for being faster in read-heavy operations and simple queries.
    • It performs well in high-traffic websites or applications that need to handle many requests quickly.
    • It’s less resource-intensive, making it more efficient for smaller-scale operations but may struggle with complex queries or large datasets.

When you combine this with a bit of understanding about the Liferay data model, how everything is an asset, all of the cross-table linking not maintained by managed foreign keys, etc., then you would start to classify Liferay as having complex queries and in many organizations there are large datasets (Liferay in fact is designed to support large dataset situations).

So yeah, when they found that PostgreSQL was better for Liferay than MySQL, I wasn't surprised.

The only question left to answer is "What should you do?"

If you're already on PostgreSQL, great choice, you don't have to do anything.

If you don't have a Liferay environment already and you're facing the choice, choose PostgreSQL if you can. By that I mean if it is truly a choice, i.e. you have experience with both database systems, you have resources that can install, admin and manage both systems, then choose PostgreSQL.

If you're on MySQL, though, do you stay there or do you change to PostgreSQL?

To me, it always comes down to what do you know, or what do you have resources for...

If you're a MySQL shop, you have no PostgreSQL installations, you've never used PostgreSQL, you don't have PostgreSQL administrators, etc., the pain you'll face trying to use PostgreSQL for Liferay just to get a bit of a performance gain may not be worth it.

If it truly is a choice, meaning you have experience or administrators for both platforms, the next question is how the ROI measures up. Migrating from MySQL to PostgreSQL is not going to be free, there's definitely going to be a cost there in time, money and resources. The return though is going to be on the performance impact to the site which is going to be rather arbitrary. Only you can know how your site would perform in PostgreSQL vs MySQL, only you can determine if there is a benefit in changing, and only you can evaluate what the performance gain might be worth. So only you can determine if there would be an ROI on the change and whether that is going to be greater than the cost to switch.

So generally, I wouldn't think any existing site should really consider switching from MySQL to PostgreSQL. While I can't know your context, my own impression is that the ROI is generally just not going to be there.

For Liferay though, for hosting as many instances as they do in SaaS and PaaS, the ROI is completely clear and significantly covers the costs they'll face in moving environments from MySQL to PostgreSQL, so it is easy to justify.

I just think that few other implementations are going to be able to demonstrate that as clearly.

As a last word, though, I'll just go back to what I've already said - if you have a choice, go with PostgreSQL.