Understanding Liferay's Database

The long-missing ER diagram

If you're following Liferay's Forums, Slack, and other locations where questions regarding Liferay Portal and Liferay DXP are asked and answered, you'll find that there's one question coming up over and over again: people ask for an Entity Relationship diagram of Liferay's internal database, so that they can understand its structure and content.

Typically this is asked for because individuals would like to manually import data, change table structures, or update certain content directly.

I've now given up on the arguments and created such a diagram. Before I tell you where to get it: Let me remind you for the second-to-last time that changing Liferay's internal data is a galactically bad idea. The database should be a closed box for you - it's not yours, and you have no business changing it. Many people who changed data manually wish they hadn't. Thus several of us highly active forum posters strongly recommend not even trying to understand the database.

Instead of thinking in SQL terms, you should think in API terms. Liferay provides a comprehensive API that you can use to interact with all the content that's stored in the database. No component in Liferay itself (apart from the actual persistence layer) manipulates the database directly: Everything utilizes the API.

Liferay's internal database has no foreign key relationships and uses manual counters, because this is the technique that's guaranteed to work across the full range of supported databases (and always was). Liferay does not use fancy vendor-specific options, or those that used to be vendor-specific a while ago. This gives you the flexibility of running it on the infrastructure of your choice.

In the past, changing Liferay's internal database, inserting / updating data through SQL, has led to problems - and these problems often don't manifest right away, but only after months. For example during your next upgrade, or when suddenly there are unexpected duplicate primary keys (because API-inserted objects now use keys that manually imported data used in the past).
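
To make the duplicate-key scenario concrete, here is a minimal, self-contained Java sketch. It does not use any Liferay classes: `ManualCounter`, `FakeTable`, and the starting value of 100 are hypothetical stand-ins for an application-managed counter and a table without a database-side sequence. It only illustrates the mechanism described above, under the assumption that a hand-written insert picks "highest id plus one" without telling the counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an application-managed id counter (not a Liferay class).
class ManualCounter {
    private long current;

    ManualCounter(long start) {
        this.current = start;
    }

    // The only component that knows which ids have already been handed out.
    long increment() {
        return ++current;
    }
}

// Hypothetical stand-in for a table with a primary key and no database-side sequence.
class FakeTable {
    private final Map<Long, String> rows = new HashMap<>();

    // Mimics a primary key constraint: inserting an existing key fails.
    void insert(long id, String value) {
        if (rows.containsKey(id)) {
            throw new IllegalStateException("duplicate primary key: " + id);
        }
        rows.put(id, value);
    }

    long maxId() {
        return rows.keySet().stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}

class CounterCollisionDemo {
    // Returns true if an API-style insert collides with a manually inserted row.
    static boolean manualInsertCausesCollision() {
        ManualCounter counter = new ManualCounter(100);
        FakeTable table = new FakeTable();

        // "API" insert: the id comes from the counter (101).
        table.insert(counter.increment(), "created through the API");

        // "Manual SQL" insert: picks max(id) + 1 (102) and never touches the counter.
        table.insert(table.maxId() + 1, "imported by hand");

        // Next "API" insert: the counter hands out 102 as well.
        try {
            table.insert(counter.increment(), "created through the API");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("collision: " + manualInsertCausesCollision());
    }
}
```

In this toy version the collision shows up on the very next insert. In a real system the counter may be far behind or ahead of the manually chosen keys, which is one way the problem can stay hidden for months.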

The only way to insert or manipulate data should be the API.

That being said - I've gone through the trouble of generating an ER diagram poster for Liferay DXP and CE 7.2 (format A1 landscape, approximately 84×59 cm or 33×23 inches) that you can print and hang in your Liferay team's workspace. It will help you with those manual changes that your team thinks they really need to make.

Once you've acknowledged the warnings above and still really, really want to have the database documentation: Download the poster here. You have been warned. I hope that this article and poster simplify future Slack and forum answers, now that I can point to this article.

Nice article, but I can't totally agree with the point here.

 

Understanding database structure and manual modification of data are different things.

 

Of course, nobody should manually insert/modify data in internal Liferay tables, change table structures, etc.

 

But sometimes you just need to verify that data has been saved to the database.

 

For example - you can't find created Web Content in Control Panel. It may be an indexing issue, or the data may really be missing from the database. First of all you need to verify whether it's there in the database. But where will you look, if you don't understand the database? 'webcontent'? No such table among the Liferay tables... :)

 

I agree with this argument, Vitaliy. Only I (almost) never see it as the reason for questions regarding the database structure. And if someone really believes they understand the database, they're only a few finger snaps away from the next UPDATE or INSERT statement. And, regarding your example: If you know the API, you'll have a good hint where to look. If the API is your solid first line of defense, I'm happy. :)

I couldn't agree more with this, Olaf. I also feel that forcing yourself to go through the API has the added advantage of LOOKING AT THE SOURCE CODE. I would argue that this is top of the list in terms of "best ways to learn Liferay", as you will almost certainly discover other things along the way.

 

I will also add that the term "galactically bad" is awesome and I will be desperately seeking an opportunity to use this expression today. I'm now hoping that someone asks me about manipulating the database directly lol :)

Oh come on! What the ....!! Not even funny. 

 

Actually, this poster would be really helpful.

 

As if Liferay was a bug free product! It has been a while since I met Liferay for the first time and I had to dig and dive in this schema, trying to solve some bugs. And not always my bugs but Liferay's.

 

"The database should be a black box for you - it's not yours, and you have no business changing it." What?!

 

Mostly since DXP, Liferay is no longer what it used to be. Not as a product which in fact is a good one, but as a Company and a philosophy.

 

Regards!

I see and understand the frustrations, but the pain after changing the database is orders of magnitude worse. I've spoken to enough people who have done it, or who have helped those who have done it.

 

Thus, what I've documented should really be the default. Are there exceptions to the rule? Certainly. But hopefully the distribution of "rule vs exception" is similar to "finding bugs in your own code vs finding bugs in the compiler".

 

Ok, I admit - it might not hit *that* ratio.

""The database should be a black box for you - it's not yours, and you have no business changing it." What?!" This is absolutely true. It may not seem so at the time, but if you change a column or a type or a size or whatever, when it comes time to upgrade, your upgrade will fail. Liferay's database is a product database, and the product expects it to be in a certain state to function correctly. And besides, Liferay supports numerous ways to extend entities (via expandos) or leverage external tables (ServiceBuilder) so there is typically no reason to change the Liferay database. Fixing bugs in your code or Liferay's code is great; I hope you were able to report the issues and possibly submit a PR to address the problems so the rest of us can benefit...

" I hope you were able to report the issues and possibly submit a PR to address the problems so the rest of us can benefit"

 

You know what, David? I actually did. (In case you meant I didn't.)

 

And I'll tell you even more. Back then, around 2009, there existed a kind of schema (not all tables were included) that helped me understand the internals and work out a solution.

 

I cannot understand how Liferay staff can defend the position that it is better not to have some documentation (because an ER diagram is documentation).

 

I could completely agree if you said "Do it at your own risk" or "It is completely discouraged"... But at the same time Liferay should be eager to provide documentation.

 

I have dived into the Liferay schema a lot, and learned a lot doing so.

Agree completely. If I wanted a black box I would have stayed with WebSphere Portal. Then my only option was to contact support whenever something did not work as expected.

 

We could also do a lot of damage using the Liferay APIs, perhaps they should be hidden as well?!

The API is the only thing that knows every touchpoint when a save occurs. When you create a new user, I think at least 5 or 6 tables change as a result, and all of those changes need to happen when you add a user.

None of those key relationships are in the database, so they cannot be represented by an ERD, and it is too easy to assume what is going on and miss a key point for a specific edge case or a code update.

Every table that has the class name/class PK combo holds a foreign key reference, but not to a single table (like ERDs display): it could point to any table, including custom tables. This is not a relationship that ERDs can or would model, especially since it is not backed up by any foreign keys.
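
As a rough illustration of why such a reference does not fit into an ER diagram, here is a small self-contained Java sketch. Everything in it is hypothetical: `GenericReference`, the `TABLES` map and the `com.example.*` names are invented for the example and are not Liferay's actual classes or tables. It only shows the general pattern of a type name plus a primary key selecting first a table and then a row.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical row that references "some entity" by type name plus primary key.
final class GenericReference {
    final String className;
    final long classPK;

    GenericReference(String className, long classPK) {
        this.className = className;
        this.classPK = classPK;
    }
}

class ClassNameClassPkDemo {
    // Stand-in for several tables: class name -> (primary key -> row content).
    // All names and rows are invented for this example.
    static final Map<String, Map<Long, String>> TABLES = Map.of(
        "com.example.Article", Map.of(10L, "article row"),
        "com.example.CustomEntity", Map.of(10L, "custom table row"));

    // Follows the reference by first choosing the table, then the row.
    static Optional<String> resolve(GenericReference ref) {
        Map<Long, String> table = TABLES.get(ref.className);
        if (table == null) {
            // Unknown type: no constraint guarantees that the target table exists.
            return Optional.empty();
        }
        // A missing row is also possible: no constraint guarantees the target row.
        return Optional.ofNullable(table.get(ref.classPK));
    }

    public static void main(String[] args) {
        // The same primary key value leads to different tables, depending on the class name.
        System.out.println(resolve(new GenericReference("com.example.Article", 10L)));
        System.out.println(resolve(new GenericReference("com.example.CustomEntity", 10L)));
        // A dangling reference is not rejected by anything at the storage level.
        System.out.println(resolve(new GenericReference("com.example.Article", 99L)));
    }
}
```

The point of the sketch is that the target table is only known at runtime, from the row's own content, so there is no single line you could draw between two boxes on a diagram.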

Additionally, Liferay writes to the search index, something direct DB manipulation can't do, and this results in all kinds of odd behaviors.

*Using the API is the only way, during a CRUD operation, to ensure the right thing is done in the database, the index, etc.*
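
A minimal sketch of that difference, in self-contained Java. The class and method names (`ApiVersusRawWriteDemo`, `addArticleThroughApi`, `insertRowDirectly`) are made up for illustration and the "table" and "index" are plain in-memory collections, so this is not how Liferay's persistence or search layers are implemented. It only demonstrates the general idea that an API-style write updates every touchpoint, while a direct write updates one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ApiVersusRawWriteDemo {
    // Hypothetical stand-ins for a database table and a search index.
    private final Map<Long, String> articleTable = new HashMap<>();
    private final Set<Long> searchIndex = new HashSet<>();

    // API-style write: updates every touchpoint in one place.
    void addArticleThroughApi(long id, String title) {
        articleTable.put(id, title);
        searchIndex.add(id);
    }

    // Direct write: only the table changes, the index never hears about it.
    void insertRowDirectly(long id, String title) {
        articleTable.put(id, title);
    }

    boolean existsInDatabase(long id) {
        return articleTable.containsKey(id);
    }

    // A UI that lists search results only shows what the index knows.
    boolean isFoundBySearch(long id) {
        return searchIndex.contains(id);
    }

    public static void main(String[] args) {
        ApiVersusRawWriteDemo demo = new ApiVersusRawWriteDemo();
        demo.addArticleThroughApi(1L, "added through the API");
        demo.insertRowDirectly(2L, "inserted directly");

        System.out.println("1 in database: " + demo.existsInDatabase(1L)
            + ", found by search: " + demo.isFoundBySearch(1L));
        System.out.println("2 in database: " + demo.existsInDatabase(2L)
            + ", found by search: " + demo.isFoundBySearch(2L));
    }
}
```

The second row exists in the table but never appears in search results, which is the kind of "odd behavior" that is hard to trace back to its cause later.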

The issue here is really around setting expectations for new developers who don't know Liferay. We see it all the time: things done by inexperienced Liferay developers that worked at a given time, but later broke because the dev did not have enough Liferay experience to understand why their approach was flawed in the first place.

Aldo and Aritz, if you guys have been around for a while, you must realize that the experience you have now informs you and stops you from doing things that you otherwise would have done when you were new to the platform. I know that when I first started using it, if I had had access to the data model and someone had told me "here it is, but don't use it" or "use at your own risk", as an experienced developer (but not experienced in Liferay), I would have had at it, not understanding that yeah, it really is super easy to mess up your environment with a single delete of a record in the database.

 

But I'm sure you'll agree, now that you know more about Liferay, that sure, you sometimes do have to open the hood and peek around inside. As Aritz points out, it is sometimes necessary. But we understand that necessity because of our experience with Liferay, and that experience helps to guide our hand even when we do poke in the database. We don't just go in and run roughshod all over the place.

I think that's the goal with Olaf's post: to intercept the green Liferay developer, to convince them not to go poking around in the database, to point out that many have tried and many have failed, to dissuade someone from making a catastrophic mistake. It won't stop experienced developers like you or me from doing it anyway, but because of our Liferay experience, we'd do it in a much more responsible way than we would have just starting out.

What David says, 100%.

 

Folks: Am I guilty of having looked at the database? Yes. Was I tempted to change something? Yes. Did I? No. Is changing the database some tool to keep under your belt? You're arguing against David's experience of 14703 forum posts, my 5538, umpteen blog articles, numerous customer engagements, and uncounted conversations with our support team about their experience with messed up databases (where all customers believed in good faith that they were doing the right thing).

 

Trust us: Changing the database is such a powerful tool that you can shoot yourself in the foot in so many ways, that it's not worth attempting to use it for anything else than just that.

 

I'll end my replies here, I'm refusing to get into an argument about the issue. If you don't want to hear the message: Be my guest. You've been warned. And I'll take the public comments here as "read receipt" - I don't need you to be compliant. Just don't come to me, crying, when you realize you should have taken it seriously.

I see you didn't like the comments at all, but writing a blog post named "Understanding Liferay's Database" and having us, after spending time reading, be "RickAndRolled" at the end with an April Fools' PDF as if we were dumb kids is not nice either.

 

As you can see in the comments from community members (apart from LR staff), we are willing and we would be happy if this schema was really documented.

 

Of course I understand the risk but hey! Isn't it an open source product? You can screw up just the same if you mess with the source code.

 

So, in a way, are we indirectly being discouraged from reading, understanding or even changing the source code? At our own risk, of course.

 

As I said, since DXP Liferay is not what it used to be.

 

... Never gonna give you up, Never gonna let you down ... \o/

 

Regards!

Just to make clear, the discussion for me is not about making manual changes in the database and how dangerous and stupid that is. I completely agree with your opinion on that.

 

My point is about having documentation to understand Liferay functioning and, if needed, being able to fix a corrupted database due to unexpected exceptions, even if that hardly ever happens with Liferay :-)

> ... uncounted conversations with our support team about their experience with messed up databases

Exactly. Who fixes the database when something goes wrong? Can the support team have the ER documentation, but non-Liferay consultants (partners) cannot?

 

We have migrated 6.1 and 6.2 databases where there were issues, databases grown over many years, who knows when a small thing here or there went wrong.

Without manual intervention the upgrade tools did not run to completion.

 

We manage without ER documentation, but it would have helped and saved time.

 

> ... you're arguing against David's experience of 14703 forum posts, my 5538, umpteen blog articles, ...

I'm just arguing with 2 knowledgeable and reasonable persons.

 

Actually, it is so much better and even faster (via Groovy) to access Liferay data via the API. Thank you guys.

Yes, I agree with what you said. If I need to add or alter some of my web contents from my code, I need to use the relevant API, but how do I find that API? Can you please direct me to where I can find and understand the APIs I need to use for different purposes related to Liferay assets?

By the way thanks in advance for any help.

For historical reasons, the Web Content API is named "Journal", so anything you find prefixed with Journal* is a good candidate. Some asset-related APIs are also prefixed with "Asset*", but if you're dealing with JournalArticles, the asset-related stuff should be covered for you under the hood.

 

Which is a good opportunity to mention stuff that happens while you're changing an article: It'll also be updated in the search index, transparently. This wouldn't happen if you just changed the database content.

Thanks Vitaliy, I just read the article and wanted to write exactly the same comment.

To me this is a long missed diamond, as I do like to understand what's going on, and I learn best top to bottom - meaning when I have a graphical overview on the entities and how they are connected it helps me a LOT in finding the right API for hooking into.

Also, when we want to verify what's happening (we did encounter bugs in the API, and we also used what seemed to be the correct API call but wasn't...), this makes things easier.

To Olaf: while I do agree with your concerns and warnings, I do think that giving out that information does more good than harm.

Because: If I want to modify the DB directly, I will do that nevertheless. Now if I got more information on the structure (including the given warnings) I might look for the correct API, or at least I will be more careful and look for the relationships.

I agree that you should not change anything in the database, or change the database structure, but when you have over 2.5k pages on your site and someone has added a widget to, say, 200 of those pages, how can you get this list without manually looking through each page and without knowing how the database tables relate? For example:

A content creator has added an html widget in various pages across the site to provide some specific functionality, in that html widget, I know that each one contains a comment <!-- David R. 12th Feb 2023: I had to do this for the the new thing that we're working on -->

So how can I use Liferay, or its APIs, to find all pages that have this in an HTML widget?

Not all database access is to change the database. We already provide a custom search portlet for this for our in-house portlets, and this required an understanding of the DB. Sadly, the way that Liferay widgets are used, where they store data, and how they relate to layouts is a mystery.