Expandos III (Liferay, NoSQL, and MongoDB)

Update: The expando-mongodb-hook plugin is now committed to SVN and available from trunk.

Over the past few months, the hype around NoSQL-style databases has really been heating up the tech news and blog feeds. There seems to be an overwhelming desire to find scalability solutions that aren't well addressed by an RDBMS. What is Liferay to do?

Could Liferay support some form of NoSQL integration? I think so, and I surely couldn't go long without doing something to draw attention to the fact that Liferay is a prime candidate for scaling dynamically via a NoSQL backend.

The most obvious way I could see to leverage a NoSQL solution was with perhaps the most dynamic aspect of the portal: Expandos (and, by association, Custom Fields).

In order to prove the concept of NoSQL with Liferay, we decided to write an adapter (using the Liferay hook pattern) to build a backend for Expando on MongoDB. I had no real idea how long it would take to accomplish, but we decided to try. As it turns out, it was not too difficult. We now have a fully functional adapter that stores all of Liferay's dynamic Expando data in a highly scalable MongoDB. Note that Expandos still support all Liferay permissions, and Custom Fields are still indexed along with their entities exactly as they would be normally. This is a fantastic demonstration of just how extensible Liferay Portal really is.
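For those curious about the mechanics, the gist of the hook pattern is a service wrapper, registered via liferay-hook.xml, that intercepts the Expando value service and redirects persistence to MongoDB. The class below is only an illustrative sketch under that assumption, not the actual plugin code (which is in SVN):

// Illustrative sketch only -- see the expando-mongodb-hook plugin in SVN for
// the real implementation. A service hook wraps the portal's
// ExpandoValueLocalService (registered through liferay-hook.xml) and can
// redirect value persistence to MongoDB while delegating everything else.
package com.example.hook;

import com.liferay.portlet.expando.service.ExpandoValueLocalService;
import com.liferay.portlet.expando.service.ExpandoValueLocalServiceWrapper;

public class MongoExpandoValueLocalServiceImpl
	extends ExpandoValueLocalServiceWrapper {

	public MongoExpandoValueLocalServiceImpl(
		ExpandoValueLocalService expandoValueLocalService) {

		super(expandoValueLocalService);
	}

	// The value CRUD methods would be overridden here to read/write the
	// appropriate MongoDB collection (e.g. "className#tableName") instead of
	// the ExpandoValue table. Permissions and indexing are untouched because
	// they live outside the storage layer.
}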

I tested against the version of MongoDB that was readily available for Ubuntu 10.04 (1:1.2.2-1ubuntu1.1). I also tried to make sure to support cluster configurations, so check out the portlet.properties file in the plugin, as well as the MongoDB driver javadocs, for what to configure and how to set that up.
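As a point of reference for the cluster side, this is roughly what a connection looks like with the 2.x-era MongoDB Java driver when given a seed list of hosts. Treat it as a generic driver sketch; the actual configuration keys live in the plugin's portlet.properties, and the host names below are placeholders:

// Generic MongoDB Java driver (2.x-era API) connection sketch: a single host
// versus a replica set / cluster seed list. Host names are placeholders.
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.ServerAddress;

public class MongoConnectionSketch {

	public static void main(String[] args) throws UnknownHostException {
		// Single host.
		Mongo single = new Mongo(new ServerAddress("localhost", 27017));

		// Cluster: pass a seed list and the driver discovers the other
		// members of the replica set.
		List<ServerAddress> seeds = Arrays.asList(
			new ServerAddress("mongo1.example.com", 27017),
			new ServerAddress("mongo2.example.com", 27017));

		Mongo cluster = new Mongo(seeds);

		DB db = cluster.getDB("lportal_10135");

		System.out.println(db.getCollectionNames());

		single.close();
		cluster.close();
	}
}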

I did several small usage tests (none of which were load tests, since this was an informal proof of concept) to verify that everything was working the right way. I created several Custom Fields on several different entities and tested CRUD operations to make sure that the data was landing (as well as being removed and updated) where I wanted it: in MongoDB.
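For a sense of what those tests look like in code, here is a minimal sketch using the standard ExpandoBridge API. Nothing MongoDB-specific appears in it, which is exactly the point: the hook swaps the storage out underneath the same calls.

// Minimal sketch of an Expando CRUD round trip via the ExpandoBridge API.
// With the hook deployed, the data lands in MongoDB; without it, in the
// ExpandoValue table. Error handling is omitted for brevity.
import com.liferay.portal.model.User;
import com.liferay.portal.service.UserLocalServiceUtil;
import com.liferay.portlet.expando.model.ExpandoBridge;

public class ExpandoCrudSketch {

	public static void exercise(long userId) throws Exception {
		User user = UserLocalServiceUtil.getUser(userId);

		ExpandoBridge expandoBridge = user.getExpandoBridge();

		// Create/update: with the hook this writes to the
		// "com.liferay.portal.model.User#CUSTOM_FIELDS" collection.
		expandoBridge.setAttribute("test", "test value");

		// Read it back through the same API.
		System.out.println(expandoBridge.getAttribute("test"));
	}
}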

Meanwhile, I was also using the MongoDB command-line client, mongo, to make sure that everything was working from that end. I added a custom field called test to the User entity, and for the first user in the system, I set the value to "test value". Here is an example of what we see via mongo:


[rotty@rotty-desktop  expando-mongodb-hook]$ mongo
MongoDB shell version: 1.2.2
url: test
connecting to: test
type "exit" to exit
type "help" for help
> show dbs
admin
local
lportal_0
lportal_10135
> use lportal_10135
switched to db lportal_10135
> db.getCollectionNames()
[
	"com.liferay.portal.model.User#CUSTOM_FIELDS",
	"com.liferay.portlet.blogs.model.BlogsEntry#CUSTOM_FIELDS",
	"com.liferay.portlet.documentlibrary.model.DLFileEntry#CUSTOM_FIELDS",
	"system.indexes"
]
> db.getCollection("com.liferay.portal.model.User#CUSTOM_FIELDS").count()
1
> db.getCollection("com.liferay.portal.model.User#CUSTOM_FIELDS").find()
{ "_id" : ObjectId("4d28f318fcfcc08a7855ebe4"), "companyId" : 10135, "tableId" : 17205, "rowId" : 10173, "classNameId" : 10048, "classPK" : 10173, "valueId" : 17207, "test" : "test value" }
> 

So far so good! As you can see, the data is landing nicely in the MongoDB database.


While that was a good test, I also wanted to make sure that other use cases would work just as well. I decided to revive the First Expando Bank example to see how that would work.

I first had to make a few small API changes in the Velocity template. The updated template is attached. See this article and the follow-up for more information on that topic.

After adding some accounts into the First Expando Bank app, the mongo console results looked like this:


> db.getCollectionNames()
[
	"AccountsTable#AccountsTable",
	"com.liferay.portal.model.User#CUSTOM_FIELDS",
	"com.liferay.portlet.blogs.model.BlogsEntry#CUSTOM_FIELDS",
	"com.liferay.portlet.documentlibrary.model.DLFileEntry#CUSTOM_FIELDS",
	"system.indexes"
]
> db.getCollection("AccountsTable#AccountsTable").count()
3
> db.getCollection("AccountsTable#AccountsTable").find()
{ "_id" : ObjectId("4d29292abda2c08a05e35e67"), "companyId" : 10135, "tableId" : 17320, "rowId" : 1294543146642, "classNameId" : 17313, "classPK" : 1294543146642, "valueId" : 17336, "balance" : 55, "firstName" : "Ray", "lastName" : "Auge", "modifiedDate" : "Sat Jan 08 2011 22:19:06 GMT-0500 (EST)" }
{ "_id" : ObjectId("4d292945bda2c08a06e35e67"), "companyId" : 10135, "tableId" : 17320, "rowId" : 1294543173086, "classNameId" : 17313, "classPK" : 1294543173086, "valueId" : 17337, "balance" : 120, "firstName" : "Daffy", "lastName" : "Duck", "modifiedDate" : "Sat Jan 08 2011 22:19:33 GMT-0500 (EST)" }
{ "_id" : ObjectId("4d292958bda2c08a07e35e67"), "companyId" : 10135, "tableId" : 17320, "rowId" : 1294543192848, "classNameId" : 17313, "classPK" : 1294543192848, "valueId" : 17338, "balance" : 300, "firstName" : "Mickey", "lastName" : "Mouse", "modifiedDate" : "Sat Jan 08 2011 22:19:52 GMT-0500 (EST)" }
> 

Excellent! It would appear that all our use cases are covered, from automatic Custom Fields via the UI to programmatic use in a CMS template.

I'd love to get your feedback about it! Please note that there is currently no rich way to perform queries (à la MongoDB's query syntax) against the Expando data. But with a little ingenuity we could probably make that possible.
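To give a concrete idea of what such queries could look like, here is a sketch that goes straight at the MongoDB collection with the plain Java driver, completely outside of Liferay and the hook (collection and database names taken from the shell session above):

// Sketch of querying the Expando-backed collection directly with the MongoDB
// Java driver (2.x-era API). This bypasses the portal and the hook entirely.
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class ExpandoQuerySketch {

	public static void main(String[] args) throws Exception {
		Mongo mongo = new Mongo("localhost");

		DBCollection customFields = mongo.getDB("lportal_10135").getCollection(
			"com.liferay.portal.model.User#CUSTOM_FIELDS");

		// Find every Expando row whose "test" custom field equals "test value".
		DBObject query = new BasicDBObject("test", "test value");

		DBCursor cursor = customFields.find(query);

		while (cursor.hasNext()) {
			System.out.println(cursor.next());
		}

		mongo.close();
	}
}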

Comments
Good to see you considering the NoSQL alternatives for scale. One thing you might consider is to one-up the others by using an ODB. In general, they will do a much better job at handling content relations which span partitioning schemes. For example, the Versant object database is used by the Eidos Media and Factiva (Dow Jones, Reuters) CMS systems, handling real-time content feeds from over 9000 sources (Wall Street Journal, Financial Times, etc.). Here is a video which shows how to build an application, make it distributed, parallel, and fault tolerant, and optimize integrated cache loading. A little boring at first because it's a detailed tutorial, but about 20 minutes in it gets really interesting.

http://www.blip.tv/file/3285543

Cheers,
-Robert
Very nice, Ray!

One other area where offering a NoSQL database as an option would make sense is as storage for the logs from the new audit plugin in 6.0 EE.
This is fantabulous, Ray.

Your next step would be to make this an integral part of Service Builder: making Service Builder inherently support any NoSQL store the way it supports Hibernate/JPA. One of those could be integration with Google BigTable.

All the best for all these endeavors!

You guys are rocking.

Ahmed Hasan
This is what I have been searching for all the time.

This is really awesome
Good work. I'm currently using MongoDB and I love this thing! With Morphia on top, things feel a little bit like JPA2, with its power of annotations. Additionally, Morphia provides a nice pattern for DAOs.
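For readers unfamiliar with Morphia, here is a tiny sketch of the annotation-driven mapping described above; the package names assume the 2011-era Google Code releases of Morphia, and the Account class and database name are invented purely for illustration:

// Sketch of Morphia's JPA-like annotation mapping on top of the MongoDB Java
// driver. All entity, field, and database names here are placeholders.
import com.google.code.morphia.Datastore;
import com.google.code.morphia.Morphia;
import com.google.code.morphia.annotations.Entity;
import com.google.code.morphia.annotations.Id;
import com.mongodb.Mongo;

import org.bson.types.ObjectId;

public class MorphiaSketch {

	@Entity("Accounts")
	public static class Account {

		@Id
		private ObjectId id;

		String firstName;
		String lastName;
		double balance;
	}

	public static void main(String[] args) throws Exception {
		Mongo mongo = new Mongo("localhost");

		Morphia morphia = new Morphia().map(Account.class);

		Datastore datastore = morphia.createDatastore(mongo, "morphia_demo");

		Account account = new Account();

		account.firstName = "Daffy";
		account.lastName = "Duck";
		account.balance = 120;

		datastore.save(account);

		// Fluent query API, loosely analogous to a typed JPA query.
		Account found = datastore.find(Account.class).field(
			"lastName").equal("Duck").get();

		System.out.println(found.firstName + " " + found.lastName);

		mongo.close();
	}
}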

It would be great to integrate it with Service Builder. Currently, I have no idea how this could be done. Do you have any hints?

In general, how could MongoDB be used as the default DB? Can you point us in some directions on where to look for the answers, or how you would do this?
Hey All,

First, thanks for all the nice comments!

@Steffen, @Ahmed, regarding Service Builder integration: I can't see this happening in the near future, simply due to the overwhelming number of other features already on our roadmap. BUT, on the other hand, I would not completely count it out, since SB is really so flexible internally. In the meantime, I would not hesitate to pair a BigTable-style NoSQL solution with SB as it is today. I would do it by simply letting SB handle the entity modeling and service tier generation, and then I would implement the DAO myself and have my service tier call this custom DAO instead of the generated one. This would still save a significant amount of work and still provide all the features like generated web services, and it would allow integration with all the nice Liferay framework APIs like permissions and assets.
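To make that slightly more concrete, here is a hypothetical hand-written DAO of the kind I mean: Service Builder would still generate the entity, the service scaffolding, and the web services, but the service implementation would call something like the class below instead of the generated Hibernate persistence. All names are invented for illustration.

// Hypothetical hand-written DAO backing a Service Builder service with
// MongoDB via the plain Java driver. A generated *LocalServiceImpl would
// delegate to this instead of the generated persistence class.
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class MongoAccountDao {

	public MongoAccountDao(Mongo mongo, String databaseName) {
		_accounts = mongo.getDB(databaseName).getCollection("Accounts");
	}

	public void addAccount(
		long accountId, String firstName, String lastName, double balance) {

		DBObject account = new BasicDBObject("accountId", accountId).append(
			"firstName", firstName).append("lastName", lastName).append(
			"balance", balance);

		_accounts.insert(account);
	}

	public DBObject getAccount(long accountId) {
		return _accounts.findOne(new BasicDBObject("accountId", accountId));
	}

	private final DBCollection _accounts;
}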
Note: if you would like to see integration with SB as an option sooner rather than later, I would suggest creating a feature request ticket in JIRA and voting it up.

The sad part is that there is no standard API (like Hibernate provides for SQL), so we would have to pick and choose one or two best-of-breed products to build on.

I have also thought that what we really might need is to support REST endpoints as an SB DataSource. This would potentially allow us to connect to any type of solution supporting REST.
Thanks, Ray, sounds good to me. I'll post the results here, including how easy or difficult it is to do this with Morphia's DAO support for MongoDB.
Please do, as I'm very interested in how it turns out.

Also, it would make a very good Liferay Live topic, showing how it can be done. I expect it might generate a whole lot of interest.
Hi Ray, so-called polyglot persistence might help with the Expando scaling issue. http://www.youtube.com/watch?v=fI4L6K0MBVE&feature=player_detailpage#t=199s

It's one of the hidden aces of the Spring Data project (http://www.springsource.org/spring-data), and it could basically substitute for the Expando data model.
Thanks for this. I took a look and it's definitely interesting.
Hey Ray - just FYI I did some scaling/performance measurements and presented my findings @ OSCON - others may be interested: http://www.oscon.com/oscon2011/public/schedule/detail/21536 (there is a link to the presentation with results)
Hi Ray,
Just wanted to check whether anybody has implemented Cassandra with Liferay.
Our aim is to use Cassandra as the backend for a Liferay application.
Please share some thoughts on this so we can move forward.
Thanks,
Venkat
We actually have it in our minds to create Cassandra adapter(s) for several of our storage APIs, particularly:

com.liferay.portlet.documentlibrary.store.Store
com.liferay.portlet.dynamicdatamapping.storage.StorageEngine
com.liferay.portlet.expando.*
However, there is no definitive date for this. We would welcome any good community-based implementations. Otherwise, using Cassandra as a backend store for your custom data is a totally great idea. Go for it.
Ray, Thanks for your reply...

We are looking at using Cassandra for both custom and Liferay application usage (all tables: roles, permissions, etc.), i.e., running Liferay Portal on Cassandra.

Could you please suggest some options by which we can achieve this functionality?

One way I have seen is overriding specific (Expando) service implementations through hooks.

How can we use Cassandra as the full Liferay database storage option? Please share some ideas.

Thanks,
Venkat
I highly doubt you'd succeed in doing that. The portal relies quite heavily on relational database behaviors (such as ACID transactions). Also, there would really be little gain in going to that effort in the first place. Most of the data in Liferay is in fact relational, and trying to map it to NoSQL for the sake of it would be useless in my mind.

However, I acknowledge that using a NoSQL solution for several of Liferay's data scenarios is appropriate, just not as a general replacement.
Ray, so we still cannot replace the relational database with a NoSQL database to store Liferay's out-of-the-box tables and data (Liferay's tables including Layout and its sub-tables, etc.)?
Oops! I got an error when I tried posting a message, but after some time I saw that my message got posted several times.
Fixed!

Re the question: not yet! It will take a lot of design consideration to make that possible. At the very least, we would have to isolate features more than they are now in order to even conceive of doing that. But there is hope (just not in the short term).
Hi Venkat,

Have you succeeded in implementing Cassandra in Liferay?

If so, can you please share the steps, or give me some ideas or suggestions to move forward?
I am trying to develop with (and possibly extend) the Mongo hook. If anyone has any technical resources on setting it up...