Introducing Portlet Sandbox: Improve Portal Resiliency

We’re excited to introduce the Portlet Sandbox plugin, which can greatly improve portal resiliency and stability. The plugin isolates high-traffic or resource-hungry portlets and prevents unstable portlets from crashing the portal JVM (Java Virtual Machine). A memory leak or other stability issue in a single portlet or a group of portlets no longer terminates the entire portal. Instead, the affected services go offline for a short time and the system recovers automatically by restarting or disabling the offending components. With Portlet Sandbox, you improve your portal’s resiliency against problems such as misbehaving custom portlets. Enterprises with multiple development teams deploying to a single instance can reduce the teams’ impact on one another by isolating each team’s portlets in its own sandbox.


The Portlet Sandbox is conceptually similar to WSRP (Web Services for Remote Portlets). WSRP allows us to treat portlets as enterprise services and access them via web services. However, WSRP has excessive overhead due to request marshalling, single sign-on, and a variety of other factors. The Portlet Sandbox runs isolated portlets in separate JVMs on the local machine and communicates at the lowest level possible to reduce overhead. Communication between the MPP (Master Portal Process) and PSC (Portlet Sandbox Container) processes is implemented by a private RPC (Remote Procedure Call) framework over pipes (POSIX mkfifo) or sockets (Windows). It uses a private binary protocol, so there is no SOAP overhead as there is with WSRP. Because all JVM processes run on the same machine, static resources (JavaScript, CSS, etc.) are served directly by the MPP; only access to the portlets themselves is isolated by the Portlet Sandbox. Because the portal is broken into multiple JVM processes, each JVM can have a smaller heap, which eases GC (Garbage Collector) overhead and improves system memory utilization.
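
To illustrate why a binary protocol over a local pipe or socket is lighter than SOAP, here is a minimal sketch of a length-prefixed binary frame. The actual wire format of the Portlet Sandbox is private and not described in this post, so the class name `FramedMessage` and the frame layout are purely hypothetical. In-memory byte arrays stand in for the pipe or socket streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout (NOT the plugin's real protocol):
//   [UTF-encoded portlet id][4-byte payload length][payload bytes]
final class FramedMessage {
    final String portletId;
    final byte[] payload;

    FramedMessage(String portletId, byte[] payload) {
        this.portletId = portletId;
        this.payload = payload;
    }

    // Serialize the message into a compact binary frame.
    static byte[] encode(FramedMessage message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(message.portletId);
            out.writeInt(message.payload.length);
            out.write(message.payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read one frame back; readFully blocks until the whole payload has arrived.
    static FramedMessage decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            String portletId = in.readUTF();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new FramedMessage(portletId, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "render".getBytes(StandardCharsets.UTF_8);
        byte[] frame = encode(new FramedMessage("calendar", body));
        FramedMessage copy = decode(frame);
        System.out.println(copy.portletId + " -> " + frame.length + " bytes on the wire");
    }
}
```

The framing adds only a few bytes per message, whereas a SOAP call would wrap the same payload in an XML envelope that has to be built and parsed on every request.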


In a cluster setup, each cluster node can be configured independently for the Portlet Sandbox. The MPP and the PSCs on the same machine are logically treated as a single traditional cluster node that interacts with other cluster nodes. When a request is dispatched to a cluster node, the MPP handles it first and then delegates to the appropriate PSCs based on the configuration. Combining clustering with the Portlet Sandbox helps the system scale out (across multiple machines) and scale up (on high-end machines).
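
The delegation step can be pictured as a simple lookup from portlet to sandbox. The sketch below is only an illustration of that idea: `SandboxRouter`, the sandbox names, and the rule that unassigned portlets stay in the MPP are assumptions, not the plugin's actual classes or configuration model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical router: maps a portlet id to the process that should serve it.
final class SandboxRouter {
    static final String MPP = "MPP";

    private final Map<String, String> portletToSandbox = new HashMap<>();

    // Record that a portlet is configured to run inside a named sandbox.
    void assign(String portletId, String sandboxName) {
        portletToSandbox.put(portletId, sandboxName);
    }

    // Portlets with no sandbox assignment are served by the MPP itself (assumed).
    String route(String portletId) {
        return portletToSandbox.getOrDefault(portletId, MPP);
    }

    public static void main(String[] args) {
        SandboxRouter router = new SandboxRouter();
        router.assign("calendar", "PSC-1");
        System.out.println("calendar -> " + router.route("calendar"));
        System.out.println("blogs -> " + router.route("blogs"));
    }
}
```

Because each cluster node is configured independently, every node would hold its own mapping of this kind.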


Would you like to try it out? The Portlet Sandbox plugin is available today in the Liferay Marketplace. It is free for Liferay Portal Enterprise Subscribers. Documentation for the plugin is also available now.


If you’re not yet using Liferay, check out our free 30-day trial or request a quote and see if the Liferay Portal Enterprise Subscription is right for you.

I hope the overhead of multiple JVMs is less than a single, beefier JVM running WSRP. I am very interested to see how I can incorporate this into our Enterprise SOA.
Very interesting feature! But having to convert the database connection configuration from JNDI to Liferay’s built-in data source in order to use sandboxing is also a limitation. :-)
This is actually a front-end restriction; in the back end we have a workaround for it.
We allow portal properties to be overridden for SPIs, which means you can keep using a JNDI database for the MPP but explicitly configure your SPIs to use properties-based JDBC configuration when creating the SPI definition (it has to point to the same database the MPP is using).

We will try to expose this ability in the UI in the next spi-admin plugin release.
A very good step moving forward. Thanks Shuyang!

Is stopping an SPI supposed to return the portlet execution to the MPP?
The short answer is yes: it will fall back to the MPP on SPI failure.
The longer answer is that it depends on your recovery configuration. The default setting is to fall back to the MPP, but you can also configure the SPI to auto-recover a few times. If it keeps failing up to the configured number of times, it will eventually fall back to the MPP.

The back end actually supports more flexible reactions to SPI failure. In a future spi-admin release, we may expose more settings, such as preventing SPI fallback to the MPP (if a plugin keeps crashing your SPIs, it may crash your MPP too, so you may not want to run it at all in order to keep your system alive).
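
The restart-then-fall-back behavior described in this reply can be sketched as a small counter. This is only an illustration under stated assumptions: `SpiRecoveryPolicy`, `maxRestarts`, and the `Target` enum are hypothetical names, not the plugin's real API or settings.

```java
// Hypothetical policy: restart the SPI a limited number of times, then fall back to the MPP.
final class SpiRecoveryPolicy {
    enum Target { SPI, MPP }

    private final int maxRestarts;
    private int failures;

    // maxRestarts = 0 models the described default: fall back on the first failure.
    SpiRecoveryPolicy(int maxRestarts) {
        this.maxRestarts = maxRestarts;
    }

    // Called each time the SPI fails; returns where the portlet should run next.
    Target onFailure() {
        failures++;
        return failures > maxRestarts ? Target.MPP : Target.SPI;
    }

    public static void main(String[] args) {
        SpiRecoveryPolicy policy = new SpiRecoveryPolicy(2);
        for (int i = 1; i <= 3; i++) {
            System.out.println("failure " + i + " -> " + policy.onFailure());
        }
    }
}
```

A setting that prevents fallback altogether, as mentioned above, would correspond to a third outcome (disable the portlet) in place of `Target.MPP`.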
If there are slow portlets that take a long time to generate a response and slow down the whole page, is it possible to configure a maximum request timeout (so that the page is rendered even if SPI portlets show an error message)?
You don't need portal resiliency for that. We have a feature called "server side parallel rendering" for it. In portal.properties, search for properties whose names start with "layout.parallel.render.".
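
As a generic illustration of rendering portlets in parallel with a time limit (not Liferay's implementation, and not tied to the actual property names), the sketch below submits each portlet to a thread pool and substitutes a placeholder for any portlet that misses the deadline. `ParallelRenderSketch` and the placeholder text are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ParallelRenderSketch {
    // Render all portlets concurrently; the whole page shares one deadline.
    static Map<String, String> render(Map<String, Callable<String>> portlets, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> entry : portlets.entrySet()) {
                futures.put(entry.getKey(), pool.submit(entry.getValue()));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            Map<String, String> fragments = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : futures.entrySet()) {
                String id = entry.getKey();
                long remaining = Math.max(0, deadline - System.nanoTime());
                try {
                    fragments.put(id, entry.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException e) {
                    entry.getValue().cancel(true); // give up on the slow portlet
                    fragments.put(id, "[" + id + " timed out]");
                } catch (ExecutionException e) {
                    fragments.put(id, "[" + id + " failed]");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    fragments.put(id, "[" + id + " interrupted]");
                }
            }
            return fragments;
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }

    public static void main(String[] args) {
        Map<String, Callable<String>> portlets = new LinkedHashMap<>();
        portlets.put("fast", () -> "<div>fast</div>");
        portlets.put("slow", () -> { Thread.sleep(10_000); return "<div>slow</div>"; });
        System.out.println(render(portlets, 500));
    }
}
```

The page is assembled from whatever fragments are ready at the deadline, so one slow portlet no longer delays the others.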
Hello

It sounds very promising but from what I see it doesn't work.

I took a clean LR 6.2 SP8 bundled with Tomcat and followed the steps in the documentation linked above.

The first problem I noticed is that the SPI rejects the license during its first startup:

[10581]12:37:24,777 ERROR [localhost-startStop-1][LicenseManager:?] Corrupt license file. Removing license file {productEntryName=Portal Development, startDate=1410904800000, expirationDate=1413496800000, description=30-Day Trial License, maxHttpSessions=10, owner=XXXXXX, licenseEntryName=Portal Developer, productVersion=6.2 EE, type=developer, accountEntryName=Liferay Trial, version=4}
[10581]12:37:24,778 ERROR [localhost-startStop-1][LicenseManager:?] No binary licenses found

Then it throws a lot of exceptions, starting with:

[10581]12:41:19,181 WARN [localhost-startStop-1][SpriteProcessorImpl:200] Unable to process jndi:/localhost/html/themes/classic/images/ratings/star_off.png
[10581]12:41:24,481 WARN [localhost-startStop-1][SpriteProcessorImpl:200] Unable to process jndi:/localhost/html/themes/control_panel/images/ratings/star_off.png
wrz 22, 2014 12:41:47 PM org.apache.catalina.core.ApplicationDispatcher invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.IllegalStateException: Page needs a session and none is available
at org.apache.jasper.runtime.PageContextImpl._initialize(PageContextImpl.java:148)
at org.apache.jasper.runtime.PageContextImpl.initialize(PageContextImpl.java:125)
at org.apache.jasper.runtime.JspFactoryImpl.internalGetPageContext(JspFactoryImpl.java:112)
at org.apache.jasper.runtime.JspFactoryImpl.getPageContext(JspFactoryImpl.java:65)
at com.liferay.portal.kernel.servlet.JspFactoryWrapper.getPageContext(JspFactoryWrapper.java:63)
at org.apache.jsp.html.portlet.login.navigation.create_005faccount_jsp._jspService(create_005faccount_jsp.java:420)

Then it gets even more bizarre. When I try to go to the test page created earlier (http://localhost:8080/web/guest/test), I'm redirected to http://localhost:8080/c/portal/license, which shows that I have a valid 30-day trial license. I can't get to the test page in any way. The page was created before the installation of Portlet Sandbox. It contains only the "calendar" portlet, which I selected to run on the SPI. My suspicion is that the SPI thinks there is no license and redirects me to the license info page, even though the MPP has a valid license.

When I stop SPI the page is accessible.

On the next start of the SPI, it logs that no license was found:

[10581]13:15:07,283 ERROR [localhost-startStop-1][LicenseManager:?] No binary licenses found

And the test page is inaccessible.

Can you help me with this?

Regards
Michał
Looks very interesting. However, I'm having an issue during startup after configuring the SPI.

Caused by: java.rmi.ConnectException: Connection refused to host: 10.14.151.177; nested exception is:
...
com.liferay.portal.kernel.resiliency.spi.remote.RemoteSPIProxy.init(RemoteSPIProxy.java:121)
at com.liferay.portal.resiliency.spi.service.impl.SPIDefinitionLocalServiceImpl.startSPI(SPIDefinitionLocalServiceImpl.java:238)
... 38 more

It tries to connect using my first IP address (10.14.151.177) rather than localhost or 127.0.0.1. Is there a way to configure this?