Introduction
When it comes to Liferay performance tuning, there is one golden rule: offload whatever work you can away from the application server.
This applies to all aspects of Liferay. Using Solr/Elasticsearch is always better than using the embedded Lucene. While PDFBox works, you get better performance by offloading that work to ImageMagick and Ghostscript.
You can get even better results by offloading work before it gets to the application server. What I'm talking about here is caching, and one tool I like to recommend for this is Varnish.
According to the Varnish site:
Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 - 1000x, depending on your architecture.
So I've found the last claim to be a little extreme, but I can say for certain that it can offer significant performance improvement.
Basically, Varnish is a caching appliance. When an incoming request hits Varnish, it will look in its cache to see if the response has been rendered before. If it isn't in the cache, Varnish will pass the request to the back end and store the response (if possible) in the cache before returning it to the original requestor. As additional matching requests come in, Varnish can serve the response from the cache instead of sending the request to the back end for processing.
So there are two requirements that need to be met to get value out of the tool:
- The responses have to be cacheable.
- The responses must take time for the backend to generate.
As it turns out for Liferay, both of these are true.
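To make that concrete, here's a minimal VCL sketch of Varnish sitting in front of a Liferay/Tomcat instance. The host and port here are placeholders for illustration; with no custom logic at all, Varnish's builtin rules decide what gets cached (requests carrying cookies or Authorization headers are passed to the backend, everything else is looked up in the cache).

```vcl
# Minimal sketch: Varnish in front of Liferay/Tomcat.
# Host/port are assumptions; adjust for your environment.
vcl 4.0;

backend liferay {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    set req.backend_hint = liferay;
    # No other logic: Varnish's builtin rules apply, so requests with
    # cookies or Authorization headers go to the backend and the rest
    # are looked up in the cache.
}
```

As the rest of this post shows, the builtin rules alone won't get you far with Liferay; you'll be customizing this file heavily.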
So Liferay can actually benefit from Varnish, but we can't just make such a claim; we'll need to back it up w/ some testing.
The Setup
To complete the test I set up an Ubuntu VirtualBox instance w/ 12G of memory and 4 processors, and I pulled in a Liferay DXP FP 15 bundle (no performance tuning for JVM params, etc). I also compiled Varnish 4.1.6 on the system. For both tests, Tomcat was running with 8G and Varnish with an allocation of 2G (even though Varnish is not used for the Tomcat-only test, I think it is "fairer" to keep the tests as similar as possible).
In the DXP environment I'm using the embedded Elasticsearch and HSQL for the database (not a prod configuration, but both tests will have the same bad baseline). I deployed the free Porygon theme from the Liferay Marketplace and set up a site based on the theme. The home page for the Porygon demo site has a lot of graphics and stuff on it, so it's a really good site to look at from a general perspective.
The idea here was not to focus too much on Liferay tuning, but to get a site up that was serving a bunch of mixed content. Then we measure a non-Varnish configuration against a Varnish configuration to see what impact Varnish has in performance terms.
We're going to test the configuration using JMeter and we're going to hit the main page of the Porygon demo site.
Testing And Results
JMeter was configured to use 100 users looping 20 times. Each test would touch the home page, the photography, science, and review pages, and would also visit 3 article pages. JMeter was configured to retrieve all related assets synchronously to exaggerate the response time from the services.
Response Times
Let's dive right in with the response times for the test from the non-Varnish configuration:

The runtime for this test was 21 minutes, 20 seconds. The 3 article pages are the lines near the bottom of the graph, the lines in the middle are for the general pages w/ the asset publishers and all of the extra details.
Next graph is the response times from the Varnish configuration:

The runtime for this test was 11 minutes, 58 seconds, a 44% reduction in test time, and it's easy to see that while the non-Varnish tests seem to float around the 14 second mark, the Varnish tests come in around 6 seconds.
If we rework the graph to adjust the y-axis to remove the extra whitespace we see:

The important part here for me was the lines for the individual articles. In the non-Varnish test, /web/porygon-demo/-/space-the-final-frontier?inheritRedirect=true&redirect=%2Fweb%2Fporygon-demo shows up around the 1 second response time, but with Varnish it hovers around the 3 second mark. Keep that in mind when we discuss the custom VCL below.
Aggregate Response Times
Let's review the aggregate graphs from the tests. First the non-Varnish graph:

This reflects what we've seen before; individual pages are served fairly quickly, pages w/ all of the mixed content take significantly longer to load.
And the graph for the Varnish tests:

At the same scale, it is easy to see that Varnish has greatly reduced the response times. Adjusting the y-axis, we get the following:

Analysis
So there are a few points that quickly jump out:
- There was a 44% reduction in test runtime reflected by decreased response times.
- There was a noticeable (though unmeasured) reduction in server CPU load since Liferay/Tomcat did not have to serve all of the traffic.
- Since work is offloaded from Liferay/Tomcat, overall capacity is increased.
- While some response times were greatly improved by using Varnish, others suffered.
The first three bullets are easy to explain. As Varnish is able to cache "static" responses from Liferay/Tomcat, it can serve those responses from the cache instead of forcing Liferay/Tomcat to build a fresh response every time. Having Liferay/Tomcat rebuild responses each time requires CPU cycles, so returning a cached response reduces the CPU load. And since Liferay/Tomcat is not busy rebuilding the responses that now come from the cache, Liferay/Tomcat is free to handle responses that cannot be cached; basically the overall capacity of Liferay/Tomcat is increased.
So you might be asking: if Varnish is so great, why do the single-article pages suffer a response time degradation? Well, that is due to the custom VCL script used to control the caching.
The Varnish VCL
So if you don't know Varnish, you may not be aware that caching is controlled by a VCL (Varnish Configuration Language) file. This file is closer to a script than to a configuration file.
Normally Varnish operates by checking the backend response cache control headers; if a response can be cached, it will be, and if the response cannot be cached it won't. The impact of Varnish is directly related to how many of the backend responses can be cached.
You don't have to rely solely on the cache control headers from the backend to determine cacheability; this is especially true for Liferay. Through the VCL, you can actually override the cache control headers and make some responses cachable that otherwise would not have been and make other responses uncacheable even when the backend says it is acceptable.
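As a simplified sketch of what that looks like (these are not the exact rules from my script; the /combo/ path is a real Liferay resource path discussed further below), a backend response rule can both honor and override the backend's cache headers:

```vcl
sub vcl_backend_response {
    # Respect an explicit "don't cache" from the backend...
    if (beresp.http.Cache-Control ~ "no-store|private") {
        set beresp.uncacheable = true;
        return (deliver);
    }

    # ...but override the headers for responses we know are safe to
    # cache, even when Liferay marks them uncacheable.
    if (bereq.url ~ "^/combo/") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }
}
```

The power (and the danger) of VCL is exactly this: you decide cacheability, not the backend.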
So now I want to share the VCL script used for the test, but I'll break it up into parts to discuss the reasons for the choices that I made. The whole script file will be attached to the blog for you to download.
In the sections below comments have been removed to save space, but in the full file the comments are embedded to explain everything in detail.
Varnish Initialization
vcl 4.0;

import directors;

probe company_logo {
    .request =
        "GET /image/company_logo HTTP/1.1"
        "Host: 192.168.1.46:8080"
        "Connection: close";
    .timeout = 100ms;
    .interval = 5s;
    .window = 5;
    .threshold = 3;
}

backend LIFERAY {
    .host = "192.168.1.46";
    .port = "8080";
    .probe = company_logo;
}

sub vcl_init {
    new dir = directors.round_robin();
    dir.add_backend(LIFERAY);
}
So in Varnish you need to declare the backends to connect to (note the vcl 4.0 declaration and the directors import at the top; both are required for this to compile). In this example I've also defined a probe request used to verify the health of the backend. For probes it is recommended to use a simple request that results in a small response; you don't want to overload the system with probe requests.
Varnish Request
sub vcl_recv {
    ...
    if (req.url ~ "^/c/") {
        return (pass);
    }

    if (req.url ~ "/control_panel/manage") {
        return (pass);
    }
    ...
    if (req.url !~ "\?") {
        return (pass);
    }
    ...
}
The request handling basically determines whether to hash (lookup request from the cache) or pass (pass request directly to backend w/o caching).
For all requests that start with the "/c/" URI prefix, we pass them to the backend. They represent requests for /c/portal/login or /c/portal/logout and the like, so we never want to cache them regardless of what the backend might say.
Also, any control panel requests are passed directly to the backend. We wouldn't want to accidentally expose any of our configuration details, now would we?
Otherwise the code tries to force hashing of binary files (mp3, image, etc.) where possible, and conforms to most typical VCL implementations.
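A typical version of that binary-file rule looks something like this (a sketch, not necessarily the exact rule in the attached file; the extension list is an assumption you'd tailor to your content):

```vcl
sub vcl_recv {
    # Strip cookies and force a cache lookup for obvious static
    # binaries, regardless of what the builtin rules would decide.
    if (req.url ~ "(?i)\.(png|jpe?g|gif|ico|mp3|woff2?)(\?.*)?$") {
        unset req.http.Cookie;
        return (hash);
    }
}
```

Stripping the Cookie header is the key step; without it, Varnish's builtin logic would pass most of these requests straight to the backend.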
As for the last check, whether the URL contains a '?' character, I'll get to that in the conclusion...
Varnish Response
import std;

sub vcl_backend_response {
    if (bereq.url ~ "^/c/") {
        return (deliver);
    }

    if (bereq.url ~ "\.(ico|css)(\?[a-z0-9=]+)?$") {
        set beresp.ttl = 1d;
    } else if (bereq.url ~ "^/documents/" && beresp.http.content-type ~ "^image/") {
        if (std.integer(beresp.http.Content-Length, 0) < 10485760) {
            if (beresp.status == 200) {
                set beresp.ttl = 1d;
                unset beresp.http.Cache-Control;
                unset beresp.http.set-cookie;
            }
        }
    } else if (beresp.http.content-type ~ "text/javascript|text/css") {
        if (std.integer(beresp.http.Content-Length, 0) < 10485760) {
            if (beresp.status == 200) {
                set beresp.ttl = 1d;
            }
        }
    }
    ...
}
The response handling also passes the /c/ type URIs back to the client w/o caching.
The most interesting part of this section is the testing for content type and altering caching as a result. Normally VCL rules will look for some request for "/blah/blah/blah/my-javascript.js" by checking for the extension as part of the URI.
But Liferay really doesn't use these standard extensions. For example, with Liferay you'll see a lot of requests like /combo/?browserId=other&minifierType=&languageId=en_US&b=7010&t=1494083187246&/o/frontend-js-web/liferay/portlet_url.js&.... These kinds of requests do not have the standard extension on them, so normal VCL matching patterns would treat them as uncacheable. Using the VCL override logic above, the request will be treated as cacheable since it is just a request for some JS.
The same kind of logic applies to the /documents/ URI prefix; anything w/ this prefix is a fetch from the document library. Full URIs are similar to /documents/24848/0/content_16.jpg/027082f1-a880-4eb7-0938-c9fe99cefc1a?t=1474371003732. Again, since it doesn't end w/ a standard extension, the image might not be cached. The override rule above matches on the /documents/ prefix plus an image content type and treats the request as cacheable.
Conclusion
So let's start with the easy ones...
- Adding Varnish can decrease your response times.
- Adding Varnish can reduce your server load.
- Adding Varnish can increase your overall capacity.
Honestly, I was expecting that to be the whole list of conclusions I was going to have to worry about. I had this sweet VCL script and performance times were just awesome. As a final test, I tried logging into my site with Varnish in place and, well, FAIL. I could log in, but I didn't get the top bar or access to the left or right sidebars or any of those things.
I realized that I was actually caching the responses from the friendly URLs and, well, for Liferay those are typically dynamic pages. There is logic in the theme template files that changes the content depending upon whether you are logged in or not. Because my Varnish script cached the pages when I was not logged in, after I logged in the page was served from the cache and the necessary stuff I needed was now gone.
I had to add the check for the "?" character in the requests to determine if it was a friendly URL or not. If it was a friendly URL, I had to treat those as dynamic and had to send them to the backend for processing.
This leads to the poor performance on, for example, the single article display pages. My first VCL was great, but it cached too much. My addition for friendly URLs solved the login issue but now prevented caching pages that maybe could have been cached, so I swung too far the other way; but since the general results were still awesome, I just went with what I had.
Now for the hard conclusions...
- Adding Varnish requires you to know your portal.
- Adding Varnish requires you to know your use cases.
- Adding Varnish requires you to test all aspects of your portal.
- Adding Varnish requires you to learn how to write VCL.
The VCL really isn't that hard to wrap your head around. Once you get familiar with it, you'll be able to customize the rules to increase your cacheability factor without sacrificing the dynamic nature of your portal. In the attached VCL, we add a response header for a cache HIT or MISS, which is quite useful for reviewing the responses from Varnish to see if a particular response was cached or not (remember the first request will always be a MISS, so check again after a page refresh).
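That HIT/MISS header is a common VCL idiom; the attached file does something along these lines:

```vcl
sub vcl_deliver {
    # obj.hits counts cache hits for the delivered object;
    # 0 means the response just came from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

You can then watch the X-Cache header in your browser's dev tools (or w/ curl) while clicking through the site.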
I can't emphasize the testing enough though. You want to manually test all of your pages a couple of times, logged in and not logged in, logged in as users w/ different roles, etc., to make sure each UX is correct and that you're not bleeding views that should not be shared.
You should also do your load testing. Make sure you're getting something out of Varnish and that it is worthwhile for your particular situation.
Note About SSL
Before I forget, it's important to know that Varnish doesn't really speak SSL, nor does it speak AJP. If you're using SSL, you're going to want a web server sitting in front of Varnish to handle SSL termination; and since AJP is out, you'll have to configure plain HTTP connections both from the web server to Varnish and from Varnish to the app server.
This points toward the reasoning behind my recent blog post about configuring Liferay to look at a header for the HTTP/HTTPS protocols. In my environment I was terminating SSL at Apache and needed to use the HTTP connectors to Varnish and again to Tomcat/Liferay.
Although it was suggested in a few of the comments that separate connections could be used to handle the HTTP and HTTPS traffic, those options would defeat some of Varnish's caching capabilities. You'd either have separate caches for each connection type (or perhaps no cache on one of them) or other unforeseen issues. Routing all traffic through a single pipe to Varnish ensures Varnish can cache the response regardless of the incoming protocol.
Update - 05/16/2017
Small tweak to the VCL script attached to the blog: I added rules to exclude all URLs under /api/* from being cached. Those are basically your web service calls, and you'd rarely want to cache those. Find the file named localhost-2.vcl for the update.
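The exclusion looks something like this (a sketch of the rule added in localhost-2.vcl):

```vcl
sub vcl_recv {
    # JSON/SOAP web service calls are dynamic; always send them
    # to the backend uncached.
    if (req.url ~ "^/api/") {
        return (pass);
    }
}
```

Like the /c/ and control panel rules earlier, this is a "pass" decision made up front in vcl_recv, so these requests never even consult the cache.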

