Bringing DropWizard Metrics to Liferay 7/DXP

Introduction

So in any production system, there is typically a desire to capture metrics, use them to define a system health check, and then monitor the health check results from an APM tool to preemptively notify administrators of problems.

Liferay does not provide this kind of functionality, but it was functionality that I needed for a recent project.

Rather than roll my own implementation, I decided that I wanted to start from DropWizard's Metrics library and see what I could come up with.

DropWizard's Metrics library is well known for its usefulness in this space, so it is an obvious starting point.

The Metrics Library

As a quick review, the Metrics library exposes objects representing counters, gauges, meters, timers and histograms. Based upon what you want to track, one of these metric types will be used to store the runtime information.

In addition, there's also support for defining a health check, which is a test that returns a Result (essentially a pass/fail). It is intended to be combined with the metrics, which serve as the basis for evaluating the result.

For example, you might define a Gauge for available JVM memory. As a gauge, it will basically be checking the difference between the total memory and used memory. A corresponding health check might be created to test that available memory must be greater than, say, 20%. When available memory drops below 20%, the system is not healthy and an external APM tool could monitor this health check and issue notifications when this occurs. By using 20%, you are giving admins time to get in and possibly resolve the situation before things go south.
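To make the memory example concrete, here is a minimal sketch using only the JDK. It is not the DropWizard API: HealthResult and MemoryHealth are hypothetical stand-ins for the library's Gauge and HealthCheck/Result types, and the 20% threshold is simply the number used above.

```java
import java.util.function.DoubleSupplier;

// Hypothetical stand-in for a pass/fail health check result.
final class HealthResult {
    final boolean healthy;
    final String message;

    HealthResult(boolean healthy, String message) {
        this.healthy = healthy;
        this.message = message;
    }
}

final class MemoryHealth {
    // Assumed threshold from the text: healthy only above 20% available.
    static final double MIN_AVAILABLE = 0.20;

    // Fraction of the maximum heap that is still available.
    static double availableFraction(long maxBytes, long usedBytes) {
        return (double) (maxBytes - usedBytes) / maxBytes;
    }

    // Gauge: reads the current value from the running JVM on every call.
    static final DoubleSupplier AVAILABLE_MEMORY_GAUGE = () -> {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return availableFraction(rt.maxMemory(), used);
    };

    // Health check: turns a gauge reading into a pass/fail result.
    static HealthResult check(double available) {
        if (available > MIN_AVAILABLE) {
            return new HealthResult(true, "ok");
        }
        return new HealthResult(false,
            String.format("only %.0f%% of memory available", available * 100));
    }
}
```

A caller would evaluate it with MemoryHealth.check(MemoryHealth.AVAILABLE_MEMORY_GAUGE.getAsDouble()). In the real library the same split applies: the gauge only reports a value, and the health check decides what that value means.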

So that's the overview, but now let's talk about the code.

When I started reviewing the code, I was initially disheartened to see very little in the way of "design by interface". For me, design by interface is an indicator of how easy or hard it will be to bring the library into the OSGi container. With heavy design by interface, I can typically subclass key implementations and expose them as @Components, and consumers can just @Reference the interfaces and OSGi will take care of the wiring.

Admittedly, this kind of architecture can be considered overkill for a metrics library. The library developers likely planned for the lib to be used in Java applications or even web applications, but likely never considered OSGi.

At this point, I really struggled with figuring out the best path forward. What would be the best way to bring the library into OSGi?

For example, I could create a bunch of interfaces representing the clean metrics and some interfaces representing the registries, then back all of these with concrete implementations as @Components that are shims on top of the DropWizard Metrics library. I soon discarded this because the shims would be too complicated, casting things back and forth between the interfaces and the Metrics library implementations.

I could have cloned the existing DropWizard Metrics GitHub repo and basically hacked it all up to be more "design by interface". The problem here, though, is that every update to the Metrics lib would require all of this repeated hacking up of their code to bring the updates forward. So this path was discarded.

I could have taken the Metrics library and used it as inspiration for building my own library. Except then I'd be stuck maintaining the library and re-inventing the wheel, so this path was discarded.

So I settled on a fairly light-weight solution that, I feel, is OSGi-enough without having to take over the Metrics library maintenance.

Liferay Metrics

The path I elected to take was to include and export the DropWizard Metrics library packages from my bundle and add in some Liferay-specific, OSGi-friendly metric registry access.

I knew I had to export the Metrics packages from my bundle since OSGi was not going to provide them and having separate bundles include their own copies of the Metrics jar would not allow for aggregation of the metrics details.

The Liferay-specific, OSGi-friendly registry access comes from two interfaces:

  • com.liferay.metrics.MetricRegistries - A metric registry lookup to find registries that are scoped according to common Liferay scopes.
  • com.liferay.metrics.HealthCheckRegistries - A health check registry lookup to find registries that are scoped according to common Liferay scopes.

Along with the interfaces, there are corresponding @Component implementations that can be @Reference injected via OSGi.

Liferay Scopes

Unlike a web application, where there is typically just one scope (the application), Liferay has a bunch of common scopes used to group and aggregate details. A metrics library is only useful if it too can support scopes in a fashion similar to Liferay. Since the DropWizard Metrics library supports multiple metric registries, it was easy to overlay the common Liferay scopes on top of the registries.

The supported scopes are:

  • Portal (Global) scope - This registry would contain metrics that have no separate scope requirements.
  • Company scope - This registry would contain metrics scoped to a specific company id. For example, if you were counting logins by company, the login counter would be stored in the company registry so it can be tracked separately.
  • Group (Site) scope - This registry would contain metrics scoped to the group (or site) level.
  • Portlet scope - This registry would contain metrics scoped to a specific portlet plid.
  • Custom scope - This is a general way to define a registry by name.

Using these scopes, different modules that you create can look up a specific metric in a specific scope without having tight coupling between your own modules.
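To illustrate the idea of overlaying scopes on registries, here is a JDK-only sketch. It is not the code from the liferay-metrics bundle: SimpleRegistry and ScopedRegistries are hypothetical names, and the "company-", "group-" style keys are an assumption made for the example. The point is only that each scope resolves to its own get-or-create registry, so two modules asking for the same scope share the same metrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, minimal registry: a named set of counters.
final class SimpleRegistry {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Get-or-create, so callers never need to hold on to a counter.
    LongAdder counter(String name) {
        return counters.computeIfAbsent(name, n -> new LongAdder());
    }
}

// Hypothetical lookup that maps the Liferay scopes onto registry keys.
final class ScopedRegistries {
    private final Map<String, SimpleRegistry> registries = new ConcurrentHashMap<>();

    SimpleRegistry portal() { return byKey("portal"); }
    SimpleRegistry company(long companyId) { return byKey("company-" + companyId); }
    SimpleRegistry group(long groupId) { return byKey("group-" + groupId); }
    SimpleRegistry portlet(long plid) { return byKey("portlet-" + plid); }
    SimpleRegistry custom(String name) { return byKey("custom-" + name); }

    // One registry per key, created on first use and shared afterwards.
    private SimpleRegistry byKey(String key) {
        return registries.computeIfAbsent(key, k -> new SimpleRegistry());
    }
}
```

With this shape, a login module could call company(companyId).counter("logins").increment() while a reporting module reads the same counter later, and neither needs to know about the other.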

Metrics Servlets

The DropWizard Metrics library ships with a few useful servlets, but to use them you need to be able to add them to your web application's web.xml file. In Liferay/OSGi, instead we want to leverage the OSGi HTTP Whiteboard pattern to define an @Component that gets automagically exposed as a servlet.

The Liferay Metrics bundle does just that; it exposes five of the key DropWizard servlets, but they use OSGi facilities and the Liferay-specific interfaces to provide functionality.

The following table provides details on the servlets:

| Servlet | Context | Description |
|---|---|---|
| CPU Profile | /o/metrics/gprof | Generates and returns a gprof-compatible file of profile details. |
| Health Check | /o/metrics/health-checks | Runs the health checks and returns a JSON object with the results. Takes two arguments, type (for the desired scope) and key (for company or group id, plid or custom scope name). |
| Metrics | /o/metrics/metrics | Returns a JSON object with the metrics for the given scope. Takes the same two arguments, type and key, as described for the health checks servlet. |
| Ping | /o/metrics/ping | Simple servlet that responds with the text "pong". Can be used to test that a node is responding. |
| Thread Dump | /o/metrics/thread-dump | Generates a thread dump of the current JVM. |
| Admin | /o/metrics/admin | A simple menu to access the above listed servlets. |

The Ping servlet can be used to test if the node is responding to requests. The Metrics servlet can be used to pull all of the metrics at the designated scope so they can be evaluated in an APM tool for alerting. The Health Check servlet runs health checks defined in code, which is useful when a check needs access to server-side details to evaluate health; it too can be invoked from an APM tool.
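For an APM tool or a script polling these endpoints, the only moving parts are the servlet path and the type and key arguments. The sketch below builds those URLs with the JDK (Java 10 or later for this URLEncoder overload). MetricsUrls is a hypothetical helper, and the host name and the "company" value for type are assumptions; check the servlets in the repo for the values they actually accept.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building the servlet URLs listed in the table.
final class MetricsUrls {

    // Builds {base}/o/metrics/{servlet}?type=...&key=... for the scoped servlets.
    static URI scoped(String baseUrl, String servlet, String type, String key) {
        String query = "type=" + encode(type) + "&key=" + encode(key);
        return URI.create(baseUrl + "/o/metrics/" + servlet + "?" + query);
    }

    // The ping servlet takes no arguments.
    static URI ping(String baseUrl) {
        return URI.create(baseUrl + "/o/metrics/ping");
    }

    // Form-encodes a query value (spaces become '+').
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For example, MetricsUrls.scoped("http://localhost:8080", "health-checks", "company", "20116") yields a URL that any HTTP client can fetch and whose JSON response the monitoring side can then evaluate.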

The CPU Profile and Thread Dump servlets can provide useful information to assist with profiling your portal or capturing a thread dump to, say, submit to Liferay support on a LESA ticket.

The Admin servlet, while not absolutely necessary, provides a convenient way to get to the individual servlets.

NOTE: There are no security or permission checks bound to these servlets. It is expected that you would take appropriate steps to secure their access in your environment, perhaps via firewall rules to block external access to the URLs or whatever is appropriate to your organization.

Metrics Portlet

In addition, there is a really simple Liferay MVC portlet under the Metrics category, the Liferay Metrics portlet. This is a super-simple portlet which just dumps all of the information from the various registries. An admin can use it to view what is going on in the system, but if it is used, it should be permissioned against casual usage by general users.

Using Liferay Metrics

Now for some of the fun stuff...

The DropWizard Metrics Getting Started page shows a simple example for measuring pending jobs in a queue:

private final Counter pendingJobs = metrics.counter(name(QueueManager.class, "pending-jobs"));

public void addJob(Job job) {
    pendingJobs.inc();
    queue.offer(job);
}

public Job takeJob() {
    pendingJobs.dec();
    return queue.take();
}

Our version is going to be different from this, of course, but not all that much. Let's assume that we are going to be tracking the metrics for the pending jobs by company id. We might come up with something like:

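import com.codahale.metrics.Counter;
import com.liferay.metrics.MetricRegistries;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

@Component(
        immediate = true
)
public class CompanyJobQueue {

    public void addJob(long companyId, Job job) {
        // fetch (or create) the counter from the company-scoped registry
        Counter pendingJobs = _metricRegistries.getCompanyMetricRegistry(companyId).counter("pending-jobs");

        // increment
        pendingJobs.inc();

        // do the other stuff
        queue.offer(job);
    }

    public Job takeJob(long companyId) throws InterruptedException {
        // fetch (or create) the counter from the company-scoped registry
        Counter pendingJobs = _metricRegistries.getCompanyMetricRegistry(companyId).counter("pending-jobs");

        // decrement
        pendingJobs.dec();

        // do the other stuff; take() blocks until a job is available
        return queue.take();
    }

    @Reference(unbind = "-")
    protected void setMetricRegistries(final MetricRegistries metricRegistries) {
        _metricRegistries = metricRegistries;
    }

    // Job is the application's own job type, as in the DropWizard example;
    // the queue implementation here is just an illustrative choice.
    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();

    private MetricRegistries _metricRegistries;
}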

The keys here are that MetricRegistries is injected by OSGi, and that it is used to locate the specific DropWizard Metrics registry instance where the metrics can be retrieved or created. Since the metrics can be easily looked up, there is no reason to hold a reference to a metric indefinitely.

In the liferay-metrics repo, there are some additional examples that demonstrate how to leverage the library from other Liferay OSGi code.

Conclusion

So I think that kind of covers it. I've pulled in the DropWizard Metrics library as-is, I've exposed it into the OSGi container so other modules can leverage the metrics, and I've provided an OSGi-friendly way to inject registry locators based on common Liferay scopes. There are also the exposed servlets, which provide APM access to metrics details, and a portlet to see what is going on using a regular Liferay page.

The repo is available from https://github.com/dnebing/liferay-metrics, so feel free to use and enjoy.

Oh, and if you have some additional examples or cool implementation details, please feel free to send me a PR. Perhaps the community can grow this out into something everyone can use...