Take Advantage of Elastic Stack to monitor Liferay DXP

Complement Elasticsearch with the rest of the Elastic Stack to monitor your installation

As many of you probably know, starting with Liferay DXP, Elasticsearch is the default Search Engine. In fact, by default, an Elasticsearch instance is embedded in Liferay DXP (it’s a good moment to remind everyone that this is not supported for Production environments, where a separate Elasticsearch instance must be created).

In this article we’re going to outline how to use the Elastic Stack in order to monitor Liferay DXP.

My first idea for this article was a step-by-step guide, but I decided against it for two reasons. Such a guide would require very detailed configurations that, in the end, would differ slightly on each installation. Also, anyone who uses the Elastic Stack needs a minimum of knowledge to monitor what matters in their particular situation, so the learning process is beneficial in itself.

So what’s the Elastic Stack?

The Elastic Stack is a set of products created by Elastic. It’s also known as ELK, an acronym for Elasticsearch, Logstash and Kibana.

Elasticsearch is the index engine where information is stored.
Logstash is a program that sends information to Elasticsearch.
Kibana is a server where we can create dashboards and graphics from the information that Logstash stored on Elasticsearch.

In this example we are going to neither alter nor consume the indexes that Liferay DXP creates on Elasticsearch; we are going to create our own indexes just for monitoring purposes.

Setting up an Elastic Stack environment:

Kibana is a central service for the ELK stack, so for practical purposes we chose to install it on the same server as Elasticsearch. Because Logstash has to access the Liferay logs, we chose to install a Logstash service on each Liferay node. This layout is not mandatory: you can install Kibana on a different server from Elasticsearch and point it to the right address. The following examples are based on a two-node Liferay cluster (node1 and node2).

Collecting data with Logstash:

The first step for the ELK stack to work is to have Logstash collect data from the Liferay logs and send it to Elasticsearch. We can create different pipelines for Logstash to define which data it should ingest and how to store it on Elasticsearch. Each pipeline is a configuration file, written in Logstash’s own configuration syntax, where you can define three sections:

input → how the data is going to be collected. Here we specify the Logstash plugin we are going to use and set its parameters.
filter → where we can alter the data that our plugin has collected before sending it to Elasticsearch.
output → the endpoint where our Elasticsearch is and the name of the index where the data is going to be stored (if the index doesn’t exist, it’ll be created automatically).

We are going to focus on two different pipelines in this article. Here is some general information about how to configure each of them.

Liferay logs pipeline:

This pipeline collects the information from the Liferay logs using the file input plugin. The plugin reads the log files specified in the input phase, and the multiline codec joins any line that does not start with a timestamp (a stack trace line, for example) to the previous event. The parsing happens during the filter phase, where we tokenize the log message to extract the timestamp, log level, thread, Java class, line number and message text from each log entry. Finally, in the output phase we choose the index in which to store the extracted data:

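input {
   file {
       path => "/opt/liferay/logs/liferay*.log"
       start_position => beginning
       ignore_older => 0
       type => "liferaylog"
       # Read positions are not persisted, so files are read again from the beginning after a restart
       sincedb_path => "/dev/null"
       # Lines that do not start with a timestamp (stack traces) are appended to the previous event
       codec => multiline {
           pattern => "^%{TIMESTAMP_ISO8601}"
           negate => true
           what => previous
       }
   }
}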

filter {
   if [type] == "liferaylog" {
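       grok {
           # Split each entry into timestamp, level, thread, class, line number and message
           match => { "message" => "%{TIMESTAMP_ISO8601:realtime}\s*%{LOGLEVEL:loglevel}\s*\[%{DATA:thread}\]\[%{NOTSPACE:javaclass}:%{DATA:linenumber}\]\s*%{GREEDYDATA:logmessage}" }
           tag_on_failure => ["error_message_not_parsed"]
           # Replace the placeholder with the hostname of this Liferay node
           add_field => { "hostname" => "LIFERAY_SERVER_HOSTNAME" }
       }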
       date {
           match => [ "realtime", "ISO8601" ]
           timezone => "UTC"
           remove_field => ["realtime"]
       }
   }
}

output {
   if [type] == "liferaylog" {
       elasticsearch {
           hosts => ["ELASTIC_SERVER_HOSTNAME:9200"]
           index => "logstash-liferay-log-node1-%{+YYYY.MM.dd}"
       }
   #stdout { codec => rubydebug }
   }
}
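To see what the grok expression in the filter extracts, here is a minimal Python sketch that approximates it with a regular expression. The named groups match the field names used in the pipeline. The regular expression is only a rough stand-in for grok's own pattern definitions, and the sample entry is hypothetical, written in the layout the pattern expects.

```python
import re

# Rough Python equivalents of the grok patterns used in the filter above.
# Grok's own definitions (TIMESTAMP_ISO8601, LOGLEVEL, ...) are more permissive.
LIFERAY_ENTRY = re.compile(
    r"(?P<realtime>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\s*"
    r"(?P<loglevel>[A-Za-z]+)\s*"
    r"\[(?P<thread>.*?)\]\[(?P<javaclass>\S+):(?P<linenumber>.*?)\]\s*"
    r"(?P<logmessage>.*)"
)


def parse_entry(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LIFERAY_ENTRY.match(line)
    return match.groupdict() if match else None


# Hypothetical log entry, not taken from a real Liferay installation.
sample = (
    "2018-05-10 10:15:30.123 ERROR "
    "[http-nio-8080-exec-3][PortalImpl:1234] Unable to process request"
)
fields = parse_entry(sample)
print(fields["loglevel"], fields["javaclass"], fields["linenumber"])
```

A line that does not start with a timestamp, such as a stack trace line, returns None here. In the pipeline, the multiline codec has already merged those lines into the previous event before grok runs.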

JVM statistics pipeline:

The file input plugin used in the previous pipeline is installed with Logstash by default, but the plugin we'll use to collect JVM statistics (logstash-input-jmx) has to be installed manually, for example with bin/logstash-plugin install logstash-input-jmx.

When using this plugin, we should tell Elasticsearch that the metric values sent to a particular index are floating-point numbers. Otherwise Elasticsearch will infer the field type from the first value it receives, which could be interpreted as a long.

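To configure this we can run a simple curl command. We’ll make an HTTP call to Elasticsearch with some JSON data, an index template, to tell Elasticsearch that all indexes whose names begin with “logstash-jmx-node” will treat the metric_value_number field as a double. The body uses the legacy template format with a mapping type (doc), as accepted by Elasticsearch 6.x; later versions expect index_patterns and no mapping type. We just need to do this once, before the first JMX index is created, and then Elasticsearch will know how to deal with our JMX data: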

curl -H "Content-Type: application/json" -XPUT "ELASTIC_SERVER_HOSTNAME:9200/_template/template-logstash-jmx-node*" -d '
{
  "template" : "logstash-jmx-node*",
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "metric_value_number" : {
          "type" : "double"
        }
      }
    }
  }
}'
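The type inference that the template avoids can be illustrated with plain JSON parsing. The sketch below assumes Elasticsearch's default dynamic mapping rules for numbers (a JSON integer becomes long, a JSON floating-point number becomes float). The function name and the sample values are hypothetical.

```python
import json


def dynamic_mapping_type(json_number):
    """Mimic the type that default dynamic mapping would pick for a JSON number."""
    value = json.loads(json_number)
    if isinstance(value, int) and not isinstance(value, bool):
        return "long"
    if isinstance(value, float):
        return "float"
    raise ValueError("not a JSON number: %r" % json_number)


# Hypothetical first and second values of a CPU load metric.
first_value, second_value = "0", "0.25"
print(dynamic_mapping_type(first_value))   # type that would be fixed for the field
print(dynamic_mapping_type(second_value))  # type the later values actually need
```

If the first document of the day happens to carry a whole number, the field is created as long, which is why the template declares metric_value_number explicitly.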

Before using this plugin, we also need to enable remote JMX connections in the Liferay JVM. For example, when running on Tomcat, we can add this to setenv.sh:

CATALINA_OPTS="$CATALINA_OPTS -Djava.rmi.server.hostname=LIFERAY_SERVER_HOSTNAME -Dcom.sun.management.jmxremote.port=5000 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
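After restarting Tomcat, a quick way to confirm that the JVM is listening is to open a TCP connection to the JMX port. This is a minimal sketch: it only checks that the port accepts connections, not that JMX itself answers, and it assumes the check runs on the Liferay node. The function name is hypothetical.

```python
import socket


def jmx_port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 5000 comes from -Dcom.sun.management.jmxremote.port above;
    # "localhost" assumes the check runs on the Liferay node itself.
    print(jmx_port_is_open("localhost", 5000))
```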

To set up this plugin we need to create two different configuration files: the usual Logstash pipeline configuration and a JMX configuration file, whose location is specified in the path setting of the jmx input in the Logstash pipeline:

Logstash pipeline: here we specify the directory that contains the JMX configuration file (/opt/logstash/jmx) and the index we are going to use.

input {
   jmx {
       path => "/opt/logstash/jmx"
       polling_frequency => 30
       type => "jmx"
       nb_thread => 3
   }
}
filter {
   mutate { add_field => { "node" => "node1" } }
}
output {
   if [type] == "jmx" {
       elasticsearch {
           hosts => [ "ELASTIC_SERVER_HOSTNAME:9200" ]
           index => "logstash-jmx-node1-%{+YYYY.MM.dd}"
       }
   }
}

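The JMX configuration file: here we decide which JMX statistics we want to collect and send to the index. The garbage collector and memory pool names below (ParNew, ConcurrentMarkSweep, Par Eden Space, CMS Old Gen and so on) are the ones exposed by a JVM running the CMS collector, and the Hikari and Tomcat thread pool names depend on the installation, so adjust them to the MBeans your JVM actually exposes: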

{
   "host" : "localhost",
   "port" : 5000,
   "alias" : "reddit.jmx.elasticsearch",
   "queries" : [
       {
           "object_name" : "java.lang:type=Memory",
           "object_alias" : "Memory"
       }, {
           "object_name" : "java.lang:type=Threading",
           "object_alias" : "Threading"
       }, {
           "object_name" : "java.lang:type=Runtime",
           "attributes" : [ "Uptime", "StartTime" ],
           "object_alias" : "Runtime"
       },{
           "object_name" : "java.lang:type=GarbageCollector,name=ParNew",
           "object_alias" : "ParNew"
       },{
           "object_name" : "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep",
           "object_alias" : "MarkSweep"
       },{
           "object_name" : "java.lang:type=OperatingSystem",
           "object_alias" : "OperatingSystem"
       },{
           "object_name" : "com.zaxxer.hikari:type=Pool (HikariPool-1)",
           "object_alias" : "Hikari1"
       },{
           "object_name" : "com.zaxxer.hikari:type=Pool (HikariPool-2)",
           "object_alias" : "Hikari2"
       },{
           "object_name" : "Catalina:type=ThreadPool,name=\"http-nio-8080\"",
           "object_alias" : "HttpThread"
       },{
           "object_name" : "java.lang:type=MemoryPool,name=Metaspace",
           "object_alias" : "Metaspace"
       },{
           "object_name" : "java.lang:type=MemoryPool,name=Par Eden Space",
           "object_alias" : "Eden"
       },{
           "object_name" : "java.lang:type=MemoryPool,name=CMS Old Gen",
           "object_alias" : "Old"
       },{
           "object_name" : "java.lang:type=MemoryPool,name=Par Survivor Space",
           "object_alias" : "Survivor"
       }
   ]
}
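Each value collected by the plugin is indexed as a separate document whose metric_path field is built from the alias, the object_alias and the attribute name. That is why the Kibana query later in this article filters on reddit.jmx.elasticsearch.OperatingSystem.ProcessCpuLoad. The following sketch lists the paths for the queries that name their attributes. The helper function is hypothetical and the configuration is a subset of the one above.

```python
def metric_paths(config):
    """List the metric_path values for queries that name their attributes."""
    paths = []
    for query in config["queries"]:
        # Queries without an "attributes" list collect every attribute of
        # the MBean, so their paths cannot be enumerated from the file alone.
        for attribute in query.get("attributes", []):
            paths.append(
                ".".join([config["alias"], query["object_alias"], attribute])
            )
    return paths


# Subset of the JMX configuration shown above.
jmx_config = {
    "host": "localhost",
    "port": 5000,
    "alias": "reddit.jmx.elasticsearch",
    "queries": [
        {"object_name": "java.lang:type=Memory", "object_alias": "Memory"},
        {
            "object_name": "java.lang:type=Runtime",
            "attributes": ["Uptime", "StartTime"],
            "object_alias": "Runtime",
        },
    ],
}

for path in metric_paths(jmx_config):
    print(path)
```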

Monitoring with Kibana:

Once all this information is being processed and indexed, it's time to create dashboards and visualizations on Kibana.

First we have to point Kibana to the Elasticsearch instance whose data we are going to consume. We’ll use the kibana.yml configuration file for this purpose (the elasticsearch.url setting in Kibana 6.x, renamed to elasticsearch.hosts in 7.0).

Logtrail:

We are also going to install a plugin (Logtrail) that allows us to follow the logs in Kibana’s UI, much as the tail command does. This way, we can share logs with developers, sysadmins, devops, project managers… without having to give all those people access to the actual servers where the logs are.

How to create visualizations and dashboards

Once we have pointed Kibana to Elasticsearch and installed Logtrail, we can start creating dashboards and visualizations. Creating them is easy in Kibana; we just need to use the UI. The steps to create a visualization are:

Indicate which index patterns we are going to deal with. In our case we wanted to use JMX and log information from two separate Liferay nodes. In our example:

logstash-jmx-node1-*
logstash-liferay-log-node1-*
logstash-jmx-node2-*
logstash-liferay-log-node2-*
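These patterns work because the Logstash outputs append the event date to the index name (the %{+YYYY.MM.dd} suffix), so one index is created per node and per day. The sketch below shows the relationship. It is only an approximation: it uses Python's fnmatch for the wildcard and a hypothetical event time, and the helper name is an assumption.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase


def daily_index(prefix, when):
    """Build the index name that `prefix-%{+YYYY.MM.dd}` yields for an event time."""
    # Logstash formats the event's @timestamp, which is kept in UTC.
    return "%s-%s" % (prefix, when.astimezone(timezone.utc).strftime("%Y.%m.%d"))


# Hypothetical event time.
event_time = datetime(2018, 5, 10, 10, 15, 30, tzinfo=timezone.utc)
index = daily_index("logstash-jmx-node1", event_time)

print(index)
print(fnmatchcase(index, "logstash-jmx-node1-*"))
print(fnmatchcase(index, "logstash-jmx-node2-*"))
```

A broader pattern such as logstash-jmx-node* would cover both nodes at once, since every event also carries the node field added in the JMX pipeline.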

Create a search query in Kibana using the Lucene query syntax. In this example we’re retrieving the process and system CPU loads collected via JMX:

metric_path: "reddit.jmx.elasticsearch.OperatingSystem.ProcessCpuLoad" || metric_path: "reddit.jmx.elasticsearch.OperatingSystem.SystemCpuLoad"
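The same query can also be sent straight to Elasticsearch, which is handy for checking that data is arriving before building anything in Kibana. This minimal sketch builds a request body using the query_string query. The function name, the result size and the sort order are assumptions. The body could be posted to the _search endpoint of one of the JMX indexes, for example logstash-jmx-node1-*/_search.

```python
import json


def cpu_load_search(alias="reddit.jmx.elasticsearch", size=100):
    """Build a search body for the process and system CPU load metrics."""
    query = " || ".join(
        'metric_path: "%s.OperatingSystem.%s"' % (alias, attribute)
        for attribute in ("ProcessCpuLoad", "SystemCpuLoad")
    )
    return {
        "size": size,
        # Most recent samples first; @timestamp is set by Logstash.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"query_string": {"query": query}},
    }


print(json.dumps(cpu_load_search(), indent=2))
```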

Create different visual components to show that information. There are different kinds of visualizations we can create, like a histogram showing the CPU usage we have recorded using the JMX plugin:

Or a table counting the times a Java class appears in the log with the ERROR log level:

Create dashboards that group all the visualizations you want. Dashboards are collections of visualizations:

To conclude:

The ELK stack is a very powerful way to monitor and extract information from running Liferay instances. In my opinion, the main advantages of this system are:

Democratized logs: everybody can access them, not only sysadmins, and they can be searched easily.
A history of JMX statistics: it’s possible to know how the CPU, memory and database pools were behaving at a given time.

I hope I have convinced you that ELK can help you monitor your Liferay DXP installations, and that you feel ready to start creating the monitoring system that covers your needs.