MDC/NDC Logging in Liferay 7.4

Introduction

Recently I was asked how to implement MDC and/or NDC in Liferay 7.4, and since I thought it might be interesting to a wider audience, I'm going to blog about it here.

Background

In a multithreaded, multiuser environment such as in a web application like Liferay, different threads in the container will be handling different requests, but likely each request is doing something radically different. I could be changing my Search Blueprint for the next holiday cupcake promotion while you might be reading the blog I wrote about it.

Either way, when either the Liferay code or our custom code is logging messages, the messages just get added into the log in the order they are hit in the code, so basically my log messages and your log messages are all mixed together in the logs. For a handful of concurrent users, this is probably not a big problem, but if you imagine 1,000 concurrent users all doing different things and all generating log messages and those messages being woven together in the log, I'm sure you can see that trying to analyze and diagnose a problem from one user in this kind of log would likely be like trying to find a particular needle in a pile of needles.

To allow for meaningful interpretation and analysis of the logs, major logging frameworks have support for MDC or Mapped Diagnostic Context, and/or NDC or Nested Diagnostic Context.

With a diagnostic context, the code is able to add information on a thread-by-thread basis that can be included in a log message. So I, as the developer, can define a diagnostic context that includes the current user's screen name at the thread level along with a log format to include that detail in every message that is logged.

In each of the diagnostic contexts, information is added at the thread level and will remain with the thread until it is cleared, so if I am adding the current user screen name in the diagnostic context at the start of servicing a request, I have to remember to clear it when done otherwise the value I set might be used in the next thread. I also want to clear them up to avoid leaking memory or sensitive details across reuse of threads by the application server.

From here on in I'm going to focus specifically on Log4j2 as that is the logging framework that Liferay uses internally.

Updating Log4j2 Configuration

Liferay 7.4 leverages Log4j2 for the internal logging framework and to get the diagnostic context added to the log messages we'll need to update the log4j2 configuration.

Fortunately for us, Liferay makes this kind of easy. All we need to do is create the $LIFERAY_HOME/tomcat-9.0.56/webapps/ROOT/WEB-INF/classes/META-INF/portal-log4j-ext.xml file with our configuration overrides.

You want to start from Liferay's base file and then make the necessary pattern changes to include either the MDC or NDC you're after. Here's the initial part of the file I am using for NDC:

<?xml version="1.0"?>

<Configuration strict="true">
  <Appenders>
    <Appender name="CONSOLE" type="Console">
      <Layout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p [%t][%c{1}:%L] %x %m%n" 
        type="PatternLayout" />
    </Appender>

    <Appender filePattern="@liferay.home@/logs/liferay.%d{yyyy-MM-dd}.log" 
        ignoreExceptions="false" name="TEXT_FILE" type="RollingFile">
      <Layout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p [%t][%c{1}:%L] %x %m%n" 
        type="PatternLayout" />
...

When I'm using the MDC pattern I'd use something like %X{SN} and I'll get only the screen name right before the message.

When I'm using the NDC pattern, I'll get all the NDC context before the message.

So now we know the basics of MDC and NDC plus we know how to reconfigure Liferay to use the updated patterns, but what do we do in the code to leverage them?

MDC vs NDC

First let's start with what they share... Both of these connect data to the current thread and both must be cleaned up when finished. They also both leverage the ThreadContext object to add/remove context values.

The difference is really that NDC values are simple strings, while an MDC is a map of key/value pairs. Since NDC is a little simpler, I'll start showing that one before the MDC.

So I might use an NDC like:

ThreadContext.push("SN=" + user.getScreenName());

or the MDC equivalent of:

ThreadContext.put("SN", user.getScreenName());

On the logging side, we use the %x conversion pattern for the NDC messages, or we use the %X{key} conversion pattern for the MDC messages.

So let's say I want to build a service wrapper around the RoleLocalService so I could identify who is adding roles (kind of like an audit, but I'm just writing to the logs).

I'd use a RoleLocalServiceWrapper and in my addRole() method override I'd have to set up and then clean up my diagnostic context...

NDC Version

Here's what I would do for my addRole() method logging for NDC:

public Role addRole(
    long userId, String className, long classPK, String name,
    java.util.Map titleMap,
    java.util.Map descriptionMap, int type,
    String subtype, ServiceContext serviceContext)
  throws com.liferay.portal.kernel.exception.PortalException {
  
  // set up the context with the user's screen name.
  User user = _userLocalService.getUser(userId);
  ThreadContext.push("SN=" + user.getScreenName());
  
  // let the super add the role
  Role addedRole = super.addRole(
    userId, className, classPK, name, titleMap, descriptionMap, type,
    subtype, serviceContext);
  
  _log.info("Added role " + name);

  // remove the screen name we pushed
  ThreadContext.pop();
  
  return addedRole;
}

With this code in place, any time a user action invoked the addRole() call (and using the correct log4j2 config from the previous section), I'd see a message like:

2022-02-14 06:08:03.663 INFO  [main][MyRoleLocalServiceWrapper:134] [SN=omniadmin] 
  Added role NewRole

It starts to get ugly as I might add additional NDC context for current layout, current site, etc, as all of those context strings would be tacked onto my log message; I can't really be selective about what to include for the NDC usage.

MDC Version

One feature of the MDC is that we can be selective about what we want on a log message. In the log4j2.properties file example above, the MDC conversion pattern includes %X{SN} which basically says to use the value of the SN in the thread context map in the log message.

The MDC code that I would use is:

public Role addRole(
    long userId, String className, long classPK, String name,
    java.util.Map titleMap,
    java.util.Map descriptionMap, int type,
    String subtype, ServiceContext serviceContext)
  throws com.liferay.portal.kernel.exception.PortalException {
  
  // set up the context map with the user's screen name.
  User user = _userLocalService.getUser(userId);
  ThreadContext.put("SN", user.getScreenName());
  
  // let the super add the role
  Role addedRole = super.addRole(
    userId, className, classPK, name, titleMap, descriptionMap, type,
    subtype, serviceContext);
  
  _log.info("Added role " + name);

  // remove the screen name we stored earlier
  ThreadContext.remove("SN");
  
  return addedRole;
}

And my log file message will now look like:

2022-02-14 06:08:03.663 INFO  [main][MyRoleLocalServiceWrapper:134] [omniadmin] Added
  role NewRole

Now it doesn't matter what is in the thread context map, only the SN key will be included in the log because that's all the conversion pattern includes.

Manual Version

Some developers out there might be asking themselves, well what the heck is wrong with just changing the code so the log message just reads:

_log.info("[" + user.getScreenName() + "] Added role " + name);

This absolutely creates the same message as we generated for the MDC example, but we didn't have to change the logging config and we didn't have to worry about the ThreadContext setup or cleanup.

The issue is in any messages that might actually be logged by Liferay's RoleLocalServiceImpl's addRole() method. Since you're not changing all logging messages to look up the user and tack their screen name at the end of every message, those other messages will not have the screen name on them.

Using the MDC or NDC option, any message logged between the two ThreadContext method calls will include the users screen name in the log, that includes the RoleLocalServiceImpl class, any ModelListener<Role> classes that might be logging, etc.

So sure, it can be more work to add the MDC/NDC handling, but I hope you can see the benefits of including the diagnostic contexts to analyze the flow of control from a single thread through the addRole() call.

So Which One Wins?

Manual is clearly out, but for choosing between the MDC and the NDC I think it depends upon who is in control of what will be logged...

The layout pattern in the log4j2.properties (or XML or YAML or whatever you use) is going to be fixed, right?

If I'm the system admin and I want to control what is going into the log, I might want to use MDC and the %X{SN} pattern to only include the screen name in the logs. If the developers want to define a new value, say Site that they want to include, they'll need to come to me and convince me to add it and maybe my pattern would become %X{SN} %X{Site}.

On the other hand, if I don't want to be bothered with changing the log patterns every time a developer thinks they need some new value to be added and reported in the log message, I might want to use NDC and the %x pattern. The developers will be responsible if they put too much junk in there, but after all I did put the responsibility on their shoulders by using NDC. At any time the developers could do a ThreadContext.push("Site=blah") and that value would automatically start showing up in the logs along with the SN.

Where to put MDC/NDC Code

In my cheesy example above, I was setting the MDC or NDC around the addRole() message. Maybe this is okay because I'm really only concerned with setting up the context only for messages logged in this particular method.

More often than not, though, you're probably going to want to put your ThreadContext code in places that will offer the widest coverage that you're interested in. Some prime examples are:

  • A PortletFilter implementation - here you could set up the MDC/NDC around interactions with a specific portlet and not have it across everything in the logs.
  • A Servlet Filter implementation - here you're looking more at the full request handling scope, so every message generated in a thread while servicing the request would include the diagnostic context. This can be a pretty wide net though as you could end up getting called for not just page rendering and actions, but also for web service calls, internal cluster messaging, etc.

I prefer to look at a ServicePreAction/ServicePostAction pair.

Note: ServicePreAction and ServicePostAction are the legacy names from the pre-7.0 days. I use these names because I think it correctly names what they are doing. I'm not sure technically what they are called now, but I doubt it is any more apropos than the ServicePreAction and ServicePostAction.

These actions are invoked on the way in to a new service call and on the way out of the service call, so it tends to be a great way to set up the context before anything significant happens and then clean it up after the service completes.

We can define a ServicePreAction in 7.4 as follows:

@Component(
  immediate=true,
  property="key=servlet.service.events.pre",
  service=LifecycleAction.class
)
public class NDCSetupServicePreAction extends Action {
  public void run(HttpServletRequest request, HttpServletResponse response) {
    try {
      // if we are paranoid, we might want to add
      // ThreadContext.clearAll() call here to ensure nothing
      // from the previous thread is still in the context for this
      // thread.
      
      ThemeDisplay themeDisplay = 
        (ThemeDisplay) request.getAttribute(WebKeys.THEME_DISPLAY);
      User user = null;
      
      if (themeDisplay != null) {
        user = themeDisplay.getUser();
      }
      
      if (user == null || user.isDefaultUser()) {
        ThreadContext.push("Guest");
      } else {
        ThreadContext.push(user.getScreenName());
      }
    } catch (Exception e) {
      ThreadContext.push("Unknown");
    }
  }
}

In this pre-action implementation, we grab the ThemeDisplay from the request attributes (should already be set up by the time this is called), get the User from the ThemeDisplay instance, and if the user is not the guest user we push their screen name, otherwise we just push the word Guest onto the ThreadContext.

This is only invoked on the way into the service handling by the portal, so we can't do the cleanup in this class. Instead, we'll include a ServicePostAction in a similar way but to handle the cleanup:

@Component(
  immediate=true,
  property="key=servlet.service.events.post",
  service=LifecycleAction.class
)
public class NDCCleanupServicePostAction extends Action {
  public void run(HttpServletRequest request, HttpServletResponse response) {
    ThreadContext.clearAll();
  }
}

The great part about this implementation for the cleanup, it really won't matter where else in the code that a developer puts a ThreadContext push() or put() call, this cleanup post action will always clean up the thread context before the thread starts to handle a new request.

If you're going to be using either MDC or NDC in your code, I would absolutely recommend including this cleanup service post action to purge the thread context at the end of every service request.

Fish Tagging

Yes, that's right, Fish Tagging. If you've not heard of this before, I'm not talking about catching fish, tagging them, and then throwing them back into the river.

Neil Harrison described this method in the book Pattern Languages of Program Design 3, and basically it is a technique to lightly stamp a unique identifier on a request initiated from the same client transaction.

We can do the same thing with an updated Service Pre Action:

@Component(
  immediate=true,
  property="key=servlet.service.events.pre",
  service= LifecycleAction.class
)
public class FishTaggingServicePreAction extends Action {
  public void run(HttpServletRequest request, HttpServletResponse response) {
    ThreadContext.push(PortalUUIDUtil.generate());
  }
}

This is an extremely lightweight method for putting a unique identifier at the beginning of each log message generated in a service call, allowing you to gather all associated messages and filter out the ones you don't need. It also avoids bleeding PII or other sensitive info in the logs.

You'd still use the NDCCleanupServicePostAction to strip out the UUID from the ThreadContext to prevent mis-tagging messages.

Cleanup, Cleanup, Cleanup

So I can't really emphasize this enough because it is super important to clean up the ThreadContext or else you could end up blowing out memory.

For example, if I only deployed the module with the NDCSetupServicePreAction class and I tested on my local to see that my screen name was included in the log message, I would think it was working and not know about a possible problem that exists.

If I were to just start navigating around the portal page, each navigation would trigger the NDCSetupServicePreAction and each time I would end up invoking ThreadContext.push(user.getScreenName()) adding more and more values to the ThreadContext until eventually things would start failing.

Cleanup is important because it would prevent this kind of potential failure from happening. You don't want to clean up too soon lest you get log messages without the diagnostic context in them, but you absolutely want to do them when you are technically done with them.

Conclusion

So here we are, we've successfully added MDC and/or NDC into our Liferay 7.4 platform. Going forward our log messages will each be stamped with either Guest or the user's screen name. When you're called up to figure out what went wrong with Sue's request, you should be able to find all the log messages stamped with her screen name and filter out the ones that belong to Tom, Dick and Harry.

Not all environments will need MDC and/or NDC support. But for those out there that do need this support, I hope this blog helps you to set it up and use it correctly.

Blogs

In https://issues.liferay.com/browse/LPS-145235 it seems Liferay is going to set some information in the MDC context. See code changes https://github.com/liferay-core-infra/liferay-portal/pull/322/files

Very interesting. We recently used MDC to push information via a logging agent to OpenTelemetry.

At the time I found your article on slf4j (https://liferay.dev/blogs/-/blogs/liferay-logging-is-dead-long-live-liferay-logging-) and we used the slf4j API to push the MDC.

Here you are using ThreadContext, so directly calling the log4j2 API. What is the vision of Liferay on logging ? Use slf4j or use the Liferay logging APIs ? The latter still don't support parametrized log messages.

True enough...

When I'm coding, I continue to use slf4j for logging, this still works even under Liferay's change to Log4j2.

However, for this MDC/NDC blog, I was trying to avoid forcing the use of slf4j and kept to the pure Log4j2 approach. I wanted to focus more on the log4j2 configuration and usage and not get stuck in a battle over which log framework is better.

That said, I would continue using slf4j even for my own MDC/NDC needs, especially because it supports the parameterized log messages, a handy optimization indeed.