Health Check

Systematically inspect your system

If you've ever repeatedly gone through a checklist, with programming knowledge: How often did you have to repeat the steps until you automated them? Or, without programming knowledge: How often did you repeat them until you wished for them to be automated?

I work in Liferay's Sales Engineering team. In this role, I constantly set up new servers and install demos - and I'd like to know that everything is "to my standard" (read that as "preferences"), even on demos that we share within the team. My preferences can be different from everyone else's (and they likely are). It begins with the tedious "as long as my browser is open, I want my session to extend". It's for a demo, and there's no need for extra-high security. I need comfort. And this is only item 1 on my checklist - over time, it grew a lot longer...

So, my first attempt at automating my checklist was for my preferred relaxed demo configuration, and it served me well: It is quick to install and signals immediately when something does not match my personal expectations.

Then I found that Liferay Commerce implements "Health Checks", e.g. to make sure that Currencies and required roles are found in the database. And I didn't like that this was implemented for Commerce only - I want it promoted to top level: This is a feature that really makes sense everywhere. And the individual health checks should be easy to write and extend.

The last push to do more with the spike came when I ran into LPS-157829: Apparently, configuring a boolean property twice in the same file can result in the value "true,true", which nastily translates to the boolean value false. And here we are again at the beginning of the article: My session didn't extend any more, despite my configuration clearly having the right value (only twice: at the beginning and at the end of my portal-ext.properties). So I could spend 15 seconds fixing the issue, or start a project and invest a few hours.

This is something I want to go through exactly once, and never again. So the check for the correct declaration of boolean properties (and numerical ones, for that matter) also made it into the spike solution. And such a check is important for every single installation - not just for relaxed-security demos.
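To illustrate why the "true,true" value is so nasty, here is a minimal sketch in plain Java. It is not the code from the spike: the class BooleanPropertyCheck, its methods, and the property keys used in main are assumptions made up for this illustration. It only shows the general idea of comparing lenient parsing with a strict check.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not taken from Liferay or from the spike project.
final class BooleanPropertyCheck {

    // Accepts only the literal values "true" and "false" (trimmed, case-insensitive).
    static boolean isStrictBoolean(String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        return trimmed.equalsIgnoreCase("true") || trimmed.equalsIgnoreCase("false");
    }

    // Returns key -> offending value for every listed key that is set but not a clean boolean.
    static Map<String, String> findInvalid(Properties props, List<String> booleanKeys) {
        Map<String, String> invalid = new LinkedHashMap<>();
        for (String key : booleanKeys) {
            String value = props.getProperty(key);
            if (value != null && !isStrictBoolean(value)) {
                invalid.put(key, value);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Simulates the merged value described above; both key names are assumptions.
        props.setProperty("session.timeout.auto.extend", "true,true");
        props.setProperty("some.other.flag", "false");

        // Lenient parsing silently turns the broken value into false.
        System.out.println(Boolean.parseBoolean("true,true"));

        // The strict check names the offending key instead.
        System.out.println(findInvalid(props, List.of("session.timeout.auto.extend", "some.other.flag")));
    }
}
```

Boolean.parseBoolean only returns true for the exact word "true" (ignoring case), so anything else, including "true,true", quietly becomes false. A strict check like the one sketched here turns that silent failure into a visible finding.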

So I invested a few hours...

Entering: A spike solution

This is a project that gets exactly the job done that I care about, while not being too hard to extend. It's just not easy on the eye - "Spike solution" is to be understood literally.

Writing this article took longer than I originally intended, and in the meantime the feature list has already grown quite large, so it really is time to go public with it:

  • Are boolean and numerical properties configured with valid values?
  • Make sure the current instance is accessed through https (unless it is localhost)
  • Is there a proper full-text-index?
  • Is the Metaspace memory setting sufficient?
  • Is the currently used host name legal for redirections?
  • Are you still using outdated/deprecated/removed properties?
  • Are there any user accounts that still use default credentials?
  • Have you implemented any deprecated and/or deactivated services? (This check is totally fake, but demonstrates what could be implemented in such a case)
  • A check for available locales (and them being supported by the underlying platform)
  • An (untested) sample for the Google Cloud FileStore to check the access key
  • A check to make sure your system is configured for the recommended number of hashing rounds for password encoding (PBKDF2)
  • A check for existing users who are still in the database with an outdated method of password hashing
  • If you're using the (default) SimpleFileStore, make sure that there are not too many files in a single directory (that's a limitation of typical file systems), and that there's enough free disk space. Trigger values are configurable.
  • A check for proper JSON-API configuration after an upgrade
  • Check if you're running with a remote (non-sidecar) Elasticsearch installation

as well as wrappers for the built-in Commerce Health Checks. You'll also find a re-implementation of the Demo Health Checks mentioned earlier, adapted to the changed interface. And maybe I've forgotten (or implemented) a few more...
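To give an idea of what one of these checks boils down to, here is a minimal sketch of the file store check from the list (too many entries in a single directory, not enough free disk space). It is plain Java and not the implementation from the spike: the class FileStoreCheck, its method names, and the trigger values in main are assumptions for illustration only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class name, method names and thresholds are illustrative assumptions.
final class FileStoreCheck {

    // Reports directories below root with more than maxEntries direct children,
    // plus a finding if the usable space on root's partition is below minFreeBytes.
    static List<String> check(File root, int maxEntries, long minFreeBytes) {
        List<String> problems = new ArrayList<>();
        long usable = root.getUsableSpace();
        if (usable < minFreeBytes) {
            problems.add("Low disk space: " + usable + " bytes usable at " + root);
        }
        collectCrowdedDirectories(root, maxEntries, problems);
        return problems;
    }

    // Recursively visits subdirectories; symbolic link loops are not handled in this sketch.
    private static void collectCrowdedDirectories(File dir, int maxEntries, List<String> problems) {
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return;
        }
        if (children.length > maxEntries) {
            problems.add("Too many entries (" + children.length + ") in " + dir);
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collectCrowdedDirectories(child, maxEntries, problems);
            }
        }
    }

    public static void main(String[] args) {
        // Directory to inspect: first argument, or the current directory by default.
        File root = new File(args.length > 0 ? args[0] : ".");
        // Example trigger values (assumptions): 10,000 entries per directory, 1 GiB free space.
        List<String> problems = check(root, 10_000, 1024L * 1024L * 1024L);
        if (problems.isEmpty()) {
            System.out.println("File store looks healthy: " + root);
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

The check returns a list of findings rather than a single pass/fail flag, so one run can report several problems at once. Passing the trigger values as parameters mirrors the "configurable" aspect mentioned in the list.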

I'm closing with a Call for Ideas and submissions:

Are there any settings that you ran into in the past? Some that caused pain that could have been avoided if only someone had pointed them out explicitly? Any changes or checks that you're applying to every single system that you deal with? Let me know about them - and if you think such checks are valuable, vote on LPS-151937.

Also, if your comfort zone is on the UI side (where mine obviously isn't): The current UI is so ugly that it definitely could use some love. Pull requests welcome. I hope you have seen the two GitHub links in the text above? Here they are again:


References and acknowledgements:

Related in name, but with a different use case: Louis Guillaume's Health Check
Illustration: Public domain (CC-0) Stethoscope rendering