Troubleshooting indexation errors using Index Checker

Liferay stores all its information in a relational SQL database, but most entities (Web Content, Documents, Users, Blogs Entries, etc.) are also indexed in the Elasticsearch server.

Several Liferay features use this Elasticsearch server, for example:

  • Liferay search: executes keyword searches.
  • Asset Publishers: display content filtered by some criteria.
  • Segmentation: uses the Elasticsearch server to check user segments.
  • Many other places, such as user lists or Commerce features.

Every time information is stored in or deleted from the database, Liferay runs its indexer classes to replicate the change to the Elasticsearch server.

If your Elasticsearch server is down or not working properly for some time, the updates from Liferay are not stored in it, so you can end up with inconsistent data in the Elasticsearch server.
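The following toy model shows how a short index outage leaves the two stores out of sync. It is plain Python, not Liferay code, and the `ToyPortal` class and its methods are hypothetical: database writes always succeed, while index updates are silently dropped whenever the index is unavailable.

```python
class ToyPortal:
    """Toy model: the database is always written, the index only while reachable."""

    def __init__(self):
        self.database = {}
        self.index = {}
        self.index_available = True

    def save(self, pk, title, modified):
        self.database[pk] = {"title": title, "modified": modified}
        # The "indexer" replicates the change only if the index is up;
        # in this model a failed update is simply lost.
        if self.index_available:
            self.index[pk] = {"title": title, "modified": modified}

    def delete(self, pk):
        self.database.pop(pk, None)
        if self.index_available:
            self.index.pop(pk, None)


portal = ToyPortal()
portal.save(1, "Welcome", 100)
portal.save(2, "News", 100)

portal.index_available = False         # Elasticsearch goes down
portal.save(2, "News (edited)", 200)   # the index keeps the old version
portal.save(3, "Events", 200)          # never reaches the index
portal.delete(1)                       # stays in the index
portal.index_available = True          # Elasticsearch comes back

print(sorted(portal.database))  # [2, 3]
print(sorted(portal.index))     # [1, 2]
```

In this model, object 3 exists only in the database, object 2 has a stale copy in the index, and object 1 survives in the index after being deleted from the database.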

This inconsistent data can cause Liferay to malfunction.

To resolve all these inconsistencies, you can use Index Checker.

You can download it from:

The Index Checker application allows Liferay administrators to check the index status and solve the detected problems.

It scans both the database and the Elasticsearch index, displaying:

  • Missing objects: objects that only exist in the database
  • Outdated objects: objects that are out-of-date in the Elasticsearch server
  • Orphan data: deleted objects that only exist in the Elasticsearch server

How does it work?

To obtain the necessary data, the application compares the primary keys, modification dates, status, version, and other related data stored in the database and in the index.
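Conceptually, the check is a comparison of primary keys followed by an attribute comparison for the objects present in both stores. This Python sketch assumes both sides are already loaded into dictionaries keyed by primary key; `check_index`, `CheckResult`, and the attribute list are illustrative names, not Index Checker's actual code.

```python
from dataclasses import dataclass, field

# Attributes compared to detect outdated objects. This list is an assumption
# for illustration; Index Checker defines its own list in configuration.yml.
COMPARED_ATTRIBUTES = ("modified", "status", "version")


@dataclass
class CheckResult:
    missing: list = field(default_factory=list)   # only in the database
    outdated: list = field(default_factory=list)  # in both, metadata differs
    orphan: list = field(default_factory=list)    # only in the index


def check_index(db_rows: dict, index_docs: dict) -> CheckResult:
    """Compare database rows and index documents, both keyed by primary key."""
    result = CheckResult()
    for pk, row in db_rows.items():
        doc = index_docs.get(pk)
        if doc is None:
            result.missing.append(pk)
        elif any(row.get(a) != doc.get(a) for a in COMPARED_ATTRIBUTES):
            result.outdated.append(pk)
    result.orphan = [pk for pk in index_docs if pk not in db_rows]
    return result


db = {
    2: {"modified": 200, "status": 0, "version": "1.1"},
    3: {"modified": 200, "status": 0, "version": "1.0"},
    4: {"modified": 150, "status": 0, "version": "1.0"},
}
index = {
    1: {"modified": 100, "status": 0, "version": "1.0"},
    2: {"modified": 100, "status": 0, "version": "1.0"},
    4: {"modified": 150, "status": 0, "version": "1.0"},
}
result = check_index(db, index)
print(result.missing, result.outdated, result.orphan)  # [3] [2] [1]
```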

After the index check runs, the inconsistent data is displayed, and:

  • You can reindex the missing or outdated objects.
  • You can also remove orphan data from the index.
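In this toy model, both fixes treat the database as the source of truth: missing and outdated objects are rewritten from their database rows, while orphans can only be deleted, because there is no row to reindex them from. A minimal sketch on plain dictionaries, where `apply_fixes` is a hypothetical helper:

```python
def apply_fixes(db_rows, index_docs, missing, outdated, orphan):
    """Repair a toy index in place (hypothetical helper, not Index Checker code)."""
    # Missing and outdated objects: "reindex" them from the database copy.
    for pk in set(missing) | set(outdated):
        index_docs[pk] = dict(db_rows[pk])
    # Orphan data: nothing to reindex from, so remove it from the index.
    for pk in orphan:
        index_docs.pop(pk, None)
    return index_docs


db = {2: {"title": "News (edited)"}, 3: {"title": "Events"}}
index = {1: {"title": "Welcome"}, 2: {"title": "News"}}
apply_fixes(db, index, missing=[3], outdated=[2], orphan=[1])
print(index == db)  # True
```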

Filtering data

Before executing the analysis, you can apply several filters:

  • Filter by entities
  • Filter by sites
  • Filter by modified date (objects modified last hour, last week, last month, etc.)

These filters help if your system contains a large amount of data.

For example, if your system has a lot of web content, you can reindex the web content of just one site.
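To illustrate how such filters narrow the scope before any comparison, here is a hypothetical `apply_filters` helper. It assumes each object carries an entity name, a site ID, and a modification date; `JournalArticle` and `BlogsEntry` are Liferay's model class names for web content and blog entries, and the site IDs are made-up example values.

```python
from datetime import datetime, timedelta


def apply_filters(objects, entities=None, site_ids=None, modified_within=None, now=None):
    """Select the objects to check (hypothetical helper).

    entities / site_ids: optional collections to restrict to; None means all.
    modified_within: optional timedelta, e.g. timedelta(hours=1) for "last hour".
    """
    if now is None:
        now = datetime.now()
    selected = []
    for obj in objects:
        if entities is not None and obj["entity"] not in entities:
            continue
        if site_ids is not None and obj["site_id"] not in site_ids:
            continue
        if modified_within is not None and obj["modified"] < now - modified_within:
            continue
        selected.append(obj)
    return selected


now = datetime(2024, 1, 31, 12, 0)
objects = [
    {"pk": 1, "entity": "JournalArticle", "site_id": 20121, "modified": datetime(2024, 1, 31, 11, 30)},
    {"pk": 2, "entity": "JournalArticle", "site_id": 20122, "modified": datetime(2024, 1, 31, 11, 45)},
    {"pk": 3, "entity": "BlogsEntry", "site_id": 20121, "modified": datetime(2024, 1, 31, 11, 50)},
    {"pk": 4, "entity": "JournalArticle", "site_id": 20121, "modified": datetime(2024, 1, 1, 9, 0)},
]

# Web content of one site, modified in the last week.
week = apply_filters(objects, entities={"JournalArticle"}, site_ids={20121},
                     modified_within=timedelta(weeks=1), now=now)
print([o["pk"] for o in week])  # [1]
```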

Additional options

On the configuration page you will be able to:

  • Group the output by sites.
  • Execute site-by-site queries: this saves memory, but the check runs more slowly.

You can also include the correctly indexed objects in the output, in case you want to double-check them.
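The memory trade-off of site-by-site queries can be sketched with a generator that loads and compares one site at a time: only that site's keys are held in memory, at the cost of one pair of queries per site. Results come out grouped by site, and an optional flag adds the correctly indexed objects to the output. The loader callables and all names below are hypothetical, and the sketch compares primary keys only.

```python
from typing import Callable, Iterable, Iterator


def check_site_by_site(
    site_ids: Iterable[int],
    load_db_site: Callable[[int], Iterable[int]],
    load_index_site: Callable[[int], Iterable[int]],
    include_correct: bool = False,
) -> Iterator[tuple]:
    """Yield (site_id, report) pairs, checking one site at a time."""
    for site_id in site_ids:
        # Only the current site's primary keys are loaded into memory.
        db_keys = set(load_db_site(site_id))
        index_keys = set(load_index_site(site_id))
        report = {
            "missing": sorted(db_keys - index_keys),  # only in the database
            "orphan": sorted(index_keys - db_keys),   # only in the index
        }
        if include_correct:
            # Present in both stores (this sketch skips the attribute check).
            report["correct"] = sorted(db_keys & index_keys)
        yield site_id, report


# Example data standing in for per-site database and index queries.
DB = {20121: [1, 2, 3], 20122: [10, 11]}
INDEX = {20121: [1, 2, 4], 20122: [10, 11]}

for site_id, report in check_site_by_site(
    DB, lambda s: DB.get(s, []), lambda s: INDEX.get(s, []), include_correct=True
):
    print(site_id, report)
# 20121 {'missing': [3], 'orphan': [4], 'correct': [1, 2]}
# 20122 {'missing': [], 'orphan': [], 'correct': [10, 11]}
```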

Installation

You can download it from:

To install it, copy the downloaded file to the Liferay deploy folder.

Index Checker 1.0.1 works in Liferay DXP/Portal from 7.1 to 7.4.

Index Checker 0.9 works in Liferay DXP/Portal from 6.2 to 7.0.

  • This version is a legacy WAR application.
  • For more information, see the 0.9 6.2 and 7.x releases.

"Outdated objects: objects that are out-of-date in the Elasticsearch server." Could you please elaborate on what these objects are in the results? How do we get rid of them from the search results?

These are objects that exist in both the database and Elasticsearch, but some of their metadata doesn't match, so they need to be reindexed.

The attributes used to check whether an object is outdated are configured in the portlet's configuration.yml file, see: https://github.com/jorgediaz-lr/index-checker/blob/ed325d07c34c44e3cb30ea72ba62b55aaf4579c2/modules/index-checker/src/main/resources/configuration.yml#L9-L10
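To see why a specific object was reported as outdated, you can compare its configured attributes one by one. In this sketch, `explain_outdated` and the attribute names are hypothetical; the real list of attributes is the one in the configuration.yml file linked above.

```python
# Hypothetical attribute list; the real one lives in Index Checker's
# configuration.yml and may differ.
OUTDATED_ATTRIBUTES = ("modified", "status", "version")


def explain_outdated(db_row, index_doc, attributes=OUTDATED_ATTRIBUTES):
    """Return {attribute: (database value, index value)} for every mismatch."""
    return {
        a: (db_row.get(a), index_doc.get(a))
        for a in attributes
        if db_row.get(a) != index_doc.get(a)
    }


row = {"modified": "2024-01-31T11:45", "status": 0, "version": "1.1"}
doc = {"modified": "2024-01-30T09:00", "status": 0, "version": "1.0"}
print(explain_outdated(row, doc))
# {'modified': ('2024-01-31T11:45', '2024-01-30T09:00'), 'version': ('1.1', '1.0')}
```

In this sketch, values that mean the same thing but are represented differently (for example, dates stored with different precision) would also be flagged as mismatches.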

To fix them, reindexing them from the Index Checker portlet should be enough. Nevertheless, I think there can sometimes be false positives if you are using an outdated Liferay version.

If you think this is a bug in the Index Checker portlet, please open an issue on the GitHub page: https://github.com/jorgediaz-lr/index-checker/issues