Maintaining Search with High-Availability: Concurrent and Sync Reindex Modes (BETA) in 7.4 GA/U98+ and 2023.Q4
Search is a fundamental capability of any modern site for discovering content and products (from now on, content).
To do it right, content has to be stored in a special format optimized for (full-text) search, called a search index, which lives inside a search engine such as Elasticsearch. Content like Web Content Articles, Object entries, and their categories or tags, as well as other types like users and organizations, are all indexed in Liferay by default.
The search indexes do not only serve user searches through a Search Bar, though: under the hood, they also drive many of Liferay’s out-of-the-box applications and features that users interact with over the UI or through headless APIs.
To propagate changes and to make sure that the database and the search index are in sync, Liferay offers an operation called reindex via Control Panel - Configuration - Search > Index Actions.
Over time, the search index requires maintenance. As Liferay's functionality evolves, upgrades can change how data is indexed, and a failed staging publication or an outage in the connection between Liferay and Elasticsearch can leave stale data in the index.
Maintaining the integrity of the search index is a challenge. While reindexing is a remedy, its traditional "delete-first & index again" approach in Liferay (explained later) is resource-intensive and leads to noticeable downtime, which negatively impacts the user experience and system operations.
When we approached this complex problem domain, one of the goals was to provide a way to perform a reindex with minimal or zero impact on the searching and indexing capabilities of the live environment, ensuring business continuity and high availability.
Starting with DXP 7.4 U98 / DXP 2023.Q4, two new reindex execution modes, Concurrent and Sync, are available as BETA when Liferay is operating with Elasticsearch as the search engine.
The new reindex modes are useful in different scenarios and give administrators better alternatives for operating and maintaining search data in Liferay with high availability.
To understand the benefits and when to use them, let’s recap first how the Full reindex works (which remains available as the default mode).
Control Panel - Configuration - Search > Index Actions with Execution Modes in 7.4 GA/U98+/2023.Q4.
In a nutshell, the default full reindex mode follows a “delete-first & index again” strategy.
This means that when executing the action
Reindex All Search Indexes: indexes are deleted (erasing all content/data) and re-created at the beginning of the process, and then content is indexed again;
Reindex Individual Types (e.g., users): documents of the selected type are deleted from the indexes at the beginning of the process, and then content is indexed again.
Because of the delete-first behavior, this mode is disruptive, resulting in downtime or missing results while the operation is running.
Despite the known downsides, this mode is not going away: it remains the default because not all deployments are impacted equally by its disruptive nature, and there are still certain cases (see later) when it is a viable (and sometimes the only) solution.
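To make the disruption concrete, here is a minimal toy model of the delete-first strategy. It uses plain Python dictionaries as stand-ins for the database and the search index; the function name and data are hypothetical and this is not Liferay's actual implementation.

```python
def full_reindex(index, database):
    """Toy delete-first reindex; yields the number of searchable documents after each step."""
    index.clear()                      # delete first: every document disappears at once
    yield len(index)
    for doc_id, doc in database.items():
        index[doc_id] = doc            # index again, one document at a time
        yield len(index)


database = {1: "article", 2: "user", 3: "organization"}
index = dict(database)                 # the index starts in sync with the database

# Number of documents a search could see at each point of the run
sizes = list(full_reindex(index, database))
```

In this sketch, a search issued while the loop is still running sees fewer documents than the database holds, which is the "missing results" window described above.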
Concurrent mode (BETA, with Elasticsearch only) is available only with the Reindex All Search Indexes action. It is also known as Blue/Green reindex.
At the beginning of the process, a second, new (“green”) index is created with the up-to-date storage instructions (also known as field mappings) and content is indexed into it.
Meanwhile, the current, original (“blue”) index remains in use throughout the whole operation, serving searches in the interim and providing high availability.
Updates (originating from creating, updating, or deleting content, or from user actions) are sent to both the original and the new index at the same time during the operation (this is where the concurrent nature comes from).
Once the new index is populated, the platform deletes the original index and directs requests (both search and write) to the new index.
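The blue/green flow described above can be sketched with an in-memory stand-in for the search engine. Everything here (the `SearchEngine` class, the `during` hook, the index names) is a hypothetical illustration under simplified assumptions, not Liferay's or Elasticsearch's API.

```python
class SearchEngine:
    """In-memory stand-in for a search engine (hypothetical, for illustration only)."""

    def __init__(self, documents):
        self.indexes = {"blue": dict(documents)}
        self.live = "blue"               # index that serves searches
        self.write_targets = ["blue"]    # indexes that receive updates

    def search(self, doc_id):
        return self.indexes[self.live].get(doc_id)

    def write(self, doc_id, doc):
        for name in self.write_targets:
            self.indexes[name][doc_id] = doc


def concurrent_reindex(engine, database, during=None):
    engine.indexes["green"] = {}                  # new index with up-to-date mappings
    engine.write_targets = ["blue", "green"]      # updates now go to both indexes
    if during:
        during(engine)                            # simulate live traffic mid-run
    for doc_id, doc in list(database.items()):
        engine.indexes["green"][doc_id] = doc     # populate the new index
    engine.live = "green"                         # direct searches to the new index
    engine.write_targets = ["green"]              # direct writes to the new index
    del engine.indexes["blue"]                    # delete the original index


database = {1: "article", 2: "user"}
engine = SearchEngine(database)
seen = []


def live_traffic(engine):
    database[3] = "blog post"                     # a user creates content during the run
    engine.write(3, "blog post")
    seen.append(engine.search(1))                 # still answered by the blue index
    seen.append(engine.search(3))


concurrent_reindex(engine, database, during=live_traffic)
```

Because the blue index keeps serving searches until the swap, the searches recorded in `seen` return results even though the reindex has not finished.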
Sync mode (BETA) is available for both the Reindex All Search Indexes and Reindex single type actions. It is also referred to as soft reindex.
In a nutshell, the Sync reindex mode follows an “index again & delete-last” strategy. This mode starts by updating documents in the index without deleting anything. At the end of the process, any stale documents are deleted according to a timestamp field which is populated on all documents starting with DXP 7.4 U90.
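As a rough illustration of the "index again & delete-last" idea, the sketch below stamps every re-indexed document with the timestamp of the current run and then removes documents whose stamp is older. The comparison rule, the function name, and the document layout are assumptions made for this example; the source only states that stale documents are removed according to a timestamp field.

```python
def sync_reindex(index, database, run_timestamp):
    """Toy index-again & delete-last reindex; returns the IDs of removed stale documents."""
    # Step 1: update (or create) documents in place, nothing is deleted yet
    for doc_id, content in database.items():
        index[doc_id] = {"content": content, "timestamp": run_timestamp}

    # Step 2: anything not touched by this run is considered stale (assumed rule)
    stale = [doc_id for doc_id, doc in index.items() if doc["timestamp"] < run_timestamp]
    for doc_id in stale:
        del index[doc_id]
    return stale


index = {
    1: {"content": "old title", "timestamp": 100},
    2: {"content": "deleted in the database", "timestamp": 100},
}
database = {1: "new title", 3: "never indexed"}

removed = sync_reindex(index, database, run_timestamp=200)
```

In this model, document 1 stays searchable throughout the run (first with its old content, then with the new one), and only the orphaned document 2 is removed at the end.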
This comparison is meant to help you understand the different modes, their main characteristics, and when each one is recommended.
Full
Concurrent
Sync
Feature Status
GA
Provides High-Availability
☑
Available with Action: Reindex All Search Indexes
Available with Action: Reindex Single Type
Available with Action: Reindex Spell-Check Dictionaries
Behavior: Index Deleted/Created
Behavior: Field Mappings Updated
Behavior: Documents Updated
Recommended After: Liferay Upgrades
Recommended After: Elasticsearch Upgrades (1)
Recommended After: Connection Outages
Recommended After: Other Uptime Search Issues
(1) From 7.x to 8.x. Technically, a Full reindex is only required when connecting Liferay to a new, empty Elasticsearch cluster. In other cases, when Elasticsearch is upgraded (so the index data from the previous Elasticsearch cluster is also upgraded) currently a Sync reindex is enough.
Concurrent mode requires more resources (primarily in the form of disk space) in Elasticsearch. To prevent Elasticsearch from running out of disk space, the administrator is presented with a warning confirmation dialog when starting the reindex if the estimated disk space available in Elasticsearch may not be enough to complete the operation.
Warning dialog with Concurrent reindex mode in Index Actions.
The estimation is only a best-effort calculation due to the internals of Elasticsearch when it comes to index related I/O operations and disk utilization.
It is recommended to first review the sizing of the current Elasticsearch cluster and adjust the configuration as needed to prepare for the potential (although temporary) extra load of a Concurrent reindex process.
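When reviewing the sizing, a back-of-the-envelope check like the one below can be a starting point. It assumes the new index needs roughly as much space as the current one, multiplied by a safety margin; both the function and the `safety_factor` value are hypothetical, and Liferay's actual estimation logic is not described here.

```python
def needs_disk_space_warning(index_size_bytes, free_bytes, safety_factor=1.5):
    """Return True if free disk space may be too small for a second copy of the index.

    safety_factor is an assumed margin for temporary overhead, not a Liferay setting.
    """
    required_bytes = index_size_bytes * safety_factor
    return free_bytes < required_bytes


GIB = 1024 ** 3

tight = needs_disk_space_warning(index_size_bytes=10 * GIB, free_bytes=12 * GIB)
roomy = needs_disk_space_warning(index_size_bytes=10 * GIB, free_bytes=20 * GIB)
```

With a 10 GiB index, 12 GiB of free space falls below the assumed 15 GiB requirement, while 20 GiB does not.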
Besides the Execution Mode selector, the Index Actions layout has received a solid visual revamp:
A confirmation dialog now appears before a reindex is executed, to avoid triggering a heavy operation accidentally:
Confirmation dialog in Index Actions.
When index.on.startup is enabled (not recommended), it is possible to configure the default reindex mode via Control Panel - Configuration - System Settings > Search:Reindex Configuration, defaulting to Full.
Default Reindex Execution Mode configuration in System Settings.
Via OSGi config file: com.liferay.portal.search.configuration.ReindexConfiguration.config.
Property: defaultReindexExecutionMode="full"