Semantic Search Powered by the Elastic Stack with Liferay Enterprise Search

Learn how to create semantic search experiences through configuration by leveraging built-in models, third-party providers and other Generally Available capabilities that you can unlock with your Liferay Enterprise Search (LES) subscription.

Tibor Lipusz
6 Minute Read

It was great to be at Liferay DEVCON in Madrid. Now that the recordings are available on YouTube, I decided to write a companion blog post as a follow-up to my session.

In my talk, I presented the benefits and possibilities of Liferay Enterprise Search (LES), focusing on the Semantic Search and ML/AI capabilities of the Elastic Stack that you can unlock with LES.

  • A brief recap of what LES is and what it offers for Semantic Search today.
  • A tour of the Elastic Stack features that can be unlocked with LES to achieve Semantic Search, built entirely from Generally Available (GA) building blocks of your Liferay-Elastic stack.
  • A quick look at some additional, exciting AI/ML capabilities of the Elastic Stack that might be relevant to your projects in the future.

In this post, I'm sharing some key details from my session. The following GitHub repository includes the related configuration and API call snippets for your convenience.

https://github.com/lipusz/liferay-devcon25

Environment

Make sure you are using the versions below:

  • Liferay DXP: Any Liferay DXP Quarterly Release where Elasticsearch 8 is supported works. I was using the release candidate of 2025.Q4 in my local setup. Refer to the compatibility matrix here.
  • Elasticsearch: You must be using Elasticsearch 8, preferably one of the latest releases, like 8.18 or 8.19, but at minimum 8.15. Sizing your cluster properly and allocating enough CPU, memory, and disk space is very important. In addition, using the built-in models requires dedicated ML-node(s) with enough resources allocated. In the demo, I was using a 2-node Elasticsearch cluster.
  • Kibana: Use the same version as for Elasticsearch.

Note: The capabilities showcased here are not available in Elasticsearch 7. Since Elasticsearch 7 has already reached end-of-maintenance and will reach end-of-life in January 2026, this is a good opportunity to upgrade your stack. Learn more

License

Your Elasticsearch cluster must be activated with an Enterprise license.

With the Liferay Enterprise Search (LES) subscription, self-hosted and PaaS deployments can access Enterprise level Elastic features. The license is provided by Liferay for self-hosted deployments and provisioned automatically for PaaS deployments.

If you have an active LES subscription, you might need to contact your Sales representative to receive the new license file replacing the previous one that granted Platinum level access. If you have purchased LES in the past few weeks, you probably already have the right license file.

If you need to scale your Elasticsearch cluster up and expand it with additional nodes (e.g. to add a dedicated ML-node), please contact Liferay Sales to discuss the billing implications.

Alternatively, for testing and other non-production purposes, you can start a 30-day trial.

The easiest way to activate Elasticsearch is through License Management in Kibana.

Semantic Search: The Ingredients

Semantic search is the capability to understand the meaning (semantics) and intent behind a user's query, rather than just matching keywords. It is achieved through the combination of several ingredients:

  • Content Vectorization: Creating vector representation (aka. embeddings or vector embeddings) of content (like text, files, images etc.).
  • Vector Storage: Storing the vectors in a way that they can be searched later.
  • Keyword Vectorization: Transforming the user’s keywords to vectors at search time.
  • Vector Search: Finding items whose vectors are mathematically closest to the query vector.
  • Model: At the heart of the transformation process, there is always a Machine Learning model and Natural Language Processing (NLP) doing the heavy-lifting.

Benefits of Semantic Search Powered by the Elastic Stack

Liferay has been offering Semantic Search through direct integration with select providers, using Elasticsearch as a vector database, since 2023. Direct integration means that the content and keyword vectorization happens on the Liferay DXP side using the configured provider; Elasticsearch is used primarily as storage.

While this method also provides more relevant results, there are several benefits of harnessing the full potential of the Elastic Enterprise stack with LES:

  • Semantic ingestion of the full content with automatic chunking, regardless of your choice of model and provider.
  • Support for built-in models to quick-start solutions and projects that prefer keeping their data within their own stack.
  • Support for bring-your-own-LLM (BYO-LLM) use cases with a broader range of available providers for enterprises with specific model needs and service preferences.
  • Support for dedicated query types and hybrid ranking methods to build hybrid search experiences blending the strengths of traditional lexical (keyword) search with AI-powered methods, such as semantic search.
  • The related building blocks and configuration points showcased in the DEVCON session and below are all Generally Available (GA).

Creating a Custom Ingest Pipeline in Elasticsearch for Liferay DXP

To combine content from multiple fields and use it as the embedding input, we create a custom Ingest Pipeline in Elasticsearch.

This is a necessary step in this manual approach, because we need a way to create the embedding input from our localized content entries (in this demo, from Web Content Articles with English and Hungarian translations). We will store the embedding input in semantic_text fields. This special field type & workflow allows us to generate the vectors as internal inference fields using the configured model through the Inference Endpoints in Elasticsearch.

We'll use this pipeline in Liferay's Additional Index Settings configuration in order to apply it to the Liferay indexes upon the next reindex.
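As a rough illustration (a simplified sketch, not the exact snippet from the repository), such a pipeline can be created via the Ingest Pipeline API by PUT-ting a body like the one below to _ingest/pipeline/liferay-semantic-pipeline in Kibana Dev Tools. The pipeline name and all field names here are assumptions; adjust them to your actual localized index fields:

```json
{
  "description": "Combine localized Liferay fields into semantic_text embedding inputs",
  "processors": [
    {
      "set": {
        "field": "content_semantic_en",
        "value": "{{{title_en_US}}} {{{content_en_US}}}"
      }
    },
    {
      "set": {
        "field": "content_semantic_hu",
        "value": "{{{title_hu_HU}}} {{{content_hu_HU}}}"
      }
    }
  ]
}
```

Each set processor concatenates the localized title and content into a single field that will be mapped as semantic_text, so Elasticsearch generates the embeddings from the combined text at ingest time.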

It is also possible to create a custom Ingest Pipeline that combines the raw contents of structured Web Content Article and Object entry fields. These are indexed as Nested Fields under the ddmFieldArray/nestedFieldArray field in the Elasticsearch documents. Learn how to explore index fields in Liferay here.
Semantic Ingestion with Built-in Models (ELSER, E5) in Liferay DXP Powered by the Elastic Stack

Elasticsearch 8 comes with two built-in models, ELSER (for English content) and E5 (for non-English language content), along with pre-configured endpoints for them that run on ML-nodes.

We need to configure the following ingredients to start using the built-in models and vectorize content in Liferay DXP:

  1. Provision at least one dedicated ML-node in the Elasticsearch cluster; it will handle the inference using the built-in models.
  2. Create a custom Ingest Pipeline. We have just created it above.
  3. Configure Additional Type Mappings in Liferay's Elasticsearch connector config to create the necessary field mappings for the embeddings.
  4. Configure Additional Index Settings in Liferay's Elasticsearch connector config to apply the custom pipeline to the Liferay indexes.

Once all four elements are in place, you need to perform a Full or Concurrent reindex to generate the embeddings!

If you only have English content, using ELSER is enough so you can remove the relevant configurations for E5 from the provided snippets.
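As an illustrative sketch of the type mappings step (again, not the exact snippet from the repository), the Additional Type Mappings fragment can map the target fields as semantic_text and point them at the preconfigured default endpoints for ELSER and E5. The field names are assumptions, and you should verify the default endpoint IDs available in your cluster with GET _inference/_all:

```json
{
  "properties": {
    "content_semantic_en": {
      "type": "semantic_text",
      "inference_id": ".elser-2-elasticsearch"
    },
    "content_semantic_hu": {
      "type": "semantic_text",
      "inference_id": ".multilingual-e5-small-elasticsearch"
    }
  }
}
```

The index settings step would then apply the pipeline, for example with `"index.default_pipeline": "liferay-semantic-pipeline"` (assuming the illustrative pipeline name used earlier).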

Semantic Ingestion with a Third-party Provider (OpenAI) in Liferay DXP Powered by the Elastic Stack

Another great advantage of Elasticsearch and the Enterprise Elastic subscription that LES includes is that there are a number of third-party providers available through the Inference Endpoints.

Using those services is very easy thanks to the abstraction that the semantic_text field & workflow provides through the Inference Endpoints. So the steps and the ingredients are essentially the same with slight differences:

  1. Create a new Inference Endpoint using your provider of choice, for example the OpenAI service.
  2. Create a custom Ingest Pipeline. In the demo, I'm using OpenAI as an example. Because OpenAI's models are multilingual, you may want to generate only one embedding per content entry, using, for example, the English translation as input. This means you can tweak the custom Ingest Pipeline above and remove the third set processor that generates the embeddings for the Hungarian translation.
  3. Configure Additional Type Mappings in Liferay's Elasticsearch connector config to create the necessary field mappings for the embeddings. Likewise, you can define only one new field for Liferay here, and have it use the OpenAI Inference Endpoint you have just created.
  4. Configure Additional Index Settings in Liferay's Elasticsearch connector config to apply the custom pipeline to the Liferay indexes.

Once all these elements are in place, you need to perform a Full or Concurrent reindex to generate the embeddings!
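As a sketch of the first step (the endpoint name and model are assumptions, not the exact snippet from the repository), an OpenAI Inference Endpoint can be created by PUT-ting a body like this to _inference/text_embedding/openai-embeddings:

```json
{
  "service": "openai",
  "service_settings": {
    "api_key": "<YOUR_OPENAI_API_KEY>",
    "model_id": "text-embedding-3-small"
  }
}
```

A semantic_text field mapped with "inference_id": "openai-embeddings" in the Additional Type Mappings will then route its inference through this endpoint.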

Semantic Search Queries with Blueprints in Liferay DXP

Now that your content has been vectorized, you can use Search Blueprints, Liferay's low-code/no-code query builder tool, to customize your search and query the embeddings in the semantic_text fields using one of the applicable Elasticsearch query types. 

I have included an example for each query type here.
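For instance, a minimal semantic query against one of the fields created earlier could look like this (the field name and keywords are illustrative):

```json
{
  "query": {
    "semantic": {
      "field": "content_semantic_en",
      "query": "how can I work remotely?"
    }
  }
}
```

In a blueprint query element, you would typically keep just the clause and substitute the user's input with the ${keywords} template variable instead of a hard-coded string.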

Once you have your blueprint ready and tested, you can use it on search pages, in search bar suggestions and also in headless searches.

Remember that query elements in a blueprint act as additions to the default filter and query clauses that Liferay DXP contributes to the search requests for each searchable type.
This means that your custom query elements may interfere with the default ones and return unexpected results. To perform a pure semantic search or a truly custom (hybrid) search with Blueprints, you can experiment with disabling specific query contributors via the Query Settings.

Final Thoughts

I hope that this blog post and the DEVCON recording help you quick-start your journey with semantic search.

The good news is that you can start using the majority of these capabilities in your Liferay-Elastic stack today, if you have LES. These building blocks are all Generally Available (GA).

But we don't want to stop here.

In the Search team here at Liferay, we are actively working to provide a seamless integration that lets you leverage these capabilities directly from Liferay DXP through well-known methods (i.e. the Instance Settings configuration UI), so in the future you would not need to create custom Ingest Pipelines or Inference Endpoints, or configure Additional Type Mappings.

The new native Elasticsearch 8 connector that is planned to be included with the Liferay DXP 2026.Q1 LTS release also plays an important role in supporting advanced techniques, like reciprocal rank fusion (RRF), for hybrid searches in Liferay DXP.
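To give a flavor of what this enables, on the Elasticsearch side an RRF-based hybrid search blends a lexical and a semantic retriever roughly like this (the field names and keywords are illustrative):

```json
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": { "match": { "title_en_US": "remote work policy" } }
          }
        },
        {
          "standard": {
            "query": {
              "semantic": {
                "field": "content_semantic_en",
                "query": "remote work policy"
              }
            }
          }
        }
      ]
    }
  }
}
```

RRF merges the two result lists by rank rather than by raw score, so neither the lexical nor the semantic side needs score normalization to contribute fairly to the final ranking.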

You may ask what will happen to the current Semantic Search capability, based on the direct integration, that is in Beta status. It is an alternative approach that can be kept around to provide basic semantic search. Stay tuned for updates.

Disclaimer

While these capabilities are Generally Available (GA), this blog post is not meant to act as a recipe or step-by-step guide for production deployments. Some limitations may apply depending on your Liferay DXP deployment mode. As usual, always consult with official documentation of Liferay DXP and Elasticsearch for more information regarding the usage of the showcased ingredients.
