Introducing Semantic Search Capabilities (BETA) in DXP 7.4 U70+

Getting one (big) step closer to creating search experiences that understand the meaning and context of content and user searches, leveraging new-generation language ML models through third-party providers and the visual query builder of Search Blueprints.

With AI in the spotlight and Liferay DEVCON 2023 about to get started, it is exciting to highlight this recent release and to promote the feature from development to the beta phase.

Since the first demo of the prototype and the live session delivered at /dev/24 last year, further enhancements have been made, and Semantic Search can now be enabled as a beta feature through Control Panel > Configuration > Instance Settings > Feature Flags in DXP 7.4 U70+.


Configuring Semantic Search with Blueprints

Search-as-you-type with and without Semantic Search in Liferay DXP
To build a semantic search experience like the one above, a new query element called Rescore by Text Embedding is provided, ready to be used in a search blueprint.

Rescore by Text Embedding element in Blueprints

Thanks to this element and the visual query builder, users can easily configure the different aspects of search and test how it performs, building the right solution for their content and use cases.

Once the blueprint is ready, it can, just like any other blueprint, be applied to a search page or used in the Search Bar to drive search-as-you-type suggestions.

New to Search Blueprints? Check out the Easily customize Liferay's search behavior with Liferay Enterprise Search Experiences article in the Help Center.

Understanding Semantic Search

While Search Blueprints play an integral part in this feature, there are new technologies and concepts under the hood that make it possible. Let's briefly go through the main building blocks and how they integrate with Blueprints.

This recording provides a similar overview of the concepts of semantic search (including a quick demo).

Text Embeddings

When this feature is enabled and a provider (see below) is configured, a numeric (vector) representation of the input text (obtained from a given piece of content), called an embedding, is stored in the documents in Elasticsearch at indexing time.

Indexing with Semantic Search

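To make this more concrete, here is a minimal, standalone sketch of the general idea: a document is indexed into Elasticsearch together with a dense_vector field holding its embedding. The index name, field names, and vector dimension below are hypothetical examples, not Liferay's actual schema.

```python
# Minimal sketch of the indexing-time idea (hypothetical index/field names,
# not Liferay's actual schema): store the embedding next to the document.
import requests

ES_URL = "http://localhost:9200"      # assumes a local Elasticsearch instance
INDEX = "hypothetical_content"

# Create an index with a dense_vector field sized for the chosen model (384 here).
requests.put(f"{ES_URL}/{INDEX}", json={
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "text_embedding": {"type": "dense_vector", "dims": 384},
        }
    }
})

# At indexing time, the vector produced by the embedding provider is stored
# with the rest of the document.
document = {
    "title": "Resetting your password",
    "content": "If you forgot your password, open the sign-in page and ...",
    "text_embedding": [0.0] * 384,    # placeholder; in practice the model's output
}
requests.post(f"{ES_URL}/{INDEX}/_doc", json=document)
```
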
Embeddings are meant to capture the meaning and the context of the input they are generated from, for example the title and the first few sentences of the content of a Basic Web Content Article, and they can be used to provide better results for user searches than traditional keyword matching.

From text to embedding (vector)
Currently, the title/subject plus parts of the content/body of the supported content types are used to generate the text embeddings, depending on the Text Truncation Strategy configured.
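
As a rough illustration (not Liferay's implementation), the snippet below applies a naive truncation strategy to a title and body and generates an embedding with a sentence-transformers model from the Hugging Face Hub; the model id and truncation limit are example choices only.

```python
# Rough illustration of generating an embedding from a truncated title + body.
# The model id and the truncation limit are example choices, not Liferay defaults.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim output

title = "Resetting your password"
body = "If you forgot your password, open the sign-in page and ..."

# Naive truncation strategy: keep the title plus the first N characters of the body.
MAX_BODY_CHARS = 512
text_to_embed = f"{title}. {body[:MAX_BODY_CHARS]}"

embedding = model.encode(text_to_embed)   # numpy array of 384 floats
print(len(embedding), embedding[:5])
```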

Similarity Search with Blueprints

To achieve this, at search time the keywords entered by users go through the same process, making it possible to perform a similarity search (vector search) from Liferay DXP and provide better, semantically more relevant results for users.

This is where the Rescore by Text Embedding element mentioned earlier comes in handy: it improves the results by reordering the top keyword matches using a cosine similarity or dot product function over the vector field Liferay DXP populates at indexing time. It also comes with additional options to configure how Elasticsearch re-scores the documents.
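
Conceptually, this works like Elasticsearch's rescore phase applied to the top keyword hits. Below is a standalone sketch of such a request with hypothetical index/field names and example weights; it is not the exact query the Blueprints element generates.

```python
# Standalone sketch of re-scoring the top keyword matches by cosine similarity
# against the query embedding. Index/field names and weights are examples only.
import requests

query_vector = [0.0] * 384    # in practice: the embedding of the user's keywords

search_body = {
    "query": {"match": {"content": "how do I change my password"}},  # keyword matching
    "rescore": {
        "window_size": 50,    # only the top 50 keyword hits are re-scored
        "query": {
            "rescore_query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        # +1.0 keeps the similarity score non-negative
                        "source": "cosineSimilarity(params.query_vector, 'text_embedding') + 1.0",
                        "params": {"query_vector": query_vector},
                    },
                }
            },
            "query_weight": 0.3,          # weight of the original keyword score
            "rescore_query_weight": 1.4,  # weight of the similarity score
        },
    },
}

response = requests.post("http://localhost:9200/hypothetical_content/_search", json=search_body)
print(response.json()["hits"]["hits"])
```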

Searching with Semantic Search

Multiple Supported Providers for Different Scenarios

Currently, the feature supports txtai (self-hosted / self-managed), Hugging Face's Inference API (suitable for quick testing/development purposes) and Hugging Face's Inference Endpoints (enterprise-grade, paid inference as a service) as text embedding providers. The provider can be configured via Instance Settings > Search Experiences > Semantic Search.
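
To get a quick feel for what such a provider does, the snippet below sends a piece of text to Hugging Face's Inference API (feature-extraction pipeline) and receives an embedding back. The access token and model id are placeholders, and this is only a sketch of the provider call, not how Liferay invokes it internally.

```python
# Sketch of asking a hosted provider for an embedding via Hugging Face's Inference API.
# The token and model id are placeholders; this is not Liferay's internal client code.
import requests

HF_TOKEN = "hf_xxx"   # your Hugging Face access token
MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
API_URL = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL_ID}"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": ["How do I reset my password?"], "options": {"wait_for_model": True}},
)
embeddings = response.json()   # one embedding (list of floats) per input text
print(len(embeddings[0]))
```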

Machine Learning in Action

At the heart of the transformation process, there is always an ML model doing the heavy lifting.

From text to embedding through a provider and the model
Administrators can choose from a wide range of pre-trained models from Hugging Face's Models Hub and configure different properties of the provider connection through System or Instance Settings in Liferay DXP.

Ready to Try?

To get started,

  1. Download a DXP 7.4 U70+ Tomcat Bundle from the Customer Portal or docker pull liferay/dxp:7.4.13-u77

  2. Enable the feature using one of the available methods described here

Need Help or Got Feedback?

Reach out to us via one of the preferred channels.