Introducing the Semantic Search Capabilities (BETA) in DXP 7.4 U70+
Getting one (big) step closer to creating search experiences that understand the meaning and context of content and user searches, leveraging new-generation ML language models through third-party providers and the visual query builder of Search Blueprints.
Tibor Lipusz
May 23, 2023
2 Minute Read
With AI taking center stage recently and Liferay DEVCON 2023 about to get started, it is exciting to highlight this recent release and the promotion of the feature from development to beta.
Since the first demo of the prototype and the live session delivered at /dev/24 last year, further enhancements have been made, and Semantic Search can now be enabled as a beta feature through Control Panel > Configuration > Instance Settings > Feature Flags in DXP 7.4 U70+.
Configuring Semantic Search with Blueprints
To build a semantic search experience like the one above, a new query element called Rescore by Text Embedding is available for use in a search blueprint.
Thanks to this element and the visual query builder, users can easily configure the different aspects of search and test how it performs, building the right solution for their content and use cases.
Once the blueprint is ready, just like any other blueprint, it can be applied to a search page or used in the Search Bar to drive search-as-you-type suggestions.
While Search Blueprints plays an integral part in this feature, there are new technologies and concepts under the hood that make it possible. Let’s briefly go through the main building blocks and how they integrate with Blueprints.
This recording provides a similar overview of the concepts of semantic search (including a quick demo).
Text Embeddings
When this feature is enabled and a provider (see later) is configured, a numeric (vector) representation of the input text (obtained from a given piece of content), called an embedding, is stored in the documents in Elasticsearch at indexing time.
Embeddings are meant to capture the meaning and context of the input they are generated from (for example, the title and the first few sentences of a Basic Web Content Article), and they can be used to provide better results for user searches than traditional keyword matching.
Currently, the title/subject plus parts of the content/body of the supported content types are used to generate the text embeddings, depending on the Text Truncation Strategy configured.
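As a rough illustration of the input preparation described above, the following sketch combines a title with a truncated slice of the body before it would be sent to an embedding provider. It is not Liferay's implementation: the function name, the `max_chars` parameter, and the three strategy names ("beginning", "middle", "end") are assumptions made for this example.

```python
# Hypothetical sketch: build the text an embedding would be generated from.
# Strategy names and max_chars are assumptions, not Liferay's actual settings.
VALID_STRATEGIES = ("beginning", "middle", "end")


def build_embedding_input(title: str, body: str, max_chars: int = 500,
                          strategy: str = "beginning") -> str:
    """Return the title followed by at most max_chars characters of the body."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown truncation strategy: {strategy!r}")
    if max_chars < 0:
        raise ValueError("max_chars must not be negative")

    # Collapse runs of whitespace so the character budget is spent on text.
    body = " ".join(body.split())

    if len(body) > max_chars:
        if strategy == "beginning":
            body = body[:max_chars]
        elif strategy == "end":
            body = body[len(body) - max_chars:]
        else:  # "middle": keep a window centered in the body
            start = (len(body) - max_chars) // 2
            body = body[start:start + max_chars]

    return f"{title.strip()} {body}".strip()
```

Whichever slice is chosen, the resulting string is what the model sees, so the strategy decides which part of a long article the embedding actually represents.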
Similarity Search with Blueprints
To achieve this, the keywords entered by users go through the same process at search time, which makes it possible to perform a similarity search (vector search) from Liferay DXP and provide semantically more relevant results.
This is where the Rescore by Text Embedding element mentioned earlier comes in handy: it improves the results by reordering the top keyword-matching items using a cosine similarity or dot product function over the vector field that Liferay DXP populates at indexing time. It also comes with additional options to configure how Elasticsearch re-scores the documents.
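The reordering idea can be sketched in a few lines of plain Python. This is only an emulation of the concept, not the query the element generates: the `window_size`, `query_weight`, and `rescore_weight` parameters, the hit structure, and the way the two scores are summed are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rescore(hits, query_vector, window_size=50,
            query_weight=1.0, rescore_weight=1.0):
    """Reorder the top window_size keyword hits by a combined score.

    hits: list of dicts with "id", "score" (keyword score) and "embedding",
    already sorted by keyword score in descending order.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = []
    for hit in window:
        similarity = cosine_similarity(query_vector, hit["embedding"])
        # Shift similarity from [-1, 1] to [0, 2] so it never lowers a score.
        combined = query_weight * hit["score"] + rescore_weight * (similarity + 1.0)
        rescored.append({**hit, "score": combined})
    rescored.sort(key=lambda h: h["score"], reverse=True)
    # Hits outside the window keep their original order below the window.
    return rescored + rest
```

For example, a hit that ranks second by keywords but whose embedding points in the same direction as the query embedding can overtake the first hit, while documents outside the window are left untouched.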
Multiple Supported Providers for Different Scenarios
Currently, the feature supports txtai (self-hosted / self-managed), Hugging Face's Inference API (suitable for quick testing/development purposes) and Hugging Face's Inference Endpoints (enterprise-grade, paid inference as a service) as text embedding providers. The provider can be configured via Instance Settings > Search Experiences > Semantic Search.
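To get a feel for what a self-hosted provider returns, the sketch below requests an embedding from a txtai API instance through its `transform` endpoint, using only the Python standard library. The host address `http://localhost:8000` and the helper names are assumptions for this example, and the code says nothing about how Liferay DXP itself talks to the provider.

```python
import json
import urllib.parse
import urllib.request


def build_transform_url(host_address: str, text: str) -> str:
    """Build the URL for txtai's transform endpoint for the given text."""
    query = urllib.parse.urlencode({"text": text})
    return f"{host_address.rstrip('/')}/transform?{query}"


def fetch_embedding(host_address: str, text: str, timeout: float = 10.0):
    """Return the embedding (a list of floats) for text.

    Requires a running txtai API instance at host_address (assumed here).
    """
    url = build_transform_url(host_address, text)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical local instance; adjust to wherever txtai is hosted.
    vector = fetch_embedding("http://localhost:8000", "semantic search")
    print(len(vector))
```

The length of the returned vector depends on the model the txtai instance is configured with, so it is worth checking it against the dimensions expected on the Liferay side.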
Machine Learning in Action
At the heart of the transformation process, there is always an ML model doing the heavy lifting.
Administrators can choose from a wide range of pre-trained models from Hugging Face's Models Hub and configure different properties of the provider connection through the System / Instance Settings in Liferay DXP.
Ready to Try?
To get started,
Download a DXP 7.4 U70+ Tomcat Bundle from the Customer Portal, or run docker pull liferay/dxp:7.4.13-u77
Enable the feature using one of the available methods described here.