Introducing the Semantic Search Capabilities (BETA) in DXP 7.4 U70+
Getting one (big) step closer to creating search experiences that understand the meaning and context of content and user searches, leveraging new-generation ML language models through third-party providers and the visual query builder of Search Blueprints.
Tibor Lipusz
May 23, 2023
2 Minute Read
With AI front and center lately and Liferay DEVCON 2023 about to start, it is exciting to highlight this recent release and the promotion of the feature from development to the beta phase.
Since the first demo of the prototype and the live session delivered at /dev/24 last year, further enhancements have been made, and Semantic Search can now be enabled as a beta feature through Control Panel > Configuration > Instance Settings > Feature Flags in DXP 7.4 U70+.
Configuring Semantic Search with Blueprints
To build a semantic search experience like the one above, a new query element called Rescore by Text Embedding is available for use in a search blueprint.
Thanks to this element and the
visual query builder, users can easily configure the different aspects
of search and test how it performs to build the right solution for
their content and use-cases.
Once the blueprint is ready, just like any other blueprint, it can be applied on a search page or used in the Search Bar to drive search-as-you-type suggestions.
While Search Blueprints play an integral part in this feature, there are new technologies and concepts under the hood that make this possible. Let’s briefly go through the main building blocks and how they integrate into Blueprints.
This recording provides
a similar overview of the concepts of semantic search (including a
quick demo).
Text Embeddings
When this feature is enabled and a provider (see later) is configured, a numeric (vector) representation of the input text (obtained from a given piece of content), called an embedding, is stored in the documents in Elasticsearch at indexing time.
Embeddings are meant to capture the meaning and context of the input they are generated from (for example, the title and the first few sentences of a Basic Web Content Article), and they can be used to provide better results for user searches than traditional keyword matching.
Currently, the title/subject plus parts of the content/body of
the supported content types are used to generate the text embeddings
depending on the Text Truncation Strategy
configured.
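As a rough illustration of how a title and a truncated slice of the body might be combined into the text sent for embedding, here is a minimal sketch. The function name, the character-based limit, and the strategy names ("beginning", "middle", "end") are assumptions made for this example, not Liferay's implementation; a real provider typically limits input by tokens rather than characters.

```python
def build_embedding_input(title: str, body: str, max_chars: int = 500,
                          strategy: str = "beginning") -> str:
    """Combine the title with a truncated excerpt of the body.

    Hypothetical helper: the strategy names and the character-based limit
    are assumptions for illustration only.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")

    if strategy == "beginning":
        excerpt = body[:max_chars]
    elif strategy == "end":
        # Guard against body[-0:], which would return the whole string.
        excerpt = body[-max_chars:] if max_chars else ""
    elif strategy == "middle":
        # Center the window of max_chars characters within the body.
        start = max(0, (len(body) - max_chars) // 2)
        excerpt = body[start:start + max_chars]
    else:
        raise ValueError(f"unknown truncation strategy: {strategy}")

    return f"{title} {excerpt}".strip()


if __name__ == "__main__":
    print(build_embedding_input("Release notes", "abcdefghij", 4, "middle"))
```

Which slice works best depends on where the most descriptive text tends to sit in the content, which is presumably why the strategy is configurable.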
Similarity Search with Blueprints
To achieve this, at search time the keywords entered by users need to go through the same transformation into an embedding, which makes it possible to perform a similarity search (or vector search) from Liferay DXP and provide better, semantically more relevant results for users.
This is where the Rescore by Text Embedding element mentioned earlier for Blueprints comes in handy: this element is able to improve the results by reordering the top keyword-matching items using a cosine similarity or dot product function over the vector field that Liferay DXP populates at indexing time. It also comes with additional options to configure how Elasticsearch will re-score the documents.
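The rescoring idea can be sketched in a few lines of plain Python: take the top keyword matches, compute the similarity between the query embedding and each document embedding, and blend that with the keyword score. The function names and the sample data are hypothetical; the parameter names loosely mirror Elasticsearch's rescore options (window_size, query_weight, rescore_query_weight), and how the Rescore by Text Embedding element maps onto them is an assumption here.

```python
import math
from typing import List, Sequence, Tuple

Hit = Tuple[str, float, Sequence[float]]  # (doc_id, keyword_score, embedding)


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0


def rescore(hits: List[Hit], query_vector: Sequence[float],
            window_size: int = 2, query_weight: float = 1.0,
            rescore_weight: float = 2.0) -> List[Tuple[str, float]]:
    """Reorder only the top `window_size` keyword hits by a blended score.

    `hits` is assumed to be sorted by keyword score, highest first.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        # +1.0 shifts cosine similarity from [-1, 1] to [0, 2] so the
        # contribution to the score is never negative.
        (doc_id,
         query_weight * score
         + rescore_weight * (cosine_similarity(query_vector, vec) + 1.0))
        for doc_id, score, vec in window
    ]
    rescored.sort(key=lambda item: item[1], reverse=True)
    # Hits outside the window keep their keyword score and order.
    return rescored + [(doc_id, score) for doc_id, score, _ in rest]


if __name__ == "__main__":
    sample_hits = [("a", 3.0, [1.0, 0.0]), ("b", 2.5, [0.0, 1.0]),
                   ("c", 1.0, [0.0, 1.0])]
    print(rescore(sample_hits, query_vector=[0.0, 1.0]))
```

In this sample, document "b" overtakes "a" because its embedding points in the same direction as the query vector, while "c" sits outside the rescore window and stays where the keyword match put it.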
Multiple Supported Providers for Different Scenarios
Currently, the feature supports txtai
(self-hosted / self-managed), Hugging
Face's Inference API (suitable for quick
testing/development purposes) and Hugging
Face's Inference Endpoints (enterprise-grade, paid
inference as a service) as text embedding providers. The provider can be configured via Instance Settings > Search Experiences > Semantic Search.
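To give a feel for what talking to a hosted provider involves, the sketch below builds (but does not send) an HTTP request for a text embedding. It assumes the feature-extraction URL pattern that Hugging Face's Inference API documented around the time of this post, which may have changed since; the function name, the example model ID, and the token value are placeholders, and this is not a description of how Liferay DXP calls the provider internally.

```python
import json
import urllib.request

# Assumption: URL pattern of the hosted Inference API's feature-extraction
# pipeline around 2023. Check the provider's current documentation.
HF_API_BASE = "https://api-inference.huggingface.co/pipeline/feature-extraction"


def build_embedding_request(text: str, model_id: str,
                            access_token: str) -> urllib.request.Request:
    """Prepare a POST request asking the provider to embed `text`."""
    payload = json.dumps({
        "inputs": text,
        # Ask the API to wait while the model loads instead of failing fast.
        "options": {"wait_for_model": True},
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{HF_API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_embedding_request(
        "How do I reset my password?",
        "sentence-transformers/all-MiniLM-L6-v2",  # example model ID
        "hf_your_token_here",                      # placeholder token
    )
    print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; for a sentence embedding model the response is expected to be a JSON array of floats, whose length should match the embedding dimensions configured for the vector field.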
Machine Learning in Action
At the heart of the transformation process, there is always an ML model doing the heavy lifting.
Administrators can choose from a wide range of pre-trained
models from Hugging Face's Models
Hub and configure different properties of the provider
connection through the System / Instance Settings in Liferay
DXP.
Ready to Try?
To get started:
Download a DXP 7.4 U70+ Tomcat Bundle from the Customer Portal or docker pull liferay/dxp:7.4.13-u77
Enable the feature using one of the available methods described here.