Search is one of the fundamental Liferay features, giving users a quick way to find the content they are looking for. It’s powerful, highly configurable, and flexible.
However, while implementing our latest project for a client, we ran into a bunch of issues. Some of them were Liferay bugs or missing functionality, others were configuration issues.
I’d like to share our experience troubleshooting them in this post.
Let’s assume we have a portal with books/tutorials (with restricted access). There is a main “Books” site and a set of child sites, one for each book. Each book site has private pages whose content represents the book's chapters/pages. Users should be able to search for content either from the parent “Books” site (global search) or within an individual book site.
Basically, the logic is similar to the https://learn.liferay.com/ site.
The first issue we ran into was that private pages were not indexed, a problem already described here.
Interestingly, it had worked before, but it broke after upgrading DXP 7.3 to FP2. The client was frustrated, as the release date was close and the search feature did not work at all.
After analyzing the sources, we found that the approach to indexing page content had changed completely. Initially, the content was taken from the database (basically, from the fragments placed on the page being indexed), but later the LayoutCrawler was introduced. It fetches page content by URL (as the Guest user), so it obviously can’t do that for restricted private pages.
Our workaround for indexing private pages was similar to the one in the blog post mentioned above. The main difference is that I implemented a custom auto-login that signs in the “crawler” user automatically, without a password (IMHO, storing a raw password, even in the portal settings, is not secure). The solution consists of the following classes:
LayoutModelDocumentContributorOverride - a customized version of Liferay’s com.liferay.layout.internal.search.spi.model.index.contributor.LayoutModelDocumentContributor. The logic is the same, but it calls a custom crawler instead of Liferay’s;
LayoutCrawler - a custom layout crawler. It makes authenticated requests to fetch page content (using the current user’s session if available; otherwise it auto-logs in a predefined “crawler” user and reuses that session in subsequent calls);
LayoutCrawlerAutoLogin - a custom auto-login that signs in the “crawler” user without a password (a generated “autoLoginKey” is used instead);
SessionIdThreadLocal - a custom thread-local class that stores the session ID so it can be reused in subsequent calls (to avoid creating a new session on each crawler call).
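To make two of these pieces more concrete, here is a minimal plain-Java sketch written without any Liferay dependencies. It is not the code used in the project and does not use Liferay APIs. `AutoLoginKey` (a hypothetical helper) shows one way to generate an unguessable key to use in place of a password and to compare it in constant time. `SessionIdThreadLocal` plus the hypothetical `CrawlerSessionResolver` illustrate the session-selection rule described above: the current user’s session first, otherwise a crawler session cached on the current thread, otherwise a fresh auto-login. The `autoLogin` supplier is an assumed stand-in for whatever performs the real sign-in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.function.Supplier;

/** Hypothetical helper: a random key that replaces the crawler user's password. */
final class AutoLoginKey {
    private static final SecureRandom RANDOM = new SecureRandom();

    private AutoLoginKey() {}

    /** Generates a 256-bit random key, URL-safe Base64 encoded without padding (43 characters). */
    static String generate() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Constant-time comparison, so the check does not leak how many characters matched. */
    static boolean matches(String expected, String provided) {
        if (expected == null || provided == null) {
            return false;
        }
        return MessageDigest.isEqual(
            expected.getBytes(StandardCharsets.UTF_8),
            provided.getBytes(StandardCharsets.UTF_8));
    }
}

/** Plain java.lang.ThreadLocal stand-in for the post's SessionIdThreadLocal. */
final class SessionIdThreadLocal {
    private static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    private SessionIdThreadLocal() {}

    static String get() { return SESSION_ID.get(); }
    static void set(String sessionId) { SESSION_ID.set(sessionId); }
    static void remove() { SESSION_ID.remove(); }
}

/** Hypothetical: decides which session the crawler sends with a page request. */
final class CrawlerSessionResolver {
    private CrawlerSessionResolver() {}

    static String resolve(String currentUserSessionId, Supplier<String> autoLogin) {
        // 1. Reuse the signed-in user's session when one is available.
        if (currentUserSessionId != null) {
            return currentUserSessionId;
        }
        // 2. Otherwise reuse the crawler session already created on this thread...
        String cached = SessionIdThreadLocal.get();
        if (cached == null) {
            // 3. ...or auto-login the "crawler" user once and remember the new session.
            cached = autoLogin.get();
            SessionIdThreadLocal.set(cached);
        }
        return cached;
    }
}
```

In a real deployment the cached session can expire, so a crawler would probably also need to clear it with `SessionIdThreadLocal.remove()` and log in again when a request ends up on the login page; that handling is left out of this sketch.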
P.S. After this implementation, we also had to blacklist the original LayoutModelDocumentContributor:
Otherwise, it may run after the custom one and overwrite the indexed data.
There were two issues with the search results: what was displayed and how it was displayed:
The search result previews contained irrelevant data.
We had to customize the layout of the results and the information they show.
If we look again at the sources of the newly introduced page crawler, we’ll find that its logic for fetching content is very simple, but not very smart.
It just indexes everything under the element with id=“wrapper”. With the default theme structure, this means that data from the header (and even the footer) gets included. This produces “strange” search result previews that show header navigation, breadcrumbs, menus, etc. instead of the actual page content.
Since we had already created a customized version of LayoutModelDocumentContributor, I added a content pre-processing step before the content is put into the Elasticsearch index. In my case, I take everything inside “content” (yes, “content” rather than “wrapper”, to exclude the header data) and cut off all portlet data, so that only the fragments inside the content are indexed. Of course, this logic may differ depending on the requirements and the custom theme structure:
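The idea behind such a pre-processing step can be sketched in plain Java. This is an illustration under assumptions, not the implementation used in the project: it assumes the theme renders the main area as an element with id="content" and that portlets are wrapped in elements whose class list contains `portlet-boundary` (as Liferay’s portlet wrappers typically are); `LayoutContentPreprocessor` and its methods are hypothetical names. To stay dependency-free it uses a simple regex tag scanner, which is adequate for well-formed theme markup but is no substitute for a real HTML parser in production.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processor applied to crawled page HTML before indexing. */
final class LayoutContentPreprocessor {
    // Any opening or closing tag: group 1 = "/" for closing tags, group 2 = name, group 3 = attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");
    // Attributes of the main content area's opening tag (assumed id="content").
    private static final Pattern CONTENT_ID = Pattern.compile("\\bid\\s*=\\s*[\"']content[\"']");
    // Attributes of a portlet wrapper's opening tag (class list containing "portlet-boundary").
    private static final Pattern PORTLET_CLASS =
        Pattern.compile("\\bclass\\s*=\\s*[\"'][^\"']*\\bportlet-boundary\\b");

    private LayoutContentPreprocessor() {}

    /**
     * Returns {start, end} of the first element whose opening-tag attributes match attrs,
     * found by counting nested tags with the same name; null if absent or unbalanced.
     */
    static int[] findElement(String html, Pattern attrs) {
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (!m.group(1).isEmpty() || !attrs.matcher(m.group(3)).find()) {
                continue; // closing tag, or an opening tag we are not looking for
            }
            String name = m.group(2);
            int start = m.start();
            int depth = 1;
            while (m.find()) {
                if (m.group(2).equalsIgnoreCase(name)) {
                    depth += m.group(1).isEmpty() ? 1 : -1;
                    if (depth == 0) {
                        return new int[] {start, m.end()};
                    }
                }
            }
            return null;
        }
        return null;
    }

    /** Keeps the text inside the "content" element minus any portlets (empty if there is no such element). */
    static String preprocess(String html) {
        int[] content = findElement(html, CONTENT_ID);
        if (content == null) {
            return "";
        }
        String inner = html.substring(content[0], content[1]);
        int[] portlet;
        // Cut out portlet wrappers one at a time until none are left.
        while ((portlet = findElement(inner, PORTLET_CLASS)) != null) {
            inner = inner.substring(0, portlet[0]) + " " + inner.substring(portlet[1]);
        }
        // Strip the remaining (fragment) markup and normalize whitespace.
        return inner.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}
```

Because `findElement` returns the first matching opening tag in document order, the loop removes the outermost portlet wrapper first, and any portlets nested inside it disappear together with it.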
This fix made page previews more accurate, instead of showing the same “random” information for every page.
The default template for displaying search results contained some information that, according to the requirements, should not be shown (icon, author, modified date, etc.):
Thanks to the Widget Templates feature, it’s easy to modify a widget’s display template, including that of the “Search Results” widget. We just had to create a new “Search Results Template” based on the “List Layout” one in the Global site and remove the redundant elements:
After applying the new display template to the search results, we got the desired view:
Search results should be filterable by their associated tags and categories, and only pages should be displayed in the results.
Facet widgets let us filter the search results. We just added the “Type Facet”, “Category Facet”, and “Tag Facet” widgets to the search results page:
We also pre-configured the “Type Facet” to display pages only:
Since this selection makes no sense for end users (there is only one option to select, “Page”), we hid the widget (it remains visible in edit mode):
When performing a “global” search (from the main “Books” site), results should come from any book (child site), but not from other Liferay sites.
Liferay does provide the ability to select the search “scope”:
However, the only options are “This Site” and “Everything”; there is no option like “This site and all child sites”.
Fortunately, we can use custom filters to restrict the search results by the site’s groupId, as described here. We need to add a “Custom Filter” widget to the search results page for the “parent” filter, with the configuration below:
Then we add a “Custom Filter” widget for each individual child site, specifying the appropriate groupId in its configuration:
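The effect of this parent/child setup can be modelled in a few lines of plain Java. This is a conceptual sketch, not Liferay code, and it rests on an assumption about the configuration: that the parent filter acts as a required clause and each child-site filter contributes an optional (“should”) clause on `groupId` beneath it, so a page passes when its groupId equals any configured child site. The `Doc` class and the group IDs used with it are made up for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

/** Conceptual model of parent/child Custom Filter widgets on the groupId field. */
final class SiteScopeFilterModel {
    /** Minimal stand-in for an indexed document; only groupId matters here. */
    static final class Doc {
        final long groupId;
        final String title;

        Doc(long groupId, String title) {
            this.groupId = groupId;
            this.title = title;
        }
    }

    private SiteScopeFilterModel() {}

    /**
     * Models the required parent clause: it is satisfied when at least one
     * child "should" clause matches, i.e. groupId is one of the child site IDs.
     */
    static Predicate<Doc> childSites(List<Long> childGroupIds) {
        // Start from "matches nothing" and OR in one clause per child site.
        Predicate<Doc> parent = doc -> false;
        for (long groupId : childGroupIds) {
            parent = parent.or(doc -> doc.groupId == groupId);
        }
        return parent;
    }
}
```

With hypothetical child site IDs 101 and 102, pages from those two book sites pass the filter, while a page from any other site (say, groupId 200) is excluded, which is the “this site and all child sites” behaviour the scope setting lacks.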
But even with the proper configuration, it will not work, and you may be wondering why. This is due to another Liferay bug.
Fortunately, the solution is simple: give your search results page a different name, e.g. “Search Results” instead of “Search”, so that its friendly URL differs from “/search”. Also, make sure you specify this page’s URL as the “Destination Page” in the Search Bar configuration.
This way, you should be able to find results from child sites and also filter them by site (if you add the “Site Facet” to the search results page):
Even though Liferay search is a powerful tool that gives users a flexible way to look for the content they need, it’s not ideal and currently has some issues. But all of them can be overcome if you put in enough effort, ask the community for support, analyze the Liferay sources, and try to make Liferay better 😏
I hope this will be helpful to somebody.
Share your thoughts and questions in the comments, or contact me directly.