Liferay Search Troubleshooting

Liferay Search: Tweaks and Troubleshooting

Overview

 

Search is one of Liferay’s fundamental features: it lets users quickly find the content they are looking for. It’s powerful, highly configurable, and flexible.

However, while implementing the latest project for our client we ran into a number of issues. Some of them were Liferay bugs or missing functionality; others were configuration issues.

In this post I’d like to share our experience troubleshooting them.

 

The Scenario

 

Let’s assume we have a portal with books/tutorials (with restricted access). There is a main “Books” site and a child site for each individual book. Each book site has private pages with content on them (representing the book’s chapters/pages). Users should be able to search for content from the parent “Books” site (global search) or within an individual book site.

Basically, the logic is similar to the https://learn.liferay.com/ site.

 

1. Private Pages Indexing

 

Problem

The first issue we ran into was private pages indexing, already described here.

The interesting thing is that it had been working before, but broke after upgrading DXP 7.3 to FP2. The client was frustrated, as the release date was close and the search feature was not functional at all.

After analyzing the sources we found out that the approach to indexing page content had changed completely. Initially the content was taken from the database (basically, from the fragments placed on the page being indexed), but later the LayoutCrawler was introduced, which tries to fetch page content by URL (using the Guest account) and obviously can’t do that for restricted private pages.

Solution

The solution/workaround for private pages indexing was similar to the one in the blog mentioned above. The main difference is that I implemented a custom auto-login, which signs in the “crawler” user automatically without a password (IMHO, providing a raw password, even in the portal settings, is not secure).

Components:

  • LayoutModelDocumentContributorOverride - a customized version of Liferay’s com.liferay.layout.internal.search.spi.model.index.contributor.LayoutModelDocumentContributor. The logic is the same, but it calls a custom crawler instead of Liferay’s one;

  • LayoutCrawler - a custom layout crawler. It makes authenticated requests to fetch page content (using the current user’s session if it’s available, or auto-logging in a predefined “crawler” user otherwise, and reusing the session in subsequent calls);

  • LayoutCrawlerAutoLogin - a custom auto-login that signs in the “crawler” user without a password (a generated “autoLoginKey” is used instead);

  • SessionIdThreadLocal - a custom thread-local class that stores the sessionId for reuse in subsequent calls (so that a new session is not created on each crawler call).

Sources: https://github.com/liferay-apps/liferay-search-override
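
To illustrate the session reuse described for SessionIdThreadLocal above, here is a minimal sketch in plain Java. It is not the code from the repository: the class name CrawlerSessionHolder, its methods and the login callback are hypothetical, and it uses java.lang.ThreadLocal rather than any Liferay-specific class. The idea is simply that the first crawler call on a thread logs in and stores the session id, and later calls on the same thread send the stored id instead of logging in again.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the SessionIdThreadLocal from the repository.
final class CrawlerSessionHolder {

    // One slot per thread: the indexing thread keeps its crawler session id here.
    private static final ThreadLocal<String> SESSION_ID = new ThreadLocal<>();

    private CrawlerSessionHolder() {
    }

    // Returns the cached session id, or runs the login callback once and caches its result.
    static String getOrCreate(Supplier<String> login) {
        String sessionId = SESSION_ID.get();
        if (sessionId == null) {
            sessionId = login.get();
            SESSION_ID.set(sessionId);
        }
        return sessionId;
    }

    // Value for the Cookie request header that carries the session to the portal.
    static String cookieHeader(String sessionId) {
        return "JSESSIONID=" + sessionId;
    }

    // Clears the slot, e.g. when the session has expired or indexing has finished.
    static void clear() {
        SESSION_ID.remove();
    }
}
```

Clearing the slot matters on pooled threads, where a stale session id could otherwise be picked up by the next task that runs on the same thread.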

P.S. After this implementation we also had to blacklist the original LayoutModelDocumentContributor. Otherwise, it may run after the custom one and break the indexed data.

 

2. Displaying Search Results

 

Problem

There were two issues here, concerning what is displayed in the results and how it’s displayed:

  1. The preview for search results contained irrelevant data.

  2. We had to customize the layout and how the information is displayed.

 

Solution 1: pre-processing content 

If we look again at the sources of the newly introduced page crawler, we’ll find that the logic for fetching content is very simple, but not too smart.

Actually, it just indexes everything that comes after id="wrapper". With the default theme structure this means that data from the header (or even the footer) is included. This generates “strange” previews for search results, which display data from the header navigation, breadcrumbs, menus, etc. instead of the actual page content.

Since we had already created a customized version of LayoutModelDocumentContributor, I included a content pre-processing step before the content is put into the Elasticsearch index. In my case I take everything inside “content” (yes, “content” and not “wrapper”, to exclude data from the header) and cut off all portlet data, so that only the fragments inside the content are indexed. Of course, this logic may differ depending on the requirements and the custom theme structure.

This fix made the page previews more accurate, so they no longer display the same “random” information for every page.
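
A pre-processing step of this kind can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the module: the class and method names are hypothetical, the markers (id="content", a footer tag, portlets rendered as non-nested section elements with a “portlet” class) are guesses at a default-like theme, and regular expressions are used only to keep the example free of dependencies. A real implementation would more likely rely on an HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a content pre-processor; markers must match your theme.
final class ContentPreProcessorSketch {

    // Assumed markers of a default-like theme structure.
    private static final String CONTENT_MARKER = "id=\"content\"";
    private static final String FOOTER_MARKER = "<footer";

    // Assumption: each portlet is a non-nested <section class="... portlet ...">.
    private static final Pattern PORTLET_SECTION = Pattern.compile(
        "<section\\b[^>]*class=\"[^\"]*\\bportlet\\b[^\"]*\"[^>]*>.*?</section>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private ContentPreProcessorSketch() {
    }

    static String extractIndexableText(String html) {
        String content = html;

        // Keep only what follows the opening tag that carries id="content".
        int marker = content.indexOf(CONTENT_MARKER);
        if (marker != -1) {
            int tagEnd = content.indexOf('>', marker);
            if (tagEnd != -1) {
                content = content.substring(tagEnd + 1);
            }
        }

        // Drop the footer and everything after it, if present.
        int footer = content.indexOf(FOOTER_MARKER);
        if (footer != -1) {
            content = content.substring(0, footer);
        }

        // Remove portlet markup so that only fragment content remains.
        content = PORTLET_SECTION.matcher(content).replaceAll(" ");

        // Strip the remaining tags and collapse whitespace.
        content = TAG.matcher(content).replaceAll(" ");
        return WHITESPACE.matcher(content).replaceAll(" ").trim();
    }
}
```

If the page has no element with id="content", the sketch falls back to stripping tags from the whole document, which mirrors the “index everything” behaviour rather than indexing nothing.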

 

Solution 2: custom widget template

The default template for displaying search results contained some information which, according to the requirements, should not be shown (icon, author, modified date, etc.).

Thanks to the Widget Templates feature, it’s easy to modify a widget’s display template, including that of the “Search Results” widget. We just had to create a new “Search Results Template” based on the “List Layout” one in the Global site and remove the redundant elements.

After applying the display template to the search results, we got the desired view.

 

3. Filtering Search Results

 

Problem

It should be possible to filter search results by their associated tags and categories. Only pages should be displayed in search results.

Solution

Using the Facet widgets we can filter the search results. We just added the “Type Facet”, “Category Facet” and “Tag Facet” widgets to the search results page.

We also pre-configured the “Type Facet” to display pages only.

Since this selection makes no sense for the end user (there is only one option to select, “Page”), we hid it (except in edit mode).

 

4. Scoping Search Results

 

Problem

When performing the “global” search (a search from the main “Books” site), results should be displayed from any book (child site), but not from other Liferay sites.

Liferay does provide the ability to select the search “scope”.

But the only options are to restrict the scope to “This Site” or “Everything”; we can’t choose an option like “This site and all child sites”.

Solution

Fortunately, we can use custom filters to restrict the search results by specifying the site’s groupId, as described here. We need to add a “Custom Filter” widget for the “parent” site to the search results page.

We also need to add a “Custom Filter” widget for each individual child site, specifying the appropriate groupId in its configuration.

But even after proper configuration it will not work, and you may be wondering why. This is due to another Liferay bug.

However, the solution is simple: just give your search results page a different name, e.g. “Search Results” instead of “Search”, so that its friendly URL differs from “/search”. Also, make sure you specify this page’s URL as the “Destination Page” in the Search Bar configuration.

This way, you should be able to find results from child sites and also filter them by site (if you add the “Site Facet” to the search results page).

Conclusion

 

Even though Liferay search is a powerful tool that gives users a flexible way to look for the content they need, it’s not ideal and currently has some issues. But all of them can be overcome if you put in enough effort, ask the community for support, analyze the Liferay sources and try to make Liferay better.

I hope this will be helpful to somebody.

Add your thoughts and questions in the comments, or contact me directly.

Enjoy!

Vitaliy

Great article Vitaliy, I learned some new tricks that I'm going to use soon! We also use all the new search features a lot, and the entire mechanism is much better than it used to be in the past. Especially configuring the search with different "filtering" widgets.

The only trick I'd add to your list is that instead of adding the new “Custom Filter” widget for each child site, I usually add a single "Regexp" filter with all groupIds listed in a single field with the following pattern: "(groupid_1)|(groupid_2)|(groupid_3)|..."
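
For a long list of child sites, a pattern of this shape can be generated rather than typed by hand. The following is a minimal sketch in plain Java; the class and method names are hypothetical, and the resulting string is meant to be pasted into the filter value field mentioned above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds "(id_1)|(id_2)|..." from a list of site group ids.
final class GroupIdRegexSketch {

    private GroupIdRegexSketch() {
    }

    static String toFilterValue(List<Long> groupIds) {
        return groupIds.stream()
            .map(id -> "(" + id + ")")          // wrap each id in its own group
            .collect(Collectors.joining("|"));  // alternation between the groups
    }
}
```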

Thanks for the info and the module!

We were also wondering what the hell happened after the FP2 update...

We had to change the following line in LayoutCrawler.java to get it to work on our PROD servers: themeDisplay.setServerName(company.getVirtualHostname());

Since we have web.server.protocol=https set in portal-ext.properties, the URLs to crawl were being generated as https://localhost/..., which was of course throwing SSL errors.

We then had to add the SSL cert to the Java keystore.

Don't know if there's a way to force https URLs and get this to work otherwise?

Hi Christian!

For local development you can either set up HTTPS:

https://lifedev-solutions.blogspot.com/2021/03/liferay-tomcat-ssl-configuration.html

or just disable it (set web.server.protocol=http, or remove this property).

You can also set the web.server.host property to specify the host, if needed.

But the main issue with layout search is missing authorization; you can add it in a customized version of LayoutCrawler, see:

https://github.com/liferay-apps/liferay-search-override/blob/master/src/main/java/com/liferay/apps/search/override/crawler/LayoutCrawler.java#L66 

Vitaliy

Wonderful article, Vitaliy! I was looking into how to do this for an external SSO implementation. If I understand the process correctly, I think all I need to do is replace your doLogin logic with the logic from the relevant AutoLogin implementation. For example, if SAML SSO, then modules/dxp/apps/saml/saml-impl/src/main/java/com/liferay/saml/runtime/internal/auto/login/SamlSpAutoLogin.java.