Search score is decreasing in each webcontent update. Is it a bug?

Marcel Tanuri, modified 2 Years ago.

New Member Posts: 1 Join Date: 7/26/21

I've tested this in 7.2 and 7.4. Whenever I update a web content article (even without changing anything), its score in the search results decreases.

Should it be considered a bug?

Steps to reproduce.

  1. Create a new web content article.
  2. Go to the search page and add the Search Insights portlet.
  3. Run a search using a word that matches the created web content.
  4. A score is displayed in the Response String area of Search Insights. Note the score value.
  5. Go back to the web content in the editor and publish it again. No changes are needed; just republish it. The version will be 1.1.
  6. Run the same search again.

Expected result: The score calculated for web content version 1.0 should also be calculated for version 1.1.

Actual result: The score calculated for version 1.1 is lower than the score calculated for version 1.0.

Russell Bohl, modified 2 Years ago.

RE: Search score is decreasing in each webcontent update. Is it a bug?

Expert Posts: 291 Join Date: 2/13/13

Hey Marcel, I'm impressed you noticed this. Your question was discussed internally by some Liferay engineers, and it looks like it's working as expected. Here's why:

Editing an article creates a new version, and the new version is indexed alongside the first one. Under TF-IDF (Term Frequency-Inverse Document Frequency) scoring, adding indexed content that closely (in your case, identically) matches the original lowers the significance of the matching terms across the document corpus, which lowers the score. Here it's the IDF part that lowers the score: the search term now appears in more documents in the index, so a match is less unique and therefore deemed less significant.
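
To make the IDF effect concrete, here is a toy sketch in Python. It uses the textbook formula log(N / df), not Elasticsearch's real scoring formula, which is more involved. The four-document corpus and its contents are made up purely for illustration.

```python
import math

def idf(term, docs):
    """Textbook inverse document frequency: log(N / df)."""
    # df = number of documents that contain the term at least once
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df)

# Hypothetical index: one matching article plus three unrelated documents.
corpus = [
    "liferay search tuning guide",  # web content, version 1.0
    "release notes",
    "installation checklist",
    "user permissions overview",
]

before = idf("search", corpus)

# Republishing indexes version 1.1 alongside 1.0 with identical text.
corpus.append("liferay search tuning guide")

after = idf("search", corpus)

print(f"IDF before republish: {before:.3f}")
print(f"IDF after republish:  {after:.3f}")
```

In this toy corpus the IDF of "search" drops from about 1.386 to about 0.916 after the republish, because the term now appears in 2 of 5 documents instead of 1 of 4. Each further republish adds one more matching document and pushes the value lower again.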

Updated to add:

If you'd like to offset the score-lowering effect of article versions, you could try boosting the score of some documents using the Custom Filter widget.

For example, Web Content articles seem to have a field that marks a version as the latest version of the article. The field is called head. It looks like it's true for the latest version and false for past versions. See the Custom Filter documentation for inspiration.
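
As a rough idea of what such a boost could look like at the Elasticsearch level, the Python sketch below builds a query body that requires the user's keywords and adds an optional, boosted clause on head. This is an assumption about the query shape, not what the Custom Filter widget actually generates. The helper name, the boost value of 10, and treating head as the string "true" are all hypothetical. The widget itself is configured through its UI, not through code.

```python
import json

def boost_latest_version(user_query, boost=10.0):
    """Build an Elasticsearch query body that favors head=true documents.

    Hypothetical helper: the boost value and query shape are assumptions.
    """
    return {
        "query": {
            "bool": {
                # The user's keywords must match.
                "must": [{"query_string": {"query": user_query}}],
                # Optional clause: documents flagged as the latest
                # version get extra score if they match it.
                "should": [
                    {"term": {"head": {"value": "true", "boost": boost}}}
                ],
            }
        }
    }

body = boost_latest_version("liferay")
print(json.dumps(body, indent=2))
```

Because the head clause sits under should, older versions still match the search; they just do not receive the extra score.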

In addition to the Custom Filter, some helpful tools in this quest: 

Search Insights widget
Search Results widget, with View in Document Form enabled

To look at the raw indexed content (including older versions of the article), hit the Elasticsearch API directly. For example, in my running 7.4 local bundle (running the Sidecar Elasticsearch server), I used the following URL to look at all indexed web content:

http://localhost:9201/liferay-20099/_search?q=entryClassName:"com.liferay.journal.model.JournalArticle"&pretty=true
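
If you'd rather script that request than paste it into a browser, here is a minimal Python sketch using only the standard library. The port (9201) and index name (liferay-20099) are taken from the URL above and will likely differ in your installation. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(base, index, entry_class_name):
    """Build a URI search that matches one entryClassName exactly."""
    query = f'entryClassName:"{entry_class_name}"'
    # urlencode percent-encodes the colon and the double quotes.
    return f"{base}/{index}/_search?" + urlencode({"q": query, "pretty": "true"})

def fetch_hits(url):
    """Run the search and return the list of hits (requires a running server)."""
    with urlopen(url) as response:
        return json.load(response)["hits"]["hits"]

url = build_search_url(
    "http://localhost:9201",  # sidecar address from the post
    "liferay-20099",          # index name from the post; yours may differ
    "com.liferay.journal.model.JournalArticle",
)
print(url)
```

Calling fetch_hits(url) against a running bundle should return the indexed documents, each with its own _score and _source, so the different versions of an article can be compared side by side.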

I hope that helps.