RE: Result Ranking synonyms case insensitive and various diacritics support

Jamie Sammons, modified 2 Years ago. New Member Posts: 13 Join Date: 1/16/19 Recent Posts

Hello,

With respect to the overall Search philosophy, we suggest making it possible to configure Result Ranking synonyms in a case-insensitive way, to reduce configuration complexity. All possible language diacritic combinations could be considered as well. For example, instead of adding the synonyms Site, page, Page, páge, Páge, Layout, layout, ... and an unlimited number of other combinations to the "site" query, it would be enough to provide just the synonyms page and layout. Easy to use, isn't it?

Elasticsearch can be configured to handle such queries, but this has not been adopted for Result Ranking synonyms.

Please consider using the ASCII folding token filter for this purpose: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html

It could be extended to the whole site search configuration.
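To illustrate the idea, here is a minimal sketch. The analyzer name `folding_analyzer` is made up for this example; `lowercase` and `asciifolding` are built-in Elasticsearch token filters. The `fold` helper is only a rough pure-Python approximation of their combined effect (it strips combining marks, which covers fewer characters than the real filter), and it does not reflect how Liferay handles Result Ranking synonyms today.

```python
import unicodedata

# Index settings for a custom analyzer that lowercases and ASCII-folds tokens.
# "folding_analyzer" is a hypothetical name chosen for this sketch.
FOLDING_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}


def fold(term: str) -> str:
    """Approximate lowercase + ASCII folding: drop diacritics, ignore case."""
    decomposed = unicodedata.normalize("NFKD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def matches_synonym(keyword: str, synonyms: list[str]) -> bool:
    """True if the keyword equals any configured synonym after folding."""
    folded = {fold(s) for s in synonyms}
    return fold(keyword) in folded
```

With this, configuring only `["page", "layout"]` would be enough for `matches_synonym("Páge", ["page", "layout"])` to succeed, without listing every case and diacritic variant.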

Petr

Tibor Lipusz, modified 2 Years ago. New Member Posts: 17 Join Date: 3/16/12 Recent Posts

Hello Petr,

At first sight, I wanted to reply to share that some changes to Result Rankings regarding aliases and case sensitivity/insensitivity were introduced in https://liferay.atlassian.net/browse/LPS-141169.

However, after taking a second look at your points, I'm not sure you were referring to "aliases".

Allow me to take a closer look and get back to you here.

-

Tibor

Tibor Lipusz, modified 1 Year ago. New Member Posts: 17 Join Date: 3/16/12 Recent Posts

Just to follow up here and close the loop: we had a meeting where you presented the problem in its original context. The pain point is clear.

As I also mentioned on the call, we are in the phase of revisiting Result Rankings, and requests like this help us discover and plan enhancements.

This request is saved on my list, and once there is a related initiative or epic that I can share here, I will do so.

If you prefer tracking it as a Feature Request ticket in our Jira, I can also create one.

Regards,

Tibor

Tibor Lipusz, modified 1 Year ago. New Member Posts: 17 Join Date: 3/16/12 Recent Posts

Just a technical note: what is essentially being requested here is language-aware "Result Ranking Search Queries and Aliases".

So, for example, the admin would only need to create one RR entry with "Brown foxes jump" as the Search Query, but at search time the same RR entry would also be applied to searches whose keywords are a variation of it, like "brown foxes jump", "BROWn fox jump" or "brown fox jump". Stemming, punctuation, whitespace, capitalization, etc. would be handled according to the given language's conventions.
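A rough sketch of that matching, just to make the idea concrete. Everything here is hypothetical: `toy_stem` is a deliberately naive English-only stand-in for a real language analyzer, and `find_ranking` with its plain dict of rankings is not how Result Rankings are stored or looked up in Liferay.

```python
import re
from typing import Optional


def toy_stem(token: str) -> str:
    """Naive suffix stripping; a real language analyzer does far more."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize_query(query: str) -> tuple[str, ...]:
    """Lowercase, drop punctuation and extra whitespace, then stem each token."""
    tokens = re.findall(r"\w+", query.lower())
    return tuple(toy_stem(t) for t in tokens)


def find_ranking(keywords: str, rankings: dict[str, str]) -> Optional[str]:
    """Return the id of the first ranking whose Search Query normalizes
    to the same token sequence as the user's keywords, else None."""
    key = normalize_query(keywords)
    for search_query, ranking_id in rankings.items():
        if normalize_query(search_query) == key:
            return ranking_id
    return None


# One RR entry, created once by the admin (the id is made up).
rankings = {"Brown foxes jump": "rr-1"}
```

Here `find_ranking("BROWn fox jump", rankings)` and `find_ranking("brown, foxes  jump!", rankings)` both resolve to the same single entry.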

The tricky part is not really the language analysis, as that is a solved problem: Liferay already uses the language-specific analyzers from Elasticsearch for localized text fields. It's more about which language analyzer to apply to the keywords entered by the user.

The user's display locale? The site's default locale? Or should the platform try guessing in which language the keywords were entered? Or maybe RRs should have a language configuration and that information would also be considered at search time (like "RR's language equals display language") when looking for a potential RR to be applied.
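To make one of these options concrete, here is a sketch of the "RR's language equals display language" variant, falling back to the site's default locale when no display locale is known. The `language` field on the RR, the helper names and the fallback order are all assumptions for illustration; none of this exists in the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultRanking:
    search_query: str
    language: Optional[str] = None  # hypothetical per-RR language setting


def language_of(locale: str) -> str:
    """Extract the language part of a locale, e.g. "pt_BR" or "pt-BR" -> "pt"."""
    return locale.replace("-", "_").split("_")[0].lower()


def pick_analysis_language(
    display_locale: Optional[str], site_default_locale: str
) -> str:
    """Prefer the user's display locale; otherwise use the site default."""
    return language_of(display_locale or site_default_locale)


def is_candidate(
    ranking: ResultRanking,
    display_locale: Optional[str],
    site_default_locale: str,
) -> bool:
    """An RR without a language applies everywhere; otherwise its language
    must match the language chosen for analyzing the keywords."""
    if ranking.language is None:
        return True
    chosen = pick_analysis_language(display_locale, site_default_locale)
    return language_of(ranking.language) == chosen
```

Guessing the language from the keywords themselves would be a different strategy and is not covered by this sketch.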

Nevertheless, that's an interesting problem.