Confluence Data Center: CONFSERVER-5142

Stemming and wildcards do not play nicely in search queries


Details

    Description

      Searching has a problem when part of a word is combined with a wildcard: Lucene does not find the word you intended.

      For example, searching CAC for Management (without quotes) returns many results, but searching for Managemen* returns only one.

      The reason is that "Managemen" is not a real English word, so it is not stemmed. The index, however, holds the stemmed form of "management", which is "manag". Because "manag" does not begin with "managemen", the wildcard query does not match it and the expected results are not returned. (Note: the one result the wildcard query does return is an attachment, which matches "managemen*" because its full filename is indexed.)
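The mismatch can be illustrated with a toy inverted index. This is a sketch, not Lucene's code: it assumes, as with Lucene's classic query parser, that wildcard terms are lowercased but not stemmed. The stem "manag" comes from the description above; the document IDs and the attachment filename "management-plan.pdf" are hypothetical.

```python
# Toy inverted index mirroring the situation described above (not Lucene code).
# "manag" is the stem quoted in the issue; the document IDs and the attachment
# filename "management-plan.pdf" are hypothetical.
STEMMED_INDEX: dict[str, set[str]] = {
    # Page text is indexed with stemming, so only the stem is stored.
    "manag": {"page-1", "page-2", "page-3"},
    # Attachment filenames are indexed whole, without stemming.
    "management-plan.pdf": {"attachment-1"},
}


def wildcard_prefix_search(index: dict[str, set[str]], pattern: str) -> set[str]:
    """Expand a trailing-* wildcard against every indexed term.

    Assumption: the prefix is lowercased but NOT run through the stemmer,
    which is what produces the mismatch described in the issue.
    """
    prefix = pattern.rstrip("*").lower()
    hits: set[str] = set()
    for term, docs in index.items():
        if term.startswith(prefix):
            hits |= docs
    return hits


if __name__ == "__main__":
    # "manag" does not start with "managemen": only the attachment matches.
    print(sorted(wildcard_prefix_search(STEMMED_INDEX, "Managemen*")))  # ['attachment-1']
    # "manag" is a prefix of the stem itself, so the pages match too.
    print(sorted(wildcard_prefix_search(STEMMED_INDEX, "Manag*")))
```

In this model, Managemen* finds only the attachment, just like the single result reported on CAC, while the pages are unreachable because their only indexed term is the shorter stem.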

      One possible solution is to store the original word, as well as its stemmed form, in a separate field in the index. When a wildcard search term arrives, search both the original and the stemmed words. The cache may grow and there may be a slight performance hit, but searching will be more reliable in these edge cases.
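The proposed fix can be sketched the same way. This minimal sketch indexes every word into two fields, one stemmed and one holding the original (lowercased) word, and expands wildcard terms against both. `toy_stem` and `DualFieldIndex` are hypothetical names, and `toy_stem` is a crude suffix stripper standing in for a real stemmer; it is not the Porter algorithm.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes.

    Not the Porter algorithm; it merely reproduces "management" -> "manag".
    """
    for suffix in ("ement", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class DualFieldIndex:
    """Hypothetical index that stores each word both stemmed and unstemmed."""

    def __init__(self) -> None:
        self.stemmed = defaultdict(set)   # field used for normal queries
        self.original = defaultdict(set)  # extra field holding unstemmed words

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self.stemmed[toy_stem(word)].add(doc_id)
            self.original[word].add(doc_id)

    def search(self, term: str) -> set[str]:
        term = term.lower()
        if term.endswith("*"):
            # Wildcard: expand the prefix against BOTH fields and union the hits.
            prefix = term[:-1]
            hits: set[str] = set()
            for field in (self.stemmed, self.original):
                for indexed, docs in field.items():
                    if indexed.startswith(prefix):
                        hits |= docs
            return hits
        # Plain term: stem it and look it up in the stemmed field only.
        return set(self.stemmed.get(toy_stem(term), set()))


if __name__ == "__main__":
    index = DualFieldIndex()
    index.add("page-1", "Project management basics")
    index.add("page-2", "Managing releases")
    print(sorted(index.search("Management")))  # ['page-1', 'page-2']
    print(sorted(index.search("Managemen*")))  # ['page-1']
```

A plain search for Management still goes through the stemmed field, while Managemen* now also matches the unstemmed word "management" in the extra field. In Lucene this would presumably correspond to a second field indexed with an analyzer that has no stemming filter (for example, configured through `PerFieldAnalyzerWrapper`); that mapping is an assumption, not something stated in the issue.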


            People

              Assignee: Unassigned
              Reporter: Jeremy Higgs
              Votes: 31
              Watchers: 30
