The search applies word stemming by default, even when the search term is enclosed in quotes. The following examples are searches on confluence.atlassian.com.

      For example, a search for the 'show to macro' with the search term "{show-to}" returns a large number of results that match only the keyword 'show'.
      The fact that a stemmed part is used rather than the whole search term seems to be related to special characters such as the hyphen in this search.

      A search for "Theme Helper" returns only one result, which is the right page. However, only the word 'theme' is highlighted in the result. It seems that the stemmed term is used for the highlighting. Unfortunately, this gives the impression of an unsuccessful search.

        1. ss.jpg (271 kB)

            [CONFSERVER-3353] Search gives unexpected results

            Charles Miller (Inactive) added a comment -

            Eric: Which version of Confluence have you tested this against? My tests were performed against 2.5.

            Eric Lentz added a comment -

            In the example screenshot I searched for "No tag", which would be analogous to your "source code" example.
            Your last bullet says that it will not match "This is a piece of source".
            Analogously, I should not get "The tags are generated...", but the very last search result in the screenshot in fact returns such a phrase.

            Charles Miller (Inactive) added a comment -

            Unfortunately, these are all limitations of the Lucene search technology, and something we're not likely to be able to fix without significant changes to that library. Stemming is something that occurs during the creation of the index, so by the time it's searched the data is lost.

            I've verified the following in Confluence 2.5:

            If you search for "source code" in quotes:

            • You will match "This is a piece of source code", because the phrase is included.
            • You will match "This is a piece of source and code", because the stop-word 'and' is ignored during indexing.
            • You will match "This is a piece of sourcing code", because the terms are stemmed during indexing.
            • You will not match "This is a piece of source banana code", because the phrase is not matched.
            • You will not match "This is a piece of source" or "This is a piece of code", because the phrase is not matched.

            A workaround is to go to General Configuration and set your "Indexing language" to "Other", then rebuild your search index. This will turn off stemming and stop-words entirely, for all searches.
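The behaviour described for the quoted "source code" search can be reproduced with a toy model. The following is only a minimal sketch, not Lucene's or Confluence's actual analyzer: the stop-word list, the suffix rules and the names `stem`, `analyze` and `phrase_matches` are all assumptions made for illustration. It applies the same analysis (lowercase, drop stop-words, stem) to both the document and the query, then checks whether the query terms occur consecutively.

```python
import re

# Toy stop-word list and suffix rules: assumptions for illustration only,
# not the lists that Lucene or Confluence actually use.
STOP_WORDS = {"a", "an", "and", "is", "no", "of", "or", "the", "this", "to"}
SUFFIXES = ("ing", "es", "s", "e")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split into words, drop stop-words, stem what remains."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def phrase_matches(query, document):
    """True if the analyzed query occurs as consecutive terms in the document."""
    q, d = analyze(query), analyze(document)
    if not q:
        return False
    return any(d[i : i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    for doc in (
        "This is a piece of source code",
        "This is a piece of source and code",
        "This is a piece of sourcing code",
        "This is a piece of source banana code",
        "This is a piece of source",
    ):
        print(phrase_matches('"source code"', doc), "-", doc)
```

Because the original words are discarded at analysis time, the quoted query cannot tell "source" from "sourcing". Under the assumed stop-word list, `analyze('"No tag"')` also reduces to the single term `tag`, which would be consistent with the "No tag" results reported in this issue; whether Confluence's analyzer actually treats "no" as a stop-word is not confirmed here.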

            Eric Lentz added a comment -

            I don't fully agree with the word stemming commentary by Jens Schumacher. In the attached screenshot you'll see that "No tag" brought back entries with just the word "tag" or "tags". Perhaps it also found "No tag" entries, and the highlighting issue is occurring as well, giving the illusion that it is only matching the word "tag". It should never have retrieved any entry with just "tag".


            Paula Matuszek added a comment -

            It's not just phrases or special characters. Searches for "inform", "information" and "informal" all return the same document set, even if the search term is quoted. Is there any way to turn stemming off?

            Eric Lentz added a comment -

            I wanted "No tags" but got only "tag" and "tags", even though there were documents containing "No tags", such as 2 of the documents under the "Production Support" space.


              Unassigned Unassigned
              jens@atlassian.com jens
