
Indexing with 'Other' analyzer removes numbers from search

      When the Indexing Language is set to 'Other', searches ignore numbers.

      Example:

      A page contains the string "foo".

      A search for "foo123" returns that page, matching on "foo".

      Workaround:

      Switch the Indexing Language to any option other than 'Other', then rebuild the index.

            [CONFSERVER-5942] Indexing with 'Other' analyzer removes numbers from search

            Minh Tran added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.10.0.
            Upgrade now or check out the Release Notes to see what other issues are resolved.

            Rob Lillywhite added a comment - - edited

            This is a critical defect for us. It leaves us having to choose between turning stemming off and being able to include numbers in searches.

            It appears to simply extract "foo" as the only searchable part of "foo123". I think this is the intended behaviour.

            - Andrew Lynch

            Whether this is the intended behaviour or not it is still a defect even if the defect is in the design.

            I for one cannot imagine why you would intend to exclude numbers from the search.

            Hopefully Atlassian will take note and fix this because without searching that works we might as well all be using SharePoint.

            Stephen Edwards added a comment -

            Hi, I can confirm this affects the version used where I work, Confluence 3.5.13.

            This is a very frustrating search problem for us. I am using the wiki as part of software development (along with Jira). We had to disable the English filter because stemming made searching for things like camel-case API names very difficult. Replacing it with the Other filter means that it is now impossible to search for different versions of the same product, or to distinguish between e.g. Win7, Win32, and "win".

            There seems to be debate on whether this is a bug or intended behaviour. I don't know either, but I do think it is a bug not to have a filter that allows searching for both letters and numbers! Would it be best for me to keep pushing for this ticket, or to create a new issue for a filter that allows mixed letter and number searching?

            Thanks.
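To illustrate the Win7 / Win32 / "win" case described above, here is a minimal sketch in plain Java. It is not Confluence or Lucene code: the class `TokenCharComparison` and its `tokenize` helper are hypothetical names, and the "letters or digits" rule is only an assumption about what a mixed letter-and-number filter might do. It contrasts splitting text at every non-letter with splitting at every character that is neither a letter nor a digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not the Confluence or Lucene implementation.
class TokenCharComparison {

    // Splits text into lower-cased tokens. A token is a maximal run of
    // characters accepted by isTokenChar; any other character ends the token.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"Win7", "Win32", "win"}) {
            // Letters only: digits end the token, so all three terms collapse to "win".
            List<String> lettersOnly = tokenize(term, Character::isLetter);
            // Letters or digits (assumed alternative): the version numbers survive.
            List<String> lettersOrDigits = tokenize(term, Character::isLetterOrDigit);
            System.out.println(term + " -> " + lettersOnly + " vs " + lettersOrDigits);
        }
    }
}
```

Under the letters-only rule all three terms produce the single token "win", which is why they cannot be told apart. Under the assumed letters-or-digits rule they stay distinct as "win7", "win32" and "win".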

            JoachimA added a comment -

            This problem is still happening in Confluence 4.0.x upon testing.

            Christopher Owen [Atlassian] added a comment -

            Surely the other analyzers should work in a similar manner to the English one?

            Andrew Lynch (Inactive) added a comment -

            This only seems to affect the 'Other' analyzer, and I'm not sure that this is a bug. When using the 'other' analyzer type, a SimpleAnalyzer will be used (which in turn uses a LowerCaseTokenizer).

            /**
             * LowerCaseTokenizer performs the function of LetterTokenizer
             * and LowerCaseFilter together.  It divides text at non-letters and converts
             * them to lower case.  While it is functionally equivalent to the combination
             * of LetterTokenizer and LowerCaseFilter, there is a performance advantage
             * to doing the two tasks at once, hence this (redundant) implementation.
             * <P>
             * Note: this does a decent job for most European languages, but does a terrible
             * job for some Asian languages, where words are not separated by spaces.
             */

            It appears to simply extract "foo" as the only searchable part of "foo123". I think this is the intended behaviour.
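The behaviour described in that Javadoc can be emulated in a few lines of plain Java. This is only a sketch of the "divide at non-letters, then lower-case" rule, not Lucene's LowerCaseTokenizer itself. The class `AnalyzedSearchSketch`, its methods, and the simplified matching rule (every query token must appear among the page tokens) are assumptions made for illustration. It shows why a query for "foo123" ends up matching a page that only contains "foo": the query is reduced to the same token as the page text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation of letter-only, lower-casing tokenization.
// It does not use Lucene and is not how Confluence executes a search.
class AnalyzedSearchSketch {

    // A token is a maximal run of letters, lower-cased.
    // Every non-letter, digits included, ends the current token and is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Simplified matching rule (an assumption): the page matches when the
    // query yields at least one token and all query tokens occur in the page.
    static boolean matches(String pageText, String query) {
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false; // e.g. a digits-only query leaves nothing to search for
        }
        return tokenize(pageText).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo123"));                    // [foo]
        System.out.println(matches("A page about foo", "foo123")); // true
        System.out.println(matches("A page about bar", "foo123")); // false
    }
}
```

Because the same analysis is applied to the query as to the indexed text, the digits are lost on both sides, so "foo123" and "foo" become indistinguishable.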

            Mark van Straten added a comment -

            We are currently using Confluence version 2.2.9 (Build #527, Sep 07, 2006) and this problem still exists. The effects of having our Indexing Language set to 'English' instead of 'Other' are unknown to me (our Confluence content is 99% Dutch).

            http://confluence.atlassian.com/display/DOC/Configuring+Indexing+Language --> "may improve the accuracy of Confluence search results"

              hrehioui Hasnae (Inactive)
              christopher.owen@atlassian.com Christopher Owen [Atlassian]