[CONFSERVER-5942] Indexing with 'Other' analyzer removes numbers from search

Type: Bug
Resolution: Fixed
Priority: Medium
Fix Version/s: 6.10.0
Affects Version/s: 2.1.5, 4.0, 5.1.4, 5.3, 6.0.7, 6.3.1, 6.3.3
Component/s: Search - Core
Labels:

Support reference count: 9
Symptom Severity: Severity 2 - Major
UIS: 0

When the indexing language is set to "Other", numbers are dropped from indexed text and from search queries, so searches effectively ignore them.

Example:

A page contains the string "foo".

A search for "foo123" returns that page as a match, because only "foo" is used for matching.

Workaround:

Switch the Indexing Language to any setting other than 'Other', then rebuild the index.

is duplicated by

CONFSERVER-9776 Russian and "Other" indexing language configuration does not tokenise numbers properly for searching (Closed)

is related to

CONFSERVER-2167 Poor search results when search query contains numbers (Closed)

Minh Tran added a comment - 26/Jun/2018 4:02 AM

A fix for this issue is available to Server and Data Center customers in Confluence 6.10.0.
Upgrade now or check out the Release Notes to see what other issues are resolved.

Rob Lillywhite added a comment - 13/Jun/2012 9:47 AM - edited

This is a critical defect for us. It leaves us having to choose between turning stemming off and being able to include numbers in searches.

It appears to simply extract "foo" as the only searchable part of "foo123". I think this is the intended behaviour.

- Andrew Lynch

Whether or not this is the intended behaviour, it is still a defect, even if the defect is in the design.

I for one cannot imagine why you would intend to exclude numbers from the search.

Hopefully Atlassian will take note and fix this because without searching that works we might as well all be using SharePoint.

Stephen Edwards added a comment - 18/May/2012 10:40 AM

Hi, I can confirm this affects the version used where I work, Confluence 3.5.13.

This is a very frustrating search problem for us. I am using the wiki as part of software development (along with Jira). We had to disable the English filter because stemming made searching for things like camel-case API names very difficult. Replacing it with the Other filter means that it is now impossible to search for different versions of the same product, or to distinguish between, e.g., Win7, Win32, and "win".

There seems to be debate on whether this is a bug or intended behaviour. I don't know either, but I do think it is a bug to not have a filter that allows searching for both letters and numbers! Would it be best for me to keep pushing for this ticket, or to create a new issue for a filter that can allow mixed letter and number searching?

Thanks.

JoachimA added a comment - 24/Oct/2011 2:55 PM

This problem is still happening in Confluence 4.0.x upon testing.

Christopher Owen [Atlassian] added a comment - 22/Aug/2008 4:02 AM

Surely the other analyzers should work in a similar manner to the English one?

Andrew Lynch (Inactive) added a comment - 22/Aug/2008 1:50 AM

This only seems to affect the 'Other' analyzer, and I'm not sure that this is a bug. When using the 'Other' analyzer type, a SimpleAnalyzer will be used (which in turn uses a LowerCaseTokenizer). The Lucene Javadoc for LowerCaseTokenizer reads:

/**
 * LowerCaseTokenizer performs the function of LetterTokenizer
 * and LowerCaseFilter together.  It divides text at non-letters and converts
 * them to lower case.  While it is functionally equivalent to the combination
 * of LetterTokenizer and LowerCaseFilter, there is a performance advantage
 * to doing the two tasks at once, hence this (redundant) implementation.
 * <P>
 * Note: this does a decent job for most European languages, but does a terrible
 * job for some Asian languages, where words are not separated by spaces.
 */

It appears to simply extract "foo" as the only searchable part of "foo123". I think this is the intended behaviour.
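
To make the behaviour described above concrete, here is a minimal sketch of "divide text at non-letters and convert to lower case". It is not Lucene's code and does not use Lucene's API: the class name `LetterLowerCaseSketch` and the method `tokenize` are hypothetical, invented for illustration only. It also assumes that the search query is analyzed the same way as the indexed text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene's LowerCaseTokenizer.
public class LetterLowerCaseSketch {

    // Splits the input at every non-letter character and lower-cases
    // the letters that remain. Digits count as non-letters, so they
    // act as separators and never appear in a token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo123"));         // [foo]
        System.out.println(tokenize("Win7 Win32 win")); // [win, win, win]
    }
}
```

Under these assumptions, a query for "foo123" is reduced to the single term "foo", which is why it matches a page containing only "foo", and why Win7, Win32, and "win" become indistinguishable.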

Mark van Straten added a comment - 25/Oct/2006 8:17 AM

We are currently using Confluence version 2.2.9 (Build #527, Sep 07, 2006) and this problem still exists. I do not know what the effects would be of setting our Indexing Language to 'English' instead of 'Other' (our Confluence content is 99% Dutch).

http://confluence.atlassian.com/display/DOC/Configuring+Indexing+Language --> "may improve the accuracy of Confluence search results"

Assignee: Hasnae (Inactive)

Reporter: Christopher Owen [Atlassian]

Affected customers: 12

Watchers: 18

Created: 18/Apr/2006 12:18 AM

Updated: 16/Jul/2020 6:18 AM

Resolved: 26/Jun/2018 3:52 AM
