Bug
Resolution: Fixed
Medium
3.0.3, 5.2.7
3
3
For instance, search for issues containing the word 'customer': you get many hits for 'custom', even when the word is quoted.
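The behaviour is easiest to see with a toy index. Below is a minimal Python sketch. It assumes, as in a typical Lucene-style setup, that issue text and query text pass through the same analyzer. `aggressive_stem` is a hypothetical, deliberately crude suffix stripper standing in for an aggressive stemmer. It is not the Porter rules, and `analyze`, `build_index`, `search` and the sample issue keys are also illustrative. The tokenizer drops the quote characters, so the quoted query is stemmed like any other text, and 'customer' becomes 'custom' before the lookup.

```python
import re
from collections import defaultdict

# Hypothetical suffix list for a crude aggressive stemmer (NOT the Porter rules).
AGGRESSIVE_SUFFIXES = ("ers", "er", "s")  # longest first


def aggressive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in AGGRESSIVE_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, split on non-letters (this drops quotes), then stem."""
    return [aggressive_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed term to the IDs of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return documents containing every query term (positions ignored)."""
    terms = analyze(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


docs = {
    "ISSUE-1": "Customer cannot log in",
    "ISSUE-2": "Add a custom field to the form",
}
index = build_index(docs)
print(sorted(search(index, '"customer"')))  # ['ISSUE-1', 'ISSUE-2']
```

With a stemmer that only removes a trailing "s", 'customer' would be indexed and queried unchanged, and only ISSUE-1 would match.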
has a derivative of:
- JRASERVER-33739 Stemming options for indexing in the english language (Closed)
- JRASERVER-33911 List of words to exclude from stemming during indexing (Closed)

is duplicated by:
- JRASERVER-9240 Searching exact word matches should not ignore "common" words (Closed)
- JRASERVER-15006 Text-Search using Wildcards and German Umlauts does not work (Closed)
- JRASERVER-10887 Searching for the term "HTTPS" returns false positives. (Closed)

is related to:
- JRASERVER-6187 wildcard search fails to find matches (Closed)
- JRASERVER-12947 Wildcard searching does not work on long english text (Closed)
- JRASERVER-14641 Impossible to distinguish between a space and an underscore in a search query (Closed)
- JRASERVER-19211 Changing the Indexing language does not inform the user that they must do a re-index. (Closed)
- JRASERVER-14574 Searching on Text Field custom field does not return the expected result (Gathering Impact)
- JRASERVER-13441 Provide option for partial searches in hyphen-separated numbers (Closed)
- JRASERVER-14712 Cannot search JIRA issue summaries containing mixed English and Japanese characters (Closed)
- JRASERVER-15087 Search, Quick Search doesn't find characters within a word (Closed)

relates to:
- CONFSERVER-10856 Corrupt search with Umlaute (Closed)
- JRASERVER-32054 Apostrophe is not a word separator (Closed)
- JRASERVER-13672 Better searching when stemming is in place. Improve Lucene QueryParser to perform analysis on prefixed queries. (Closed)
- JRASERVER-17463 Better exact-text searching (Gathering Interest)
To everybody interested in this issue,
We've added two new indexing language options to JIRA in 6.0.5 called "English - Minimal Stemming" and "English - Moderate Stemming". The minimal stemmer uses the s-stemming algorithm and only stems plurals ending in "s". The moderate stemmer uses the KStem algorithm and uses a dictionary when it stems words to avoid conflating some words with others (for example, customer and customise). Moderate Stemming is the recommended choice for the English language and new installations of JIRA will use this indexing option by default. The existing algorithm has been renamed to "English - Aggressive Stemming" - existing installations will continue to use this stemmer until an alternative is manually specified and a reindex performed (a background reindex will work here).
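To make the "minimal" option concrete, here is a Python sketch of an s-only plural stemmer modelled on Harman's S-stemmer, the algorithm Lucene's EnglishMinimalStemFilter documents itself as implementing. The comment does not say which implementation JIRA ships, so treat `s_stem` as a hypothetical illustration of the technique, not JIRA's code. It shows why this option avoids the original complaint: 'customer' has no trailing "s", so it is never reduced to 'custom'.

```python
def s_stem(word: str) -> str:
    """Plural-only stemmer modelled on Harman's S-stemmer.

    Assumes `word` is already lowercased. Only words ending in "s" change.
    """
    if len(word) < 3 or not word.endswith("s"):
        return word
    if word[-2] in ("u", "s"):
        # "-us" and "-ss" endings are usually not plurals (status, class).
        return word
    if word[-2] == "e":
        # "-ies" -> "-y" (queries -> query), unless preceded by "a" or "e".
        if len(word) > 3 and word[-3] == "i" and word[-4] not in ("a", "e"):
            return word[:-3] + "y"
        # Remaining "-ies", "-aes", "-oes", "-ees" endings are left alone (shoes).
        if word[-3] in ("i", "a", "o", "e"):
            return word
    # Everything else just loses its final "s" (customers -> customer).
    return word[:-1]


for w in ["customers", "customer", "custom", "queries", "status", "horses", "shoes"]:
    print(w, "->", s_stem(w))
```

The moderate (KStem) option goes further by checking candidate stems against a dictionary, which this sketch does not attempt.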
In JIRA 6.1, instances still using the Aggressive setting will have the algorithm behind that setting automatically upgraded from the Porter algorithm to the slightly more advanced Snowball algorithm, which many of the non-English languages already use.
On a related note, stemming is a tricky business and has different requirements in different scenarios. For illustration, most instances want to treat "custom" and "customise" as the same root word (custom in the sense of bespoke), while a small number of instances might need "custom" to refer to culture and so want to treat "customise" as a different word. Because of edge cases like this, we will never have a perfect "out of the box" solution that works for everyone. We've created a feature request, JRA-33911, where you can register interest if you need to customise which words are stemmed. JRA-33911 is also a good place to discuss and vote on this.
Happy searching,
Eric
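JRA-33911 asks for a list of words to exclude from stemming. One way such a list could work is as a check that runs before any stemmer, in the spirit of Lucene's KeywordMarkerFilter. The Python sketch below is hypothetical. `with_exclusions` and `toy_aggressive_stem` are illustrative names, the toy stemmer is not Porter or Snowball, and nothing here describes how JIRA actually handled the request.

```python
from typing import Callable, Iterable


# Hypothetical crude aggressive stemmer (NOT Porter or Snowball), used only
# so there is something to wrap. Tokens are assumed to be lowercased already.
def toy_aggressive_stem(token: str) -> str:
    for suffix in ("ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def with_exclusions(
    stem: Callable[[str], str], excluded: Iterable[str]
) -> Callable[[str], str]:
    """Wrap `stem` so that any token in `excluded` is returned unchanged."""
    excluded_set = frozenset(excluded)

    def stem_with_exclusions(token: str) -> str:
        if token in excluded_set:
            return token  # protected: skip stemming entirely
        return stem(token)

    return stem_with_exclusions


stem = with_exclusions(toy_aggressive_stem, ["customer"])
print(stem("customer"), stem("customers"), stem("players"))
```

Because the list matches exact tokens, "customers" still collapses to "custom" unless it is listed too. A real implementation might instead map words to explicit stems, as Lucene's StemmerOverrideFilter does.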