Details

Type: Suggestion
Resolution: Unresolved
Fix Version/s: None
Component/s: Indexing
Labels:
- affects-server

UIS: 3
Support reference count: 1

Description

NOTE: This suggestion is for JIRA Server. Using JIRA Cloud? See the corresponding suggestion.

(See linked issues for failing use cases.)

All of our indexing goes through the EnglishAnalyzer, which passes the stream through several filters: lowercasing, stop word removal, apostrophe removal, tokenizing on underscore and dash, and stemming. This works pretty well for natural-language queries, but there are numerous cases where it fails to return the results that users expect. What we need is some kind of secondary "exact text" index that we can use to boost results.
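To make the failure mode concrete, here is a minimal sketch in plain Java (no Lucene dependency) of how a filter chain like the one described above collapses distinct inputs into the same terms. Everything in it is an assumption for illustration: the class `ToyAnalysisChain`, the tiny stop list, and the crude suffix-stripping `toyStem` are hypothetical stand-ins, not the real EnglishAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model of the analysis chain; not Lucene's EnglishAnalyzer.
class ToyAnalysisChain {
    // Tiny illustrative stop list (an assumption; real lists are longer).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "is", "of", "to");

    // Crude suffix stripping standing in for a real stemmer.
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) return term.substring(0, term.length() - 3);
        if (term.length() > 3 && term.endsWith("s")) return term.substring(0, term.length() - 1);
        return term;
    }

    // Split on whitespace, underscore and dash; lowercase; strip apostrophes;
    // drop stop words; stem.
    static List<String> englishLike(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("[\\s_\\-]+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (term.endsWith("'s")) term = term.substring(0, term.length() - 2);
            term = term.replace("'", "");
            if (term.isEmpty() || STOP_WORDS.contains(term)) continue;
            out.add(toyStem(term));
        }
        return out;
    }

    // The "dumb" alternative: split on whitespace only and keep terms verbatim.
    static List<String> exact(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) out.add(raw);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(englishLike("The Customer's foo_bar-baz"));
        System.out.println(englishLike("customers foo bar baz"));
        System.out.println(exact("The Customer's foo_bar-baz"));
    }
}
```

In this toy model both inputs produce the same analyzed terms, so a search for `foo_bar` cannot be told apart from a search for `foo bar`, while the whitespace-only tokens preserve the difference.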

One way would be to create a new "exact query" field in the SummaryIndexer, EnvironmentIndexer, and DescriptionIndexer, and use a PerFieldAnalyzerWrapper to give that field a "dumber" Analyzer built on the WhitespaceTokenizer instead of the StandardTokenizer. Then we'd need to fix querying to understand that this new field exists. We could either overload the tilde operator and make it smart enough to search across both fields and combine the results sensibly, or make the equals operator handle text (right now it throws an exception).

We'd also want some kind of UI configuration to enable/disable this, since it changes behaviour and not everyone is going to want the new behaviour. This would likely break the existing unit tests and func tests for indexing and JQL, so we'd need to fix those. We'd probably want to write new Analyzers for German, French, and Spanish as well, instead of limiting this new behaviour to our English users. Possibly Chinese/Japanese/Korean as well. We'd want better func test coverage on searching in non-English languages to make sure this change doesn't have adverse effects for non-English users.
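The two-field idea can be sketched without Lucene as a toy in-memory search that keeps an analyzed view and an exact view of each document and boosts hits on the exact view. This is only an illustration under stated assumptions: `ToyDualFieldSearch`, the simplified `analyzed` chain (lowercase plus splitting on whitespace, underscore and dash), and the `EXACT_BOOST` value are all hypothetical, and real scoring in Lucene works differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model of "analyzed field + exact field"; not JIRA's indexers.
class ToyDualFieldSearch {
    static final double EXACT_BOOST = 2.0; // arbitrary illustrative weight

    // Simplified stand-in for the normal analyzed field.
    static Set<String> analyzed(String text) {
        Set<String> terms = new HashSet<>();
        for (String raw : text.trim().split("[\\s_\\-]+")) {
            if (!raw.isEmpty()) terms.add(raw.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Stand-in for the proposed exact field: whitespace tokens, unchanged.
    static Set<String> exact(String text) {
        Set<String> terms = new HashSet<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) terms.add(raw);
        }
        return terms;
    }

    // One point per analyzed-term match, plus a boost per exact-term match.
    static double score(String query, String doc) {
        Set<String> docAnalyzed = analyzed(doc);
        Set<String> docExact = exact(doc);
        double total = 0.0;
        for (String term : analyzed(query)) if (docAnalyzed.contains(term)) total += 1.0;
        for (String term : exact(query)) if (docExact.contains(term)) total += EXACT_BOOST;
        return total;
    }

    // Return matching documents, best score first.
    static List<String> search(String query, List<String> docs) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) if (score(query, doc) > 0) hits.add(doc);
        hits.sort(Comparator.comparingDouble((String d) -> score(query, d)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("deploy foo bar today", "deploy foo_bar today", "unrelated text");
        System.out.println(search("foo_bar", docs));
    }
}
```

Here a query for `foo_bar` still finds the document containing `foo bar` through the analyzed view, but the document containing the literal `foo_bar` ranks first because it also matches the exact view. That is the "combine the results" behaviour described above, in miniature.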

We'd also need to figure out how to handle text custom fields in all of this.

Another approach would be something more along the lines of the SynonymAnalyzer described in "Lucene in Action": have a single Analyzer emit multiple tokens with the same virtual position. Under the covers, the Analyzer would multiplex TokenStreams from a "normal" Analyzer and a "dumb" Analyzer. I'm not sure whether sharing the Reader between Tokenizers like that works; if not, we might have to make changes to Lucene core. Having both kinds of tokens in the same index field seems like it OUGHT to give better results for these "exact searches", but it might have adverse effects on normal search ordering. This approach would probably have less impact on JQL but carries a higher risk of us running into shortcomings in the Lucene APIs.
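The "same virtual position" idea can be illustrated with a small plain-Java model in which each token carries a position increment, and an increment of 0 means "stacked on the previous token". This is a sketch under assumptions, not Lucene code: `ToyMultiplexedTokens`, its `Token` class, and the simplified `normalize` step are hypothetical, and the sketch sidesteps the harder case where the "normal" Analyzer splits one word into several tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of emitting normalized and raw tokens at one position.
class ToyMultiplexedTokens {
    static final class Token {
        final String term;
        final int positionIncrement; // 0 = same position as the previous token

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Simplified stand-in for the "normal" Analyzer: lowercase, drop apostrophes.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replace("'", "");
    }

    // For each whitespace-delimited word, emit the normalized term, then the raw
    // term at the same position if it differs.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            String normal = normalize(raw);
            tokens.add(new Token(normal, 1));
            if (!normal.equals(raw)) tokens.add(new Token(raw, 0));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Can't open foo_bar"));
    }
}
```

Because the raw term shares a position with its normalized form, phrase-style matching would line up for either variant. How this interacts with relevance ordering is exactly the open question raised above.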

Attachments

Issue Links

is related to
- JRASERVER-5567 Incorrect stemming causes some words to be unsearchable (Closed)
- JRASERVER-6187 wildcard search fails to find matches (Closed)
- JRASERVER-14641 Impossible to distinguish between a space and an underscore in a search query (Closed)
- JRASERVER-23088 Issues search does not search for numbers in a text field (Closed)
- JRASERVER-5930 Default search operator should be AND, not OR (Closed)
- JRASERVER-13672 Better searching when stemming is in place. Improve Lucene QueryParser to perform analysis on prefixed queries. (Closed)
- JRASERVER-21372 Allow exact-text searching in JQL (Under Consideration)

relates to
- JRASERVER-32054 Apostrophe is not a word separator (Closed)
- JRACLOUD-17463 Better exact-text searching (Gathering Interest)


Activity

People

Assignee: Unassigned

Reporter: Justus Pendleton (Inactive)

Votes: 50

Watchers: 27

Dates

Created: 29/May/2009 6:10 AM

Updated: 22/Mar/2024 2:12 AM

Time Tracking

Estimated: 240h

Remaining: 240h

Logged: Not Specified