Type: Suggestion
Resolution: Unresolved
Component/s: Infrastructure & Services, Work Item - Search - Backend - JVIS
Labels:
- affects-server
- jsw-s2

UIS: 13
Support reference count: 28

NOTE: This suggestion is for JIRA Cloud. Using JIRA Server? See the corresponding suggestion.

(See linked issues for failing use cases.)

All of our indexing goes through the EnglishAnalyzer, which passes the stream through several filters: lowercasing, stop word removal, apostrophe removal, tokenizing on underscore and dash, and stemming. This works well for natural-language queries but has numerous cases where it fails to return the results the user expects. What we need is some kind of secondary "exact text" index that we can use to boost results.
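To make the failure mode concrete, here is a minimal sketch in plain Java that contrasts a toy version of the filter chain with whitespace-only tokenization. It does not use Lucene: the stop-word list, the `crudeStem` rule, and the class and method names are all hypothetical stand-ins for what the EnglishAnalyzer actually does, chosen only to show how the literal text is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisLossDemo {
    // Hypothetical, tiny stop-word list; the real analyzer's list is longer.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of", "to");

    /** Toy "normal" chain: lowercase, split on whitespace/underscore/dash,
     *  strip apostrophes, drop stop words, apply a crude stem. */
    public static List<String> normalTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[\\s_\\-]+")) {
            String token = raw.replace("'", "");
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            out.add(crudeStem(token));
        }
        return out;
    }

    /** Toy "exact" chain: split on whitespace only and keep everything else as typed. */
    public static List<String> exactTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                out.add(raw);
            }
        }
        return out;
    }

    /** Deliberately naive suffix stripping; NOT the stemmer Lucene uses. */
    static String crudeStem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        String summary = "Indexing of user_name fails";
        System.out.println(normalTokens(summary)); // [index, user, name, fail]
        System.out.println(exactTokens(summary));  // [Indexing, of, user_name, fails]
    }
}
```

After the normal chain, nothing in the token list records that the summary contained the literal text `user_name`, so a search for exactly that string cannot be told apart from a search for "user" and "name".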

One way would be to add a new "exact query" field in the SummaryIndexer, EnvironmentIndexer, and DescriptionIndexer, and use a PerFieldAnalyzerWrapper to give that field a "dumber" Analyzer built on the WhitespaceTokenizer instead of the StandardTokenizer. Then we'd need to fix querying to understand that this new field exists. We could either overload the tilde operator and make it smart enough to search across both fields and combine the results sensibly, or make the equals operator handle text (right now it throws an exception).

We'd also want some kind of UI configuration to enable or disable this, since it changes behaviour and not everyone is going to want the new behaviour. This would likely break the existing unit tests and func tests for indexing and JQL, so we'd need to fix those.

We'd probably want to write new Analyzers for German, French, and Spanish as well, instead of limiting this new behaviour to our English users, and possibly for Chinese/Japanese/Korean too. We'd want better func test coverage on searching in non-English languages to make sure this change doesn't have adverse effects for non-English users.
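The "search both fields and combine the results" idea can be sketched roughly as follows. This is a toy model in plain Java, not Lucene or JQL code: the two tokenizers stand in for the normal and "exact query" fields, and `score`, `exactBoost`, and the scoring weights are hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DualFieldScoreDemo {
    /** Toy "normal" field: lowercase and split on anything that is not a letter or digit. */
    public static Set<String> analyzedTokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Toy "exact query" field: whitespace-separated tokens, otherwise untouched. */
    public static Set<String> exactTokens(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    /** Base score of 1.0 for a match in the normal field, plus exactBoost
     *  when the literal query term also appears in the exact field. */
    public static double score(String fieldText, String queryTerm, double exactBoost) {
        double score = 0.0;
        Set<String> queryTokens = analyzedTokens(queryTerm);
        if (!queryTokens.isEmpty() && analyzedTokens(fieldText).containsAll(queryTokens)) {
            score += 1.0;
        }
        if (exactTokens(fieldText).contains(queryTerm)) {
            score += exactBoost;
        }
        return score;
    }

    public static void main(String[] args) {
        // Both summaries match in the normal field; only the first holds the literal term.
        System.out.println(score("user_name is null", "user_name", 2.0));     // 3.0
        System.out.println(score("the user changed name", "user_name", 2.0)); // 1.0
    }
}
```

Both issues are still returned, so existing natural-language behaviour is preserved; the exact field only changes the ordering. Whether a boost like this or a hard filter is the right combination rule is one of the open questions above.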

We'd also need to figure out how to handle text custom fields in all of this.

Another approach would be something along the lines of the SynonymAnalyzer described in "Lucene in Action": have a single Analyzer emit multiple tokens at the same virtual position. Under the covers, the Analyzer would multiplex TokenStreams from a "normal" Analyzer and a "dumb" Analyzer. I'm not sure whether sharing the Reader between Tokenizers like that works; if not, we might have to make changes to Lucene core. Having both kinds of tokens in the same index field seems like it ought to give better results for these "exact searches", but it might have adverse effects on normal search ordering. This approach would probably have less impact on JQL but carries a higher risk of running into shortcomings in the Lucene APIs.
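As a rough illustration of "multiple tokens at the same virtual position", the sketch below emits each word as typed with a position increment of 1, followed by its normalized form with an increment of 0 when the two differ. It is plain Java rather than a Lucene TokenStream; the `Token` class, `multiplex`, and the naive `normalize` rule are hypothetical and only model the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SamePositionTokensDemo {
    /** A term plus its distance from the previous token; 0 means "same position". */
    public static final class Token {
        public final String term;
        public final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    /** Toy stand-in for the "normal" chain: lowercase plus naive suffix stripping. */
    static String normalize(String word) {
        String t = word.toLowerCase(Locale.ROOT);
        if (t.length() > 5 && t.endsWith("ing")) {
            return t.substring(0, t.length() - 3);
        }
        if (t.length() > 3 && t.endsWith("s")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    /** Emits the exact word first, then its normalized form stacked on the same position. */
    public static List<Token> multiplex(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new Token(word, 1));
            String normalized = normalize(word);
            if (!normalized.equals(word)) {
                out.add(new Token(normalized, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(multiplex("Indexing fails")); // [Indexing/1, index/0, fails/1, fail/0]
    }
}
```

The sketch sidesteps the harder case where the normal chain splits one exact token into several (for example on underscore or dash), since those pieces do not map onto a single shared position.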

is related to
- JRACLOUD-5930 Default search operator should be AND, not OR (Closed)
- JRACLOUD-13672 Better searching when stemming is in place. Improve Lucene QueryParser to perform analysis on prefixed queries. (Closed)
- JRACLOUD-21372 Allow exact-text searching in JQL (Closed)
- JRASERVER-17463 Better exact-text searching (Gathering Interest)

relates to
- JRACLOUD-72794 Searching for text using quotation marks doesn't return the exact phrase (Closed)
- JRACLOUD-73342 JQL search is treating the letter a as a wildcard (Closed)
- JRACLOUD-75866 Searching JIRA issues for special characters (in text fields) does not work (Future Consideration)

resolves: ACE-5995

Assignee: Unassigned
Reporter: Justus Pendleton (Inactive)
Votes: 49
Watchers: 32

Created: 29/May/2009 6:10 AM
Updated: Yesterday 5:14 AM

Estimated: 240h
Remaining: 240h
Logged: Not Specified
