
Key: JRA-14711
Type: Improvement
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: David Soul [Atlassian]
Votes: 0
Watchers: 0

Improve search accuracy by reducing weighting of recurring terms

Created: 27/Mar/08 01:35 AM   Updated: 27/Mar/08 09:51 PM
Component/s: Filtering & Indexing, UI / Usability
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Reference
 

Participants: Anton Mazkovoi [Atlassian] and David Soul [Atlassian]
Since last comment: 28 weeks, 3 days ago
Labels:


Description
The default OR-based searches seem to rank recurrences of a search term too highly. When searching for multiple terms, a partially matching document that repeats one or more terms many times is ranked above a full match. In my experience, more complete matches are far more relevant than less complete ones, and recurrence is not that significant.

If the results are getting 'spammed' by recurring terms, the problem becomes even more noticeable when you narrow your search, because adding more terms doesn't substantially affect the results.

Here is an example where the most highly ranked full match goes from first to 32nd as a result of allowing partial matches. The search terms I chose aren't the most accurate but do show how much of an effect recurrences have on the results:

Searching for 'CONF Error AND repository AND Exception' points straight to CONF-8422, which solved the problem the customer was trying to report.

Searching for 'CONF Error repository Exception' shifts that hit to the 32nd result, beneath results that did not solve my problem. The first result doesn't contain the term 'error' and has been boosted by 9 occurrences of 'repository' and 4 of 'exception', mostly from a stack trace.
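
The effect is easy to reproduce with a toy scorer. The sketch below is not Lucene's actual scoring formula; it assumes a simplified OR scorer that just sums raw term counts, and the function name or_score and both documents are hypothetical (the counts of 9 and 4 mirror the ones above).

```python
from collections import Counter

def or_score(query_terms, doc_tokens):
    """Toy OR scoring: sum of raw term counts (an assumption, not Lucene's formula)."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)

query = ["error", "repository", "exception"]

# Hypothetical documents: a full match that mentions each term once, and a
# partial match (no 'error') that repeats two terms, e.g. in a stack trace.
full_match = ["error", "repository", "exception"]
partial_match = ["repository"] * 9 + ["exception"] * 4

scores = {
    "full_match": or_score(query, full_match),
    "partial_match": or_score(query, partial_match),
}

# Highest score first: the partial match outranks the full match.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Under this assumption the partial match scores 13 against 3 for the full match, so repetition alone is enough to push the complete match down the list.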

I believe it would make searching more accurate to lower the weight of recurring terms in Lucene or, ideally, to switch the default search operator to AND. Does anyone else agree? Let me know if I'm wrong on any of this.
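
To make the two options concrete, here is a rough sketch of both. It is not Lucene code: the scoring rule (1 per matched term plus repeat_weight * ln(count)), the repeat_weight parameter, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def damped_or_score(query_terms, doc_tokens, repeat_weight):
    """Toy OR scoring: each matched term scores 1, plus repeat_weight * ln(count).

    repeat_weight is a hypothetical knob for how much repeats matter.
    """
    counts = Counter(doc_tokens)
    return sum(1 + repeat_weight * math.log(counts[t])
               for t in query_terms if counts[t] > 0)

def matches_all(query_terms, doc_tokens):
    """AND semantics: every query term must occur in the document."""
    return set(query_terms) <= set(doc_tokens)

query = ["error", "repository", "exception"]
full_match = ["error", "repository", "exception"]          # hypothetical
partial_match = ["repository"] * 9 + ["exception"] * 4     # hypothetical

# Option 1: lower the weight given to repeated terms.
heavy_full = damped_or_score(query, full_match, 1.0)
heavy_partial = damped_or_score(query, partial_match, 1.0)
light_full = damped_or_score(query, full_match, 0.1)
light_partial = damped_or_score(query, partial_match, 0.1)
print(heavy_partial > heavy_full)   # repeats weighted heavily
print(light_full > light_partial)   # repeats weighted lightly

# Option 2: AND as the default drops partial matches entirely.
print(matches_all(query, full_match), matches_all(query, partial_match))
```

With a repeat weight of 1.0 the partial match still wins; at 0.1 the full match comes out on top. The AND check removes the partial match before ranking, which is why it would sidestep the weighting question altogether.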



Comments
Anton Mazkovoi [Atlassian] added a comment - 27/Mar/08 09:51 PM
As Dave mentioned, switching default searching to AND from OR (JRA-5930) will probably remove the need for this.