- Suggestion
- Resolution: Won't Fix
The default OR-based searches seem to rank recurrences of a search term too highly. When searching for multiple terms, a partially matching document that repeats one or more terms many times is ranked above a full match. In my experience, complete matches are far more relevant than partial ones, and recurrence is not that significant.
If the results are getting 'spammed' by recurring terms, the problem becomes even more noticeable when you narrow your search, because adding more terms doesn't substantially affect the results.
Here is an example where the most highly ranked full match goes from first to 32nd as a result of allowing partial matches. The search terms I chose aren't the most accurate but do show how much of an effect recurrences have on the results:
Searching for 'CONF Error AND repository AND Exception' points straight to CONF-8422, which solved the problem the customer was trying to report.
Searching for 'CONF Error repository Exception' shifts that hit to the 32nd result, beneath results that did not solve my problem. The first result doesn't contain the term 'error' and has been boosted by 9 occurrences of 'repository' and 4 of 'exception', mostly from a stack trace.
I believe it would make searching more accurate to lower the weight of recurring terms in Lucene or, ideally, to switch to AND as the default search operator. Does anyone else agree? Let me know if I'm wrong on any of this.
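To make the effect concrete, here is a minimal Python sketch of the ranking behaviour described above. It is a toy model, not Lucene's actual scoring formula: the two documents are hypothetical bags of words that reuse the counts from the example (9 'repository', 4 'exception', no 'error'), and the function names are invented for illustration.

```python
from collections import Counter

QUERY = ["error", "repository", "exception"]

# Hypothetical documents, reduced to bags of words for illustration.
DOCS = {
    "full_match": ["error", "repository", "exception"],
    "partial_match": ["repository"] * 9 + ["exception"] * 4,  # e.g. a stack trace
}

def occurrences(words, query):
    """Total number of times any query term appears in the document."""
    counts = Counter(words)
    return sum(counts[term] for term in query)

def coverage(words, query):
    """Number of distinct query terms present in the document."""
    present = set(words)
    return sum(1 for term in query if term in present)

def rank_by_occurrences(docs, query):
    """OR-style ranking driven purely by repeated terms (toy model)."""
    return sorted(docs, key=lambda name: occurrences(docs[name], query), reverse=True)

def rank_by_coverage(docs, query):
    """Completeness first; recurrence only breaks ties."""
    return sorted(
        docs,
        key=lambda name: (coverage(docs[name], query), occurrences(docs[name], query)),
        reverse=True,
    )

def match_all(docs, query):
    """AND-style search: keep only documents containing every term."""
    return [name for name in docs if coverage(docs[name], query) == len(query)]

print(rank_by_occurrences(DOCS, QUERY))
print(rank_by_coverage(DOCS, QUERY))
print(match_all(DOCS, QUERY))
```

In this sketch, ranking purely by occurrence count puts the partial match first, while ranking by completeness (or requiring all terms, as an AND default would) puts the full match first. Real scoring also involves factors this model ignores, such as term rarity and document length.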
- relates to JRASERVER-5930 Default search operator should be AND, not OR (Closed)