JRASERVER-14711

Improve search accuracy by reducing the weighting of recurring terms



      The default OR-based search seems to rank recurrences of a search term too highly. When searching for multiple terms, a partially matching document that repeats one or more terms many times is ranked above a full match. In my experience, complete matches are far more relevant than partial ones, and recurrence is not that significant.

      If the results are being 'spammed' by recurring terms, the problem becomes even more noticeable when you narrow your search, because adding more terms doesn't substantially change the results.

      Here is an example where the most highly ranked full match goes from first to 32nd as a result of allowing partial matches. The search terms I chose aren't the most accurate but do show how much of an effect recurrences have on the results:

      Searching for 'CONF Error AND repository AND Exception' points straight to CONF-8422, which solved the problem the customer was trying to report.

      Searching for 'CONF Error repository Exception' shifts that hit to the 32nd result, beneath results that did not solve my problem. The first result doesn't contain the term 'error' and has been boosted by 9 occurrences of 'repository' and 4 of 'exception', mostly from a stack trace.
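
To make the effect concrete, here is a deliberately simplified sketch of OR-style ranking. It is not Lucene's actual scoring formula: it just sums raw term counts, and the two documents are hypothetical stand-ins that borrow the occurrence counts mentioned above (9 of 'repository', 4 of 'exception', no 'error').

```python
from collections import Counter


def or_score(query_terms, doc_tokens):
    """Toy OR score: add up the raw count of every query term in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[term] for term in query_terms)


query = ["error", "repository", "exception"]

# Hypothetical documents for illustration only, not the real issues.
full_match = ["error", "repository", "exception"]        # every term, once each
partial_match = ["repository"] * 9 + ["exception"] * 4   # no 'error', many repeats

scores = {
    "full_match": or_score(query, full_match),
    "partial_match": or_score(query, partial_match),
}

# Highest score first: the partial match outranks the full match.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Under this toy model the stack-trace-heavy document wins purely on repetition, even though it is missing one of the three terms.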

      I believe it would make searching more accurate to lower the weight of recurring terms in Lucene or, ideally, to switch to AND as the default for searches. Does anyone else agree? Let me know if I'm wrong on any of this.
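
The two options can be sketched roughly as follows. This is an illustration under assumptions, not a patch for Jira or Lucene: `saturated_score`, `matches_all`, and the constant `k` are hypothetical names and values, and the saturating function `count / (count + k)` is only similar in spirit to what ranking functions such as BM25 do. The documents are the same hypothetical stand-ins for a full match and a repetitive partial match.

```python
from collections import Counter


def saturated_score(query_terms, doc_tokens, k=1.2):
    """Option 1: dampen repeats and reward covering more of the query.

    Each term contributes count / (count + k), which flattens out as the
    count grows. The sum is then scaled by the fraction of query terms
    that the document actually contains.
    """
    counts = Counter(doc_tokens)
    matched = [term for term in query_terms if counts[term] > 0]
    if not matched:
        return 0.0
    tf_part = sum(counts[term] / (counts[term] + k) for term in matched)
    coverage = len(matched) / len(query_terms)
    return tf_part * coverage


def matches_all(query_terms, doc_tokens):
    """Option 2: AND semantics, keep only documents containing every term."""
    return set(query_terms) <= set(doc_tokens)


query = ["error", "repository", "exception"]

# Hypothetical documents for illustration only.
full_match = ["error", "repository", "exception"]
partial_match = ["repository"] * 9 + ["exception"] * 4

full_score = saturated_score(query, full_match)
partial_score = saturated_score(query, partial_match)
print(full_score > partial_score)           # the full match now ranks first
print(matches_all(query, full_match))       # kept by an AND search
print(matches_all(query, partial_match))    # dropped by an AND search
```

With saturation alone the repetitive document would still edge ahead in this example, so the coverage factor is what restores the full match to the top; an AND default sidesteps the question by excluding partial matches entirely.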

              Unassigned Unassigned
              david.soul@atlassian.com David Soul [Atlassian]