  1. Jira Software Data Center
  2. JSWSERVER-20133

Fix the Lucene immense field indexing failure

Details

    Description

      Issue Summary

      Both background and foreground reindex requests fail when an indexed field contains a term whose UTF-8 encoding is longer than 32766 bytes.

      Public KB: https://confluence.atlassian.com/jirakb/indexing-fails-due-to-immense-field-974366114.html

      Environment

      This affects JIRA 8.0 and above.
      In JIRA 8.0, Lucene was upgraded from 3.3 to 7.3. Lucene will now throw an exception instead of silently failing.
      See Lucene upgrade for more information.

      Steps to Reproduce

      1. Insert more than 32766 characters into an indexed field.
      2. Attempt to reindex.
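
      The limit quoted in the error is measured in UTF-8 bytes, not characters, so a value with multi-byte characters reaches it with fewer characters. The following is a minimal Python sketch for checking a test value before trying to reproduce the problem. The helper names are hypothetical and not part of Jira or Lucene, and the sketch assumes the field value is indexed as a single term:

```python
# Hypothetical helpers for illustration; not part of Jira or Lucene.
MAX_TERM_BYTES = 32766  # limit quoted in the Lucene error message


def utf8_length(value: str) -> int:
    """Return the size of the value once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the encoded value is longer than the limit."""
    return utf8_length(value) > limit


ascii_value = "a" * 32767      # one byte per character
accented_value = "é" * 16384   # two bytes per character in UTF-8
```

      Both sample values are over the limit, although the second contains only 16384 characters.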

      Expected Results

      The issue reindex completes without error.

      Actual Results

      The Issue reindex fails and the following error appears in the atlassian-jira.log:

      2019-07-02 16:32:24,247 IssueIndexer:thread-9 WARN 10.x.x.x /secure/admin/jira/IndexReIndex!reindex.jspa [c.a.jira.index.AccumulatingResultBuilder] Document contains at least one immense term in field="<fieldname>" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[list of entries in the field]...', original message: bytes can be at most 32766 in length; got 32767
      2019-07-02 16:32:24,248 JiraTaskExectionThread-2 WARN 10.x.x.x /secure/admin/jira/IndexReIndex!reindex.jspa [c.a.jira.index.AccumulatingResultBuilder] Indexing failed for Issue - '<#####>'
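
      To see which fields trigger the warning, the log can be scanned for the field name in the message. This is a rough Python sketch that assumes the wording of the WARN line quoted above; the function name is hypothetical and other Jira versions may phrase the message differently:

```python
import re

# Pattern taken from the WARN line quoted in this issue (an assumption
# about the log format, which may differ between versions).
IMMENSE_TERM = re.compile(r'immense term in field="([^"]+)"')


def immense_fields(log_lines):
    """Return the distinct field names reported as immense, in order of appearance."""
    seen = []
    for line in log_lines:
        match = IMMENSE_TERM.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

      The function accepts any iterable of lines, so an open handle on atlassian-jira.log can be passed to it directly.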

      Notes

      Relevant links:
      https://stackoverflow.com/questions/37070593/how-to-deal-with-document-contains-at-least-one-immense-term-in-solr
      https://issues.apache.org/jira/browse/LUCENE-5472
      https://discuss.elastic.co/t/error-document-contains-at-least-one-immense-term-in-field/66486

      Notes on fix:

      Starting with "Fix-Version", fields that exceed this limit will be removed before they are committed to Jira’s Lucene index in order to prevent entire indexing operations from failing. Each such event will emit an ERROR level message to the logs, allowing plugin developers to pinpoint the offending fields. The log entry looks like this:

      2019-09-04 20:06:24,091 IssueIndexer:thread-6 ERROR admin 1206x3196x1 ujgapq 0:0:0:0:0:0:0:1 /secure/admin/IndexReIndex!reindex.jspa [c.a.j.issue.index.DocumentScrubber] A document contained a potential immense term in field customfield_10220_timeline. The field has been removed from the document.
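
      The behavior described above can be illustrated with a small Python sketch. It is not the code of Jira's DocumentScrubber: it models a document as a plain dict of field name to string value, and the function and logger names are hypothetical:

```python
import logging

log = logging.getLogger("document_scrubber")  # hypothetical logger name

MAX_TERM_BYTES = 32766  # limit quoted in the Lucene error message


def scrub_document(document: dict, limit: int = MAX_TERM_BYTES) -> dict:
    """Return a copy of the document without fields whose UTF-8 value exceeds the limit."""
    scrubbed = {}
    for field, value in document.items():
        if len(value.encode("utf-8")) > limit:
            # Drop only the offending field so the rest of the document is still indexed.
            log.error(
                "A document contained a potential immense term in field %s. "
                "The field has been removed from the document.",
                field,
            )
            continue
        scrubbed[field] = value
    return scrubbed
```

      Dropping a single field trades searchability of that field for a reindex that can complete, which matches the intent stated in the notes on the fix.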
      

      Workaround

      There is currently no known workaround for this behavior. One will be added here when available.

            People

              klopacinski Karol Lopacinski
              samann Sarah A