  Jira Data Center / JRASERVER-76392

Attempts to index a Lucene field with an immense term still fail the issue index


Details

    Description

      Issue Summary

      According to the fixed bug JSWSERVER-20133 and the associated developer announcement, fields that exceed the limit of 32766 bytes should be removed before they are committed to Lucene. The intention appears to be to raise an ERROR on such attempts to warn the plugin developer, while leaving the index otherwise unaffected.
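
      For context, the behaviour described there amounts to stripping the oversized field and logging an error before the document reaches the IndexWriter. The following is a minimal sketch of such a guard, assuming plain Lucene and SLF4J; the class name, log wording, and byte-length check are illustrative assumptions, not Jira's actual implementation:

      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;

      import org.apache.lucene.document.Document;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexableField;
      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;

      // Illustrative guard: strip any field whose UTF-8 encoded value exceeds Lucene's
      // hard term limit before the document reaches IndexWriter.addDocument, logging an
      // error instead of letting the whole document fail.
      public final class ImmenseTermGuard {
          private static final Logger log = LoggerFactory.getLogger(ImmenseTermGuard.class);

          public static void removeImmenseFields(Document doc) {
              List<String> immense = new ArrayList<>();
              for (IndexableField field : doc.getFields()) {
                  String value = field.stringValue();
                  // For untokenized fields the whole value is indexed as a single term,
                  // so its UTF-8 length must stay within IndexWriter.MAX_TERM_LENGTH (32766).
                  if (value != null
                          && value.getBytes(StandardCharsets.UTF_8).length > IndexWriter.MAX_TERM_LENGTH) {
                      immense.add(field.name());
                  }
              }
              for (String name : immense) {
                  doc.removeFields(name);
                  log.error("A document contained a potential immense term in field {}. "
                          + "The field has been removed from the document.", name);
              }
          }
      }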

      However, there is evidence that Jira still attempts to commit the oversized data to Lucene, causing the failure mode described below.

      (1)

      The large value still makes its way to the Lucene library:

      2023-10-17 02:07:51,592+0000 IssueIndexer:thread-1 WARN admin 127x701x1 tuce66 127.0.0.1 /secure/admin/IndexReIndex!reindex.jspa [c.a.jira.index.AccumulatingResultBuilder] Document contains at least one immense term in field="customfield_10300" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[88, 95, 116, 99, 114, 100, 95, 119, 115, 100, 95, 112, 101, 114, 105, 109, 101, 116, 101, 114, 95, 115, 99, 97, 95, 118, 51, 32, 104, 97]...', original message: bytes can be at most 32766 in length; got 192093
      java.lang.IllegalArgumentException: Document contains at least one immense term in field="customfield_10300" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[88, 95, 116, 99, 114, 100, 95, 119, 115, 100, 95, 112, 101, 114, 105, 109, 101, 116, 101, 114, 95, 115, 99, 97, 95, 118, 51, 32, 104, 97]...', original message: bytes can be at most 32766 in length; got 192093
      	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:796)
      	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
      	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
      	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:240)
      	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:497)
      	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1729)
      	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
      	at com.atlassian.jira.index.MonitoringIndexWriter.addDocument(MonitoringIndexWriter.java:65)
      	at com.atlassian.jira.index.WriterWrapper.addDocuments(WriterWrapper.java:115)
      	at com.atlassian.jira.index.WriterWithStats.addDocuments(WriterWithStats.java:133)
      	at com.atlassian.jira.index.Operations$Create.perform(Operations.java:166)
      	at com.atlassian.jira.index.Operations$Completion.perform(Operations.java:346)
      	at com.atlassian.jira.index.DefaultIndexEngine$FlushPolicy.perform(DefaultIndexEngine.java:86)
      	at com.atlassian.jira.index.DefaultIndexEngine.write(DefaultIndexEngine.java:150)
      	at com.atlassian.jira.index.DefaultIndex.perform(DefaultIndex.java:28)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssueOperation.createResult(DefaultIssueIndexer.java:1078)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssueOperation.createResult(DefaultIssueIndexer.java:1052)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$EntityOperation.lambda$null$7(DefaultIssueIndexer.java:910)
      	at java.base/java.util.Optional.map(Optional.java:265)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$EntityOperation.lambda$createResult$8(DefaultIssueIndexer.java:910)
      	at java.base/java.util.HashMap.forEach(HashMap.java:1337)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$EntityOperation.createResult(DefaultIssueIndexer.java:909)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer$EntityOperation.perform(DefaultIssueIndexer.java:884)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer.lambda$processInnerBatch$6(DefaultIssueIndexer.java:338)
      	at com.atlassian.jira.index.SimpleIndexingStrategy.apply(SimpleIndexingStrategy.java:7)
      	at com.atlassian.jira.index.SimpleIndexingStrategy.apply(SimpleIndexingStrategy.java:5)
      	at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:47)
      	at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:43)
      	at com.atlassian.jira.util.concurrent.BoundedExecutor$2.call(BoundedExecutor.java:68)
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 192093
      	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:263)
      	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
      	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
      	... 32 more
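
      This exception comes from Lucene's hard per-term limit and can be reproduced outside Jira. Below is a minimal standalone sketch, assuming a plain Lucene setup; the field name and the 192093-byte size are copied from the log above purely for illustration:

      import java.nio.file.Files;
      import java.util.Arrays;

      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.document.Document;
      import org.apache.lucene.document.Field;
      import org.apache.lucene.document.StringField;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.store.Directory;
      import org.apache.lucene.store.FSDirectory;

      // Standalone reproduction of the underlying Lucene behaviour (not Jira code):
      // a single untokenized value longer than 32766 bytes makes addDocument throw an
      // IllegalArgumentException wrapping MaxBytesLengthExceededException, and the
      // whole document is rejected rather than just the offending field.
      public class ImmenseTermRepro {
          public static void main(String[] args) throws Exception {
              try (Directory dir = FSDirectory.open(Files.createTempDirectory("immense-term"));
                   IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                  Document doc = new Document();
                  // 192093 bytes of ASCII, mirroring the size reported in the log above.
                  char[] chars = new char[192093];
                  Arrays.fill(chars, 'a');
                  doc.add(new StringField("customfield_10300", new String(chars), Field.Store.NO));
                  writer.addDocument(doc); // throws: "bytes can be at most 32766 in length; got 192093"
              }
          }
      }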
      
      (2)

      The failure then flows upward to the calling method:

      • The full reindex is marked as failed
      • The index recovery "catch up" phase is marked as failed, which in turn triggers a full reindex (see JRASERVER-76391)

      Steps to Reproduce

      Expected Results

      • The message "A document contained a potential immense term in field customfield_10220_timeline. The field has been removed from the document." is logged
      • No further errors are logged and indexing proceeds normally. Possibly, the issue is not marked as failed.

      Actual Results

      • org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException is raised
      • The issue is marked as failed
      • The failure flows upward to the parent method (full reindex, index recovery, etc.)

      Workaround

      Currently there is no known workaround for this behavior. A workaround will be added here when available.
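
      That said, the developer announcement referenced above asks plugin developers not to produce such terms in the first place, so plugin authors who control the custom field indexer generating the oversized value can avoid triggering the failure at the source. The helper below is a hypothetical sketch for truncating a value to Lucene's 32766-byte term limit on a valid UTF-8 character boundary; it is not an official workaround, and the class and method names are assumptions:

      import java.nio.ByteBuffer;
      import java.nio.CharBuffer;
      import java.nio.charset.CharsetEncoder;
      import java.nio.charset.CodingErrorAction;
      import java.nio.charset.StandardCharsets;

      // Hypothetical helper a plugin's custom field indexer could call before adding a
      // value to the Lucene document, keeping any single indexed value under the
      // 32766-byte term limit without splitting a multi-byte character.
      public final class TermLimit {
          public static final int MAX_TERM_BYTES = 32766;

          public static String truncateToTermLimit(String value) {
              if (value == null
                      || value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES) {
                  return value;
              }
              CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                      .onMalformedInput(CodingErrorAction.IGNORE)
                      .onUnmappableCharacter(CodingErrorAction.IGNORE);
              ByteBuffer out = ByteBuffer.allocate(MAX_TERM_BYTES);
              // encode() fills the output buffer and stops at the last complete character
              // that fits, so the truncated prefix is still valid UTF-8.
              encoder.encode(CharBuffer.wrap(value), out, true);
              return new String(out.array(), 0, out.position(), StandardCharsets.UTF_8);
          }
      }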


      People

        Assignee: Unassigned
        Reporter: Alex [Atlassian,PSE] (allewellyn@atlassian.com)
        Votes: 4
        Watchers: 8
