Jira Data Center / JRASERVER-73160

Increasing Numeric Text Index Character Length Limitation for Lucene Indexes


Details


    Description

      Problem Definition

      By default, Lucene splits text into individual words (tokens) and indexes each one. The default maximum token length is 255 characters, and Lucene has problems tokenizing anything longer. Jira uses Lucene analyzers for tokenization, and these handle numeric and alphanumeric values differently. For a comma-separated numeric text longer than 255 characters, Lucene strips the commas and indexes each number individually, but it also tries to index the whole string, commas included, as a single token. That token exceeds the limit, tokenization fails, and no index is created for the text at all. As a result, when Jira searches for issues by a number that appears in such a string inside the Description, it cannot find them.
      Alphanumeric text goes through the same procedure, except for the last step: no index is attempted for the whole word.
      When text longer than 255 characters, such as the example below, is entered in a text field such as Description, Jira does not index the text:

      1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106
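      To make the failure condition concrete, here is a minimal Python sketch that rebuilds the sample string and checks it against the limit. The `model_tokens` function and its regular expression are a simplified, hypothetical model of the behaviour described above (digits joined by commas are kept as one token). It is not Lucene's or Jira's actual tokenizer, and the 255 limit is taken from this issue's text.

```python
import re

MAX_TOKEN_LENGTH = 255  # assumed limit, as stated in this issue

# The sample from the issue: the numbers 1..106 (8 is absent), comma separated.
sample = ",".join(str(n) for n in range(1, 107) if n != 8)

def model_tokens(text):
    """Simplified model, NOT the real analyzer: digits joined by commas
    form a single token; other text splits into alphanumeric runs."""
    return re.findall(r"\d+(?:,\d+)+|[A-Za-z0-9]+", text)

tokens = model_tokens(sample)
oversized = [t for t in tokens if len(t) > MAX_TOKEN_LENGTH]

print(len(sample))     # total length of the sample string
print(len(tokens))     # number of tokens in this model
print(len(oversized))  # tokens longer than the assumed limit
```

      Under this model the whole sample is one token that is longer than the limit, which matches the situation in which the text is not indexed.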
      

      Suggested Solution

      Implement an analysis function for the Lucene analyzer that supports tokens longer than 255 characters.

      Workaround

      As a workaround, change the string in the description from purely numeric to alphanumeric, or split it with whitespace. To make it alphanumeric, insert a letter somewhere in the string.
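      The whitespace variant of the workaround can be sketched in Python as follows. The helper name `split_with_whitespace` is hypothetical and the 255 limit is taken from this issue's text. The script only rewrites the string before it is pasted into the field and does not touch Jira or Lucene.

```python
import re

MAX_TOKEN_LENGTH = 255  # assumed limit, as stated in this issue

def split_with_whitespace(text):
    """Put a space after every comma so each number becomes its own word."""
    return re.sub(r",\s*", ", ", text)

# Same sample as in the issue: 1..106 without 8, comma separated.
original = ",".join(str(n) for n in range(1, 107) if n != 8)
rewritten = split_with_whitespace(original)

# After the rewrite, no whitespace-delimited word is near the limit.
longest = max(len(word) for word in rewritten.split())
print(rewritten[:20])
print(longest <= MAX_TOKEN_LENGTH)
```

      Each whitespace-delimited word is now a single short number, so no word comes close to the limit.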

          People

            Unassigned
            c55b673763fb alperf