Increasing Numeric Text Index Character Length Limitation for Lucene Indexes

      Problem Definition

      By default, Lucene splits text into individual words (tokens) and indexes each one. The default maximum token length is 255 characters, and Lucene cannot tokenize a longer word correctly. Jira uses Lucene analyzers for tokenization, and these handle numeric and alphanumeric values differently. For a comma-separated numeric text longer than 255 characters, Lucene strips the commas and indexes each number individually, but it also tries to index the whole string, commas included, as a single token. That token exceeds the limit, so tokenization fails and nothing from the text is indexed. As a result, when Jira searches for issues containing one of those numbers in the Description, it cannot find them.
      Alphanumeric text is processed the same way, except that the last step, indexing the whole string as a single token, is skipped.
      When text longer than 255 characters, such as the example below, is entered in a text field such as Description, Jira does not index it:

      1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106
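
      The behavior described above can be illustrated with a simplified model. This is only a sketch of the reported symptoms, not Lucene's or Jira's actual tokenizer: `toy_tokenize`, `NUMERIC_RUN`, `WORD`, and `MAX_TOKEN_LENGTH` are hypothetical names, and the 255-character limit is an assumption taken from this ticket.

```python
import re

# Assumed limit, taken from this ticket; not read from any Lucene setting.
MAX_TOKEN_LENGTH = 255

# Hypothetical rules for this toy model only.
NUMERIC_RUN = re.compile(r"\d+(?:,\d+)+")  # digits joined by commas, e.g. "1,2,3"
WORD = re.compile(r"[A-Za-z0-9]+")         # runs of letters and digits


def toy_tokenize(text, max_len=MAX_TOKEN_LENGTH):
    """Return the tokens this toy model would index for `text`."""
    tokens = []
    for chunk in text.split():
        if NUMERIC_RUN.fullmatch(chunk):
            if len(chunk) > max_len:
                # Models the reported failure: the whole-string token is too
                # long, so nothing from this chunk is indexed.
                continue
            tokens.append(chunk)             # the whole string, commas included
            tokens.extend(chunk.split(","))  # each individual number
        else:
            # Alphanumeric path: only the individual pieces are indexed.
            tokens.extend(WORD.findall(chunk))
    return tokens


# A string similar to the example above (315 characters long).
numbers = ",".join(str(n) for n in range(1, 107))

print(toy_tokenize("1,2,3"))             # ['1,2,3', '1', '2', '3']
print(toy_tokenize(numbers))             # []
print("42" in toy_tokenize("a" + numbers))  # True
```

      In this model, a short numeric list is indexed both as a whole and number by number, the long list yields no tokens at all, and adding a single letter sends the text down the alphanumeric path so the individual numbers become searchable again.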
      

      Suggested Solution

      Implement an analysis function for the Lucene analyzer that supports tokens longer than 255 characters.

      Workaround

      As a workaround, either change the string in the description from purely numeric to alphanumeric, or split it with whitespace. To make it alphanumeric, insert a letter somewhere in the string.
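
      The whitespace variant of the workaround could be applied to existing text with a small helper like the one below. This is a minimal sketch: `split_numeric_list` is a hypothetical name, and the code only rewrites a string. It does not call any Jira API.

```python
import re


def split_numeric_list(text):
    """Insert a space after every comma that sits directly between two digits."""
    return re.sub(r"(?<=\d),(?=\d)", ", ", text)


numbers = ",".join(str(n) for n in range(1, 107))
fixed = split_numeric_list(numbers)

print(split_numeric_list("1,2,3"))             # 1, 2, 3
print(max(len(part) for part in fixed.split()))  # 4
```

      After the rewrite, the longest whitespace-separated piece is only a few characters long, so no single token approaches the limit. Note that the same rule would also split a number written with a thousands separator, such as 1,000, so it should only be applied to text known to contain lists of numbers.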

            Assignee:
            Unassigned
            Reporter:
            Alper Firengiz (Inactive)
            Votes:
            1
            Watchers:
            1