Details
Suggestion
Resolution: Unresolved
Description
Problem Definition
By default, Lucene splits text into individual words (tokens) and indexes each one. The default maximum token length is 255 characters, and Lucene has problems tokenizing anything longer. Jira uses Lucene analyzers for tokenization, and they handle numeric and alphanumeric values differently. For a comma-separated numeric string longer than 255 characters, Lucene strips the commas and indexes each number separately, but it also tries to index the whole string, commas included, as a single token. That token exceeds the maximum length, so tokenization fails and none of the text is indexed. As a result, when Jira searches for issues by a number that appears in such a string in the Description, it cannot find them.
For alphanumeric text the procedure is the same, except that the last step, indexing the whole string as a single token, is skipped.
When text longer than this limit, such as the example below, is entered in a text field such as Description, Jira does not index the text:
1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106
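The behaviour described above can be approximated with a small toy model. This is only a sketch of the ticket's description, not Lucene's or Jira's actual tokenizer: the function name `toy_tokenize`, the 255-character limit, and the "fail the whole text" rule are assumptions made for illustration.

```python
MAX_TOKEN_LENGTH = 255  # assumed default limit, used for illustration only


def toy_tokenize(text, max_len=MAX_TOKEN_LENGTH):
    """Toy model of the behaviour described in this ticket (not Lucene code).

    A comma-separated run of digits yields one token per number plus one
    token for the whole run. If the whole run is too long, the model gives
    up and returns no tokens at all, mirroring "the text is not indexed".
    """
    tokens = []
    for chunk in text.split():
        parts = [p for p in chunk.split(",") if p]
        is_numeric_run = bool(parts) and all(p.isdigit() for p in parts)
        if is_numeric_run and len(parts) > 1:
            if len(chunk) > max_len:
                return []  # models the tokenization failure
            tokens.append(chunk)  # whole run, commas included
        tokens.extend(parts)  # individual words or numbers
    return tokens


# The example string from this ticket: 1..106 without 8, joined by commas.
example = ",".join(str(n) for n in range(1, 107) if n != 8)

print(len(example))             # 313, well above the assumed limit
print(toy_tokenize(example))    # [] -> nothing would be indexed
print(toy_tokenize("1,2,3"))    # ['1,2,3', '1', '2', '3']
```

In this model a short numeric run is indexed both as a whole and as individual numbers, while the 313-character example produces no tokens.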
Suggested Solution
Implement an analysis function for the Lucene analyzer that supports tokens longer than 255 characters.
Workaround
As a workaround, either change the string in the description from purely numeric to alphanumeric, or split it with whitespace. To make it alphanumeric, insert a letter somewhere in the string.
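A minimal sketch of how the two workarounds could be checked before saving a description. The helper names `is_numeric_run` and `risky_chunks` are hypothetical, and the 255-character threshold is an assumption. The check only mirrors the rule described in this ticket (a purely numeric, comma-separated run that is too long); it does not call Lucene or Jira.

```python
def is_numeric_run(chunk):
    """True if the chunk consists only of digits separated by commas."""
    parts = [p for p in chunk.split(",") if p]
    return bool(parts) and all(p.isdigit() for p in parts)


def risky_chunks(text, max_len=255):
    """Return whitespace-delimited numeric runs longer than max_len."""
    return [c for c in text.split() if is_numeric_run(c) and len(c) > max_len]


# The example string from this ticket: 1..106 without 8, joined by commas.
original = ",".join(str(n) for n in range(1, 107) if n != 8)

# Workaround 1: split the string with whitespace.
spaced = original.replace(",", ", ")

# Workaround 2: make the string alphanumeric by adding a letter.
with_letter = "a" + original

print(len(risky_chunks(original)))     # 1 -> the whole string is one long run
print(len(risky_chunks(spaced)))       # 0
print(len(risky_chunks(with_letter)))  # 0
```

Under these assumptions, both modified strings no longer contain a purely numeric run above the limit.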