Confluence Data Center / CONFSERVER-31531

Inaccurate Lucene whitespace tokenization of soft returns


Details

    Description

      When content is separated by a soft return, Lucene does not treat the soft return as whitespace or a word boundary. Instead, the soft return character is removed and the words on either side are joined. For example:

      This is line 106[soft return]
      Next line

      The text above is parsed and indexed as containing the single term 106Next.

      As a result, a search for 106 returns nothing, while a search for 106Next finds the page. The index therefore does not reflect the words that actually appear in the content.
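      The effect can be modeled with a minimal sketch using plain string operations. This is not Lucene's or Confluence's code: the soft return is assumed to be represented as "\n", the tokenizer is a simple whitespace split, and steps a real analyzer performs (such as lowercasing) are omitted. The helper names are hypothetical.

```python
def whitespace_tokenize(text: str) -> list[str]:
    # Split on runs of whitespace, in the spirit of a whitespace tokenizer.
    return text.split()

def extract_dropping_soft_return(text: str) -> str:
    # Hypothetical model of the reported bug: the soft return is deleted
    # outright, so the words on either side are glued together.
    return text.replace("\n", "")

def extract_preserving_boundary(text: str) -> str:
    # Hypothetical model of the expected behavior: the soft return becomes
    # a space, so it still acts as a word boundary.
    return text.replace("\n", " ")

content = "This is line 106\nNext line"  # "\n" stands in for the soft return

buggy = whitespace_tokenize(extract_dropping_soft_return(content))
fixed = whitespace_tokenize(extract_preserving_boundary(content))

print(buggy)  # ['This', 'is', 'line', '106Next', 'line']
print(fixed)  # ['This', 'is', 'line', '106', 'Next', 'line']
print("106" in buggy, "106" in fixed)  # False True
```

      Only the second variant produces a separate 106 term, which is what a search for 106 needs to match.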

      Rated Major because it produces incorrect index entries.


            People

              matt@atlassian.com Matt Ryall
              41091df5b622 Rhys Jones
              Votes: 0
              Watchers: 3
