Inaccurate Lucene whitespace tokenization of soft returns

      When content is separated by a soft return, Lucene does not treat the soft return as whitespace or as a word boundary. The soft return character is discarded, so the words on either side of it are joined into a single token. Impact:

      This is line 106[soft return]
      Next line

      This gets parsed and indexed as the single token 106Next

      So searching for 106 returns nothing, while searching for 106Next does return the content. The search result matches what was indexed, but the index entry itself is wrong: 106Next never appears as a word in the content.
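The effect described above can be reproduced with a small sketch. This is not Lucene code and not the product's text extraction; it only mimics the two behaviors with plain string handling. The report does not say which character the soft return is, so `SOFT_RETURN` (set here to U+2028) and both function names are assumptions made for illustration.

```python
# Hypothetical stand-in for the soft return; the report does not name
# the actual character, so U+2028 (LINE SEPARATOR) is an assumption.
SOFT_RETURN = "\u2028"


def tokenize_dropping_soft_return(text):
    """Mimic the reported behavior: discard the soft return, then split on whitespace."""
    return text.replace(SOFT_RETURN, "").split()


def tokenize_soft_return_as_break(text):
    """Mimic the expected behavior: treat the soft return as a word boundary."""
    return text.replace(SOFT_RETURN, " ").split()


content = "This is line 106" + SOFT_RETURN + "Next line"

reported = tokenize_dropping_soft_return(content)
expected = tokenize_soft_return_as_break(content)

# Reported behavior: "106" is not a token, "106Next" is.
print("106" in reported, "106Next" in reported)
# Expected behavior: "106" is a token, "106Next" is not.
print("106" in expected, "106Next" in expected)
```

In the first tokenization a search for 106 has nothing to match, which is the symptom reported here. In the second, 106 and Next are indexed as separate terms.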

      Rated as Major because it produces incorrect index entries.

            Assignee:
            Matt Ryall
            Reporter:
            Rhys Jones
            Votes:
            0
            Watchers:
            3
