Mail Archives does not index unicode characters correctly aside from Email Subject affecting Search Feature

    • Severity 3 - Minor

      Summary

      In Mail Archives, Unicode characters are indexed correctly only in the email subject. Unicode characters in the From address or in the email content are not indexed properly, so searching for them returns no results.

      Environment

      Confluence 6.4.1 with UTF-8 encoding configured (settings listed below)
      Indexing Language set to Chinese (also tried CJK)

      application.xml
      <sun.jnu.encoding>UTF-8</sun.jnu.encoding>
      <file.encoding.pkg>sun.io</file.encoding.pkg>
      <sun.io.unicode.encoding>UnicodeLittle</sun.io.unicode.encoding>
      <file.encoding>UTF-8</file.encoding>
      <default-encoding>UTF-8</default-encoding>
      
      Database Encoding
      postgres=# \l
                                             List of databases
              Name         |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
      ---------------------+----------+----------+-------------+-------------+-----------------------
       conf641             | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
      

      Steps to Reproduce

      1. Set up Confluence 6.4.1 with the encoding configured as described under Environment.
      2. Create a new space.
      3. In the new space, set up Mail Archives for any email account. In my testing, I used a Yahoo email account.
      4. Send an email to the Mail Archive account with the email subject, sender name, and email content in Chinese characters.
      5. Confirm that the email is fetched into Confluence.
      6. In the search bar, search for any part of the sender name or email content.
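For step 4, a test message with Chinese text in all three fields can be built with Python's standard `email` package. This is only a minimal sketch: the addresses, the sample Chinese strings, and the `build_test_message` helper are hypothetical placeholders, and sending the message (for example with `smtplib`) is left out because it depends on the mail provider.

```python
from email import policy
from email.headerregistry import Address
from email.message import EmailMessage

# Hypothetical sample values; replace them with your own test data.
SUBJECT = "测试邮件主题"
SENDER_NAME = "张伟"
BODY = "这是一封用于测试邮件归档搜索的邮件。"


def build_test_message(recipient: str) -> EmailMessage:
    """Build an email whose subject, sender name, and body are all Chinese."""
    msg = EmailMessage(policy=policy.SMTP)
    msg["Subject"] = SUBJECT
    # The display name is non-ASCII; the address itself is a placeholder.
    msg["From"] = Address(display_name=SENDER_NAME,
                          username="sender", domain="example.com")
    msg["To"] = recipient
    # The body is sent as UTF-8 text.
    msg.set_content(BODY, charset="utf-8")
    return msg
```

Calling `build_test_message("archive@example.com").as_string()` shows the raw message as it would go over the wire, which can help when comparing how the subject, the From header, and the body are each encoded.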

      Expected Results

      The user can search for Chinese characters in the sender name or email content and find the matching email.

      Actual Results

      No results are found if the sender name or email content contains Unicode characters (I tested with Chinese characters).
      However, Chinese characters in the email subject can be searched without issue.

      Note

      In the Mail Archive interface, emails with Unicode characters in the sender name and email content are still displayed correctly. Only the search function is affected.

      Workaround

      No workaround is currently known.

            Assignee:
            Unassigned
            Reporter:
            Damien Tan
            Votes:
            2
            Watchers:
            4
