Confluence Data Center / CONFSERVER-20722

NPE when attaching a file whose name contains full-width alphabet characters, using PostgreSQL


Details

    Description

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      Root cause of the NPE.

      When the name of an attached file contains full-width alphabet characters, Java's java.lang.String#toLowerCase() method and PostgreSQL's lower() function do NOT produce the same lowercase result:

      1. Input: half-width uppercase letters (A~Z)
           Java java.lang.String#toLowerCase():   half-width lowercase letters
           PostgreSQL lower(), LANG=ja_JP.UTF-8:  half-width lowercase letters
           PostgreSQL lower(), LANG=C:            half-width lowercase letters
      2. Input: full-width uppercase letters (Ａ~Ｚ)
           Java java.lang.String#toLowerCase():   full-width lowercase letters
           PostgreSQL lower(), LANG=ja_JP.UTF-8:  full-width lowercase letters
           PostgreSQL lower(), LANG=C:            full-width uppercase letters (unchanged)
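      The mismatch can be reproduced outside Confluence with a short Java sketch. It assumes, as the comparison above shows, that PostgreSQL lower() in a "C"-locale database folds only the ASCII letters A~Z; pgLowerCLocale is a hypothetical stand-in for that behaviour, not a PostgreSQL API. The final map lookup is likewise a hypothetical illustration of how such a mismatch could surface as an NPE, not Confluence's actual code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LowerCaseMismatch {

    // Hypothetical approximation of PostgreSQL lower() in a "C"-locale database:
    // only ASCII A-Z are folded to a-z; every other character (including the
    // full-width letters U+FF21..U+FF3A) is returned unchanged.
    static String pgLowerCLocale(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }

    // Java side: Unicode lowercasing, which also folds full-width
    // U+FF21..U+FF3A to U+FF41..U+FF5A.
    static String javaLower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String halfWidth = "ABC.txt";
        String fullWidth = "\uFF21\uFF22\uFF23.txt"; // full-width "ABC" + ".txt"

        for (String name : new String[] {halfWidth, fullWidth}) {
            System.out.printf("%s: Java=%s, C-locale lower()=%s, same=%b%n",
                    name, javaLower(name), pgLowerCLocale(name),
                    javaLower(name).equals(pgLowerCLocale(name)));
        }

        // Hypothetical lookup pattern: rows keyed by the database's lower(filename),
        // queried with a Java-lowercased name. For the full-width name the keys
        // differ, so the lookup returns null; dereferencing it would throw an NPE.
        Map<String, String> rowsByDbLowerName = new HashMap<>();
        rowsByDbLowerName.put(pgLowerCLocale(fullWidth), "attachment " + fullWidth);
        String row = rowsByDbLowerName.get(javaLower(fullWidth));
        System.out.println("lookup result: " + row);
    }
}
```

      Running it should report same=true for the half-width name, same=false for the full-width name, and "lookup result: null".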


      We suspect that this is fundamentally a PostgreSQL-specific problem.
      In PostgreSQL, initializing the database with the "C" locale (initdb --no-locale) is recommended unless a locale is actually needed. See the following excerpt from the PostgreSQL documentation for details:

      The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.
      http://www.postgresql.org/docs/8.4/interactive/locale.html

      Proposed solutions.

      1. (Recommended) Fix the Confluence application itself.
        This is the most desirable option, so we would very much like to know how it can be done.
      2. Initialize the database with a Japanese locale and align the locale settings of the Java VM and PostgreSQL.
        However, as described above, running PostgreSQL with any locale other than "C" is generally not recommended.
      3. Patch the source code of PostgreSQL's lower() function.
        However, even if such a patch resolved this problem, it would very likely have side effects elsewhere (e.g. on other applications connecting to the same PostgreSQL instance). We therefore want to avoid patching the PostgreSQL source directly at all costs.
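      For proposal 1, one possible direction is to perform the case-insensitive match entirely in Java, so that both sides of the comparison go through the same lowercasing function regardless of the database locale. This sketch assumes the failing code compares a Java-lowercased name against SQL lower(); AttachmentLookup and findByNameIgnoreCase are hypothetical names, not Confluence APIs.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class AttachmentLookup {

    // Hypothetical helper: both the stored names and the requested name are
    // lowercased by the same Java call, so the result no longer depends on the
    // locale PostgreSQL was initialized with. Returning Optional makes a miss
    // explicit instead of handing back null.
    static Optional<String> findByNameIgnoreCase(List<String> attachmentNames, String requested) {
        String key = requested.toLowerCase(Locale.ROOT);
        return attachmentNames.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).equals(key))
                .findFirst();
    }

    public static void main(String[] args) {
        // Assumption: in a real fix these names would be loaded from the database
        // (e.g. the attachments of the current page) without calling lower().
        List<String> names = List.of("report.pdf", "\uFF21\uFF22\uFF23.txt");

        Optional<String> hit = findByNameIgnoreCase(names, "\uFF41\uFF42\uFF43.txt");
        Optional<String> miss = findByNameIgnoreCase(names, "missing.txt");

        System.out.println("hit:  " + hit.orElse("(none)"));
        System.out.println("miss: " + miss.orElse("(none)"));
    }
}
```

      Scanning in Java is cheap for the handful of attachments on a page, but it gives up index use; for large tables, storing a Java-lowercased copy of the name in its own column and comparing by plain equality would be the analogous approach.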

      Impact on Japanese customers and marketing.

      Most Japanese customers routinely use full-width alphabet and digit characters in filenames.
      Being unable to use full-width alphabet characters in attachment names is a critical functional restriction for Japanese customers, and it significantly affects Confluence sales activities in Japan.

      Steps to reproduce.

      1. Download the file I attached here (ＡＢＣ.txt; all letters in the filename except the extension are full-width characters).
      2. Log in to Confluence installed in a test environment.
      3. Attach the downloaded ＡＢＣ.txt file to a Confluence page.
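      If the attached test file is not at hand, an equivalent one can be created. This hypothetical Java helper converts ASCII letters and digits to their full-width forms (the full-width forms block is ASCII shifted by 0xFEE0) and writes the file to a temporary directory. It assumes Java 11+ and a JVM/file system that accept Unicode file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MakeFullWidthTestFile {

    // Maps ASCII letters and digits to their full-width forms
    // (0-9 -> U+FF10..U+FF19, A-Z -> U+FF21..U+FF3A, a-z -> U+FF41..U+FF5A)
    // by adding 0xFEE0; all other characters are kept as they are.
    static String toFullWidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            sb.append(alnum ? (char) (c + 0xFEE0) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Full-width base name with a half-width extension, like the test file.
        String fileName = toFullWidth("ABC") + ".txt";
        Path dir = Files.createTempDirectory("confserver-20722");
        Path file = Files.writeString(dir.resolve(fileName), "test\n");
        System.out.println("Created " + file.toAbsolutePath());
    }
}
```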

      The test environment was installed with the default English language settings.


            People

              Assignee: Unassigned
              Reporter: Akira Higuchi
              Votes: 23
              Watchers: 20
