CONFSERVER-57749 (Confluence Data Center)

Update health check with correct encoding requirements for Windows users on installation with PostgreSQL

      Problem Definition

      When upgrading Confluence to 6.14 or later, admins see a warning that the collation English_United States.1252 is not UTF-8.

      This is a false warning and can be ignored, provided the client encoding is set correctly.

      Background

      I've had several users on Community reach out to me regarding this confusion.

      Until September 2019, the Database Setup for PostgreSQL doc specified that:

      Collation must also be set to utf8. Other collations, such as "C", are known to cause issues with Confluence.

      This is clear to Linux users, but for Windows users, their options might look like the following:

      • C
      • POSIX
      • English_United States.1252

      During setup of the database in Windows, users are to select a locale, and are presented with the following selections:

      Expected behavior

      Confluence health check does not give the false warning when using .1252 collations

      Suggested Solution

      We should update the health check to accept the Windows equivalent of UTF-8.
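As a rough illustration of the suggested solution, the sketch below shows one way a collation check could accept both UTF-8 style names and the Windows `.1252` names while still rejecting "C" and "POSIX". The function name `is_accepted_collation` and the matching rules are hypothetical; the actual health check code is not shown in this issue, and accepting `.1252` assumes it is verified as safe.

```python
import re

# Hypothetical rules, not the real Confluence health check.
_UTF8_SUFFIX = re.compile(r"\.utf-?8$", re.IGNORECASE)   # en_US.utf8, en_US.UTF-8
_WINDOWS_1252_SUFFIX = re.compile(r"\.1252$")            # English_United States.1252
_REJECTED = {"c", "posix"}                               # known to cause issues


def is_accepted_collation(collation: str) -> bool:
    """Return True if this collation name should not trigger the warning."""
    name = collation.strip()
    if name.lower() in _REJECTED:
        return False
    return bool(_UTF8_SUFFIX.search(name) or _WINDOWS_1252_SUFFIX.search(name))


for candidate in ("en_US.utf8", "en_US.UTF-8", "English_United States.1252", "C", "POSIX"):
    print(f"{candidate!r}: {is_accepted_collation(candidate)}")
```

Matching on the suffix rather than on a fixed list of full names keeps the check independent of the language and region part of the locale.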

      Notes

      According to Postgres:

      UTF-8 encoding can be used with any locale.

      In addition, I reviewed this blog post, Locale in Windows, and it appears that you should be able to use English_United States.1252.

      Although I believe this is addressed with the following line:

      Choose the locale that best matches your geographic location.

        Attachments: Capture.JPG (70 kB), image.png (12 kB)


            Rachel Robins added a comment -

            I have added the following note to the documentation to indicate that you should use the UTF-8 equivalent character type and collation when running PostgreSQL on Windows.

            If you are running PostgreSQL on Windows use the equivalent character type and collation for your locale, for example English_United States.1252.

            I will leave this issue open, and update the description to make it about improving the upgrade check to recognise this collation.

            Manse Wolken added a comment - edited

            Well, I currently run production Confluence on Postgres 9.6 with the collation 'C' because the Postgres server I use offers nothing else (except for posix, which is an alias for 'C').

            What are the current issues with the 'C' collation? Any pointers on where I can look?

            Found it: https://confluence.atlassian.com/doc/troubleshooting-character-encodings-167194.html

            And the issue is: some characters have no uppercase form available under this collation, so Postgres keeps them in lowercase.
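The uppercase behaviour described here can be approximated outside the database. The sketch below is a plain Python stand-in, assuming that under the 'C' locale only the ASCII letters a-z are case-mapped; `c_locale_upper` is a hypothetical helper for illustration, not a PostgreSQL function.

```python
def c_locale_upper(text: str) -> str:
    """Uppercase only ASCII a-z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


word = "\u00e9lan"  # 'elan' with an acute accent on the first letter

# ASCII letters are uppercased, the accented letter stays lowercase.
print(ascii(c_locale_upper(word)))

# Unicode-aware uppercasing, for comparison.
print(ascii(word.upper()))
```

The difference between the two results is the kind of mismatch that case-insensitive lookups can run into when the character type is 'C'.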

             


            Dave C added a comment - edited

            Hey, I'm one of the devs working on this. To give y'all an update:

            • utf8 is incorrectly reported as invalid, as per CONFSERVER-58052. I updated that case; if you have utf8 you're definitely supported, no dramas. I know some folks are ending up here from that problem, hence me leaving a comment about it.
            • The windows-1252 locale is unfortunately not UTF-8; it's a Windows-specific extension of ISO-8859-1 (aka LATIN1) with different entries. It's also single-byte, not multi-byte like UTF-8, so characters above 127 (hex 0x7F) are represented differently: UTF-8 uses two or more bytes for them and 1252 only one. https://www.i18nqa.com/debug/utf8-debug.html shows 128 - 255 and how they can differ. This can cause problems, as per below:
              postgres=# select * from (values ('€'), ('‰')) as example;
               column1
              ---------
               ?
               %
              (2 rows)
              

              When the encoding in use has no representation for a character, it'll either show ? or try to match the closest one.

            • PostgreSQL is fairly 'magical' in that it has automatic character set conversion, as in https://www.postgresql.org/docs/9.6/multibyte.html (23.3.3), which will translate 1252 to UTF-8, provided the client encoding is set correctly. The PostgreSQL JDBC driver does this by default and doesn't like it if you change it.
            • Encoding in the database is often fixed as UTF-8; however, collate and character type in PostgreSQL have an impact on ordering, and also on string operations such as initcap(), upper(), lower() and so on, as PostgreSQL uses system-level implementations. This means they definitely make a difference, so just looking at encoding unfortunately won't 100% guarantee things are working as expected.
            • When PostgreSQL is installed it uses whichever default locale is configured, and the 'cluster' limits databases to be created under that (unless you use a separate template). This makes changing the locale a bit finicky, but it's possible, provided the locale exists. If it doesn't exist, it'll error when trying to create a database without one.
            • As UTF-8 doesn't fully exist on Windows currently (Server 2019 only has a Beta implementation of it), we can't set UTF-8 collation.
            • When I tested PostgreSQL 11 on Server 2019 with UTF-8, it also wasn't implemented in PostgreSQL yet.
            • C is advised against completely, as it sorts differently, as in https://confluence.atlassian.com/doc/database-setup-for-postgresql-173244522.html. Yes, Jira is OK with it; Confluence doesn't like it though.

            At this stage we're looking at verifying whether 1252 is OK, and if so we will add it to the whitelist. We're also fixing CONFSERVER-58052 at the same time. Will keep you updated when we have some progress. Thanks so much for your patience while we get this fixed up for everyone!
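To see how a given character is represented in each encoding discussed above, the byte values can be compared with Python's built-in codecs. This is a standalone sketch, not a query against PostgreSQL; `encode_or_none` is a hypothetical helper, and "cp1252" and "latin-1" are Python's codec names for windows-1252 and ISO-8859-1.

```python
def encode_or_none(ch: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


# Euro sign, per mille sign, e with acute accent.
for ch in ("\u20ac", "\u2030", "\u00e9"):
    cells = []
    for codec in ("cp1252", "latin-1", "utf-8"):
        encoded = encode_or_none(ch, codec)
        cells.append(f"{codec}={encoded.hex() if encoded is not None else 'not representable'}")
    print(f"U+{ord(ch):04X}", *cells)

# A lossy conversion substitutes '?' for anything the target cannot hold.
print("\u20ac".encode("ascii", errors="replace"))

# Decoding UTF-8 bytes with the wrong client encoding produces mojibake.
print(ascii("\u00e9".encode("utf-8").decode("cp1252")))
```

Running it per character makes it easy to tell whether a `?` comes from a character that is missing in the target encoding or from a client encoding that does not match.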


            Manse Wolken added a comment -

            I think that the test for Postgres should check the "ENCODING" of the database.

            Manse Wolken added a comment - edited

            I use collation "C" with ENCODING "utf-8" on Postgres.

            Which, by the way, works perfectly with Jira. Will test with the latest Confluence.

            Here are the default postgres settings that I have for the database:

                   ENCODING = 'UTF8'
                   TABLESPACE = pg_default
                   LC_COLLATE = 'C'
                   LC_CTYPE = 'C'
                   CONNECTION LIMIT = -1; 

             

            By the way: Using the default postgres tar.gz installation, you only get collations "C" and "posix" where posix is another name for "C"....
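For context on what a "C" collation means for ordering with a UTF8 database, the sketch below compares byte-wise sorting with a rough language-aware ordering. It runs in plain Python rather than PostgreSQL; the `fold` key (strip accents, ignore case) is only an assumed approximation of a locale-aware collation, and the word list is made up for illustration.

```python
import unicodedata

words = ["apple", "Zebra", "banana", "\u00c9clair", "cherry"]

# 'C' collation: strings are compared by their raw bytes.
c_order = sorted(words, key=lambda w: w.encode("utf-8"))


def fold(word: str) -> str:
    """Approximate a language-aware sort key: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


locale_like_order = sorted(words, key=fold)

print([ascii(w) for w in c_order])
print([ascii(w) for w in locale_like_order])
```

In the byte-wise result every uppercase ASCII letter sorts before every lowercase one, and accented letters sort after all of ASCII, which is why lists ordered by the database can look wrong to users under "C".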

             


            thesefer added a comment -

            After updating my Docker setup to 6.15.1 I receive the following error too, which seems weird, because it is utf8, isn't it?

             

            Database: The database collation 'en_US.utf8' is not supported by Confluence. You need to use 'utf-8'.
            


            Gilbert Janeselli added a comment -

            Ran into the same problem after updating to the latest version (btw, I did the update after your warning email a few days ago)!
            What is it with you? The problem has existed for a while and not even a word from anybody@Atlassian?!?!?
            Shame!

            Josh added a comment -

            I just upgraded to the latest version due to the two vulnerabilities as well and received this error too

             


            cliper added a comment - edited

            @IT Department, same here. Just upgraded today after recent Atlassian announcement of two vulnerabilities.

            Warning shows:


            Biswabhusan Panigrahi added a comment - edited

            Hi, I recently upgraded Jira Software from 7.12 to 8.0.2. After the upgrade I got a warning message like "Your database is using an unsupported collation: English_United.1252", but I checked and the Postgres DB encoding is UTF-8. It's running on Windows. I just ignored it and it's working fine. I am a little worried about the future. Does anything need to change on our side? Please let me know.

              jmoynihan Jamie (Inactive)
              smackie@atlassian.com Shannon S