[CONFSERVER-57749] Update health check with correct encoding requirements for Windows users on installation with PostgreSQL

Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 7.2.0
Affects Version/s: 6.14.1
Component/s: Server - Installer / Setup
Labels:
- warranty

Support reference count: 43
Symptom Severity: Severity 3 - Minor
UIS: 9

Problem Definition

When upgrading Confluence to 6.14 or later, admins see a health check warning stating that the collation English_United States.1252 is not UTF-8.

This is a false warning and can be ignored, provided the client encoding is set correctly.

Background

I've had several users on Community reach out to me regarding this confusion, for example:

Confluence on PostgreSQL what collation is recommended

Until September 2019, the Database Setup for PostgreSQL doc specified that:

Collation must also be set to utf8. Other collations, such as "C", are known to cause issues with Confluence.

This is clear to Linux users, but for Windows users, their options might look like the following:

C
POSIX
English_United States.1252

During setup of the database on Windows, users are asked to select a locale and are presented with selections like those listed above.

Expected behavior

The Confluence health check does not give the false warning when using .1252 collations.

Notes

According to Postgres:

UTF-8 encoding can be used with any locale.

In addition, I reviewed this blog post, Locale in Windows, and it appears that you should be able to use English_United States.1252.

Although I believe this is addressed with the following line:

Choose the locale that best matches your geographic location.

Attachments

Capture.JPG (70 kB, 27/Feb/2019 1:40 PM)
image.png (12 kB, 12/Mar/2019 2:16 PM)

relates to

CONFSERVER-58052 post-upgrade health-check throws false warning: The database collation 'en_US.utf8' is not supported by Confluence. You need to use 'utf-8'.

Closed

Rachel Robins added a comment - 26/Sep/2019 4:59 AM

I have added the following note to the documentation to indicate that you should use the UTF-8 equivalent character type and collation when running PostgreSQL on Windows.

If you are running PostgreSQL on Windows use the equivalent character type and collation for your locale, for example English_United States.1252.

I will leave this issue open, and update the description to make it about improving the upgrade check to recognise this collation.

Manse Wolken added a comment - 03/Jun/2019 6:01 AM - edited

Well I currently run production confluence on postgres 9.6 with the collation 'C' because the postgres server I use offers nothing else (except for posix, which is an alias to 'C').

~~What are the current issues with the 'C' collation? Any pointers, where I can look?~~

Found it: https://confluence.atlassian.com/doc/troubleshooting-character-encodings-167194.html

And the issue is: Some characters are not available as upper cases, so postgres keeps them in lower cases.
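
As a rough illustration of that behaviour, the following Python sketch imitates (it does not call) a database running with the 'C' collation and character type: case conversion only touches ASCII letters, and sorting follows raw code points. The helper name `c_locale_upper` is hypothetical, and real PostgreSQL results depend on the operating system libraries on the server.

```python
def c_locale_upper(text: str) -> str:
    """Upper-case ASCII letters only, roughly as LC_CTYPE = 'C' would (assumption)."""
    return "".join(ch.upper() if ch.isascii() else ch for ch in text)


# Non-ASCII letters are left in lower case by the simulated 'C' behaviour.
print(c_locale_upper("café"))   # the é is not converted
print("café".upper())           # Unicode-aware conversion, for comparison

# Code-point ordering, as used by 'C', puts every upper-case ASCII letter
# before every lower-case one.
words = ["apple", "Zebra", "banana"]
print(sorted(words))
```

The sort result places "Zebra" first, which is the kind of ordering difference a language-aware collation would avoid.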

Dave C added a comment - 24/May/2019 1:59 AM - edited

Hey, I'm one of the devs working on this. To give y'all an update:

- utf8 is incorrectly reported as invalid, as per ~~CONFSERVER-58052~~. I updated that case: if you have utf8 you're definitely supported, no dramas. I know some folks are ending up here from that problem, hence me leaving a comment about it.
- The windows-1252 locale is unfortunately not UTF-8; it's a Windows-specific extension of ISO-8859-1 (aka LATIN1) with different entries. It's also single byte, not multi-byte as per UTF-8, so characters after 127 (hex 0x7F) have different representations, as UTF-8 uses two or more bytes for them and 1252 only uses 1. https://www.i18nqa.com/debug/utf8-debug.html shows us 128 - 255 and how they can differ. This can cause problems as per below:
```
postgres=# select * from (values ('€'), ('‰')) as example;
 column1
---------
 ?
 %
(2 rows)
```
Because 1252 has no character representations for those certain characters it'll either show ? or try to match to the closest.
- PostgreSQL is fairly 'magical' in that it has automatic character set conversion, as in https://www.postgresql.org/docs/9.6/multibyte.html (23.3.3), which will translate 1252 to UTF-8, provided the client encoding is set correctly. The PostgreSQL JDBC driver does this by default and doesn't like it if you change it.
- Encoding in the database is often fixed as UTF-8; however, collate and character type in PostgreSQL will have an impact on ordering, and also on string operations such as initcap(), upper(), lower() and so on, as PostgreSQL uses system-level implementations. This means they definitely make a difference, so just looking at encoding unfortunately won't 100% guarantee things are working as expected.
- When PostgreSQL is installed it uses whichever default locale is configured, and the 'cluster' limits databases to be created under that (unless you use a separate template). This makes changing the locale a bit finicky but it's possible, provided the locale exists. If it doesn't exist, it'll error when trying to create a database without one.
- As UTF-8 doesn't fully exist on Windows currently (Server 2019 only has a Beta implementation of it), we can't set UTF-8 collation.
- When I tested PostgreSQL 11 on 2019 with UTF-8, it also wasn't implemented in PostgreSQL yet.
- C is suggested against completely as it sorts differently, as in https://confluence.atlassian.com/doc/database-setup-for-postgresql-173244522.html. Yes, Jira is OK with it; Confluence doesn't like it though.

At this current state we're looking at verifying if 1252 is OK and if so will add it to the white list. We're also fixing ~~CONFSERVER-58052~~ at the same time. Will keep you updated when we have some progress, thanks so much for your patience while we get this fixed up for everyone!
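
To make the byte-level difference between the two encodings concrete, here is a minimal Python sketch. It uses only Python's built-in `cp1252` and `utf-8` codecs and does not involve PostgreSQL or its conversion routines; the sample characters are arbitrary choices for illustration.

```python
text = "é"  # a character above code point 127

# The same character is one byte in Windows-1252 and two bytes in UTF-8.
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")
print(cp1252_bytes, utf8_bytes)

# Reading UTF-8 bytes as if they were Windows-1252 produces garbled text,
# which is what happens when a client encoding is declared incorrectly.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# A character with no Windows-1252 representation (Greek capital omega)
# can only be stored as a substitute such as "?".
replaced = "Ω".encode("cp1252", errors="replace")
print(replaced)
```

When the client encoding is declared correctly, the conversion between the two is lossless for every character that Windows-1252 can represent; problems arise only for characters outside it or when the declared encoding does not match the bytes actually sent.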

Manse Wolken added a comment - 15/May/2019 2:34 PM

I think that the test for postgres should check for the "ENCODING" of the database.
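
One way to read this suggestion, combined with the collation concerns raised elsewhere in this thread, is sketched below in Python. This is a hypothetical illustration only and not Confluence's actual health check: the function name `collation_warning`, the accepted suffixes, and the message texts are all assumptions made for the example.

```python
from typing import Optional


def collation_warning(encoding: str, collate: str) -> Optional[str]:
    """Return a warning string, or None if the settings look acceptable.

    Hypothetical rules (assumptions, not Atlassian's implementation):
    - the database ENCODING must be UTF8;
    - 'C' and 'POSIX' are flagged because they sort and case-map differently;
    - collations ending in .utf8, .utf-8 or .1252 are accepted.
    """
    if encoding.replace("-", "").upper() != "UTF8":
        return f"Database encoding '{encoding}' is not UTF8."
    if collate.upper() in ("C", "POSIX"):
        return f"Collation '{collate}' sorts and case-maps differently."
    if collate.lower().endswith((".utf8", ".utf-8", ".1252")):
        return None
    return f"Collation '{collate}' is not on the accepted list."


print(collation_warning("UTF8", "English_United States.1252"))
print(collation_warning("UTF8", "en_US.utf8"))
print(collation_warning("UTF8", "C"))
```

Checking the encoding first and the collation second reflects the point that both settings matter; whether a `.1252` collation should be accepted is exactly the question this issue tracks.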

Manse Wolken added a comment - 15/May/2019 2:25 PM - edited

I use collation "C" with ENCODING "utf-8" on Postgres.

Which, by the way, works perfectly with Jira. Will test with the latest Confluence.

Here are the default postgres settings that I have for the database:

       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'C'
       LC_CTYPE = 'C'
       CONNECTION LIMIT = -1;

By the way: Using the default postgres tar.gz installation, you only get collations "C" and "posix" where posix is another name for "C"....

thesefer added a comment - 23/Mar/2019 6:26 PM

After updating my Docker setup to 6.15.1 I receive the following error too, which seems weird to me because it is utf8, isn't it?

Database: The database collation 'en_US.utf8' is not supported by Confluence. You need to use 'utf-8'.

Gilbert Janeselli added a comment - 23/Mar/2019 12:08 AM

Ran into the same problem after updating to the latest version (btw, I did the update after your warning email a few days ago)!
What is it with you? The problem has existed for a while and not even a word from anybody at Atlassian?!?!?
Shame!

Josh added a comment - 21/Mar/2019 5:38 PM

I just upgraded to the latest version because of the two vulnerabilities as well, and received this error too.

cliper added a comment - 21/Mar/2019 8:21 AM - edited

@IT Department, same here. Just upgraded today after recent Atlassian announcement of two vulnerabilities.

Warning shows:

Biswabhusan Panigrahi added a comment - 12/Mar/2019 11:33 AM - edited

Hi, I recently upgraded Jira Software from 7.12 to 8.0.2. After the upgrade I got a warning message like "Your database is using an unsupported collation: English_United.1252", but I checked and the Postgres DB encoding is UTF-8. It's running on Windows. I just ignored the warning and it's working fine, but I am a little worried about the future. Does anything need to change on our side? Please let me know.

Assignee: Jamie (Inactive)
Reporter: Shannon S
Affected customers: 29
Watchers: 54
Created: 09/Jan/2019 2:44 PM
Updated: 30/Aug/2023 4:46 PM
Resolved: 04/Feb/2020 10:03 AM
