Confluence Data Center / CONFSERVER-30465

Make directory sync more robust when handling names with emoji characters


      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      Directory sync over an LDAP connection aborts when a single record violates a database constraint, such as a collation or character-set limit.

      This is a request to catch that error, report it to the admin, and carry on with the sync through to completion, so that one bad record does not break the whole sync.
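The behaviour requested above could look roughly like the following. This is only a sketch of the idea, not how Confluence or Crowd implement directory sync: `SyncReport`, `sync_users`, and `demo_store` are hypothetical names, and the demo store merely imitates a database that rejects characters outside the BMP.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    """Outcome of one sync run: what was stored and what was skipped."""
    synced: list = field(default_factory=list)
    failures: list = field(default_factory=list)  # (username, error message)


def sync_users(users, store_user):
    """Store each user; record a failure instead of aborting the whole run."""
    report = SyncReport()
    for user in users:
        try:
            store_user(user)
        except Exception as exc:  # a real sync would catch the specific DB error
            report.failures.append((user["username"], str(exc)))
            continue
        report.synced.append(user["username"])
    return report


def demo_store(user):
    """Hypothetical store that rejects characters outside the BMP."""
    if any(ord(ch) > 0xFFFF for ch in user["first_name"]):
        raise ValueError("Incorrect string value for column 'first_name'")


demo_users = [
    {"username": "alice", "first_name": "Alice"},
    {"username": "bob", "first_name": "Bob\U0001F41C"},  # name ends with an emoji
    {"username": "carol", "first_name": "Carol"},
]
report = sync_users(demo_users, demo_store)
```

Here `report.failures` is what would be surfaced to the admin, while the users before and after the failing record are still synced.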

      For example:
      A MySQL database with the utf8 character set and utf8_bin collation, and a user directory synced from an LDAP system. An LDAP user has an emoji in their name. When the directory sync runs, the Confluence logs report this error:

      ERROR [scheduler_Worker-6] [atlassian.crowd.directory.DbCachingDirectoryPoller] pollChanges Error occurred while refreshing the cache for directory [ 13205505 ].
      org.springframework.jdbc.UncategorizedSQLException: Hibernate operation: could not update: com.atlassian.crowd.model.user.InternalUser#17850980; uncategorized SQLException for SQL []; SQL state [HY000]; error code [1366]; Incorrect string value: '\xF0\x9F\x90\x9C' for column 'first_name' at row 1; nested exception is java.sql.SQLException: Incorrect string value: '\xF0\x9F\x90\x9C' for column 'first_name' at row 1
      

      Another example:

      ERROR [scheduler_Worker-4] [atlassian.crowd.directory.DbCachingDirectoryPoller] pollChanges Error occurred while refreshing the cache for directory [ 59146241 ].
      org.springframework.jdbc.UncategorizedSQLException: Hibernate operation: could not update: [com.atlassian.crowd.model.user.InternalUser#69320708]; uncategorized SQLException for SQL []; SQL state [HY000]; error code [1366]; Incorrect string value: '\xF0\x9F\x8E\x88' for column 'first_name' at row 1; nested exception is java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8E\x88' for column 'first_name' at row 1
      

      '\xF0\x9F\x90\x9C' turns out to be the ant emoji (U+1F41C). This occurred on MySQL 5.1. Upgrading to MySQL 5.5.3 and setting the character set on the tables to utf8mb4 does not resolve the issue. See here for more details:
      http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

      After the upgrade to MySQL 5.5.3, a command-line client can insert an emoji character into the database and the database handles it fine. When Confluence tries to sync an emoji character, however, the sync breaks and Confluence throws an error.
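To see which character hides behind an "Incorrect string value" message, the escaped bytes can be decoded as UTF-8. The sketch below uses only the Python standard library; `describe_mysql_bytes` is a hypothetical helper name. It also reports the UTF-8 length of each character, which matters because MySQL's utf8 character set stores at most three bytes per character, while utf8mb4 allows four.

```python
import unicodedata


def describe_mysql_bytes(escaped):
    r"""Decode a MySQL-style '\xF0\x9F\x90\x9C' string into its characters.

    Returns a list of (code point, Unicode name, UTF-8 byte length) tuples.
    """
    raw = bytes.fromhex(escaped.replace("\\x", ""))
    text = raw.decode("utf-8")
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), len(ch.encode("utf-8")))
        for ch in text
    ]


# The value from the first log entry above.
print(describe_mysql_bytes(r"\xF0\x9F\x90\x9C"))  # [('U+1F41C', 'ANT', 4)]
```

A four-byte result means the value cannot fit in a utf8 column, whatever the client sends.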

            [CONFSERVER-30465] Make directory sync more robust when handling names with emoji characters

            Adam Barnes (Inactive) added a comment:

            Thank you for raising this suggestion.
            We regret to inform you that due to limited demand, we have no plans to implement it in the foreseeable future. In order to set expectations, we're closing this request now. Sometimes potentially valuable tickets do get closed where the Summary or Description has not caught the attention of the community. If you feel that this suggestion is valuable, consider describing in more detail or outlining how this request will help you achieve your goals. We may then be able to provide better guidance.
            For more context, check out our Community blog on our updated workflow for Suggestions
            Cheers,

            Confluence Product Management

            David Mason (Inactive) added a comment:

            As per the comment on JRA-36135, it looks like the default row_format is changing to DYNAMIC in MySQL 5.7, so using utf8mb4 should work just fine once MySQL 5.7 is released and Confluence supports it.

            Andreas van Rienen (Scandio) added a comment:

            IMHO, the ultimate reason for this behaviour seems to be JRA-36135.

            Matt Ryall added a comment:

            "I've tested this works fine with characters/emoji that have what seems to be a 'native' character code."

            The difference is between characters that are in the BMP (Basic Multilingual Plane), which is what you are calling "native", and those which are not. Bugs affecting storage and retrieval of "emoji" are caused by software that only supports the BMP part of Unicode.

            By having integration bugs with certain systems not supporting the "Supplementary Multilingual Plane", we're missing out on these characters:

            "Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts such as Linear B, Egyptian hieroglyphs, and cuneiform scripts; historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets; reform orthographies like Shavian and Deseret; and game symbols for playing cards, Mah Jongg, and dominoes."

            None of these are likely to be very common in enterprise LDAP installations. So unless you can reproduce the syncing problem with BMP characters, I think we'll have to treat this issue as low priority for now.

            Please encourage the customer to remove the emoji from their LDAP data to get a working sync.
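The BMP distinction in this comment can be checked mechanically: a character is outside the BMP when its code point is above U+FFFF. The following is a minimal sketch for spotting such characters in a name before it reaches the sync; `non_bmp_characters` is a hypothetical helper, not part of any Atlassian or LDAP API.

```python
def non_bmp_characters(name):
    """Return (index, code point) for every character outside the BMP."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(name)
        if ord(ch) > 0xFFFF
    ]


# U+2764 (heavy black heart) is in the BMP, so nothing is reported.
print(non_bmp_characters("derp\u2764"))      # []
# U+1F340 (four leaf clover) is in Plane 1 and is flagged.
print(non_bmp_characters("derp\U0001F340"))  # [(4, 'U+1F340')]
```

Running a check like this over an export of LDAP display names would list the entries that need cleaning up, in line with the advice above.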

            Ryan Goodwin (Inactive) added a comment (edited):

            It's clear that MySQL can handle these characters, as you can insert them directly into the database. I've tested that this works fine with characters/emoji that have what seems to be a 'native' character code. See this page for examples:
            http://apps.timwhitlock.info/emoji/tables/unicode

            Any emoji with an image in the native column carries a character code of three escaped bytes (three '\x..' sets). Anything with more than three failed to update in my LDAP tests on Apache LDAP 1.5, so I couldn't get an LDAP sync to take. Anything without a native image also failed to update successfully directly in MySQL: the update statement would run, but the results would not show the emoji.

            MySQL command-line output:

            mysql> select user_name, lower_user_name from cwd_user where lower_user_name like '%derp%';
            +-----------+-----------------+
            | user_name | lower_user_name |
            +-----------+-----------------+
            | Derpetit  | derpetit        |
            | derp      | derp♣           |
            | derp❤     | derp❤           |
            | derp      | derp            |
            +-----------+-----------------+
            4 rows in set (0.00 sec)
            
            

            Trying to add an emoji with no native image through the MySQL GUI produced this response:

            1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xF0\x9F\x8D\x80' for column 'lower_user_name' at row 1 Rows matched: 1 Changed: 1 Warnings: 1

            The name is updated, but the emoji is not added.


            Matt Ryall added a comment:

            In your first case, where you're testing with MySQL, you may need to set the connection encoding to the same as the database encoding (utf8mb4) in the JDBC connection URL in Confluence, otherwise the data is not going to be passed successfully to the server.

            I'm not convinced that solving this issue for emoji characters is very important. Is there any genuine reason for a user's name to contain emoji characters? LDAP integration and databases often have limitations in dealing with these characters which can't be helped in Confluence. It may be that the MySQL drivers or some other component can't handle these characters.

            I'd strongly suggest the customer updates their users in LDAP not to have emoji in their names. We have a large number of critical issues open against Confluence, so it's unlikely that this specific issue will ever be fixed.

            If there is an underlying problem with LDAP sync failing completely in more normal circumstances (i.e. without emoji), please provide the steps to reproduce it. If we have that information, we can treat this issue as a bug instead of an improvement, and reassess its priority on that basis.

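As an illustration of the connection-encoding advice in this comment, the sketch below assembles a MySQL JDBC URL with explicit Unicode settings. `useUnicode` and `characterEncoding` are MySQL Connector/J connection properties, but whether `characterEncoding=UTF-8` actually yields a utf8mb4 connection depends on the driver version and the server's character-set settings, so treat the values as assumptions to verify against your own setup. The host and database names are placeholders, and `build_jdbc_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, database, port=3306):
    """Build a MySQL JDBC URL that requests a Unicode connection encoding."""
    params = {
        "useUnicode": "true",
        # Assumption: the driver and server map this to utf8mb4.
        "characterEncoding": "UTF-8",
    }
    return f"jdbc:mysql://{host}:{port}/{database}?{urlencode(params)}"


# Placeholder host and database names.
url = build_jdbc_url("localhost", "confluence")
print(url)
```

After changing the URL, it is worth confirming on the server side which character set the Confluence connection actually negotiated.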

            John Masson added a comment:

            Hi Ryan, is there a support ticket with more specifics?

              Assignee: Unassigned
              Reporter: Ryan Goodwin (Inactive)
              Votes: 7
              Watchers: 11