Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-21224

MySQL databases incapabable of handling 4-byte UTF-8 Characters. JIRA should handle this gracefully

    • We collect Jira feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

      When adding a comment that contains Unicode characters outside the BMP (e.g.- ⚔✏Ṣᴔᖲ⋂𒟤ၽ긄ְ𳨒𝴠㧁㸏㱠嫖㰾) JIRA throws the following error (Oops - an error has occurred):

      ...
      Cause:
      com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: while inserting: [GenericEntity:Action][id,10030][body,?????????????????][author,admin][level,null][updated,2010-05-04 15:04:55.046][created,2010-05-04 15:04:55.046][issue,10011][rolelevel,null][type,comment][updateauthor,admin] (SQL Exception while executing the following:INSERT INTO jiraaction (ID, issueid, AUTHOR, actiontype, actionlevel, rolelevel, actionbody, CREATED, UPDATEAUTHOR, UPDATED, actionnum) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Incorrect string value: '\xF0\x92\x9F\xA4\xE1\x81...' for column 'actionbody' at row 1)) 
      ...
      com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: while inserting: [GenericEntity:Action][id,10031][body,?????????????????][author,admin][level,null][updated,2010-05-04 15:07:43.156][created,2010-05-04 15:07:43.156][issue,10011][rolelevel,null][type,comment][updateauthor,admin] (SQL Exception while executing the following:INSERT INTO jiraaction (ID, issueid, AUTHOR, actiontype, actionlevel, rolelevel, actionbody, CREATED, UPDATEAUTHOR, UPDATED, actionnum) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Incorrect string value: '\xF0\x92\x9F\xA4\xE1\x81...' for column 'actionbody' at row 1))
      	at com.atlassian.jira.issue.comments.DefaultCommentManager.create(DefaultCommentManager.java:202)
      	at com.atlassian.jira.issue.comments.DefaultCommentManager.create(DefaultCommentManager.java:170)
      	at com.atlassian.jira.issue.comments.DefaultCommentManager.create(DefaultCommentManager.java:164)
      	at com.atlassian.jira.bc.issue.comment.DefaultCommentService.create(DefaultCommentService.java:156)
      ...
      

      It is IMPORTANT to note that this error may also crop up when the MySQL server and/or JIRA connection to it are mis-configured. In this case JIRA may also fail for characters in the Unicode BMP (see https://confluence.atlassian.com/display/JIRAKB/SQL+Exception+when+Entering%2C+Updating+or+Importing+an+Issue+in+JIRA+with+MySQL+Due+to+Encoding). In this case this issue does not apply and its just a configuration problem.

      This problem only happens in MySQL environment. MySQL has a known limitation in that it is unable to store 4 byte UTF-8 characters in MySQL before 5.5.3. I think we should:

      1. Add a startup warning admins when JIRA connects to a MySQL database with these problems.
      2. Update the documentation of MySQL database configuration to indicate this problem.
      3. Sanatising the input so that this error does not occur when connected to MySQL before 5.5.3. This will probably be much harder than it sounds because JIRA can get input data from anywhere (e.g. CSV, E-Mail, HTTP, SOAP etc). We will probably have to choose our battles.
      4. Update out documentation to indicate this is the database that gets around this problem.
      5. Update the upgrade guide once we work out an upgrade path.

            [JRASERVER-21224] MySQL databases incapabable of handling 4-byte UTF-8 Characters. JIRA should handle this gracefully

            Anders added a comment - - edited

            cf/Jira 本身还不支持utf8mb4 ,所以自己直接改数据库编码还不行。

            Anders added a comment - - edited cf/Jira 本身还不支持utf8mb4 ,所以自己直接改数据库编码还不行。

            This is being tracked at: https://jira.atlassian.com/browse/JRA-36135 now.

            Boris Berenberg (Inactive) added a comment - This is being tracked at: https://jira.atlassian.com/browse/JRA-36135 now.

            Ha! for clarity though, you should also confirm that you aren't able to enter such data into the JIRA description box etc, its a storage issue, not specific to any one plugin I have an idea that may work for me, I'll post on the other issue

            Andy Brook [Plugin People] added a comment - Ha! for clarity though, you should also confirm that you aren't able to enter such data into the JIRA description box etc, its a storage issue, not specific to any one plugin I have an idea that may work for me, I'll post on the other issue

            yes but my prob is with your plugin Andy that trigger that error
            i can't change db server

            Fabrizio Galletti added a comment - yes but my prob is with your plugin Andy that trigger that error i can't change db server

            You can try, but I don't think it will help. Mysql does not handle (all) multi-byte UTF-8 characters, just some of them, alternatives like Postgresql does have full (3,4,5,6) multi-byte UTF-8 character support.

            Andy Brook [Plugin People] added a comment - - edited You can try, but I don't think it will help. Mysql does not handle (all) multi-byte UTF-8 characters, just some of them, alternatives like Postgresql does have full (3,4,5,6) multi-byte UTF-8 character support.

            Fabrizio Galletti added a comment - - edited

            i've found some problem with another product that writes on JIRA.
            can we switch to CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin;
            or it's not a good solution?

            Fabrizio Galletti added a comment - - edited i've found some problem with another product that writes on JIRA. can we switch to CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin; or it's not a good solution?

            bain added a comment -

            Adding link to the connie issue.

            bain added a comment - Adding link to the connie issue.

            bain added a comment -

            Hello All,

            MySQL 5.0 and 5.1 just can't store 4 byte UTF-8 characters (aka 𒟤 = f0 92 9f a4 in UTF-8 )

            This is a known limitation in in MySQL. UTF-8 encoding allows characters of 1, 2, 3 or 4 bytes in length. MySQL only accepts characters from 1-3 bytes (i.e. characters inside of the BMP of Unicode) (ref: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html and http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html). This problem is going to be fixed in MySQL 5.5 with the new utf8mb4, utf16, utf32 (ref: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html) but that has not been made GA so people should not be using it.

            We will get the MySQL JIRA documentation updated to indicate this limitation.

            bain added a comment - Hello All, MySQL 5.0 and 5.1 just can't store 4 byte UTF-8 characters (aka 𒟤 = f0 92 9f a4 in UTF-8 ) This is a known limitation in in MySQL. UTF-8 encoding allows characters of 1, 2, 3 or 4 bytes in length. MySQL only accepts characters from 1-3 bytes (i.e. characters inside of the BMP of Unicode) (ref: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html and http://dev.mysql.com/doc/refman/5.1/en/charset-unicode.html ). This problem is going to be fixed in MySQL 5.5 with the new utf8mb4, utf16, utf32 (ref: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html ) but that has not been made GA so people should not be using it. We will get the MySQL JIRA documentation updated to indicate this limitation.

            Hi Scott,

            Thanks for the comment. If you could run the sql command that Chris suggested:

            select charset('jiraaction');

            and post the results here, it would be much appreciated

            Cheers,
            Peter

            Peter Leschev added a comment - Hi Scott, Thanks for the comment. If you could run the sql command that Chris suggested: select charset('jiraaction'); and post the results here, it would be much appreciated Cheers, Peter

            We ran into something similar, and I suspect that this is a real bug. See CONF-18509.

            Scott Dudley [Inactive] added a comment - We ran into something similar, and I suspect that this is a real bug. See CONF-18509 .

              Unassigned Unassigned
              jcooi JoachimA
              Votes:
              4 Vote for this issue
              Watchers:
              26 Start watching this issue

                Created:
                Updated:
                Resolved: