Description
Issue Summary
Due to a bug (XERCESJ-1668) in the Apache Xerces library, which the Bitbucket backup client uses to create and parse the XML database backup, parsing a large XML database backup that contains certain special characters can cause a database restore or migration to fail with a SAXParseException.
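To illustrate how a SAXParseException surfaces from a SAX parse and what location information it carries, here is a minimal sketch using the JDK's standard JAXP API. It is not the backup client's code and does not reproduce XERCESJ-1668; the class name, the helper firstParseError, and the sample documents are hypothetical, and the second document is broken structurally rather than by encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxErrorLocationSketch {

    /**
     * Parses the given bytes with the JDK's default SAX parser.
     * Returns null if parsing succeeds, otherwise a string with the
     * line, column, and message taken from the SAXParseException.
     */
    static String firstParseError(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The same fields appear as lineNumber/columnNumber in log output.
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample documents, not taken from a real backup.
        byte[] wellFormed = "<rows><row>ok</row></rows>".getBytes(StandardCharsets.UTF_8);
        byte[] broken = "<rows>\n<row>\n</rows>".getBytes(StandardCharsets.UTF_8);

        System.out.println(firstParseError(wellFormed));
        System.out.println(firstParseError(broken));
    }
}
```

The line and column reported by the exception can help locate the offending region of a large XML file when investigating a failed parse.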
Steps to Reproduce
Note: This issue has primarily been reproduced with data from actual Bitbucket instances, so internal reproduction and investigation have relied on copies of that data. No steps are currently available for generating a database from scratch that reproduces this issue.
- Use the Bitbucket backup client to generate a backup archive for a Bitbucket instance with a large database that contains many special characters (such as pull request comments with emojis).
- Use the standard restore process to attempt to restore the generated backup to a new external database.
OR
- Using the database migration wizard, attempt to migrate a large Bitbucket database containing many special characters (such as pull request comments with emojis) to a new database.
Expected Results
The database restoration completes as expected, with no exception being thrown.
Actual Results
The restore fails with the following exception in the logs of the restoration attempt:
2020-07-13 07:39:41,634 INFO Initializing
2020-07-13 07:39:43,253 INFO Unpacking bitbucket-20200709-172649-440.tar to /media/atl/bitbucket
2020-07-13 10:25:18,721 INFO Validating database before restore
2020-07-13 10:25:20,450 INFO Restoring database schema definition
2020-07-13 10:25:33,482 INFO Restoring database data
2020-07-13 10:25:40,133 ERROR bitbucket-20200709-172649-440.tar could not be restored
com.atlassian.stash.internal.backup.liquibase.LiquibaseDataAccessException: SAX parsing error while parsing backup file; nested exception is org.xml.sax.SAXParseException; lineNumber: 10888480; columnNumber: 36; Invalid byte 2 of 4-byte UTF-8 sequence.
    at com.atlassian.stash.internal.backup.liquibase.DefaultLiquibaseMigrationDao.parse(DefaultLiquibaseMigrationDao.java:229)
    at com.atlassian.stash.internal.backup.liquibase.DefaultLiquibaseMigrationDao.scan(DefaultLiquibaseMigrationDao.java:215)
    ... 10 more frames available in the log file
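For context on the "4-byte UTF-8 sequence" wording in the error: characters outside the Basic Multilingual Plane, which includes many emoji, are encoded in UTF-8 as one lead byte of the form 11110xxx followed by three continuation bytes of the form 10xxxxxx. The sketch below shows a well-formed sequence for one emoji (U+1F600, chosen here only as an example). The class and helper names are hypothetical, and the code only illustrates the encoding; it does not reproduce the Xerces bug.

```java
import java.nio.charset.StandardCharsets;

public class Utf8FourByteSketch {

    /** True if the byte has the form 11110xxx (lead byte of a 4-byte sequence). */
    static boolean isFourByteLead(byte b) {
        return (b & 0xF8) == 0xF0;
    }

    /** True if the byte has the form 10xxxxxx (UTF-8 continuation byte). */
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is an example emoji code point (an assumption for illustration).
        String emoji = new String(Character.toChars(0x1F600));
        byte[] utf8 = emoji.getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 bytes: " + toHex(utf8));
        System.out.println("lead byte ok: " + isFourByteLead(utf8[0]));
        for (int i = 1; i < utf8.length; i++) {
            System.out.println("byte " + (i + 1) + " is continuation: "
                    + isContinuationByte(utf8[i]));
        }
    }
}
```

A decoder reports an error such as "Invalid byte 2" when the byte it sees in the second position does not have the continuation form, whether because the data is malformed or because the decoder lost track of its position in the stream.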
Workaround
Follow the workaround steps listed in the following knowledge article: