Bitbucket Data Center: BSERV-12471

Restoring large backup containing 4-byte UTF-8 characters can fail with SAXParseException

Details

    Description

      Issue Summary

      Due to a bug (XERCESJ-1668) in the Apache Xerces library, which the Bitbucket backup client uses to create and parse the XML database backup, parsing a large XML database backup that contains 4-byte UTF-8 characters (for example, emoji) can fail with a SAXParseException, causing the database restore or migration to fail.
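      For context, a "4-byte UTF-8 sequence" is a lead byte of the form 11110xxx followed by three continuation bytes of the form 10xxxxxx. This is how characters outside the Basic Multilingual Plane, such as most emoji, are encoded. The Java sketch below is a hypothetical diagnostic (the class `Utf8FourByteScan` is not part of Bitbucket or Xerces). It walks a file byte by byte, counts such sequences, and reports the offset of the first malformed byte. It checks only sequence structure, not overlong encodings or surrogate ranges. Because the failure described here is attributed to a parser bug, a clean result would suggest that the backup data itself is well-formed UTF-8 rather than corrupt.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic; not part of Bitbucket or Apache Xerces.
public class Utf8FourByteScan {

    /** How many 4-byte sequences were seen, and where decoding first failed. */
    public static final class Result {
        public final long fourByteSequences;
        public final long firstInvalidOffset; // -1 if the input decoded cleanly

        Result(long fourByteSequences, long firstInvalidOffset) {
            this.fourByteSequences = fourByteSequences;
            this.firstInvalidOffset = firstInvalidOffset;
        }
    }

    /**
     * Checks only UTF-8 sequence structure (lead byte plus 10xxxxxx continuation bytes).
     * Overlong encodings and surrogate code points are not rejected.
     */
    public static Result scan(InputStream in) throws IOException {
        long offset = 0;          // offset of the byte currently being examined
        long fourByte = 0;        // completed 4-byte sequences
        int remaining = 0;        // continuation bytes still expected
        boolean inFourByte = false;
        int b;
        while ((b = in.read()) != -1) {
            if (remaining > 0) {
                if ((b & 0xC0) != 0x80) {
                    // Not a continuation byte, e.g. a bad byte 2 of a 4-byte sequence.
                    return new Result(fourByte, offset);
                }
                if (--remaining == 0 && inFourByte) {
                    fourByte++;
                }
            } else if (b < 0x80) {
                // Single-byte (ASCII) character: nothing to track.
            } else if ((b & 0xE0) == 0xC0) {
                remaining = 1;
                inFourByte = false;
            } else if ((b & 0xF0) == 0xE0) {
                remaining = 2;
                inFourByte = false;
            } else if ((b & 0xF8) == 0xF0) {
                remaining = 3;
                inFourByte = true;
            } else {
                // Stray continuation byte or a byte that can never start a sequence.
                return new Result(fourByte, offset);
            }
            offset++;
        }
        // A sequence cut off by end of input is reported at the end offset.
        return new Result(fourByte, remaining > 0 ? offset : -1);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an XML file already extracted from the backup archive.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            Result r = scan(in);
            System.out.println("4-byte sequences: " + r.fourByteSequences);
            System.out.println(r.firstInvalidOffset < 0
                    ? "No malformed UTF-8 sequences found"
                    : "First malformed byte at offset " + r.firstInvalidOffset);
        }
    }
}
```

      Assuming the XML file has first been extracted from the backup archive, the sketch could be run with Java 11 or later as `java Utf8FourByteScan.java <path-to-xml>`.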

      Steps to Reproduce

      Note: This issue has primarily been reproduced with data from real Bitbucket instances, so internal reproduction and investigation have used copies of that data. There are currently no known steps for generating a database from scratch that reproduces this issue.

      1. Use the Bitbucket backup client to generate a backup archive for a Bitbucket instance with a large database that contains many special characters (such as pull request comments with emojis).
      2. Use the standard restore process to attempt to restore the generated backup to a new external database.

      OR

      1. Using the database migration wizard, attempt to migrate a large Bitbucket database containing many special characters (such as pull request comments with emojis) to a new database.

      Expected Results

      The database restoration completes as expected, with no exception being thrown.

      Actual Results

      The restore fails, and the log for that restore attempt contains the following exception:

      2020-07-13 07:39:41,634 INFO         Initializing
      2020-07-13 07:39:43,253 INFO         Unpacking bitbucket-20200709-172649-440.tar to /media/atl/bitbucket
      2020-07-13 10:25:18,721 INFO         Validating database before restore
      2020-07-13 10:25:20,450 INFO         Restoring database schema definition
      2020-07-13 10:25:33,482 INFO         Restoring database data
      2020-07-13 10:25:40,133 ERROR        bitbucket-20200709-172649-440.tar could not be restored
      com.atlassian.stash.internal.backup.liquibase.LiquibaseDataAccessException: SAX parsing error while parsing backup file; nested exception is org.xml.sax.SAXParseException; lineNumber: 10888480; columnNumber: 36; Invalid byte 2 of 4-byte UTF-8 sequence.
      	at com.atlassian.stash.internal.backup.liquibase.DefaultLiquibaseMigrationDao.parse(DefaultLiquibaseMigrationDao.java:229)
      	at com.atlassian.stash.internal.backup.liquibase.DefaultLiquibaseMigrationDao.scan(DefaultLiquibaseMigrationDao.java:215)
      	... 10 more frames available in the log file
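      The lineNumber and columnNumber values in the log are the position fields of org.xml.sax.SAXParseException. As a hedged sketch, assuming the XML file has been extracted from the backup archive, the hypothetical class `SaxPositionCheck` below streams it through the JDK's built-in SAX parser and prints the same fields on failure. This is not the Bitbucket backup client's code. The JDK's bundled parser is derived from Xerces but is not confirmed here to behave like the Xerces version Bitbucket uses, so a successful parse with it does not rule out this issue.

```java
import java.io.IOException;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of Bitbucket.
public class SaxPositionCheck {

    /** Parses the input and returns null on success, or "line:column: message" on a parse error. */
    public static String check(InputSource source)
            throws IOException, ParserConfigurationException, SAXException {
        try {
            // DefaultHandler ignores all content; only whether parsing completes matters here.
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The same position fields that appear in the restore log.
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to an XML file already extracted from the backup archive.
        InputSource source = new InputSource(Paths.get(args[0]).toUri().toString());
        String error = check(source);
        System.out.println(error == null ? "Parsed without errors" : "Parse error at " + error);
    }
}
```

      SAX is used because it processes the document as a stream of events rather than building it in memory, which matters for a backup large enough to reach line 10888480.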
      

      Workaround

      Follow the workaround steps listed in the following knowledge article:

            People

              Josh Aguilar
              Evan Slaughter
              Votes: 3
              Watchers: 11
