  1. Confluence Data Center
  2. CONFSERVER-78782

Importing a site export can fail with Invalid byte 2 of 4-byte UTF-8 sequence

Details

    Description

      We don't plan to backport the fix for this bug to earlier Long Term Support versions

      The fix for this bug isn't suitable for backporting to a bug fix release for any previous LTS versions. This is often because the fix is considered too high risk to implement in an older version.

      The fix for this issue will be included in future Long Term Support versions.

      Issue Summary

      Importing a site export that contains 4-byte UTF-8 characters (such as emoji) can fail, with the web UI showing "Import failed. Check your server logs for more information. com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence". This appears to be caused by a bug in the Apache Xerces library (XERCESJ-1668); the underlying logs show the import failing with a SAXParseException.

      Steps to Reproduce

      1. Create a space with multiple pages containing many special characters.
      2. Export the site in XML format.
      3. Import the XML file.

      This can be difficult to reproduce because a multi-byte UTF-8 character has to straddle the end of the reader's buffer. Only part of the character is read before the buffer is exhausted, the remaining bytes land in the next buffer, and the calculation ends up off by one.
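
The buffer-boundary condition described above can be illustrated with a minimal Python sketch. This is not the Xerces code and does not reproduce XERCESJ-1668 itself; it only shows the general hazard of a 4-byte character being split across two reads, and how a decoder that carries the partial bytes over handles it. The function names and the 4-byte chunk size are hypothetical, chosen for illustration.

```python
import codecs


def decode_in_chunks(data: bytes, chunk_size: int) -> str:
    """Decode UTF-8 chunk by chunk, carrying incomplete sequences forward."""
    decoder = codecs.getincrementaldecoder("utf-8")("strict")
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # An incomplete trailing sequence is buffered inside the decoder
        # and completed when the next chunk arrives.
        parts.append(decoder.decode(chunk))
    # Flush: raises if the input ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def decode_chunks_naively(data: bytes, chunk_size: int) -> str:
    """Decode every chunk in isolation; breaks on a straddling character."""
    return "".join(
        data[start:start + chunk_size].decode("utf-8")
        for start in range(0, len(data), chunk_size)
    )


if __name__ == "__main__":
    # "ab" + U+1F600 (4 bytes in UTF-8) + "cd" = 8 bytes in total.
    sample = "ab\U0001F600cd".encode("utf-8")
    # With 4-byte chunks the emoji is split 2 + 2 across the boundary.
    print(decode_in_chunks(sample, 4) == "ab\U0001F600cd")
    try:
        decode_chunks_naively(sample, 4)
    except UnicodeDecodeError as exc:
        print("naive decode failed:", exc.reason)
```

The same input decodes cleanly or fails depending only on where the chunk boundary falls, which matches why the issue appears only with certain exports.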

      Alternatively, you can import the following site export to see the issue:
      xmlexport-20220516-093414-6.zip

      Expected Results

      Import should complete without error.

      Actual Results

      The import fails with an error on screen (screenshot attached as image-2022-05-16-13-27-51-948.png).

      The following exception is logged in the confluence.log file:

      2022-05-16 09:36:20,874 ERROR [Long running task: Importing data] [confluence.importexport.xmlimport.BackupImporter] importEntities Cannot import the entities:
       -- url: /longrunningtaskxml.action | referer: http://10.108.15.254:8090/admin/restore-local-file.action | traceId: fe357468ab26f515 | userName: admin | action: longrunningtaskxml
      com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence.
              at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:64)
              at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImport(DefaultXmlImporter.java:42)
              at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEntities(BackupImporter.java:402)
              at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEverything(BackupImporter.java:371)
              at com.atlassian.confluence.importexport.xmlimport.FileBackupImporter.importEverything(FileBackupImporter.java:170)
              at com.atlassian.confluence.importexport.xmlimport.BackupImporter$1.doInTransactionWithoutResult(BackupImporter.java:262)
              at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:36)
              at com.atlassian.confluence.importexport.xmlimport.RestorePluginStateStoreTransactionCallbackDecorator.doInTransaction(RestorePluginStateStoreTransactionCallbackDecorator.java:49)
              at com.atlassian.confluence.importexport.xmlimport.RestoreBandanaValuesTransactionCallbackDecorator.doInTransaction(RestoreBandanaValuesTransactionCallbackDecorator.java:56)
              at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
              at com.atlassian.confluence.importexport.xmlimport.BackupImporter.doImportInternal(BackupImporter.java:224)
              at com.atlassian.confluence.importexport.Importer.doImport(Importer.java:73)
              at com.atlassian.confluence.importexport.DefaultImportExportManager.performImportInternal(DefaultImportExportManager.java:118)
              at com.atlassian.confluence.importexport.DefaultImportExportManager.doPerformImport(DefaultImportExportManager.java:106)
              at com.atlassian.confluence.importexport.DefaultImportExportManager.performImport(DefaultImportExportManager.java:101)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
              at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
              at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
              at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
              at com.sun.proxy.$Proxy172.performImport(Unknown Source)
              at com.atlassian.confluence.importexport.actions.ImportLongRunningTask.runInternal(ImportLongRunningTask.java:78)
              at com.atlassian.confluence.util.longrunning.ConfluenceAbstractLongRunningTask.run(ConfluenceAbstractLongRunningTask.java:26)
              at com.atlassian.confluence.util.longrunning.ManagedTask.run(ManagedTask.java:39)
              at com.atlassian.confluence.impl.util.concurrent.ConfluenceExecutors$ThreadLocalContextTaskWrapper.lambda$wrap$1(ConfluenceExecutors.java:90)
              at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$doInRequestContext$0(VCacheRequestContextOperations.java:50)
              at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContextInternal(VCacheRequestContextManager.java:84)
              at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContext(VCacheRequestContextManager.java:68)
              at com.atlassian.confluence.vcache.VCacheRequestContextOperations.doInRequestContext(VCacheRequestContextOperations.java:49)
              at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$withRequestContext$2(VCacheRequestContextOperations.java:66)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: org.xml.sax.SAXParseException; lineNumber: 115232; columnNumber: 1140; Invalid byte 2 of 4-byte UTF-8 sequence.
              at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
              at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
              at com.atlassian.security.xml.RestrictedXMLReader.parse(RestrictedXMLReader.java:103)
              at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.parseBackup(DefaultXmlImporter.java:86)
              at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.initProgressMeter(DefaultXmlImporter.java:75)
              at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:47)
              ... 40 more
      Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
              at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
              at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
              at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
              at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
              at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
              at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
              at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
              ... 46 more 
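
Because the trace ends in org.apache.xerces.impl.io.UTF8Reader, it can help to check independently whether the bytes in the export are valid UTF-8. The sketch below is one way to do that, assuming the entities.xml file has been extracted from the export zip; the function name `scan_utf8_file` and its return shape are hypothetical. It strictly decodes the file and counts characters above U+FFFF, which are the ones encoded as 4-byte sequences.

```python
import codecs


def scan_utf8_file(path, chunk_size=8192):
    """Strictly decode a file as UTF-8.

    Returns (is_valid, four_byte_count), where four_byte_count is the number
    of characters above U+FFFF decoded before any error was hit.
    """
    decoder = codecs.getincrementaldecoder("utf-8")("strict")
    four_byte_count = 0
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                # final=True on the empty last read flags a truncated character.
                text = decoder.decode(chunk, final=not chunk)
                four_byte_count += sum(1 for ch in text if ord(ch) > 0xFFFF)
                if not chunk:
                    break
    except UnicodeDecodeError:
        return False, four_byte_count
    return True, four_byte_count


if __name__ == "__main__":
    import sys

    valid, count = scan_utf8_file(sys.argv[1])
    print("valid UTF-8:", valid, "| 4-byte characters:", count)
```

If the file is reported as valid and contains 4-byte characters, the data itself is well-formed, which is consistent with the failure originating in the parser rather than in the export.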

      Workaround

      Use the atlassian-xml-cleaner-0.1.jar documented in 'Incorrect string value' error thrown when restoring XML backup in Confluence.

      This cleans the entities.xml file. However, the imported site will be missing emojis (they are replaced with question marks) or, worse, some emojis will be changed from their original form into a different emoji.
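
To make the trade-off concrete, here is a minimal sketch of the kind of substitution that produces the question marks. It is an assumption for illustration only, not the implementation of atlassian-xml-cleaner-0.1.jar; the function name `replace_supplementary` is hypothetical. Every character above U+FFFF (the range that needs a 4-byte UTF-8 sequence, which includes most emoji) is swapped for a placeholder, so the parser never sees a 4-byte sequence but the original character is lost.

```python
def replace_supplementary(text: str, replacement: str = "?") -> str:
    """Replace every character above U+FFFF with a placeholder."""
    return "".join(
        replacement if ord(ch) > 0xFFFF else ch
        for ch in text
    )


if __name__ == "__main__":
    original = "Release notes \U0001F389 for caf\u00e9"
    cleaned = replace_supplementary(original)
    # Characters inside the Basic Multilingual Plane, such as U+00E9, survive.
    print(cleaned)
    # No character in the cleaned text needs a 4-byte UTF-8 sequence.
    print(all(ord(ch) <= 0xFFFF for ch in cleaned))
```

The substitution is lossy by design, which is why the workaround trades a successful import for missing emoji.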

       

      Attachments

        1. image-2022-05-16-13-27-51-948.png
          38 kB
          Dean Norman
        2. xmlexport-20220516-093414-6.zip
          4.56 MB
          Dean Norman

            People

              glipatov George Lipatov
              7829eff5df87 Dean Norman
              Votes: 26
              Watchers: 35