Bug
Resolution: Fixed
High
7.13.3
27
Severity 3 - Minor
99
We don't plan to backport the fix for this bug to earlier Long Term Support versions
The fix for this bug isn't suitable for backporting to a bug fix release of any previous LTS version. This is often because the fix is considered too risky to implement in an older version.
The fix for this issue will be included in future Long Term Support versions.
Issue Summary
Importing a site export that contains multibyte UTF-8 characters (such as emoji) can fail, with the web UI showing "Import failed. Check your server logs for more information. com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence". This appears to be caused by a bug (XERCESJ-1668) in the Apache Xerces library; the underlying logs show the import failing with a SAXParseException.
Steps to Reproduce
- Create a space with multiple pages containing many special characters (for example, emoji).
- Export the site in XML format.
- Import the XML file.
This can be difficult to reproduce, because the failure only occurs when a multibyte UTF-8 character is read just as the reader's buffer is exhausted: the character is only partially read, the rest of its bytes are carried over into the next buffer, and the resulting calculation is off by one.
Alternatively, you can import the following site export to see the issue:
xmlexport-20220516-093414-6.zip
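The buffer-boundary condition described above is one that any streaming UTF-8 decoder has to handle. As a point of comparison only (this is not the Xerces UTF8Reader code, and ChunkedUtf8Decoder and decodeInChunks are hypothetical names), the following Java sketch feeds UTF-8 bytes to the JDK's CharsetDecoder in fixed-size chunks and keeps any incomplete trailing sequence for the next chunk, so a 4-byte character split across two chunks still decodes correctly:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating correct chunked decoding; not Xerces or Confluence code.
final class ChunkedUtf8Decoder {

    /** Decodes UTF-8 bytes that arrive in fixed-size chunks, as a buffered reader sees them. */
    static String decodeInChunks(byte[] data, int chunkSize) throws CharacterCodingException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Room for one chunk plus at most 3 bytes left over from an unfinished sequence.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 3);
        // UTF-8 never yields more UTF-16 chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(chunkSize + 3);
        StringBuilder result = new StringBuilder();

        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int len = Math.min(chunkSize, data.length - offset);
            in.put(data, offset, len);      // append the new chunk after any leftover bytes
            in.flip();
            CoderResult cr = decoder.decode(in, out, false);
            if (cr.isError()) {
                cr.throwException();        // e.g. an invalid continuation byte
            }
            // An incomplete sequence at the end of the chunk stays unconsumed in `in`;
            // compact() moves it to the front so the next chunk can complete it.
            in.compact();
            out.flip();
            result.append(out);
            out.clear();
        }

        // End of input: any bytes still pending form a truncated sequence and are reported.
        in.flip();
        CoderResult cr = decoder.decode(in, out, true);
        if (cr.isError()) {
            cr.throwException();
        }
        decoder.flush(out);
        out.flip();
        result.append(out);
        return result.toString();
    }
}
```

Calling decodeInChunks with every chunk size from 1 upward on a string containing emoji should return the original string each time, while a sequence such as 0xF0 0x41 (an invalid second byte of a 4-byte sequence) is reported as a MalformedInputException rather than silently mis-decoded.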
Expected Results
Import should complete without error.
Actual Results
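The import fails, and the web UI shows the following error:
Import failed. Check your server logs for more information. com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence.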
The following exception is logged in confluence.log:
2022-05-16 09:36:20,874 ERROR [Long running task: Importing data] [confluence.importexport.xmlimport.BackupImporter] importEntities Cannot import the entities: -- url: /longrunningtaskxml.action | referer: http://10.108.15.254:8090/admin/restore-local-file.action | traceId: fe357468ab26f515 | userName: admin | action: longrunningtaskxml
com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence.
	at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:64)
	at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImport(DefaultXmlImporter.java:42)
	at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEntities(BackupImporter.java:402)
	at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEverything(BackupImporter.java:371)
	at com.atlassian.confluence.importexport.xmlimport.FileBackupImporter.importEverything(FileBackupImporter.java:170)
	at com.atlassian.confluence.importexport.xmlimport.BackupImporter$1.doInTransactionWithoutResult(BackupImporter.java:262)
	at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:36)
	at com.atlassian.confluence.importexport.xmlimport.RestorePluginStateStoreTransactionCallbackDecorator.doInTransaction(RestorePluginStateStoreTransactionCallbackDecorator.java:49)
	at com.atlassian.confluence.importexport.xmlimport.RestoreBandanaValuesTransactionCallbackDecorator.doInTransaction(RestoreBandanaValuesTransactionCallbackDecorator.java:56)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
	at com.atlassian.confluence.importexport.xmlimport.BackupImporter.doImportInternal(BackupImporter.java:224)
	at com.atlassian.confluence.importexport.Importer.doImport(Importer.java:73)
	at com.atlassian.confluence.importexport.DefaultImportExportManager.performImportInternal(DefaultImportExportManager.java:118)
	at com.atlassian.confluence.importexport.DefaultImportExportManager.doPerformImport(DefaultImportExportManager.java:106)
	at com.atlassian.confluence.importexport.DefaultImportExportManager.performImport(DefaultImportExportManager.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy172.performImport(Unknown Source)
	at com.atlassian.confluence.importexport.actions.ImportLongRunningTask.runInternal(ImportLongRunningTask.java:78)
	at com.atlassian.confluence.util.longrunning.ConfluenceAbstractLongRunningTask.run(ConfluenceAbstractLongRunningTask.java:26)
	at com.atlassian.confluence.util.longrunning.ManagedTask.run(ManagedTask.java:39)
	at com.atlassian.confluence.impl.util.concurrent.ConfluenceExecutors$ThreadLocalContextTaskWrapper.lambda$wrap$1(ConfluenceExecutors.java:90)
	at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$doInRequestContext$0(VCacheRequestContextOperations.java:50)
	at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContextInternal(VCacheRequestContextManager.java:84)
	at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContext(VCacheRequestContextManager.java:68)
	at com.atlassian.confluence.vcache.VCacheRequestContextOperations.doInRequestContext(VCacheRequestContextOperations.java:49)
	at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$withRequestContext$2(VCacheRequestContextOperations.java:66)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.xml.sax.SAXParseException; lineNumber: 115232; columnNumber: 1140; Invalid byte 2 of 4-byte UTF-8 sequence.
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at com.atlassian.security.xml.RestrictedXMLReader.parse(RestrictedXMLReader.java:103)
	at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.parseBackup(DefaultXmlImporter.java:86)
	at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.initProgressMeter(DefaultXmlImporter.java:75)
	at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:47)
	... 40 more
Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
	at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
	at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	... 46 more
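The SAXParseException in this trace carries the line and column (115232 and 1140 here) at which parsing stopped. The following Java sketch, using the hypothetical names ParseErrorLocator and firstParseError, shows how that location can be read back when checking an extracted entities.xml outside Confluence. It uses the JDK's built-in SAX parser rather than the Apache Xerces build that Confluence uses, so it is an assumption that the two behave the same; a file that parses cleanly here may still trigger the buffer bug during a Confluence import.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic helper; not part of Confluence.
final class ParseErrorLocator {

    /**
     * Parses an XML stream with the JDK's SAX parser and returns
     * "line L, column C: message" for the first fatal error, or null if it parses cleanly.
     */
    static String firstParseError(InputStream xml)
            throws IOException, ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // The input is an external file, so enable the JDK's secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        SAXParser parser = factory.newSAXParser();
        try {
            // DefaultHandler ignores content; its fatalError rethrows the SAXParseException.
            parser.parse(xml, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }
}
```

For example, firstParseError(Files.newInputStream(Paths.get("entities.xml"))) returns null for a well-formed file, or the location and message of the first fatal error when the parser hits malformed bytes.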
Workaround
Use atlassian-xml-cleaner-0.1.jar, as documented in the article 'Incorrect string value' error thrown when restoring XML backup in Confluence.
This cleans the entities.xml file; however, the imported site will then be missing emojis (they are replaced with question marks) or, worse, emojis are changed from their original form into a different emoji.
This problem may also cause another error, or at least a similar one, that also relates to character or string length when the backup is read.
In my case, importing one day's 7.4.5 XML backup into another 7.4.5 instance crashed with the error "Index 2048 out of bounds for length 2048", while the XML backup from the next day crashed with the error "Invalid byte 2 of 4-byte UTF-8 sequence" mentioned in this ticket. Both errors rendered Confluence unusable, and I had to restore a database backup to get it running again. The index out of bounds error gave no indication of where in the XML file the problem was, but the invalid byte error pointed to a specific row, where I did find a multibyte emoji.
Importing the same backups into an 8.5.0 instance worked fine. So my conclusion is: if an XML import on a version earlier than 8.3 crashes with an error that sounds related to character or string length, upgrade to at least 8.3.
I'm adding this comment in case others search for the index out of bounds error, because it took me a long time to find out what that one was about.
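To find rows like the one described above before importing, the following Java sketch (FourByteCharScanner and linesWithFourByteChars are hypothetical names, not part of Confluence or the cleaner jar) lists the 1-based line numbers of an extracted entities.xml that contain supplementary code points, i.e. characters above U+FFFF that take 4 bytes in UTF-8, which includes most emoji:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Confluence.
final class FourByteCharScanner {

    /** Returns the 1-based numbers of lines containing at least one 4-byte UTF-8 character. */
    static List<Integer> linesWithFourByteChars(BufferedReader reader) throws IOException {
        List<Integer> hits = new ArrayList<>();
        int lineNumber = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            // Supplementary code points (U+10000 and above) are exactly the 4-byte UTF-8 ones.
            if (line.codePoints().anyMatch(Character::isSupplementaryCodePoint)) {
                hits.add(lineNumber);
            }
        }
        return hits;
    }
}
```

A reader such as Files.newBufferedReader(Paths.get("entities.xml"), StandardCharsets.UTF_8) decodes the file as UTF-8 and throws an exception on malformed bytes instead of silently replacing them. Line numbers here follow BufferedReader.readLine, so they are likely, but not guaranteed, to line up with the lineNumber reported in a SAXParseException.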