Details
-
Bug
-
Resolution: Fixed
-
High
-
7.13.3
-
27
-
Severity 3 - Minor
-
99
-
Description
We don't plan to backport the fix for this bug to earlier Long Term Support versions
The fix for this bug isn't suitable for backporting to a bug fix release for any previous LTS versions. This is often because the fix is considered too high risk to implement in an older version.
The fix for this issue will be included in future Long Term Support versions.
Issue Summary
Imports of a site containing UTF-8 characters can fail with "Import failed. Check your server logs for more information. com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence" shown on the web UI. This appears to be due to a bug (XERCESJ-1668) in the Apache Xerces library, and the underlying logs show the attempt failing with a SAXParseException.
Steps to Reproduce
- Create a space with multiple pages containing many special characters.
- Export the site in xml format.
- Import the xml file.
This can be difficult to reproduce as the UTF-8 character needs to be read in as the reader buffer is exhausted so it is only partially read and causes the rest to be added to the next buffer, causing the calculation to be off by one.
Alternatively you can import the following site export to see the issue:
xmlexport-20220516-093414-6.zip
Expected Results
Import should complete without error.
Actual Results
The import fails with the following error to screen:
The below exception is thrown in the confluence.log file:
2022-05-16 09:36:20,874 ERROR [Long running task: Importing data] [confluence.importexport.xmlimport.BackupImporter] importEntities Cannot import the entities: -- url: /longrunningtaskxml.action | referer: http://10.108.15.254:8090/admin/restore-local-file.action | traceId: fe357468ab26f515 | userName: admin | action: longrunningtaskxml com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Invalid byte 2 of 4-byte UTF-8 sequence. at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:64) at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImport(DefaultXmlImporter.java:42) at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEntities(BackupImporter.java:402) at com.atlassian.confluence.importexport.xmlimport.BackupImporter.importEverything(BackupImporter.java:371) at com.atlassian.confluence.importexport.xmlimport.FileBackupImporter.importEverything(FileBackupImporter.java:170) at com.atlassian.confluence.importexport.xmlimport.BackupImporter$1.doInTransactionWithoutResult(BackupImporter.java:262) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:36) at com.atlassian.confluence.importexport.xmlimport.RestorePluginStateStoreTransactionCallbackDecorator.doInTransaction(RestorePluginStateStoreTransactionCallbackDecorator.java:49) at com.atlassian.confluence.importexport.xmlimport.RestoreBandanaValuesTransactionCallbackDecorator.doInTransaction(RestoreBandanaValuesTransactionCallbackDecorator.java:56) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) at com.atlassian.confluence.importexport.xmlimport.BackupImporter.doImportInternal(BackupImporter.java:224) at com.atlassian.confluence.importexport.Importer.doImport(Importer.java:73) at com.atlassian.confluence.importexport.DefaultImportExportManager.performImportInternal(DefaultImportExportManager.java:118) at com.atlassian.confluence.importexport.DefaultImportExportManager.doPerformImport(DefaultImportExportManager.java:106) at com.atlassian.confluence.importexport.DefaultImportExportManager.performImport(DefaultImportExportManager.java:101) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) at com.sun.proxy.$Proxy172.performImport(Unknown Source) at com.atlassian.confluence.importexport.actions.ImportLongRunningTask.runInternal(ImportLongRunningTask.java:78) at com.atlassian.confluence.util.longrunning.ConfluenceAbstractLongRunningTask.run(ConfluenceAbstractLongRunningTask.java:26) at com.atlassian.confluence.util.longrunning.ManagedTask.run(ManagedTask.java:39) at com.atlassian.confluence.impl.util.concurrent.ConfluenceExecutors$ThreadLocalContextTaskWrapper.lambda$wrap$1(ConfluenceExecutors.java:90) at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$doInRequestContext$0(VCacheRequestContextOperations.java:50) at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContextInternal(VCacheRequestContextManager.java:84) at com.atlassian.confluence.impl.vcache.VCacheRequestContextManager.doInRequestContext(VCacheRequestContextManager.java:68) at com.atlassian.confluence.vcache.VCacheRequestContextOperations.doInRequestContext(VCacheRequestContextOperations.java:49) at com.atlassian.confluence.vcache.VCacheRequestContextOperations.lambda$withRequestContext$2(VCacheRequestContextOperations.java:66) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.xml.sax.SAXParseException; lineNumber: 115232; columnNumber: 1140; Invalid byte 2 of 4-byte UTF-8 sequence. at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.atlassian.security.xml.RestrictedXMLReader.parse(RestrictedXMLReader.java:103) at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.parseBackup(DefaultXmlImporter.java:86) at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.initProgressMeter(DefaultXmlImporter.java:75) at com.atlassian.confluence.importexport.xmlimport.DefaultXmlImporter.doImportInternal(DefaultXmlImporter.java:47) ... 40 more Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence. at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source) at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source) at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source) at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ... 46 more
Workaround
Use the atlassian-xml-cleaner-0.1.jar documented in 'Incorrect string value' error thrown when restoring XML backup in Confluence.
This will clean the entities.xml file, however the imported site will now be missing emojis (replaced with question marks), or worse, the emojis are changed from their original form to another emoji.