Summary

      Confluence intermittently fails when importing a Word .docx document. Sometimes, on selecting the file, you receive the error "The selected file is not a valid binary Word 97-2003 document". Other times, the file is accepted but the import itself fails with a different error:

      The following error(s) occurred:
      Could not import document. The document contains an image of size 6096000 x 2667000, which exceeds the maximum size of 900 x 1200. Reduce the size of the image or contact your administrator to raise the size limit.

      This happens even though there are no images or attachments of that size.
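One possible reading of the reported dimensions, offered as an assumption rather than a confirmed diagnosis: .docx files store drawing sizes in English Metric Units (EMU, 914400 per inch), and 6096000 x 2667000 is a plausible pair of EMU values for an ordinary image. If the importer compared raw EMU values against a pixel limit, a small image would be rejected as oversized. The sketch below only illustrates the unit conversion; the helper name `emu_to_pixels` and the 96 DPI figure are illustrative choices, not taken from Confluence.

```python
# EMU per inch is fixed by the Office Open XML DrawingML format.
EMU_PER_INCH = 914400


def emu_to_pixels(emu: int, dpi: int = 96) -> float:
    """Convert a length in EMU to pixels at the given DPI (96 is an assumption)."""
    return emu * dpi / EMU_PER_INCH


# The two numbers quoted in the import error message.
width_px = emu_to_pixels(6096000)
height_px = emu_to_pixels(2667000)
print(width_px, height_px)  # 640.0 280.0
```

Under this assumption the image would be 640 x 280 pixels, well inside the 900 x 1200 limit quoted in the error.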

      Steps to Reproduce

      1. Select a document with a large number of images (the exact number needed is unknown, as the failure is intermittent).
      2. Attempt to import it through the Word import process. You will encounter either the first or the second error.

      Expected Results

      The import should be successful, as it is a valid file type with no images that large.
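To check the claim that a document contains no images of the reported size, the drawing extents stored in the .docx can be listed directly. This is a minimal sketch, assuming the sizes of interest are the `wp:extent` elements in `word/document.xml`; the function name `list_drawing_extents` is hypothetical, and the values returned are in EMU, not pixels.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the wordprocessingDrawing elements (wp:inline, wp:anchor, wp:extent).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def list_drawing_extents(docx_source):
    """Return (cx, cy) pairs, in EMU, for every wp:extent in word/document.xml.

    docx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        (int(extent.get("cx")), int(extent.get("cy")))
        for extent in root.iter(f"{{{WP_NS}}}extent")
    ]
```

Calling `list_drawing_extents("report.docx")` (a placeholder file name) on an affected document would show whether a pair such as (6096000, 2667000) is present in the file itself.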

      Actual Results

      The import fails.
      The following exception is logged in the xxxxxxx.log file:

      @400000005660c0e915fa5d9c 337678313 [http-bio-1990-exec-39] WARN com.benryan.webwork.WordImportAction - Failed to delete uploaded file /data/jirastudio/confluence/home/temp/upload_b2c8c4b6_902f_4391_9574_e82a259f8d79_00000001.tmp
      @400000005660c0fc1debd7ec 2015-12-03 22:23:46,497 WARN [scheduler_Worker-2] [search.lucene.queue.JournalIndexTaskQueue] lambda$flushQueue$314 Failed to process index task for entry 'JournalEntry{id=2643, journalId=JournalIdentifier{journalName=main_index}, creationDate=2015-12-03 22:23:41.456, type=ADD_CHANGE_DOCUMENT, message=com.atlassian.confluence.pages.Page-5111846}': null
      @400000005660c0fc1debdfbc java.lang.NullPointerException
      @400000005660c0fc1debe3a4 	at org.apache.lucene.document.DateTools.dateToString(DateTools.java:87)
      @400000005660c0fc1dec06cc 	at com.atlassian.bonnie.LuceneUtils.dateToString(LuceneUtils.java:32)
      @400000005660c0fc1dec0ab4 	at com.atlassian.confluence.search.lucene.extractor.EntityDateExtractor.addFields(EntityDateExtractor.java:22)
      @400000005660c0fc1dec0ab4 	at com.atlassian.confluence.search.lucene.ChangeDocumentBuilder.getDocument(ChangeDocumentBuilder.java:124)
      @400000005660c0fc1dec0e9c 	at com.atlassian.confluence.search.lucene.tasks.AddChangeDocumentIndexTask.perform(AddChangeDocumentIndexTask.java:79)
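The `@4000...` prefixes on each log line look like TAI64N timestamps (16 hex digits of seconds followed by 8 hex digits of nanoseconds), as written by logging tools such as multilog. That identification is an inference from the format, not something the report states. A minimal sketch for decoding one, assuming the common convention that the seconds label is 2^62 plus Unix seconds plus a fixed 10-second TAI offset; the function name `decode_tai64n` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

TAI64_EPOCH_OFFSET = 2 ** 62   # TAI64 labels for dates after 1970 start at 2^62
ASSUMED_TAI_UTC_OFFSET = 10    # assumption: the writer added a fixed 10 seconds


def decode_tai64n(label: str) -> datetime:
    """Decode an '@' + 24-hex-digit TAI64N label into a UTC datetime."""
    hex_digits = label.lstrip("@")
    if len(hex_digits) != 24:
        raise ValueError("expected 24 hex digits after '@'")
    seconds = int(hex_digits[:16], 16) - TAI64_EPOCH_OFFSET - ASSUMED_TAI_UTC_OFFSET
    nanoseconds = int(hex_digits[16:], 16)
    # datetime only has microsecond resolution, so nanoseconds are truncated.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=nanoseconds // 1000
    )


print(decode_tai64n("@400000005660c0fc1debd7ec").isoformat())
```

Under these assumptions the label on the JournalIndexTaskQueue line decodes to 22:23:46 UTC on 2015-12-03, which lines up with the `2015-12-03 22:23:46,497` timestamp embedded in the same line.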
      

      Notes

      Reproduced in a customer instance and in my own test instance using a document provided to me, but I could not create a document of my own that triggers the same errors.

      Workaround

      Save the document in .doc format, which imports successfully.
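When many documents need the workaround, the conversion can be scripted. The sketch below assumes LibreOffice is installed and its `soffice` binary is on the PATH, and it relies on LibreOffice's headless `--convert-to` option. Whether a .doc produced this way imports as cleanly as one saved from Word is not confirmed by this report. The function names and directory arguments are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command that converts one .docx file to .doc."""
    return [
        soffice,
        "--headless",
        "--convert-to", "doc",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_all(src_dir, out_dir):
    """Convert every .docx directly under src_dir, stopping on the first failure."""
    for docx_path in sorted(Path(src_dir).glob("*.docx")):
        subprocess.run(build_convert_command(docx_path, out_dir), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact command before running it.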

            [CONFCLOUD-53541] Import Word Document Fails Intermittently

            Kateryna Barmina added a comment:

            Hi everyone,

            This is Kateryna from the Confluence team. Thank you for previously raising this bug and bringing it to our attention.
            Within our company roadmap and work capacity, we try to address or review each bug request but admit that not each one will be resolved. To continue the culture of being honest and open, we are closing this bug to focus on our upcoming roadmap for all Confluence users.
            As we continue to roll out features we do look at requests made by our users and if you feel like this bug is still impacting your team please let us know by posting a comment and we will review.
            Thank you again for providing valuable feedback to our team!

            Best,
            Kateryna

            Kateryna Barmina added a comment:

            Hi Eric and everyone,

            This is Kateryna from the Confluence team. Thank you for raising this bug and bringing it to our attention. After our investigation into this issue, we weren’t able to reproduce the behavior within our instances. We would like to get to the root cause of what you’re experiencing. Could you provide some more details or a screenshot to help us narrow down this case?
            We will keep this issue open for another 14 days, but if we don’t hear back by then we will need to close this out to focus on our upcoming roadmap for all Confluence users.

            Best,
            Kateryna

            Angela Greaves added a comment:

            For some reason none of the workarounds work today.

            Philip Steen added a comment:

            Saving a .docx file to .doc solved the issue for me. Still, since we're trying to import a large number of documents it would be really helpful if .docx could be imported as well.

            Robert Smeets added a comment:

            The workaround suggested does not solve the problem and keeps giving the same error.

              Assignee: Unassigned
              Reporter: Eric Franklin (efranklin, Inactive)
              Affected customers: 13
              Watchers: 19
