  1. Confluence Data Center
  2. CONFSERVER-24885

WordTextExtractor assumes all DOT files are Word templates and logs errors


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Low
    • Fix Version/s: None
    • Affects Version/s: 4.1, 5.1.3
    • None

    Description

      One of our users uploaded a file with a .dot extension to Confluence. The file is not a Word template; in this case it was a DOT language file (http://en.wikipedia.org/wiki/DOT_language). The extractor should inspect the file's content to detect its type, rather than assuming the type from the file extension and then logging stack traces like this one:

      2012-03-08 23:36:00,087 WARN [scheduler_Worker-5] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: orgtree.dot v.1 (1973452911) jpolley)
      com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.
              at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41)
              at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
              at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
              at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
              at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
              at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
              at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
              at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
              at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
              at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
              at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424)
              at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
              at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197)
              at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149)
              at sun.reflect.GeneratedMethodAccessor1860.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
              at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
              at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
              at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
              at $Proxy44.flushQueue(Unknown Source)
              at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30)
              at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
              at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
              at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
              at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
              at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
              at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
      Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
              at com.aspose.words.Document.a(Unknown Source)
              at com.aspose.words.Document.b(Unknown Source)
              at com.aspose.words.Document.a(Unknown Source)
              at com.aspose.words.Document.<init>(Unknown Source)
              at com.aspose.words.Document.<init>(Unknown Source)
              at com.aspose.words.Document.<init>(Unknown Source)
              at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
              ... 30 more
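
The content-based detection requested above could be sketched roughly as follows. This is only an illustration, not the Office Connector's actual code: the class `DotFileSniffer` and its methods are hypothetical names. It relies on the fact that binary Word 97-2003 documents and templates (.doc, .dot) are OLE2 compound files, which begin with a fixed 8-byte signature, whereas a Graphviz DOT file is plain text. The check is necessary but not sufficient, since other OLE2 formats such as .xls share the same signature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: sniffs file content instead of trusting the .dot extension. */
public final class DotFileSniffer {

    // Signature at the start of every OLE2 compound file (used by binary .doc/.dot).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private DotFileSniffer() {
    }

    /** Returns true if the given leading bytes start with the OLE2 signature. */
    public static boolean looksLikeOle2(byte[] prefix) {
        if (prefix == null || prefix.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (prefix[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to 8 bytes from the stream and checks them against the signature. */
    public static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream ended before 8 bytes were available
            }
            read += n;
        }
        return looksLikeOle2(header);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample content standing in for an attachment like orgtree.dot.
        byte[] graphviz = "digraph orgtree { ceo -> cto; }".getBytes(StandardCharsets.UTF_8);
        boolean isOle2 = looksLikeOle2(new ByteArrayInputStream(graphviz));
        System.out.println("Attempt Word text extraction: " + isOle2);
    }
}
```

With a guard like this, an extractor could skip non-OLE2 .dot attachments quietly (or log a single debug line) instead of handing them to the Word parser and logging a full stack trace at WARN level.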
      

          People

            shaffenden Steve Haffenden (Inactive)
            don.willis@atlassian.com Don Willis
