  1. Confluence Data Center
  2. CONFSERVER-26926

Error indexing attachment / .csv files

Details

    Description

      Symptoms

      I think that the closed issue CONF-18733 is not really resolved. Our log has many entries showing this kind of issue:

      2012-10-12 22:30:34,235 WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922)
      com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
      	at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103)
      	at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
      	at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
      	at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      	at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
      	at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
      	at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
      	at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
      	at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
      	at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125)
      	at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86)
      	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
      	at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
      	at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120)
      	at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
      	at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89)
      	... 18 more
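
The two signatures in the exception can be decoded to see what the extractor actually read. The following is a minimal sketch, assuming the value is printed as a little-endian 64-bit integer (this is consistent with the expected constant mapping onto the OLE2 magic bytes D0 CF 11 E0 A1 B1 1A E1). The helper name `signature_bytes` is hypothetical.

```python
import struct

# Values copied from the log message above.
READ_SIGNATURE = 0x6D453B74726F6853
EXPECTED_SIGNATURE = 0xE11AB1A1E011CFD0


def signature_bytes(value: int) -> bytes:
    """Return the 8 on-disk bytes for a signature printed as a 64-bit integer.

    Assumption: the integer was built from the file's first 8 bytes in
    little-endian order.
    """
    return struct.pack("<Q", value)


found = signature_bytes(READ_SIGNATURE)
expected = signature_bytes(EXPECTED_SIGNATURE)

print(found)           # b'Short;Em'
print(expected.hex())  # d0cf11e0a1b11ae1
```

The bytes that were read are plain semicolon-separated text, which suggests the attachment is an ordinary CSV file that was handed to the Excel extractor because of its stored Content-type, not a corrupt spreadsheet.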
      

      Workaround

      The underlying issue was partially resolved in Confluence 3.5.11, so the warning no longer occurs for many CSV files uploaded in Confluence 3.5.11 or above. However, any CSV files uploaded in versions older than Confluence 3.5.11 still have the incorrect Content-type, as do some CSV files uploaded after 3.5.11, so the warnings still occur whenever Confluence performs a re-index.

      To fix the Content-type and resolve the warning message for older documents, you will need to run this command against your database:

      UPDATE PUBLIC.ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%';
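
Before running the statement on a real database, it may help to preview which rows its WHERE clause matches. The sketch below rehearses this on a throwaway in-memory SQLite table, not on a Confluence database. The two-column table layout, the sample rows, and the `application/vnd.ms-excel` value are assumptions for illustration, and the `PUBLIC.` schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

# Throwaway in-memory stand-in for the ATTACHMENTS table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATTACHMENTS (TITLE TEXT, CONTENTTYPE TEXT)")
conn.executemany(
    "INSERT INTO ATTACHMENTS VALUES (?, ?)",
    [
        ("meta_mailinfo_sec01.csv", "application/vnd.ms-excel"),  # wrong type
        ("already_fixed.csv", "text/csv"),                        # already correct
        ("report.xls", "application/vnd.ms-excel"),               # genuine Excel file
    ],
)

# Same predicate as the UPDATE statement above.
WHERE = "TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%'"

# Step 1: preview the rows the UPDATE would touch.
affected = conn.execute(
    f"SELECT TITLE FROM ATTACHMENTS WHERE {WHERE}"
).fetchall()
print(affected)

# Step 2: apply the fix, then confirm nothing matches any more.
conn.execute(f"UPDATE ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE {WHERE}")
conn.commit()
remaining = conn.execute(
    f"SELECT COUNT(*) FROM ATTACHMENTS WHERE {WHERE}"
).fetchone()[0]
print(remaining)
```

Note that SQLite's LIKE is case-insensitive for ASCII by default; other databases may treat `'%.csv'` as case-sensitive, so a file named `DATA.CSV` might not be matched there.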
      

      For more information, please see this KB article.

      Resolution

      This issue is resolved in Confluence 6.0.6. To get the fix, upgrade to that version.

      NB: The fix contains two parts: an upgrade task that changes the 'media_type' of all csv attachments to 'text/csv', no matter what it was previously, and a change that intercepts the save of any csv attachment and enforces a media_type of 'text/csv'. If for any reason the upgrade task needs to be run again (e.g. after a space import that contains attachments with incorrect media_types), it can be run manually by going to <your-confluence-url>/admin/force-upgrade.action and selecting 'CorrectCsvAttachmentMimeTypeUpgradeTask'.
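
The "intercept on save" part of the fix can be pictured with a small sketch. This is not Confluence's implementation; the function name `enforce_csv_media_type` and the case-insensitive extension check are assumptions made for illustration.

```python
CSV_MEDIA_TYPE = "text/csv"


def enforce_csv_media_type(filename: str, media_type: str) -> str:
    """Return the media type to store for an attachment being saved.

    Any file whose name ends in .csv gets 'text/csv', whatever type the
    client declared; all other files keep their declared type.
    Assumption: the extension check ignores case.
    """
    if filename.lower().endswith(".csv"):
        return CSV_MEDIA_TYPE
    return media_type


print(enforce_csv_media_type("xy.csv", "application/vnd.ms-excel"))
print(enforce_csv_media_type("report.xls", "application/vnd.ms-excel"))
```

Applying the same rule to every stored attachment at once is, in effect, what the upgrade task does for existing data.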

      Attachments

        1. xy.csv
          0.0 kB

            People

              dunterwurzacher Denise Unterwurzacher [Atlassian] (Inactive)
              9168e3332326 Michael Michael
              Votes: 9
              Watchers: 41
