  1. Confluence Data Center
  2. CONFSERVER-20594

Use Apache POI returned information to attempt to index Office 2007 where incorrect extension was used

Details

    Description

      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      I carried out a test upgrade from Confluence 3.0.2 to 3.3 over the weekend and noticed that the re-index threw over 2000 errors relating to attachments. Some of them were problematic PDFs, and I've voted on CONF-18962 to get those resolved.
      However, the vast majority of the errors related to .xls and .csv files not being indexed properly.
      In many of those cases the following exception appeared:

      org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
      

      POI is correct. We use a tool called FlorenceSoft DiffEngineX to carry out diffs between Excel documents, and it inexplicably creates Office 2007 format output but mistakenly gives it a .xls (instead of .xlsx) extension.
      I'm not aware of any other tools that make this mistake, but I'm sure we're not the only ones who have content saved with the wrong extension. Since POI can tell that the data appears to be Office 2007 content, perhaps Confluence could catch the error and retry indexing the document as Excel 2007? It would be fantastic, and I'd really appreciate it.
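
      As a rough illustration of the idea (not how Confluence or POI actually implement it), the format could also be guessed from the first bytes of the attachment instead of its extension. This is a slightly different technique from catching OfficeXmlFileException and retrying, but it reaches the same decision. The class and method names below are hypothetical. The only facts relied on are the well-known file signatures: OLE2 documents start with D0 CF 11 E0 A1 B1 1A E1, and Office 2007+ files are ZIP archives starting with "PK\003\004".

```java
/**
 * Hypothetical helper: guesses the Office container format from the leading
 * bytes of a file, ignoring whatever extension the file was saved with.
 */
final class OfficeFormatSniffer {

    enum Format { OLE2, OOXML, UNKNOWN }

    // Signature of an OLE2 compound document (legacy .xls/.doc/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; Office 2007+ (.xlsx etc.) files are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private OfficeFormatSniffer() { }

    /** Classifies the given leading bytes of a file (at least 8 bytes recommended). */
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            // Any ZIP matches here, so the caller should still handle a parse failure.
            return Format.OOXML;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

      An indexer could then pick the HSSF or XSSF extraction path from the result rather than from ".xls" versus ".xlsx". As far as I know, POI's own WorkbookFactory.create(InputStream) does a similar content-based detection, so that may be the simpler route if it is available in the bundled POI version.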

            People

              Assignee: Unassigned
              Reporter: David Corley
              Votes: 3
              Watchers: 6
