  1. Confluence Data Center
  2. CONFSERVER-39892

PDF extractor throws data format exception error in logs

Details

    Description

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      The PDF indexer logs a large number of error messages when indexing PDF files:

      ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: stop reading corrupt stream due to a DataFormatException
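
To gauge how often the error occurs, the matching log lines can be counted. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and the pattern is derived only from the single log line quoted above, so it may need adjusting for other log layouts.

```python
import re

# Built from the log line quoted in this report. The thread label
# ("Indexer: 3") varies between lines, so it is deliberately not matched.
FLATE_ERROR = re.compile(
    r"ERROR .*\[apache\.pdfbox\.filter\.FlateFilter\].*DataFormatException"
)


def count_flate_errors(lines):
    """Count lines that report the FlateFilter DataFormatException."""
    return sum(1 for line in lines if FLATE_ERROR.search(line))


def count_flate_errors_in_file(log_path):
    """Count matching lines in a log file (log_path is supplied by the caller)."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return count_flate_errors(log)
```

Running `count_flate_errors_in_file` against the Confluence application log before and after a reindex gives a rough measure of whether a change had any effect.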
      

      This is probably caused by a bug in PDFBox:
      https://issues.apache.org/jira/browse/PDFBOX-2497

      That bug was fixed in PDFBox 1.8.8, but we are using 1.8.10 and still see the error message, so this may be a regression.

      Workaround:

      Note that this workaround has only been tested on small instances. If you run into any issues after applying it, restore the default bundled PDFBox version, clear the plugin cache, and restart Confluence.
      The workaround applies only if your PDFBox version is 1.8.x.
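
One way to check which PDFBox version is installed is to inspect the jar file names in the lib directory. This is a minimal sketch under stated assumptions: the `pdfbox-<major>.<minor>.<patch>.jar` naming scheme is assumed rather than confirmed by this report, and both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme: pdfbox-<major>.<minor>.<patch>.jar
JAR_NAME = re.compile(r"^pdfbox-(\d+)\.(\d+)\.(\d+)\.jar$")


def find_pdfbox_jars(lib_dir):
    """Return (path, (major, minor, patch)) for each PDFBox jar in lib_dir."""
    found = []
    for jar in sorted(Path(lib_dir).glob("pdfbox-*.jar")):
        match = JAR_NAME.match(jar.name)
        if match:
            found.append((jar, tuple(int(part) for part in match.groups())))
    return found


def workaround_applies(version):
    """The workaround is described only for the 1.8.x line."""
    return version[:2] == (1, 8)
```

If `find_pdfbox_jars` returns more than one entry, two versions are present at once, which is the situation step 3 below warns against.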

      1. Download PDFBox version 1.8.12.
      2. Shut down Confluence.
      3. Go to <Confluence Installation Directory>\confluence\WEB-INF\lib and find the PDFBox 1.8.xx jar file. Remove it and keep it in a folder outside the Confluence installation.
        It is important not to leave two versions of the same jar file in the installation directory, as all of them will be deployed on startup.
      4. Copy the PDFBox 1.8.12 jar file into the same lib directory.
      5. Clear the plugin cache
      6. Start Confluence
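
Steps 3 and 4 can be sketched as a small script. This is an illustration only, not an Atlassian-provided tool: the function name and all three paths are hypothetical, the jar naming pattern is assumed, and Confluence must already be shut down before it runs. Clearing the plugin cache and restarting remain manual steps.

```python
import shutil
from pathlib import Path


def swap_pdfbox_jar(lib_dir, new_jar, backup_dir):
    """Move existing PDFBox 1.8.x jars to backup_dir, then copy new_jar in.

    new_jar must live outside lib_dir. Returns the names of the jars moved.
    """
    lib_dir, new_jar, backup_dir = Path(lib_dir), Path(new_jar), Path(backup_dir)

    # The backup must not sit inside the lib directory, otherwise two
    # versions of the jar could still be picked up on startup.
    resolved_backup = backup_dir.resolve()
    if lib_dir.resolve() in (resolved_backup, *resolved_backup.parents):
        raise ValueError("backup_dir must be outside lib_dir")

    # Assumed naming pattern for the bundled jar.
    old_jars = sorted(lib_dir.glob("pdfbox-1.8.*.jar"))
    if not old_jars:
        raise FileNotFoundError("no PDFBox 1.8.x jar found in lib_dir")

    backup_dir.mkdir(parents=True, exist_ok=True)
    for jar in old_jars:
        shutil.move(str(jar), str(backup_dir / jar.name))

    shutil.copy2(new_jar, lib_dir / new_jar.name)
    return [jar.name for jar in old_jars]
```

Keeping the old jar in the backup folder makes it possible to roll back by reversing the move if problems appear after the swap.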

      Once the content index has been rebuilt, the errors should no longer appear.

            People

              Assignee: Unassigned
              Reporter: Rodrigo Girardi Adami (rgadami)
              Votes: 11
              Watchers: 20
