[AI-396] PDF extractor throws data format exception error in logs

Type: Bug
Resolution: Fixed
Priority: Medium
Component/s: Admin - General, Search - Core (DO NOT USE)
Labels:

Symptom Severity: Severity 3 - Minor
UIS: 56

NOTE: This bug report is for Confluence Cloud. Using Confluence Server? See the corresponding bug report.

The PDF indexer logs a large number of error messages when indexing PDF files:

ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: stop reading corrupt stream due to a DataFormatException
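
To gauge how often this message occurs, for example before and after applying a fix, the log can be scanned for it. The following is a minimal sketch; the function name and the idea of passing in log lines are illustrative assumptions, not part of the original report.

```python
# Text taken from the log line quoted in this report.
MARKER = "stop reading corrupt stream due to a DataFormatException"


def count_flate_errors(lines):
    """Count log lines that report the FlateFilter DataFormatException."""
    return sum(1 for line in lines if "FlateFilter" in line and MARKER in line)
```

An open file object can be passed directly as `lines`, since iterating over it yields one line at a time.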

This is probably caused by a bug in PDFBox:
https://issues.apache.org/jira/browse/PDFBOX-2497

That bug was fixed in PDFBox 1.8.8, but we are using 1.8.10 and still see the error message, so this may be a regression.

Workaround:

Note that this workaround has only been tested on small instances. If you encounter any issues after applying it, restore the default bundled PDFBox version, clear the plugin cache, and restart Confluence.
The workaround applies only if your PDFBox version is 1.8.x.
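
Since the workaround applies only to PDFBox 1.8.x, it may help to confirm the bundled version first. The sketch below assumes the jar follows the usual `pdfbox-<version>.jar` naming; the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the bundled jar is named "pdfbox-<major>.<minor>.<patch>.jar".
PDFBOX_JAR = re.compile(r"^pdfbox-(\d+)\.(\d+)\.(\d+)\.jar$", re.IGNORECASE)


def pdfbox_version(jar_name):
    """Return (major, minor, patch) parsed from a jar file name, or None."""
    match = PDFBOX_JAR.match(jar_name)
    return tuple(int(part) for part in match.groups()) if match else None


def workaround_applies(jar_name):
    """True only for PDFBox 1.8.x jars, the range the workaround covers."""
    version = pdfbox_version(jar_name)
    return version is not None and version[:2] == (1, 8)


def find_pdfbox_jars(lib_dir):
    """List PDFBox jar names found directly inside lib_dir."""
    return sorted(
        p.name for p in Path(lib_dir).glob("*.jar") if pdfbox_version(p.name)
    )
```

If `find_pdfbox_jars` returns more than one name for the `WEB-INF\lib` directory, more than one PDFBox jar is present and the extra copy should be removed before starting Confluence.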

1. Download PDFBox version 1.8.12.
2. Shut down Confluence.
3. Go to <Confluence Installation Directory>\confluence\WEB-INF\lib and find the PDFBox 1.8.xx jar file. Move it to a folder outside the Confluence installation.
   Do not leave two versions of the same jar file in the installation directory, because all of them will be deployed on startup.
4. Copy the PDFBox 1.8.12 jar file into the same WEB-INF\lib directory.
5. Clear the plugin cache:
   - https://confluence.atlassian.com/display/CONFKB/How+to+clear+Confluence+plugins+cache
6. Start Confluence.

The errors will no longer appear once the content has been reindexed.
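
The jar swap described in the workaround steps could be scripted roughly as follows. This is a sketch only: the function name, the backup directory, and the `pdfbox-1.8.<patch>.jar` file name pattern are assumptions, and it should be run only while Confluence is shut down.

```python
import shutil
from pathlib import Path


def swap_pdfbox_jar(lib_dir, new_jar, backup_dir):
    """Move bundled pdfbox-1.8.* jars out of lib_dir, then copy new_jar in.

    Returns the names of the jars that were moved to backup_dir.
    """
    lib_dir, new_jar, backup_dir = Path(lib_dir), Path(new_jar), Path(backup_dir)
    if new_jar.parent.resolve() == lib_dir.resolve():
        raise ValueError("new_jar must be located outside lib_dir")
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # Assumption: the bundled jar is named "pdfbox-1.8.<patch>.jar".
    for old_jar in sorted(lib_dir.glob("pdfbox-1.8.*.jar")):
        shutil.move(str(old_jar), str(backup_dir / old_jar.name))
        moved.append(old_jar.name)

    # After this copy, only one PDFBox jar remains in lib_dir.
    shutil.copy2(new_jar, lib_dir / new_jar.name)
    return moved
```

Keeping the old jar in a backup folder makes it possible to restore the default bundled version if problems appear. Clearing the plugin cache and starting Confluence remain manual steps.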

Is related to: CONFSERVER-39892 PDF extractor throws data format exception error in logs (Closed)

Assignee: Unassigned
Reporter: Rodrigo Girardi Adami
Affected customers: 11
Watchers: 20
Created: 12/Nov/2015 5:03 PM
Updated: 10/Apr/2024 3:36 AM
Resolved: 02/Sep/2016 12:40 AM
