Bug
Resolution: Fixed
Medium
Severity 3 - Minor
56
NOTE: This bug report is for Confluence Cloud. Using Confluence Server? See the corresponding bug report.
The PDF indexer logs a large number of error messages when indexing PDF files.
ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: stop reading corrupt stream due to a DataFormatException
This is probably caused by a bug in PDFBox:
https://issues.apache.org/jira/browse/PDFBOX-2497
The bug above was fixed in 1.8.8, but we are using 1.8.10 and still see the error message, so it may be a regression.
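To gauge how often the error occurs, the log can be scanned for the message quoted above. The following is a minimal sketch, not part of the original report: the function name `count_flate_errors` and the sample lines are illustrative assumptions, and only the logger name and message text come from the log line in this report.

```python
import re

# Matches the log line quoted in this report; the thread name (e.g. "Indexer: 3") may vary.
FLATE_ERROR = re.compile(
    r"ERROR \[[^\]]+\] \[apache\.pdfbox\.filter\.FlateFilter\] "
    r"decode FlateFilter: stop reading corrupt stream due to a DataFormatException"
)


def count_flate_errors(lines):
    """Return how many lines contain the FlateFilter DataFormatException error."""
    return sum(1 for line in lines if FLATE_ERROR.search(line))


# Illustrative input: the first line is from the report, the second is made up.
sample = [
    "ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: "
    "stop reading corrupt stream due to a DataFormatException",
    "INFO [main] [some.other.Logger] unrelated message",
]
print(count_flate_errors(sample))
```

To run it against a real log, pass an open file object instead of `sample`. The log file location depends on the installation and is not stated in this report.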
Workaround:
Note that this workaround has only been tested on small instances. If you run into any issues after applying it, restore the default bundled PDFBox version, clear the plugin cache, and restart.
This is only applicable if your PDFBox version is 1.8.x.
- Download PDFBox version 1.8.12: http://search.maven.org/remotecontent?filepath=org/apache/pdfbox/pdfbox/1.8.12/pdfbox-1.8.12.jar
- Shut down Confluence.
- Go to <Confluence Installation Directory>\confluence\WEB-INF\lib and find the PDFBox 1.8.xx jar file. Move it out of that directory to a folder outside the Confluence installation.
It is important not to leave two versions of the same jar file in the installation directory, as all of them will be deployed on startup.
- Copy the PDFBox 1.8.12 jar into the same lib directory.
- Clear the plugin cache: https://confluence.atlassian.com/display/CONFKB/How+to+clear+Confluence+plugins+cache
- Start Confluence.
The errors will not appear again once the content has been reindexed.
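The jar swap in the steps above can be sketched as a small script. This is only an illustration under stated assumptions: the helper name `swap_pdfbox_jar`, the backup folder, and the bundled jar file name pattern `pdfbox-1.8.<n>.jar` are hypothetical and not confirmed by this report. The demo runs against a temporary directory, not a real Confluence installation. Confluence must be shut down before doing this for real, and the plugin cache still has to be cleared separately.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed naming of the bundled jar (hypothetical): pdfbox-1.8.<n>.jar
JAR_PATTERN = re.compile(r"^pdfbox-1\.8\.\d+\.jar$")


def swap_pdfbox_jar(lib_dir, new_jar, backup_dir):
    """Move any PDFBox 1.8.x jar out of lib_dir, then copy new_jar in.

    Returns the names of the jars that were moved to backup_dir.
    """
    lib_dir, new_jar, backup_dir = Path(lib_dir), Path(new_jar), Path(backup_dir)
    old_jars = [p for p in lib_dir.iterdir() if JAR_PATTERN.match(p.name)]
    if not old_jars:
        # The workaround only applies to 1.8.x, so stop rather than guess.
        raise RuntimeError("no PDFBox 1.8.x jar found; the workaround does not apply")
    backup_dir.mkdir(parents=True, exist_ok=True)
    for jar in old_jars:
        # Move the old jar away so two versions are never deployed together.
        shutil.move(str(jar), str(backup_dir / jar.name))
    shutil.copy2(new_jar, lib_dir / new_jar.name)
    return [p.name for p in old_jars]


# Demo on a throwaway directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lib = root / "lib"
    lib.mkdir()
    (lib / "pdfbox-1.8.10.jar").write_bytes(b"old")
    (lib / "fontbox-1.8.10.jar").write_bytes(b"other")
    new = root / "pdfbox-1.8.12.jar"
    new.write_bytes(b"new")
    moved = swap_pdfbox_jar(lib, new, root / "backup")
    remaining = sorted(p.name for p in lib.iterdir())

print(moved, remaining)
```

Keeping the old jar in a backup folder outside the lib directory makes it possible to restore the bundled version if problems appear, as the note above recommends.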
Is related to: CONFSERVER-39892 PDF extractor throws data format exception error in logs (Closed)
[AI-396] PDF extractor throws data format exception error in logs
Component/s | Original: Search - Core [ 46383 ] | |
Component/s | Original: Integrations - Office Macros [ 46351 ] | |
Component/s | New: Search - Core [ 75296 ] | |
Component/s | New: Admin Experience [ 74216 ] | |
Fix Version/s | Original: 5.10.4 [ 68162 ] | |
Key | Original: | New: |
Support reference count | Original: 19 | |
Symptom Severity | Original: Severity 2 - Major [ 14431 ] | New: Severity 3 - Minor [ 14432 ] |
Affects Version/s | Original: 5.10.0 [ 68013 ] | |
Affects Version/s | Original: 5.9.5 [ 67959 ] | |
Affects Version/s | Original: 5.9.2 [ 67894 ] | |
Affects Version/s | Original: 5.8.15 [ 67883 ] | |
Project | Original: Confluence Cloud [ 18513 ] | New: Atlassian Intelligence [ 23110 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2365018 ] | New: JAC Bug Workflow v3 [ 3405481 ] |
Status | Original: Resolved [ 5 ] | New: Closed [ 6 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v5 [ 2236634 ] | New: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2365018 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v5.1 - TEMP [ 2200776 ] | New: Confluence Workflow - Public Facing - Restricted v5 [ 2236634 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2147592 ] | New: Confluence Workflow - Public Facing - Restricted v5.1 - TEMP [ 2200776 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v5 [ 1895865 ] | New: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2147592 ] |
Workflow | Original: Confluence Workflow - Public Facing - Restricted v3 [ 1793179 ] | New: Confluence Workflow - Public Facing - Restricted v5 [ 1895865 ] |
Description |
Original:
The pdf indexer throws a lot of error messages when indexing pdf files. {code} ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: stop reading corrupt stream due to a DataFormatException {code} This is probably caused by a bug in the pdfbox. https://issues.apache.org/jira/browse/PDFBOX-2497 The bug above is fixed in 1.8.8 although we are using 1.8.10 and still seeing the error message. it can possibly be a regression. h3. Workaround : (!) Do note that this workaround is only tested in small instances and if you're facing any issues after applying this, restore back the PDFBOX version to the default bundled version and clear the plugin cache with a restart. (!) This is only applicable if your PDFBOX version is 1.8.x. # Download this [PDFBOX version 1.8.12 here|http://search.maven.org/remotecontent?filepath=org/apache/pdfbox/pdfbox/1.8.12/pdfbox-1.8.12.jar] # Shutdown Confluence # Go to {{<Confluence Installation Directory>\confluence\WEB-INF\lib}} and search for PDFBOX 1.8.xx jar file. Remove the jar file and keep it somewhere in a non-Confluence folder. (!) It is important not to leave two versions of the same plugin jar file in the installation directory as all of them will be deployed upon start up. # Insert the PDFBOX 1.8.12 version here. # Clear the plugin cache #- https://confluence.atlassian.com/display/CONFKB/How+to+clear+Confluence+plugins+cache # Start Confluence The errors will not appear again after a content index. |
New:
{panel:bgColor=#e7f4fa} *NOTE:* This bug report is for *Confluence Cloud*. Using *Confluence Server*? [See the corresponding bug report|http://jira.atlassian.com/browse/CONFSERVER-39892]. {panel} The pdf indexer throws a lot of error messages when indexing pdf files. {code} ERROR [Indexer: 3] [apache.pdfbox.filter.FlateFilter] decode FlateFilter: stop reading corrupt stream due to a DataFormatException {code} This is probably caused by a bug in the pdfbox. https://issues.apache.org/jira/browse/PDFBOX-2497 The bug above is fixed in 1.8.8 although we are using 1.8.10 and still seeing the error message. it can possibly be a regression. h3. Workaround : (!) Do note that this workaround is only tested in small instances and if you're facing any issues after applying this, restore back the PDFBOX version to the default bundled version and clear the plugin cache with a restart. (!) This is only applicable if your PDFBOX version is 1.8.x. # Download this [PDFBOX version 1.8.12 here|http://search.maven.org/remotecontent?filepath=org/apache/pdfbox/pdfbox/1.8.12/pdfbox-1.8.12.jar] # Shutdown Confluence # Go to {{<Confluence Installation Directory>\confluence\WEB-INF\lib}} and search for PDFBOX 1.8.xx jar file. Remove the jar file and keep it somewhere in a non-Confluence folder. (!) It is important not to leave two versions of the same plugin jar file in the installation directory as all of them will be deployed upon start up. # Insert the PDFBOX 1.8.12 version here. # Clear the plugin cache #- https://confluence.atlassian.com/display/CONFKB/How+to+clear+Confluence+plugins+cache # Start Confluence The errors will not appear again after a content index. |
Link | New: This issue is related to |
Project Import | New: Sat Apr 01 14:06:06 UTC 2017 [ 1491055566265 ] |