  1. Confluence Data Center
  2. CONFSERVER-37917

Bug in the pdfbox plugin causes OOM Heap space


Details

    Description

      Confluence is throwing this error message in the logs:

      2015-06-11 08:24:18,444 WARN [Indexer: 4] [apache.pdfbox.cos.COSDocument] getObjectsByType java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
      - referer: http://URL/admin/search-indexes.action | url: /admin/reindex.action | userName:user | action: reindex
      java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
      at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
      at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:656)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
      at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:59)
      at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
      

      The indexer also appears to run out of memory because of this bug:

      - referer: http://URL/admin/search-indexes.action | url: /admin/reindex.action | userName: user | action: reindex
      java.lang.OutOfMemoryError: Java heap space
      at java.util.Arrays.copyOf(Unknown Source)
      at java.io.ByteArrayOutputStream.grow(Unknown Source)
      at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
      at java.io.ByteArrayOutputStream.write(Unknown Source)
      at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172)
      at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:308)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
      at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
      

      This is caused by a PDFBox bug, tracked here: https://issues.apache.org/jira/browse/PDFBOX-1756

      Confluence 5.7.3, 5.8.2 and 5.8.4 ship version 1.8.4 of pdfbox, which is affected by the bug.
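
To check whether an instance is hitting this issue, the two signatures quoted above can be searched for in the Confluence log. The following is a minimal sketch, not an Atlassian tool: the function name `scan_log` and the sample lines are illustrative, and it assumes that each exception header starts its own log line, as in the excerpts above.

```python
# Exception headers copied from the two stack traces in this report.
CLASS_CAST = ("java.lang.ClassCastException: org.apache.pdfbox.cos.COSString "
              "cannot be cast to org.apache.pdfbox.cos.COSName")
HEAP_OOM = "java.lang.OutOfMemoryError: Java heap space"


def scan_log(lines):
    """Count ClassCastException headers and heap-space OOMs raised under pdfbox.

    An OOM is counted only if a pdfbox frame appears in the stack trace that
    directly follows it, so unrelated OOMs are ignored.
    """
    counts = {"class_cast": 0, "pdfbox_oom": 0}
    pending_oom = False  # True while reading frames after an OOM header
    for raw in lines:
        line = raw.strip()
        if line.startswith(CLASS_CAST):
            counts["class_cast"] += 1
            pending_oom = False
        elif line.startswith(HEAP_OOM):
            pending_oom = True
        elif line.startswith("at "):
            if pending_oom and line.startswith("at org.apache.pdfbox."):
                counts["pdfbox_oom"] += 1
                pending_oom = False  # count each trace once
        else:
            pending_oom = False  # any non-frame line ends the trace
    return counts


if __name__ == "__main__":
    # Illustrative excerpt; in practice, pass an open log file instead.
    sample = [
        CLASS_CAST,
        "at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)",
        HEAP_OOM,
        "at java.util.Arrays.copyOf(Unknown Source)",
        "at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172)",
    ]
    print(scan_log(sample))
```

Because the function takes any iterable of lines, an open file object can be passed directly. The location and name of the log file depend on the installation and are not assumed here.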

      Workaround

      1) Disable the indexing of PDF attachments using this guide
      OR
      2) Manually update the pdfbox plugin in the Confluence_install\confluence\WEB-INF\lib folder by replacing the original pdfbox plugin with version 1.8.6 or newer. Download the newer version here
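
Before applying workaround 2, it can help to confirm which pdfbox version is actually present in the lib folder. The sketch below assumes the file follows the usual `pdfbox-<version>.jar` naming; that naming, the helper names and the `FIXED` constant are illustrative assumptions, not something stated in this report. It only compares version numbers and does not check whether a newer pdfbox is compatible with a given Confluence release.

```python
import re
from pathlib import Path

# First version the workaround names as acceptable.
FIXED = (1, 8, 6)
# Assumed file naming: pdfbox-<dotted version>.jar
JAR_NAME = re.compile(r"^pdfbox-(\d+(?:\.\d+)*)\.jar$")


def parse_version(filename):
    """Return the version as a tuple of ints, or None if the name does not match."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(filename):
    """True if the file is a pdfbox jar older than FIXED."""
    version = parse_version(filename)
    return version is not None and version < FIXED


def find_outdated(lib_dir):
    """List pdfbox jars in lib_dir that are older than FIXED, sorted by name."""
    return sorted(
        path.name
        for path in Path(lib_dir).glob("pdfbox-*.jar")
        if needs_upgrade(path.name)
    )
```

For example, `find_outdated` could be pointed at the WEB-INF\lib folder from workaround 2; an empty list means no matching jar older than 1.8.6 was found under the assumed naming.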

          People

            Assignee: Unassigned
            Reporter: Rodrigo Girardi Adami (rgadami)
            Votes: 3
            Watchers: 6
