While reindexing, the PDF extractor can fail with this error:

      java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
          at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:905)
          at org.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:489)
          at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:46)
          at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:31)
          at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:28)
          at com.atlassian.confluence.search.lucene.ConfluenceObjectToDocumentConverter.convert(ConfluenceObjectToDocumentConverter.java:20)
          at com.atlassian.confluence.search.lucene.ConfluenceObjectQueue$1.indexCollection(ConfluenceObjectQueue.java:75)
          at com.atlassian.bonnie.index.QueueProcessingRunnableImpl.run(QueueProcessingRunnableImpl.java:39)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:284)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:155)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:122)
          at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:56)
          at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:144)
          at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:174)
          at $Proxy62.run(Unknown Source)
          at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:987)
          at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:528)
          at java.lang.Thread.run(Thread.java:595)

        Attachment: pdfbox-0.7.2.jar (3.12 MB, Tom Davies)

            [CONFSERVER-8580] Indexing unprintable/encrypted PDFs fails


            Tom Davies added a comment - The root cause of this problem is that we were not catching the exception; this has been fixed under CONF-8608.


            Scott Farquhar added a comment - More important than this one issue: why does an error while indexing one document affect the whole indexing process? Has this error been fixed?


            m@ (Inactive) added a comment - This issue can silently stop the indexing process. Replacing the jar fixes that problem.


            m@ (Inactive) added a comment - A couple of the support cases that exhibit this error suggest that it may be causing the indexing process to stop without any feedback to the user.
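The comments above describe one bad PDF silently halting the whole indexing run. The following is a minimal Java sketch of per-document isolation, assuming a simple loop over queued documents; `IndexQueueSketch`, `indexAll` and the extractor function are hypothetical names, not Confluence's actual queue code, and the real change made under CONF-8608 is not shown in this issue. The point it illustrates is a language fact: `NoClassDefFoundError` extends `LinkageError`, which is an `Error` rather than an `Exception`, so a handler that only catches `Exception` lets it escape and end the worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Confluence's indexing code: one failing document is
// logged and skipped instead of ending the whole indexing run.
public class IndexQueueSketch {
    private static final Logger LOG = Logger.getLogger(IndexQueueSketch.class.getName());

    /**
     * Extracts text for each document id, appending successes to {@code indexed}.
     * Returns the ids that failed so they can be reported to the user.
     */
    public static List<String> indexAll(List<String> docIds,
                                        Function<String, String> extractor,
                                        List<String> indexed) {
        List<String> failed = new ArrayList<>();
        for (String id : docIds) {
            try {
                indexed.add(extractor.apply(id));
            } catch (RuntimeException | LinkageError e) {
                // NoClassDefFoundError is a LinkageError (an Error, not an Exception),
                // so "catch (Exception e)" alone would let it escape this loop.
                LOG.log(Level.WARNING, "Could not index document " + id, e);
                failed.add(id);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        List<String> failed = indexAll(List.of("a.pdf", "encrypted.pdf", "b.pdf"), id -> {
            if (id.startsWith("encrypted")) {
                // Simulate the missing BouncyCastle class from the stack trace
                throw new NoClassDefFoundError("org/bouncycastle/jce/provider/BouncyCastleProvider");
            }
            return "text of " + id;
        }, indexed);
        // Prints: indexed=[text of a.pdf, text of b.pdf] failed=[encrypted.pdf]
        System.out.println("indexed=" + indexed + " failed=" + failed);
    }
}
```

Catching `LinkageError` per document is a trade-off: it keeps the rest of the queue moving, but the missing jar still has to be installed. Returning the failed ids gives the caller something to surface to the user instead of stopping silently.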

            Tom Davies added a comment -

            In fact, the version of PDFBox in 2.5 doesn't correctly extract text from unprintable PDFs, so we need to roll back to 0.7.2.

            The workaround for this bug is to replace pdfbox-0.7.3.jar in WEB-INF/lib with the pdfbox-0.7.2.jar attached to this issue.



            m@ (Inactive) added a comment - BouncyCastle is a dependency of PDFBox which is needed to open encrypted PDFs. Until this issue is resolved you can download the jar from this page: http://www.bouncycastle.org/latest_releases.html
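To check whether the BouncyCastle provider named in the stack trace can actually be loaded, a small diagnostic such as the one below may help. This is a hedged sketch: `BouncyCastleCheck` is a hypothetical helper, not part of Confluence or PDFBox, and it only reports on the classpath it is run with. After compiling it, run it against the same jars as the Confluence `WEB-INF/lib` directory (on Unix, for example, `java -cp "WEB-INF/lib/*:." BouncyCastleCheck`).

```java
// Hypothetical diagnostic helper: reports whether a class can be found on the
// classpath this program was started with.
public class BouncyCastleCheck {
    static final String PROVIDER_CLASS = "org.bouncycastle.jce.provider.BouncyCastleProvider";

    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only check that the class can be located and linked,
            // without running its static initializers.
            Class.forName(className, false, BouncyCastleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class present but unusable (e.g. a dependency of it is missing).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(PROVIDER_CLASS + " available: " + isOnClasspath(PROVIDER_CLASS));
    }
}
```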

              tom@atlassian.com Tom Davies
              mjensen m@ (Inactive)