When indexing, we are seeing this warning:

      2011-04-19 22:29:48,063 WARN [Indexer: 4] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not :32
      - url: /admin/reindex.action | userName: admin | referer: https://confluenceurl/admin/search-indexes.action | action: reindex
      java.io.IOException: Error: expected hex character and not :32
      

This is a bug in PDFBox 1.2.1 that has been fixed in 1.3.1: https://issues.apache.org/jira/browse/PDFBOX-790
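To get a rough idea of how often the warning fires during a reindex, the application log can be scanned for the WARN line quoted above. The following is only a minimal sketch: the regular expression is derived from the sample log line in this issue, and the function name `count_cmap_warnings` is made up for illustration. Other log layouts would need a different pattern.

```python
import re
from collections import Counter

# Pattern derived from the sample WARN line in this issue (an assumption,
# not a general log4j layout): timestamp, level, thread, logger, message.
WARN_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} WARN \[(?P<thread>[^\]]+)\] "
    r"\[apache\.pdfbox\.util\.PDFStreamEngine\] processOperator "
    r"java\.io\.IOException: Error: expected hex character and not"
)


def count_cmap_warnings(lines):
    """Count matching WARN lines per indexer thread (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.match(line.strip())
        if match:
            counts[match.group("thread")] += 1
    return counts
```

Passing an open log file (for example `count_cmap_warnings(open(path))`, where `path` is wherever your instance writes its log) returns a counter keyed by thread name. Note that one PDF can trigger the warning several times, so the count is not a count of affected files.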

            [CONFSERVER-22358] Upgrade PDFBox to 1.3.1

            Leon Kolchinsky added a comment -

            Thanks Steve,

            Can you please tell me how to fix it?
            What version incorporates this fix?

            Cheers,
            Leon

            Michael S added a comment -

            For anyone else seeing errors with PDF content indexing: the file names themselves should still be indexed. This only affects indexing of the content within PDF files.

            Leon Kolchinsky added a comment -

            Hello Lachlan,

            From what I can see, .pdf files aren't indexed, which is bad!
            This should be fixed sooner rather than later.

            Cheers,
            Leon

            lachland added a comment -

            Hi Leon, we've had some issues appear upon updating the version, so we've rolled back the change for now and are going to see if the update is still viable.


            Leon Kolchinsky added a comment (edited) -

            Here is a log snapshot:

            2011-12-14 16:22:17,313 WARN [Indexer: 1] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not  :32
            java.io.IOException: Error: expected hex character and not  :32
                    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
                    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
                    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
                    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
                    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
                    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
                    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
                    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
                    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
                    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
                    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
                    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
                    at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
                    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
                    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
                    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
                    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
                    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
                    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
                    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
                    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:88)
                    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.run(ReindexWorkBatch.java:58)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)
            2011-12-14 16:22:17,314 WARN [Indexer: 1] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not  :32
            java.io.IOException: Error: expected hex character and not  :32
                    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
                    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
                    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
                    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
                    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
                    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
                    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
                    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
                    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
                    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
                    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
                    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
                    at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
                    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
                    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
                    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
                    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
                    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
                    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
                    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
                    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:88)
                    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.run(ReindexWorkBatch.java:58)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)
            2011-12-14 16:22:17,314 WARN [Indexer: 1] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not  :32
            java.io.IOException: Error: expected hex character and not  :32
                    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
                    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
                    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
                    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
                    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
                    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
                    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
                    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
                    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
                    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
                    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
                    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
                    at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
                    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
                    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
                    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
                    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
                    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
                    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
                    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
                    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:88)
                    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.run(ReindexWorkBatch.java:58)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)
            2011-12-14 16:22:17,315 WARN [Indexer: 1] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not  :32
            java.io.IOException: Error: expected hex character and not  :32
                    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
                    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
                    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
                    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
                    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
                    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
                    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
                    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
                    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
                    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
                    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
                    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
                    at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
                    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
                    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
                    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
                    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
                    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
                    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
                    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
                    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:88)
                    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.run(ReindexWorkBatch.java:58)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)
            2011-12-14 16:22:17,315 WARN [Indexer: 1] [apache.pdfbox.util.PDFStreamEngine] processOperator java.io.IOException: Error: expected hex character and not  :32
            java.io.IOException: Error: expected hex character and not  :32
                    at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:336)
                    at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:139)
                    at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:556)
                    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:390)
                    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:386)
                    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
                    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:567)
                    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:250)
                    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:208)
                    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:378)
                    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:302)
                    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:258)
                    at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
                    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
                    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
                    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
                    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
                    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
                    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
                    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
                    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:88)
                    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
                    at com.atlassian.confluence.search.lucene.reindex.ReindexWorkBatch.run(ReindexWorkBatch.java:58)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)
            


            Leon Kolchinsky added a comment -

            Fixed where?
            I've installed the latest 4.1 version and I'm still getting a bunch of those errors for every PDF file Confluence indexes.
            And the PDFBox version is still 1.2.1 ;(

            Steve Lancashire (Inactive) added a comment -

            QA'd this locally using a PDF that was previously giving this exception when indexing, all good now.

            Leon Kolchinsky added a comment -

            Thanks Lachlan,

            I can see that this issue is "In Progress".
            Is anyone going to be assigned to this in the near future?

            Cheers,
            Leon

            lachland added a comment -

            Note: upgrading to 1.6.0 rather than 1.3.1.


            Leon Kolchinsky added a comment -

            Hi,

            I've simply replaced pdfbox-1.2.1.jar with pdfbox-1.6.0.jar (under confluence-3.5.13-std/confluence/WEB-INF/lib/) and run reindex again.
            Though I don't know if it will break anything else...
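For anyone trying the same jar swap, it may help to confirm which jar versions actually sit in WEB-INF/lib afterwards, since leaving both the old and new jar in place would put two versions on the classpath. The following is only a rough sketch in Python, not an Atlassian-supported procedure: the function name `find_jar_versions` and the `artifact-<version>.jar` naming pattern are assumptions for illustration. The trace above also passes through `org.apache.fontbox`, so checking the fontbox jar the same way seems reasonable.

```python
import re
from pathlib import Path


def find_jar_versions(lib_dir, artifact="pdfbox"):
    """Return the versions of <artifact>-<version>.jar files found in lib_dir.

    Hypothetical helper: assumes jars follow the plain
    'name-1.2.3.jar' convention seen in this thread.
    """
    pattern = re.compile(r"^" + re.escape(artifact) + r"-(\d+(?:\.\d+)*)\.jar$")
    versions = []
    for entry in Path(lib_dir).iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            versions.append(match.group(1))
    # Sort numerically so that, for example, 1.10.0 sorts after 1.6.0.
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


# Example call (the path is the one mentioned in the comment above):
# find_jar_versions("confluence-3.5.13-std/confluence/WEB-INF/lib", "pdfbox")
```

If the returned list has more than one entry, an old jar is probably still present next to the new one.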

              slancashire Steve Lancashire (Inactive)
              rhartono Roy Hartono [Atlassian]
