Details
- Bug
- Resolution: Won't Fix
- Low
- None
- 3.4.2, 3.4.6, 3.5, 3.5.4
Description
Summary of the Bug
The indexer cannot index or extract text from RTF documents generated by "ГАРАНТ" (GARANT, a Russian government legal documents database).
The following stack trace is recorded in the logs:
2011-05-20 22:29:28,850 WARN [Indexer: 2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: 110-п_от_15_05_2009_Постановление_Правительства_Ханты-Мансийского_АО_-_Югры.rtf v.1 (1179649) adminconf)
 -- referer: http://localhost:8354/admin/search-indexes.action | url: /admin/reindex.action | userName: adminconf | action: reindex
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.
    at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41)
    at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
    at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
    at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
    at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
    at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:78)
    at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:62)
    at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
    at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:51)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
    at com.aspose.words.Document.a(Unknown Source)
    at com.aspose.words.Document.b(Unknown Source)
    at com.aspose.words.Document.a(Unknown Source)
    at com.aspose.words.Document.<init>(Unknown Source)
    at com.aspose.words.Document.<init>(Unknown Source)
    at com.aspose.words.Document.<init>(Unknown Source)
    at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
    ... 16 more
Caused by: java.lang.NullPointerException: style
    at asposewobfuscated.am.c(Unknown Source)
    at com.aspose.words.aav.a(Unknown Source)
    at com.aspose.words.wp.a(Unknown Source)
    at com.aspose.words.wp.d(Unknown Source)
    at com.aspose.words.fq.gg(Unknown Source)
    at com.aspose.words.fq.d(Unknown Source)
    at com.aspose.words.fq.read(Unknown Source)
    ... 22 more
Steps to Reproduce
- Download the attached file
- Attach it to a page in Confluence
- Wait a minute (the indexer runs every minute)
- Check atlassian-confluence.log
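Checking the log by hand gets tedious when many attachments are affected. The following is a minimal Python sketch that scans log lines for the "Error indexing attachment" message and pulls out the attachment name, version, and ID. The regular expression is derived only from the single log entry quoted in this report, so it is an assumption that other Confluence versions format the message the same way; `find_failed_attachments` and `FailedAttachment` are hypothetical names used for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern inferred from the one log line quoted in this report (assumption):
#   Error indexing attachment (Attachment: <name> v.<version> (<id>) <user>)
FAILED_ATTACHMENT = re.compile(
    r"Error indexing attachment \(Attachment: (?P<name>.+?) "
    r"v\.(?P<version>\d+) \((?P<id>\d+)\)"
)


class FailedAttachment(NamedTuple):
    """One attachment the indexer reported it could not index."""
    name: str
    version: int
    attachment_id: int


def find_failed_attachments(lines: Iterable[str]) -> Iterator[FailedAttachment]:
    """Yield an entry for every log line that reports an indexing failure."""
    for line in lines:
        match = FAILED_ATTACHMENT.search(line)
        if match:
            yield FailedAttachment(
                match["name"], int(match["version"]), int(match["id"])
            )
```

To use it, open atlassian-confluence.log as a text file (UTF-8 is assumed here because the file names contain Cyrillic characters) and pass the file object to `find_failed_attachments`; each result names an attachment that may need the workaround.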
Steps to create the bad RTF document
- Go to http://english.garant.ru/
- Open demo version
- Open any document that is available in full text
- Press the "Export to Word" button
Workaround
- Open the problematic document in Microsoft Office
- Re-save the document from Microsoft Office
- Re-attach the re-saved document to Confluence