Bug
Resolution: Unresolved
Low
None
5.9.1, 5.8.10, 5.9.5, 5.9.7, 5.10.8
23
Severity 3 - Minor
2
NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.
Summary
The current System Properties documentation lists the setting officeconnector.textextract.word.docxmaxsize, but Confluence ignores it when it is set in setenv.sh.
Environment
- Confluence 5.8.x or Confluence 5.9.x
Steps to Reproduce
- Add the following to setenv.sh:
CATALINA_OPTS="-Dofficeconnector.textextract.word.docxmaxsize=1000 ${CATALINA_OPTS}"
- Insert the attached file lorum.docx into a page.
- Check atlassian-confluence.log. The following error does not appear:
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
Expected Results
The following error should appear, because the size limit is now 1000 bytes (about 1 KB):
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
Actual Results
The file is indexed as usual and no error is logged.
Notes
To process the attached file you will need to increase the Java heap space in setenv.sh to something like -Xmx8192m.
The property officeconnector.textextract.word.docxmaxsize is read in com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor.java as follows:
private static final long MAX_XML_SIZE = Long.getLong("officeconnector.textextract.word.docxmaxsize", 1024 * 1024 * 16); // Maximum, 16Mb of XML text, more than enough for most files
...
if (finalSize > MAX_XML_SIZE) {
    throw new ExtractorException("Document too big for text extraction, bailing out");
}
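To make the expected behaviour concrete, here is a minimal standalone sketch of the same check. It is not the Confluence code: the class name DocxSizeLimitSketch and the helpers maxXmlSize and tooBig are hypothetical, and only the property name and the 16 MiB default come from the snippet quoted above. Unlike the quoted code, which reads the property once into a static final field, the sketch reads it on every call so the effect of changing the property is easy to observe.

```java
// Hypothetical standalone sketch; not part of Confluence or the Office Connector.
class DocxSizeLimitSketch {
    // Property name and default value as quoted in this report.
    static final String PROPERTY = "officeconnector.textextract.word.docxmaxsize";
    static final long DEFAULT_MAX = 1024L * 1024 * 16; // 16 MiB = 16777216 bytes

    // Returns the configured limit in bytes, or the default if the
    // system property is unset or not a valid number.
    static long maxXmlSize() {
        return Long.getLong(PROPERTY, DEFAULT_MAX);
    }

    // Returns true when a document of the given extracted size
    // would be rejected by a check of the form finalSize > limit.
    static boolean tooBig(long finalSize) {
        return finalSize > maxXmlSize();
    }

    public static void main(String[] args) {
        System.out.println("limit = " + maxXmlSize() + " bytes");
        System.out.println("2000-byte document too big? " + tooBig(2000));
    }
}
```

On Java 11 or later this can be run directly as `java -Dofficeconnector.textextract.word.docxmaxsize=1000 DocxSizeLimitSketch.java`. With the flag, a 2000-byte document counts as too big. Without it, the 16 MiB default applies and the same document passes, which matches the behaviour reported above.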
Workaround
There is no workaround.
Issue Links
- is related to: CONFSERVER-40914 Check size of attachment before content indexing (Closed)
- relates to: AI-206 Confluence ignores the system property officeconnector.textextract.word.docxmaxsize (Closed)
- Testing discovered: CONFSERVER-40432 Filter Out All Media Files from Microsoft Word Documents to Improve Indexing in Confluence (Closed)
Same issue here on Confluence 5.9.10