- Bug
- Resolution: Low Engagement
- Low
- None
- 5.9.1, 5.8.10, 5.9.5, 5.9.7, 5.10.8
- 23
- Severity 3 - Minor
- 2
NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.
Summary
The System Properties documentation currently lists the setting officeconnector.textextract.word.docxmaxsize, but Confluence ignores it when it is set in setenv.sh.
Environment
- Confluence 5.8.x or Confluence 5.9.x
Steps to Reproduce
- Add the following to setenv.sh:
setenv.sh
CATALINA_OPTS="-Dofficeconnector.textextract.word.docxmaxsize=1000 ${CATALINA_OPTS}"
- Insert the attached file lorum.docx into a page.
- Check the logs. The error
atlassian-confluence.log
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
does not appear.
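The log check in the last step can be automated. The following is a minimal sketch, not part of Confluence: the class name LogCheckSketch and the helper findMatches are hypothetical, and the log path is assumed to be passed in as an argument. It only searches for the message text quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for this report; not shipped with Confluence.
public class LogCheckSketch {
    // Message text copied from the ExtractorException quoted in this report.
    public static final String MARKER =
            "Document too big for text extraction, bailing out";

    // Returns every line of the given log file that contains the marker text.
    // Files.lines decodes as UTF-8, so the log is assumed to be UTF-8 encoded.
    public static List<String> findMatches(Path logFile) throws IOException {
        try (Stream<String> lines = Files.lines(logFile)) {
            return lines.filter(line -> line.contains(MARKER))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the log path is given as the first argument; otherwise
        // atlassian-confluence.log in the current directory is used.
        Path log = Paths.get(args.length > 0 ? args[0] : "atlassian-confluence.log");
        List<String> hits = findMatches(log);
        System.out.println(hits.size() + " matching line(s)");
        hits.forEach(System.out::println);
    }
}
```

With the behaviour described in this report, running it against the log after attaching lorum.docx would be expected to report zero matching lines.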
Expected Results
The error
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
should appear, as the maximum size is now set to 1000 (about 1K).
Actual Results
The file is indexed correctly.
Notes
For the attached file you will need to increase your Java Heap Space in setenv.sh to something like -Xmx8192m.
The value officeconnector.textextract.word.docxmaxsize is referenced in com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor.java as
private static final long MAX_XML_SIZE = Long.getLong("officeconnector.textextract.word.docxmaxsize", 1024 * 1024 * 16); // Maximum, 16Mb of XML text, more than enough for most files
...
if (finalSize > MAX_XML_SIZE) {
    throw new ExtractorException("Document too big for text extraction, bailing out");
}
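To illustrate how the quoted field resolves its value, here is a minimal standalone sketch. The class DocxMaxSizeSketch and its helpers resolveLimit and exceedsLimit are hypothetical names introduced for illustration; only the property name, the 16 MB default, and the comparison come from the snippet above. It shows that Long.getLong returns the system property when it parses as a number and falls back to the default otherwise. It does not claim to identify the cause of this bug.

```java
// Hypothetical illustration; not the Confluence source.
public class DocxMaxSizeSketch {
    // Property name taken from this report.
    public static final String PROP = "officeconnector.textextract.word.docxmaxsize";
    // Default taken from the quoted field: 1024 * 1024 * 16.
    public static final long DEFAULT_MAX = 1024L * 1024 * 16;

    // Resolves the limit the way the quoted field does. Long.getLong returns
    // the default when the property is unset or is not a valid number.
    public static long resolveLimit() {
        return Long.getLong(PROP, DEFAULT_MAX);
    }

    // Mirrors the quoted comparison: only sizes strictly above the limit fail.
    public static boolean exceedsLimit(long finalSize, long limit) {
        return finalSize > limit;
    }

    public static void main(String[] args) {
        long limit = resolveLimit();
        System.out.println(PROP + " resolved to " + limit);
        System.out.println("size 2000 exceeds limit: " + exceedsLimit(2000, limit));
    }
}
```

Running it with java -Dofficeconnector.textextract.word.docxmaxsize=1000 DocxMaxSizeSketch should resolve the limit to 1000. Because the quoted field is static final, it is assigned once when the class is initialised, so the property has to be present in the JVM before that point.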
Workaround
There is no workaround.
- is related to: CONFSERVER-40914 Check size of attachment before content indexing (Closed)
- relates to: AI-206 Confluence ignores the system property officeconnector.textextract.word.docxmaxsize (Closed)
- Testing discovered: CONFSERVER-40432 Filter Out All Media Files from Microsoft Word Documents to Improve Indexing in Confluence (Closed)
[CONFSERVER-40176] Confluence ignores the system property officeconnector.textextract.word.docxmaxsize
QA Demo Status | New: Not Needed [ 14332 ] | |
QA Kickoff Status | New: Not Needed [ 14236 ] | |
Resolution | New: Low Engagement [ 10300 ] | |
Status | Original: Gathering Impact [ 12072 ] | New: Closed [ 6 ] |
Labels | Original: affects-cloud affects-server office-connector p20 | New: affects-cloud affects-server cleanup-seos-fy25 office-connector p20 |
UIS | Original: 3 | New: 2 |
Remote Link | New: This issue links to "Page (Confluence)" [ 682354 ] |
UIS | Original: 2 | New: 3 |
UIS | Original: 3 | New: 2 |
UIS | Original: 2 | New: 3 |
UIS | Original: 3 | New: 2 |
UIS | Original: 2 | New: 3 |
UIS | Original: 3 | New: 2 |