Bug
Resolution: Low Engagement
Low
None
5.9.1, 5.8.10, 5.9.5, 5.9.7, 5.10.8
23
Severity 3 - Minor
2
NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.
Summary
The current System Properties documentation lists the setting officeconnector.textextract.word.docxmaxsize, but Confluence ignores it when it is set in setenv.sh.
Environment
- Confluence 5.8.x or Confluence 5.9.x
Steps to Reproduce
- Add the following line to setenv.sh:
CATALINA_OPTS="-Dofficeconnector.textextract.word.docxmaxsize=1000 ${CATALINA_OPTS}"
- Insert the attached file lorum.docx to a page.
- Check atlassian-confluence.log. The following error does not appear:
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
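While reproducing, it can help to rule out a startup-script problem by confirming that the -D flag actually reached the JVM. The following is a minimal sketch of that check, not part of Confluence: the class name JvmArgCheckSketch and the helper hasDefine are hypothetical, and as a standalone program it only inspects its own JVM. To say anything about a real instance, the same information would have to be read from the Confluence JVM itself.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmArgCheckSketch {
    // Property name taken from the bug report.
    static final String PROPERTY = "officeconnector.textextract.word.docxmaxsize";

    // Hypothetical helper: true if any JVM input argument defines the given system property.
    static boolean hasDefine(List<String> jvmArgs, String property) {
        String prefix = "-D" + property + "=";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (for example via CATALINA_OPTS), excluding main() arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Flag present on command line: " + hasDefine(jvmArgs, PROPERTY));
        // null here means the property was never set in this JVM.
        System.out.println("System property value: " + System.getProperty(PROPERTY));
    }
}
```

If the flag is present and the property has a value, the startup script is passing it through, and the question becomes whether the extraction code consults it.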
Expected Results
The error
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
should appear, because the size limit is now set to 1000 (about 1K) and the attached document exceeds it.
Actual Results
The file is indexed correctly.
Notes
To process the attached file, you will need to increase the Java heap size in setenv.sh to something like -Xmx8192m.
The value officeconnector.textextract.word.docxmaxsize is referenced in com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor.java as
private static final long MAX_XML_SIZE = Long.getLong("officeconnector.textextract.word.docxmaxsize", 1024 * 1024 * 16); // Maximum, 16Mb of XML text, more than enough for most files
...
if (finalSize > MAX_XML_SIZE) {
    throw new ExtractorException("Document too big for text extraction, bailing out");
}
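The behaviour the quoted code should have can be sketched in isolation. This is an illustrative stand-in, not the Confluence source: the class DocxSizeLimitSketch and the helpers resolveLimit and tooBig are hypothetical, and only the property name, the default value, and the comparison are taken from the snippet above. Note that in the quoted code the limit is a static final field, so it is read once when the class is initialised; the property has to be set at JVM startup to have any effect.

```java
public class DocxSizeLimitSketch {
    // Property name and default taken from the quoted snippet.
    static final String PROPERTY = "officeconnector.textextract.word.docxmaxsize";
    static final long DEFAULT_MAX = 1024L * 1024 * 16;

    // Hypothetical helper: resolve the limit from JVM system properties,
    // falling back to the default when the property is unset or not a number.
    static long resolveLimit() {
        return Long.getLong(PROPERTY, DEFAULT_MAX);
    }

    // Hypothetical helper mirroring the quoted comparison (strictly greater than).
    static boolean tooBig(long finalSize, long limit) {
        return finalSize > limit;
    }

    public static void main(String[] args) {
        long limit = resolveLimit();
        System.out.println("Effective limit: " + limit);
        System.out.println("Size 2000 rejected: " + tooBig(2000, limit));
    }
}
```

Running it as `java -Dofficeconnector.textextract.word.docxmaxsize=1000 DocxSizeLimitSketch` should report an effective limit of 1000 and reject a size of 2000; without the flag, the 16 MB default applies.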
Workaround
There is no workaround.
- is related to
  - CONFSERVER-40914 Check size of attachment before content indexing (Closed)
- relates to
  - AI-206 Confluence ignores the system property officeconnector.textextract.word.docxmaxsize (Closed)
- Testing discovered
  - CONFSERVER-40432 Filter Out All Media Files from Microsoft Word Documents to Improve Indexing in Confluence (Closed)
Hi,
At Atlassian, our goal is to ensure we’re providing the best experience for our customers. With our new Data Center strategy, Atlassian's focus is on security, compliance, and performance and is a key driver in prioritizing bugs. Closing the bugs that do not fall into those categories will allow us to focus on the ones in the most current versions of our products.
This bug is being closed due to a lack of engagement in the last four years, including no new watchers, votes, or comments; this inactivity suggests a low impact.
Please note the comments on this thread are not being monitored.
You can read more about our bug fix policy here and how we prioritize bugs.
To learn more about our recent investments in Confluence Data Center, please check our public roadmap and dashboards containing recently resolved issues, current work, and future plans.
Kind regards,
Confluence Data Center