CONFSERVER-40176

Confluence ignores the system property officeconnector.textextract.word.docxmaxsize

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      Summary

      The current System Properties documentation lists the setting officeconnector.textextract.word.docxmaxsize, but Confluence ignores it when it is set in setenv.sh.

      Environment

      • Confluence 5.8.x or Confluence 5.9.x

      Steps to Reproduce

      1. Add the following line to setenv.sh:
        setenv.sh
        CATALINA_OPTS="-Dofficeconnector.textextract.word.docxmaxsize=1000 ${CATALINA_OPTS}"
        
      2. Insert the attached file lorum.docx into a page.
      3. Check the logs. The error
        atlassian-confluence.log
        com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
        

        does not appear.
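
      Before concluding that the property is ignored, it can help to confirm that the -D flag from step 1 actually reached the JVM. The following is a minimal sketch, not part of the original report: the class name JvmFlagCheck and the helper findFlag are hypothetical, and the check is only meaningful when it runs inside the same JVM as Confluence (run standalone, it reports on its own JVM only).

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic helper; not part of Confluence or the Office Connector.
public class JvmFlagCheck {
    static final String PROP = "officeconnector.textextract.word.docxmaxsize";

    /** Returns the value of -D<prop>=<value> from the given JVM arguments, if present. */
    static Optional<String> findFlag(List<String> jvmArgs, String prop) {
        String prefix = "-D" + prop + "=";
        String value = null;
        for (String arg : jvmArgs) {
            // If the flag is repeated, keep the last occurrence
            // (assumed here to be the one that takes effect).
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (e.g. those passed via CATALINA_OPTS).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("flag on command line: " + findFlag(jvmArgs, PROP).orElse("<absent>"));
        System.out.println("System.getProperty:   " + System.getProperty(PROP));
    }
}
```

      If the flag is absent from the JVM arguments, the problem lies in how setenv.sh is applied rather than in the extractor itself.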

      Expected Results

      The error

      atlassian-confluence.log
      com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
      

      should appear, because the maximum size is now set to 1000 (roughly 1K).

      Actual Results

      The file is indexed without the error, which indicates that the configured limit is not applied.

      Notes

      To process the attached file, you will need to increase the Java heap size in setenv.sh, for example to -Xmx8192m.

      The value officeconnector.textextract.word.docxmaxsize is referenced in com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor.java as

      WordXMLTextExtractor.java
      private static final long MAX_XML_SIZE = Long.getLong("officeconnector.textextract.word.docxmaxsize", 1024 * 1024 * 16); // Maximum, 16Mb of XML text, more than enough for most files
      ...
      if (finalSize > MAX_XML_SIZE) {
          throw new ExtractorException("Document too big for text extraction, bailing out");
      }
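
      The lookup in the snippet above relies on standard Long.getLong behaviour, which can be reproduced outside Confluence. The sketch below is an illustration under stated assumptions, not the extractor's actual code: the class DocxMaxSizeDemo and the methods resolveLimit and tooBig are hypothetical names. It shows that the default of 16 MB applies when the property is unset, that a value of 1000 is picked up when it is set, and that the comparison is a strict "greater than".

```java
// Hypothetical stand-alone demo; mirrors only the property lookup and size check quoted above.
public class DocxMaxSizeDemo {
    static final String PROP = "officeconnector.textextract.word.docxmaxsize";
    static final long DEFAULT_LIMIT = 1024L * 1024 * 16; // same default as the quoted snippet

    /** Reads the limit from the system property, falling back to the default. */
    static long resolveLimit() {
        // Long.getLong returns the default when the property is missing or not a valid number.
        return Long.getLong(PROP, DEFAULT_LIMIT);
    }

    /** Same strict comparison as the quoted check: a size equal to the limit is still accepted. */
    static boolean tooBig(long finalSize, long limit) {
        return finalSize > limit;
    }

    public static void main(String[] args) {
        System.out.println("limit as seen at startup: " + resolveLimit());

        // Simulate -Dofficeconnector.textextract.word.docxmaxsize=1000
        System.setProperty(PROP, "1000");
        long limit = resolveLimit();
        System.out.println("limit after setting the property: " + limit);
        System.out.println("2000 too big? " + tooBig(2000, limit));
    }
}
```

      One difference from this demo is worth keeping in mind: in the quoted snippet the value is stored in a static final field, so it is read once when the class is initialised. The property therefore has to be present when the JVM starts; setting it later at runtime would not change MAX_XML_SIZE.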
      

      Workaround

      There is no workaround.


            George Varghese made changes -
            QA Demo Status New: Not Needed [ 14332 ]
            QA Kickoff Status New: Not Needed [ 14236 ]
            Resolution New: Low Engagement [ 10300 ]
            Status Original: Gathering Impact [ 12072 ] New: Closed [ 6 ]
            George Varghese made changes -
            Labels Original: affects-cloud affects-server office-connector p20 New: affects-cloud affects-server cleanup-seos-fy25 office-connector p20

            Atlassian Update - 14 April 2025

            Hi,

            At Atlassian, our goal is to ensure we’re providing the best experience for our customers. With our new Data Center strategy, Atlassian's focus is on security, compliance, and performance and is a key driver in prioritizing bugs. Closing the bugs that do not fall into those categories will allow us to focus on the ones in the most current versions of our products.

            This bug is being closed due to a lack of engagement in the last four years, including no new watchers, votes, or comments; this inactivity suggests a low impact.

            Please note the comments on this thread are not being monitored.

            You can read more about our bug fix policy here and how we prioritize bugs.

            To learn more about our recent investments in Confluence Data Center, please check our public roadmap and dashboards containing recently resolved issues, current work, and future plans.

            Kind regards,
            Confluence Data Center

            George Varghese added a comment - Atlassian Update - 14 April 2025

              Assignee: Unassigned
              Reporter: James Richards (jrichards@atlassian.com)
              Affected customers: 13
              Watchers: 26