
Confluence ignores the system property officeconnector.textextract.word.docxmaxsize

    • Severity 3 - Minor

      NOTE: This bug report is for Confluence Cloud. Using Confluence Server? See the corresponding bug report.

      Summary

      The current System Properties documentation lists the setting officeconnector.textextract.word.docxmaxsize, but Confluence ignores it when it is set in setenv.sh.

      Environment

      • Confluence 5.8.x or Confluence 5.9.x

      Steps to Reproduce

      1. Add the following line to setenv.sh:
        setenv.sh
        CATALINA_OPTS="-Dofficeconnector.textextract.word.docxmaxsize=1000 ${CATALINA_OPTS}"
        
      2. Attach the file lorum.docx (attached to this issue) to a page.
      3. Check the logs. The following error does not appear:
        atlassian-confluence.log
        com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
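
When reproducing step 1, it can help to confirm that the -D flag actually reached the JVM before concluding that the property is ignored. The following is a minimal sketch, not part of Confluence: the class name JvmPropertyCheck and the helper findDefine are hypothetical, and the check only inspects the JVM it runs in, so it illustrates the mechanism rather than querying a running Confluence instance.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Confluence): looks for a -Dname=value
// flag among the arguments the current JVM was started with.
public class JvmPropertyCheck {

    static final String PROPERTY = "officeconnector.textextract.word.docxmaxsize";

    // Returns the value of -D<name>=<value> from the given JVM arguments,
    // or null if no such flag is present. Keeps the last match if repeated.
    static String findDefine(List<String> jvmArgs, String name) {
        String prefix = "-D" + name + "=";
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                found = arg.substring(prefix.length());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM on its command line, e.g. via CATALINA_OPTS.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("flag on command line: " + findDefine(jvmArgs, PROPERTY));
        // What code inside the JVM would see through the system properties.
        System.out.println("System.getProperty:   " + System.getProperty(PROPERTY));
    }
}
```

Running it with java -Dofficeconnector.textextract.word.docxmaxsize=1000 JvmPropertyCheck should report the value in both lines. If the flag is missing from the command line of the real Confluence process, the problem lies in how setenv.sh is evaluated rather than in the extractor.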

      Expected Results

      The error

      atlassian-confluence.log
      com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word XML document: Document too big for text extraction, bailing out
      

      should appear, because the size limit is now set to 1000 bytes (about 1 KB).

      Actual Results

      The file is indexed correctly.

      Notes

      To process the attached file, you will need to increase the Java heap size in setenv.sh, for example to -Xmx8192m.

      The property officeconnector.textextract.word.docxmaxsize is read in com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor.java as follows:

      WordXMLTextExtractor.java
      private static final long MAX_XML_SIZE = Long.getLong("officeconnector.textextract.word.docxmaxsize", 1024 * 1024 * 16); // Maximum, 16Mb of XML text, more than enough for most files
      ...
      if (finalSize > MAX_XML_SIZE) {
          throw new ExtractorException("Document too big for text extraction, bailing out");
      }
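
The lookup in the snippet above can be illustrated in isolation. This is a minimal sketch under stated assumptions: the class DocxMaxSizeSketch and its methods are hypothetical and only mirror the Long.getLong call and the size comparison quoted from WordXMLTextExtractor.java. Unlike the original, which stores the limit in a static final field (so it is read once when the class is initialized), the sketch reads the property on every call so the effect of setting it is visible.

```java
// Hypothetical stand-in for the limit check quoted above; not Confluence code.
public class DocxMaxSizeSketch {

    static final String PROPERTY = "officeconnector.textextract.word.docxmaxsize";

    // Same default as in the quoted snippet: 1024 * 1024 * 16 = 16777216 bytes.
    static final long DEFAULT_MAX_XML_SIZE = 1024 * 1024 * 16;

    // Long.getLong returns the system property parsed as a long, or the
    // default when the property is unset or is not a valid number.
    static long resolveMaxXmlSize() {
        return Long.getLong(PROPERTY, DEFAULT_MAX_XML_SIZE);
    }

    // Mirrors the "finalSize > MAX_XML_SIZE" comparison from the snippet.
    static boolean exceedsLimit(long finalSize) {
        return finalSize > resolveMaxXmlSize();
    }

    public static void main(String[] args) {
        System.out.println("limit without override: " + resolveMaxXmlSize());
        // Equivalent to starting the JVM with -D...docxmaxsize=1000.
        System.setProperty(PROPERTY, "1000");
        System.out.println("limit with override:    " + resolveMaxXmlSize());
        System.out.println("1001 bytes too big?     " + exceedsLimit(1001));
    }
}
```

With the property set to 1000, any extracted XML larger than 1000 bytes would fail this comparison, which is why the steps above expect the ExtractorException for lorum.docx.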
      

      Workaround

      There is no workaround.

        1. lorum.docx
          402 kB
          James Richards

            [AI-206] Confluence ignores the system property officeconnector.textextract.word.docxmaxsize

            pqz made changes -
            Component/s Original: Search - Indexing [ 46493 ]
            Component/s New: Search - Indexing [ 75295 ]
            Key Original: CONFCLOUD-40176 New: AI-206
            Support reference count Original: 16
            Affects Version/s Original: 5.10.8 [ 68317 ]
            Affects Version/s Original: 5.9.7 [ 67988 ]
            Affects Version/s Original: 5.9.5 [ 67959 ]
            Affects Version/s Original: 5.8.10 [ 67851 ]
            Affects Version/s Original: 5.9.1 [ 67850 ]
            Project Original: Confluence Cloud [ 18513 ] New: Atlassian Intelligence [ 23110 ]
            Monique Khairuliana (Inactive) made changes -
            Workflow Original: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2365499 ] New: JAC Bug Workflow v3 [ 3424520 ]
            Status Original: Resolved [ 5 ] New: Closed [ 6 ]
            Alice Wang (Inactive) made changes -
            Resolution New: Obsolete [ 11 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            SET Analytics Bot made changes -
            Support reference count Original: 1 New: 16
            SET Analytics Bot made changes -
            Support reference count Original: 16 New: 1
            Neha Bhayana made changes -
            Labels Original: affects-cloud affects-server office-connector New: affects-cloud affects-server cc-integration office-connector
            Katherine Yabut made changes -
            Workflow Original: Confluence Workflow - Public Facing - Restricted v5 [ 2237268 ] New: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2365499 ]
            Katherine Yabut made changes -
            Workflow Original: Confluence Workflow - Public Facing - Restricted v5.1 - TEMP [ 2198952 ] New: Confluence Workflow - Public Facing - Restricted v5 [ 2237268 ]
            Katherine Yabut made changes -
            Workflow Original: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2133266 ] New: Confluence Workflow - Public Facing - Restricted v5.1 - TEMP [ 2198952 ]
            Katherine Yabut made changes -
            Workflow Original: Confluence Workflow - Public Facing - Restricted v5 [ 1896097 ] New: Confluence Workflow - Public Facing - Restricted v5 - TEMP [ 2133266 ]

              Assignee: Unassigned
              Reporter: James Richards (jrichards@atlassian.com)
              Affected customers: 10
              Watchers: 16
