  Confluence Data Center / CONFSERVER-23376

Mixed formatting in single cell in Excel document causes Confluence indexing errors

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      Steps to reproduce:

      1. Create an Excel spreadsheet and enter some text in a cell
      2. Select part of the text in that cell and change its formatting (e.g. bold, italics, underline)
      3. Save the spreadsheet and upload it to Confluence
      4. Rebuild the search index

      Error in logs:

      2011-10-02 14:23:25,893 ERROR [Indexer: 1] [officeconnector.index.excel.ExcelXMLTextExtractor] endDocument expected [ 1 ] entries but read [ 2 ]
       -- referer: http://localhost:5350/admin/search-indexes.action | url: /admin/reindex.action | userName: admin | action: reindex
      

      An example .xlsx is attached.

      Below are the contents of sharedStrings.xml from the unzipped .xlsx. This file appears to contain the relevant data:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
      	<si>
      		<r>
      			<t>foo</t>
      		</r>
      		<r>
      			<rPr>
      				<u/>
      				<sz val="10"/>
      				<rFont val="Verdana"/>
      			</rPr>
      			<t>bar</t>
      		</r>
      		<phoneticPr fontId="1" type="noConversion"/>
      	</si>
      </sst>
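In the sharedStrings.xml above, the cell's text is a single shared-string item (one `<si>`, matching `uniqueCount="1"`) that is split into two formatting runs (`<r>`), each with its own `<t>` element. The log message "expected [ 1 ] entries but read [ 2 ]" is consistent with an extractor that counts `<t>` elements rather than `<si>` items. That is an inference from the log, not something confirmed from the Office Connector source. The minimal Python sketch below uses only the standard library, and its function names are hypothetical. It reads the shared-strings part out of an .xlsx and contrasts the two ways of counting: joining the runs of each `<si>` gives one entry, "foobar", while counting `<t>` elements gives two.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as declared on <sst> in sharedStrings.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings_xml(xlsx_bytes: bytes) -> bytes:
    """Return the raw xl/sharedStrings.xml part of an .xlsx (a ZIP archive).

    Raises KeyError if the workbook has no shared-strings part.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return archive.read("xl/sharedStrings.xml")


def shared_string_entries(xml_bytes: bytes) -> list[str]:
    """Return one string per <si> item.

    A plain item holds its text in a direct <t> child; a rich-text item
    (mixed formatting) splits it across <r> runs, each with its own <t>.
    Phonetic hints (<rPh>) are deliberately left out.
    """
    root = ET.fromstring(xml_bytes)
    entries = []
    for si in root.findall("m:si", NS):
        parts = [t.text or "" for t in si.findall("m:t", NS)]
        parts += [t.text or "" for t in si.findall("m:r/m:t", NS)]
        entries.append("".join(parts))
    return entries


def naive_t_count(xml_bytes: bytes) -> int:
    """Count every <t> element; this over-counts rich-text items."""
    root = ET.fromstring(xml_bytes)
    return len(root.findall(".//m:t", NS))


if __name__ == "__main__":
    # The sharedStrings.xml from this report, with indentation removed
    sample = (
        b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        b'<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
        b' count="1" uniqueCount="1"><si>'
        b"<r><t>foo</t></r>"
        b'<r><rPr><u/><sz val="10"/><rFont val="Verdana"/></rPr><t>bar</t></r>'
        b'<phoneticPr fontId="1" type="noConversion"/></si></sst>'
    )
    # Wrap it in an in-memory ZIP at the same path an .xlsx uses
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/sharedStrings.xml", sample)

    xml_bytes = read_shared_strings_xml(buffer.getvalue())
    print(shared_string_entries(xml_bytes))  # ['foobar']: one entry
    print(naive_t_count(xml_bytes))          # 2: one per run
```

A workbook that contains no text cells may have no shared-strings part at all. In that case `ZipFile.read` raises `KeyError`, so a real indexer would need to handle that case separately.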
      

            Rilwan_Ahmed_NC added a comment - Still seeing in 5.8.5

            tel-inform Lizenzen added a comment - Seeing in v5.8.5.

            James DeWinde added a comment - Seeing this in v5.8.4. Are you planning on fixing this any time soon?

            Chris Shim added a comment - Seeing this in 5.7 as well.

            Jason Smith added a comment - Happens in 5.6.4.

            James Johnson added a comment - @Ankur: I did not.

            Ankur Dhawan added a comment - James, did you get an answer to your question? We are also seeing the same exceptions in the log file.

            James Johnson added a comment - I have two questions about this bug that I hope someone can answer.

            1) At least part of an Excel document with mixed formatting in a single cell appears to be indexed, but some text is not. From testing, it's not clear what will and will not be indexed because of this issue. Does anyone know what will and will not be indexed in an Excel document with mixed formatting?

            2) Does one indexing error message in the log correspond to one file? Or can a single error message refer to more than one file?

            Richard Barkestam added a comment - We see this error quite often in our logs on version 5.4.3.
            This is the fifth bug we have run into since upgrading to 5.4.3. Most of these cases have been open for quite some time, and the community's recommended solution for all of them is simply to filter the log entries out so we don't see them. That doesn't inspire much trust in Atlassian's bug-fixing process.

            Michael Michael added a comment - As a workaround to keep all that noise out of the log file, I added the following to the end of \confluence\WEB-INF\classes\log4j.properties:

            # CONF-23376
            log4j.logger.com.atlassian.confluence.extra.officeconnector.index.excel.ExcelXMLTextExtractor=FATAL

            Of course, this does not fix the root cause of the issue.
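Raising the logger threshold hides the messages, but it also hides how often they occur. The hedged sketch below counts the errors in an existing log before the logger is silenced. It uses only the Python standard library. The regular expression and the function name `tally_extractor_errors` are illustrative assumptions, not part of Confluence. The sketch also assumes that log lines look like the one quoted in the description.

```python
import re
from collections import Counter
from typing import Iterable

# Built from the single log line quoted in this report; other Confluence
# versions or log layouts may format the message differently.
EXTRACTOR_ERROR = re.compile(
    r"ERROR \[[^\]]*\] \[[\w.]*ExcelXMLTextExtractor\] "
    r"endDocument expected \[ (\d+) \] entries but read \[ (\d+) \]"
)


def tally_extractor_errors(lines: Iterable[str]) -> Counter:
    """Count ExcelXMLTextExtractor errors, keyed by (expected, read) entry counts.

    Continuation lines such as " -- referer: ..." simply do not match.
    """
    counts: Counter = Counter()
    for line in lines:
        match = EXTRACTOR_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

For example, `with open(path, encoding="utf-8", errors="replace") as log: print(tally_extractor_errors(log).most_common())` prints the most frequent (expected, read) pairs, where `path` points at the Confluence application log. The quoted message does not name the attachment being indexed, so a tally like this counts error events, not distinct files.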

              Blake Riosa (Inactive)
              Robert Chang
              Affected customers: 56
              Watchers: 64
