Suggestion
Resolution: Won't Fix
JDK 1.6.0_21, CentOS 5.5
NOTE: This suggestion is for Confluence Cloud. Using Confluence Server? See the corresponding suggestion.
I carried out a test upgrade from Confluence 3.0.2 to 3.3 over the weekend, and noticed that the re-index threw over 2000 errors relating to attachments. Some of them were problematic PDFs, and I've voted on CONF-18962 to get those resolved.
However, the vast majority of the errors related to .xls and .csv files not being properly indexed.
In many of the cases the following appeared:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
POI is correct. We use a tool called FlorenceSoft DiffEngineX to carry out diffs between Excel documents, and it inexplicably creates Office 2007 format output but mistakenly gives it a .xls (instead of .xlsx) extension.
I'm not aware of any other tools that make this mistake, but I'm sure we're not the only ones who have content saved with the wrong extension. Considering POI is able to guess that it might be Office 2007 content, perhaps Confluence could capture the error, and try to re-index the documents as Excel 2007? It would be fantastic, and I'd really appreciate it.
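To illustrate the idea, here is a minimal sketch of how a mislabeled file could be spotted from its first bytes rather than its extension. It is not Confluence's indexer and does not use POI; the class and method names (`SpreadsheetSniffer`, `detect`, `isMislabeledXls`) are made up for this example. It relies only on two well-known signatures: OLE2 documents start with `D0 CF 11 E0 A1 B1 1A E1`, and Office 2007+ files are ZIP containers starting with `PK\x03\x04`. A ZIP signature alone does not prove the content is a spreadsheet, so a real indexer would still need to attempt the parse and handle failure.

```java
import java.util.Locale;

// Hypothetical helper: classifies a file by its leading bytes, not its extension.
final class SpreadsheetSniffer {

    enum Format { OLE2, ZIP_CONTAINER, UNKNOWN }

    // Signature of OLE2 compound documents (.xls, .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature of a ZIP archive; Office 2007+ files are ZIP containers.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns the container format suggested by the first bytes of the file.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP_CONTAINER;
        }
        return Format.UNKNOWN;
    }

    // True when a file named *.xls actually starts with a ZIP signature,
    // which is the situation described in this suggestion.
    static boolean isMislabeledXls(String fileName, byte[] header) {
        boolean namedXls = fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".xls");
        return namedXls && detect(header) == Format.ZIP_CONTAINER;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

An indexer could call `isMislabeledXls` before choosing a parser, or only after the OLE2 parser has failed, and then retry the attachment with the Office 2007 code path.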
- is related to
  CONFSERVER-20594 Use Apache POI returned information to attempt to index Office 2007 where incorrect extension was used (Closed)
- relates to
  CONFCLOUD-35799 View Powerpoint Macro keep spinning (Closed)
[AI-772] Use Apache POI returned information to attempt to index Office 2007 where incorrect extension was used
Component/s | Original: Search - Core [ 46383 ] | |
Component/s | New: Search - Core [ 75296 ] | |
Key | Original: | New: |
Affects Version/s | Original: 3.3 [ 67569 ] | |
Project | Original: Confluence Cloud [ 18513 ] | New: Atlassian Intelligence [ 23110 ] |
Workflow | Original: JAC Suggestion Workflow [ 3405829 ] | New: JAC Suggestion Workflow 3 [ 3624586 ] |
Status | Original: RESOLVED [ 5 ] | New: Closed [ 6 ] |
Workflow | Original: Confluence Workflow - Public Facing v3 [ 2240189 ] | New: JAC Suggestion Workflow [ 3405829 ] |
Workflow | Original: Confluence Workflow - Public Facing v3 - TEMP [ 2152241 ] | New: Confluence Workflow - Public Facing v3 [ 2240189 ] |
Workflow | Original: Confluence Workflow - Public Facing v3 [ 1890498 ] | New: Confluence Workflow - Public Facing v3 - TEMP [ 2152241 ] |
Workflow | Original: Confluence Workflow - Public Facing v2 [ 1806856 ] | New: Confluence Workflow - Public Facing v3 [ 1890498 ] |
Description | New: added a panel at the top of the description: "NOTE: This suggestion is for Confluence Cloud. Using Confluence Server? See the corresponding suggestion" (http://jira.atlassian.com/browse/CONFSERVER-20594). The rest of the description is unchanged. |
Link | New: This issue is related to |
Project Import | New: Sat Apr 01 14:06:06 UTC 2017 [ 1491055566265 ] |