Suggestion
Resolution: Won't Do
NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.
Problem Definition
Currently, Confluence strips only a limited set of embedded media file types from a Microsoft Word document before indexing it:
- .png
- .emf
- .wmf
- .jpg
- .jpeg
- .gif
If a document embeds a file of a type not listed here, that file is not stripped, and the document may not get indexed because it is too large.
Background
Currently, Confluence does not index a file if its content, after the media types listed above are removed, is larger than 16 MB. A system property (officeconnector.textextract.word.docxmaxsize) is meant to raise this limit, but Confluence ignores it. See CONF-40176 for more details.
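To make the size check concrete, here is a minimal Java sketch of how the remaining content could be measured once the listed media types are skipped. It is not the Office Connector code: the class name DocxSizeCheck, its methods, the case-insensitive extension match, and reading "16 MB" as 16 * 1024 * 1024 bytes of uncompressed content are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Confluence or the Office Connector.
class DocxSizeCheck {
    // Assumption: the 16 MB limit is interpreted as 16 MiB of uncompressed content.
    static final long ASSUMED_LIMIT_BYTES = 16L * 1024 * 1024;

    // The media extensions the ticket says are stripped today.
    static final List<String> STRIPPED_EXTENSIONS =
            List.of(".png", ".emf", ".wmf", ".jpg", ".jpeg", ".gif");

    // True if the entry name ends with one of the stripped extensions
    // (matched case-insensitively, which is an assumption).
    static boolean isStripped(String entryName) {
        String lower = entryName.toLowerCase(Locale.ROOT);
        return STRIPPED_EXTENSIONS.stream().anyMatch(lower::endsWith);
    }

    // Sums the uncompressed size of every entry that would NOT be stripped.
    static long remainingBytes(byte[] docx) {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || isStripped(entry.getName())) {
                    continue; // skipped entries do not count toward the limit
                }
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    total += read;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // True if the remaining content fits under the assumed limit.
    static boolean withinLimit(byte[] docx) {
        return remainingBytes(docx) <= ASSUMED_LIMIT_BYTES;
    }
}
```

In this sketch an embedded audio or video file still counts toward the total, which is how a document with unlisted media can exceed the limit even though its text is small.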
Suggested Solution
com.atlassian.confluence.extra.officeconnector.index.word#WordXMLTextExtractor contains the following condition:
if (!(name.contains("/media/") || processedName.endsWith(".png") || processedName.endsWith(".emf") || processedName.endsWith(".wmf") || processedName.endsWith(".jpg") || processedName.endsWith(".jpeg") || processedName.endsWith(".gif") ))
Either:
- Strip out all content in the /media folder, or
- Add all media types that can be embedded in a Word document. See Types of media files you can add.
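The first option can be sketched as a check on the archive path alone, so no extension list has to be maintained. This is only an illustration: the class MediaFolderFilter and its methods are hypothetical names, not the Office Connector API, and the sample entry names (including the .mp4 file) are made up.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the "strip everything under /media" option.
class MediaFolderFilter {
    // True if the entry lives in a media folder at any depth,
    // e.g. "word/media/image1.png" or "ppt/media/video1.mp4".
    static boolean isMediaEntry(String entryName) {
        // Normalize separators and add a leading slash so a top-level
        // "media/..." entry is matched as well.
        String normalized = "/" + entryName.replace('\\', '/');
        return normalized.contains("/media/");
    }

    // Keeps only the entries that should be passed on to text extraction.
    static List<String> entriesToIndex(List<String> entryNames) {
        return entryNames.stream()
                .filter(name -> !isMediaEntry(name))
                .collect(Collectors.toList());
    }
}
```

With this approach, an entry such as word/media/video1.mp4 is dropped even though .mp4 is not among the extensions listed in the ticket. The same path check would also match ppt/media/ entries, which may be relevant to the PowerPoint case mentioned under Notes.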
Notes
Similar issues occur with other Microsoft Office documents (e.g. PowerPoint).
- Discovered while testing: CONFSERVER-40176 Confluence ignores the system property officeconnector.textextract.word.docxmaxsize (Closed)
- Relates to: AI-781 Filter Out All Media Files from Microsoft Word Documents to Improve Indexing in Confluence (Closed)
Thank you for raising this suggestion.
We regret to inform you that due to limited demand, we have no plans to implement it in the foreseeable future. In order to set expectations, we're closing this request now. Sometimes potentially valuable tickets do get closed where the Summary or Description has not caught the attention of the community. If you feel that this suggestion is valuable, consider describing in more detail or outlining how this request will help you achieve your goals. We may then be able to provide better guidance.
For more context, check out our Community blog on our updated workflow for Suggestions.
Cheers,
Confluence Product Management