  1. Confluence Data Center
  2. CONFSERVER-60151

Implement AIP support on Confluence

Details

    • Suggestion
    • Resolution: Unresolved
    • None
    • Content - Attachments
    • None
    • 2

    Description

      Azure Information Protection (AIP) encrypts documents, including Word and PowerPoint files, based on their sensitivity labels. Microsoft's documentation describes the effect of a label as follows:

      When you apply a sensitivity label, the label information will persist with your document or email, even as it is shared between devices, applications, and cloud services. Applying a sensitivity label may also result in changes to your document or email according to your organization's configuration, such as:

      • Encryption with Information Rights Management may be applied to your document or email.
      • A header or footer may appear in your document or email.
      • A watermark may appear in your document.

      All the details can be found here:

      This is not implemented yet in Confluence, and indexing such an attachment raises errors on the backend:

      2020-07-31 14:12:34,056 WARN [attachment-text-extraction-worker-0] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: AIP PROD Test Document with KPMG Only Label.docx v.1 (367296548) babypraveena@kpmg.com)
      com.atlassian.bonnie.search.extractor.ExtractorException: java.lang.Exception: Error reading content of Word XML document: Unknown file format: Unknown
      	at com.atlassian.confluence.extra.officeconnector.index.word.WordXMLTextExtractor.extractText(WordXMLTextExtractor.java:53)
      	at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
      	at com.atlassian.confluence.extra.officeconnector.index.util.AttachmentTextExtractorAdapter.apply(AttachmentTextExtractorAdapter.java:29)
      	at com.atlassian.confluence.extra.officeconnector.index.word.WordXMLTextExtractor.extract(WordXMLTextExtractor.java:38)
      	at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.lambda$extract$1(DelegatingAttachmentTextExtractor.java:35)
      	at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.dt_access$105(DelegatingAttachmentTextExtractor.java)
      	at java.util.Optional.flatMap(Optional.java:241)
      	at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.extract(DelegatingAttachmentTextExtractor.java:35)
      	at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:70)
      	at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:22)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Delegatin
      
      
      Caused by: java.lang.Exception: Error reading content of Word XML document: Unknown file format: Unknown
      	at com.atlassian.plugins.conversion.extract.xml.WordXMLExtractor.extractText(WordXMLExtractor.java:15)
      	at com.atlassian.confluence.extra.officeconnector.index.word.WordXMLTextExtractor.extractText(WordXMLTextExtractor.java:51)
      	... 28 more
      Caused by: com.aspose.words.UnsupportedFileFormatException: Unknown file format: Unknown
      	at com.aspose.words.Document.zzY(Unknown Source)
      	at com.aspose.words.Document.zzZ(Unknown Source)
      	at com.aspose.words.Document.<init>(Unknown Source)
      	at com.aspose.words.Document.<init>(Unknown Source)
      	at com.aspose.words.Document.<init>(Unknown Source)
      	at com.atlassian.plugins.conversion.extract.xml.WordXMLExtractor.extractText(WordXMLExtractor.java:13)
      	... 29 more
      2020-07-31 14:12:34,730 WARN [attachment-text-extraction-worker-2] [bonnie.search.extractor.BaseAttachmentConten
      

      The failure originates in Aspose, which throws the UnsupportedFileFormatException shown above. According to the Aspose forums, they are working on AIP support right now, but it is not fully implemented yet:

      It would be great if Confluence supported AIP-protected attachments once that becomes possible.
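
      Until then, the "Unknown file format" failure in the stack trace above could in principle be anticipated before extraction is attempted. The following is only a minimal sketch, assuming the protected file is stored as an OLE2 compound file wrapping the encrypted package (as is typical for IRM-encrypted Office Open XML documents) rather than as a plain ZIP package. The class `OfficeContainerSniffer` and its methods are hypothetical and are not part of Confluence or Aspose.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper (not a Confluence or Aspose API): classifies a file by its leading bytes. */
final class OfficeContainerSniffer {

    enum Container { ZIP_OOXML, OLE2_COMPOUND, UNKNOWN }

    // Local file header signature at the start of a ZIP archive ("PK\003\004").
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Signature at the start of an OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private OfficeContainerSniffer() {}

    /** Classifies a container from the first bytes of a file. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2_COMPOUND;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP_OOXML;
        return Container.UNKNOWN;
    }

    /** Reads just enough bytes from the file to classify it. */
    static Container sniff(Path file) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) break; // file shorter than the signature
                read += n;
            }
        }
        return sniff(Arrays.copyOf(buf, read));
    }

    /**
     * Assumption: a file with an OOXML extension that is an OLE2 compound file
     * is an encrypted or rights-protected package, so text extraction would fail.
     */
    static boolean looksProtected(String fileName, byte[] header) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean ooxmlName = lower.endsWith(".docx")
                || lower.endsWith(".pptx")
                || lower.endsWith(".xlsx");
        return ooxmlName && sniff(header) == Container.OLE2_COMPOUND;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

      A caller could use `looksProtected` to skip text extraction and log a single clear message instead of a full stack trace. The check is restricted to OOXML extensions because legacy .doc and .ppt files are OLE2 compound files even when unprotected.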

          People

            Assignee: Unassigned
            Reporter: Diego Martins
            Votes: 15
            Watchers: 11
