When indexing larger attachments, it would benefit our clients greatly if we checked the size of the attachment before indexing it. In other words, when an attachment is too large, bail out early instead of attempting to index it.

      The system property atlassian.indexing.contentbody.maxsize is meant to provide this functionality, but it has been reported not to work.
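      The kind of early size check requested here can be sketched roughly as follows. This is only an illustration under stated assumptions, not Confluence's actual indexing code: the class `AttachmentSizeGuard`, its methods, and the 1 MiB fallback value are hypothetical, and the sketch assumes the property value is a byte count. Only the property name comes from this issue.

```java
public class AttachmentSizeGuard {
    // Property name taken from the issue text.
    static final String MAX_SIZE_PROPERTY = "atlassian.indexing.contentbody.maxsize";

    // Assumption for illustration only: fall back to 1 MiB when the property is unset.
    static final long ASSUMED_DEFAULT_MAX_BYTES = 1024L * 1024L;

    // Reads the limit from the JVM system property, or uses the assumed default.
    static long configuredMaxBytes() {
        return Long.getLong(MAX_SIZE_PROPERTY, ASSUMED_DEFAULT_MAX_BYTES);
    }

    // Returns true only when the attachment is small enough to hand to the indexer.
    static boolean shouldIndex(long attachmentSizeBytes, long maxBytes) {
        return attachmentSizeBytes >= 0 && attachmentSizeBytes <= maxBytes;
    }

    public static void main(String[] args) {
        long max = configuredMaxBytes();
        long[] sizes = {512L * 1024L, 25L * 1024L * 1024L};
        for (long size : sizes) {
            // Decide before any text extraction is attempted.
            String decision = shouldIndex(size, max) ? "index" : "skip";
            System.out.println(size + " bytes -> " + decision);
        }
    }
}
```

      The point of the sketch is the ordering: the size comparison happens before any extraction work starts, so an oversized attachment costs one comparison rather than a failed indexing attempt.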

      Workaround

      As of 5.9 there is 'canary' functionality in the Office Connector that will abort indexing on large documents if they would cause an OutOfMemory error. More details can be found here.

      NB: The canary is now enabled by default and does not require a dark feature.

            [CONFSERVER-40914] Check size of attachment before content indexing

            Feng Xu (Inactive) added a comment - A fix for this issue is now available for Confluence Server customers. Upgrade now or check out the Release Notes to see what other issues are resolved.

            Adam Barnes (Inactive) added a comment - Atlassian update

            Thank you for your interest in this bug. You may have seen some recent activity on this issue, so I wanted to give an update on where we are.
            Presently this issue describes a solution to a symptom related to failing to extract text from attachments. We are currently investigating this and a number of related issues regarding text extraction and indexing, in order to provide the best possible solution.
            Thank you for your patience whilst we investigate this issue.
            Regards,
            Confluence Product Management

            Adam Barnes (Inactive) added a comment - Apologies for the confusion. I resolved this as a duplicate; however, we actually have this functionality today in the system property atlassian.indexing.contentbody.maxsize.
            On further analysis it appears that this property is not being correctly observed. I have converted this to a bug and re-opened it.

            James Roberts added a comment - To clarify further, it would be nice for the "size check" to be configurable. Some instances may not want to index anything over 5MB, while others may not want to index anything over 15MB or 20MB. It would be preferable to make this configurable, so that the administrator can set a limit on how large a file can be and still get indexed. A hard upper limit at the Lucene limitation would be preferable as a default, so that application resources are not used on files that will not be indexed because of their overall size.

            Stephen Gramm added a comment - How can this issue be marked as 'Resolved' when the other issue is not even 'Assigned' to be worked on?

            I would like to remind you of the Atlassian motto, and that we are watching.

              mfedoryshyn Maksym Fedoryshyh
              ctalk chucktalk
