
Skip Context Extraction for large indexes when file exceeds max config setting

    • NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      You can configure the maximum indexable size of an attachment - https://confluence.atlassian.com/conf57/configuring-attachment-size-701435663.html

      However, the content is still fully extracted and then simply not saved in the index.

      Customers often use this setting because extracting content from large documents is resource intensive and often fails. The current implementation avoids storing the content in the index, but it still expends resources extracting it.

      We recommend changing this setting so that these files are excluded earlier in the process and not extracted at all.
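
The requested behaviour can be sketched as a size gate that runs before any extraction work starts. This is only a minimal illustration in Python, not Confluence's code: `index_attachment`, `max_bytes`, and `extract_text` are hypothetical names, and the size is assumed to be available from file metadata without reading the content.

```python
import os
from typing import Callable, Optional


def index_attachment(
    path: str,
    max_bytes: int,
    extract_text: Callable[[str], str],
) -> Optional[str]:
    """Return the extracted text, or None if the file was skipped.

    Hypothetical sketch: the size check happens first, so an oversized
    file never reaches the (expensive) extractor at all.
    """
    # os.path.getsize reads file metadata only; the content is not loaded.
    if os.path.getsize(path) > max_bytes:
        return None  # excluded early: no extraction, nothing to index
    return extract_text(path)
```

With a gate like this, the cost of an oversized attachment is a single metadata lookup rather than a full extraction whose result is then discarded.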

            [CONFSERVER-39462] Skip Context Extraction for large indexes when file exceeds max config setting

            Adam Barnes (Inactive) added a comment -

            We have now completed a range of work relating to how we index attachments.

            Please review how attachments are indexed for full details.

            Directly related to this ticket, we have introduced a new system property, atlassian.indexing.attachment.maxsize (default 100MB). If an uploaded file is larger than the limit set by this property, text extraction and indexing are skipped.
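
As a rough illustration of how a limit like this could be resolved and applied, here is a small Python sketch. Only the property name and the 100MB default come from the comment above. Treating the value as a byte count, falling back to the default on a missing or invalid value, and the names `resolve_max_bytes` and `should_skip` are all assumptions made for the example.

```python
from typing import Mapping

PROPERTY = "atlassian.indexing.attachment.maxsize"
# 100MB default as stated in the comment; expressing it in bytes is an assumption.
DEFAULT_MAX_BYTES = 100 * 1024 * 1024


def resolve_max_bytes(system_properties: Mapping[str, str]) -> int:
    """Read the limit from a property map, falling back to the default."""
    raw = system_properties.get(PROPERTY)
    if raw is None:
        return DEFAULT_MAX_BYTES
    try:
        value = int(raw)
    except ValueError:
        # Assumed behaviour for this sketch: ignore unparseable values.
        return DEFAULT_MAX_BYTES
    return value if value > 0 else DEFAULT_MAX_BYTES


def should_skip(size_bytes: int, system_properties: Mapping[str, str]) -> bool:
    """True when extraction and indexing should be skipped for this size."""
    return size_bytes > resolve_max_bytes(system_properties)
```

For example, under these assumptions `should_skip(30 * 1024 * 1024, {PROPERTY: "26214400"})` would be true, because the file is larger than the configured 25MB limit.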

            Adam Barnes (Inactive) added a comment - Re-opening as this is still a valid suggestion with respect to attachment indexing

            Adam Barnes (Inactive) added a comment -

            Atlassian update

            Thank you for raising this suggestion. We regret to inform you that due to limited demand, we have no plans to implement it in the foreseeable future. In order to set expectations, we're closing this request now. Sometimes potentially valuable tickets do get closed where the Summary or Description has not caught the attention of the community. If you feel that this suggestion is valuable, consider describing in more detail or outlining how this request will help you achieve your goals. We may then be able to provide better guidance. Thanks again.

            Regards,
            Confluence Product Management

            James Roberts added a comment - To further expand on this, there is a known limitation with Lucene where files over a certain size (~25MB) will never get indexed. It would be nice if Confluence checked the size of the content before kicking off an indexing task and didn't even attempt to index something if it was over that certain size (especially during user upload).

              Assignee: Unassigned
              Reporter: Jay Virgil (jvirgil)
              Votes: 1
              Watchers: 3
