      NOTE: This suggestion is for Confluence Cloud. Using Confluence Server? See the corresponding suggestion.

      We have a large amount of content that we would like to put into Confluence, but we are worried about the amount of space it will take and the additional size of the backups.

      If we could store the files in a directory that:
      a) Confluence can see locally (maybe within the confluence directories)
      b) Can be accessed via a URL (maybe manual setup within Tomcat)

      If the indexer could also look at these directories, we could find these URLs easily through search.

      Indexing external sites as well might be useful (although potentially time- and space-consuming).
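The request above can be illustrated with a minimal sketch: walk a local directory, map each file to the URL it would be served from, and build a simple keyword index over the contents. This is not how Confluence indexes content. The function names `build_index` and `search`, the base URL, and the plain-text tokenizing are all assumptions made for illustration.

```python
import os
import re
from collections import defaultdict
from urllib.parse import quote


def build_index(root_dir, base_url):
    """Map each lowercase token to the set of URLs whose file contains it.

    root_dir is a directory the server can read locally; base_url is the
    (hypothetical) URL prefix under which that directory is published.
    """
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + quote(rel)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort
            for token in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Only the index is stored here, not the files themselves, which is the point of the request: the content stays outside Confluence and its backups, while search still returns a URL to it.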

            [AI-670] Allow indexing of content outside of Confluence

            Bob Swift added a comment -

            This requirement is solved for many use cases with Cache for Confluence. See How to index external content using the CACHE macro.

            Stijn Debruyckere added a comment -

            I consider this issue still very relevant, as I believe do the many watchers and voters of this issue. The use case is a company intranet based on Confluence (with all its great features) that still has some external sources of (HTML) information as well. Of course you want a single point to search everything, and as Confluence makes up the largest part of that intranet, integrating those external sources into Confluence search is the logical thing to do.

            Adam Barnes (Inactive) added a comment -

            As this ticket has aged for many years without comment, we are resolving it as obsolete.
            If this request is still of interest to you, please comment with your use case so that we can understand the present-day requirement.

            W Rijsemus added a comment - - edited

            I know this comment is already 3 years old, but not seeing this as core to Confluence is about the biggest disagreement I can have with Atlassian. To the credit of Atlassian's current PMs, I was able to make my point in a separate session. The logic is very simple. Confluence is a knowledge warehouse. If the goods in this warehouse cannot be found, it is a 'warehouse of waste'. They might as well not have been produced.

            Since Confluence will never become the center of an enterprise operation, it will always need to reside next to other backends. Therefore, Enterprise/Federated/External/Community Search is a GIVEN, and not providing it is a capital sin. I've been trying to make this work for many years, and it is my single most important topic, second to none. Providing a proper, standardized connector to an industry-standard enterprise search is definitely not a luxury.

            Indexing is, BTW, not the problem. I've done that with all the engines already, including the GSA. It is the scrubbing of content via the ACL and the subsequent presentation through Crowd that is the hard part. Crowd is not SAML 2.0 compliant, hence it refuses to display the result page, even when I was able to scrub the query with the ACL.

            All this requires a standard API/connector. To the enterprise search, it should not matter which backend it searches.
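To make the "scrubbing via the ACL" step concrete, here is a minimal sketch of filtering federated search hits against a user's group memberships before they are shown. It does not use any Crowd, GSA, or Confluence API. The `Hit` type, its fields, and `scrub_hits` are hypothetical names, and real ACLs are usually richer than a flat set of groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result from any backend (hypothetical shape)."""
    url: str
    backend: str
    allowed_groups: frozenset  # groups permitted to see this document


def scrub_hits(hits, user_groups):
    """Keep only the hits the user may see, preserving result order.

    A hit is visible when the user belongs to at least one of the groups
    listed on the document. Hits with no groups listed are dropped, so the
    filter fails closed rather than open.
    """
    groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]
```

Because the filter only looks at `allowed_groups`, it behaves the same regardless of which backend produced the hit, which is the property the comment asks a standard connector to provide.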

            AudraA added a comment -

            Karsten - please keep me updated on the progress of your plugin. We do not currently see this functionality as core to Confluence; we would like to rely on an outside partner to build it as a macro or plugin.

            If you're interested to know how we decide on which features to implement, please read this:
            http://confluence.atlassian.com/display/DEV/Implementation+of+New+Features+and+Improvements

            KarstenK added a comment -

            Hi guys,

            I'm currently developing a macro plugin that lets you search an external index. It's based on Lucene 2.2.0 and currently works fine.
            The only thing that's not working is adding documents to an already existing index. Until now I have had to create the index completely from scratch each time, which is a pain when indexing many documents. But I'm working on it.

            If anyone is interested in this, feel free to contact me.
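The difference between rebuilding an index from scratch and adding to an existing one can be sketched as follows. This is plain Python and not the Lucene API or the plugin's code. The class `IncrementalIndex` and its methods are hypothetical, and a content hash is just one assumed way to detect which documents changed.

```python
import hashlib


class IncrementalIndex:
    """Toy index that re-processes only new or changed documents."""

    def __init__(self):
        # doc_id -> (content digest, set of lowercase tokens)
        self.docs = {}

    def update(self, doc_id, text):
        """Add or refresh one document; return True if work was done."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self.docs.get(doc_id)
        if existing is not None and existing[0] == digest:
            return False  # unchanged since last run: nothing to re-index
        self.docs[doc_id] = (digest, set(text.lower().split()))
        return True

    def remove(self, doc_id):
        """Drop a document that no longer exists at the source."""
        self.docs.pop(doc_id, None)

    def search(self, term):
        """Return the sorted ids of documents containing the term."""
        term = term.lower()
        return sorted(d for d, (_h, tokens) in self.docs.items() if term in tokens)
```

With this shape, a repeat indexing run over many documents only pays for the ones whose content actually changed.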

            Neil Crow added a comment -

            I would like to be able to add URLs which get crawled and indexed.
            I would expect to see this done in a fashion similar to Nutch, which is built on top of Lucene, which in turn is already being used by Confluence.

            So my wish is for an embedded Nutch (or similar) built inside of Confluence, with a web admin console for configuration, including adding URLs, specifying crawl depth, scheduling crawl intervals, etc.
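As a rough illustration of what "adding URLs" and "specifying crawl depth" mean in practice, the sketch below does a breadth-first crawl from a list of seed URLs and stops following links beyond a configured depth. It is not Nutch and uses no Nutch API. The function `crawl` and the injected `get_links` callable are assumptions. A real crawler would fetch pages over HTTP, honor robots.txt, and run on a schedule.

```python
from collections import deque


def crawl(seeds, get_links, max_depth):
    """Return URLs in the order visited, starting from the seed URLs.

    get_links(url) must return the URLs linked from that page. Seeds are
    at depth 0; links are followed only from pages at depth < max_depth.
    """
    seen = set()
    queue = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # this is where the page would be indexed
        if depth >= max_depth:
            continue  # crawl-depth limit reached: do not follow links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing the link extractor in as a function keeps the depth logic separate from fetching, so the same loop can be exercised against an in-memory link graph.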

            Microsoft SharePoint already has this capability ...

            +1

            KarstenK added a comment -

            Hello,
            since it has been a long time since the last update, I want to ask about the status of this issue.
            This would be a really great feature, because not everything that already exists has to be put into Confluence.

            kai berberich added a comment -

            We use Confluence as a knowledge repository. We have a large number of existing documents that have not been loaded into our Confluence instance and are not likely to be attached in the near future. These are progressively being linked to from Confluence. It would be great if they could be included in search indexing. I understand this is not a trivial matter, but it would be a feather in the cap of the product were Atlassian able to somehow "pull it off".
            Hope it happens. Best of luck!

              Assignee: Unassigned
              Reporter: Mark Johnson
              Votes: 64
              Watchers: 40