  1. Confluence Data Center
  2. CONFSERVER-52489

Content properties missing in search index after reindex of Confluence

    Description

      The fix for this bug has been released to our Long Term Support releases and is available in the latest releases of Confluence 7.13 and 7.19.

      In one of our plugins, we are currently using an index schema to add content properties to the search index of Confluence, as documented here: Content Properties in the REST API

      This works perfectly when inserting/updating content properties. Unfortunately, due to problems with the search index in our Confluence instance, we had to rebuild the search indexes as documented here: Rebuilding the search index

      After the reindex, none of the content properties were present in the index anymore. They are still attached to the content, but they are no longer indexed. Only after a content property was updated did it get indexed again.

      Steps to reproduce

      1. Install a fresh instance of Confluence. (We were able to reproduce this with versions 5.8.10 and 5.10.6.)
      2. Choose "Example Site" to initialize the instance with the Demonstration Space.
      3. Install the attached test plugin. All it does is define the following index schema:
        <content-property-index-schema key="test-plugin-content-property-index-schema">
            <key property-key="metadata">
                <extract path="likes" type="number" />
            </key>
        </content-property-index-schema>
        
      4. Find the pageId of the page "Welcome to Confluence", e.g. 12345.
      5. Check the content properties of the page with the following command:
        curl -u admin:admin -X GET "http://localhost:8090/rest/api/content/12345/property" | python -mjson.tool
        
      6. The page shouldn't have any content properties yet. Now add content properties with the following command:
        curl -i -u admin:admin -X POST -H "Content-Type: application/json" -d '{ "key" : "metadata", "value" : { "likes": 5 }}' http://localhost:8090/rest/api/content/12345/property
        
      7. Check the content properties of the page again to make sure they've been added successfully:
        curl -u admin:admin -X GET "http://localhost:8090/rest/api/content/12345/property" | python -mjson.tool
        
      8. You should see the inserted content properties. Now use CQL to search by the indexed content properties:
        http://localhost:8090/rest/api/content/search?cql=space%3Dds%20AND%20content.property%5Bmetadata%5D.likes%3C%3D5
        
      9. The search should return the page "Welcome to Confluence". Now reindex Confluence as described here: Rebuilding the search index
      10. Now execute the search again:
        http://localhost:8090/rest/api/content/search?cql=space%20=%20ds%20AND%20content.property[metadata].likes%20%3C=%205
        

      No results found! All content properties are missing in the index after reindexing Confluence!
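
      The search URLs in the steps above are just a percent-encoded CQL query. As a minimal sketch (the class and method names are made up for illustration, and Java 10 or later is assumed), the URL could be derived from the readable query like this:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper, not part of Confluence or the test plugin. */
final class CqlSearchUrl {

    /** Percent-encodes a CQL string for use as the "cql" query parameter. */
    static String encodeCql(String cql) {
        // URLEncoder produces form encoding, where a space becomes "+".
        // A literal "+" in the input is already encoded as "%2B", so the
        // replace below only turns encoded spaces into "%20".
        return URLEncoder.encode(cql, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Appends the encoded query to the content search REST resource. */
    static String searchUrl(String baseUrl, String cql) {
        return baseUrl + "/rest/api/content/search?cql=" + encodeCql(cql);
    }

    public static void main(String[] args) {
        String cql = "space=ds AND content.property[metadata].likes<=5";
        System.out.println(searchUrl("http://localhost:8090", cql));
    }
}
```

      Run against the query from the steps, this prints the same URL as the first search request. The second request is the same query with extra spaces and fewer characters encoded.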

      Expected result

      All indexed content properties stay in the search index, even after reindexing Confluence.

      Any help would be appreciated

      Root cause

      ThreadLocalCache (com.atlassian.confluence.cache.ThreadLocalCache) is not initialized in reindexing threads. Let's look at how content properties are indexed.

      Content properties are extracted in com.atlassian.confluence.plugins.contentproperty.index.extractor.ContentPropertiesExtractor#addFields. This method relies on ThreadLocalCache to maintain permission exemptions. However, ThreadLocalCache needs to be initialized before it can be used (see com.atlassian.confluence.cache.ThreadLocalCache#init). We do this in scheduler threads (com.atlassian.confluence.impl.schedule.caesium.JobRunnerWrapper#runJob), which handle indexing for content updates, but not in reindexing threads (com.atlassian.confluence.internal.index.ConcurrentBatchIndexer#accept), which handle whole-site reindexing.
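
      To illustrate why the missing initialization matters, here is a self-contained simulation. DemoThreadCache is a hypothetical stand-in, not Confluence's ThreadLocalCache, and it assumes for illustration that an uninitialized cache silently ignores writes; the real class may behave differently in detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Simulation only: no Confluence classes are used here. */
final class ReindexThreadDemo {

    /** Hypothetical per-thread cache that must be initialized before use. */
    static final class DemoThreadCache {
        private static final ThreadLocal<Map<String, Object>> STORE = new ThreadLocal<>();

        static void init() { STORE.set(new HashMap<>()); }
        static void dispose() { STORE.remove(); }

        /** Assumption: writes are dropped if this thread never called init(). */
        static void put(String key, Object value) {
            Map<String, Object> map = STORE.get();
            if (map != null) map.put(key, value);
        }

        static Object get(String key) {
            Map<String, Object> map = STORE.get();
            return map == null ? null : map.get(key);
        }
    }

    /** Mimics an extractor that stores a permission exemption and reads it back. */
    static boolean exemptionVisible(boolean initCache) {
        AtomicReference<Object> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            if (initCache) DemoThreadCache.init();
            try {
                DemoThreadCache.put("permission.exempt", Boolean.TRUE);
                seen.set(DemoThreadCache.get("permission.exempt"));
            } finally {
                DemoThreadCache.dispose();
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Boolean.TRUE.equals(seen.get());
    }

    public static void main(String[] args) {
        System.out.println("thread that calls init (scheduler-style): " + exemptionVisible(true));
        System.out.println("thread without init (reindex-style):      " + exemptionVisible(false));
    }
}
```

      In this model the exemption is only visible on the thread that called init first, which matches the observed behavior: properties are indexed on update but skipped during a full reindex.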

      How to fix this bug properly?

      Well, the fix should be very simple: just call ThreadLocalCache#init in reindexing threads.

      Is there a workaround?

      Not really a workaround, but if you have control of a plugin's source code, there is a hacky way to index content properties properly during a reindex without having to modify Confluence core. Add two extractors to the plugin:

      1. An extractor that runs right before ContentPropertiesExtractor (priority = 900), i.e. with priority > 900, and calls ThreadLocalCache#init.
      2. Another extractor that runs right after ContentPropertiesExtractor (priority = 900), i.e. with priority < 900, and calls ThreadLocalCache#dispose.

      Prior to Confluence 7.14, these extractors need to implement the extractor module. From 7.14 onward, they must implement the newer extractor2 module.
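
      The ordering idea behind the two extra extractors can be sketched as a small simulation. Nothing below is Confluence API: Step and CACHE_READY are hypothetical stand-ins, and the sort order encodes the assumption stated above that a higher priority runs earlier. In a real plugin, the two added steps would be extractor (or extractor2) modules that call ThreadLocalCache#init and ThreadLocalCache#dispose.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulation only: none of these types are Confluence classes. */
final class ExtractorOrderDemo {

    /** Hypothetical model of an extractor module: a priority plus some work. */
    static final class Step {
        final int priority;
        final Runnable work;

        Step(int priority, Runnable work) {
            this.priority = priority;
            this.work = work;
        }
    }

    /** Stand-in for "the thread-local cache has been initialized on this thread". */
    private static final ThreadLocal<Boolean> CACHE_READY =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static List<String> run(boolean withWorkaround) {
        List<String> log = new ArrayList<>();
        List<Step> steps = new ArrayList<>();

        // Models ContentPropertiesExtractor (priority = 900).
        steps.add(new Step(900, () ->
                log.add(CACHE_READY.get() ? "properties indexed" : "properties skipped")));

        if (withWorkaround) {
            // Registered out of order on purpose; the sort below decides the run order.
            steps.add(new Step(899, () -> { CACHE_READY.remove(); log.add("cache disposed"); }));
            steps.add(new Step(901, () -> { CACHE_READY.set(Boolean.TRUE); log.add("cache initialized"); }));
        }

        // Assumption: higher priority runs first.
        steps.sort((a, b) -> Integer.compare(b.priority, a.priority));
        for (Step step : steps) {
            step.work.run();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [properties skipped]
        System.out.println(run(true));  // [cache initialized, properties indexed, cache disposed]
    }
}
```

      Disposing in the lower-priority step keeps the workaround from leaving state behind on the reindexing thread once the content properties have been extracted.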

      Attachments

        1. test-1.0.0.jar
          2 kB
        2. test-1.0.0-source.zip
          10 kB

            People

              5339cdd01cf4 Jeffery Xie
              326aaff085b8 Remo Siegwart