Confluence Data Center / CONFSERVER-99731

A delay in updating a site reindex job's progress on Confluence Data Center can cause that site reindex job to be inaccurately marked as "REBUILD FAILED" under Content indexing's Recent activity section


      Issue Summary

      In Confluence Data Center, upon triggering a site reindex from the Confluence web interface (Administration > Content indexing), a site reindex job is created.

      That site reindex job creates a separate background job named IndexRebuildMonitoring that:

      • runs every 3 seconds,
      • updates the site reindex job’s progress via lastRebuildingUpdate timestamp values, and
      • writes those timestamp values into the bandana table (the row whose bandana key is "reindex.status"), as in this sample:
        bandanaid: 1146893
        bandanacontext: _GLOBAL
        bandanakey: reindex.status
        bandanavalue:
        <com.atlassian.confluence.index.status.ReIndexJob>
        <id>0bf078c1-0d3c-48c4-a895-fd8d201830b3</id>
        <startTime>2025-05-07T04:17:25.547215Z</startTime>
        <finishTime>2025-05-07T04:18:44.767343Z</finishTime>
        <stage>REBUILD_FAILED</stage>
        <acknowledged>true</acknowledged>
        <rebuildingProgress>
        <total>223600</total>
        <processed>184196</processed>
        </rebuildingProgress>
        <lastRebuildingUpdate>2025-05-07T04:25:53.080182500Z</lastRebuildingUpdate>
        <propagatingProgress>
        <total>1</total>
        <processed>0</processed>
        </propagatingProgress>
        <nodeStatuses class="list">
        <com.atlassian.confluence.index.status.ReIndexNodeStatus>
        <nodeId>545a90f3</nodeId>
        <state>REBUILDING</state>
        <progress>
        <total>0</total>
        <processed>0</processed>
        </progress>
        </com.atlassian.confluence.index.status.ReIndexNodeStatus>
        <com.atlassian.confluence.index.status.ReIndexNodeStatus>
        <nodeId>545a9112</nodeId>
        <state>WAITING</state>
        <progress>
        <total>0</total>
        <processed>0</processed>
        </progress>
        </com.atlassian.confluence.index.status.ReIndexNodeStatus>
        </nodeStatuses>
        <createdBy class="com.atlassian.confluence.user.ConfluenceUserImpl">
        <key>
        <userkey>402894d795f111470195f11228080000</userkey>
        </key>
        <name>admin</name>
        <lowerName>admin</lowerName>
        </createdBy>
        <spaceKeys/>
        </com.atlassian.confluence.index.status.ReIndexJob>

                 Note the lastRebuildingUpdate element in the XML payload shown above.
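To make the sample row easier to read, here is a minimal Python sketch that pulls the relevant fields out of an abridged copy of the bandanavalue shown above. The helper names (parse_instant, summarize) are hypothetical and exist only for illustration. In this sample lastRebuildingUpdate is later than finishTime, which is consistent with the rebuild still progressing after the job was marked REBUILD_FAILED.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Abridged copy of the sample bandanavalue; only the elements read below are kept.
SAMPLE = """
<com.atlassian.confluence.index.status.ReIndexJob>
  <startTime>2025-05-07T04:17:25.547215Z</startTime>
  <finishTime>2025-05-07T04:18:44.767343Z</finishTime>
  <stage>REBUILD_FAILED</stage>
  <rebuildingProgress>
    <total>223600</total>
    <processed>184196</processed>
  </rebuildingProgress>
  <lastRebuildingUpdate>2025-05-07T04:25:53.080182500Z</lastRebuildingUpdate>
</com.atlassian.confluence.index.status.ReIndexJob>
"""


def parse_instant(text):
    """Parse a UTC timestamp such as 2025-05-07T04:25:53.080182500Z.

    The fraction is truncated to microseconds because datetime
    cannot hold nanoseconds.
    """
    body = text.strip().rstrip("Z")
    seconds, _, fraction = body.partition(".")
    micros = (fraction + "000000")[:6]
    parsed = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=int(micros), tzinfo=timezone.utc)


def summarize(xml_text):
    """Return the stage, progress counters, and the update-after-finish gap."""
    root = ET.fromstring(xml_text.strip())
    finish = parse_instant(root.findtext("finishTime"))
    last_update = parse_instant(root.findtext("lastRebuildingUpdate"))
    return {
        "stage": root.findtext("stage"),
        "processed": int(root.findtext("rebuildingProgress/processed")),
        "total": int(root.findtext("rebuildingProgress/total")),
        # Positive value: progress was still being recorded after finishTime.
        "seconds_updated_after_finish": (last_update - finish).total_seconds(),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the sample row, the gap between finishTime and lastRebuildingUpdate works out to roughly 428 seconds (a little over 7 minutes).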

      A different background job (ReIndexHouseKeepingJobRunner) runs every 60 seconds.

      If this job finds an active reindex, it checks whether the lastRebuildingUpdate value (in the bandana row shown above) has been updated within the window specified by the confluence.rendex.noupdate.max.seconds system property (default: 60 seconds). This is a per-cluster job, which means any one of the Confluence cluster nodes may execute it (including the node that is reindexing).

       

      However, if the ReIndexHouseKeepingJobRunner job sees no change in lastRebuildingUpdate for longer than the duration set by confluence.rendex.noupdate.max.seconds, that reindex attempt may be inaccurately marked as failed, even if it is still progressing in the background.
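The interaction between the two jobs can be illustrated with a small timeline simulation. This is only a sketch under stated assumptions, not Confluence's actual implementation: the function name first_flagged_check is hypothetical, times are plain seconds since the job started, and the exact comparison the real job uses (strictly greater than versus greater than or equal) is assumed.

```python
def first_flagged_check(update_times, check_times, max_no_update_seconds=60):
    """Return the first check time at which the job would be flagged, or None.

    update_times: seconds at which lastRebuildingUpdate was written.
    check_times:  seconds at which the housekeeping check runs.
    Assumption: the job is flagged when the newest update seen so far is
    older than max_no_update_seconds at the time of the check.
    """
    for check in sorted(check_times):
        seen = [t for t in update_times if t <= check]
        if not seen:
            continue  # assumption: nothing to compare against yet
        if check - max(seen) > max_no_update_seconds:
            return check
    return None


if __name__ == "__main__":
    checks = [40, 100, 160]  # housekeeping runs 60 seconds apart

    # Healthy run: progress is written every 3 seconds.
    healthy = list(range(0, 200, 3))
    print(first_flagged_check(healthy, checks))  # None

    # Delayed run: no progress is written between t=30 and t=111,
    # although the rebuild itself resumes reporting afterwards.
    stalled = [t for t in healthy if t <= 30 or t >= 110]
    print(first_flagged_check(stalled, checks))  # 100
```

In the delayed run the check at t=100 sees a last update that is 70 seconds old and flags the job, even though updates resume at t=111.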

      Steps to Reproduce

      While this situation has been encountered in some customer environments, specific triggers to replicate this are currently unclear.

      The corresponding modules also do not have sufficient logging to help in isolating potential causes.

      Inducing an artificial delay (via programmatic debug-breakpoints) in between updates to lastRebuildingUpdate has helped with local replication.

      Expected Results

      The site reindex operation should launch, the reindex progress percentage should be shown on the Content indexing screen, reindexing should eventually finish, and the index snapshot(s) should be propagated to the remaining cluster node(s).

      Actual Results

      The reindexing of content may run without errors/interruptions.

      However, if the ReIndexHouseKeepingJobRunner job sees no change in lastRebuildingUpdate for longer than the duration set by confluence.rendex.noupdate.max.seconds, then:

      • the job falsely marks that reindex attempt as REBUILD_FAILED in the reindex’s bandana row (as shown in the sample row above)
      • the failure is displayed to the user as a "REBUILD FAILED" reindex job under Content indexing > Recent activity

        (the timestamps shown there follow the client browser's locale and time zone, so they may not match the Confluence server time if the server is in a different time zone)
      • the following log message will be recorded in the atlassian-confluence-index.log.*:
      2025-05-07 14:18:44,766 WARN [Caesium-1-4] [index.status.schedule.ReIndexHouseKeepingJobRunner] lambda$repairRebuildingJobIfNeeded$1 There was no updates for current re-index job for a while. Last update received at 2025-05-07T04:17:25.547215Z. Marking it as REBUILD_FAILED 
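As a worked check of the numbers in this report, the sketch below extracts the "Last update received at" timestamp from the log line above and compares it with the finishTime from the sample bandana row. Treating finishTime as the moment the housekeeping job marked the job as failed is an assumption (the log's own timestamp carries no time zone), and the function names are hypothetical.

```python
import re
from datetime import datetime

LOG_LINE = (
    "2025-05-07 14:18:44,766 WARN [Caesium-1-4] "
    "[index.status.schedule.ReIndexHouseKeepingJobRunner] "
    "lambda$repairRebuildingJobIfNeeded$1 There was no updates for current "
    "re-index job for a while. Last update received at "
    "2025-05-07T04:17:25.547215Z. Marking it as REBUILD_FAILED"
)

# finishTime from the sample bandana row; assumed to approximate the time
# at which the housekeeping job marked the reindex as failed.
FINISH_TIME = "2025-05-07T04:18:44.767343Z"

PATTERN = re.compile(r"Last update received at (\S+?)\. Marking it as (\w+)")


def parse_utc(text):
    # Swap the trailing Z for an explicit offset so that fromisoformat
    # also accepts the value on older Python 3 versions.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def seconds_without_update(log_line, finish_time):
    """Return (stage, seconds between the last update and finish_time)."""
    match = PATTERN.search(log_line)
    if match is None:
        raise ValueError("not a ReIndexHouseKeepingJobRunner failure line")
    last_update, stage = match.groups()
    gap = (parse_utc(finish_time) - parse_utc(last_update)).total_seconds()
    return stage, gap


if __name__ == "__main__":
    stage, gap = seconds_without_update(LOG_LINE, FINISH_TIME)
    print(stage, round(gap, 3), gap > 60)
```

Under that assumption the gap is about 79 seconds, which exceeds the 60-second default and matches the failure described here.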

      Workaround

      In <ConfluenceInstallDir>/bin/setenv.sh, add the confluence.rendex.noupdate.max.seconds system property and set it to a value that exceeds the total time taken for reindexing.
      Since any node (reindexing or not) can execute the ReIndexHouseKeepingJobRunner job, this property must be set on all cluster nodes.

      For example, if a full reindex takes about 22 hours on average, set confluence.rendex.noupdate.max.seconds to 86400 (24 hours in seconds, which leaves about two hours of headroom).
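The arithmetic behind that example can be written out as a short helper. The function name and the two-hour default headroom are illustrative assumptions taken from the example above, not a recommendation from Atlassian.

```python
import math


def noupdate_max_seconds(reindex_hours, headroom_hours=2):
    """Convert an observed reindex duration plus headroom into seconds."""
    return math.ceil((reindex_hours + headroom_hours) * 3600)


if __name__ == "__main__":
    # 22 hours of reindexing plus 2 hours of headroom is 24 hours.
    print(noupdate_max_seconds(22))  # 86400
```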

      Detailed steps are outlined in this KB article: "When rebuilding the Content Indexing, it is marked as REBUILD FAILED but it keeps progressing afterwards"

              971c305d4b2e Garvit Sharma
              5c3a8aca27ce Mohit Sharma