
Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 9.2.6, 8.5.24, 9.5.2
Affects Version/s: 7.19.30, 9.4.1, 9.2.4, 8.5.22
Component/s: Search - Indexing
Labels:

Support reference count: 60
Symptom Severity: Severity 3 - Minor
UIS: 138

Issue Summary

In Confluence Data Center, upon triggering a site reindex from the Confluence web interface (Administration > Content indexing), a site reindex job is created.

That site reindex job creates a separate background job named IndexRebuildMonitoring that:

runs every 3 seconds,

updates the site reindex job’s progress via lastRebuildingUpdate timestamp values, and

writes those timestamp values into the bandana table (the row whose bandana key is "reindex.status"), as in this sample:

bandanaid	bandanacontext	bandanakey	bandanavalue
`1146893`	`_GLOBAL`	reindex.status	`<com.atlassian.confluence.index.status.ReIndexJob>` `<id>0bf078c1-0d3c-48c4-a895-fd8d201830b3</id>` `<startTime>2025-05-07T04:17:25.547215Z</startTime>` `<finishTime>2025-05-07T04:18:44.767343Z</finishTime>` `<stage>REBUILD_FAILED</stage>` `<acknowledged>true</acknowledged>` `<rebuildingProgress>` `<total>223600</total>` `<processed>184196</processed>` `</rebuildingProgress>` <lastRebuildingUpdate>2025-05-07T04:25:53.080182500Z</lastRebuildingUpdate> `<propagatingProgress>` `<total>1</total>` `<processed>0</processed>` `</propagatingProgress>` `<nodeStatuses class="list">` `<com.atlassian.confluence.index.status.ReIndexNodeStatus>` `<nodeId>545a90f3</nodeId>` `<state>REBUILDING</state>` `<progress>` `<total>0</total>` `<processed>0</processed>` `</progress>` `</com.atlassian.confluence.index.status.ReIndexNodeStatus>` `<com.atlassian.confluence.index.status.ReIndexNodeStatus>` `<nodeId>545a9112</nodeId>` `<state>WAITING</state>` `<progress>` `<total>0</total>` `<processed>0</processed>` `</progress>` `</com.atlassian.confluence.index.status.ReIndexNodeStatus>` `</nodeStatuses>` `<createdBy class="com.atlassian.confluence.user.ConfluenceUserImpl">` `<key>` `<userkey>402894d795f111470195f11228080000</userkey>` `</key>` `<name>admin</name>` `<lowerName>admin</lowerName>` `</createdBy>` `<spaceKeys/>` `</com.atlassian.confluence.index.status.ReIndexJob>`

Note the lastRebuildingUpdate sub-element in the XML payload shown above.
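To inspect such a row outside Confluence, the bandanavalue payload can be read with any XML parser. The following is a minimal Python sketch, not an Atlassian tool: the names SAMPLE_VALUE, parse_instant and summarize_reindex_status are hypothetical, and the sample is trimmed from the row above with the backtick formatting removed.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the sample bandanavalue shown above (hypothetical variable name).
SAMPLE_VALUE = """
<com.atlassian.confluence.index.status.ReIndexJob>
  <startTime>2025-05-07T04:17:25.547215Z</startTime>
  <finishTime>2025-05-07T04:18:44.767343Z</finishTime>
  <stage>REBUILD_FAILED</stage>
  <lastRebuildingUpdate>2025-05-07T04:25:53.080182500Z</lastRebuildingUpdate>
</com.atlassian.confluence.index.status.ReIndexJob>
"""


def parse_instant(text):
    """Parse a UTC instant such as 2025-05-07T04:25:53.080182500Z."""
    match = re.fullmatch(
        r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", text.strip()
    )
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # datetime stores microseconds only, so keep at most six fractional digits.
    micros = int((match.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def summarize_reindex_status(bandana_value):
    """Extract the stage and key timestamps from a reindex.status payload."""
    root = ET.fromstring(bandana_value.strip())
    return {
        "stage": root.findtext("stage"),
        "start": parse_instant(root.findtext("startTime")),
        "finish": parse_instant(root.findtext("finishTime")),
        "last_update": parse_instant(root.findtext("lastRebuildingUpdate")),
    }


status = summarize_reindex_status(SAMPLE_VALUE)
print(status["stage"])
# True here: progress was still being reported after the job was marked failed.
print(status["last_update"] > status["finish"])
```

In the sample row, lastRebuildingUpdate is later than finishTime, which is consistent with a reindex that kept running after it was marked as failed.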

A different background job (ReIndexHouseKeepingJobRunner) runs every 60 seconds.

If this job finds an active reindex, it checks whether the lastRebuildingUpdate value (in the bandana row shown above) has been updated within the timeframe specified by the confluence.rendex.noupdate.max.seconds system property (default: 60 seconds). This is a per-cluster job, which means any one of the Confluence cluster nodes may execute it (including the node that is reindexing).

However, if the ReIndexHouseKeepingJobRunner job detects no update to lastRebuildingUpdate for longer than confluence.rendex.noupdate.max.seconds, that reindex attempt may be inaccurately marked as failed, even if it is still progressing in the background.
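The staleness decision described above can be sketched as a simple time comparison. This is an illustration under stated assumptions, not the actual ReIndexHouseKeepingJobRunner code (which is not shown in this ticket): the function name is_reindex_stale is hypothetical, and the two timestamps are the startTime and finishTime values from the sample row, assuming the last update the job saw was the start time.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_MAX_SECONDS = 60  # default stated in this ticket


def is_reindex_stale(last_update, now, max_seconds=DEFAULT_MAX_SECONDS):
    """Return True when the last progress update is older than the allowed window."""
    return (now - last_update) > timedelta(seconds=max_seconds)


# Values taken from the sample row: startTime and finishTime.
last_update = datetime(2025, 5, 7, 4, 17, 25, 547215, tzinfo=timezone.utc)
check_time = datetime(2025, 5, 7, 4, 18, 44, 767343, tzinfo=timezone.utc)

print(is_reindex_stale(last_update, check_time))                     # True (gap is about 79 s)
print(is_reindex_stale(last_update, check_time, max_seconds=86400))  # False
```

With the default of 60 seconds, a gap of roughly 79 seconds is enough to trip the check, while a larger threshold would not have flagged the same job.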

Steps to Reproduce

While this situation has been encountered in some customer environments, specific triggers to replicate this are currently unclear.

The corresponding modules also do not have sufficient logging to help in isolating potential causes.

Inducing an artificial delay (via programmatic debug-breakpoints) in between updates to lastRebuildingUpdate has helped with local replication.

Expected Results

The site reindex should start, the reindex progress percentage should be shown on the Content indexing screen, reindexing should eventually finish, and the index snapshot(s) should be propagated to the remaining cluster member(s).

Actual Results

The reindexing of content may run without errors/interruptions.

However, if the ReIndexHouseKeepingJobRunner job detects no update to lastRebuildingUpdate for longer than confluence.rendex.noupdate.max.seconds, then:

it will falsely set the stage of that reindex attempt to REBUILD_FAILED in the corresponding reindex’s bandana row (as shown in the sample row above)

the failure will be displayed to the user as a "REBUILD FAILED" reindex job under the Content indexing > Recent activity section:

the timestamps shown there follow the client browser's locale and time zone, and may therefore not match the Confluence server time if the server is in a different time zone

the following log message will be recorded in the atlassian-confluence-index.log.*:

2025-05-07 14:18:44,766 WARN [Caesium-1-4] [index.status.schedule.ReIndexHouseKeepingJobRunner] lambda$repairRebuildingJobIfNeeded$1 There was no updates for current re-index job for a while. Last update received at 2025-05-07T04:17:25.547215Z. Marking it as REBUILD_FAILED

Workaround

Modify <ConfluenceInstallDir>/bin/setenv.sh to set the confluence.rendex.noupdate.max.seconds system property to a value that exceeds the total time taken for reindexing.
Since any node (reindexing or not) can execute the ReIndexHouseKeepingJobRunner job, all cluster nodes must have this property set.

For example, if a full reindex takes about 22 hours on average, set confluence.rendex.noupdate.max.seconds to 86400 (24 hours expressed in seconds, which leaves about two hours of headroom).
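The arithmetic in this example can be checked with a short Python sketch. The helper name noupdate_max_seconds and the two-hour default headroom are assumptions made for illustration; the printed string uses the standard JVM -D syntax for system properties, and how it is placed in setenv.sh is covered by the KB article referenced below.

```python
import math

PROPERTY = "confluence.rendex.noupdate.max.seconds"  # property name as given in this ticket


def noupdate_max_seconds(expected_reindex_hours, headroom_hours=2):
    """Convert an expected reindex duration plus headroom into whole seconds."""
    total_hours = expected_reindex_hours + headroom_hours
    # Round up so a fractional estimate never shortens the window.
    return math.ceil(total_hours * 3600)


seconds = noupdate_max_seconds(22)
print(f"-D{PROPERTY}={seconds}")  # -Dconfluence.rendex.noupdate.max.seconds=86400
```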

Detailed steps are outlined in this KB article: When rebuilding the Content Indexing, it is marked as REBUILD FAILED but it keeps progressing afterwards