Issue Summary

      When a new clustered node starts up, it restores the latest index snapshot from the Shared Home, but then inadvertently restores an older index snapshot left over from a previous index rebuild/propagation.

      • If the restored index snapshot is more than 48 hours old (e.g. 1 month old), the node's index is now out of date
        • Because the journalentry table only retains the past 48 hours' worth of index entries, the node is missing the majority of the past month's indexed objects
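
      To gauge exposure, the snapshot sets in the Shared Home can be listed in journal-id order; a minimal shell sketch, assuming a typical Linux install (the shared-home path below is an assumption, adjust it to your deployment):

        # List main_index snapshot zips in ascending journal-id order. If a
        # node's startup log later shows it restoring an id far below the
        # newest one listed here, it has hit this issue.
        ls /var/atlassian/application-data/confluence/shared-home/index-snapshots/IndexSnapshot_main_index_*.zip | sort -t_ -k4 -n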

      Steps to Reproduce

      1. Deploy a clustered Confluence Data Center (start with one node only)
      2. Once the node is up and running, scale the cluster to two nodes
      3. Create a few brand new pages so there are items to index
      4. Navigate to Confluence Administration » General Configuration » Content Indexing
        • Initiate a Site reindex rebuild
        • Upon completion:
          • The <Shared-Home>/index-snapshots folder will have a copy of the newly saved index snapshot zip files:
            e.g. Oct 14, 12:58pm
            index-snapshots% ls -l
            -rw-------. 1 confluence confluence 322M Sep 16 10:26 IndexSnapshot_change_index_18934700.zip
            -rw-------. 1 confluence confluence 332M Oct 14 12:59 IndexSnapshot_change_index_19385800.zip
            
            -rw-------. 1 confluence confluence 2.4G Sep 16 10:25 IndexSnapshot_main_index_18933300.zip
            -rw-------. 1 confluence confluence 2.5G Oct 14 12:58 IndexSnapshot_main_index_19385400.zip
            
          • The second node will restore the propagated index
      5. Now create some more new content so there are items to index
      6. Navigate to Confluence Administration » General Configuration » Scheduled Jobs
        • Manually run the Clean Journal Entries job (normally run at 2am daily)
          • This will create a new index snapshot set, along with _journal_id marker files, in the <Shared-Home>/index-snapshots folder
            e.g. Nov 14 02:04am
            index-snapshots% ls -l
            -rw-------. 1 confluence confluence 322M Sep 16 10:26 IndexSnapshot_change_index_18934700.zip
            -rw-------. 1 confluence confluence 332M Oct 14 12:59 IndexSnapshot_change_index_19385800.zip
            -rw-------. 1 confluence confluence 334M Nov 14 02:05 IndexSnapshot_change_index_20020900.zip
            -rw-------. 1 confluence confluence 8    Nov 14 02:05 IndexSnapshot_change_index_journal_id
            
            -rw-------. 1 confluence confluence 1.3M Nov 14 02:05 IndexSnapshot_edge_index_30019973.zip
            -rw-------. 1 confluence confluence 8    Nov 14 02:05 IndexSnapshot_edge_index_journal_id
            
            -rw-------. 1 confluence confluence 2.4G Sep 16 10:25 IndexSnapshot_main_index_18933300.zip
            -rw-------. 1 confluence confluence 2.5G Oct 14 12:58 IndexSnapshot_main_index_19385400.zip
            -rw-------. 1 confluence confluence 2.5G Nov 14 02:04 IndexSnapshot_main_index_20019500.zip
            -rw-------. 1 confluence confluence 8    Nov 14 02:04 IndexSnapshot_main_index_journal_id
            
          • All rows in journalentry older than 48 hours are deleted, except the largest-ID row of each type. Specifically, the latest RESTORE_INDEX_SNAPSHOT row will be retained for system_maintenance (see the query sketch after these steps):
            e.g.
            entry_id,journal_name,creationdate,type,message,triedtimes
            215300,system_maintenance,2023-10-24 18:58:01.123,RESTORE_INDEX_SNAPSHOT,"{""sourceNodeId"":""1020ab4c"",""indexSnapshots"":[{""index"":""MAIN_INDEX"",""journalId"":19385400},{""index"":""CHANGE_INDEX"",""journalId"":19385800}]}",0 
            
        • Manually run the Clean Index Snapshots job (normally run at 3am daily)
          • By default, this will retain up to 3 copies of index-snapshot zip files (as you can see above)
      7. Shut down Node 2
        • Clear out Node 2's local home directory
      8. Start up Node 2
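
      For reference, the retention rule described in step 6 can be observed with a read-only query. This is a sketch only, not the job's actual SQL; it assumes a PostgreSQL backend and a database named confluence:

        # Rows older than 48 hours that the Clean Journal Entries job would
        # still keep: the highest-id row per (journal_name, type). This is how
        # a stale RESTORE_INDEX_SNAPSHOT row survives indefinitely.
        psql -d confluence -c "
          SELECT entry_id, journal_name, type, creationdate
            FROM journalentry
           WHERE creationdate < NOW() - INTERVAL '48 hours'
             AND entry_id IN (SELECT MAX(entry_id)
                                FROM journalentry
                               GROUP BY journal_name, type);"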

      Expected Results

      Node 2 should initiate an index recovery from the latest index snapshot in the Shared Home (typically the 2am snapshot created by the 'Clean Journal Entries' scheduled job) and then catch up on any index entries created since that snapshot.

      Actual Results

      Node 2:

      1. Initiates an index recovery from the latest index snapshot in the Shared Home (typically the 2am snapshot created by the 'Clean Journal Entries' scheduled job)
      2. Shortly after that recovery completes, restores the older index snapshot referenced by the retained RESTORE_INDEX_SNAPSHOT row in the journalentry table:
        Nov 23 06:11am Node startup
        2023-11-23 06:11:20,838 ERROR [Caesium-1-2] [impl.system.runner.ReIndexMaintenanceTaskRunner] shouldReIndex The job id abcfd123-d122-77fa-9585-6241dd0211bb is in COMPLETE stage, which is not REBUILDING
        2023-11-23 06:11:20,840 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Restoring index snapshots
        2023-11-23 06:11:20,854 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Index snapshot IndexSnapshot[JournalId=main_index, JournalEntryId=19385400] has been restored
        2023-11-23 06:11:20,864 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Index snapshot IndexSnapshot[JournalId=change_index, JournalEntryId=19385800] has been restored
        2023-11-23 06:11:20,864 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore All index snapshots have been restored successfully
        
      3. In the above example, the node now needs to catch up on a month's worth of indexed objects
        • However, with only the past 48 hours of index entries retained in the journalentry table, the node is missing almost all of the past month's indexed items
        • The older the retained propagated index snapshot, the more data the node's index will be missing (a diagnostic sketch follows this list)
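
      To confirm the mismatch on an affected cluster, compare the journalId pinned by the retained RESTORE_INDEX_SNAPSHOT row with the newest snapshot on disk. A diagnostic sketch, assuming PostgreSQL and a typical shared-home path (both are assumptions):

        # The journalId(s) the retained restore row points at ...
        psql -d confluence -t -c \
          "SELECT message FROM journalentry
            WHERE journal_name = 'system_maintenance'
              AND type = 'RESTORE_INDEX_SNAPSHOT'
            ORDER BY entry_id DESC LIMIT 1;"
        # ... versus the newest main_index snapshot actually present on disk.
        ls -t /var/atlassian/application-data/confluence/shared-home/index-snapshots/IndexSnapshot_main_index_*.zip | head -1
        # If the journalId in the message (e.g. 19385400) is lower than the id
        # embedded in the newest filename (e.g. 20019500), a restarting node
        # will re-restore the stale snapshot after its initial recovery.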

      Workaround

      1. Follow the steps in Configuring system properties and add the following JVM flag on each node (see the setenv.sh sketch after these steps):
        -Dindex.snapshot.retain.count=1
        
      2. Restart each node (one at a time) for the change to take effect.
      3. Once all nodes have the above flag configured, manually run the Clean Index Snapshots job (Confluence Administration » General Configuration » Scheduled Jobs); this will retain just the latest index file set
      4. The next time a node starts up from scratch, it will only restore the latest index set
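
      On a standard Linux installation, the flag is typically added via CATALINA_OPTS in <install-directory>/bin/setenv.sh; a sketch of the relevant lines (your existing setenv.sh contents may differ, and Windows/service installs use a different mechanism, as per Configuring system properties):

        # <install-directory>/bin/setenv.sh
        CATALINA_OPTS="-Dindex.snapshot.retain.count=1 ${CATALINA_OPTS}"
        export CATALINA_OPTS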

            [CONFSERVER-94641] Old index snapshot restored on new node startup
