Type: Bug
Resolution: Unresolved
Priority: Low
Fix Version/s: None
Affects Version/s: 8.5.3, 8.8.0, 7.19.19
Votes: 5
Severity: Severity 2 - Major
Watchers: 30
Issue Summary
When a new clustered node starts up, it restores the latest index snapshot from the shared home, but then inadvertently restores an older copy of an index snapshot created by a previous index rebuild/propagation.
- If the restored copy of the index snapshot is more than 48 hours old (e.g. 1 month old), the node's index is now out of date
- With the journalentry table only retaining the past 48 hours' worth of index objects, the node is now missing the majority of the past month's indexed objects (the sketch after this list shows one way to spot a stale snapshot set)
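One way to spot this condition before restarting a node is to list the index snapshot zips in the shared home together with their ages. This is a minimal sketch, assuming a Linux node with GNU coreutils and that SHARED_HOME points at the Confluence shared home (both are assumptions, not from this report):

#!/bin/sh
# Sketch: list index snapshot zips with their age in hours, to spot a
# stale set (older than the 48h journalentry retention window) that a
# starting node could still be told to restore.
SHARED_HOME=/mnt/confluence-shared-home    # assumed path; adjust as needed
SNAP_DIR="$SHARED_HOME/index-snapshots"

now=$(date +%s)
for f in "$SNAP_DIR"/IndexSnapshot_*.zip; do
  [ -e "$f" ] || continue
  age_hours=$(( (now - $(stat -c %Y "$f")) / 3600 ))   # stat -c %Y is GNU stat (Linux)
  echo "${age_hours}h  $f"
done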
Steps to Reproduce
- Deploy a clustered Confluence Data Center (start with one node only)
- Once the node is up and running, scale the cluster to two nodes
- Create a few brand new pages so there are items to index
- Navigate to Confluence Administration » General Configuration » Content Indexing
- Initiate a site reindex (rebuild)
- Upon completion:
- The <Shared-Home>/index-snapshots folder will have a copy of the newly saved index snapshot zip files, e.g. on Oct 14, 12:58pm:
index-snapshots% ls -l
total 376
-rw-------. 1 confluence confluence 322M Sep 16 10:26 IndexSnapshot_change_index_18934700.zip
-rw-------. 1 confluence confluence 332M Oct 14 12:59 IndexSnapshot_change_index_19385800.zip
-rw-------. 1 confluence confluence 2.4G Sep 16 10:25 IndexSnapshot_main_index_18933300.zip
-rw-------. 1 confluence confluence 2.5G Oct 14 12:58 IndexSnapshot_main_index_19385400.zip
- The second node will restore the propagated index
- Now create some more new content so there are items to index
- Navigate to Confluence Administration » General Configuration » Scheduled Jobs
- Manually run the Clean Journal Entries job (normally run daily at 2am)
- This will create a new index snapshot, plus _journal_id marker files, in the <Shared-Home>/index-snapshots folder, e.g. on Nov 14, 02:04am:
index-snapshots% ls -l
-rw-------. 1 confluence confluence 322M Sep 16 10:26 IndexSnapshot_change_index_18934700.zip
-rw-------. 1 confluence confluence 332M Oct 14 12:59 IndexSnapshot_change_index_19385800.zip
-rw-------. 1 confluence confluence 334M Nov 14 02:05 IndexSnapshot_change_index_20020900.zip
-rw-------. 1 confluence confluence    8 Nov 14 02:05 IndexSnapshot_change_index_journal_id
-rw-------. 1 confluence confluence 1.3M Nov 14 02:05 IndexSnapshot_edge_index_30019973.zip
-rw-------. 1 confluence confluence    8 Nov 14 02:05 IndexSnapshot_edge_index_journal_id
-rw-------. 1 confluence confluence 2.4G Sep 16 10:25 IndexSnapshot_main_index_18933300.zip
-rw-------. 1 confluence confluence 2.5G Oct 14 12:58 IndexSnapshot_main_index_19385400.zip
-rw-------. 1 confluence confluence 2.5G Nov 14 02:04 IndexSnapshot_main_index_20019500.zip
-rw-------. 1 confluence confluence    8 Nov 14 02:04 IndexSnapshot_main_index_journal_id
- All rows in journalentry older than 48 hours are deleted, except the row with the largest ID of each type. Specifically, the latest RESTORE_INDEX_SNAPSHOT row is retained for system_maintenance (see the query sketch after the example row):
e.g.
entry_id,journal_name,creationdate,type,message,triedtimes
215300,system_maintenance,2023-10-24 18:58:01.123,RESTORE_INDEX_SNAPSHOT,"{""sourceNodeId"":""1020ab4c"",""indexSnapshots"":[{""index"":""MAIN_INDEX"",""journalId"":19385400},{""index"":""CHANGE_INDEX"",""journalId"":19385800}]}",0
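The retained row can also be inspected directly in the database. A minimal sketch, assuming a PostgreSQL backend with a confluence database and user (assumptions; adapt the client and credentials to your DBMS):

# Show which snapshot a freshly started node will be told to restore
# (the highest-entry_id RESTORE_INDEX_SNAPSHOT row is the one retained)
psql -U confluence -d confluence -c "
SELECT entry_id, creationdate, message
FROM journalentry
WHERE journal_name = 'system_maintenance'
  AND type = 'RESTORE_INDEX_SNAPSHOT'
ORDER BY entry_id DESC;"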
- Manually run the Clean Index Snapshots (normally run at 3am daily)
- By default, this will retain up to 3 copies of the index-snapshot zip files per index (as you can see above); the check below sketches one way to confirm the retained count
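To confirm how many snapshot generations the job actually kept, you can count the zip files per index in the shared home. A minimal sketch, reusing the assumed SNAP_DIR path from the earlier sketch:

SNAP_DIR=/mnt/confluence-shared-home/index-snapshots   # assumed path; adjust as needed
for idx in main_index change_index edge_index; do
  # _journal_id marker files don't end in .zip, so only snapshots are counted
  count=$(ls "$SNAP_DIR"/IndexSnapshot_${idx}_*.zip 2>/dev/null | wc -l)
  echo "$idx: $count snapshot zip(s) retained"
done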
- Shut down Node 2
- Clear out Node 2's local home directory
- Start up Node 2 (a shell sketch of these three steps follows below)
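On a Linux node, those last three steps might look like the following. The service name and local home path are assumptions (not from this report), so adjust them to your installation:

sudo systemctl stop confluence                          # shut down Node 2 (assumed service name)
LOCAL_HOME=/var/atlassian/application-data/confluence   # assumed local home path
rm -rf "${LOCAL_HOME:?}"/*                              # :? guards against an unset variable
sudo systemctl start confluence                         # start up Node 2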
Expected Results
Node 2 should initiate an index recovery from the shared home using the latest index snapshot (typically the 2am snapshot created by the 'Clean Journal Entries' scheduled job) and then catch up on any items indexed since that snapshot.
Actual Results
Node 2:
- Initiates an index recovery from the shared home using the latest index snapshot (typically the 2am snapshot created by the 'Clean Journal Entries' scheduled job)
- Shortly after the index recovery completes, it then restores the older index snapshot referenced by the retained RESTORE_INDEX_SNAPSHOT row in the journalentry table. Node startup log excerpt from Nov 23, 06:11am (a grep sketch for finding these lines follows the excerpt):
2023-11-23 06:11:20,838 ERROR [Caesium-1-2] [impl.system.runner.ReIndexMaintenanceTaskRunner] shouldReIndex The job id abcfd123-d122-77fa-9585-6241dd0211bb is in COMPLETE stage, which is not REBUILDING
2023-11-23 06:11:20,840 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Restoring index snapshots
2023-11-23 06:11:20,854 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Index snapshot IndexSnapshot[JournalId=main_index, JournalEntryId=19385400] has been restored
2023-11-23 06:11:20,864 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore Index snapshot IndexSnapshot[JournalId=change_index, JournalEntryId=19385800] has been restored
2023-11-23 06:11:20,864 INFO [Caesium-1-2] [impl.system.runner.RestoreIndexSnapshotMaintenanceTaskRunner] doRestore All index snapshots have been restored successfully
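To check whether a node performed this second restore at startup, search its application log for the restore task runner shown above. A minimal sketch, assuming the default log location under the local home (an assumed path; adjust to your installation):

grep "RestoreIndexSnapshotMaintenanceTaskRunner" \
  /var/atlassian/application-data/confluence/logs/atlassian-confluence.log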
- In the above example, the node would now need to catch up on a month's worth of indexed objects
- However, with only the past 48 hours of indexed objects retained in the journalentry table, the node is now missing almost all of the past month's indexed items (the query sketch below shows the actual retention window)
- The older the last retained propagated index files are, the more data the node's index will be missing
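To quantify the catch-up window a recovering node actually has, you can look at the oldest and newest entries retained per journal; anything indexed before the oldest retained entry is unrecoverable without a full reindex. A sketch, again assuming a PostgreSQL backend (adapt to your DBMS):

# Show the retention window per journal; the gap between the restored
# snapshot's date and "oldest" is the data the node can never replay
psql -U confluence -d confluence -c "
SELECT journal_name, MIN(creationdate) AS oldest, MAX(creationdate) AS newest
FROM journalentry
GROUP BY journal_name;"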
Workaround
- Follow the steps in Configuring system properties and add the following JVM flag on each node (a setenv.sh sketch follows below):
-Dindex.snapshot.retain.count=1
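On a Linux installation this typically means appending the flag to CATALINA_OPTS in the install directory's bin/setenv.sh; this is a sketch of the common case described in the 'Configuring system properties' documentation, not an excerpt from this report:

# <install-directory>/bin/setenv.sh (typical location; Windows uses setenv.bat)
CATALINA_OPTS="-Dindex.snapshot.retain.count=1 ${CATALINA_OPTS}"
export CATALINA_OPTS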
- Restart each node (one at a time) for the change to take effect.
- Once all nodes have the above flag configured, manually run Confluence Administration » Scheduled Jobs » Clean Index Snapshots; this will retain just the latest index file set
- The next time a node starts up from scratch, it will only restore the latest index set (a quick verification check follows below)
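A quick way to verify the outcome is to list the shared-home snapshot folder after the cleanup job has run; with a retain count of 1 you should see a single zip (plus its _journal_id marker) per index. Using the same assumed shared-home path as the earlier sketches:

ls -lt /mnt/confluence-shared-home/index-snapshots   # assumed path; expect one zip per index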