Type: Suggestion
Resolution: Fixed
Component/s: Documentation - All
NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.
The current documentation does not sufficiently cover the technical details of the index recovery feature introduced in Confluence 5.9:
- https://confluence.atlassian.com/display/DOC/Confluence+Data+Center+Technical+Overview
- https://confluence.atlassian.com/display/DOC/Confluence+Data+Center+disaster+recovery
An important item to note about the index recovery process is that it is performed only on node startup. The first document does touch upon this, but also contains some misleading information, namely:
If you need to reindex Confluence for any reason, this is done on one node, and then picked up by the other nodes automatically.
This is misleading because, if you rebuild the index on one node, the other existing nodes will not pick it up unless they are new or require index recovery. Additionally, recovery is only done on startup, so it does not affect nodes that are currently running. Index recovery in 5.9 and higher works as follows:
On startup, a node checks each of the indexes in its local home directory to see whether it is missing, corrupt, or out of date.
It does this by comparing the index entry id stored in the local journal file (for example, journal/main_index in the journal folder of the node's local home directory) with the index entry id in the database:
- If the local journal file is missing or has an entry id of 0, the index is missing.
- If the local journal entry id is not found in the database table "journalentry", then the index is out of date.
- If the index cannot be read programmatically by the Lucene API, the index is corrupt.
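The three checks can be sketched as a small classifier. This is a minimal Python model, not Confluence's Java implementation: `check_index`, `IndexState` and the `index_is_readable` callback are hypothetical names, the "journalentry" table is modelled as a plain set of ids, and the order in which the checks run is an assumption the text does not state.

```python
from enum import Enum
from typing import Callable, Optional, Set


class IndexState(Enum):
    OK = "ok"
    MISSING = "missing"
    OUT_OF_DATE = "out of date"
    CORRUPT = "corrupt"


def check_index(
    local_entry_id: Optional[int],
    db_entry_ids: Set[int],
    index_is_readable: Callable[[], bool],
) -> IndexState:
    """Classify one local index using the three checks described above.

    local_entry_id: entry id read from the local journal file
        (e.g. journal/main_index), or None if that file is missing.
    db_entry_ids: ids present in the "journalentry" table (modelled as a set).
    index_is_readable: hypothetical stand-in for opening the index with the
        Lucene API; returns False if the index cannot be read.
    """
    # A missing journal file, or an entry id of 0, means the index is missing.
    if local_entry_id is None or local_entry_id == 0:
        return IndexState.MISSING
    # The local id is not in journalentry, so the index is out of date.
    if local_entry_id not in db_entry_ids:
        return IndexState.OUT_OF_DATE
    # The index files cannot be read, so the index is corrupt.
    if not index_is_readable():
        return IndexState.CORRUPT
    return IndexState.OK
```

For example, a node whose journal file records id 5 while the table only holds ids 1 and 2 would be classified as out of date.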
If any of the above scenarios are found to be true, the node will attempt to recover the index in the following order:
- Try to recover the index from a valid index backup file in the shared home directory.
- Ask the other nodes in the cluster for an index backup.
- Perform a reindex.
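The recovery order above amounts to a fallback chain. The sketch below assumes each source is a callable that returns a backup location or None; `recover_index` and its parameters are hypothetical, and asking peers one after another is a simplification of however the real node queries the cluster.

```python
from typing import Callable, List, Optional

# Hypothetical: each source returns a location of a usable index backup, or None.
IndexSource = Callable[[], Optional[str]]


def recover_index(
    shared_home_backup: IndexSource,
    peers: List[IndexSource],
    reindex: Callable[[], str],
) -> str:
    """Return a description of where the recovered index came from."""
    # 1. Look for a valid index backup in the shared home directory.
    backup = shared_home_backup()
    if backup is not None:
        return f"restored from shared home: {backup}"
    # 2. Ask the other nodes in the cluster for an index backup.
    for ask_peer in peers:
        snapshot = ask_peer()
        if snapshot is not None:
            return f"restored from peer: {snapshot}"
    # 3. No backup was found anywhere, so rebuild the index locally.
    return reindex()
```

A full reindex is only reached when both the shared home and every peer come up empty, which is why it is listed last.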
Additionally, the following optional JVM parameters affect index recovery and should be documented:
confluence.cluster.index.recovery.generation.timeout
Default: 120 seconds. The amount of time, in seconds, that the Confluence node needing index recovery waits for another node to create an index snapshot before it gives up and fails the index recovery attempt.
confluence.cluster.index.recovery.query.timeout
Default: 10 seconds. The amount of time, in seconds, that the Confluence node needing recovery waits for an index snapshot offer from any of its peers before it gives up and fails the index recovery attempt.
confluence.cluster.index.recovery.num.attempts
Default: 1 attempt. The number of times the node will attempt to recover its index.
Note that setting this to 0 effectively disables the 5.9 index recovery feature. In that case, if an index is found to be out of date, unavailable, or corrupt, the node automatically rebuilds its own index after startup.
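Confluence reads these settings as Java system properties passed with -D. The Python model below only mirrors the names and defaults listed above; `RecoverySettings` and `parse_jvm_args` are hypothetical helpers, and the timeouts are kept in seconds as documented.

```python
from dataclasses import dataclass
from typing import Dict, List

PREFIX = "confluence.cluster.index.recovery."


@dataclass(frozen=True)
class RecoverySettings:
    generation_timeout_s: int = 120  # wait for a peer to create a snapshot
    query_timeout_s: int = 10        # wait for a snapshot offer from any peer
    num_attempts: int = 1            # 0 effectively disables index recovery

    @property
    def recovery_enabled(self) -> bool:
        return self.num_attempts > 0


def parse_jvm_args(args: List[str]) -> RecoverySettings:
    """Read -Dname=value options, falling back to the documented defaults."""
    props: Dict[str, str] = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            props[name] = value
    d = RecoverySettings()
    return RecoverySettings(
        generation_timeout_s=int(
            props.get(PREFIX + "generation.timeout", d.generation_timeout_s)),
        query_timeout_s=int(
            props.get(PREFIX + "query.timeout", d.query_timeout_s)),
        num_attempts=int(
            props.get(PREFIX + "num.attempts", d.num_attempts)),
    )
```

For example, passing -Dconfluence.cluster.index.recovery.num.attempts=0 yields num_attempts == 0, so recovery_enabled is False and the node falls back to rebuilding its own index.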
Finally, because of the 5.9 index recovery feature, if an administrator needs to rebuild the index from scratch across all nodes in a cluster (e.g. if all nodes are suspected to have untrustworthy index files), they will need to temporarily disable index recovery. Otherwise, on startup a node without an index will simply recover it from another node, which defeats the purpose of reindexing from scratch. This process is laid out in the following KB article: https://confluence.atlassian.com/confkb/how-to-rebuild-the-content-indexes-from-scratch-on-confluence-data-center-833941594.html
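Why recovery has to be disabled can be shown with a toy model of a cluster in which every node holds an untrustworthy index. The function `index_after_restart` is hypothetical: it treats num.attempts only as an on/off switch and does not model shared home backups.

```python
from typing import Dict, Optional


def index_after_restart(
    node: str,
    indexes: Dict[str, Optional[str]],
    num_attempts: int,
) -> str:
    """Model what a node ends up with after its local index is deleted.

    indexes maps node name -> description of its current index.
    Shared home backups are not modelled here.
    """
    indexes = dict(indexes)  # copy so the caller's mapping is untouched
    indexes[node] = None     # the administrator removed this node's index
    if num_attempts > 0:
        # Recovery enabled: take a copy from any peer that still has an index.
        for peer, index in indexes.items():
            if peer != node and index is not None:
                return f"copy of {peer}: {index}"
    # Recovery disabled (or no peer had an index): rebuild from scratch.
    return "fresh reindex"
```

With recovery enabled, the restarted node simply inherits a peer's untrustworthy index; only with num.attempts set to 0 does it rebuild from scratch.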
Relates to:
- CONFCLOUD-43391 Document technical details for the Confluence DC 5.9+ Index Recovery feature (Closed)
- CONFSERVER-40071 Improve Confluence Data Center Documentation (Gathering Interest)