Type: Suggestion
Resolution: Unresolved
Component/s: Data Center - Deployments, Data Center - Index, Data Center - Installer
Issue Summary
During startup, Jira restores the index from the latest snapshot in the snapshots folder without first checking that the snapshot is valid. If that snapshot is corrupt, Jira fails to start, causing downtime. By default, the JVM parameter "-Dcom.atlassian.jira.startup.allow.full.reindex" makes Jira trigger a full reindex when the snapshot is bad. A full reindex is not a viable option for a large Jira instance, however, because it can take 5 to 8 hours to complete, so this parameter is disabled (set to false) on large instances.
Error message
ERROR [c.a.jira.cluster.DefaultClusterManager] Current node: i-06af4d86580b8ccc0-10.132.1.11. Couldn't recover index even though it had been found in shared.
com.atlassian.jira.issue.index.IndexException: java.io.EOFException: unexpectd EOF when reading frame at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:182)
INFO [c.a.jira.cluster.DefaultClusterManager] Current node: Will not trigger full foreground reindex, because system property com.atlassian.jira.startup.allow.full.reindex is set to false.
ERROR [c.a.jira.cluster.DefaultClusterManager] Failed to prepare local index. Jira is in an unhealthy state.
Root cause
Jira has no logic to fall back to an older, valid snapshot when the latest snapshot is corrupt, so startup fails.
Steps to Reproduce
Place a corrupt index snapshot in the $JIRA_HOME/caches/indexesV2/snapshots/ folder and start Jira. Jira uses the bad snapshot without checking its validity and then either fails to start or runs a full reindex, which delays startup even further.
Expected Results
Jira discards the corrupt latest snapshot and falls back to the next most recent valid index snapshot during startup.
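The requested fallback can be sketched as follows. This is only an illustration of the selection logic, not Jira's implementation: the function name `newest_valid_snapshot` is hypothetical, snapshots are assumed to be plain files ordered by modification time, and the validity check `is_valid` is left to the caller because the snapshot archive format is not described in this ticket.

```python
from pathlib import Path
from typing import Callable, Optional


def newest_valid_snapshot(
    snapshot_dir: Path, is_valid: Callable[[Path], bool]
) -> Optional[Path]:
    """Return the most recently modified snapshot that passes is_valid, or None.

    Hypothetical helper: is_valid is supplied by the caller (for example,
    a function that fully decompresses the archive and reports success).
    """
    candidates = sorted(
        (p for p in snapshot_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    for path in candidates:
        if is_valid(path):
            return path  # first valid snapshot, scanning from newest to oldest
    return None  # no usable snapshot found
```

Taking the check as a parameter keeps the sketch independent of the archive format; a stricter check costs more startup time but catches truncated files such as the one behind the EOFException above.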
Actual Results
Jira uses the bad index snapshot without checking its validity and fails to start. Alternatively, Jira may trigger a full reindex, which takes a long time to complete and reduces Jira's availability.
Workaround
The corrupt index snapshot must be removed manually from the snapshots folder, after confirming that the next most recent snapshot is good so that Jira can restore the index from it. Jira may take somewhat longer to come up, depending on how old the good snapshot is.
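A minimal sketch of how this manual cleanup might be scripted is shown below. It moves bad snapshots aside instead of deleting them, so they remain available for analysis. The function name `quarantine_bad_snapshots`, the quarantine directory, and the caller-supplied `is_valid` check are all assumptions made for illustration; none of them is part of Jira. The script is assumed to run while the affected node is stopped.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def quarantine_bad_snapshots(
    snapshot_dir: Path, quarantine_dir: Path, is_valid: Callable[[Path], bool]
) -> List[Path]:
    """Move every snapshot newer than the newest valid one out of snapshot_dir.

    Returns the original paths of the moved files. Scanning stops at the
    first valid snapshot, so older files are left untouched. If no snapshot
    is valid, all of them end up in quarantine_dir.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    newest_first = sorted(
        (p for p in snapshot_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    moved: List[Path] = []
    for path in newest_first:
        if is_valid(path):
            break  # this snapshot is now the newest one left in the folder
        shutil.move(str(path), str(quarantine_dir / path.name))
        moved.append(path)
    return moved
```

After the call, the newest file remaining in the snapshots folder is one that passed the check, which is the state the workaround asks for.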