Add the ability to validate the index snapshot and fall back to an older valid snapshot if the latest one is corrupt, for Jira installations



      Issue Summary

      During start-up, Jira restores its index from the latest index snapshot in the snapshots folder without checking whether that snapshot is valid. If the snapshot is corrupt, Jira fails to start, causing downtime. By default, the JVM parameter "-Dcom.atlassian.jira.startup.allow.full.reindex" makes Jira trigger a full reindex when the snapshot is bad. However, a full reindex is not a viable option for large Jira instances, because it can take 5 to 8 hours to complete. For this reason, "-Dcom.atlassian.jira.startup.allow.full.reindex" is disabled (set to false) in large Jira instances.

      Error message 

      ERROR [c.a.jira.cluster.DefaultClusterManager] Current node: i-06af4d86580b8ccc0-10.132.1.11. Couldn't recover index even though it had been found in shared.
       com.atlassian.jira.issue.index.IndexException: java.io.EOFException: unexpectd EOF when reading frame at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:182) 
      INFO [c.a.jira.cluster.DefaultClusterManager] Current node: Will not trigger full foreground reindex, because system property com.atlassian.jira.startup.allow.full.reindex is set to false.
      ERROR      [c.a.jira.cluster.DefaultClusterManager] Failed to prepare local index. Jira is in an unhealthy state.
      

       

      Root cause

      Jira has no logic to fall back to an older valid snapshot when the latest snapshot is corrupt, so a single corrupt snapshot prevents Jira from starting.

      Steps to Reproduce

       Go to the $JIRA_HOME/caches/indexesV2/snapshots/ folder and add a corrupt index snapshot so that it is the latest snapshot in the folder. Start Jira. Jira uses the bad index snapshot without checking its validity and either fails to start or, if com.atlassian.jira.startup.allow.full.reindex is enabled, runs a full reindex, which makes Jira take even longer to come up.

      Expected Results

      Jira detects that the latest snapshot is corrupt, discards it, and restores the index from the most recent valid snapshot (for example, the second-latest one) during start-up.
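
      The expected behaviour can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Jira's implementation: it assumes, purely for illustration, that each snapshot is a gzip-compressed tar archive and that "newest" means the latest file modification time. Jira's real snapshot format and naming may differ, so the hypothetical helper `is_valid_tar_gz` would need to be replaced with a check that matches the actual format. The hypothetical `select_snapshot` walks the snapshots newest-first and returns the first one that passes validation.

```python
import gzip
import tarfile
import zlib
from pathlib import Path
from typing import Callable, Optional


def is_valid_tar_gz(path: Path) -> bool:
    """Hypothetical check, assuming a snapshot is a gzip-compressed tar archive."""
    try:
        # Decompress the whole stream; a truncated file or CRC mismatch raises here.
        with gzip.open(path, "rb") as gz:
            while gz.read(1 << 20):
                pass
        # Parse every tar header and read every regular file's contents.
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    stream = tar.extractfile(member)
                    while stream.read(1 << 20):
                        pass
        return True
    except (OSError, EOFError, zlib.error, tarfile.TarError):
        return False


def select_snapshot(
    snapshot_dir: Path,
    is_valid: Callable[[Path], bool] = is_valid_tar_gz,
) -> Optional[Path]:
    """Return the newest snapshot that passes validation, or None if none does."""
    # Assumption: "newest" means the most recent file modification time.
    candidates = sorted(
        (p for p in snapshot_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for snapshot in candidates:
        if is_valid(snapshot):
            return snapshot
        # A real implementation would log the corrupt snapshot and skip it.
    return None
```

      The validator is a parameter so that a format-specific check can be swapped in. Fully reading each candidate costs extra I/O at start-up; the design assumes this is far cheaper than a full reindex.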

      Actual Results

      Jira uses the bad index snapshot without checking its validity and fails to start. If full reindexing on start-up is allowed, Jira triggers a full reindex instead, which takes a very long time to complete and affects Jira's availability.

      Workaround

      Manually remove the corrupt index snapshot from the snapshots folder, making sure the next-latest snapshot is valid so that Jira can restore the index from it. Jira may take a little longer to start, depending on how old the valid snapshot is.
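
      The workaround above could be scripted along these lines. This is a hypothetical Python sketch, not an Atlassian tool: `quarantine_corrupt_latest` takes the validity check as a caller-supplied function (because the check depends on the real snapshot format), assumes "newest" means the latest file modification time, and is assumed to run while Jira is stopped on the node. It moves corrupt snapshots aside rather than deleting them, and only does so when an older valid snapshot exists.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def quarantine_corrupt_latest(
    snapshot_dir: Path,
    quarantine_dir: Path,
    is_valid: Callable[[Path], bool],
) -> Optional[Path]:
    """Move snapshots newer than the newest valid one into quarantine_dir.

    Returns the newest valid snapshot, or None (touching nothing) if no
    snapshot in snapshot_dir passes the caller-supplied check.
    """
    # Assumption: "newest" means the most recent file modification time.
    snapshots = sorted(
        (p for p in snapshot_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    # Index of the first (newest) snapshot that validates, if any.
    valid_index = next((i for i, p in enumerate(snapshots) if is_valid(p)), None)
    if valid_index is None:
        return None  # No fallback available, so leave the folder untouched.
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    for corrupt in snapshots[:valid_index]:
        # Move rather than delete so the corrupt file can still be inspected.
        shutil.move(str(corrupt), str(quarantine_dir / corrupt.name))
    return snapshots[valid_index]
```

      Refusing to move anything when no valid snapshot exists avoids leaving the node with an empty snapshots folder.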

            Assignee:
            Unassigned
            Reporter:
            Amrit Tulachan
            Votes:
            0
            Watchers:
            2
