Jira Data Center / JRASERVER-66860

Index recovery can result in a corrupt index despite completing successfully


Details

    Description

      This bug is related to JRASERVER-66859. We upgraded via AWS CloudFormation, so the nodes were torn down and created anew, and were then meant to recover their indexes from snapshot. However, the first node did not recover its index at all, and the second node recovered an index that was corrupt: for example, dashboards showed no data, and a particular search returned 23 issues instead of 11000.

      Timeline:

      • 07:30: Node 1 spun up, upgrade commences
      • 08:00: Node 2 spun up
      • 08:01: Node 2 begins to restore the index, and completes in 30s (a successful restore should take around 15 minutes)
      • 08:35: We kick off a manual restore from index snapshot on Node 1
      • 09:10: We realise Node 2's index is corrupt, despite the successful restore from snapshot, and we kick off another restore from snapshot
      • 09:24: Index recovery completes and the indexes are both working now

      NB: I confirmed that the snapshots were accessible by Jira, because we used them to recover the indexes manually.
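
As a rough illustration of the symptoms above, a post-restore sanity check could compare the restore duration and the number of searchable issues against what is expected for the instance. This is only a sketch under stated assumptions, not something Jira does: the function name `check_restore`, the `RestoreCheck` type, and both thresholds are hypothetical, and the inputs would have to be gathered by whatever monitoring is already in place.

```python
from dataclasses import dataclass


@dataclass
class RestoreCheck:
    suspicious: bool
    reasons: list


def check_restore(duration_s, expected_duration_s, indexed_count, expected_count,
                  min_duration_ratio=0.25, min_count_ratio=0.9):
    """Flag an index restore that reported success but looks implausible.

    Both ratio thresholds are hypothetical and would need tuning per instance.
    """
    reasons = []
    # A restore that finishes far too quickly probably did not copy the full index.
    if duration_s < expected_duration_s * min_duration_ratio:
        reasons.append("restore finished much faster than expected")
    # A healthy index should return roughly the known number of issues.
    if indexed_count < expected_count * min_count_ratio:
        reasons.append("search returns far fewer issues than expected")
    return RestoreCheck(suspicious=bool(reasons), reasons=reasons)


# Figures from the timeline: 30 s instead of about 15 minutes,
# and 23 issues returned instead of 11000.
result = check_restore(30, 15 * 60, 23, 11000)
```

With the figures from Node 2, both checks would trip, whereas a 14-minute restore that returns all 11000 issues would pass.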

            People

              Assignee: Unassigned
              Reporter: Denise Unterwurzacher [Atlassian] (Inactive)
              Votes: 2
              Watchers: 5
