During an index snapshot restore Jira holds an exclusive write lock on the index but allows traffic unexpectedly when clustering is configured

    • 9.12
    • 11
    • Severity 3 - Minor
    • 1
      Atlassian Update – 23 January 2026

      Dear Customers,

      We're happy to announce that this issue is fixed in the 10.3.17, 11.3.2, and 11.4.0 releases.

      Previously, some parts of the index snapshot restoration were performed under an index write lock because no valid Jira index was available, but issue operations were not appropriately blocked. As a result, those operations got stuck on that index lock, leading to timeouts and inconsistencies between the database and the indexes.

      We fixed the bug by introducing two additional mechanisms: blocking the node restoring the snapshot, and cluster-wide synchronization during index snapshot restoration.

      Now, when the node restoring the index enters the critical section during which the index on that node is unavailable, it will be blocked entirely, preventing users or any automations from making changes that could lead to inconsistencies. This is consistent with other situations where no index is available. The time the node is unavailable is minimized to add little to no disruption to end users.

      Cluster-wide synchronization ensures that only one node at a time can enter the critical section during snapshot restoration, preventing the situation where all nodes start the index snapshot restore at once and become unavailable. Such a case is possible after a full reindex, when all nodes start restoring the snapshot created by the node that finished the full reindex. Adding this synchronization was crucial to prevent the cluster from being reduced from N nodes to 1 during snapshot restoration. It can be turned off in "Advanced Settings" by changing the jira.index.synchronized.index.snapshot.restoration property to false. Doing so results in concurrent snapshot restoration, which lets the overall index propagation finish sooner, but at the cost of reduced Jira availability during the restoration.

      Best regards,

      Michał Błajet
      Senior Software Engineer
      Jira Data Center

      Issue Summary

      During the index restore process, Jira obtains an exclusive write lock, which means that no other read or update operations can be performed on the index. However, Jira still handles traffic during the restore process, which causes update operations to time out.

      Worse, because the entity version isn't updated, the update operations performed during the restore process won't be replicated to the other nodes, even after the index restore process completes.

      This is reproducible on Data Center: Yes

      Steps to Reproduce

      1. Create a 2-node Jira Data Center (JDC) cluster.
      2. On node1, connect a JVM debugger.
      3. Place a thread-specific breakpoint on something within the index recovery/catch-up process (such as com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.ReplaceIndexRunner#catchUp).
      4. On node2, initiate a full reindex and wait for it to complete.
      5. Wait for the breakpoint to hit due to the FULL_REINDEX_END cluster message.
      6. Observe that you can still access the UI, although it will be extremely laggy.

      Expected Results

      Access to Jira is blocked while the index snapshot is being restored (this occurs correctly when using 1 node without clustering enabled).

      Actual Results

      Jira is accessible
      The below exception is thrown in the atlassian-jira.log file:

      2024-09-27 02:32:24,925+0000 http-nio-8080-exec-2 url: /jira/secure/WorkflowUIDispatcher.jspa; user: admin ERROR admin 151x523x1 18ejmbe 172.29.254.120,172.50.0.3 /secure/WorkflowUIDispatcher.jspa [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
      com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
      
      2024-09-27 02:32:24,928+0000 http-nio-8080-exec-2 url: /jira/secure/WorkflowUIDispatcher.jspa; user: admin ERROR admin 151x523x1 18ejmbe 172.29.254.120,172.50.0.3 /secure/WorkflowUIDispatcher.jspa [c.a.j.issue.index.DefaultIndexManager] Could not reindex: com.atlassian.jira.issue.util.IssueObjectIssuesIterable (1 items): [SCRUM-1]
      com.atlassian.jira.issue.index.exception.CannotGetIndexLockException: Can not get index lock.
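
To gauge how often a node hit these errors, the two log messages above can be counted per log file. This is a minimal sketch, not an Atlassian-provided tool: the function name count_index_lock_errors is hypothetical, and the match strings are copied from the two log lines quoted in this report, so they are assumed to appear in the same form in your logs.

```shell
#!/bin/bash
# Hypothetical helper: for each log file given, count the two
# DefaultIndexManager errors quoted in this report.
count_index_lock_errors() {
  local file timeouts failures
  for file in "$@"; do
    # -F matches fixed strings, -c prints the number of matching lines.
    # grep exits non-zero on zero matches, so "|| true" keeps the loop going.
    timeouts=$(grep -cF '[c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out' "$file" || true)
    failures=$(grep -cF '[c.a.j.issue.index.DefaultIndexManager] Could not reindex' "$file" || true)
    printf '%s timeouts=%s reindex_failures=%s\n' "$file" "$timeouts" "$failures"
  done
}

# Only run when log files are passed on the command line.
if [ "$#" -gt 0 ]; then
  count_index_lock_errors "$@"
fi
```

Saved under a name of your choice, the script can be run from the Jira log directory with atlassian-jira.log* as its arguments. Matching on the logger prefix counts each event once, because the stack trace line that follows repeats the message without that prefix.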
      

      Workaround

      The following bash script identifies affected issues from the error logs and reindexes them through the REST API:

      #!/bin/bash
      # Run in Jira log directory or copy atlassian-jira.log files to the current working directory
      # Extract unique Jira issue keys
      issue_keys=$(grep 'Could not reindex: com.atlassian.jira.issue.util.IssueObjectIssuesIterable' atlassian-jira.log* | grep -o '\[[A-Z][A-Z0-9]*-[0-9]\+\]' | sed 's/\[\(.*\)\]/\1/' | sort | uniq )
      
      # Base URL without trailing / ( ex: https://jira.example.com )
      base_url="(baseurl)"
      
      # Personal access token
      token="(token)"
      
      # Loop through each unique issue key and make a curl request
      for issue_key in $issue_keys; do
        echo "Reindexing issue: $issue_key"
        curl -H "Authorization: Bearer $token" -X POST "${base_url}/rest/api/2/reindex/issue?issueId=${issue_key}"
        echo -e "\n"
      
      done
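
Before sending any requests, it can help to preview which issue keys the script would reindex. The sketch below reuses the same grep and sed pipeline in a function that only prints the keys. The name extract_issue_keys is hypothetical, and `[0-9][0-9]*` is used as a portable equivalent of the `[0-9]\+` in the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the unique issue keys from "Could not reindex"
# errors in the given log files, one key per line. No requests are sent.
extract_issue_keys() {
  # -h drops the file name prefix that grep adds when given several files.
  grep -h 'Could not reindex: com.atlassian.jira.issue.util.IssueObjectIssuesIterable' "$@" \
    | grep -o '\[[A-Z][A-Z0-9]*-[0-9][0-9]*\]' \
    | sed 's/\[\(.*\)\]/\1/' \
    | sort -u
}

# Only run when log files are passed on the command line.
if [ "$#" -gt 0 ]; then
  extract_issue_keys "$@"
fi
```

As in the original pipeline, the pattern only matches a single key enclosed in square brackets, such as [SCRUM-1]. If your logs contain lines that list several keys inside one pair of brackets, those lines would not match and the keys would need to be collected separately.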
      

              Assignee:
              Michał Błajet
              Reporter:
              Jeff Curry
              Votes:
              2
              Watchers:
              9

                Created:
                Updated:
                Resolved: