Jira Data Center / JRASERVER-63189

JIRA is not able to obtain DefaultIndexManager write lock due to plugin holding it


Details

    Description

      Summary

      JIRA needs to obtain the DefaultIndexManager write lock before performing major operations on the Lucene index (full locked reindex, applying an index snapshot, applying an index from another node in JDC).
      If a plugin holds the lock and does not release it, JIRA will never be able to obtain it.
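
      The failure mode can be illustrated with a minimal, self-contained sketch. It is not JIRA's DefaultIndexManager code: the class IndexLockDemo and its methods are hypothetical, and it assumes the plugin holds the read side of a standard java.util.concurrent ReentrantReadWriteLock while a reindex-style operation tries to take the write side with a timeout (30000 ms in the log in this report, shortened here).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the index lock; NOT JIRA's actual implementation.
class IndexLockDemo {
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

    /** Simulates a plugin thread that takes the read lock and keeps it until told to release. */
    Thread startPlugin(CountDownLatch acquired, CountDownLatch release) {
        Thread plugin = new Thread(() -> {
            indexLock.readLock().lock();
            try {
                acquired.countDown();   // signal that the lock is now held
                release.await();        // hold the lock until released by the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                indexLock.readLock().unlock();
            }
        }, "plugin-thread");
        plugin.setDaemon(true);
        plugin.start();
        return plugin;
    }

    /** Tries to take the write lock within the timeout; returns false on timeout. */
    boolean tryReindex(long timeoutMillis) throws InterruptedException {
        if (!indexLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;               // analogous to "Wait attempt timed out"
        }
        try {
            return true;                // a real reindex would run here
        } finally {
            indexLock.writeLock().unlock();
        }
    }

    /** Returns {result while the plugin holds the lock, result after it releases the lock}. */
    static boolean[] run(long timeoutMillis) {
        try {
            IndexLockDemo demo = new IndexLockDemo();
            CountDownLatch acquired = new CountDownLatch(1);
            CountDownLatch release = new CountDownLatch(1);
            Thread plugin = demo.startPlugin(acquired, release);
            acquired.await();
            boolean whileHeld = demo.tryReindex(timeoutMillis);
            release.countDown();
            plugin.join();
            boolean afterRelease = demo.tryReindex(timeoutMillis);
            return new boolean[] {whileHeld, afterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = run(200);
        System.out.println("acquired while plugin holds lock: " + results[0]);
        System.out.println("acquired after plugin released:   " + results[1]);
    }
}
```

      In this sketch the timed attempt fails for as long as the simulated plugin keeps its lock, and succeeds once the plugin releases it. A plugin that never releases the lock would make every attempt time out.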

      Steps to Reproduce

      1. Install JIRA DC with 3+ nodes (the problem is not specific to JDC).
      2. Install a plugin that uses the DefaultIndexManager lock, e.g. Structure.

      Expected Results

      JIRA is able to run Index operations.

      Actual Results

      JIRA is not able to run Index operations.
      The below exception is thrown in the jira.log file:

      2016-10-18 15:38:51,732 ClusterMessageHandlerServiceThread:thread-1 ERROR      [jira.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
      com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
      	at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:936)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.access$900(DefaultIndexManager.java:97)
      	at com.atlassian.jira.issue.index.DefaultIndexManager$IndexLock.tryLock(DefaultIndexManager.java:1298)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:375)
      ...
      	at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:124)
      	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.restoreIndex(DefaultIndexCopyService.java:175)
      	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.receive(DefaultIndexCopyService.java:195)
      	at com.atlassian.jira.cluster.OfBizMessageHandlerService.sendLocalFromNode(OfBizMessageHandlerService.java:260)
      	at com.atlassian.jira.cluster.OfBizMessageHandlerService.handleReceivedMessages(OfBizMessageHandlerService.java:153)
      	at com.atlassian.jira.cluster.OfBizMessageHandlerService.access$000(OfBizMessageHandlerService.java:34)
      	at com.atlassian.jira.cluster.OfBizMessageHandlerService$1.run(OfBizMessageHandlerService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      2016-10-18 15:38:52,495 ClusterMessageHandlerServiceThread:thread-1 ERROR      [atlassian.jira.cluster.OfBizMessageHandlerService] There was a problem handling a cluster message
      java.lang.RuntimeException: com.atlassian.jira.issue.index.IndexException: Failed to acquire reindex lock
      	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.restoreIndex(DefaultIndexCopyService.java:179)
      	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.receive(DefaultIndexCopyService.java:195)
      ...
      Caused by: com.atlassian.jira.issue.index.IndexException: Failed to acquire reindex lock
      	at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:126)
      	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.restoreIndex(DefaultIndexCopyService.java:175)
      	... 12 more
      

      Notes

      This specific error was caused by a bug in Structure 3.3.1 and was fixed in 3.3.3 (see the Structure 3.3.3 Release Notes). All JIRA DC customers should update to the latest Structure version.
      See the related suggestion ticket to improve logging: JRA-63188
      Almworks refactored the code in Structure 3.4 and switched to optimistic locking (no lock is held). That should prevent other lock-related problems in the future.
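
      For readers unfamiliar with the term, the general idea of optimistic locking can be sketched as follows. This is a generic, simplified illustration, not Structure's implementation: the class OptimisticReadDemo, its version counter and its method names are all hypothetical. The reader holds no lock; it notes a version number, does its work, and retries if the version changed in the meantime.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical illustration of optimistic locking; NOT Structure's actual code.
class OptimisticReadDemo {
    // Incremented whenever the (simulated) index changes.
    private final AtomicLong version = new AtomicLong();

    /** Called by a writer after it has changed the index. */
    void recordIndexChange() {
        version.incrementAndGet();
    }

    /**
     * Runs the computation without holding any lock and accepts the result only
     * if no index change was recorded while it ran; otherwise retries.
     */
    <T> T readConsistently(Supplier<T> computation, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long before = version.get();
            T result = computation.get();
            if (version.get() == before) {
                return result;          // nothing changed underneath us
            }
            // The index changed during the computation: discard and retry.
        }
        throw new IllegalStateException(
                "index kept changing; gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        OptimisticReadDemo demo = new OptimisticReadDemo();
        int[] calls = {0};
        String result = demo.readConsistently(() -> {
            // Simulate a concurrent index change during the first attempt only.
            if (calls[0]++ == 0) {
                demo.recordIndexChange();
            }
            return "snapshot";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

      Because the reader never holds a lock, it cannot block an operation that needs exclusive access. The trade-off is that its work may have to be repeated when the index changes underneath it.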

      Workaround

      • Disable plugins one by one, restarting JIRA after each change, to identify the plugin that is holding the lock.

      Note on Won't Fix

      Update: 2017-04-20
      The dev team reviewed the bug. At this point, implementing a proper fix (introducing a timeout for plugin operations) would be time-consuming and risky, as interrupting a plugin could damage the JIRA index.
      The following changes related to this problem were made:

      • Almworks changed their code in Structure and switched to optimistic locking
      • Additional logging was implemented and enabled by default in recent versions of JIRA (see JRASERVER-63188)

      With all that being said, we have decided to stop working on this and mark it as Won't Fix. We will reopen it if the problem reoccurs and gets more traction.

            People

              Unassigned Unassigned
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]