Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-63834

Deadlock in DelegatingCache

    XMLWordPrintable

Details

    Description

      {info}This is not a support request. This is a bug report found by ALM Works developers on a test instance of JIRA, with technical findings. Please assign this ticket to a developer.{info}

      On our test JIRA instance (JIRA 7.1.2), we've encountered a case where any administration page in JIRA would fail to render. All non-administrative pages render normally, however. The problem is that the request serving threads participate in a deadlock.

      Example stack trace:

      "http-nio-8080-exec-24" #111 daemon prio=5 os_prio=0 tid=0x00007efd3186b000 nid=0x132 waiting on condition [0x00007efd1802f000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000000dc306748> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at com.atlassian.beehive.simple.SingleNodeLock.lock(SingleNodeLock.java:60)
      	at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:170)
      	at jsp.decorators.admin_jsp._jspService(admin_jsp.java:315)
      	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
      ...

      Any thread that serves a request to an administrative page becomes locked on the same lock without progress.

      This lock is taken by this thread, which is also locked:

      "http-nio-8080-exec-14" #99 daemon prio=5 os_prio=0 tid=0x00007efd30ed2800 nid=0x126 waiting on condition [0x00007efd19057000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000000d2700198> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2218)
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
      	at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
      	at com.atlassian.cache.memory.DelegatingCache.get(DelegatingCache.java:173)
      	at com.atlassian.cache.memory.DelegatingCache.get(DelegatingCache.java:158)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:111)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.getType(CachingOfBizPropertyEntryStore.java:141)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertySet.exists(CachingOfBizPropertySet.java:139)
      	at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:175)
      	at jsp.decorators.admin_jsp._jspService(admin_jsp.java:315)
      	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
      ...
      

      It is not apparent which thread has locked the Guava cache segment. However, there's also another thread locked on the same segment:

      "Caesium-1-3" #148 daemon prio=5 os_prio=0 tid=0x00007efd39918000 nid=0x156 waiting on condition [0x00007efd2aa16000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000000d2700198> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:3067)
      	at com.google.common.cache.LocalCache.remove(LocalCache.java:4173)
      	at com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4765)
      	at com.atlassian.cache.memory.DelegatingCache.remove(DelegatingCache.java:237)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.invalidateCacheEntry(CachingOfBizPropertyEntryStore.java:239)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.removeEntry(CachingOfBizPropertyEntryStore.java:160)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertySet.remove(CachingOfBizPropertySet.java:165)
      	at com.atlassian.sal.jira.pluginsettings.JiraPluginSettings.putActual(JiraPluginSettings.java:24)
      	at com.atlassian.sal.core.pluginsettings.AbstractStringPluginSettings.put(AbstractStringPluginSettings.java:97)
      	at com.atlassian.upm.core.impl.NamespacedPluginSettings.put(NamespacedPluginSettings.java:34)
      	at com.atlassian.upm.notification.PluginSettingsNotificationCache.saveNotifications(PluginSettingsNotificationCache.java:325)
      ...
      

      Note that this thread holds the OneShotLatch lock, acquired in DelegatingCache.acquireLockFor(). And there's another thread that is waiting to acquire this lock:

      "Caesium-1-2" #147 daemon prio=5 os_prio=0 tid=0x00007efd38a64000 nid=0x155 waiting on condition [0x00007efd2a59d000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000000e7747500> (a com.atlassian.cache.impl.OneShotLatch$Sync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      	at com.atlassian.cache.impl.OneShotLatch.await(OneShotLatch.java:31)
      	at com.atlassian.cache.memory.DelegatingCache.acquireLockFor(DelegatingCache.java:455)
      	at com.atlassian.cache.memory.DelegatingCache.remove(DelegatingCache.java:234)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.invalidateCacheEntry(CachingOfBizPropertyEntryStore.java:239)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.removeEntry(CachingOfBizPropertyEntryStore.java:160)
      	at com.atlassian.jira.propertyset.CachingOfBizPropertySet.remove(CachingOfBizPropertySet.java:165)
      	at com.atlassian.sal.jira.pluginsettings.JiraPluginSettings.putActual(JiraPluginSettings.java:24)
      	at com.atlassian.sal.core.pluginsettings.AbstractStringPluginSettings.put(AbstractStringPluginSettings.java:85)
      	at com.atlassian.analytics.client.upload.S3EventUploader.storeLastUploadDate(S3EventUploader.java:91)
      	at com.atlassian.analytics.client.upload.S3EventUploader.execute(S3EventUploader.java:84)
      	at com.atlassian.scheduler.compat.local.LocalOnlyPluginJob.execute(LocalOnlyPluginJob.java:30)
      ...
      

      I don't have source code for JIRA 7.1.2 at the moment, but by looking into sources of atlassian-cache-memory-2.11.1 (supplied with JIRA 7.1.9), I've noticed indeed a deadlock situation where locks are acquired in different order:

      DelegatingCache.remove():

      • acquireLockFor(key)
      • internalCache.invalidate() — acquires segment lock internally
        DelegatingCache.get():
      • internalCache.get() — acquires segment lock internally
      • acquireLockFor(key)

      It might be that this problem is related. Needs more thorough investigation.

      Attachments

        1. threaddumps.txt
          343 kB
          Keri Duthie

        Activity

          People

            Unassigned Unassigned
            bbf762edcc79 Igor Sereda [ALM Works]
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: