Jira Data Center / JRASERVER-71695

Poor Performance due to ProjectRoleActorsZduSafeCache

Details

    Description

      Issue Summary

      Jira nodes experience poor performance due to contention with the ProjectRoleActorsZduSafeCache.

      Steps to Reproduce

      N/A

      Expected Results

      Jira experiences no cache contention on ProjectRoleActorsZduSafeCache.

      Actual Results

      Thread dumps show a large number of threads waiting on ProjectRoleActorsZduSafeCache:

      for var in *thread*; do
        printf "%s\n" "$var"
        awk -v RS= -v ORS='\n\n' '/com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get/{print}' "$var" | awk 'NR==1;/^$/{getline;print}' | wc -l
      done
      jira_threads.txt
           108
      jira_threads.txt
           102
      jira_threads.txt
            86
      jira_threads.txt
            85
      jira_threads.txt
            90
      jira_threads.txt
            91
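
A minimal sketch of the same count done in a single awk pass, assuming jstack-style dumps in which thread stanzas are separated by blank lines. The function name count_zdu_waiters is hypothetical and not part of any Atlassian tooling. It counts matching stanzas directly, so its totals may differ slightly from the figures above.

```shell
# count_zdu_waiters FILE
# Prints the number of thread stanzas in FILE whose stack contains a
# ProjectRoleActorsZduSafeCache.get frame. RS= puts awk in paragraph mode,
# so each blank-line-separated stanza is one record.
count_zdu_waiters() {
  awk -v RS= '
    index($0, "com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get") { n++ }
    END { print n + 0 }
  ' "$1"
}

# Report one "file<TAB>count" line per thread dump in the current directory.
for dump in *thread*; do
  [ -e "$dump" ] || continue   # skip the literal pattern when nothing matches
  printf '%s\t%s\n' "$dump" "$(count_zdu_waiters "$dump")"
done
```

A count that stays high across several consecutive dumps points to sustained contention rather than a momentary spike.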
      

      Sample thread:

      https-jsse-nio-8443-exec-1 
      jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
      java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.8/AbstractQueuedSynchronizer.java:885)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.8/AbstractQueuedSynchronizer.java:917)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.8/AbstractQueuedSynchronizer.java:1240)
      java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.8/ReentrantLock.java:267)
      com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.isClosed(TapeLocalQCacheOpQueue.java:111)
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.isClosed(LocalQCacheOpQueueWithStats.java:89)
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToAllQueues(LocalQCacheManager.java:357)
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateToQueue(LocalQCacheReplicator.java:85)
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateRemovalNotification(LocalQCacheReplicator.java:71)
      com.atlassian.jira.cluster.cache.ehcache.AbstractJiraCacheReplicator.notifyElementRemoved(AbstractJiraCacheReplicator.java:78)
      net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:156)
      net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:136)
      net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2449)
      net.sf.ehcache.Cache.removeInternal(Cache.java:2423)
      net.sf.ehcache.Cache.remove(Cache.java:2325)
      net.sf.ehcache.Cache.remove(Cache.java:2243)
      net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
      com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.remove(SynchronizedLoadingCacheDecorator.java:62)
      net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
      com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:195)
      com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:146)
      com.atlassian.jira.cache.DeferredReplicationCache.lambda$remove$2(DeferredReplicationCache.java:74)
      com.atlassian.jira.cache.DeferredReplicationCache$$Lambda$1712/0x0000000802e2b840.get(Unknown Source)
      com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
      com.atlassian.jira.cache.DeferredReplicationCache.remove(DeferredReplicationCache.java:73)
      com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache$1$$Lambda$14865/0x0000000801cd3840.accept(Unknown Source)
      

       

      Workaround

      Upgrade to Jira 8.12.2 or later, which contains the fix.

      Background

      We had a call with the author of the changes, who gave us some background on this.
      The cache of role actors for a given project had to be rewritten because too much data was being deleted on cache eviction. That rewrite resolved https://jira.atlassian.com/browse/JRASERVER-69446. The ProjectRoleActorsZduSafeCache class was introduced so that the rewritten cache keeps working while Jira runs in mixed mode during a zero-downtime upgrade (ZDU). Unfortunately, it caused problems with cache replication between the old and new cache and, in some cases, unnecessary cache invalidations.

      Proposed way of testing if the Jira upgrade will solve the problem

      The problem should appear only when there are many project role actor modifications. The client could set up Jira 8.11.1 and trigger a large number of such operations (e.g. through automation), then upgrade (e.g. to 8.13), repeat the test, and see whether the problem persists.
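
One way to generate such modifications is sketched below, assuming the Jira Server REST endpoints for adding and removing project role actors (POST and DELETE on /rest/api/2/project/{projectIdOrKey}/role/{id}). The base URL, credentials, project key, role ID, and user name are placeholders, and the helper names run and churn_role_actors are hypothetical. By default the script only prints the requests it would send.

```shell
# All values below are placeholder assumptions, not taken from the ticket.
JIRA_URL="${JIRA_URL:-http://localhost:8080}"
PROJECT_KEY="${PROJECT_KEY:-TEST}"
ROLE_ID="${ROLE_ID:-10002}"
ACTOR="${ACTOR:-loadtest-user}"
AUTH="${AUTH:-admin:admin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each request instead of sending it

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# churn_role_actors ROUNDS
# Each round adds ACTOR to the project role and removes it again, which
# modifies the project role actors twice.
churn_role_actors() {
  rounds=$1
  url="$JIRA_URL/rest/api/2/project/$PROJECT_KEY/role/$ROLE_ID"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    run curl -s -o /dev/null -u "$AUTH" -X POST \
      -H 'Content-Type: application/json' \
      -d "{\"user\":[\"$ACTOR\"]}" "$url"
    run curl -s -o /dev/null -u "$AUTH" -X DELETE "$url?user=$ACTOR"
    i=$((i + 1))
  done
}
```

Against a disposable test instance, something like `DRY_RUN=0 churn_role_actors 500` could be run while thread dumps are captured, once on 8.11.1 and once after the upgrade, so the two sets of dumps can be compared.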

            People

              Assignee: Unassigned
              Reporter: Chris Levine