Type: Bug
Resolution: Fixed
Priority: Low
Affects Version/s: 8.11.0, 8.11.1
8.11
1
Severity 2 - Major
-
Issue Summary
Jira nodes experience poor performance due to contention on ProjectRoleActorsZduSafeCache.
Steps to Reproduce
N/A
Expected Results
Jira should not experience contention on ProjectRoleActorsZduSafeCache.
Actual Results
Thread dumps show a high number of threads waiting on ProjectRoleActorsZduSafeCache. The following command prints each thread dump file name followed by a count of the threads whose stack contains ProjectRoleActorsZduSafeCache.get:
for var in *thread*; do printf "%s\n" "$var"; awk -v RS= -v ORS='\n\n' '/com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get/{print}' "$var" | awk 'NR==1;/^$/{getline;print}' | wc -l; done
Results from six thread dumps (file name, thread count):
jira_threads.txt 108
jira_threads.txt 102
jira_threads.txt 86
jira_threads.txt 85
jira_threads.txt 90
jira_threads.txt 91
Sample thread:
https-jsse-nio-8443-exec-1
jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.8/AbstractQueuedSynchronizer.java:885)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.8/AbstractQueuedSynchronizer.java:917)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.8/AbstractQueuedSynchronizer.java:1240)
java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.8/ReentrantLock.java:267)
com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.isClosed(TapeLocalQCacheOpQueue.java:111)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.isClosed(LocalQCacheOpQueueWithStats.java:89)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToAllQueues(LocalQCacheManager.java:357)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateToQueue(LocalQCacheReplicator.java:85)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateRemovalNotification(LocalQCacheReplicator.java:71)
com.atlassian.jira.cluster.cache.ehcache.AbstractJiraCacheReplicator.notifyElementRemoved(AbstractJiraCacheReplicator.java:78)
net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:156)
net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:136)
net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2449)
net.sf.ehcache.Cache.removeInternal(Cache.java:2423)
net.sf.ehcache.Cache.remove(Cache.java:2325)
net.sf.ehcache.Cache.remove(Cache.java:2243)
net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.remove(SynchronizedLoadingCacheDecorator.java:62)
net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:195)
com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:146)
com.atlassian.jira.cache.DeferredReplicationCache.lambda$remove$2(DeferredReplicationCache.java:74)
com.atlassian.jira.cache.DeferredReplicationCache$$Lambda$1712/0x0000000802e2b840.get(Unknown Source)
com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
com.atlassian.jira.cache.DeferredReplicationCache.remove(DeferredReplicationCache.java:73)
com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache$1$$Lambda$14865/0x0000000801cd3840.accept(Unknown Source)
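To track this contention across several dumps, a slightly different count may help. The sketch below assumes jstack-style dumps in which each thread's stack is a blank-line-separated block. It matches any frame of ProjectRoleActorsZduSafeCache (broader than the .get pattern used in the command above, since the sample stack ends in a lambda of that class) and also counts how many of those threads are parked in TapeLocalQCacheOpQueue.isClosed, the lock seen in the sample thread. The function name count_zdu_threads is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each jstack-style thread dump passed as an argument,
# count the threads whose stack mentions ProjectRoleActorsZduSafeCache, and how
# many of those are parked in TapeLocalQCacheOpQueue.isClosed.
# Assumption: each thread's stack is one block of lines separated by blank lines.
count_zdu_threads() {
  local f
  for f in "$@"; do
    # RS= puts awk in paragraph mode: every blank-line-separated block is one
    # record, so each record is one thread and is counted at most once.
    awk -v RS= -v file="$f" '
      /ProjectRoleActorsZduSafeCache/ {
        zdu++
        if (/TapeLocalQCacheOpQueue\.isClosed/) blocked++
      }
      END { printf "%s zdu=%d blocked_in_isClosed=%d\n", file, zdu + 0, blocked + 0 }
    ' "$f"
  done
}

# Only run when files are given, e.g. ./count_zdu_threads.sh jira_threads*.txt
if [ "$#" -gt 0 ]; then
  count_zdu_threads "$@"
fi
```

Because each matching block is counted once, no second awk stage or wc -l is needed, and a rising blocked_in_isClosed value across consecutive dumps points at the replication queue lock rather than at the cache lookup itself.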
Workaround
Upgrade to Jira 8.12.2 or later, which contains the fix.
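When checking several nodes, it may help to test whether a given version string is at or above 8.12.2. A minimal sketch, assuming GNU coreutils sort, whose -V option compares version strings component by component. The function name has_fix is hypothetical, and the script does not read the version from Jira itself; the version string has to be supplied.

```shell
#!/usr/bin/env bash
# Hypothetical check: is the given Jira version at or above the release this
# report names as containing the fix? Assumes GNU coreutils sort (-V).
FIXED_IN="8.12.2"

has_fix() {
  local version="$1"
  # Version-sort both strings; if FIXED_IN sorts first (or they are equal),
  # then version >= FIXED_IN. Note 8.12.10 sorts after 8.12.2 with -V.
  [ "$(printf '%s\n' "$FIXED_IN" "$version" | sort -V | head -n 1)" = "$FIXED_IN" ]
}

# Only run when a version is given, e.g. ./has_fix.sh 8.11.1
if [ "$#" -gt 0 ]; then
  if has_fix "$1"; then
    echo "$1: at or above $FIXED_IN"
  else
    echo "$1: older than $FIXED_IN, upgrade recommended"
  fi
fi
```

A plain string comparison would get cases such as 8.12.10 versus 8.12.2 wrong, which is why the sketch relies on version sorting.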
Background
We spoke with the author of the change, who provided the following background.
The cache holding the role actors for a given project had to be optimized because too much data was being deleted on cache eviction, so it was rewritten. This fixed https://jira.atlassian.com/browse/JRASERVER-69446. The ProjectRoleActorsZduSafeCache class was introduced so that the rewritten cache would work while Jira runs in mixed mode during a zero-downtime upgrade (ZDU). Unfortunately, it caused problems with cache replication between the old and new caches and, in some cases, unnecessary cache invalidations.
Proposed way of testing whether the Jira upgrade solves the problem
The problem should appear only when project role actors are modified frequently. To verify the fix, the customer could set up Jira 8.11.1, trigger a large number of such modifications (for example, with automation), upgrade (for example, to 8.13), repeat the test, and check whether the problem persists.
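One way to generate many project role actor modifications is to repeatedly add and remove a user from a project role through the REST API. The sketch below assumes the Jira Server REST API v2 project role endpoints (POST /rest/api/2/project/{projectIdOrKey}/role/{id} with a {"user": [...]} body, and DELETE on the same path with a user query parameter); verify them against the REST documentation for your Jira version. JIRA_URL, PROJECT_KEY, ROLE_ID, TEST_USER and JIRA_AUTH are hypothetical placeholders, and DRY_RUN=1 prints the curl commands instead of sending them.

```shell
#!/usr/bin/env bash
# Hypothetical load generator: add and remove one user from a project role in a
# loop to produce many project role actor modifications.
# All settings below are placeholders; override them via environment variables.
JIRA_URL="${JIRA_URL:-http://localhost:8080}"
PROJECT_KEY="${PROJECT_KEY:-TEST}"
ROLE_ID="${ROLE_ID:-10002}"
TEST_USER="${TEST_USER:-loadtest}"   # not URL-encoded; use a simple username
JIRA_AUTH="${JIRA_AUTH:-admin:changeme}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

churn_role_actors() {
  local iterations="$1" i
  local endpoint="$JIRA_URL/rest/api/2/project/$PROJECT_KEY/role/$ROLE_ID"
  for ((i = 1; i <= iterations; i++)); do
    # Add the user as an actor of the role.
    run curl -sS -u "$JIRA_AUTH" -X POST -H "Content-Type: application/json" \
      -d "{\"user\":[\"$TEST_USER\"]}" "$endpoint"
    # Remove the same user again.
    run curl -sS -u "$JIRA_AUTH" -X DELETE "$endpoint?user=$TEST_USER"
  done
}

# Only run when an iteration count is given, e.g. DRY_RUN=1 ./churn.sh 5
if [ "$#" -gt 0 ]; then
  churn_role_actors "$1"
fi
```

Running the same loop against 8.11.1 and against the upgraded instance while taking thread dumps keeps the workload comparable between the two runs.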