
Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 8.12.2, 8.13.0
Affects Version/s: 8.11.0, 8.11.1
Component/s: Data Center - Other
Labels:

Introduced in Version: 8.11
Support reference count: 1
Symptom Severity: Severity 2 - Major
Bug Fix Policy: View Atlassian Server bug fix policy

Issue Summary

Jira nodes experience poor performance due to contention with the ProjectRoleActorsZduSafeCache.

Steps to Reproduce

N/A

Expected Results

Jira experiences no contention on ProjectRoleActorsZduSafeCache.

Actual Results

Thread dumps show a high number of threads waiting on ProjectRoleActorsZduSafeCache:

for var in *thread*; do printf "%s\n" "$var"; awk -v RS= -v ORS='\n\n' '/com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get/{print}' "$var" | awk 'NR==1;/^$/{getline;print}' | wc -l; done
jira_threads.txt
     108
jira_threads.txt
     102
jira_threads.txt
      86
jira_threads.txt
      85
jira_threads.txt
      90
jira_threads.txt
      91
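The same per-dump count can be sketched in a more readable form. This is only an illustration, not the command used to produce the numbers above: it assumes thread stanzas are separated by blank lines, and the function name count_threads_with_frame is hypothetical. It counts matching stanzas directly in awk, so its totals may differ slightly from the pipeline above.

```shell
#!/bin/sh
# Count thread stanzas in a thread dump that contain a given frame.
# Assumption: stanzas are separated by blank lines (awk paragraph mode).
count_threads_with_frame() {
    # $1: thread dump file, $2: literal frame text to look for
    awk -v RS= -v frame="$2" 'index($0, frame) { n++ } END { print n + 0 }' "$1"
}

# Same frame the one-liner searches for, matched here as a literal string.
FRAME='com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get'

for dump in *thread*; do
    [ -f "$dump" ] || continue   # skip when the glob matches nothing
    printf '%s %s\n' "$dump" "$(count_threads_with_frame "$dump" "$FRAME")"
done
```

Run it from the directory that holds the thread dumps. A count that stays high across consecutive dumps suggests sustained contention rather than a momentary spike.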

Sample thread:

https-jsse-nio-8443-exec-1 
jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.8/AbstractQueuedSynchronizer.java:885)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.8/AbstractQueuedSynchronizer.java:917)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.8/AbstractQueuedSynchronizer.java:1240)
java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.8/ReentrantLock.java:267)
com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.isClosed(TapeLocalQCacheOpQueue.java:111)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.isClosed(LocalQCacheOpQueueWithStats.java:89)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToAllQueues(LocalQCacheManager.java:357)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateToQueue(LocalQCacheReplicator.java:85)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateRemovalNotification(LocalQCacheReplicator.java:71)
com.atlassian.jira.cluster.cache.ehcache.AbstractJiraCacheReplicator.notifyElementRemoved(AbstractJiraCacheReplicator.java:78)
net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:156)
net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:136)
net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2449)
net.sf.ehcache.Cache.removeInternal(Cache.java:2423)
net.sf.ehcache.Cache.remove(Cache.java:2325)
net.sf.ehcache.Cache.remove(Cache.java:2243)
net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.remove(SynchronizedLoadingCacheDecorator.java:62)
net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:155)
com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:195)
com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:146)
com.atlassian.jira.cache.DeferredReplicationCache.lambda$remove$2(DeferredReplicationCache.java:74)
com.atlassian.jira.cache.DeferredReplicationCache$$Lambda$1712/0x0000000802e2b840.get(Unknown Source)
com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
com.atlassian.jira.cache.DeferredReplicationCache.remove(DeferredReplicationCache.java:73)
com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache$1$$Lambda$14865/0x0000000801cd3840.accept(Unknown Source)

Workaround

Upgrade to Jira 8.12.2 or later, which contains the fix.

Background

We had a call with the author of the changes, who gave us some background on this.
The cache of role actors for a given project had to be rewritten because too much data was being deleted on cache eviction. The rewrite solved https://jira.atlassian.com/browse/JRASERVER-69446. The ProjectRoleActorsZduSafeCache class was introduced to keep the cache working while Jira runs in mixed mode during a zero-downtime upgrade (ZDU). Unfortunately, it caused problems with cache replication between the old and new caches and, in some cases, unnecessary cache invalidations.

Proposed way of testing if the Jira upgrade will solve the problem

The problem should appear only when there are many project role actor modifications. The client could set up Jira 8.11.1, trigger a large number of such operations (e.g. through automation), upgrade (e.g. to 8.13), repeat the test, and check whether the problem persists.
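One way to automate such modifications is sketched below. It is an assumption-laden illustration, not a tested reproduction: the base URL, project key, role ID, user name, and credentials are hypothetical placeholders, and the endpoints are believed to be the Jira Server REST API calls for adding and removing project role actors, which should be checked against the REST documentation for the version under test. By default the script only prints the curl commands (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical settings; replace them with values for a disposable test instance.
JIRA_URL="${JIRA_URL:-http://localhost:8080}"
PROJECT_KEY="${PROJECT_KEY:-TEST}"
ROLE_ID="${ROLE_ID:-10002}"
TEST_USER="${TEST_USER:-loadtest-user}"
AUTH="${AUTH:-admin:admin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 prints the commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# URL of the project role resource (assumed endpoint shape).
role_url() {
    printf '%s/rest/api/2/project/%s/role/%s' "$JIRA_URL" "$PROJECT_KEY" "$ROLE_ID"
}

# Add the test user as an actor of the role.
add_actor() {
    run curl -s -o /dev/null -u "$AUTH" -X POST \
        -H 'Content-Type: application/json' \
        -d "{\"user\": [\"$TEST_USER\"]}" "$(role_url)"
}

# Remove the test user from the role again.
remove_actor() {
    run curl -s -o /dev/null -u "$AUTH" -X DELETE "$(role_url)?user=$TEST_USER"
}

# Repeat add/remove so that every cycle modifies the role actors twice.
churn_role_actors() {
    # $1: number of add/remove cycles
    i=0
    while [ "$i" -lt "$1" ]; do
        add_actor
        remove_actor
        i=$((i + 1))
    done
}

# Only start when CYCLES is set, e.g. CYCLES=500 DRY_RUN=0 sh churn.sh
if [ -n "${CYCLES:-}" ]; then
    churn_role_actors "$CYCLES"
fi
```

Taking thread dumps while the script runs against 8.11.1, and again after the upgrade, gives a like-for-like comparison of how many threads wait on ProjectRoleActorsZduSafeCache.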

is related to

JRASERVER-69446 Removing actor from project role can make Jira unresponsive (Closed)

SSE-783

YET-144

is caused by: YET-1

is resolved by: YET-138
