- Bug
- Resolution: Fixed
- High
- 6.4.14, 7.6.15, 7.13.12, 8.5.3
- 6.04
- 42
- Severity 2 - Major
- 130
Summary
Under high load, requests that use the CachingOfBizPropertyEntryStore cache can suffer performance degradation while the cache loads data and populates values. The cause is lock contention in the cache internals. CachingOfBizPropertyEntryStore is used as a cache for the propertyentry table. If database performance is poor, loading values into the cache takes longer, which increases the contention.
Environment
- Datacenter
Steps to Reproduce
- Apply high load to put the CachingOfBizPropertyEntryStore cache under pressure.
- Start updating different elements of the cache, for example:
  - Update user locale
  - Last login time (see JRASERVER-70468)
  - Dismissed banners
Expected Results
READ and UPDATE operations on different parts of the cache (different keys) can run concurrently without blocking each other.
Actual Results
READ and UPDATE operations on different parts of the cache (different keys) are both in the WAITING state, waiting for the same lock (in the same stripe). As a result, updates block reads of other keys.
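The blocking described above can be reproduced in isolation with a small lock-striping sketch. This is an illustration only, not Ehcache's or Jira's actual implementation: the class name `StripedLockSketch`, the hash spreading, and the stripe count are all assumptions made for the example. It shows that a reader of one key cannot take its read lock while another thread holds the write lock for a different key that happens to map to the same stripe.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative lock striping (hypothetical class, not the Ehcache code). */
class StripedLockSketch {
    private final ReentrantReadWriteLock[] stripes;

    /** stripeCount must be a power of two so the index can be taken with a bit mask. */
    StripedLockSketch(int stripeCount) {
        if (stripeCount <= 0 || Integer.bitCount(stripeCount) != 1) {
            throw new IllegalArgumentException("stripeCount must be a power of two");
        }
        stripes = new ReentrantReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    /** Maps a key to a stripe; many distinct keys share each stripe. */
    int stripeIndex(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // assumed hash spreading, for illustration only
        return h & (stripes.length - 1);
    }

    ReentrantReadWriteLock lockFor(Object key) {
        return stripes[stripeIndex(key)];
    }

    /**
     * Holds the write lock for writeKey on the calling thread, then checks from a
     * second thread whether the read lock for readKey can be taken without waiting.
     */
    static boolean readSucceedsDuringWrite(StripedLockSketch locks, Object writeKey, Object readKey) {
        ReentrantReadWriteLock.WriteLock writeLock = locks.lockFor(writeKey).writeLock();
        writeLock.lock();
        try {
            boolean[] acquired = new boolean[1];
            Thread reader = new Thread(() -> {
                ReentrantReadWriteLock.ReadLock readLock = locks.lockFor(readKey).readLock();
                if (readLock.tryLock()) {
                    acquired[0] = true;
                    readLock.unlock();
                }
            });
            reader.start();
            reader.join(); // join makes the reader's write to acquired[0] visible here
            return acquired[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            writeLock.unlock();
        }
    }
}
```

With four stripes, the Integer keys 0 and 4 land in the same stripe, so the read is refused while the write lock is held; keys 0 and 1 land in different stripes and do not interfere. In the real cache the reader would park and wait rather than give up, which is what the WAITING threads in the dump show.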
Notes
A typical symptom of the problem:
- A large share (the majority) of the threads are waiting while executing CachingOfBizPropertyEntryStore.resolve
- At the same time, and on the same lock, other threads are either:
  - updating some elements of the cache in the CachingOfBizPropertyEntryStore.setEntry method. Updates from other nodes in the cluster add to the contention, e.g.:
    [c.a.j.c.distribution.localq.LocalQCacheOpReader] LocalQCacheOp{cacheName='com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.cache', action=REMOVE, key=CacheKey[entityName=ApplicationUser,entityId=16782], value=null, creationTimeInMillis=1561687285335}
  - or loading the cache in CachingOfBizPropertyEntryStore.loadPropertySetData. If loading the cache is slow due to DB problems, that worsens the situation.
Example from the thread dump:
- 70%+ threads are in WAITING state in the CachingOfBizPropertyEntryStore.resolve method
- Different threads reading different values from the same cache but blocked on the same lock - 0x0000000085534e48:
- getCustomFieldNameTranslation - READ
"http-nio-8080-exec-541" #3540942 daemon prio=5 os_prio=0 tid=0x00007f7b980d2800 nid=0xa43d waiting on condition [0x00007f75ffead000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    ....
    at com.atlassian.jira.cache.DeferredReplicationCache.get(DeferredReplicationCache.java:48)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:103)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.getEntry(CachingOfBizPropertyEntryStore.java:123)
    at com.atlassian.jira.propertyset.CachingOfBizPropertySet.get(CachingOfBizPropertySet.java:189)
    at com.opensymphony.module.propertyset.AbstractPropertySet.getString(AbstractPropertySet.java:305)
    at com.atlassian.jira.web.action.admin.translation.TranslationManagerImpl.getCustomFieldNameTranslation(TranslationManagerImpl.java:229)
    at com.atlassian.jira.web.action.admin.translation.TranslationManagerImpl.getCustomFieldNameTranslation(TranslationManagerImpl.java:222)
- removeDismissFlagForUser - UPDATE
"http-nio-8080-exec-547" #3540948 daemon prio=5 os_prio=0 tid=0x00007f7b980da000 nid=0xa445 waiting on condition [0x00007f75fceab000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
    at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
    at com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:191)
    at com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:146)
    at com.atlassian.jira.cache.DeferredReplicationCache.lambda$remove$2(DeferredReplicationCache.java:74)
    at com.atlassian.jira.cache.DeferredReplicationCache$$Lambda$476/1199143721.get(Unknown Source)
    at com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
    at com.atlassian.jira.cache.DeferredReplicationCache.remove(DeferredReplicationCache.java:73)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.invalidateCacheEntry(CachingOfBizPropertyEntryStore.java:239)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.setEntry(CachingOfBizPropertyEntryStore.java:142)
    ...
    at com.atlassian.jira.user.flag.FlagDismissalServiceImpl.removeDismissFlagForUser(FlagDismissalServiceImpl.java:44)
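To check whether a thread dump shows the same pattern, one option is to count how many threads are parked on each lock address. The following is a minimal sketch, assuming the dump uses the HotSpot "parking to wait for <0x...>" wording seen in the traces above; the class and method names are hypothetical and not part of any Jira tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts parked threads per lock address in a thread dump (hypothetical helper). */
class ParkedLockCounter {
    // Matches lines such as: "- parking to wait for <0x0000000085534e48> (a ...)"
    private static final Pattern PARKED =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    /** Returns lock address -> number of threads parked on it, in order of first appearance. */
    static Map<String, Integer> countWaiters(String threadDump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = PARKED.matcher(threadDump);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }
}
```

A single address that accounts for most of the parked threads, combined with CachingOfBizPropertyEntryStore.resolve and setEntry frames in those threads, would be consistent with the symptom described in this issue.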
Example rows from propertyentry table:
select * from propertyentry where entity_name = 'jira.properties' limit 2;

id    | entity_name     | entity_id | property_key                                | propertytype
10209 | jira.properties | 1         | jira.path.attachments                       | 5
10210 | jira.properties | 1         | jira.path.attachments.use.default.directory | 1
10211 | jira.properties | 1         | jira.option.allowattachments                | 1
10213 | jira.properties | 1         | jira.path.backup                            | 5
...
Workaround
- To reduce the impact, increase the number of stripes in Ehcache to significantly lower the chance of a collision; see JRASERVER-70518.
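As a rough way to see why more stripes help, the estimate below assumes keys hash uniformly across stripes and that each in-flight update holds one stripe's write lock independently. Under those assumptions, the chance that a given read lands on a stripe held by at least one of W concurrent updates is 1 - (1 - 1/S)^W for S stripes. The stripe and writer counts used here are hypothetical and are not the actual Ehcache or Jira settings; see JRASERVER-70518 for the real change.

```java
/** Back-of-the-envelope estimate of stripe collisions (assumes uniform hashing). */
class StripeCollisionEstimate {
    /**
     * Probability that one reader's key shares a stripe with at least one of
     * the keys currently being updated by concurrentWriters other threads.
     */
    static double collisionProbability(int stripes, int concurrentWriters) {
        if (stripes <= 0 || concurrentWriters < 0) {
            throw new IllegalArgumentException("stripes must be > 0 and concurrentWriters >= 0");
        }
        return 1.0 - Math.pow(1.0 - 1.0 / stripes, concurrentWriters);
    }

    public static void main(String[] args) {
        int writers = 8; // hypothetical number of updates in flight
        for (int stripes : new int[] {64, 256, 1024, 4096}) { // hypothetical stripe counts
            System.out.printf("stripes=%d writers=%d collision probability=%.4f%n",
                    stripes, writers, collisionProbability(stripes, writers));
        }
    }
}
```

The estimate ignores how long each lock is held, so slow loads from the database would make the observed contention worse than this figure suggests.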
- is related to
  - JRASERVER-70468 As a Jira Administrator I want to configure user accounts for integration jobs with low login overhead (Closed)
  - JRASERVER-70518 Increase number of cache stripes for EHCache cache (Closed)
- relates to
  - JRASERVER-72909 Universal Plugin Manager flushing jira.properties cache causing contention and performance problem (Closed)
  - PSR-604