Jira Data Center / JRASERVER-70519

Performance degradation due to contention in CachingOfBizPropertyEntryStore cache


      Summary

      Under high load, requests that use the CachingOfBizPropertyEntryStore cache can slow down while the cache loads data and populates values. The cause is lock contention in the cache internals. CachingOfBizPropertyEntryStore is used as a cache for the propertyentry table, so slow DB performance delays loading values into the cache and increases the contention.

      Environment

      • Data Center

      Steps to Reproduce

      1. Apply high load to put CachingOfBizPropertyEntryStore cache under pressure.
      2. Start updating different elements of the cache (some examples):
        • Update user locale
        • Last login time (see JRASERVER-70468)
        • Dismissed banners

      Expected Results

      READ and UPDATE operations on different parts of the cache (different keys) can run concurrently without blocking each other.

      Actual Results

      READ and UPDATE operations on different parts of the cache (different keys) are both in the WAITING state, waiting for the same lock (their keys fall in the same stripe). As a result, updates block reads for other keys.
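
      The effect can be illustrated with a minimal sketch of lock striping. This is not the Ehcache implementation: the class StripedLockSketch, the stripe count of 4, and the second and third key values are assumptions chosen for illustration. It only shows how two different keys that hash to the same stripe share one ReentrantReadWriteLock, so a writer on one key blocks a reader on the other.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of lock striping; not the Ehcache code.
class StripedLockSketch {
    private final ReentrantReadWriteLock[] stripes;

    StripedLockSketch(int stripeCount) {
        stripes = new ReentrantReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    // Maps a key to a stripe by hash code; different keys can share a stripe.
    int stripeIndex(Object key) {
        return Math.floorMod(key.hashCode(), stripes.length);
    }

    ReentrantReadWriteLock lockFor(Object key) {
        return stripes[stripeIndex(key)];
    }

    // Holds the write lock for writeKey, then checks from a second thread
    // whether a read lock for readKey can be taken without waiting.
    static boolean readerBlocked(StripedLockSketch locks, Object writeKey, Object readKey) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Lock write = locks.lockFor(writeKey).writeLock();
        write.lock();
        try {
            Thread reader = new Thread(() -> {
                Lock read = locks.lockFor(readKey).readLock();
                if (read.tryLock()) {
                    acquired.set(true);
                    read.unlock();
                }
            });
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            write.unlock();
        }
        return !acquired.get();
    }

    public static void main(String[] args) {
        StripedLockSketch locks = new StripedLockSketch(4); // assumed stripe count
        // 16782 and 16786 differ by 4, so they share a stripe when there are 4 stripes.
        System.out.println("same stripe, reader blocked: " + readerBlocked(locks, 16782, 16786));
        System.out.println("other stripe, reader blocked: " + readerBlocked(locks, 16782, 16783));
    }
}
```

      In this sketch the read of an unrelated key is refused only because both keys resolve to the same stripe, which matches the pattern seen in the thread dumps in this issue.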

      Notes

      A typical symptom of the problem:

      • A large majority of the threads are waiting while executing CachingOfBizPropertyEntryStore.resolve.
      • At the same time, on the same lock, there are:
        • Updates to some elements of the cache in the CachingOfBizPropertyEntryStore.setEntry method.
          • Updates from other nodes in the cluster add to the contention, e.g.:
            [c.a.j.c.distribution.localq.LocalQCacheOpReader] LocalQCacheOp{cacheName='com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.cache', action=REMOVE, key=CacheKey[entityName=ApplicationUser,entityId=16782], value=null, creationTimeInMillis=1561687285335} 
            
        • Or cache loads in CachingOfBizPropertyEntryStore.loadPropertySetData. If loading the cache is slow because of DB problems, the situation gets worse.

      Example from the thread dump:

      • 70%+ threads are in WAITING state in the CachingOfBizPropertyEntryStore.resolve method
      • Different threads reading different values from the same cache but blocked on the same lock - 0x0000000085534e48:
      • getCustomFieldNameTranslation - READ
        "http-nio-8080-exec-541" #3540942 daemon prio=5 os_prio=0 tid=0x00007f7b980d2800 nid=0xa43d waiting on condition [0x00007f75ffead000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        ....
        	at com.atlassian.jira.cache.DeferredReplicationCache.get(DeferredReplicationCache.java:48)
        	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:103)
        	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.getEntry(CachingOfBizPropertyEntryStore.java:123)
        	at com.atlassian.jira.propertyset.CachingOfBizPropertySet.get(CachingOfBizPropertySet.java:189)
        	at com.opensymphony.module.propertyset.AbstractPropertySet.getString(AbstractPropertySet.java:305)
        	at com.atlassian.jira.web.action.admin.translation.TranslationManagerImpl.getCustomFieldNameTranslation(TranslationManagerImpl.java:229)
        	at com.atlassian.jira.web.action.admin.translation.TranslationManagerImpl.getCustomFieldNameTranslation(TranslationManagerImpl.java:222)
        
      • removeDismissFlagForUser - UPDATE
        "http-nio-8080-exec-547" #3540948 daemon prio=5 os_prio=0 tid=0x00007f7b980da000 nid=0xa445 waiting on condition [0x00007f75fceab000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
        	at com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:191)
        	at com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:146)
        	at com.atlassian.jira.cache.DeferredReplicationCache.lambda$remove$2(DeferredReplicationCache.java:74)
        	at com.atlassian.jira.cache.DeferredReplicationCache$$Lambda$476/1199143721.get(Unknown Source)
        	at com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
        	at com.atlassian.jira.cache.DeferredReplicationCache.remove(DeferredReplicationCache.java:73)
        	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.invalidateCacheEntry(CachingOfBizPropertyEntryStore.java:239)
        	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.setEntry(CachingOfBizPropertyEntryStore.java:142)
        ...
        	at com.atlassian.jira.user.flag.FlagDismissalServiceImpl.removeDismissFlagForUser(FlagDismissalServiceImpl.java:44)
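
      To check whether a thread dump shows the same pattern, one option is to count how many threads are parked on each lock address. The following is a minimal sketch, assuming the dump uses the "parking to wait for <0x...>" line format shown above; the class name ParkedLockCounter and its method are hypothetical helpers, not part of Jira.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups parked threads in a thread dump by lock address.
class ParkedLockCounter {
    // Matches lines such as:
    //   - parking to wait for  <0x0000000085534e48> (a java.util.concurrent.locks...)
    private static final Pattern PARKED =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    // Returns lock address -> number of threads parked on it.
    static Map<String, Integer> countByLock(String threadDump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = PARKED.matcher(threadDump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "\t- parking to wait for  <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)",
                "\t- parking to wait for  <0x0000000085534e48> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)");
        countByLock(dump).forEach((lock, n) -> System.out.println(lock + " -> " + n + " parked threads"));
    }
}
```

      A single address that accounts for most parked threads, with both resolve and setEntry frames among them, is consistent with the contention described in this issue.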
        

      Example rows from propertyentry table:

      select * from propertyentry where entity_name = 'jira.properties' limit 2;
        id   |   entity_name   | entity_id | property_key                                | propertytype
       10209 | jira.properties |         1 | jira.path.attachments                       | 5
       10210 | jira.properties |         1 | jira.path.attachments.use.default.directory | 1
       10211 | jira.properties |         1 | jira.option.allowattachments                | 1
       10213 | jira.properties |         1 | jira.path.backup                            | 5
      ...
      

      Workaround

      • To reduce the impact, increase the number of stripes in Ehcache, which significantly lowers the chance of a collision. See JRASERVER-70518.
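
      As a rough estimate of why more stripes help: assuming keys hash uniformly and independently across N stripes, the chance that a read lands on a stripe held by one of k concurrent updates is 1 - (1 - 1/N)^k. The sketch below evaluates that formula. The class name, the stripe counts, and the writer count are hypothetical values picked for illustration, not Ehcache or Jira defaults.

```java
// Hypothetical estimate; assumes uniform, independent hashing of keys to stripes.
class StripeCollisionEstimate {
    // Probability that a read hits a stripe held by at least one of the writers.
    static double collisionProbability(int stripes, int concurrentWriters) {
        if (stripes <= 0 || concurrentWriters < 0) {
            throw new IllegalArgumentException("stripes must be > 0 and writers >= 0");
        }
        return 1.0 - Math.pow(1.0 - 1.0 / stripes, concurrentWriters);
    }

    public static void main(String[] args) {
        int writers = 8; // assumed number of concurrent updates
        for (int stripes : new int[] {64, 512, 4096}) { // assumed stripe counts
            System.out.printf("stripes=%d writers=%d p(read blocked)=%.4f%n",
                    stripes, writers, collisionProbability(stripes, writers));
        }
    }
}
```

      Under these assumptions the probability falls roughly in proportion to 1/N, so raising the stripe count reduces collisions but does not remove them.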

              mswinarski Maciej Swinarski (Inactive)
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]