[Data Center] Lock contention on ehcache in DefaultGlobalPermissionManager under high load with many users

Details

    Description

      Actual behaviour

      When JIRA has problems replicating ehcache changes, all ehcache-replicator threads are stuck in calls like this:

      ehcache-replicator-10 State: RUNNABLE tid: 59
      java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java
      java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
      java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
      java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
      java.io.BufferedInputStream.fill() BufferedInputStream.java:246
      java.io.BufferedInputStream.read() BufferedInputStream.java:265
      java.io.DataInputStream.readByte() DataInputStream.java:265
      sun.rmi.transport.tcp.TCPChannel.createConnection() TCPChannel.java:246
      sun.rmi.transport.tcp.TCPChannel.newConnection() TCPChannel.java:202
      sun.rmi.server.UnicastRef.newCall(RemoteObject, Operation[], int, long) UnicastRef.java:342
      sun.rmi.registry.RegistryImpl_Stub.lookup(String)
      java.rmi.Naming.lookup(String) Naming.java:101
      net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(String) RMICacheManagerPeerProvider.java:127
      com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.lambda$getCachePeerAsync$2(String) JiraCacheManagerPeerProvider.java:76
      com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider$$Lambda$45.get()
      com.atlassian.jira.cluster.distribution.ClassLoaderSwitchingSupplier.get() ClassLoaderSwitchingSupplier.java:20
      java.util.concurrent.CompletableFuture$AsyncSupply.run() CompletableFuture.java:1590
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142
      java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
      java.lang.Thread.run() Thread.java:748
      

      Many request threads are stuck in BlockingCache.acquiredLockForKey while trying to read data from ehcache.
      Stack trace at the moment of snapshot capture:

      http-nio-8080-exec-799 url:/  [DAEMON] State: RUNNABLE tid: 3280645
      java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(Thread) ReentrantReadWriteLock.java:547
      java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(int) ReentrantReadWriteLock.java:488
      java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) AbstractQueuedSynchronizer.java:1282
      java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() ReentrantReadWriteLock.java:727
      net.sf.ehcache.concurrent.ReadWriteLockSync.lock(LockType) ReadWriteLockSync.java:50
      net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(Object, Sync, LockType) BlockingCache.java:196
      net.sf.ehcache.constructs.blocking.BlockingCache.get(Object) BlockingCache.java:158
      com.atlassian.cache.ehcache.LoadingCache.get(Object) LoadingCache.java:75
      net.sf.ehcache.constructs.blocking.BlockingCache.get(Serializable) BlockingCache.java:318
      com.atlassian.cache.ehcache.DelegatingCachedReference.get() DelegatingCachedReference.java:62
      com.atlassian.jira.security.GlobalPermissionsCache.hasPermission(GlobalPermissionEntry) GlobalPermissionsCache.java:57
      com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermission(GlobalPermissionEntry) DefaultGlobalPermissionManager.java:419
      ...
      com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermissionIgnoreRecovery(GlobalPermissionKey, ApplicationUser) DefaultGlobalPermissionManager.java:308
      com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermission(GlobalPermissionKey, ApplicationUser) DefaultGlobalPermissionManager.java:270
      com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermission(int, ApplicationUser) DefaultGlobalPermissionManager.java:264
      com.atlassian.jira.security.DefaultPermissionManager.hasPermission(int, ApplicationUser) DefaultPermissionManager.java:81
      com.atlassian.jira.security.ApplicationRequiredPermissionManager.hasPermission(int, ApplicationUser) ApplicationRequiredPermissionManager.java:60
      ...
      

      Expected behaviour
      Ehcache updates should not block gets.

      Cause
      DefaultGlobalPermissionManager uses a cached reference, which means that all threads contend for a single ReentrantReadWriteLock. A single "view issue" request for a user who belongs to 150 groups can generate over 55k accesses to this cache. The lock is acquired with CAS operations, so high contention can severely degrade its performance. Our tests show that with 200 concurrent threads, performance degrades by a factor of 1000.
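
      The contention pattern described above can be illustrated with a small standalone sketch. This is not Jira or ehcache code, and it is not the fix applied for this issue. All class and method names (PermissionReadSketch, LockedPermissions, SnapshotPermissions, hammer) are hypothetical. Variant A funnels every read through one ReentrantReadWriteLock, so each hasPermission call updates the lock's shared state. Variant B is one possible alternative, assuming the permission set can be published as an immutable snapshot: readers only load a reference and never touch shared lock state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these types exist in Jira or ehcache.
class PermissionReadSketch {

    // Variant A: every read acquires the same read lock, so all reader
    // threads compete to update one shared lock-state word.
    static final class LockedPermissions {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private Set<String> granted;

        LockedPermissions(Set<String> granted) {
            this.granted = new HashSet<>(granted);
        }

        boolean hasPermission(String entry) {
            lock.readLock().lock();
            try {
                return granted.contains(entry);
            } finally {
                lock.readLock().unlock();
            }
        }

        void replace(Set<String> next) {
            lock.writeLock().lock();
            try {
                granted = new HashSet<>(next);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    // Variant B: readers load an immutable snapshot; writers swap the reference.
    static final class SnapshotPermissions {
        private final AtomicReference<Set<String>> granted;

        SnapshotPermissions(Set<String> granted) {
            this.granted = new AtomicReference<>(copy(granted));
        }

        boolean hasPermission(String entry) {
            return granted.get().contains(entry);
        }

        void replace(Set<String> next) {
            granted.set(copy(next));
        }

        private static Set<String> copy(Set<String> source) {
            return Collections.unmodifiableSet(new HashSet<>(source));
        }
    }

    // Runs `threads` readers concurrently, each calling `check` `readsPerThread`
    // times, and returns how many calls returned true.
    static long hammer(BooleanSupplier check, int threads, int readsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        LongAdder hits = new LongAdder();
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all readers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < readsPerThread; i++) {
                    if (check.getAsBoolean()) {
                        hits.increment();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return hits.sum();
    }

    public static void main(String[] args) {
        Set<String> perms = new HashSet<>(Arrays.asList("ADMINISTER", "USE"));
        LockedPermissions locked = new LockedPermissions(perms);
        SnapshotPermissions snapshot = new SnapshotPermissions(perms);
        System.out.println(hammer(() -> locked.hasPermission("ADMINISTER"), 8, 100_000));
        System.out.println(hammer(() -> snapshot.hasPermission("ADMINISTER"), 8, 100_000));
    }
}
```

      Both variants return the same answers. To compare them under load, wrap each hammer call in System.nanoTime() measurements and raise the thread count toward the 200 threads mentioned above. The sketch makes no claim about the size of the difference on any particular machine.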

       

            People

              ajakubowski Adam Jakubowski (Inactive)