Jira Data Center / JRASERVER-63865

all JIRA threads are blocked waiting for cache UsersSecurityLevels


      Summary

      JIRA is very slow or inaccessible. Almost all incoming HTTP threads are blocked waiting for the cache behind getUsersSecurityLevels to be populated.
      The problem can be caused by two factors:

      • Too-frequent cache flushes
      • Slow cache population

      Environment

      • Large number of issues: 1M+
      • Large number of users: 50k+
      • High concurrent load: 300 requests/min

      Steps to Reproduce

      1. TBD

      Expected Results

      JIRA can handle high load

      Actual Results

      JIRA is very slow.
      A thread dump shows many threads in java.lang.Thread.State: WAITING (parking), waiting for the cache.

      Notes

      Please note that calling doAddUserToGroup() or addActorsToProjectRole() causes a flush of the projectAndUserToSecurityLevelCache cache.
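
To illustrate why a flush is expensive under load, here is a minimal toy model of a blocking cache, written in Python for brevity. It is not Atlassian's implementation: the class `BlockingCache`, the function `slow_loader`, and the 50 ms load time are all hypothetical, and a plain mutex stands in for the read/write lock seen in the stack trace. It only shows the general pattern: while one thread repopulates the entry, every other reader waits, and each flush forces a new population.

```python
import threading
import time


class BlockingCache:
    """Toy single-entry blocking cache (hypothetical, for illustration only)."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._value = None
        self._loaded = False
        self.load_count = 0

    def get(self):
        # Every reader queues on this lock while another thread is loading.
        with self._lock:
            if not self._loaded:
                self._value = self._loader()
                self._loaded = True
                self.load_count += 1
            return self._value

    def flush(self):
        # After a flush, the next get() has to repopulate the entry.
        with self._lock:
            self._loaded = False


def slow_loader():
    # Stands in for a slow cache population; the duration is made up.
    time.sleep(0.05)
    return ["level-1"]


def read_concurrently(cache, n_threads):
    """Start n_threads readers at once and collect what each one got."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get()))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model, twenty concurrent readers trigger one load and all wait for it; calling `flush()` and reading again triggers a second load. The more often the flush happens and the slower the loader is, the larger the share of time readers spend parked.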

      More detailed breakdown from the thread dumps

      • Number of threads by state, for threads whose stack references lock 0x0000001303b43078:
        find ./ -name '*.td' | xargs -n 1 -I %  sh -c '/bin/echo -n %; grep -B 3 -A 20 0000001303b43078 % | grep -e java.lang.Thread.State | sed "s/prio.*//" | sort | uniq -c '
        .//jira_threads.1484815453.txt1.td 214    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815466.txt1.td 208    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815479.txt1.td 212    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815492.txt1.td 
           1    java.lang.Thread.State: RUNNABLE
           211    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815505.txt1.td 191    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815518.txt1.td 178    java.lang.Thread.State: WAITING (parking)
        
      • Number of threads by state, for threads with getUsersSecurityLevels in their stack. The majority of these threads are waiting for the cache.
        find ./ -name '*.td' | xargs -n 1 -I %  sh -c '/bin/echo -n %; grep -B 20 com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels % | grep -e java.lang.Thread.State | sort | uniq -c '
        .//jira_threads.1484815453.txt1.td 295    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815466.txt1.td 295    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815479.txt1.td   1    java.lang.Thread.State: RUNNABLE
         307    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815492.txt1.td   1    java.lang.Thread.State: RUNNABLE
         302    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815505.txt1.td 266    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815518.txt1.td 257    java.lang.Thread.State: WAITING (parking)
        
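      • Total getUsersSecurityLevels frames per thread dump (quick check; the example stack below contains two such frames per thread, so these counts are roughly twice the thread counts above):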
        grep -c 'com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels' *.td
        
        jira_threads.1484815453.txt1.td:590
        jira_threads.1484815466.txt1.td:590
        jira_threads.1484815479.txt1.td:616
        jira_threads.1484815492.txt1.td:606
        jira_threads.1484815505.txt1.td:534
        jira_threads.1484815518.txt1.td:516
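
The per-state counts above can also be produced with a short script instead of chained grep calls. The following is a minimal Python sketch, assuming that each thread's block in the dump is separated from the next by a blank line; the function name `count_states` is hypothetical. Because it counts threads rather than matching lines, it avoids the double counting that `grep -c` shows when a stack contains the method twice.

```python
import re
from collections import Counter

STATE_RE = re.compile(r"java\.lang\.Thread\.State: (.+)")


def count_states(dump_text, marker):
    """Count thread states among threads whose stack mentions `marker`."""
    counts = Counter()
    # Assumption: thread blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        if marker not in block:
            continue
        match = STATE_RE.search(block)
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

For example, `count_states(open(path).read(), "IssueSecurityLevelManagerImpl.getUsersSecurityLevels")` returns a `Counter` keyed by state string, such as `WAITING (parking)` or `RUNNABLE`.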
        

      Example of a blocked thread:

      "ajp-nio-8009-exec-1654" #794541 daemon prio=5 tid=0x00007f563d925000 nid=0xd431  waiting on condition   
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(ZJ)V(Native Method)
      	- parking to wait for  <0x0000001303b43078> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
      	at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(I)V(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(I)V(AbstractQueuedSynchronizer.java:1283)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock()V(ReentrantReadWriteLock.java:751)
      	at com.atlassian.cache.memory.BlockingCacheLoader.load(Ljava/lang/Object;)Ljava/lang/Object;(BlockingCacheLoader.java:50)
      	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Lcom/google/common/util/concurrent/ListenableFuture;(LocalCache.java:3573)
      	at com.google.common.cache.LocalCache$Segment.loadSync(Ljava/lang/Object;ILcom/google/common/cache/LocalCache$LoadingValueReference;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2350)
      	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2313)
      	- locked <0x0000001085e4a360> (a com.google.common.cache.LocalCache$StrongAccessEntry)
      	at com.google.common.cache.LocalCache$Segment.get(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2228)
      	at com.google.common.cache.LocalCache.get(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:3970)
      	at com.google.common.cache.LocalCache.getOrLoad(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:3974)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:4834)
      	at com.atlassian.cache.memory.DelegatingCache$DelegatingLoadingCache.get(Ljava/lang/Object;)Ljava/lang/Object;(DelegatingCache.java:324)
      	at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lorg/ofbiz/core/entity/GenericValue;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:225)
      	at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lcom/atlassian/jira/project/Project;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:272)
      ...
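
When the contended lock address is not known in advance, it can be found by counting the "parking to wait for" lines in a dump. The following is a minimal Python sketch based only on the line format shown in the trace above; the function name `parked_lock_counts` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   - parking to wait for  <0x0000001303b43078> (a some.Lock$Class)
PARK_RE = re.compile(
    r"parking to wait for\s+<(0x[0-9a-fA-F]+)>\s+\(a ([^)]+)\)"
)


def parked_lock_counts(dump_text):
    """Count parked threads per (lock address, lock class) pair."""
    counts = Counter()
    for address, class_name in PARK_RE.findall(dump_text):
        counts[(address, class_name)] += 1
    return counts
```

Calling `parked_lock_counts(text).most_common(1)` gives the lock that most threads are parked on, which can then be used as the search key for the grep commands above.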
      

      Workaround

      • While working on a support case, we identified that the problem was caused by too-frequent cache flush events.
        • A large number of cache flush events were triggered by SOAP calls to JIRA from an integration framework.

              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]