all JIRA threads are blocked waiting for cache UsersSecurityLevels


    • 6.04
    • 25
    • Severity 1 - Critical
    • 16

      Summary

      JIRA is very slow or inaccessible. Almost all incoming HTTP threads are blocked waiting for the getUsersSecurityLevels cache to be populated.
      The problem can be caused by two factors:

      • Too frequent cache flush
      • Slow cache population

      Environment

      • Large number of issues: 1M+
      • Large number of users: 50k+
      • High concurrent load: 300 requests/min

      Steps to Reproduce

      1. TBD

      Expected Results

      JIRA can handle high load

      Actual Results

      JIRA is very slow.
      A thread dump shows many threads in java.lang.Thread.State: WAITING, waiting for the cache.

      Notes

      Note that calls to doAddUserToGroup() or addActorsToProjectRole() flush the projectAndUserToSecurityLevelCache cache.

      More detailed break-down from thread dump

      • Number of threads and their state, based on lock address 0000001303b43078 (the ReentrantReadWriteLock that the example thread is parked on):
        find ./ -name '*.td' | xargs -n 1 -I %  sh -c '/bin/echo -n %; grep -B 3 -A 20 0000001303b43078 % | grep -e java.lang.Thread.State | sed "s/prio.*//" | sort | uniq -c '
        .//jira_threads.1484815453.txt1.td 214    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815466.txt1.td 208    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815479.txt1.td 212    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815492.txt1.td 
           1    java.lang.Thread.State: RUNNABLE
           211    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815505.txt1.td 191    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815518.txt1.td 178    java.lang.Thread.State: WAITING (parking)
        
      • Number of threads and their state for stacks containing the getUsersSecurityLevels method. The majority of threads calling getUsersSecurityLevels are waiting for the cache.
        find ./ -name '*.td' | xargs -n 1 -I %  sh -c '/bin/echo -n %; grep -B 20 com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels % | grep -e java.lang.Thread.State | sort | uniq -c '
        .//jira_threads.1484815453.txt1.td 295    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815466.txt1.td 295    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815479.txt1.td   1    java.lang.Thread.State: RUNNABLE
         307    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815492.txt1.td   1    java.lang.Thread.State: RUNNABLE
         302    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815505.txt1.td 266    java.lang.Thread.State: WAITING (parking)
        .//jira_threads.1484815518.txt1.td 257    java.lang.Thread.State: WAITING (parking)
        
      • Overall getUsersSecurityLevels in thread dumps (quick check):
        grep -c 'com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels' *.td
        
        jira_threads.1484815453.txt1.td:590
        jira_threads.1484815466.txt1.td:590
        jira_threads.1484815479.txt1.td:616
        jira_threads.1484815492.txt1.td:606
        jira_threads.1484815505.txt1.td:534
        jira_threads.1484815518.txt1.td:516
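
The one-liners above rely on fixed grep context windows (-B 3 -A 20, -B 20), which may pick up state lines from neighbouring threads when stacks are short. As a rough alternative, the sketch below groups each dump into per-thread blocks and counts the state of every thread whose stack mentions a given string. The function name count_states is hypothetical, and the script assumes HotSpot-style dumps in which every thread entry starts with a quoted thread name.

```shell
#!/bin/sh
# count_states PATTERN FILE...
# For each thread dump, print "<file> TAB <count> TAB <state>" for the threads
# whose stack contains PATTERN (matched as a literal substring).
# Assumption: every thread entry starts with a line beginning with a double quote.
count_states() {
    pattern=$1
    shift
    for dump in "$@"; do
        awk -v pat="$pattern" -v name="$dump" '
            # Record the thread collected so far, then reset for the next one.
            function tally() {
                if (hit && state != "") count[state]++
                hit = 0
                state = ""
            }
            /^"/ { tally() }                      # a new thread header starts
            /java\.lang\.Thread\.State:/ {
                state = $0
                sub(/^[ \t]*java\.lang\.Thread\.State:[ \t]*/, "", state)
            }
            index($0, pat) { hit = 1 }            # stack mentions the pattern
            END {
                tally()
                for (s in count) printf "%s\t%d\t%s\n", name, count[s], s
            }
        ' "$dump"
    done
}
```

Possible usage, with the lock address and method name taken from this report:

    count_states 0000001303b43078 *.td
    count_states IssueSecurityLevelManagerImpl.getUsersSecurityLevels *.td

Each matching thread is counted once. grep -c counts matching lines instead, and the example stack below contains the getUsersSecurityLevels frame twice, which would be consistent with 590 lines versus 295 threads in the first dump.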
        

      Example of the thread:

      "ajp-nio-8009-exec-1654" #794541 daemon prio=5 tid=0x00007f563d925000 nid=0xd431  waiting on condition   
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(ZJ)V(Native Method)
      	- parking to wait for  <0x0000001303b43078> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
      	at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(I)V(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(I)V(AbstractQueuedSynchronizer.java:1283)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock()V(ReentrantReadWriteLock.java:751)
      	at com.atlassian.cache.memory.BlockingCacheLoader.load(Ljava/lang/Object;)Ljava/lang/Object;(BlockingCacheLoader.java:50)
      	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Lcom/google/common/util/concurrent/ListenableFuture;(LocalCache.java:3573)
      	at com.google.common.cache.LocalCache$Segment.loadSync(Ljava/lang/Object;ILcom/google/common/cache/LocalCache$LoadingValueReference;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2350)
      	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2313)
      	- locked <0x0000001085e4a360> (a com.google.common.cache.LocalCache$StrongAccessEntry)
      	at com.google.common.cache.LocalCache$Segment.get(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2228)
      	at com.google.common.cache.LocalCache.get(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:3970)
      	at com.google.common.cache.LocalCache.getOrLoad(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:3974)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:4834)
      	at com.atlassian.cache.memory.DelegatingCache$DelegatingLoadingCache.get(Ljava/lang/Object;)Ljava/lang/Object;(DelegatingCache.java:324)
      	at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lorg/ofbiz/core/entity/GenericValue;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:225)
      	at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lcom/atlassian/jira/project/Project;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:272)
      ...
      

      Workaround

      • While working on a support case, we identified that the problem was caused by cache flush events that were too frequent.
        • A large number of cache flush events were triggered by SOAP calls to JIRA from an integration framework.
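
To check whether an integration is generating bursts of SOAP traffic, one option is to count SOAP requests per minute in the HTTP access log. This is only a sketch under assumptions that are not stated in this report: the function name soap_calls_per_minute is hypothetical, the log is assumed to carry a bracketed timestamp such as [19/Jan/2017:08:44:01 +0000], and the SOAP endpoint is assumed to live under /rpc/soap/ (pass a different path as the second argument if needed).

```shell
#!/bin/sh
# soap_calls_per_minute LOGFILE [PATH]
# Print "<count> TAB <dd/Mon/yyyy:HH:MM>" for requests whose line contains PATH,
# busiest minute first.
# Assumptions: bracketed common-log-style timestamps; PATH defaults to /rpc/soap/.
soap_calls_per_minute() {
    log=$1
    path=${2:-/rpc/soap/}
    awk -v path="$path" '
        index($0, path) && match($0, /\[[^]]+\]/) {
            # Keep "dd/Mon/yyyy:HH:MM": the 17 characters after the "[".
            minute = substr($0, RSTART + 1, 17)
            calls[minute]++
        }
        END { for (m in calls) printf "%d\t%s\n", calls[m], m }
    ' "$log" | sort -rn
}
```

Minutes with unusually high counts can then be compared with the timestamps of the thread dumps to see whether the stalls line up with SOAP activity.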

            Assignee:
            Unassigned
            Reporter:
            Andriy Yakovlev [Atlassian]
            Votes:
            20
            Watchers:
            23
