-
Bug
-
Resolution: Unresolved
-
Low
-
None
-
6.4.14, 7.2.9, 7.6.17
-
6.04
-
25
-
Severity 1 - Critical
-
16
-
Summary
JIRA is very slow or inaccessible. Almost all incoming HTTP threads are blocked in getUsersSecurityLevels, waiting for the cache behind it to be populated.
The problem could be caused by two factors:
- Cache flushes that happen too frequently
- Slow cache population
Environment
- Large number of issues: 1M+
- Large number of users: 50k+
- High concurrent load: 300 requests/min
Steps to Reproduce
- TBD
Expected Results
JIRA can handle high load
Actual Results
JIRA is very slow.
Thread dumps show a large number of threads in java.lang.Thread.State: WAITING (parking), waiting for the cache.
Notes
Note that calls to doAddUserToGroup() or addActorsToProjectRole() flush the projectAndUserToSecurityLevelCache cache.
More detailed breakdown from the thread dumps
- Number of threads and state, based on address 0000001303b43078 (the ReentrantReadWriteLock$FairSync that the example thread is parked on):
find ./ -name '*.td' | xargs -n 1 -I % sh -c '/bin/echo -n %; grep -B 3 -A 20 0000001303b43078 % | grep -e java.lang.Thread.State | sed 's/prio.*//' |sort | uniq -c '
.//jira_threads.1484815453.txt1.td 214 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815466.txt1.td 208 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815479.txt1.td 212 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815492.txt1.td 1 java.lang.Thread.State: RUNNABLE
 211 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815505.txt1.td 191 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815518.txt1.td 178 java.lang.Thread.State: WAITING (parking)
- Number of threads and state for stacks containing the getUsersSecurityLevels method. The majority of the threads in getUsersSecurityLevels are waiting for the cache.
find ./ -name '*.td' | xargs -n 1 -I % sh -c '/bin/echo -n %; grep -B 20 com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels % | grep -e java.lang.Thread.State | sort | uniq -c '
.//jira_threads.1484815453.txt1.td 295 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815466.txt1.td 295 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815479.txt1.td 1 java.lang.Thread.State: RUNNABLE
 307 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815492.txt1.td 1 java.lang.Thread.State: RUNNABLE
 302 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815505.txt1.td 266 java.lang.Thread.State: WAITING (parking)
.//jira_threads.1484815518.txt1.td 257 java.lang.Thread.State: WAITING (parking)
- Overall getUsersSecurityLevels in thread dumps (quick check):
grep -c 'com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels' *.td
jira_threads.1484815453.txt1.td:590
jira_threads.1484815466.txt1.td:590
jira_threads.1484815479.txt1.td:616
jira_threads.1484815492.txt1.td:606
jira_threads.1484815505.txt1.td:534
jira_threads.1484815518.txt1.td:516
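The quick check above counts matching lines, not threads, so a stack that contains the method more than once is counted more than once. The totals are roughly double the per-state counts in the previous item, which is consistent with the two getUsersSecurityLevels frames visible in the example thread. As a rough alternative, the sketch below counts each thread once and groups by state. It assumes HotSpot-style dumps in which thread stanzas are separated by blank lines; `count_states` is a hypothetical helper name, not part of any JIRA tooling.

```shell
#!/bin/sh
# count_states PATTERN FILE
# Prints "<count> <state>" for every thread stanza in FILE that contains
# PATTERN as a fixed string. Assumption: stanzas are separated by blank lines.
count_states() {
  pattern=$1
  file=$2
  awk -v pat="$pattern" '
    BEGIN { RS = ""; FS = "\n" }        # paragraph mode: one record per stanza
    index($0, pat) {                    # fixed-string match, counted once per thread
      for (i = 1; i <= NF; i++) {
        if ($i ~ /java\.lang\.Thread\.State:/) {
          state = $i
          sub(/^.*java\.lang\.Thread\.State: */, "", state)
          counts[state]++
          break
        }
      }
    }
    END { for (s in counts) printf "%d %s\n", counts[s], s }
  ' "$file" | sort -k2
}

# Example usage over all dumps in the current directory:
#   for f in ./*.td; do echo "$f"; count_states 'IssueSecurityLevelManagerImpl.getUsersSecurityLevels' "$f"; done
```

Because the match is per stanza rather than per line window, the result does not depend on choosing `-B`/`-A` context sizes that happen to cover the state line.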
Example of a blocked thread:
"ajp-nio-8009-exec-1654" #794541 daemon prio=5 tid=0x00007f563d925000 nid=0xd431 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(ZJ)V(Native Method)
    - parking to wait for <0x0000001303b43078> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(I)V(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(I)V(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock()V(ReentrantReadWriteLock.java:751)
    at com.atlassian.cache.memory.BlockingCacheLoader.load(Ljava/lang/Object;)Ljava/lang/Object;(BlockingCacheLoader.java:50)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Lcom/google/common/util/concurrent/ListenableFuture;(LocalCache.java:3573)
    at com.google.common.cache.LocalCache$Segment.loadSync(Ljava/lang/Object;ILcom/google/common/cache/LocalCache$LoadingValueReference;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2313)
    - locked <0x0000001085e4a360> (a com.google.common.cache.LocalCache$StrongAccessEntry)
    at com.google.common.cache.LocalCache$Segment.get(Ljava/lang/Object;ILcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:2228)
    at com.google.common.cache.LocalCache.get(Ljava/lang/Object;Lcom/google/common/cache/CacheLoader;)Ljava/lang/Object;(LocalCache.java:3970)
    at com.google.common.cache.LocalCache.getOrLoad(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:3974)
    at com.google.common.cache.LocalCache$LocalManualCache.get(Ljava/lang/Object;)Ljava/lang/Object;(LocalCache.java:4834)
    at com.atlassian.cache.memory.DelegatingCache$DelegatingLoadingCache.get(Ljava/lang/Object;)Ljava/lang/Object;(DelegatingCache.java:324)
    at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lorg/ofbiz/core/entity/GenericValue;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:225)
    at com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl.getUsersSecurityLevels(Lcom/atlassian/jira/project/Project;Lcom/atlassian/crowd/embedded/api/User;)Ljava/util/List;(IssueSecurityLevelManagerImpl.java:272)
    ...
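When the contended lock address is not known in advance, it may help to rank lock addresses by the number of threads parked on them. The following is a minimal sketch that relies only on the `parking to wait for <0x...>` lines that HotSpot prints, as seen in the example thread; `waiters_per_lock` is a hypothetical helper name, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# waiters_per_lock FILE...
# Prints "<count> <lock address>" for every address that threads are parked on,
# most contended first.
waiters_per_lock() {
  grep -h -o 'parking to wait for <0x[0-9a-f]*>' "$@" |
    awk '{
      gsub(/[<>]/, "", $5)              # field 5 is the <0x...> address
      n[$5]++
    }
    END { for (l in n) printf "%d %s\n", n[l], l }' |
    sort -rn
}

# Example usage:
#   waiters_per_lock jira_threads.1484815453.txt1.td
```

An address that dominates this list across consecutive dumps is a reasonable candidate to search for, as was done with 0000001303b43078 above.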
Workaround
- While working on a support case, we identified that the problem was caused by cache flush events occurring too frequently.
- A large number of these cache flush events were triggered by SOAP calls to JIRA from an integration framework.