-
Bug
-
Resolution: Fixed
-
High
-
7.6.9, 8.5.3, 7.13.18, 8.13.1, 8.13.4, 8.18.2
-
7.06
-
15
-
Severity 2 - Major
-
30
-
The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0/8.20.8 we have removed the contention caused by user login action. See JRASERVER-70468 for more information.
Issue Summary
Mostly during the start of the day (when Jira has heavy login pressure from users), customer experienced a severe performance degradation (up to the point that system looks unresponsive). Upon further digging into thread dumps, we found 2 sets of problems:
- A large number of threads waiting for ReadLock (action - /rest/internal/2/user/mention/search)
- At the same time, a group of thread waiting for a WriteLock (action - /plugins/servlet/samlconsumer)
both on DirectoryUserIndexer.java.
This is caused by contention in the code and leads to situation, where a lot of threads (waiting for a ReadLock) are waiting on one thread that's supposed to get a WriteLock to update the UserIndex, but can't as it also waits on another thread which is currently having a ReadLock on UserIndex and doing the search.
This causes cascading contention and degrades user experience in Jira for other actions (see JSWSERVER-20336).
Steps to Reproduce
Details from the customer instance:
- 300k+ user
- Massive amount of users logging in simultaneously at the start of the day
- Lot of action attempting to do @mention users at the same time
Have not attempted to repro in-house.
Expected Results
Jira handles the start-of-the-day login pressure without hiccups
Actual Results
- High load average:
- BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag
thread_dump_9.txt 85 thread_dump_10.txt 84 thread_dump_11.txt 85 thread_dump_12.txt 84 thread_dump_13.txt 84 thread_dump_14.txt 85
- These threads are blocked by one WAITING thread which needs to get a WriteLock at at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
Notes
Related thread dumps.
- Synchronised Threads waiting during authentication:
"http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000] java.lang.Thread.State: BLOCKED (on object monitor) at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318) - waiting to lock <0x0000000145c5e268> (a java.lang.Object) at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98) at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30) at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266) at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281) at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278) at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357) at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109) ...
- Thread which waits for WriteLock at DirectoryUserIndexer.refreshSearcher
"http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127) at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159) at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320) - locked <0x0000000145c5e268> (a java.lang.Object) at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98) at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30) at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266) at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592) at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281) at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278) at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357) at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109) [...]
- RUNNABLE thread holding the ReadLock and executing the seach
"http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68) - locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile) at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125) at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119) .... at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262) at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228) ... at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426) at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417) ...
- Note: DirectoryUserIndexer.internalSearch
Workaround
Since problem is caused by a number of factors, we suggest to have multi-factor approach to reduce it:
- Descrese number of users in the system (if possible), to reduce the cost of computing @mention
- Increase CPU cores to improve the time spent in the lock
- Identify bot user accounts creating unnatural login pressure
- Increase session timeout or remove bot-killer to reduce frequency of users needing to login/authenticate
- is related to
-
JSWSERVER-20336 Searching and Mentioning users may cause performance issues and high CPU load
- Closed
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...