Loading...

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 8.19.1, 8.20.0
Affects Version/s: 7.6.9, 7.13.18, 8.5.3, 8.13.1, 8.13.4, 8.18.2
Component/s: User Management - Others
Labels:

Introduced in Version:
7.06
Support reference count:
15
Symptom Severity:
Severity 2 - Major
UIS:
30

Fix

The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0/8.20.8 we have removed the contention caused by user login action. See JRASERVER-70468 for more information.

Issue Summary

Mostly during the start of the day (when Jira has heavy login pressure from users), customer experienced a severe performance degradation (up to the point that system looks unresponsive). Upon further digging into thread dumps, we found 2 sets of problems:

A large number of threads waiting for ReadLock (action - /rest/internal/2/user/mention/search)
At the same time, a group of thread waiting for a WriteLock (action - /plugins/servlet/samlconsumer)
both on DirectoryUserIndexer.java.

This is caused by contention in the code and leads to situation, where a lot of threads (waiting for a ReadLock) are waiting on one thread that's supposed to get a WriteLock to update the UserIndex, but can't as it also waits on another thread which is currently having a ReadLock on UserIndex and doing the search.
This causes cascading contention and degrades user experience in Jira for other actions (see ~~JSWSERVER-20336~~).

Steps to Reproduce

Details from the customer instance:

300k+ user
Massive amount of users logging in simultaneously at the start of the day
Lot of action attempting to do @mention users at the same time

Have not attempted to repro in-house.

Expected Results

Jira handles the start-of-the-day login pressure without hiccups

Actual Results

High load average:

BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag

thread_dump_9.txt
85
thread_dump_10.txt
84
thread_dump_11.txt
85
thread_dump_12.txt
84
thread_dump_13.txt
84
thread_dump_14.txt
85

These threads are blocked by one WAITING thread which needs to get a WriteLock at at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)

Notes

Related thread dumps.

Synchronised Threads waiting during authentication:

"http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
	- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
...

Thread which waits for WriteLock at DirectoryUserIndexer.refreshSearcher

"http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
	- locked <0x0000000145c5e268> (a java.lang.Object)
	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
[...]

RUNNABLE thread holding the ReadLock and executing the seach

"http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
	- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
	at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
	at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
....
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
...

Note: DirectoryUserIndexer.internalSearch

Workaround

Since problem is caused by a number of factors, we suggest to have multi-factor approach to reduce it:

Descrese number of users in the system (if possible), to reduce the cost of computing @mention
Increase CPU cores to improve the time spent in the lock
Identify bot user accounts creating unnatural login pressure
Increase session timeout or remove bot-killer to reduce frequency of users needing to login/authenticate

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List

2020-11-25_20-07-42.png
18/Feb/2021 8:11 AM
260 kB
Suddha
2020-11-30_07-48-50.png
18/Feb/2021 8:20 AM
582 kB
Suddha
2020-11-30_13-59-25.png
18/Feb/2021 8:21 AM
543 kB
Suddha
2020-11-30_14-08-53.png
18/Feb/2021 8:23 AM
246 kB
Suddha
2020-11-30_14-15-20.png
18/Feb/2021 8:23 AM
337 kB
Suddha
2020-11-30_14-14-57.png
18/Feb/2021 8:23 AM
673 kB
Suddha

is related to

JSWSERVER-20336 Searching and Mentioning users may cause performance issues and high CPU load

Closed

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

relates to: LOGIN-1 Loading...; PSR-601 Loading...

(5 mentioned in, 2 relates to)

Details

Description

Issue Summary

Steps to Reproduce

Expected Results

Actual Results

Notes

Workaround

Attachments

Attachments

Issue Links

Activity

People

Dates