[JSWSERVER-20844] Severe performance degradation for user mentions and user login action due to contention in IndexedUserDao

Type: Bug
Resolution: Fixed
Priority: High (View bug fix roadmap)
Fix Version/s: 8.19.1, 8.20.0
Affects Version/s: 7.6.9, 8.5.3, 7.13.18, 8.13.1, 8.13.4, 8.18.2
Component/s: User Management - Others
Labels:

Introduced in Version:
7.06
Support reference count:
15
Symptom Severity:
Severity 2 - Major
UIS:
30

Fix

The main fix for this problem was delivered in 8.19.1, which removed the contention caused by mentions. The contention caused by the user login action was removed later, in 8.22.0 and 8.20.8. See JRASERVER-70468 for more information.

Issue Summary

Mostly at the start of the day, when Jira is under heavy login pressure from users, the customer experienced severe performance degradation, to the point that the system appeared unresponsive. Digging into the thread dumps, we found two sets of problems:

A large number of threads waiting for a ReadLock (action: /rest/internal/2/user/mention/search)
At the same time, a group of threads waiting for a WriteLock (action: /plugins/servlet/samlconsumer)
Both are in DirectoryUserIndexer.java.

This is caused by lock contention in the code: many threads waiting for a ReadLock are queued behind a single thread that needs the WriteLock to update the UserIndex, and that thread in turn cannot proceed because another thread currently holds a ReadLock on the UserIndex while it runs a search.
The contention cascades and degrades the user experience in Jira for other actions (see JSWSERVER-20336).
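
The locking pattern described above can be reproduced in isolation. The following is a minimal sketch, not Jira's code: the class `UserIndexContentionSketch` and the fields `updateMonitor` and `indexLock` are hypothetical names, modelled only on what the thread dumps show (an object monitor taken in `IndexedUserDao.update` and a non-fair `ReentrantReadWriteLock` taken in `DirectoryUserIndexer`). It starts one slow search, two logins, and one late search, then reports the state each waiting thread ends up in.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the user index; names are illustrative only.
class UserIndexContentionSketch {
    private final Object updateMonitor = new Object();            // like the monitor in IndexedUserDao.update
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock(); // non-fair by default

    // A slow search: holds the read lock until the caller releases it.
    void slowSearch(CountDownLatch holding, CountDownLatch release) {
        indexLock.readLock().lock();
        try {
            holding.countDown();
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            indexLock.readLock().unlock();
        }
    }

    // A quick search that only needs the read lock briefly.
    void quickSearch() {
        indexLock.readLock().lock();
        indexLock.readLock().unlock();
    }

    // A login-triggered update: monitor first, then the write lock.
    void update() {
        synchronized (updateMonitor) {
            indexLock.writeLock().lock();
            indexLock.writeLock().unlock();
        }
    }

    // Polls until the thread reaches the expected state or 5 seconds pass.
    private static void awaitState(Thread t, Thread.State expected) throws InterruptedException {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.sleep(5);
        }
    }

    // Returns the observed states of: first login, second login, late search.
    static Thread.State[] demo() {
        try {
            UserIndexContentionSketch index = new UserIndexContentionSketch();
            CountDownLatch holding = new CountDownLatch(1);
            CountDownLatch release = new CountDownLatch(1);

            Thread slowSearch = new Thread(() -> index.slowSearch(holding, release));
            slowSearch.start();
            holding.await();                                   // read lock is now held

            Thread firstLogin = new Thread(index::update);
            firstLogin.start();
            awaitState(firstLogin, Thread.State.WAITING);      // parked on the write lock

            Thread secondLogin = new Thread(index::update);
            secondLogin.start();
            awaitState(secondLogin, Thread.State.BLOCKED);     // stuck on the monitor

            Thread lateSearch = new Thread(index::quickSearch);
            lateSearch.start();
            awaitState(lateSearch, Thread.State.WAITING);      // queued behind the writer

            Thread.State[] observed = {
                firstLogin.getState(), secondLogin.getState(), lateSearch.getState()
            };

            release.countDown();                               // let the slow search finish
            slowSearch.join();
            firstLogin.join();
            secondLogin.join();
            lateSearch.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Thread.State state : demo()) {
            System.out.println(state);
        }
    }
}
```

In this sketch a single long-running reader is enough to stall every later login (on the monitor) and every later search (queued behind the waiting writer), which is the cascading effect reported in this issue.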

Steps to Reproduce

Details from the customer instance:

300k+ users
A very large number of users logging in simultaneously at the start of the day
Many actions attempting to @mention users at the same time

We have not attempted to reproduce this in-house.

Expected Results

Jira handles the start-of-the-day login pressure without hiccups

Actual Results

High load average (see attachment 2020-11-25_20-07-42.png).

BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag, counted per thread dump:

thread_dump_9.txt: 85
thread_dump_10.txt: 84
thread_dump_11.txt: 85
thread_dump_12.txt: 84
thread_dump_13.txt: 84
thread_dump_14.txt: 85

These threads are blocked by one WAITING thread, which needs to acquire a WriteLock at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
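
Per-dump counts like the ones above can be produced with a small script. The following is a minimal sketch, assuming the usual HotSpot thread dump layout in which each thread entry starts with a line beginning with a double quote; the class `BlockedThreadCounter`, its method, and the shortened `SAMPLE` dump are hypothetical and were not part of the original analysis.

```java
// Hypothetical helper: counts BLOCKED threads whose stack contains a given frame.
class BlockedThreadCounter {

    // Shortened sample built from the stack traces quoted in this issue.
    static final String SAMPLE =
        "\"http-nio-9080-exec-637\" #261926 daemon prio=5 waiting for monitor entry\n"
      + "   java.lang.Thread.State: BLOCKED (on object monitor)\n"
      + "\tat com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)\n"
      + "\tat com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)\n"
      + "\n"
      + "\"http-nio-9080-exec-611\" #256200 daemon prio=5 waiting on condition\n"
      + "   java.lang.Thread.State: WAITING (parking)\n"
      + "\tat com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)\n"
      + "\tat com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)\n";

    static int countBlockedAt(String dump, String frame) {
        int count = 0;
        boolean blocked = false;   // current thread entry is in state BLOCKED
        boolean atFrame = false;   // current thread entry contains the frame
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // A new thread entry begins: account for the previous one.
                if (blocked && atFrame) {
                    count++;
                }
                blocked = false;
                atFrame = false;
            } else if (line.contains("java.lang.Thread.State: BLOCKED")) {
                blocked = true;
            } else if (line.contains(frame)) {
                atFrame = true;
            }
        }
        if (blocked && atFrame) {
            count++;               // account for the last entry in the dump
        }
        return count;
    }

    public static void main(String[] args) {
        String frame = "DbCachingRemoteDirectory.updateUserAndSetActiveFlag";
        System.out.println(countBlockedAt(SAMPLE, frame)); // prints 1
    }
}
```

Only the first sample thread is counted: the second one also passes through `updateUserAndSetActiveFlag`, but it is WAITING on the write lock rather than BLOCKED on the monitor.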

Notes

Related thread dumps.

Threads blocked on a synchronized block during authentication:

"http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
	- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
...

Thread waiting for the WriteLock at DirectoryUserIndexer.refreshSearcher:

"http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
	- locked <0x0000000145c5e268> (a java.lang.Object)
	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
[...]

RUNNABLE thread holding the ReadLock and executing the search:

"http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
	- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
	at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
	at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
....
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
...

Note: DirectoryUserIndexer.internalSearch

Workaround

Since the problem is caused by a number of factors, we suggest a multi-factor approach to reduce it:

Decrease the number of users in the system (if possible) to reduce the cost of computing @mention results
Increase the number of CPU cores to reduce the time spent holding the lock
Identify bot user accounts creating unnatural login pressure
Increase the session timeout or remove bot-killer to reduce how often users need to log in/authenticate
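
To help identify accounts creating unusual login pressure, login events can be tallied per user. The following is a minimal sketch under stated assumptions: the class `LoginPressureSketch` and its input, a plain list of usernames with one entry per login event, are hypothetical. How those usernames are extracted from your access or security logs depends on your log configuration and is not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: ranks users by number of login events.
class LoginPressureSketch {

    // loginUsernames holds one entry per login event (assumed input format).
    static List<Map.Entry<String, Long>> topLoginUsers(List<String> loginUsernames, int limit) {
        Map<String, Long> counts = loginUsernames.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                // Highest count first; ties broken alphabetically by username.
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("svc-bot", "alice", "svc-bot", "bob", "svc-bot", "alice");
        for (Map.Entry<String, Long> entry : topLoginUsers(events, 2)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Accounts that log in far more often than ordinary users are candidates for review, for example integrations that authenticate on every request instead of reusing a session.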


2020-11-25_20-07-42.png (260 kB, 18/Feb/2021 8:11 AM)
2020-11-30_07-48-50.png (582 kB, 18/Feb/2021 8:20 AM)
2020-11-30_13-59-25.png (543 kB, 18/Feb/2021 8:21 AM)
2020-11-30_14-08-53.png (246 kB, 18/Feb/2021 8:23 AM)
2020-11-30_14-15-20.png (337 kB, 18/Feb/2021 8:23 AM)
2020-11-30_14-14-57.png (673 kB, 18/Feb/2021 8:23 AM)

is related to

JSWSERVER-20336 Searching and Mentioning users may cause performance issues and high CPU load

Closed

relates to: LOGIN-1, PSR-601

Maciej Swinarski (Inactive) made changes - 19/Jan/2022 3:32 PM

Description

New: {panel:title=Fix|bgColor=#FFFFCE}
The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0 we have removed the contention caused by user login action. See [~~JRASERVER-70468~~|https://jira.atlassian.com/browse/JRASERVER-70468] for more information.
{panel}

h3. Issue Summary
Mostly at the start of the day (when Jira is under heavy login pressure from users), the customer experienced severe performance degradation (to the point that the system looked unresponsive). Digging further into thread dumps, we found two sets of problems:
# A large number of threads waiting for a *ReadLock* (action - /rest/internal/2/user/mention/search)
# At the same time, a group of threads waiting for a *WriteLock* (action - /plugins/servlet/samlconsumer)
both on {{DirectoryUserIndexer.java}}.

This is caused by contention in the code and leads to a situation where many threads (waiting for a ReadLock) wait on one thread that is supposed to get a WriteLock to update the {{UserIndex}}, but cannot, because it in turn waits on another thread that currently holds a ReadLock on {{UserIndex}} and is running the search.
This causes cascading contention and degrades the user experience in Jira for other actions (see ~~JSWSERVER-20336~~).
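
The lock interaction described above can be reproduced outside Jira with a small sketch. It is not Jira's code: the class, method, and thread names are hypothetical, and the only assumption taken from this ticket is a non-fair {{ReentrantReadWriteLock}}, the lock type that appears in the thread dump in the Notes section. One reader holds the read lock (the long-running search), a writer queues behind it (the index update during login), and a second reader (another mention search) then tries to take the read lock with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch; none of these names come from Jira.
class ReadWriteContentionSketch {

    /**
     * Returns true if a newly arriving reader obtains the read lock within
     * 100 ms while a writer is parked behind a reader that still holds it.
     */
    static boolean newReaderAcquiresWhileWriterQueued() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        CountDownLatch searchHoldsReadLock = new CountDownLatch(1);
        CountDownLatch finishSearch = new CountDownLatch(1);

        // Stands in for the RUNNABLE search thread holding the ReadLock.
        Thread slowSearch = new Thread(() -> {
            lock.readLock().lock();
            try {
                searchHoldsReadLock.countDown();
                finishSearch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "slow-search");
        slowSearch.setDaemon(true);

        // Stands in for the login thread that needs the WriteLock.
        Thread indexUpdate = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        }, "index-update");
        indexUpdate.setDaemon(true);

        try {
            slowSearch.start();
            searchHoldsReadLock.await();
            indexUpdate.start();
            // Wait until the writer is parked in the lock's wait queue.
            while (indexUpdate.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            // A second search arrives and has to queue behind the waiting writer.
            boolean acquired = lock.readLock().tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            finishSearch.countDown(); // let the first search finish
            slowSearch.join();
            indexUpdate.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader acquired read lock: "
                + newReaderAcquiresWhileWriterQueued());
    }
}
```

On a standard JDK the second reader should time out, even though no thread holds the write lock at that moment: one slow search plus one queued index update is enough to stall every later search.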

h3. Steps to Reproduce
Details from the customer instance:
* 300k+ users
* A massive number of users logging in simultaneously at the start of the day
* Many actions attempting to {{@mention}} users at the same time

We have not attempted to reproduce this in-house.

h3. Expected Results
Jira handles the start-of-the-day login pressure without hiccups

h3. Actual Results
* High load average:
!2020-11-25_20-07-42.png|thumbnail!
* BLOCKED threads at {{com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag}}
{code}
thread_dump_9.txt
85
thread_dump_10.txt
84
thread_dump_11.txt
85
thread_dump_12.txt
84
thread_dump_13.txt
84
thread_dump_14.txt
85
{code}
* These threads are blocked by one WAITING thread which needs to get a WriteLock at {{com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)}}
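
The ticket does not say how the per-dump counts of BLOCKED threads listed above were obtained. As one possible approach, the sketch below assumes jstack-style dumps in which thread entries are separated by blank lines, and counts the entries that are BLOCKED and contain a given frame. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; assumes thread entries are separated by blank lines.
class BlockedThreadCounter {

    /** Counts thread entries that are BLOCKED and whose stack contains the given frame text. */
    static long countBlockedAt(String threadDump, String frame) {
        return Arrays.stream(threadDump.split("\\R\\s*\\R"))
                .filter(entry -> entry.contains("java.lang.Thread.State: BLOCKED"))
                .filter(entry -> entry.contains(frame))
                .count();
    }

    public static void main(String[] args) throws IOException {
        String frame = "DbCachingRemoteDirectory.updateUserAndSetActiveFlag";
        // Print each file name followed by its count, one per line.
        for (String file : args) {
            String dump = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
            System.out.println(file);
            System.out.println(countBlockedAt(dump, frame));
        }
    }
}
```

Running it with the dump files as arguments prints each file name followed by its count, which makes it easy to see whether the number of blocked threads stays flat across consecutive dumps.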

h3. Notes
Related thread dumps.
* *Synchronised* Threads waiting during authentication:
{noformat}
"http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
...
{noformat}
* Thread which waits for *WriteLock* at {{DirectoryUserIndexer.refreshSearcher}}
{noformat}
"http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
- locked <0x0000000145c5e268> (a java.lang.Object)
at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
[...]
{noformat}
* RUNNABLE thread holding the *ReadLock* and executing the search
{noformat}
"http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
....
at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
...
at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
...
{noformat}
** Note: {{DirectoryUserIndexer.internalSearch}}

h3. Workaround
Since the problem is caused by a number of factors, we suggest a multi-factor approach to mitigate it:
* Decrease the number of users in the system (if possible) to reduce the cost of computing {{@mention}}
* Increase the number of CPU cores to reduce the time spent holding the lock
* Identify bot user accounts creating unnatural login pressure
* Increase the session timeout or remove bot-killer to reduce how often users need to log in/authenticate

Maciej Swinarski (Inactive) made changes - 19/Jan/2022 2:01 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 611899 ]

Maciej Swinarski (Inactive) made changes - 19/Jan/2022 8:50 AM

Labels

Original: deltaUserCache deltaUserLogin performance performance-scalability pse-request

New: deltaUserCache deltaUserLogin fixedByDelta performance performance-scalability pse-request

Maciej Swinarski (Inactive) made changes - 17/Jan/2022 1:36 PM

Remote Link

New: This issue links to "Page (Atlassian Documentation)" [ 610533 ]

Assignee:: Maciej Swinarski (Inactive)

Reporter:: Suddha

Affected customers:: 4

Watchers:: 25

Created:: 18/Feb/2021 8:17 AM

Updated:: 17/Jan/2025 11:38 AM

Resolved:: 19/Oct/2021 1:54 PM
