JSWSERVER-20844 (Jira Software Data Center)

Severe performance degradation for user mentions and user login action due to contention in IndexedUserDao

      Fix

      The main fix for this problem was delivered in 8.19.1, which removed the contention caused by mentions. The contention caused by the user login action was removed in 8.22.0 and 8.20.8. See JRASERVER-70468 for more information.

      Issue Summary

      Mostly at the start of the day (when Jira is under heavy login pressure from users), the customer experienced severe performance degradation, to the point that the system looked unresponsive. The thread dumps showed two sets of problems:

      1. A large number of threads waiting for a ReadLock (action: /rest/internal/2/user/mention/search)
      2. At the same time, a group of threads waiting for a WriteLock (action: /plugins/servlet/samlconsumer)
        Both locks are in DirectoryUserIndexer.java.

      This is caused by lock contention in the code. Many threads waiting for a ReadLock queue behind one thread that needs a WriteLock to update the UserIndex, and that thread in turn cannot proceed because another thread currently holds a ReadLock on the UserIndex while it runs a search.
      The contention cascades and degrades the user experience for other actions in Jira (see JSWSERVER-20336).
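
      The queueing described above can be reproduced in isolation with a plain java.util.concurrent.locks.ReentrantReadWriteLock. The following is a minimal sketch, not Jira code: the class name, thread names, and timeout are hypothetical. It uses the default non-fair mode (the dump shows NonfairSync) and relies on OpenJDK's behaviour of making a new reader wait when a writer is at the head of the queue; the Javadoc leaves the non-fair acquisition order unspecified, so treat the result as implementation-dependent.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-alone demo of readers queueing behind a waiting writer. */
public class ReadWriteLockConvoyDemo {

    /** Returns {readerAcquiredWhileWriterQueued, readerAcquiredAfterWriterFinished}. */
    public static boolean[] run() {
        try {
            return runChecked();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static boolean[] runChecked() throws InterruptedException {
        // Default constructor: non-fair mode.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // 1. A long-running "search" (the calling thread) holds the read lock.
        lock.readLock().lock();

        // 2. A "login" thread asks for the write lock and parks behind the reader.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        }, "login-writer");
        writer.start();
        // Wait until the writer is really parked in the lock's wait queue.
        while (!(lock.hasQueuedThread(writer)
                && writer.getState() == Thread.State.WAITING)) {
            Thread.sleep(5);
        }

        // 3. A second "mention search" thread now asks for the read lock.
        boolean whileWriterQueued = tryReadFromNewThread(lock);

        // 4. The long search finishes; the writer gets in and out.
        lock.readLock().unlock();
        writer.join();

        boolean afterWriter = tryReadFromNewThread(lock);
        return new boolean[] {whileWriterQueued, afterWriter};
    }

    private static boolean tryReadFromNewThread(ReentrantReadWriteLock lock)
            throws InterruptedException {
        boolean[] acquired = new boolean[1];
        Thread reader = new Thread(() -> {
            try {
                // The timed tryLock follows the normal queueing rules;
                // the untimed tryLock() would barge past waiting threads.
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "mention-reader");
        reader.start();
        reader.join();
        return acquired[0];
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("reader acquired while writer queued: " + result[0]);
        System.out.println("reader acquired after writer done:   " + result[1]);
    }
}
```

      With OpenJDK's implementation the second reader is expected to time out while the writer is queued, even though only a read lock is held at that moment, and to succeed once the writer has finished. A single slow search can therefore stall every later search as soon as one login needs the write lock.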

      Steps to Reproduce

      Details from the customer instance:

      • 300k+ users
      • A massive number of users logging in simultaneously at the start of the day
      • Many actions attempting to @mention users at the same time

      We have not attempted to reproduce this in-house.

      Expected Results

      Jira handles the start-of-the-day login pressure without hiccups

      Actual Results

      • High load average (see attachment 2020-11-25_20-07-42.png)
      • BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag, counted per thread dump:
        thread_dump_9.txt     85
        thread_dump_10.txt    84
        thread_dump_11.txt    85
        thread_dump_12.txt    84
        thread_dump_13.txt    84
        thread_dump_14.txt    85

      • These threads are blocked by one WAITING thread which needs to get a WriteLock at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
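
      The report does not say how the per-dump counts were produced. As one possible approach, the sketch below counts thread entries in a HotSpot-style thread dump that are in a given state and contain a given stack frame. The class name and the command-line usage are hypothetical; it assumes thread entries are separated by blank lines, as in jstack output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: counts threads in a dump by state and stack frame. */
public class BlockedThreadCounter {

    /**
     * Counts thread entries whose state line starts with the given state
     * (for example "BLOCKED") and whose stack contains the given method.
     */
    public static int count(String dump, String state, String frame) {
        int matches = 0;
        boolean inState = false;
        boolean hasFrame = false;
        // Trailing blank lines guarantee that the last entry is flushed.
        for (String line : (dump + "\n\n").split("\r?\n", -1)) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current thread entry.
                if (inState && hasFrame) {
                    matches++;
                }
                inState = false;
                hasFrame = false;
            } else if (line.contains("java.lang.Thread.State: " + state)) {
                inState = true;
            } else if (line.trim().startsWith("at " + frame + "(")) {
                hasFrame = true;
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        String frame = "com.atlassian.crowd.directory."
                + "DbCachingRemoteDirectory.updateUserAndSetActiveFlag";
        // Each argument is the path of one thread dump file.
        for (String file : args) {
            String dump = new String(
                    Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
            System.out.println(file + "\t" + count(dump, "BLOCKED", frame));
        }
    }
}
```

      Run against a series of dumps taken a few seconds apart, a count that stays roughly constant (as it does here at 84 to 85) suggests the same threads are stuck rather than briefly passing through the lock.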

      Notes

      Related thread dumps.

      • Threads waiting on the synchronized block during authentication:
        "http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
           java.lang.Thread.State: BLOCKED (on object monitor)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
        	- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        ...
        
      • Thread waiting for the WriteLock at DirectoryUserIndexer.refreshSearcher:
        "http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
        	- locked <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        [...]
        
      • RUNNABLE thread holding the ReadLock and executing the search:
        "http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
           java.lang.Thread.State: RUNNABLE
        	at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
        	- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
        	at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
        	at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
        ....
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
        ...
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
        ...
        
        • Note: DirectoryUserIndexer.internalSearch

      Workaround

      Since the problem is caused by a combination of factors, we suggest a multi-pronged approach to reduce it:

      • Decrease the number of users in the system (if possible) to reduce the cost of computing @mention results
      • Increase the number of CPU cores to reduce the time spent holding the lock
      • Identify bot user accounts that create unnatural login pressure
      • Increase the session timeout or remove bot-killer to reduce how often users need to log in/authenticate


            * RUNNABLE thread holding the *ReadLock* and executing the search
            {noformat}
            "http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
               java.lang.Thread.State: RUNNABLE
            at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
            - locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
            at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
            at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
            ....
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
            ...
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
            ...
            {noformat}
            ** Note: {{DirectoryUserIndexer.internalSearch}}

            h3. Workaround
            Since the problem is caused by a number of factors, we suggest a multi-factor approach to reduce it:
            * Decrease the number of users in the system (if possible) to reduce the cost of computing {{@mention}}
            * Increase the number of CPU cores to reduce the time spent holding the lock
            * Identify bot user accounts creating unnatural login pressure
            * Increase the session timeout or remove bot-killer to reduce how often users need to log in/authenticate
            New: {panel:title=Fix|bgColor=#FFFFCE}
            The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0 we have removed the contention caused by user login action. See [JRASERVER-70468|https://jira.atlassian.com/browse/JRASERVER-70468] for more information.
            {panel}

            h3. Issue Summary
            Mostly during the start of the day (when Jira has heavy login pressure from users), a customer experienced severe performance degradation (up to the point that the system looks unresponsive). Upon further digging into thread dumps, we found 2 sets of problems:
             # A large number of threads waiting for a *ReadLock* (action - /rest/internal/2/user/mention/search)
             # At the same time, a group of threads waiting for a *WriteLock* (action - /plugins/servlet/samlconsumer)
             both on {{DirectoryUserIndexer.java}}.

            This is caused by lock contention: many threads waiting for a ReadLock are queued behind one thread that needs a WriteLock to update the {{UserIndex}}, and that thread in turn has to wait for another thread that currently holds a ReadLock on {{UserIndex}} while it runs a search.
            This causes cascading contention and degrades user experience in Jira for other actions (see JSWSERVER-20336).
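The queueing described above can be reproduced outside Jira with a plain {{ReentrantReadWriteLock}}. The following is a minimal sketch, not Jira code: the class and method names are hypothetical, and it relies on the behaviour of OpenJDK's non-fair lock implementation, where a new reader waits when a writer is first in the queue (the Javadoc leaves non-fair ordering unspecified). One thread holds the read lock (the slow search), one thread queues for the write lock (the index update), and a third reader then fails to get the read lock within its timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo class; it only mimics the locking pattern seen in the dumps. */
final class ReadWriteLockConvoySketch {

    /**
     * Returns true if a new reader could NOT acquire the read lock while
     * another reader holds it and a writer is already queued.
     */
    static boolean newReaderBlockedBehindQueuedWriter() {
        // Default constructor gives the non-fair variant (NonfairSync in the dumps).
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch releaseRead = new CountDownLatch(1);

        // Stands in for the long-running search that holds the read lock.
        Thread searcher = new Thread(() -> {
            lock.readLock().lock();
            try {
                readHeld.countDown();
                releaseRead.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });

        // Stands in for the index update that needs the write lock.
        Thread updater = new Thread(() -> {
            lock.writeLock().lock(); // parks until the searcher releases the read lock
            lock.writeLock().unlock();
        });

        try {
            searcher.start();
            readHeld.await();
            updater.start();
            // Wait until the writer is actually parked in the lock's queue.
            while (!lock.hasQueuedThread(updater)) {
                Thread.sleep(5);
            }
            // A second reader (e.g. another mention search) now tries to read.
            boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseRead.countDown(); // let the searcher finish so everything unwinds
            joinQuietly(searcher);
            joinQuietly(updater);
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader blocked: " + newReaderBlockedBehindQueuedWriter());
    }
}
```

The timed {{tryLock}} is used only so the demo terminates; a plain {{lock()}} call, as in the dumps, would simply park until the writer has gone through.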

            h3. Steps to Reproduce
            Details from the customer instance:
            * 300k+ users
            * A massive number of users logging in simultaneously at the start of the day
            * Many actions attempting to {{@mention}} users at the same time

            Have not attempted to repro in-house.

            h3. Expected Results
            Jira handles the start-of-the-day login pressure without hiccups

            h3. Actual Results
            * High load average:
            !2020-11-25_20-07-42.png|thumbnail!
            * BLOCKED threads at {{com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag}}
            {code}
            thread_dump_9.txt   85
            thread_dump_10.txt  84
            thread_dump_11.txt  85
            thread_dump_12.txt  84
            thread_dump_13.txt  84
            thread_dump_14.txt  85
            {code}
            * These threads are blocked by one WAITING thread which needs to get a WriteLock at {{com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)}}
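Per-dump counts like the ones above can be obtained with a small parser. How the numbers in this ticket were actually produced is not stated, so the following is only a sketch under assumptions: the class name is hypothetical, and it assumes the HotSpot dump layout shown in the Notes section (a thread header line starting with a double quote, followed by a {{java.lang.Thread.State:}} line and {{at ...}} frames).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper; counts BLOCKED threads whose stack contains a given method. */
final class BlockedThreadCounter {

    /**
     * @param dumpLines lines of one thread dump file
     * @param method    fully qualified method name, without the "(File.java:NN)" suffix
     */
    static int countBlockedAt(List<String> dumpLines, String method) {
        int count = 0;
        boolean blocked = false;
        boolean hasFrame = false;
        for (String raw : dumpLines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A new thread header: settle the previous thread first.
                if (blocked && hasFrame) {
                    count++;
                }
                blocked = false;
                hasFrame = false;
            } else if (line.startsWith("java.lang.Thread.State: BLOCKED")) {
                blocked = true;
            } else if (line.startsWith("at " + method + "(")) {
                hasFrame = true;
            }
        }
        // Settle the last thread in the file.
        if (blocked && hasFrame) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String method =
            "com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag";
        for (String file : args) {
            List<String> lines = Files.readAllLines(Paths.get(file));
            System.out.println(file + "  " + countBlockedAt(lines, method));
        }
    }
}
```

It would be run with the dump file names as arguments, for example {{java BlockedThreadCounter thread_dump_9.txt thread_dump_10.txt}}.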

            h3. Notes
            Related thread dumps.
            * *Synchronised* Threads waiting during authentication:
            {noformat}
            "http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
               java.lang.Thread.State: BLOCKED (on object monitor)
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
            - waiting to lock <0x0000000145c5e268> (a java.lang.Object)
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
            at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
            at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
            at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
            at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
            at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
            at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
            ...
            {noformat}
            * Thread which waits for *WriteLock* at {{DirectoryUserIndexer.refreshSearcher}}
            {noformat}
            "http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
               java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
            at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
            - locked <0x0000000145c5e268> (a java.lang.Object)
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
            at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
            at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
            at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
            at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
            at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
            at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
            at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
            [...]
            {noformat}
            * RUNNABLE thread holding the *ReadLock* and executing the search
            {noformat}
            "http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
               java.lang.Thread.State: RUNNABLE
            at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
            - locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
            at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
            at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
            ....
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
            at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
            ...
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
            at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
            ...
            {noformat}
            ** Note: {{DirectoryUserIndexer.internalSearch}}
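
The three dumps above form a chain: one thread holds a monitor and parks on the WriteLock, the other login threads are BLOCKED on that monitor, and the WriteLock is unavailable while a search holds the ReadLock. Below is a minimal sketch of that shape. It is not the {{IndexedUserDao}} source; all names are hypothetical and the nesting (write lock taken inside a synchronized block) is inferred only from the {{- locked <0x0000000145c5e268>}} line in the second dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo class mimicking the monitor-then-write-lock nesting in the dumps. */
final class MonitorBehindWriteLockSketch {
    private final Object updateMonitor = new Object();
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

    /** Stands in for an update: take the monitor, then the write lock. */
    void update() {
        synchronized (updateMonitor) {
            indexLock.writeLock().lock();
            try {
                // the index refresh would happen here
            } finally {
                indexLock.writeLock().unlock();
            }
        }
    }

    /** Returns the observed states of the first and second updating threads. */
    static Thread.State[] observeStates() {
        MonitorBehindWriteLockSketch dao = new MonitorBehindWriteLockSketch();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch releaseRead = new CountDownLatch(1);

        // Stands in for the RUNNABLE search thread holding the read lock.
        Thread searcher = new Thread(() -> {
            dao.indexLock.readLock().lock();
            try {
                readHeld.countDown();
                releaseRead.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                dao.indexLock.readLock().unlock();
            }
        });
        Thread firstLogin = new Thread(dao::update);
        Thread secondLogin = new Thread(dao::update);

        try {
            searcher.start();
            readHeld.await();
            firstLogin.start();
            awaitState(firstLogin, Thread.State.WAITING);  // parked on the write lock
            secondLogin.start();
            awaitState(secondLogin, Thread.State.BLOCKED); // stuck on the monitor
            return new Thread.State[] {firstLogin.getState(), secondLogin.getState()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseRead.countDown(); // release the read lock so all threads can finish
        }
    }

    private static void awaitState(Thread t, Thread.State expected) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != expected) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException(t.getName() + " never reached " + expected);
            }
            Thread.sleep(5);
        }
    }

    public static void main(String[] args) {
        Thread.State[] states = observeStates();
        System.out.println("first updater: " + states[0] + ", second updater: " + states[1]);
    }
}
```

Every additional login thread calling {{update()}} would join the second updater in the BLOCKED state, which matches the 84-85 BLOCKED threads counted per dump.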

            h3. Workaround
            Since the problem is caused by a number of factors, we suggest a multi-factor approach to reduce it:
            * Decrease the number of users in the system (if possible) to reduce the cost of computing {{@mention}}
            * Increase the number of CPU cores to reduce the time spent holding the lock
            * Identify bot user accounts creating unnatural login pressure
            * Increase the session timeout or remove bot-killer to reduce how often users need to log in/authenticate
            Maciej Swinarski (Inactive) made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 611899 ]
            Maciej Swinarski (Inactive) made changes -
            Labels Original: deltaUserCache deltaUserLogin performance performance-scalability pse-request New: deltaUserCache deltaUserLogin fixedByDelta performance performance-scalability pse-request
            Maciej Swinarski (Inactive) made changes -
            Remote Link New: This issue links to "Page (Atlassian Documentation)" [ 610533 ]

              mswinarski Maciej Swinarski (Inactive)
              smitra2@atlassian.com Suddha