Severe performance degradation for user mentions and user login action due to contention in IndexedUserDao

XMLWordPrintable

    • 7.06
    • 15
    • Severity 2 - Major
    • 30

      Fix

      The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0/8.20.8 we have removed the contention caused by user login action. See JRASERVER-70468 for more information.

      Issue Summary

      Mostly during the start of the day (when Jira has heavy login pressure from users), customer experienced a severe performance degradation (up to the point that system looks unresponsive). Upon further digging into thread dumps, we found 2 sets of problems:

      1. A large number of threads waiting for ReadLock (action - /rest/internal/2/user/mention/search)
      2. At the same time, a group of thread waiting for a WriteLock (action - /plugins/servlet/samlconsumer)
        both on DirectoryUserIndexer.java.

      This is caused by contention in the code and leads to situation, where a lot of threads (waiting for a ReadLock) are waiting on one thread that's supposed to get a WriteLock to update the UserIndex, but can't as it also waits on another thread which is currently having a ReadLock on UserIndex and doing the search.
      This causes cascading contention and degrades user experience in Jira for other actions (see JSWSERVER-20336).

      Steps to Reproduce

      Details from the customer instance:

      • 300k+ user
      • Massive amount of users logging in simultaneously at the start of the day
      • Lot of action attempting to do @mention users at the same time

      Have not attempted to repro in-house.

      Expected Results

      Jira handles the start-of-the-day login pressure without hiccups

      Actual Results

      • High load average:
      • BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag
        thread_dump_9.txt
        85
        thread_dump_10.txt
        84
        thread_dump_11.txt
        85
        thread_dump_12.txt
        84
        thread_dump_13.txt
        84
        thread_dump_14.txt
        85
        
      • These threads are blocked by one WAITING thread which needs to get a WriteLock at at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)

      Notes

      Related thread dumps.

      • Synchronised Threads waiting during authentication:
        "http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
           java.lang.Thread.State: BLOCKED (on object monitor)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
        	- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        ...
        
      • Thread which waits for WriteLock at DirectoryUserIndexer.refreshSearcher
        "http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
        	- locked <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        [...]
        
      • RUNNABLE thread holding the ReadLock and executing the seach
        "http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
           java.lang.Thread.State: RUNNABLE
        	at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
        	- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
        	at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
        	at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
        ....
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
        ...
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
        ...
        
        • Note: DirectoryUserIndexer.internalSearch

      Workaround

      Since problem is caused by a number of factors, we suggest to have multi-factor approach to reduce it:

      • Descrese number of users in the system (if possible), to reduce the cost of computing @mention
      • Increase CPU cores to improve the time spent in the lock
      • Identify bot user accounts creating unnatural login pressure
      • Increase session timeout or remove bot-killer to reduce frequency of users needing to login/authenticate

        1. 2020-11-30_14-15-20.png
          2020-11-30_14-15-20.png
          337 kB
        2. 2020-11-30_14-14-57.png
          2020-11-30_14-14-57.png
          673 kB
        3. 2020-11-30_14-08-53.png
          2020-11-30_14-08-53.png
          246 kB
        4. 2020-11-30_13-59-25.png
          2020-11-30_13-59-25.png
          543 kB
        5. 2020-11-30_07-48-50.png
          2020-11-30_07-48-50.png
          582 kB
        6. 2020-11-25_20-07-42.png
          2020-11-25_20-07-42.png
          260 kB

            Assignee:
            Maciej Swinarski (Inactive)
            Reporter:
            Suddha
            Votes:
            4 Vote for this issue
            Watchers:
            25 Start watching this issue

              Created:
              Updated:
              Resolved: