Uploaded image for project: 'Jira Software Server and Data Center'
  1. Jira Software Server and Data Center
  2. JSWSERVER-20844

Severe performance degradation for user mentions and user login action due to contention in IndexedUserDao

    XMLWordPrintable

Details

    Description

      Fix

      The main fix for this problem was delivered in 8.19.1 where we got rid of the contention caused by mentions. Note that in 8.22.0/8.20.8 we have removed the contention caused by user login action. See JRASERVER-70468 for more information.

      Issue Summary

      Mostly during the start of the day (when Jira has heavy login pressure from users), customer experienced a severe performance degradation (up to the point that system looks unresponsive). Upon further digging into thread dumps, we found 2 sets of problems:

      1. A large number of threads waiting for ReadLock (action - /rest/internal/2/user/mention/search)
      2. At the same time, a group of thread waiting for a WriteLock (action - /plugins/servlet/samlconsumer)
        both on DirectoryUserIndexer.java.

      This is caused by contention in the code and leads to situation, where a lot of threads (waiting for a ReadLock) are waiting on one thread that's supposed to get a WriteLock to update the UserIndex, but can't as it also waits on another thread which is currently having a ReadLock on UserIndex and doing the search.
      This causes cascading contention and degrades user experience in Jira for other actions (see JSWSERVER-20336).

      Steps to Reproduce

      Details from the customer instance:

      • 300k+ user
      • Massive amount of users logging in simultaneously at the start of the day
      • Lot of action attempting to do @mention users at the same time

      Have not attempted to repro in-house.

      Expected Results

      Jira handles the start-of-the-day login pressure without hiccups

      Actual Results

      • High load average:
      • BLOCKED threads at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag
        thread_dump_9.txt
        85
        thread_dump_10.txt
        84
        thread_dump_11.txt
        85
        thread_dump_12.txt
        84
        thread_dump_13.txt
        84
        thread_dump_14.txt
        85
        
      • These threads are blocked by one WAITING thread which needs to get a WriteLock at at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)

      Notes

      Related thread dumps.

      • Synchronised Threads waiting during authentication:
        "http-nio-9080-exec-637" #261926 daemon prio=5 os_prio=0 tid=0x00007fb344586ef0 nid=0x5df0 waiting for monitor entry [0x00007fb28d24b000]
           java.lang.Thread.State: BLOCKED (on object monitor)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:318)
        	- waiting to lock <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        ...
        
      • Thread which waits for WriteLock at DirectoryUserIndexer.refreshSearcher
        "http-nio-9080-exec-611" #256200 daemon prio=5 os_prio=0 tid=0x00007fb3503bfaf0 nid=0xe398 waiting on condition [0x00007fb28ce47000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x0000000145c5e1e8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.refreshSearcher(DirectoryUserIndexer.java:127)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.index(DirectoryUserIndexer.java:159)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.update(IndexedUserDao.java:320)
        	- locked <0x0000000145c5e268> (a java.lang.Object)
        	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.update(DelegatingUserDao.java:98)
        	at com.atlassian.jira.crowd.embedded.ofbiz.SwitchingUserDao.update(SwitchingUserDao.java:30)
        	at com.atlassian.crowd.directory.CachingDirectory.updateUser(CachingDirectory.java:142)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserAndSetActiveFlag(DbCachingRemoteDirectory.java:324)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.updateUserFromRemoteDirectory(DbCachingRemoteDirectory.java:266)
        	at com.atlassian.crowd.directory.RemoteDirectory.userAuthenticated(RemoteDirectory.java:592)
        	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.userAuthenticated(DbCachingRemoteDirectory.java:281)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.userAuthenticated(DirectoryManagerGeneric.java:278)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.userAuthenticated(ApplicationServiceGeneric.java:2357)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.userAuthenticated(CrowdServiceImpl.java:109)
        [...]
        
      • RUNNABLE thread holding the ReadLock and executing the seach
        "http-nio-9080-exec-635" #261922 daemon prio=5 os_prio=0 tid=0x00007fb35030d5f0 nid=0x5dec runnable [0x00007fb28e3ab000]
           java.lang.Thread.State: RUNNABLE
        	at org.apache.lucene.store.RAMFile.numBuffers(RAMFile.java:68)
        	- locked <0x00000001d3000040> (a org.apache.lucene.store.RAMFile)
        	at org.apache.lucene.store.RAMInputStream.setCurrentBuffer(RAMInputStream.java:125)
        	at org.apache.lucene.store.RAMInputStream.nextBuffer(RAMInputStream.java:119)
        ....
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.internalSearch(DirectoryUserIndexer.java:262)
        	at com.atlassian.jira.bc.user.search.DirectoryUserIndexer.search(DirectoryUserIndexer.java:228)
        ...
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.trySearching(IndexedUserDao.java:426)
        	at com.atlassian.jira.crowd.embedded.ofbiz.IndexedUserDao.search(IndexedUserDao.java:417)
        ...
        
        • Note: DirectoryUserIndexer.internalSearch

      Workaround

      Since problem is caused by a number of factors, we suggest to have multi-factor approach to reduce it:

      • Descrese number of users in the system (if possible), to reduce the cost of computing @mention
      • Increase CPU cores to improve the time spent in the lock
      • Identify bot user accounts creating unnatural login pressure
      • Increase session timeout or remove bot-killer to reduce frequency of users needing to login/authenticate

      Attachments

        1. 2020-11-25_20-07-42.png
          260 kB
          Suddha
        2. 2020-11-30_07-48-50.png
          582 kB
          Suddha
        3. 2020-11-30_13-59-25.png
          543 kB
          Suddha
        4. 2020-11-30_14-08-53.png
          246 kB
          Suddha
        5. 2020-11-30_14-14-57.png
          673 kB
          Suddha
        6. 2020-11-30_14-15-20.png
          337 kB
          Suddha

        Issue Links

          Activity

            People

              mswinarski Maciej Swinarski
              smitra2@atlassian.com Suddha
              Votes:
              4 Vote for this issue
              Watchers:
              23 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: