Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-64384

ActiveUsersCountForLicense cache is populated inefficiently leading to performance degradation across the application

      Summary

      JIRA populates activeUsersCountForLicense cache in inefficient way

      Environment

      • Large number of users: 10k+
        // SQL to get all users
        select directory_id,count(1) from cwd_user group by directory_id;    
        //SQL to get active users
        select directory_id,count(1) from cwd_user where active=1 group by directory_id;      
        

      Steps to Reproduce

      1. TBD
      2. Enable debug logging for com.atlassian.jira.application.DefaultApplicationRoleManager

      Expected Results

      JIRA populates the cache fast.

      Actual Results

      Cache population is slow, majority of the incoming http threads are waiting for getUserCount method (activeUsersCountForLicense cache) to be populated.

      Enabling debug logging will show cache population time:

      2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] JIRA Software count: 4800
      2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] Count application users: 7.938 s JIRA Software
      

      Notes

      Note that JIRA with license JIRA Software (Server) Unlimited Users is not affected, as license check is always true.
      (License is checked in 7.2.9 - JRASERVER-65462)

      Threads waiting for getUserCount method:

       grep -A 20 'http-nio' jira_threads.1488804924.txt | grep -A 19 'java.lang.Thread.State: WAITING' | grep -c 'com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount'
      73
      

      Example of the waiting thread (one of 73):

      "http-nio-8080-exec-756" #365303 daemon prio=5 os_prio=0 tid=0x00007fda1801e000 nid=0x3664 waiting on condition [0x00007fde3abd3000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000055dce6220> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      ..
      	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
      	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
      	at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
      	at com.atlassian.jira.application.DefaultApplicationRoleManager.isRoleLimitExceeded(DefaultApplicationRoleManager.java:258)
      	at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingExactErrors(ApplicationAuthorizationServiceImpl.java:135)
      	at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingAccessErrors(ApplicationAuthorizationServiceImpl.java:78)
      ...
      	at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.getSoftwareLicensingAccessErrors(SoftwareAuthorizationServiceImpl.java:76)
      	at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.verifySoftwareLicense(SoftwareAuthorizationServiceImpl.java:50)
      	at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.verify(GreenHopperLicenseManagerImpl.java:73)
      	at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.isValid(GreenHopperLicenseManagerImpl.java:94)
      	at com.atlassian.greenhopper.license.LicenseServiceImpl.getLicenseInfo(LicenseServiceImpl.java:46)
      	at com.atlassian.greenhopper.license.LicenseServiceImpl.isLicensed(LicenseServiceImpl.java:66)
      ...
      

      This bug can be amplificated by JRA-64839. JIRA outage will be caused by combination of 2 factors:

      • Too many cache flush events, see JRA-64839
        • caused by allowing some user Application access
      • Slow cache population, current ticket
      • Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min)
        • one will flush
          "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
             java.lang.Thread.State: WAITING (parking)
          ...
          	at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
          ...
          	at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
          
        • another will populate:
          "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
             java.lang.Thread.State: RUNNABLE
          ..
          	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
          ...
          	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
          	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
          	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
          	at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
          

      Workaround

      If possible reduce total number of users not used in JIRA (delete user or reduce LDAP sync scope)
      JIRA uses following SQL to load users in processUsers method:

      SELECT U.ID, U.directory_id, U.user_name, U.lower_user_name, U.active, U.created_date, U.updated_date, U.first_name, U.lower_first_name, U.last_name, U.lower_last_name, U.display_name, U.lower_display_name, U.email_address, U.lower_email_address, U.CREDENTIAL, U.deleted_externally, U.EXTERNAL_ID, D.active, D.directory_position FROM cwd_user U INNER JOIN cwd_directory D ON U.directory_id = D.ID WHERE D.active <> 0 ORDER BY U.lower_user_name, D.directory_position;
      

      Note on Fix

      We partially mitigated the problem by implementing JRASERVER-65352, this introduces timeout to the cache population operation and optional async loading.
      We will monitor the problem and would review it again if it gets more traction.
       
       

            [JRASERVER-64384] ActiveUsersCountForLicense cache is populated inefficiently leading to performance degradation across the application

            Thiago Masutti made changes -
            Remote Link Original: This issue links to "RUM-1767 (JIRA Server)" [ 286005 ] New: This issue links to "RUM-1767 (JIRA Server (Bulldog))" [ 286005 ]
            Carlos Moya (Inactive) made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 601856 ]
            Roman Kolosovskiy (Inactive) made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 601959 ]
            Maciej Swinarski (Inactive) made changes -
            Labels Original: affects-server performance-scalability pse-request New: affects-server deltaUserLicense performance-scalability pse-request
            Andriy Yakovlev [Atlassian] made changes -
            Link New: This issue relates to JRASERVER-73032 [ JRASERVER-73032 ]
            Deniz Secilir made changes -
            Description Original: h3. Summary

            JIRA populates *activeUsersCountForLicense* cache in inefficient way
            h3. Environment
             * Large number of users: 10k+
            {code:java}
            // SQL to get all users
            select directory_id,count(1) from cwd_user group by directory_id;
            //SQL to get active users
            select directory_id,count(1) from cwd_user where active=1 group by directory_id;
            {code}

            h3. Steps to Reproduce
             # TBD
             # Enable debug logging for _com.atlassian.jira.application.DefaultApplicationRoleManager_

            h3. Expected Results

            JIRA populates the cache fast.
            h3. Actual Results

            Cache population is slow, majority of the incoming http threads are waiting for {{getUserCount}} method (_activeUsersCountForLicense_ cache) to be populated.

            Enabling debug logging will show cache population time:
            {noformat}
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] JIRA Software count: 4800
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] Count application users: 7.938 s JIRA Software
            {noformat}
            h3. Notes

            Note that JIRA with license _JIRA Software (Server) Unlimited Users_ is not affected, as license check is always true.
             (License is checked in 7.2.9 - JRASERVER-65462)

            Threads waiting for {{getUserCount}} method:
            {noformat}
             grep -A 20 'http-nio' jira_threads.1488804924.txt | grep -A 19 'java.lang.Thread.State: WAITING' | grep -c 'com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount'
            73
            {noformat}
            Example of the waiting thread (one of 73):
            {code:java}
            "http-nio-8080-exec-756" #365303 daemon prio=5 os_prio=0 tid=0x00007fda1801e000 nid=0x3664 waiting on condition [0x00007fde3abd3000]
               java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x000000055dce6220> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            ..
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.isRoleLimitExceeded(DefaultApplicationRoleManager.java:258)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingExactErrors(ApplicationAuthorizationServiceImpl.java:135)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingAccessErrors(ApplicationAuthorizationServiceImpl.java:78)
            ...
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.getSoftwareLicensingAccessErrors(SoftwareAuthorizationServiceImpl.java:76)
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.verifySoftwareLicense(SoftwareAuthorizationServiceImpl.java:50)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.verify(GreenHopperLicenseManagerImpl.java:73)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.isValid(GreenHopperLicenseManagerImpl.java:94)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.getLicenseInfo(LicenseServiceImpl.java:46)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.isLicensed(LicenseServiceImpl.java:66)
            ...
            {code}
            This bug can be amplificated by JRA-64839. JIRA outage will be caused by combination of 2 factors:
             * Too many cache flush events, see JRA-64839
             ** caused by allowing some user Application access
             * Slow cache population, current ticket
             * Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min)
             ** one will flush
            {code:java}
            "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
               java.lang.Thread.State: WAITING (parking)
            ...
            at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
            ...
            at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
            {code}

             ** another will populate:
            {code:java}
            "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
               java.lang.Thread.State: RUNNABLE
            ..
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
            ...
            at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            {code}

            h3. Workaround

            If possible reduce total number of users not used in JIRA (delete user or reduce LDAP sync scope)
             JIRA uses following SQL to load users in {{processUsers}} method:
            {code:java}
            SELECT U.ID, U.directory_id, U.user_name, U.lower_user_name, U.active, U.created_date, U.updated_date, U.first_name, U.lower_first_name, U.last_name, U.lower_last_name, U.display_name, U.lower_display_name, U.email_address, U.lower_email_address, U.CREDENTIAL, U.deleted_externally, U.EXTERNAL_ID, D.active, D.directory_position FROM cwd_user U INNER JOIN cwd_directory D ON U.directory_id = D.ID WHERE D.active <> 0 ORDER BY U.lower_user_name, D.directory_position;
            {code}
            h3. Note on Fix

            We partially mitigated the problem by implementing JRASERVER-65352, this introduces timeout to the cache population operation and optional async loading.
             We will monitor the problem and would review it again if it gets more traction.
             
            New: h3. Summary

            JIRA populates *activeUsersCountForLicense* cache in inefficient way
            h3. Environment
             * Large number of users: 10k+
            {code:java}
            // SQL to get all users
            select directory_id,count(1) from cwd_user group by directory_id;
            //SQL to get active users
            select directory_id,count(1) from cwd_user where active=1 group by directory_id;
            {code}

            h3. Steps to Reproduce
             # TBD
             # Enable debug logging for _com.atlassian.jira.application.DefaultApplicationRoleManager_

            h3. Expected Results

            JIRA populates the cache fast.
            h3. Actual Results

            Cache population is slow, majority of the incoming http threads are waiting for {{getUserCount}} method (_activeUsersCountForLicense_ cache) to be populated.

            Enabling debug logging will show cache population time:
            {noformat}
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] JIRA Software count: 4800
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] Count application users: 7.938 s JIRA Software
            {noformat}
            h3. Notes

            Note that JIRA with license _JIRA Software (Server) Unlimited Users_ is not affected, as license check is always true.
             (License is checked in 7.2.9 - JRASERVER-65462)

            Threads waiting for {{getUserCount}} method:
            {noformat}
             grep -A 20 'http-nio' jira_threads.1488804924.txt | grep -A 19 'java.lang.Thread.State: WAITING' | grep -c 'com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount'
            73
            {noformat}
            Example of the waiting thread (one of 73):
            {code:java}
            "http-nio-8080-exec-756" #365303 daemon prio=5 os_prio=0 tid=0x00007fda1801e000 nid=0x3664 waiting on condition [0x00007fde3abd3000]
               java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x000000055dce6220> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            ..
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.isRoleLimitExceeded(DefaultApplicationRoleManager.java:258)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingExactErrors(ApplicationAuthorizationServiceImpl.java:135)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingAccessErrors(ApplicationAuthorizationServiceImpl.java:78)
            ...
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.getSoftwareLicensingAccessErrors(SoftwareAuthorizationServiceImpl.java:76)
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.verifySoftwareLicense(SoftwareAuthorizationServiceImpl.java:50)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.verify(GreenHopperLicenseManagerImpl.java:73)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.isValid(GreenHopperLicenseManagerImpl.java:94)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.getLicenseInfo(LicenseServiceImpl.java:46)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.isLicensed(LicenseServiceImpl.java:66)
            ...
            {code}
            This bug can be amplificated by JRA-64839. JIRA outage will be caused by combination of 2 factors:
             * Too many cache flush events, see JRA-64839
             ** caused by allowing some user Application access
             * Slow cache population, current ticket
             * Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min)
             ** one will flush
            {code:java}
            "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
               java.lang.Thread.State: WAITING (parking)
            ...
            at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
            ...
            at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
            {code}

             *
             ** another will populate:
            {code:java}
            "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
               java.lang.Thread.State: RUNNABLE
            ..
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
            ...
            at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            {code}

            h3. Workaround

            If possible reduce total number of users not used in JIRA (delete user or reduce LDAP sync scope)
             JIRA uses following SQL to load users in {{processUsers}} method:
            {code:java}
            SELECT U.ID, U.directory_id, U.user_name, U.lower_user_name, U.active, U.created_date, U.updated_date, U.first_name, U.lower_first_name, U.last_name, U.lower_last_name, U.display_name, U.lower_display_name, U.email_address, U.lower_email_address, U.CREDENTIAL, U.deleted_externally, U.EXTERNAL_ID, D.active, D.directory_position FROM cwd_user U INNER JOIN cwd_directory D ON U.directory_id = D.ID WHERE D.active <> 0 ORDER BY U.lower_user_name, D.directory_position;
            {code}
            h3. Note on Fix

            We partially mitigated the problem by implementing JRASERVER-65352, this introduces timeout to the cache population operation and optional async loading.
             We will monitor the problem and would review it again if it gets more traction.
              
             
            Deniz Secilir made changes -
            Description Original: h3. Summary
            JIRA populates *activeUsersCountForLicense* cache in inefficient way

            h3. Environment
            * Large number of users: 10k+
            {code:borderStyle=solid}
            // SQL to get all users
            select directory_id,count(1) from cwd_user group by directory_id;
            //SQL to get active users
            select directory_id,count(1) from cwd_user where active=1 group by directory_id;
            {code}

            h3. Steps to Reproduce
            # TBD
            # Enable debug logging for _com.atlassian.jira.application.DefaultApplicationRoleManager_

            h3. Expected Results
            JIRA populates the cache fast.

            h3. Actual Results
            Cache population is slow, majority of the incoming http threads are waiting for {{getUserCount}} method (_activeUsersCountForLicense_ cache) to be populated.

            Enabling debug logging will show cache population time:
            {noformat}
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] JIRA Software count: 4800
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] Count application users: 7.938 s JIRA Software
            {noformat}
             
            h3. Notes
            Note that JIRA with license _JIRA Software (Server) Unlimited Users_ is not affected, as license check is always true.
            (License is checked in 7.2.9 - JRASERVER-65462)

            Threads waiting for {{getUserCount}} method:
            {noformat}
             grep -A 20 'http-nio' jira_threads.1488804924.txt | grep -A 19 'java.lang.Thread.State: WAITING' | grep -c 'com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount'
            73
            {noformat}

            Example of the waiting thread (one of 73):
            {code:borderStyle=solid}
            "http-nio-8080-exec-756" #365303 daemon prio=5 os_prio=0 tid=0x00007fda1801e000 nid=0x3664 waiting on condition [0x00007fde3abd3000]
               java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x000000055dce6220> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            ..
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.isRoleLimitExceeded(DefaultApplicationRoleManager.java:258)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingExactErrors(ApplicationAuthorizationServiceImpl.java:135)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingAccessErrors(ApplicationAuthorizationServiceImpl.java:78)
            ...
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.getSoftwareLicensingAccessErrors(SoftwareAuthorizationServiceImpl.java:76)
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.verifySoftwareLicense(SoftwareAuthorizationServiceImpl.java:50)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.verify(GreenHopperLicenseManagerImpl.java:73)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.isValid(GreenHopperLicenseManagerImpl.java:94)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.getLicenseInfo(LicenseServiceImpl.java:46)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.isLicensed(LicenseServiceImpl.java:66)
            ...
            {code}

            This bug can be amplificated by JRA-64839. JIRA outage will be caused by combination of 2 factors:
            * Too many cache flush events, see JRA-64839
            ** caused by allowing some user Application access
            * Slow cache population, current ticket
            * Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min)
            ** one will flush
            {code:borderStyle=solid}
            "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
               java.lang.Thread.State: WAITING (parking)
            ...
            at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
            ...
            at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
            {code}
            ** another will populate:
            {code:borderStyle=solid}
            "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
               java.lang.Thread.State: RUNNABLE
            ..
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
            ...
            at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            {code}

            h3.Workaround
            If possible reduce total number of users not used in JIRA (delete user or reduce LDAP sync scope)
            JIRA uses following SQL to load users in {{processUsers}} method:
            {code:borderStyle=solid}
            SELECT U.ID, U.directory_id, U.user_name, U.lower_user_name, U.active, U.created_date, U.updated_date, U.first_name, U.lower_first_name, U.last_name, U.lower_last_name, U.display_name, U.lower_display_name, U.email_address, U.lower_email_address, U.CREDENTIAL, U.deleted_externally, U.EXTERNAL_ID, D.active, D.directory_position FROM cwd_user U INNER JOIN cwd_directory D ON U.directory_id = D.ID WHERE D.active <> 0 ORDER BY U.lower_user_name, D.directory_position;
            {code}

            h3. Note on Fix
            We partially mitigated the problem by implementing JRASERVER-65352, this introduces timeout to the cache population operation and optional async loading.
            We will monitor the problem and would review it again if it gets more traction.
            New: h3. Summary

            JIRA populates *activeUsersCountForLicense* cache in inefficient way
            h3. Environment
             * Large number of users: 10k+
            {code:java}
            // SQL to get all users
            select directory_id,count(1) from cwd_user group by directory_id;
            //SQL to get active users
            select directory_id,count(1) from cwd_user where active=1 group by directory_id;
            {code}

            h3. Steps to Reproduce
             # TBD
             # Enable debug logging for _com.atlassian.jira.application.DefaultApplicationRoleManager_

            h3. Expected Results

            JIRA populates the cache fast.
            h3. Actual Results

            Cache population is slow, majority of the incoming http threads are waiting for {{getUserCount}} method (_activeUsersCountForLicense_ cache) to be populated.

            Enabling debug logging will show cache population time:
            {noformat}
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] JIRA Software count: 4800
            2017-03-10 12:16:16,700 http-nio-8080-exec-539 DEBUG user 736x6651698x30 1se 1.0.0.0 /secure/MyJiraHome.jspa [c.a.jira.application.DefaultApplicationRoleManager] Count application users: 7.938 s JIRA Software
            {noformat}
            h3. Notes

            Note that JIRA with license _JIRA Software (Server) Unlimited Users_ is not affected, as license check is always true.
             (License is checked in 7.2.9 - JRASERVER-65462)

            Threads waiting for {{getUserCount}} method:
            {noformat}
             grep -A 20 'http-nio' jira_threads.1488804924.txt | grep -A 19 'java.lang.Thread.State: WAITING' | grep -c 'com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount'
            73
            {noformat}
            Example of the waiting thread (one of 73):
            {code:java}
            "http-nio-8080-exec-756" #365303 daemon prio=5 os_prio=0 tid=0x00007fda1801e000 nid=0x3664 waiting on condition [0x00007fde3abd3000]
               java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x000000055dce6220> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            ..
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.isRoleLimitExceeded(DefaultApplicationRoleManager.java:258)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingExactErrors(ApplicationAuthorizationServiceImpl.java:135)
            at com.atlassian.jira.application.ApplicationAuthorizationServiceImpl.getLicensingAccessErrors(ApplicationAuthorizationServiceImpl.java:78)
            ...
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.getSoftwareLicensingAccessErrors(SoftwareAuthorizationServiceImpl.java:76)
            at com.atlassian.greenhopper.license.SoftwareAuthorizationServiceImpl.verifySoftwareLicense(SoftwareAuthorizationServiceImpl.java:50)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.verify(GreenHopperLicenseManagerImpl.java:73)
            at com.pyxis.greenhopper.jira.license.GreenHopperLicenseManagerImpl.isValid(GreenHopperLicenseManagerImpl.java:94)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.getLicenseInfo(LicenseServiceImpl.java:46)
            at com.atlassian.greenhopper.license.LicenseServiceImpl.isLicensed(LicenseServiceImpl.java:66)
            ...
            {code}
            This bug can be amplificated by JRA-64839. JIRA outage will be caused by combination of 2 factors:
             * Too many cache flush events, see JRA-64839
             ** caused by allowing some user Application access
             * Slow cache population, current ticket
             * Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min)
             ** one will flush
            {code:java}
            "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
               java.lang.Thread.State: WAITING (parking)
            ...
            at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
            ...
            at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
            {code}

             ** another will populate:
            {code:java}
            "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
               java.lang.Thread.State: RUNNABLE
            ..
            at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
            ...
            at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
            at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
            at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
            at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
            {code}

            h3. Workaround

            If possible reduce total number of users not used in JIRA (delete user or reduce LDAP sync scope)
             JIRA uses following SQL to load users in {{processUsers}} method:
            {code:java}
            SELECT U.ID, U.directory_id, U.user_name, U.lower_user_name, U.active, U.created_date, U.updated_date, U.first_name, U.lower_first_name, U.last_name, U.lower_last_name, U.display_name, U.lower_display_name, U.email_address, U.lower_email_address, U.CREDENTIAL, U.deleted_externally, U.EXTERNAL_ID, D.active, D.directory_position FROM cwd_user U INNER JOIN cwd_directory D ON U.directory_id = D.ID WHERE D.active <> 0 ORDER BY U.lower_user_name, D.directory_position;
            {code}
            h3. Note on Fix

            We partially mitigated the problem by implementing JRASERVER-65352, this introduces timeout to the cache population operation and optional async loading.
             We will monitor the problem and would review it again if it gets more traction.
             
            Michal Wyrzykowski made changes -
            Link New: This issue relates to JRASERVER-71231 [ JRASERVER-71231 ]
            Andriy Yakovlev [Atlassian] made changes -
            Link New: This issue relates to JRASERVER-69913 [ JRASERVER-69913 ]
            set-jac-bot made changes -
            Summary Original: JIRA populates activeUsersCountForLicense cache in inefficient way New: ActiveUsersCountForLicense cache is populated inefficiently leading to performance degradation across the application

              lwlodarczyk Lukasz Wlodarczyk
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
              Affected customers:
              4 This affects my team
              Watchers:
              24 Start watching this issue

                Created:
                Updated:
                Resolved: