IMPORTANT: JAC is a Public system and anyone on the internet will be able to view the data in the created JAC tickets. Please don’t include Customer or Sensitive data in the JAC ticket.
Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-63515

JIRA Datacenter node with large number of user have long start-up time due to slow cache population

      Summary

      In JIRA Datacenter cluster with large number of user each time you restart a node it will take a long time to start-up to slow user cache population.

      Environment

      • JIRA DC
      • Large number of users: 40k+
      • Start time increases with large number of nodes

      Steps to Reproduce

      1. Restart node in JDC
      2. Check start-up time

      Expected Results

      Node starts within reasonable time ~4 minutes

      Actual Results

      Node starts within 10+ minutes
      You can see that from logs:

      2016-12-09 21:59:05,441 localhost-startStop-1 INFO      [atlassian.jira.startup.JiraStartupLogger] 
      JIRA starting...
      
      2016-12-09 22:01:21,773 localhost-startStop-1 INFO      [jira.upgrade.util.BuildNumberDao] Setting current version to 6.4.14
      ...
      2016-12-09 22:05:37,982 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] Starting the JIRA Scheduler....
      2016-12-09 22:05:38,040 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] JIRA Scheduler started.
      2016-12-09 22:10:22,747 localhost-startStop-1 INFO      [atlassian.jira.startup.LauncherContextListener] Memory Usage:
      Dec 09, 2016 10:10:22 PM org.apache.coyote.AbstractProtocol start
      INFO: Starting ProtocolHandler ["http-bio-8080"]
      

      Notes

      Thread dump taken during start-up shows following.

      • Thread: "localhost-startStop-1" is blocked waiting for user cache population:
        "localhost-startStop-1" #17 daemon prio=5 os_prio=0 tid=0x00007f208c002000 nid=0x675d waiting on condition [0x00007f20d84db000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x000000069a63db08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
        	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
        	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
        	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
        	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
        	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
        ...
        	at com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount(JiraApplicationDescriptor.java:28)
        	at com.atlassian.upm.license.internal.HostApplicationLicenses.getEditionAndRoleCountForEmbeddedLicense(HostApplicationLicenses.java:121)
        	at com.atlassian.upm.license.internal.ProductLicenses.getEditionAndRoleCountForEmbeddedLicense(ProductLicenses.java:454)
        ...
        	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getEmbeddedLicenses(HostLicenseUpdatedHandlerImpl.java:415)
        	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getChangedEmbeddedPluginLicenses(HostLicenseUpdatedHandlerImpl.java:362)
        	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.handleUpdatedHostLicense(HostLicenseUpdatedHandlerImpl.java:131)
        	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.handleUpdatedHostLicense(HostLicenseUpdatedEventListener.java:108)
        	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.onStart(HostLicenseUpdatedEventListener.java:67)
        	at com.atlassian.upm.UpmLifecycleManager.ensureStarted(UpmLifecycleManager.java:44)
        	at com.atlassian.upm.UpmLifecycleManager.onStart(UpmLifecycleManager.java:35)
        ...
        	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwareOfStart(DefaultLifecycleManager.java:100)
        	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStart(DefaultLifecycleManager.java:91)
        	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.notifyOnStart(JiraLifecycleManager.java:68)
        	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:50)
        	- locked <0x0000000698a3edc8> (a com.atlassian.sal.jira.lifecycle.JiraLifecycleManager)
        	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:62)
        ...
        	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:345)
        	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:215)
        	at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:124)
        ...
        	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4992)
        ...
        	at java.lang.Thread.run(Thread.java:745)
        

        Note that thread is waiting for com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount, it could also wait for com.atlassian.jira.user.util.UserUtilImpl.getTotalUserCount

      • Other thread populates the cache and cache is synchronised with other nodes:
        "UpmScheduler:thread-1" #141 prio=5 os_prio=0 tid=0x00007f20920fe800 nid=0x6981 runnable [0x00007f145c2e1000]
           java.lang.Thread.State: RUNNABLE
        	at java.net.SocketInputStream.socketRead0(Native Method)
        ...
        	at net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(RMICacheManagerPeerProvider.java:127)
        	at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:93)
        ...
        	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:180)
        	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:158)
        ...
        	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequiredUnderLock(UserOrGroupCache.java:171)
        	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:141)
        ...
        	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCache(UserOrGroupCache.java:58)
        	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCaseInsensitive(UserOrGroupCache.java:78)
        ...
        	at com.atlassian.jira.user.util.UserUtilImpl.getGroupMembers(UserUtilImpl.java:1173)
        ...
        	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
        	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
        	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
        	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
        ...
        	at com.atlassian.jira.DefaultHostContextAccessor.doInTransaction(DefaultHostContextAccessor.java:34)
        ...
        	at com.atlassian.sal.core.transaction.HostContextTransactionTemplate.execute(HostContextTransactionTemplate.java:21)
        ...
        	at com.atlassian.upm.schedule.UpmSchedulerImpl$1.run(UpmSchedulerImpl.java:165)
        ...
        	at java.lang.Thread.run(Thread.java:745)
        

      Workaround

      None

            Loading...
            IMPORTANT: JAC is a Public system and anyone on the internet will be able to view the data in the created JAC tickets. Please don’t include Customer or Sensitive data in the JAC ticket.
            Uploaded image for project: 'Jira Data Center'
            1. Jira Data Center
            2. JRASERVER-63515

            JIRA Datacenter node with large number of user have long start-up time due to slow cache population

                Summary

                In JIRA Datacenter cluster with large number of user each time you restart a node it will take a long time to start-up to slow user cache population.

                Environment

                • JIRA DC
                • Large number of users: 40k+
                • Start time increases with large number of nodes

                Steps to Reproduce

                1. Restart node in JDC
                2. Check start-up time

                Expected Results

                Node starts within reasonable time ~4 minutes

                Actual Results

                Node starts within 10+ minutes
                You can see that from logs:

                2016-12-09 21:59:05,441 localhost-startStop-1 INFO      [atlassian.jira.startup.JiraStartupLogger] 
                JIRA starting...
                
                2016-12-09 22:01:21,773 localhost-startStop-1 INFO      [jira.upgrade.util.BuildNumberDao] Setting current version to 6.4.14
                ...
                2016-12-09 22:05:37,982 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] Starting the JIRA Scheduler....
                2016-12-09 22:05:38,040 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] JIRA Scheduler started.
                2016-12-09 22:10:22,747 localhost-startStop-1 INFO      [atlassian.jira.startup.LauncherContextListener] Memory Usage:
                Dec 09, 2016 10:10:22 PM org.apache.coyote.AbstractProtocol start
                INFO: Starting ProtocolHandler ["http-bio-8080"]
                

                Notes

                Thread dump taken during start-up shows following.

                • Thread: "localhost-startStop-1" is blocked waiting for user cache population:
                  "localhost-startStop-1" #17 daemon prio=5 os_prio=0 tid=0x00007f208c002000 nid=0x675d waiting on condition [0x00007f20d84db000]
                     java.lang.Thread.State: WAITING (parking)
                  	at sun.misc.Unsafe.park(Native Method)
                  	- parking to wait for  <0x000000069a63db08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
                  	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
                  	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
                  	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
                  	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
                  	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
                  	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
                  	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
                  	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
                  	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
                  	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
                  	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
                  ...
                  	at com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount(JiraApplicationDescriptor.java:28)
                  	at com.atlassian.upm.license.internal.HostApplicationLicenses.getEditionAndRoleCountForEmbeddedLicense(HostApplicationLicenses.java:121)
                  	at com.atlassian.upm.license.internal.ProductLicenses.getEditionAndRoleCountForEmbeddedLicense(ProductLicenses.java:454)
                  ...
                  	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getEmbeddedLicenses(HostLicenseUpdatedHandlerImpl.java:415)
                  	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getChangedEmbeddedPluginLicenses(HostLicenseUpdatedHandlerImpl.java:362)
                  	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.handleUpdatedHostLicense(HostLicenseUpdatedHandlerImpl.java:131)
                  	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.handleUpdatedHostLicense(HostLicenseUpdatedEventListener.java:108)
                  	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.onStart(HostLicenseUpdatedEventListener.java:67)
                  	at com.atlassian.upm.UpmLifecycleManager.ensureStarted(UpmLifecycleManager.java:44)
                  	at com.atlassian.upm.UpmLifecycleManager.onStart(UpmLifecycleManager.java:35)
                  ...
                  	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwareOfStart(DefaultLifecycleManager.java:100)
                  	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStart(DefaultLifecycleManager.java:91)
                  	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.notifyOnStart(JiraLifecycleManager.java:68)
                  	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:50)
                  	- locked <0x0000000698a3edc8> (a com.atlassian.sal.jira.lifecycle.JiraLifecycleManager)
                  	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:62)
                  ...
                  	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:345)
                  	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:215)
                  	at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:124)
                  ...
                  	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4992)
                  ...
                  	at java.lang.Thread.run(Thread.java:745)
                  

                  Note that thread is waiting for com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount, it could also wait for com.atlassian.jira.user.util.UserUtilImpl.getTotalUserCount

                • Other thread populates the cache and cache is synchronised with other nodes:
                  "UpmScheduler:thread-1" #141 prio=5 os_prio=0 tid=0x00007f20920fe800 nid=0x6981 runnable [0x00007f145c2e1000]
                     java.lang.Thread.State: RUNNABLE
                  	at java.net.SocketInputStream.socketRead0(Native Method)
                  ...
                  	at net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(RMICacheManagerPeerProvider.java:127)
                  	at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:93)
                  ...
                  	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:180)
                  	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:158)
                  ...
                  	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequiredUnderLock(UserOrGroupCache.java:171)
                  	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:141)
                  ...
                  	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCache(UserOrGroupCache.java:58)
                  	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCaseInsensitive(UserOrGroupCache.java:78)
                  ...
                  	at com.atlassian.jira.user.util.UserUtilImpl.getGroupMembers(UserUtilImpl.java:1173)
                  ...
                  	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
                  	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
                  	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
                  	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
                  ...
                  	at com.atlassian.jira.DefaultHostContextAccessor.doInTransaction(DefaultHostContextAccessor.java:34)
                  ...
                  	at com.atlassian.sal.core.transaction.HostContextTransactionTemplate.execute(HostContextTransactionTemplate.java:21)
                  ...
                  	at com.atlassian.upm.schedule.UpmSchedulerImpl$1.run(UpmSchedulerImpl.java:165)
                  ...
                  	at java.lang.Thread.run(Thread.java:745)
                  

                Workaround

                None

                        Unassigned Unassigned
                        ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
                        Votes:
                        1 Vote for this issue
                        Watchers:
                        7 Start watching this issue

                          Created:
                          Updated:
                          Resolved:

                            Unassigned Unassigned
                            ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
                            Affected customers:
                            1 This affects my team
                            Watchers:
                            7 Start watching this issue

                              Created:
                              Updated:
                              Resolved: