[JRASERVER-63515] JIRA Datacenter node with large number of user have long start-up time due to slow cache population - Create and track feature requests for Atlassian products.

Type: Bug
Resolution: Fixed
Priority: Low (View bug fix roadmap)
Fix Version/s: 7.3.4, 7.2.9
Affects Version/s: 6.4.14, 7.2.8
Component/s: System Administration - General Configuration
Labels:

Introduced in Version:
6.04
Support reference count:
8
Symptom Severity:
Severity 2 - Major
UIS:
10
Bug Fix Policy:
View Atlassian Server bug fix policy

Summary

In JIRA Datacenter cluster with large number of user each time you restart a node it will take a long time to start-up to slow user cache population.

Environment

JIRA DC
Large number of users: 40k+
Start time increases with large number of nodes

Steps to Reproduce

Restart node in JDC
Check start-up time

Expected Results

Node starts within reasonable time ~4 minutes

Actual Results

Node starts within 10+ minutes
You can see that from logs:

2016-12-09 21:59:05,441 localhost-startStop-1 INFO      [atlassian.jira.startup.JiraStartupLogger] 
JIRA starting...

2016-12-09 22:01:21,773 localhost-startStop-1 INFO      [jira.upgrade.util.BuildNumberDao] Setting current version to 6.4.14
...
2016-12-09 22:05:37,982 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] Starting the JIRA Scheduler....
2016-12-09 22:05:38,040 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] JIRA Scheduler started.
2016-12-09 22:10:22,747 localhost-startStop-1 INFO      [atlassian.jira.startup.LauncherContextListener] Memory Usage:
Dec 09, 2016 10:10:22 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]

Notes

Thread dump taken during start-up shows following.

Thread: "localhost-startStop-1" is blocked waiting for user cache population:

"localhost-startStop-1" #17 daemon prio=5 os_prio=0 tid=0x00007f208c002000 nid=0x675d waiting on condition [0x00007f20d84db000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000069a63db08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
...
	at com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount(JiraApplicationDescriptor.java:28)
	at com.atlassian.upm.license.internal.HostApplicationLicenses.getEditionAndRoleCountForEmbeddedLicense(HostApplicationLicenses.java:121)
	at com.atlassian.upm.license.internal.ProductLicenses.getEditionAndRoleCountForEmbeddedLicense(ProductLicenses.java:454)
...
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getEmbeddedLicenses(HostLicenseUpdatedHandlerImpl.java:415)
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getChangedEmbeddedPluginLicenses(HostLicenseUpdatedHandlerImpl.java:362)
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.handleUpdatedHostLicense(HostLicenseUpdatedHandlerImpl.java:131)
	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.handleUpdatedHostLicense(HostLicenseUpdatedEventListener.java:108)
	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.onStart(HostLicenseUpdatedEventListener.java:67)
	at com.atlassian.upm.UpmLifecycleManager.ensureStarted(UpmLifecycleManager.java:44)
	at com.atlassian.upm.UpmLifecycleManager.onStart(UpmLifecycleManager.java:35)
...
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwareOfStart(DefaultLifecycleManager.java:100)
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStart(DefaultLifecycleManager.java:91)
	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.notifyOnStart(JiraLifecycleManager.java:68)
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:50)
	- locked <0x0000000698a3edc8> (a com.atlassian.sal.jira.lifecycle.JiraLifecycleManager)
	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:62)
...
	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:345)
	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:215)
	at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:124)
...
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4992)
...
	at java.lang.Thread.run(Thread.java:745)

Note that thread is waiting for com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount, it could also wait for com.atlassian.jira.user.util.UserUtilImpl.getTotalUserCount

Other thread populates the cache and cache is synchronised with other nodes:

"UpmScheduler:thread-1" #141 prio=5 os_prio=0 tid=0x00007f20920fe800 nid=0x6981 runnable [0x00007f145c2e1000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
...
	at net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(RMICacheManagerPeerProvider.java:127)
	at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:93)
...
	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:180)
	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:158)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequiredUnderLock(UserOrGroupCache.java:171)
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:141)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCache(UserOrGroupCache.java:58)
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCaseInsensitive(UserOrGroupCache.java:78)
...
	at com.atlassian.jira.user.util.UserUtilImpl.getGroupMembers(UserUtilImpl.java:1173)
...
	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
...
	at com.atlassian.jira.DefaultHostContextAccessor.doInTransaction(DefaultHostContextAccessor.java:34)
...
	at com.atlassian.sal.core.transaction.HostContextTransactionTemplate.execute(HostContextTransactionTemplate.java:21)
...
	at com.atlassian.upm.schedule.UpmSchedulerImpl$1.run(UpmSchedulerImpl.java:165)
...
	at java.lang.Thread.run(Thread.java:745)

Workaround

None

is caused by

JRASERVER-64230 In Data Center users and groups caches are not preloaded on startup

Closed

is related to

JRASERVER-63556 Implement semi-async cache replication for EhCache in DC

Closed

relates to

JRASERVER-63587 JIRA Datacenter excessive EagerOfBizUserCache user cache population and replication on node start-up

Closed

RUM-1409 Loading...

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...

(3 mentioned in)

Loading...

Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 7.3.4, 7.2.9
Affects Version/s: 6.4.14, 7.2.8
Component/s: System Administration - General Configuration
Labels:

Introduced in Version:
6.04
Support reference count:
8
Symptom Severity:
Severity 2 - Major
UIS:
10
Bug Fix Policy:
View Atlassian Server bug fix policy

Summary

In JIRA Datacenter cluster with large number of user each time you restart a node it will take a long time to start-up to slow user cache population.

Environment

JIRA DC
Large number of users: 40k+
Start time increases with large number of nodes

Steps to Reproduce

Restart node in JDC
Check start-up time

Expected Results

Node starts within reasonable time ~4 minutes

Actual Results

Node starts within 10+ minutes
You can see that from logs:

2016-12-09 21:59:05,441 localhost-startStop-1 INFO      [atlassian.jira.startup.JiraStartupLogger] 
JIRA starting...

2016-12-09 22:01:21,773 localhost-startStop-1 INFO      [jira.upgrade.util.BuildNumberDao] Setting current version to 6.4.14
...
2016-12-09 22:05:37,982 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] Starting the JIRA Scheduler....
2016-12-09 22:05:38,040 localhost-startStop-1 INFO      [atlassian.jira.scheduler.JiraSchedulerLauncher] JIRA Scheduler started.
2016-12-09 22:10:22,747 localhost-startStop-1 INFO      [atlassian.jira.startup.LauncherContextListener] Memory Usage:
Dec 09, 2016 10:10:22 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]

Notes

Thread dump taken during start-up shows following.

Thread: "localhost-startStop-1" is blocked waiting for user cache population:

"localhost-startStop-1" #17 daemon prio=5 os_prio=0 tid=0x00007f208c002000 nid=0x675d waiting on condition [0x00007f20d84db000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000069a63db08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:75)
	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
...
	at com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount(JiraApplicationDescriptor.java:28)
	at com.atlassian.upm.license.internal.HostApplicationLicenses.getEditionAndRoleCountForEmbeddedLicense(HostApplicationLicenses.java:121)
	at com.atlassian.upm.license.internal.ProductLicenses.getEditionAndRoleCountForEmbeddedLicense(ProductLicenses.java:454)
...
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getEmbeddedLicenses(HostLicenseUpdatedHandlerImpl.java:415)
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.getChangedEmbeddedPluginLicenses(HostLicenseUpdatedHandlerImpl.java:362)
	at com.atlassian.upm.mac.HostLicenseUpdatedHandlerImpl.handleUpdatedHostLicense(HostLicenseUpdatedHandlerImpl.java:131)
	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.handleUpdatedHostLicense(HostLicenseUpdatedEventListener.java:108)
	at com.atlassian.upm.mac.HostLicenseUpdatedEventListener.onStart(HostLicenseUpdatedEventListener.java:67)
	at com.atlassian.upm.UpmLifecycleManager.ensureStarted(UpmLifecycleManager.java:44)
	at com.atlassian.upm.UpmLifecycleManager.onStart(UpmLifecycleManager.java:35)
...
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwareOfStart(DefaultLifecycleManager.java:100)
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStart(DefaultLifecycleManager.java:91)
	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.notifyOnStart(JiraLifecycleManager.java:68)
	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:50)
	- locked <0x0000000698a3edc8> (a com.atlassian.sal.jira.lifecycle.JiraLifecycleManager)
	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:62)
...
	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:345)
	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:215)
	at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:124)
...
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4992)
...
	at java.lang.Thread.run(Thread.java:745)

Note that thread is waiting for com.atlassian.upm.core.impl.JiraApplicationDescriptor.getActiveUserCount, it could also wait for com.atlassian.jira.user.util.UserUtilImpl.getTotalUserCount

Other thread populates the cache and cache is synchronised with other nodes:

"UpmScheduler:thread-1" #141 prio=5 os_prio=0 tid=0x00007f20920fe800 nid=0x6981 runnable [0x00007f145c2e1000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
...
	at net.sf.ehcache.distribution.RMICacheManagerPeerProvider.lookupRemoteCachePeer(RMICacheManagerPeerProvider.java:127)
	at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:93)
...
	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:180)
	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:158)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequiredUnderLock(UserOrGroupCache.java:171)
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:141)
...
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCache(UserOrGroupCache.java:58)
	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.getCaseInsensitive(UserOrGroupCache.java:78)
...
	at com.atlassian.jira.user.util.UserUtilImpl.getGroupMembers(UserUtilImpl.java:1173)
...
	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:96)
	at com.atlassian.jira.user.util.UserUtilImpl.getActiveUserCount(UserUtilImpl.java:721)
...
	at com.atlassian.jira.DefaultHostContextAccessor.doInTransaction(DefaultHostContextAccessor.java:34)
...
	at com.atlassian.sal.core.transaction.HostContextTransactionTemplate.execute(HostContextTransactionTemplate.java:21)
...
	at com.atlassian.upm.schedule.UpmSchedulerImpl$1.run(UpmSchedulerImpl.java:165)
...
	at java.lang.Thread.run(Thread.java:745)

Workaround

None

is caused by

JRASERVER-64230 In Data Center users and groups caches are not preloaded on startup

Closed

is related to

JRASERVER-63556 Implement semi-async cache replication for EhCache in DC

Closed

relates to

JRASERVER-63587 JIRA Datacenter excessive EagerOfBizUserCache user cache population and replication on node start-up

Closed

RUM-1409 Loading...

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...

(3 mentioned in)

Assignee:: Unassigned

Reporter:: Andriy Yakovlev [Atlassian]

Votes:: 1 Vote for this issue

Watchers:: 7 Start watching this issue

Created:: 13/Dec/2016 1:30 PM

Updated:: 28/Jul/2023 2:42 PM

Resolved:: 04/Apr/2017 1:26 PM

Details

Description

Summary

Environment

Steps to Reproduce

Expected Results

Actual Results

Notes

Workaround

Attachments

Issue Links

Forms

Activity

Details

Description

Summary

Environment

Steps to Reproduce

Expected Results

Actual Results

Notes

Workaround

Attachments

Issue Links

Forms

Activity

People

Dates

People

Dates