- Bug
- Resolution: Fixed
- Highest
- 7.0.0
- 7
- 6
- Severity 2 - Major
- 34
Summary
There is a regression in JIRA in how EagerOfBizUserCache is created on each new node and replicated to other nodes.
Currently the cache is not warmed up, so the first operation that deals with users (such as NodeReindexServiceThread reindexing issues) starts populating EagerOfBizUserCache. While it does so, the cluster lock is taken, in addition to any other locks the operation already holds (in the case of reindexing issues, DefaultIndexManager#indexLocks). Population can take a while to complete, so other threads may time out while acquiring a lock (e.g. ClusterMessageHandlerServiceThread at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup).
Previously, the cache was populated before JIRA was fully started.
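The contention described above can be illustrated with a minimal, self-contained sketch. It is not JIRA code: the class and method names are hypothetical, a plain ReentrantLock stands in for the index lock, and a sleep stands in for the slow cache population that happens while the lock is held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LazyCacheLockSketch {

    /**
     * Returns true if a waiting thread obtains the lock within waitMillis
     * while another thread holds it for holdMillis.
     * All names here are hypothetical and only illustrate the pattern.
     */
    static boolean waiterGetsLock(long holdMillis, long waitMillis) {
        ReentrantLock indexLock = new ReentrantLock(); // stand-in for the index lock
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            indexLock.lock();
            try {
                lockHeld.countDown();
                // Stand-in for lazy cache population running under the lock.
                Thread.sleep(holdMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                indexLock.unlock();
            }
        });
        holder.start();

        try {
            lockHeld.await(); // make sure the holder owns the lock first
            boolean acquired = indexLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                indexLock.unlock();
            }
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Holder is slow, waiter's timeout is short: the waiter gives up.
        System.out.println("short timeout acquired: " + waiterGetsLock(500, 50));
        // Longer timeout than the hold time: the waiter eventually succeeds.
        System.out.println("long timeout acquired: " + waiterGetsLock(50, 2000));
    }
}
```

The second call shows why raising the lock wait time can mask the symptom: the waiter succeeds only because its timeout now exceeds the time the lock is held.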
Environment
- JIRA 7+ Data Center
- 3 or more nodes
Steps to Reproduce
- Have a large number of users (>= 30000)
- Have issues that need reindexing
- Start a new node
Expected Results
EagerOfBizUserCache is populated during JIRA startup and is ready before JIRA is fully started.
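As a rough illustration of the expected behaviour, the sketch below contrasts an eagerly warmed cache with a lazily built one. The UserCache class is a hypothetical stand-in, not the real EagerOfBizUserCache, and the counter exists only to show whether a user lookup had to pay for the build.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerWarmUpSketch {

    /** Hypothetical stand-in for a user cache; not the real EagerOfBizUserCache. */
    static class UserCache {
        private final List<String> userSource;
        private final Map<String, String> byLowerName = new ConcurrentHashMap<>();
        private boolean built;
        final AtomicInteger buildsTriggeredByLookups = new AtomicInteger();

        UserCache(List<String> userSource) {
            this.userSource = userSource;
        }

        private synchronized void buildIfRequired(boolean triggeredByLookup) {
            if (built) {
                return;
            }
            if (triggeredByLookup) {
                buildsTriggeredByLookups.incrementAndGet();
            }
            for (String name : userSource) {
                byLowerName.put(name.toLowerCase(Locale.ROOT), name);
            }
            built = true;
        }

        /** Called during startup, before any request or service thread runs. */
        void warmUp() {
            buildIfRequired(false);
        }

        String findUserByName(String name) {
            buildIfRequired(true); // lazy fallback if nobody warmed the cache
            return byLowerName.get(name.toLowerCase(Locale.ROOT));
        }
    }

    /** Counts how many cache builds were paid for by a lookup rather than by startup. */
    static int lookupTriggeredBuilds(boolean warmUpAtStartup) {
        UserCache cache = new UserCache(Arrays.asList("Alice", "Bob"));
        if (warmUpAtStartup) {
            cache.warmUp();
        }
        cache.findUserByName("alice");
        cache.findUserByName("bob");
        return cache.buildsTriggeredByLookups.get();
    }

    public static void main(String[] args) {
        System.out.println("eager: " + lookupTriggeredBuilds(true));  // prints "eager: 0"
        System.out.println("lazy: " + lookupTriggeredBuilds(false));  // prints "lazy: 1"
    }
}
```

With the warm-up step in place, no lookup triggers a build, so a thread that already holds other locks never has to populate the cache itself.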
Actual Results
EagerOfBizUserCache is populated after JIRA startup, which might block or slow down other operations and can, for example, lead to index inconsistencies.
Notes
Example of the problem related to index lock timeout
ClusterMessageHandlerServiceThread is waiting for the DefaultIndexManager lock:
"ClusterMessageHandlerServiceThread:thread-1" #85 prio=5 tid=0x00007fca257e9000 nid=0x5d32 waiting on condition [0x00007fc9a6df7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000005c495cc08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    ...
    at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:785)
    at com.atlassian.jira.issue.index.DefaultIndexManager.access$600(DefaultIndexManager.java:88)
    at com.atlassian.jira.issue.index.DefaultIndexManager$IndexLock.tryLock(DefaultIndexManager.java:1118)
    at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:354)
    ...
    at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:118)
    ...
Meanwhile, NodeReindexServiceThread:thread-1 is running com.atlassian.jira.issue.index.DefaultIssueIndexer.perform and synchronizing the user cache with the other nodes while holding the DefaultIndexManager lock:
"NodeReindexServiceThread:thread-1" #84 prio=5 tid=0x00007fca25a4c000 nid=0x5d31 runnable [0x00007fc9a6ef6000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    ...
    at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:79)
    at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.listRemoteCachePeers(RMISynchronousCacheReplicator.java:335)
    at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.replicatePutNotification(RMISynchronousCacheReplicator.java:145)
    at com.atlassian.cache.ehcache.replication.rmi.RMISynchronousCacheReplicator.replicateViaCopy(RMISynchronousCacheReplicator.java:60)
    at com.atlassian.cache.ehcache.replication.rmi.RMISynchronousCacheReplicator.notifyElementPut(RMISynchronousCacheReplicator.java:48)
    at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:192)
    at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:170)
    ...
    at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:118)
    ...
    at com.atlassian.crowd.directory.AbstractInternalDirectory.findUserByName(AbstractInternalDirectory.java:173)
    at com.atlassian.crowd.directory.AbstractInternalDirectory.findUserByName(AbstractInternalDirectory.java:64)
    ...
    at com.atlassian.jira.user.util.DefaultUserManager.getUserByName(DefaultUserManager.java:258)
    ...
    at com.atlassian.jira.issue.index.DefaultIssueIndexer.perform(DefaultIssueIndexer.java:282)
    at com.atlassian.jira.issue.index.DefaultIssueIndexer.reindexIssues(DefaultIssueIndexer.java:162)
    at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssues(DefaultIndexManager.java:541)
    at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssueObjects(DefaultIndexManager.java:438)
    ...
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateIssueIndex(DefaultNodeReindexService.java:404)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateAffectedIndexes(DefaultNodeReindexService.java:298)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:252)
    ...
Workaround
If the problem shows up as index lock timeouts, the lock wait time can be increased through the application property jira.index.lock.waittime.
- is caused by
  JRASERVER-64230 In Data Center users and groups caches are not preloaded on startup (Closed)
- is related to
  JRASERVER-63515 JIRA Datacenter node with large number of user have long start-up time due to slow cache population (Closed)
- was cloned as
  RUN-1409