Jira Data Center / JRASERVER-63587

JIRA Datacenter excessive EagerOfBizUserCache user cache population and replication on node start-up


      Summary

      There's a regression in JIRA in how EagerOfBizUserCache is created on each new node and replicated to other nodes.

      Currently, the cache is not warmed up at startup, so the first operation that deals with users (such as NodeReindexServiceThread reindexing issues) starts populating the EagerOfBizUserCache. A cluster lock is taken while this happens (along with other locks; in the case of reindexing issues, DefaultIndexManager#indexLocks). Population can take a while to complete, so other threads may time out when acquiring a lock (e.g. ClusterMessageHandlerServiceThread at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup).

      Previously, the cache was populated before JIRA was fully started.
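
The contention described above can be reproduced in miniature. The following is a minimal sketch, not Jira code: the class `LazyUserCacheDemo`, its methods, and the choice of a read lock for the reindexing thread and a write lock for the recovery thread are all assumptions made for illustration. It shows one thread lazily populating a cache while holding a lock, and a second thread timing out on `tryLock` when population outlasts its wait time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the scenario in this issue; none of these names are Jira APIs.
class LazyUserCacheDemo {

    /** Simulates slow cache population (database reads plus replication to peers). */
    static void populate(Map<String, String> cache, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        cache.put("admin", "Administrator");
    }

    /**
     * Returns true if the "recovery" thread obtains the index lock within waitMillis
     * while a "reindex" thread populates the user cache under the same lock.
     */
    static boolean recoveryAcquiresLock(long populateMillis, long waitMillis)
            throws InterruptedException {
        ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();
        Map<String, String> userCache = new ConcurrentHashMap<>();
        CountDownLatch reindexHoldsLock = new CountDownLatch(1);

        Thread reindexer = new Thread(() -> {
            indexLock.readLock().lock();          // reindexing holds the index lock
            try {
                reindexHoldsLock.countDown();
                populate(userCache, populateMillis); // first user lookup fills the cache
            } finally {
                indexLock.readLock().unlock();
            }
        }, "reindexer");
        reindexer.start();
        reindexHoldsLock.await();                 // make sure the lock is already held

        // The recovery thread waits a bounded time, as tryLock does in the thread dump.
        boolean acquired = indexLock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            indexLock.writeLock().unlock();
        }
        reindexer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // Population outlasts the wait time: the recovery thread gives up.
        System.out.println("slow population, short wait: " + recoveryAcquiresLock(500, 100));
        // Population finishes well within the wait time: the recovery thread proceeds.
        System.out.println("fast population, long wait:  " + recoveryAcquiresLock(100, 2000));
    }
}
```

With a cache that is already warm before reindexing begins, the lock would be held only for the indexing work itself, which is the behaviour this issue asks to restore.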
       

      Environment

      • JIRA 7+ Datacenter
      • >=3 nodes

      Steps to Reproduce

      1. Have a large number of users (>= 30000).
      2. Have issues waiting to be reindexed.
      3. Start a new node.

      Expected Results

      EagerOfBizUserCache is populated during JIRA startup and is ready before JIRA is fully started.
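
The expected ordering can be sketched as follows. This is an illustrative model under stated assumptions, not the Jira implementation: `EagerWarmUpDemo`, `UserCache`, `Node`, and the loader are hypothetical names. The point is only that the cache is filled before the node reports itself as started, so no later user lookup has to trigger population.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical model of "populate the cache before the node is fully started".
class EagerWarmUpDemo {

    /** A user cache that can be filled eagerly, with a lazy fallback on first access. */
    static final class UserCache {
        private final Supplier<Map<String, String>> loader;
        private Map<String, String> users;   // null until populated
        private int lazyLoads = 0;           // counts populations triggered by lookups

        UserCache(Supplier<Map<String, String>> loader) {
            this.loader = loader;
        }

        synchronized void warmUp() {
            users = new ConcurrentHashMap<>(loader.get());
        }

        synchronized boolean isPopulated() {
            return users != null;
        }

        synchronized int lazyLoads() {
            return lazyLoads;
        }

        synchronized String findUser(String name) {
            if (users == null) {             // the lazy path this issue reports as a problem
                lazyLoads++;
                warmUp();
            }
            return users.get(name);
        }
    }

    /** A node that warms the cache before declaring itself started. */
    static final class Node {
        private final UserCache cache;
        private volatile boolean started = false;

        Node(UserCache cache) {
            this.cache = cache;
        }

        void start() {
            cache.warmUp();   // populate first
            started = true;   // only then report "fully started"
        }

        boolean isStarted() {
            return started;
        }
    }

    public static void main(String[] args) {
        UserCache cache = new UserCache(() -> Collections.singletonMap("admin", "Administrator"));
        Node node = new Node(cache);
        node.start();
        System.out.println("populated at startup: " + cache.isPopulated());
        System.out.println("lookup: " + cache.findUser("admin"));
        System.out.println("lazy loads after startup: " + cache.lazyLoads());
    }
}
```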

      Actual Results

      EagerOfBizUserCache is populated after JIRA startup, which might block or slow down other operations and can, for example, lead to index inconsistencies.

      Notes

      Example of the problem related to index lock timeout

      ClusterMessageHandlerServiceThread is waiting for DefaultIndexManager lock:

      "ClusterMessageHandlerServiceThread:thread-1" #85 prio=5 tid=0x00007fca257e9000 nid=0x5d32 waiting on condition [0x00007fc9a6df7000]
         java.lang.Thread.State: TIMED_WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000005c495cc08> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      ...
      	at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:785)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.access$600(DefaultIndexManager.java:88)
      	at com.atlassian.jira.issue.index.DefaultIndexManager$IndexLock.tryLock(DefaultIndexManager.java:1118)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:354)
      ...
      	at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:118)
      ...
      

      However, NodeReindexServiceThread:thread-1 is running com.atlassian.jira.issue.index.DefaultIssueIndexer.perform, which goes on to synchronize the user cache with other nodes, all while holding the DefaultIndexManager lock:

      "NodeReindexServiceThread:thread-1" #84 prio=5 tid=0x00007fca25a4c000 nid=0x5d31 runnable [0x00007fc9a6ef6000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      ...
      	at com.atlassian.jira.cluster.distribution.JiraCacheManagerPeerProvider.listRemoteCachePeers(JiraCacheManagerPeerProvider.java:79)
      	at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.listRemoteCachePeers(RMISynchronousCacheReplicator.java:335)
      	at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.replicatePutNotification(RMISynchronousCacheReplicator.java:145)
      	at com.atlassian.cache.ehcache.replication.rmi.RMISynchronousCacheReplicator.replicateViaCopy(RMISynchronousCacheReplicator.java:60)
      	at com.atlassian.cache.ehcache.replication.rmi.RMISynchronousCacheReplicator.notifyElementPut(RMISynchronousCacheReplicator.java:48)
      	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementPut(RegisteredEventListeners.java:192)
      	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementPut(RegisteredEventListeners.java:170)
      ...
      	at com.atlassian.jira.crowd.embedded.ofbiz.UserOrGroupCache.buildCacheIfRequired(UserOrGroupCache.java:118)
      ...
      	at com.atlassian.crowd.directory.AbstractInternalDirectory.findUserByName(AbstractInternalDirectory.java:173)
      	at com.atlassian.crowd.directory.AbstractInternalDirectory.findUserByName(AbstractInternalDirectory.java:64)
      ...
      	at com.atlassian.jira.user.util.DefaultUserManager.getUserByName(DefaultUserManager.java:258)
      ...
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer.perform(DefaultIssueIndexer.java:282)
      	at com.atlassian.jira.issue.index.DefaultIssueIndexer.reindexIssues(DefaultIssueIndexer.java:162)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssues(DefaultIndexManager.java:541)
      	at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssueObjects(DefaultIndexManager.java:438)
      ...
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateIssueIndex(DefaultNodeReindexService.java:404)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateAffectedIndexes(DefaultNodeReindexService.java:298)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:252)
      ...
      

       

      Workaround

      If the symptom is that the index lock timeout is being reached, the timeout can be increased via the application property jira.index.lock.waittime.
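
As a rough illustration of the workaround, the property might be overridden as shown below. This fragment rests on assumptions not stated in this issue: that the override goes in jira-config.properties in the Jira home directory, that the value is in milliseconds, and that a restart is needed for it to take effect. The value 60000 is illustrative only, not a recommendation.

```
# jira-config.properties (assumed location: Jira home directory)
# Illustrative value; assumed to be in milliseconds.
jira.index.lock.waittime = 60000
```

A longer wait time only gives the cache population more room to finish; it does not remove the underlying contention.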
       

      Assignee: Unassigned
      Reporter: Maciej Rzymski (mrzymski)