Jira Server and Data Center / JRASERVER-69652

Asynchronous cache replication can cause extra overhead with a large number of cache updates and many stale nodes


Details

    Description

      Summary

      Asynchronous cache replication can cause extra overhead with a large number of cache updates and many stale nodes.

      Environment

      • Jira DC
      • A large number of stale nodes (see JRASERVER-42916)
      • A plugin (code) generating a large number of cache update events, e.g. reaching 2000 messages/min.

      Steps to Reproduce

      1. Open a URL which produces the cache update event while computing the business logic
        • E.g. /rest/servicedesk/1/<PRJ>/webfragments/sections/sd-queues-nav,servicedesk.agent.queues,servicedesk.agent.queues.ungrouped
      2. Measure response time and number of replication events
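The response-time half of step 2 can be sketched with a small timing helper. This is a hypothetical harness, not part of Jira or this reproduction; in a real test the supplied call would issue the HTTP request (e.g. via java.net.http.HttpClient) against the endpoint above:

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of Jira): runs any call and reports its
// wall-clock duration, e.g. an HTTP GET against the REST endpoint above.
public class ResponseTimer {

    // Runs the call once and returns the elapsed time in milliseconds.
    static <T> long timeMillis(Supplier<T> call) {
        long start = System.nanoTime();
        call.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for the slow endpoint: in a real reproduction the Supplier
        // would perform the HTTP request and return the response body.
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return null;
        });
        System.out.println("response time: " + elapsed + " ms");
    }
}
```

Repeating the measurement while varying the number of registered cluster nodes is what exposes the degradation described below.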

      Expected Results

      Performance doesn't degrade with the number of old (stale) nodes.

      Actual Results

      Performance degrades with the number of old, stale nodes.

      • While taking thread dumps, you can see many threads busy in the following stack:
          java.lang.Thread.State: RUNNABLE
        	at java.io.RandomAccessFile.writeBytes(Native Method)
        	at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
        	at com.squareup.tape.QueueFile.writeHeader(QueueFile.java:184)
        	at com.squareup.tape.QueueFile.add(QueueFile.java:321)
        	- locked <0x00000003ce9b43e0> (a com.squareup.tape.QueueFile)
        	at com.squareup.tape.FileObjectQueue.add(FileObjectQueue.java:46)
        	at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.add(TapeLocalQCacheOpQueue.java:151)
        	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.add(LocalQCacheOpQueueWithStats.java:115)
        	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToQueue(LocalQCacheManager.java:370)
        	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToAllQueues(LocalQCacheManager.java:354)
        	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateToQueue(LocalQCacheReplicator.java:85)
        	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicatePutNotification(LocalQCacheReplicator.java:65)
        	at com.atlassian.jira.cluster.cache.ehcache.AbstractJiraCacheReplicator.notifyElementUpdated(AbstractJiraCacheReplicator.java:123)
        	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementUpdated(RegisteredEventListeners.java:228)
        	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementUpdated(RegisteredEventListeners.java:206)
        ...
        
      • In the customer's case, we saw 15-20% of all threads busy in replicateToQueue.
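The overhead pattern in the stack above can be sketched as follows. This is a simplified model, not Jira's actual implementation: LocalQCacheManager keeps one file-backed queue per cluster node (backed by com.squareup.tape.QueueFile, whose appends are synchronized and hit disk) and appends every cache update to all of them, so each update pays one serialized write per registered node, including stale nodes that never consume their queue:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the replication path (not Jira's actual code).
public class ReplicationSketch {

    // Stand-in for com.squareup.tape.QueueFile: appends are synchronized,
    // so concurrent writers serialize here (the real class also rewrites
    // the file header on every add, which is the writeHeader frame above).
    static class NodeQueue {
        private final List<byte[]> entries = new ArrayList<>();

        synchronized void add(byte[] payload) {
            entries.add(payload);
        }

        synchronized int size() {
            return entries.size();
        }
    }

    // Stand-in for LocalQCacheManager.addToAllQueues: one append per node.
    // Returns the number of appends paid for a single cache update.
    static int addToAllQueues(List<NodeQueue> queues, byte[] op) {
        for (NodeQueue q : queues) {
            q.add(op);
        }
        return queues.size();
    }

    public static void main(String[] args) {
        List<NodeQueue> queues = new ArrayList<>();
        // E.g. 2 live nodes plus 8 stale ones that never drain their queues.
        for (int i = 0; i < 10; i++) {
            queues.add(new NodeQueue());
        }
        int appendsPerUpdate = addToAllQueues(queues, new byte[] {1});
        System.out.println("appends per cache update: " + appendsPerUpdate);
    }
}
```

Under this model, the cost of every cache update grows linearly with the number of registered nodes, which is why removing stale nodes (see the Workaround) restores performance.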

      Notes

      None

      Workaround

      Clean up old node data manually; see JRASERVER-42916.

      People

        Assignee: Unassigned
        Reporter: Andriy Yakovlev [Atlassian] (ayakovlev@atlassian.com)
        Votes: 6
        Watchers: 18