Uploaded image for project: 'Jira Software Data Center'
  1. Jira Software Data Center
  2. JSWSERVER-20618

Performance of Jira can degrade significantly due to slow sprint cache population

    XMLWordPrintable

Details

    • 7.02
    • 25
    • Severity 2 - Major
    • 215
    • Hide

      Hi Team, we're happy to announce that this will be fixed in 9.1.0 release. 

      We are introducing two new caches for Sprints to ensure they retrieval is performant in various scenarios. The new configuration will ensure a faster cache population, allowing Jira to perform such actions as opening an Agile board, assigning issues to a sprint much faster.

      This will enable your teams to work simultaneously, without being blocked by slow board loading time or facing timeout requests.

      Cheers

      Andrzej Kotas

      Jira DC PM

      Show
      Hi Team, we're happy to announce that this will be fixed in  9.1.0  release.  We are introducing two new caches for Sprints to ensure they retrieval is performant in various scenarios. The new configuration will ensure a faster cache population, allowing Jira to perform such actions as opening an Agile board, assigning issues to a sprint much faster. This will enable your teams to work simultaneously, without being blocked by slow board loading time or facing timeout requests. Cheers Andrzej Kotas Jira DC PM

    Description

      Issue Summary

      Jira inefficiently populates Sprint cache (com.atlassian.greenhopper.service.sprint.SprintManagerImpl.sprintCache) due to loading all elements in one go.
      The sprint cache population may become slow depending on the number of sprints that exist on the instance. In addition to that, some external factors such as the latency between the JVM and the DB server can also contribute to this behavior.

      In some cases 15k rounds trips between the application and the database (see details below) will be required for the entire data set to be retrieved by the thread populating the cache - this translates to a total of 15 seconds of waiting time, if the latency between Jira and the DB server is at 1ms. An increased latency between the app server and the DB server will increase the time it takes for the full results of the query to reach the application exponentially.

      Steps to Reproduce

      1. Set up Jira (8.5.4, 7.6.13)
      2. Create 150k sprints.
      3. Complete any sprint (this triggers a sprint cache flush)
        • Same fore createSprint, deleteSprint, updateSprint

      Expected Results

      The sprint cache will be populated again very quickly.

      Actual Results

      It will take 20+ seconds or even minutes for the sprint cache to be fully populated. During this period, any other threads that require sprint information will be parked, as the sprint cache is not available to them at this time. Once the thread currently populating the sprint cache finishes retrieving the entire data set, other threads will progress normally.

      The thread populating the sprint will have a similar stack:

      sun.nio.ch.FileDispatcherImpl.read0(Native Method)
      sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
      sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
      sun.nio.ch.IOUtil.read(IOUtil.java:197)
      sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
      oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:144)
      oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
      oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
      oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
      oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
      oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
      oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
      oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
      oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
      oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
      oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
      ...
      com.sun.proxy.$Proxy3126.stream(Unknown Source)
      com.atlassian.greenhopper.service.sprint.SprintDao.loadAll(SprintDao.java:19)
      com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:432)
      com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:425)
      com.atlassian.cache.compat.delegate.DelegatingSupplier.get(DelegatingSupplier.java:22)
      com.atlassian.cache.ehcache.EhCacheManager$SupplierAdapter.load(EhCacheManager.java:260)
      ...
      com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
      com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
      com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
      com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getAllSprints(SprintManagerImpl.java:124)
      com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprints(SprintManagerImpl.java:132)
      

      While threads that are trying to access the sprint cache will be parked at the following stack:

      at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x0000000206cdbf10> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
      	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
      	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
      	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
      	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:79)
      	at com.atlassian.cache.ehcache.DelegatingCachedReference.get(DelegatingCachedReference.java:73)
      	at com.atlassian.jira.cache.DeferredReplicationCachedReference$$Lambda$265/1538681607.get(Unknown Source)
      	at com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
      	at com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
      	at com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
      	at com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprint(SprintManagerImpl.java:110)
      	...
      

      Notes

      • Problem get amplified for Oracle DB. The reason for this is that by default, the Oracle JDBC driver fetch results of a SQL query streaming it from DB by 10 rows at a time. In a situation where a table has 150k records, 15k round trips will be required for the complete information of the table to reach the application. When the sprint cache gets flushed, a full table scan of the AO_60DB71_SPRINT table is performed to populate the sprint cache.
      • Using an instance where 150k sprints exist as an example in Oracle DB , 15k rounds trips between the application and the database will be required for the entire data set to be retrieved by the thread waiting on this information - this translates to a total of 15 seconds if the latency between Jira and the DB server is at 1ms. An increased latency between the app server and the DB server will increase the time it takes for the full results of the query to reach the application exponentially.

      Workaround

      Attachments

        Issue Links

          Activity

            People

              klopacinski Karol Lopacinski
              lbugs Lucas Bugs
              Votes:
              8 Vote for this issue
              Watchers:
              27 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: