[JSWSERVER-20618] Performance of Jira can degrade significantly due to slow sprint cache population

Type: Bug
Resolution: Fixed
Priority: High (View bug fix roadmap)
Fix Version/s: 9.1.0
Affects Version/s: 7.2.15, 7.6.16, 7.13.14, 8.5.5, 8.20.12
Component/s: Sprint
Labels:

Introduced in Version:
7.02
Support reference count:
25
Symptom Severity:
Severity 2 - Major
UIS:
215
Bug Fix Policy:
View Atlassian Server bug fix policy
Current Status:

Hi Team, we're happy to announce that this will be fixed in the 9.1.0 release.

We are introducing two new caches for Sprints to ensure their retrieval is performant in various scenarios. The new configuration will ensure faster cache population, allowing Jira to perform actions such as opening an Agile board or assigning issues to a sprint much faster.

This will enable your teams to work simultaneously, without being blocked by slow board loading times or request timeouts.

Cheers

Andrzej Kotas

Jira DC PM

Issue Summary

Jira populates the Sprint cache (com.atlassian.greenhopper.service.sprint.SprintManagerImpl.sprintCache) inefficiently because it loads all elements in one go.
Sprint cache population may become slow depending on the number of sprints that exist on the instance. In addition, external factors such as the latency between the JVM and the DB server can contribute to this behavior.

In some cases, 15k round trips between the application and the database (see details below) will be required for the entire data set to be retrieved by the thread populating the cache - this translates to a total of 15 seconds of waiting time if the latency between Jira and the DB server is 1 ms. Because the wait is roughly the number of round trips multiplied by the latency, a higher latency between the app server and the DB server increases the time it takes for the full results of the query to reach the application proportionally.

Steps to Reproduce

Set up Jira (8.5.4, 7.6.13)
Create 150k sprints.
Complete any sprint (this triggers a sprint cache flush)
- The same applies to createSprint, deleteSprint and updateSprint

Expected Results

The sprint cache will be populated again very quickly.

Actual Results

It will take 20+ seconds or even minutes for the sprint cache to be fully populated. During this period, any other threads that require sprint information will be parked, as the sprint cache is not available to them at this time. Once the thread currently populating the sprint cache finishes retrieving the entire data set, other threads will progress normally.

The thread populating the sprint cache will have a stack similar to the following:

sun.nio.ch.FileDispatcherImpl.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:144)
oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
...
com.sun.proxy.$Proxy3126.stream(Unknown Source)
com.atlassian.greenhopper.service.sprint.SprintDao.loadAll(SprintDao.java:19)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:432)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:425)
com.atlassian.cache.compat.delegate.DelegatingSupplier.get(DelegatingSupplier.java:22)
com.atlassian.cache.ehcache.EhCacheManager$SupplierAdapter.load(EhCacheManager.java:260)
...
com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getAllSprints(SprintManagerImpl.java:124)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprints(SprintManagerImpl.java:132)

Threads that are trying to access the sprint cache will be parked with the following stack:

at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000206cdbf10> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:79)
	at com.atlassian.cache.ehcache.DelegatingCachedReference.get(DelegatingCachedReference.java:73)
	at com.atlassian.jira.cache.DeferredReplicationCachedReference$$Lambda$265/1538681607.get(Unknown Source)
	at com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
	at com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
	at com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
	at com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprint(SprintManagerImpl.java:110)
	...
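
The parking behavior shown in the two stacks above can be reproduced in isolation. The following is a minimal sketch, not Jira or Ehcache code: it assumes the populating thread holds the write side of a ReentrantReadWriteLock while it loads, and that readers need the read side of the same lock. The class name, method name and the sleep that stands in for the slow SprintDao.loadAll() call are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockedReadersSketch {

    /**
     * Returns how long (in ms) a reader waited for the read lock while a
     * loader thread held the write lock for roughly loadMillis.
     */
    static long measureReaderWaitMillis(long loadMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch loaderHoldsLock = new CountDownLatch(1);

        Thread loader = new Thread(() -> {
            lock.writeLock().lock();
            try {
                loaderHoldsLock.countDown();
                // Hypothetical stand-in for the slow full-table load.
                Thread.sleep(loadMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "cache-loader");
        loader.start();

        try {
            // Make sure the loader really owns the write lock before reading.
            loaderHoldsLock.await();
            long start = System.nanoTime();
            lock.readLock().lock(); // parks here until the loader releases the write lock
            try {
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            } finally {
                lock.readLock().unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the loader", e);
        }
    }

    public static void main(String[] args) {
        long waited = measureReaderWaitMillis(500);
        System.out.printf("Reader was parked for about %d ms%n", waited);
    }
}
```

In this sketch every additional reader would wait for the same period, which matches the observation that all threads needing sprint information stall until the single populating thread finishes.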

Notes

The problem is amplified on Oracle DB. By default, the Oracle JDBC driver streams the results of a SQL query from the database 10 rows at a time. When the sprint cache gets flushed, a full table scan of the AO_60DB71_SPRINT table is performed to repopulate it, so for a table with 150k records, 15k round trips are required for the complete contents of the table to reach the application.
Taking an Oracle-backed instance with 150k sprints as an example, 15k round trips between the application and the database will be required for the entire data set to be retrieved by the thread waiting on this information - this translates to a total of 15 seconds if the latency between Jira and the DB server is 1 ms. Because the wait is roughly the number of round trips multiplied by the latency, a higher latency between the app server and the DB server increases the time it takes for the full results of the query to reach the application proportionally.
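
The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch only: it assumes one network round trip per fetched batch and ignores query execution and data transfer time. The class and method names are hypothetical, and the 5 ms latency and the fetch size of 200 are illustrative inputs (200 is the value mentioned in the workaround).

```java
class RoundTripEstimate {

    /** Round trips needed to stream rowCount rows when fetchSize rows arrive per trip. */
    static long roundTrips(long rowCount, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rowCount + fetchSize - 1) / fetchSize; // ceiling division
    }

    /** Time spent purely on network latency, in seconds. */
    static double waitSeconds(long rowCount, int fetchSize, double latencyMs) {
        return roundTrips(rowCount, fetchSize) * latencyMs / 1000.0;
    }

    public static void main(String[] args) {
        long sprints = 150_000;
        for (int fetchSize : new int[] {10, 200}) {
            for (double latencyMs : new double[] {1.0, 5.0}) {
                System.out.printf("fetchSize=%d latency=%.0f ms: %d round trips, %.2f s%n",
                        fetchSize, latencyMs,
                        roundTrips(sprints, fetchSize),
                        waitSeconds(sprints, fetchSize, latencyMs));
            }
        }
    }
}
```

With 150k rows, a fetch size of 10 and 1 ms of latency this gives 15,000 round trips and 15 seconds, matching the figures above; a fetch size of 200 reduces that to 750 round trips.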

Workaround

For Oracle DB, increase the number of rows that the Oracle JDBC driver fetches per round trip:
https://docs.oracle.com/cd/E18283_01/java.112/e16548/resltset.htm#i1023619
This can be done by adding the following line to the dbconfig.xml file, inside the <jdbc-datasource> section:
```
<connection-properties>defaultRowPrefetch=XXX</connection-properties>
```
XXX is the number of rows to fetch per round trip. The default value is 10 - we have seen an increase to 200 be very effective while adding very little overhead to the JVM's memory utilization.
See also the related KB article "Using the default Oracle JDBC fetch size may lead to performance issues in Jira".
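
For readers more familiar with plain JDBC than with dbconfig.xml, the sketch below shows what the same setting looks like as a connection property outside Jira. It is illustrative only: the class and method names are hypothetical, the credentials are placeholders, and the code deliberately stops short of opening a connection, since that would need the Oracle driver and a reachable database. It assumes the connection-properties entry is ultimately passed to the driver in this form.

```java
import java.util.Properties;

class OraclePrefetchProperties {

    /**
     * Builds connection properties that ask the Oracle JDBC driver to
     * prefetch the given number of rows per round trip.
     */
    static Properties withRowPrefetch(String user, String password, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Same key as in the dbconfig.xml workaround above.
        props.setProperty("defaultRowPrefetch", Integer.toString(rows));
        return props;
    }

    public static void main(String[] args) {
        Properties props = withRowPrefetch("jira", "placeholder", 200);
        // With the Oracle driver on the classpath, these properties would be
        // passed to DriverManager.getConnection(url, props); not done here.
        System.out.println("defaultRowPrefetch=" + props.getProperty("defaultRowPrefetch"));
    }
}
```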

is related to

JRASERVER-68168 Make FieldLayoutCache population more efficient in Jira

Gathering Interest

FLASH-3226 You do not have permission to view this issue

relates to

JRASERVER-71740 Increase the default value of defaultRowPrefetch for Oracle

Gathering Interest

FLASH-2410 You do not have permission to view this issue

mentioned in: Preparing for Jira 9.1

resolves: ACE-4499

(39 mentioned in, 1 resolves)

Form Name

Sławomir Zaraziński made changes - 01/Aug/2024 7:38 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 933246 ]

shrivatsaa (Inactive) made changes - 09/Apr/2024 6:54 AM

Remote Link

New: This issue links to "Page (Confluence)" [ 891769 ]

Michael Silverman made changes - 24/Jan/2024 3:35 PM

Remote Link

New: This issue links to "ACE-4499 (Atlassian Support System)" [ 860237 ]

Konde made changes - 17/Oct/2023 5:38 AM

Affects Version/s

New: 8.20.12 [ 101713 ]

Filipi Lima made changes - 06/Oct/2023 2:42 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 821991 ]

Rodrigo Baldasso made changes - 31/May/2023 9:54 PM

Remote Link

Original: This issue links to "FLASH-2410 (Bulldog)" [ 618079 ]

New: This issue links to "FLASH-2410 (JIRA Server (Bulldog))" [ 618079 ]

Michał Gozdera made changes - 10/May/2023 11:02 AM

Remote Link

New: This issue links to "Page (Confluence)" [ 760546 ]

Daniel Ponzio made changes - 11/Apr/2023 3:31 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 748860 ]

Michal Sierzputowski made changes - 24/Feb/2023 2:15 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 734961 ]

Thiago Masutti made changes - 24/Jan/2023 7:46 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 724656 ]

Assignee: Karol Lopacinski

Reporter: Lucas Bugs

Affected customers: 8

Watchers: 27

Created: 01/Jul/2020 9:32 PM

Updated: 01/Aug/2024 7:38 PM

Resolved: 21/Jul/2022 8:39 AM
