[JSWSERVER-20618] Performance of Jira can degrade significantly due to slow sprint cache population

Type: Bug
Resolution: Fixed
Priority: High (View bug fix roadmap)
Fix Version/s: 9.1.0
Affects Version/s: 7.2.15, 7.6.16, 7.13.14, 8.5.5, 8.20.12
Component/s: Sprint
Labels:

Introduced in Version: 7.02
Support reference count: 25
Symptom Severity: Severity 2 - Major
UIS: 215
Bug Fix Policy: View Atlassian Server bug fix policy
Current Status:

Hi Team, we're happy to announce that this will be fixed in the 9.1.0 release.

We are introducing two new caches for Sprints to ensure their retrieval is performant in various scenarios. The new configuration will ensure faster cache population, allowing Jira to perform actions such as opening an Agile board or assigning issues to a sprint much faster.

This will enable your teams to work simultaneously, without being blocked by slow board loading times or request timeouts.

Cheers

Andrzej Kotas

Jira DC PM

Issue Summary

Jira populates the sprint cache (com.atlassian.greenhopper.service.sprint.SprintManagerImpl.sprintCache) inefficiently because it loads all elements in one go.
Sprint cache population may become slow depending on the number of sprints that exist on the instance. In addition, external factors such as the latency between the JVM and the DB server can contribute to this behavior.

In some cases, 15k round trips between the application and the database (see details below) are required for the thread populating the cache to retrieve the entire data set. At a latency of 1 ms between Jira and the DB server, this translates to a total of 15 seconds of waiting time. Because every round trip pays the latency once, a higher latency between the app server and the DB server increases the retrieval time proportionally.

Steps to Reproduce

Set up Jira (8.5.4, 7.6.13)
Create 150k sprints.
Complete any sprint (this triggers a sprint cache flush)
- The same applies to createSprint, deleteSprint, and updateSprint

Expected Results

The sprint cache will be populated again very quickly.

Actual Results

It will take 20+ seconds or even minutes for the sprint cache to be fully populated. During this period, any other threads that require sprint information will be parked, as the sprint cache is not available to them at this time. Once the thread currently populating the sprint cache finishes retrieving the entire data set, other threads will progress normally.

The thread populating the sprint cache will have a stack similar to the following:

sun.nio.ch.FileDispatcherImpl.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:144)
oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
...
com.sun.proxy.$Proxy3126.stream(Unknown Source)
com.atlassian.greenhopper.service.sprint.SprintDao.loadAll(SprintDao.java:19)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:432)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl$SprintCacheSupplier.get(SprintManagerImpl.java:425)
com.atlassian.cache.compat.delegate.DelegatingSupplier.get(DelegatingSupplier.java:22)
com.atlassian.cache.ehcache.EhCacheManager$SupplierAdapter.load(EhCacheManager.java:260)
...
com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getAllSprints(SprintManagerImpl.java:124)
com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprints(SprintManagerImpl.java:132)

Threads that are trying to access the sprint cache will be parked with the following stack:

at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000206cdbf10> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
	at net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
	at net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:79)
	at com.atlassian.cache.ehcache.DelegatingCachedReference.get(DelegatingCachedReference.java:73)
	at com.atlassian.jira.cache.DeferredReplicationCachedReference$$Lambda$265/1538681607.get(Unknown Source)
	at com.atlassian.jira.cluster.cache.ehcache.BlockingParallelCacheReplicator.runDeferred(BlockingParallelCacheReplicator.java:172)
	at com.atlassian.jira.cache.DeferredReplicationCachedReference.get(DeferredReplicationCachedReference.java:28)
	at com.atlassian.cache.compat.delegate.DelegatingCachedReference.get(DelegatingCachedReference.java:22)
	at com.atlassian.greenhopper.service.sprint.SprintManagerImpl.getSprint(SprintManagerImpl.java:110)
	...
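
The two stacks above show a pattern where one thread loads the value while every other caller parks on a lock. The sketch below is a simplified illustration of that pattern built on a ReentrantReadWriteLock. It is not the Ehcache BlockingCache or the SprintManagerImpl code, and every class and method name in it is hypothetical. The slow loader stands in for the full read of the sprint table.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Simplified stand-in for a blocking cached reference; NOT the Ehcache or Jira implementation. */
final class BlockingReferenceDemo {

    /** Holds one lazily loaded value; callers wait while a load is in progress. */
    static final class BlockingReference<T> {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Supplier<T> loader;
        private T value; // guarded by lock

        BlockingReference(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            lock.readLock().lock(); // parks here while another thread holds the write lock
            try {
                if (value != null) {
                    return value; // fast path: already populated
                }
            } finally {
                lock.readLock().unlock();
            }
            lock.writeLock().lock(); // slow path: exactly one thread runs the loader
            try {
                if (value == null) {
                    value = loader.get();
                }
                return value;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    /** Outcome of one run: how long the second caller waited and how often the loader ran. */
    static final class Result {
        final long readerWaitMillis;
        final int loaderCalls;

        Result(long readerWaitMillis, int loaderCalls) {
            this.readerWaitMillis = readerWaitMillis;
            this.loaderCalls = loaderCalls;
        }
    }

    /** Starts a slow load on one thread, then measures how long a second caller is blocked. */
    static Result run(long loadMillis) {
        CountDownLatch loadStarted = new CountDownLatch(1);
        AtomicInteger loaderCalls = new AtomicInteger();
        BlockingReference<String> ref = new BlockingReference<>(() -> {
            loaderCalls.incrementAndGet();
            loadStarted.countDown(); // signal that the write lock is held and loading has begun
            try {
                Thread.sleep(loadMillis); // stands in for the slow database read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "all sprints";
        });
        Thread populating = new Thread(ref::get, "populating-thread");
        populating.start();
        try {
            loadStarted.await();
            long start = System.nanoTime();
            ref.get(); // blocked until the populating thread releases the write lock
            long waited = (System.nanoTime() - start) / 1_000_000;
            populating.join();
            return new Result(waited, loaderCalls.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        Result r = run(500);
        System.out.println("second caller waited ~" + r.readerWaitMillis
                + " ms, loader ran " + r.loaderCalls + " time(s)");
    }
}
```

In this sketch the second caller waits for roughly the full load time even though it only needs a read, which is why a slow population stalls every request that touches sprint data.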

Notes

The problem is amplified on Oracle DB because, by default, the Oracle JDBC driver fetches the results of a SQL query from the DB 10 rows at a time. For a table with 150k records, 15k round trips are required for the complete contents of the table to reach the application. When the sprint cache gets flushed, a full table scan of the AO_60DB71_SPRINT table is performed to populate the sprint cache.
Taking an Oracle DB instance with 150k sprints as an example, 15k round trips between the application and the database are required for the thread waiting on this information to retrieve the entire data set. This translates to a total of 15 seconds if the latency between Jira and the DB server is 1 ms. Because every round trip pays the latency once, a higher latency between the app server and the DB server increases the retrieval time proportionally.
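
The arithmetic above can be checked with a small calculation. The following is a minimal sketch, not Jira code: the class and method names are hypothetical, and it models only the network wait (round trips multiplied by latency), ignoring query execution and row processing time.

```java
import java.time.Duration;

/** Hypothetical helper: estimates the network wait of a full-table fetch. Not Jira code. */
final class FetchTimeEstimate {

    /** Round trips needed to stream {@code rows} rows when the driver fetches {@code fetchSize} rows per trip. */
    static long roundTrips(long rows, int fetchSize) {
        if (rows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and fetchSize must be > 0");
        }
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    /** Total time spent waiting on the network, assuming each round trip costs one latency. */
    static Duration networkWait(long rows, int fetchSize, Duration latencyPerTrip) {
        return latencyPerTrip.multipliedBy(roundTrips(rows, fetchSize));
    }

    public static void main(String[] args) {
        long sprints = 150_000;
        for (int fetchSize : new int[] {10, 200}) {
            for (long latencyMs : new long[] {1, 5}) {
                Duration wait = networkWait(sprints, fetchSize, Duration.ofMillis(latencyMs));
                System.out.printf("fetchSize=%d latency=%dms -> %d round trips, %d ms waiting%n",
                        fetchSize, latencyMs, roundTrips(sprints, fetchSize), wait.toMillis());
            }
        }
    }
}
```

Under these assumptions, 150k rows at 10 rows per trip and 1 ms latency give 15,000 round trips and 15 seconds of waiting, while 5 ms latency gives 75 seconds. A fetch size of 200 reduces the same read to 750 round trips.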

Workaround

For Oracle DB, increase the number of rows that the Oracle JDBC driver fetches per round trip:
https://docs.oracle.com/cd/E18283_01/java.112/e16548/resltset.htm#i1023619
This can be done by adding the following line to the dbconfig.xml file, inside the <jdbc-datasource> section:
```
<connection-properties>defaultRowPrefetch=XXX</connection-properties>
```
XXX is the number of rows fetched per round trip. The default value is 10; we have seen an increase to 200 be very effective while adding very little overhead to the JVM's memory utilization.
See also the related KB article: Using the default Oracle JDBC fetch size may lead to performance issues in Jira
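
To try the same setting outside Jira, for example from a standalone JDBC test client, the property can be supplied as a driver connection property. The sketch below only builds the java.util.Properties object and does not open a connection. The property name defaultRowPrefetch comes from the workaround above; the class name, user, password, and the JDBC URL in the comment are hypothetical placeholders.

```java
import java.util.Properties;

/** Hypothetical helper that prepares connection properties for a standalone Oracle JDBC test. */
final class OraclePrefetchProperties {

    /** Oracle JDBC connection property that sets the default number of rows fetched per round trip. */
    static final String DEFAULT_ROW_PREFETCH = "defaultRowPrefetch";

    static Properties withPrefetch(String user, String password, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty(DEFAULT_ROW_PREFETCH, Integer.toString(rows));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder credentials; replace with real ones in an actual test.
        Properties props = withPrefetch("jira", "secret", 200);
        // With the Oracle driver on the classpath, the properties would be passed like this
        // (placeholder URL):
        // DriverManager.getConnection("jdbc:oracle:thin:@//db-host:1521/SERVICE", props);
        System.out.println(DEFAULT_ROW_PREFETCH + "=" + props.getProperty(DEFAULT_ROW_PREFETCH));
    }
}
```

Timing a full read of a large table with and without the property is one way to confirm the effect before changing dbconfig.xml.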

is related to

JRASERVER-68168 Make FieldLayoutCache population more efficient in Jira (Gathering Interest)
FLASH-3226 You do not have permission to view this issue

relates to

JRASERVER-71740 Increase the default value of defaultRowPrefetch for Oracle (Gathering Interest)
FLASH-2410 You do not have permission to view this issue

mentioned in: Preparing for Jira 9.1

resolves: ACE-4499

(39 mentioned in, 1 resolves)

Matt Doar added a comment - 12/Oct/2022 7:38 PM

Looks like this was only for Oracle DB right?

Andrzej Kotas added a comment - 05/Jul/2022 2:23 PM

Hi Team, we're happy to announce that this will be fixed in the 9.1.0 release.

We are introducing two new caches for Sprints to ensure their retrieval is performant in various scenarios. The new configuration will ensure faster cache population, allowing Jira to perform actions such as opening an Agile board or assigning issues to a sprint much faster.

This will enable your teams to work simultaneously, without being blocked by slow board loading times or request timeouts.

Cheers

Andrzej Kotas

Jira DC PM

Dibyandu Roy added a comment - 07/Feb/2022 7:30 AM - edited

This issue is affecting us. Is there an ETA for the fix? We have 110k sprints. We are using MSSQL; what is the fix for us?

Alexander Medchenko added a comment - 22/Nov/2021 7:17 AM

Hi. Please review findings relating this bug: https://getsupport.atlassian.com/servicedesk/customer/portal/41/PSSRV-22594

The main problem was detected with parallel use of SprintManager in an environment with 65k+ closed sprints. If you try to operate on many sprint objects via SprintManager while many users are working with boards or the Structure plugin (with structures that have relations with sprints), the Jira cluster becomes inoperable.

Assignee: Karol Lopacinski
Reporter: Lucas Bugs
Affected customers: 8
Watchers: 27
Created: 01/Jul/2020 9:32 PM
Updated: 01/Aug/2024 7:38 PM
Resolved: 21/Jul/2022 8:39 AM
