CONFSERVER-58526

Requests to aggregate Jira metadata in Confluence can exhaust HTTP threads and cause outages

      Issue Summary

      When Confluence is linked with Jira, the combination of high load and an unresponsive Jira can leave every Confluence HTTP thread occupied by Jira metadata aggregation tasks. With no threads left to serve user requests, the instance eventually suffers a complete outage.

      Steps to Reproduce

      1. Create an app link between Confluence and Jira
      2. Block connections from Confluence to Jira, e.g. use a firewall to block all traffic to Jira
      3. In Confluence, open more than 10 new pages every 5 seconds (this throughput makes the queue of Jira metadata aggregation tasks grow steadily; a scripted alternative is sketched below)
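
      Opening that many pages by hand is tedious, so the load can also be scripted. The sketch below is a rough load generator, not an Atlassian tool: the base URL and starting page ID are placeholders, it assumes the pages are readable anonymously (add authentication otherwise), and each 5-second round requests 11 page IDs that have not been requested before, so their Jira metadata is not yet cached.

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.atomic.AtomicLong;

      public class PageLoadGenerator {

          // Placeholders: point this at a test instance, never production, and use
          // page IDs that actually exist and contain Jira issue links.
          private static final String BASE_URL = "http://confluence.example.com";
          private static final AtomicLong NEXT_PAGE_ID = new AtomicLong(1000);

          public static void main(String[] args) {
              HttpClient client = HttpClient.newHttpClient();
              ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

              // Every 5 seconds, request 11 pages that have not been requested before,
              // so their Jira metadata is not yet cached in Confluence.
              // Runs until the process is killed.
              scheduler.scheduleAtFixedRate(() -> {
                  for (int i = 0; i < 11; i++) {
                      long pageId = NEXT_PAGE_ID.getAndIncrement();
                      HttpRequest request = HttpRequest.newBuilder(
                              URI.create(BASE_URL + "/pages/viewpage.action?pageId=" + pageId))
                              .GET()
                              .build();
                      client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                            .thenAccept(r -> System.out.println("page " + pageId + " -> " + r.statusCode()));
                  }
              }, 0, 5, TimeUnit.SECONDS);
          }
      }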

      Expected Results

      Confluence should handle Jira's slowness gracefully.

      Actual Results

      Confluence users observe intermittent failures, followed by a complete outage.

      Thread dumps captured during the outage show all HTTP threads blocked in Jira metadata aggregation, waiting on fetches from Jira:

         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000034d5e64e0> (a java.util.concurrent.FutureTask)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getAggregateFromTask(JiraAggregateCacheLoader.java:132)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getValue(JiraAggregateCacheLoader.java:116)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCache.getAggregateData(JiraAggregateCache.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.service.JiraMetadataService.getAggregateData(JiraMetadataService.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.rest.JiraMetadataResource.getAggregateData(JiraMetadataResource.java:33)
      

      Root cause

      The Confluence Jira Metadata plugin uses a single ThreadPoolExecutor (a maximum of 10 threads backed by an unbounded LinkedBlockingQueue) to aggregate metadata from Jira for every page load in Confluence. There are two types of aggregation tasks: cache loading and metadata fetching.

      The problem is that a cache loading task creates fetching tasks, submits them to the very executor it is running in, and then blocks until its child tasks finish or it times out (currently set to 5 seconds). If all threads in the executor are occupied by cache loading tasks, each of them blocks for the full 5 seconds to no effect, because its children are stuck in the queue behind other waiting tasks, while newly submitted loading tasks pile up behind them. If more than 10 loading tasks keep arriving every 5 seconds (for example, by opening 10 new pages in a browser), the backlog grows without bound and ultimately leads to an outage. A minimal sketch of this pattern follows.
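
      The sketch below reproduces the pattern outside Confluence. It is not the plugin's code: the class, method, and task names are made up, but the shape is the same, a fixed pool of 10 threads whose parent tasks block on children submitted to the same pool with a 5-second timeout.

      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;

      public class SelfStarvingExecutorDemo {

          // Stand-in for the plugin's single shared executor:
          // 10 threads in front of an unbounded queue.
          private static final ExecutorService POOL = Executors.newFixedThreadPool(10);

          public static void main(String[] args) throws InterruptedException {
              // Simulate "more than 10 page loads": 12 cache-loading tasks at once.
              for (int i = 0; i < 12; i++) {
                  int pageId = i;
                  POOL.submit(() -> loadCache(pageId));
              }
              TimeUnit.SECONDS.sleep(15); // long enough for every task to finish
              POOL.shutdown();
          }

          // Parent task: fans its child fetch out into the SAME pool, then blocks.
          private static void loadCache(int pageId) {
              Future<String> child = POOL.submit(() -> fetchFromJira(pageId));
              try {
                  // Blocks a pool thread for up to 5 seconds while the child
                  // sits in the queue behind the other blocked parents.
                  String metadata = child.get(5, TimeUnit.SECONDS);
                  System.out.println("page " + pageId + ": " + metadata);
              } catch (TimeoutException e) {
                  System.out.println("page " + pageId + ": gave up after 5 seconds");
              } catch (InterruptedException | ExecutionException e) {
                  Thread.currentThread().interrupt();
              }
          }

          // Child task: in the real plugin this would be the remote call to Jira.
          private static String fetchFromJira(int pageId) {
              return "metadata for page " + pageId;
          }
      }

      Run as-is, the first 10 parents each waste their full 5-second wait before any child can start, which is exactly how Confluence HTTP threads end up parked in FutureTask.get in the thread dump above.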

      Workaround

      Disabling the "Confluence Jira Metadata" plugin should bring Confluence back.

      The only functionality that will be lost is the "Jira Links" list at the top of the Confluence page if the page contains links to Jira. The actual links within the content of the page will still work.
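
      For admins who prefer to script the workaround, the plugin can also be toggled through the Universal Plugin Manager REST API. The sketch below is one way to do that from Java; the base URL, the credentials, and especially the plugin key are placeholders to confirm against your own instance (Manage apps shows the real key), and some UPM versions may expect the full plugin JSON rather than this minimal body.

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Base64;

      public class DisableJiraMetadataPlugin {

          public static void main(String[] args) throws Exception {
              // All three values are placeholders; in particular, verify the plugin
              // key under General configuration > Manage apps before relying on it.
              String baseUrl = "http://confluence.example.com";
              String pluginKey = "com.atlassian.confluence.plugins.confluence-jira-metadata";
              String credentials = Base64.getEncoder()
                      .encodeToString("admin-user:admin-password".getBytes());

              // Partial plugin representation with "enabled": false; some UPM versions
              // may require the full representation returned by a prior GET.
              String body = "{\"key\":\"" + pluginKey + "\",\"enabled\":false}";

              HttpRequest request = HttpRequest.newBuilder(
                      URI.create(baseUrl + "/rest/plugins/1.0/" + pluginKey + "-key"))
                      .header("Authorization", "Basic " + credentials)
                      .header("Content-Type", "application/vnd.atl.plugins.plugin+json")
                      .PUT(HttpRequest.BodyPublishers.ofString(body))
                      .build();

              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.statusCode() + " " + response.body());
          }
      }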


            Quan Pham added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 7.0.5.
            Upgrade now or check out the Release Notes to see what other issues are resolved.


            Bradley Hyde added a comment -

            If you're running the Confluence 6.13 Enterprise release, a fix for this issue is now available in Confluence 6.13.9, which you can find in the Download Archives.

            Nhan Dang added a comment - edited

            Hi philippe.perez,

            This fix has just been released in 6.15.8. We normally let newly released changes that are deemed ER-worthy soak for a couple of weeks before backporting them to the Enterprise release.

            Hope this helps.

            Cheers,
            Nhan


            Philippe PEREZ added a comment - edited

            Hello @Quan Pham,

            When (and in which version) will this bugfix be backported to the Enterprise Release?
            We requested it from our TAM a long time ago and are still waiting for the fix before migrating to 6.13.

            Thanks.


            Quan Pham added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.15.8.
            Upgrade now or check out the Release Notes to see what other issues are resolved.

