-
Bug
-
Resolution: Fixed
-
Low
-
6.6.14, 6.15.6
-
3
-
Severity 1 - Critical
-
6
-
Issue Summary
When Confluence is linked to Jira and Jira becomes unresponsive while Confluence is under high load, all Confluence HTTP threads can be occupied by Jira metadata aggregation tasks. This leaves no threads to process users' requests and eventually leads to an outage.
Steps to Reproduce
- Create an app link between Confluence and Jira
- Block connections from Confluence to Jira e.g. use a firewall to block all traffic to Jira
- In Confluence, open more than 10 new pages every 5 seconds (at this rate the queue of Jira metadata aggregation tasks grows steadily)
Expected Results
Confluence should handle Jira's slowness gracefully.
Actual Results
Confluence users observe intermittent failures, then a complete outage.
Captured thread dumps will show all HTTP threads busy fetching metadata from Jira:
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000034d5e64e0> (a java.util.concurrent.FutureTask)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getAggregateFromTask(JiraAggregateCacheLoader.java:132)
    at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getValue(JiraAggregateCacheLoader.java:116)
    at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCache.getAggregateData(JiraAggregateCache.java:74)
    at com.atlassian.confluence.plugins.metadata.jira.service.JiraMetadataService.getAggregateData(JiraMetadataService.java:74)
    at com.atlassian.confluence.plugins.metadata.jira.rest.JiraMetadataResource.getAggregateData(JiraMetadataResource.java:33)
Root cause
The Confluence Jira Metadata plugin uses a single ThreadPoolExecutor (a pool of at most 10 threads backed by an unbounded LinkedBlockingQueue) to aggregate metadata from Jira for every page load in Confluence. There are two types of aggregation task: cache loading and metadata fetching. The problem is that a cache loading task creates fetching tasks, submits them to the very executor it is running in, and then blocks until either its child tasks finish or it times out (currently after 5 seconds). If every thread in the executor is occupied by a cache loading task, the fetching tasks they are waiting for cannot start, so all of those threads block for the full 5 seconds for no good reason while new loading tasks pile up in the queue. If more than 10 loading tasks keep arriving every 5 seconds (for example, by opening more than 10 new pages in browsers), the backlog keeps growing and ultimately leads to an outage.
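The starvation pattern described above can be reproduced outside Confluence with a few lines of plain Java. The following is a minimal sketch, not the plugin's code: the class name, method names, pool size and timeout are all hypothetical, and the two latches exist only to make the demonstration deterministic. It compares loaders that submit their child task to the executor they run in against loaders that submit to a separate executor. The separate pool is shown only for contrast and is not necessarily how the issue was fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SharedExecutorStarvation {

    // Fixed-size pool backed by an unbounded queue, like the one in the root cause.
    static ExecutorService newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    // Runs one "loader" per pool thread; each loader submits a child "fetch" task
    // and waits for it. Returns how many loaders timed out waiting for their child.
    static int countTimedOutLoaders(int threads, long timeoutMillis, boolean shareExecutor)
            throws Exception {
        ExecutorService loaderPool = newPool(threads);
        ExecutorService fetchPool = shareExecutor ? loaderPool : newPool(threads);
        // Demo-only latches: every thread holds a loader before any fetch is
        // submitted, and no loader returns before all loaders have stopped waiting.
        CountDownLatch allLoadersRunning = new CountDownLatch(threads);
        CountDownLatch allLoadersDoneWaiting = new CountDownLatch(threads);
        try {
            List<Future<Boolean>> loaders = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                loaders.add(loaderPool.submit(() -> {
                    allLoadersRunning.countDown();
                    allLoadersRunning.await();
                    // The child task itself is trivial; it only needs a free thread.
                    Future<String> fetch = fetchPool.submit(() -> "metadata");
                    boolean timedOut;
                    try {
                        fetch.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        timedOut = false;
                    } catch (TimeoutException e) {
                        fetch.cancel(false);
                        timedOut = true;
                    }
                    allLoadersDoneWaiting.countDown();
                    allLoadersDoneWaiting.await();
                    return timedOut;
                }));
            }
            int timedOut = 0;
            for (Future<Boolean> loader : loaders) {
                if (loader.get()) {
                    timedOut++;
                }
            }
            return timedOut;
        } finally {
            loaderPool.shutdownNow();
            fetchPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared executor, timed-out loaders: "
                + countTimedOutLoaders(3, 500, true));
        System.out.println("separate executors, timed-out loaders: "
                + countTimedOutLoaders(3, 500, false));
    }
}
```

With a shared executor every loader times out even though the child task does no work, because the child tasks sit in the queue behind the loaders that are waiting for them. With a separate executor for child tasks, no loader times out.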
Workaround
Disabling the "Confluence Jira Metadata" plugin should bring Confluence back.
The only functionality that will be lost is the "Jira Links" list at the top of the Confluence page if the page contains links to Jira. The actual links within the content of the page will still work.
A fix for this issue is available to Server and Data Center customers in Confluence 7.0.5.
Upgrade now or check out the Release Notes to see what other issues are resolved.