  1. Confluence Data Center
  2. CONFSERVER-58526

Requests to aggregate Jira metadata in Confluence can exhaust HTTP threads and cause outages

      Issue Summary

      When Confluence is linked to Jira and Jira becomes unresponsive while Confluence is under high load, all Confluence HTTP threads can be occupied by Jira metadata aggregation tasks. This leaves no threads to process users' requests and eventually leads to an outage.

      Steps to Reproduce

      1. Create an app link between Confluence and Jira
      2. Block connections from Confluence to Jira, e.g. use a firewall to block all traffic to Jira
      3. In Confluence, open more than 10 new pages every 5 seconds (at this rate, the queue of Jira metadata aggregation tasks grows steadily)

      Expected Results

      Confluence should handle Jira's slowness gracefully.

      Actual Results

      Confluence users observe intermittent failures, then a complete outage.

      Captured thread dumps will show all HTTP threads busy fetching metadata from Jira:

         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000034d5e64e0> (a java.util.concurrent.FutureTask)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getAggregateFromTask(JiraAggregateCacheLoader.java:132)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getValue(JiraAggregateCacheLoader.java:116)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCache.getAggregateData(JiraAggregateCache.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.service.JiraMetadataService.getAggregateData(JiraMetadataService.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.rest.JiraMetadataResource.getAggregateData(JiraMetadataResource.java:33)
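
To gauge how many threads in a captured dump are stuck at this point, one can count the stacks that contain the waiting frame shown above. The following is a minimal sketch, not an Atlassian tool: the class name `ThreadDumpCheck` and its method are hypothetical, and it assumes a plain-text dump in which each thread's stack contains the `getAggregateFromTask` frame at most once. It does not filter by thread name, so it counts any thread waiting there, not only HTTP threads.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ThreadDumpCheck {

    // Frame taken from the stack trace in this report.
    static final String MARKER = "JiraAggregateCacheLoader.getAggregateFromTask";

    /** Counts the lines of a plain-text thread dump that contain the marker frame. */
    static int countBlockedThreads(String dump) {
        int count = 0;
        for (String line : dump.split("\n")) {
            if (line.contains(MARKER)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpCheck <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("threads waiting on Jira aggregate: " + countBlockedThreads(dump));
    }
}
```

If the count is close to the size of the HTTP thread pool, the dump is consistent with the behaviour described in this issue.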
      

      Root cause

      The Confluence Jira Metadata plugin uses a single ThreadPoolExecutor (a pool of at most 10 threads backed by an unbounded LinkedBlockingQueue) to aggregate metadata from Jira for every page load in Confluence. There are two types of aggregation task: cache loading and metadata fetching.

      The problem is that a cache loading task creates fetching tasks, submits them to the very executor it is running in, and then blocks until either its child tasks finish or it times out (the timeout is currently 5 seconds). If all threads in the executor are occupied by cache loading tasks, the fetching tasks they are waiting for cannot start until a loader times out, so every thread is blocked for 5 seconds for no good reason while new loading tasks are put into the waiting queue.

      If more than 10 loading tasks keep arriving every 5 seconds (by opening more than 10 new pages in browsers), the backlog keeps growing and ultimately leads to an outage.
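
The pattern described above can be reproduced in isolation. The sketch below is not the plugin's code: the class `PoolStarvationDemo`, its method names, the pool size of 2, and the 200 ms timeout are all assumptions chosen to keep the demo fast (the plugin uses 10 threads and 5 seconds). Parent tasks stand in for cache loading tasks and child tasks for fetching tasks. A latch makes sure every pool thread holds a parent before any child is submitted, which is the saturated state the root cause describes. The second run uses a separate pool for the children; this is one common way to avoid the pattern and is shown only for contrast, not as the fix Atlassian shipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    static final int POOL_SIZE = 2;      // assumption for the demo; the plugin's pool has 10 threads
    static final long TIMEOUT_MS = 200;  // assumption for the demo; the plugin waits 5 seconds

    /**
     * Submits POOL_SIZE parent tasks to parentPool. Each parent submits one
     * child to childPool and waits for it with a timeout.
     * Returns how many parents timed out waiting for their child.
     */
    static int runParents(ExecutorService parentPool, ExecutorService childPool) throws Exception {
        CountDownLatch allParentsRunning = new CountDownLatch(POOL_SIZE);
        List<Future<Boolean>> parents = new ArrayList<>();
        for (int i = 0; i < POOL_SIZE; i++) {
            parents.add(parentPool.submit(() -> {
                // Wait until every parent occupies a thread of parentPool.
                allParentsRunning.countDown();
                allParentsRunning.await();
                Future<String> child = childPool.submit(() -> "metadata");
                try {
                    child.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);
                    return false; // child finished in time
                } catch (TimeoutException e) {
                    child.cancel(true);
                    return true;  // child never got a thread
                }
            }));
        }
        int timedOut = 0;
        for (Future<Boolean> parent : parents) {
            if (parent.get()) {
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        // Same pool for parents and children: children queue behind blocked parents.
        ExecutorService shared = Executors.newFixedThreadPool(POOL_SIZE);
        System.out.println("shared pool, parents timed out: " + runParents(shared, shared));
        shared.shutdownNow();

        // Separate pools: children always have threads available.
        ExecutorService loaders = Executors.newFixedThreadPool(POOL_SIZE);
        ExecutorService fetchers = Executors.newFixedThreadPool(POOL_SIZE);
        System.out.println("separate pools, parents timed out: " + runParents(loaders, fetchers));
        loaders.shutdownNow();
        fetchers.shutdownNow();
    }
}
```

With the shared pool, every parent waits out the full timeout because its child sits in the queue behind the parents themselves. `Executors.newFixedThreadPool` is used here because it creates a ThreadPoolExecutor with an unbounded LinkedBlockingQueue, matching the configuration described in the root cause.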

      Workaround

      Disabling the "Confluence Jira Metadata" plugin should bring Confluence back.

      The only functionality that will be lost is the "Jira Links" list at the top of the Confluence page if the page contains links to Jira. The actual links within the content of the page will still work.


              nhdang Nhan Dang
              0d0661ad170a Timothy Horton