CONFSERVER-58526

Requests to aggregate Jira metadata in Confluence can exhaust HTTP threads and cause outages

      Issue Summary

      When Confluence is linked with Jira, the combination of high load and an unresponsive Jira can leave every Confluence HTTP thread occupied by Jira metadata aggregation tasks. With no threads left to serve user requests, the instance eventually suffers a complete outage.

      Steps to Reproduce

      1. Create an app link between Confluence and Jira
      2. Block connections from Confluence to Jira, e.g. use a firewall to block all traffic to Jira
      3. In Confluence, open more than 10 new pages every 5 seconds (this throughput makes the queue of Jira metadata aggregation tasks grow steadily; a scripted alternative is sketched below)
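
      Opening that many pages by hand is tedious, so the load can also be scripted. The sketch below is a rough load generator, not an Atlassian tool: the base URL and starting page ID are placeholders, it assumes the pages are readable anonymously (add authentication otherwise), and each 5-second round requests 11 page IDs that have not been requested before, so their Jira metadata is not yet cached.

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.atomic.AtomicLong;

      public class PageLoadGenerator {

          // Placeholders: point this at a test instance, never production, and use
          // page IDs that actually exist and contain Jira issue links.
          private static final String BASE_URL = "http://confluence.example.com";
          private static final AtomicLong NEXT_PAGE_ID = new AtomicLong(1000);

          public static void main(String[] args) {
              HttpClient client = HttpClient.newHttpClient();
              ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

              // Every 5 seconds, request 11 pages that have not been requested before,
              // so their Jira metadata is not yet cached in Confluence.
              // Runs until the process is killed.
              scheduler.scheduleAtFixedRate(() -> {
                  for (int i = 0; i < 11; i++) {
                      long pageId = NEXT_PAGE_ID.getAndIncrement();
                      HttpRequest request = HttpRequest.newBuilder(
                              URI.create(BASE_URL + "/pages/viewpage.action?pageId=" + pageId))
                              .GET()
                              .build();
                      client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                            .thenAccept(r -> System.out.println("page " + pageId + " -> " + r.statusCode()));
                  }
              }, 0, 5, TimeUnit.SECONDS);
          }
      }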

      Expected Results

      Confluence should handle Jira's slowness gracefully.

      Actual Results

      Confluence users observe intermittent failures, followed by a complete outage.

      Thread dumps captured during the outage show all HTTP threads blocked in Jira metadata aggregation, waiting on fetches from Jira:

         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000034d5e64e0> (a java.util.concurrent.FutureTask)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getAggregateFromTask(JiraAggregateCacheLoader.java:132)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCacheLoader.getValue(JiraAggregateCacheLoader.java:116)
      	at com.atlassian.confluence.plugins.metadata.jira.aggregate.JiraAggregateCache.getAggregateData(JiraAggregateCache.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.service.JiraMetadataService.getAggregateData(JiraMetadataService.java:74)
      	at com.atlassian.confluence.plugins.metadata.jira.rest.JiraMetadataResource.getAggregateData(JiraMetadataResource.java:33)
      

      Root cause

      The Confluence Jira Metadata plugin uses a single ThreadPoolExecutor (a maximum of 10 threads backed by an unbounded LinkedBlockingQueue) to aggregate metadata from Jira for every page load in Confluence. There are two types of aggregation tasks: cache loading and metadata fetching.

      The problem is that a cache loading task creates fetching tasks, submits them to the very executor it is running in, and then blocks until its child tasks finish or it times out (currently set to 5 seconds). If all threads in the executor are occupied by cache loading tasks, each of them blocks for the full 5 seconds to no effect, because its children are stuck in the queue behind other waiting tasks, while newly submitted loading tasks pile up behind them. If more than 10 loading tasks keep arriving every 5 seconds (for example, by opening 10 new pages in a browser), the backlog grows without bound and ultimately leads to an outage. A minimal sketch of this pattern follows.
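
      The sketch below reproduces the pattern outside Confluence. It is not the plugin's code: the class, method, and task names are made up, but the shape is the same, a fixed pool of 10 threads whose parent tasks block on children submitted to the same pool with a 5-second timeout.

      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;

      public class SelfStarvingExecutorDemo {

          // Stand-in for the plugin's single shared executor:
          // 10 threads in front of an unbounded queue.
          private static final ExecutorService POOL = Executors.newFixedThreadPool(10);

          public static void main(String[] args) throws InterruptedException {
              // Simulate "more than 10 page loads": 12 cache-loading tasks at once.
              for (int i = 0; i < 12; i++) {
                  int pageId = i;
                  POOL.submit(() -> loadCache(pageId));
              }
              TimeUnit.SECONDS.sleep(15); // long enough for every task to finish
              POOL.shutdown();
          }

          // Parent task: fans its child fetch out into the SAME pool, then blocks.
          private static void loadCache(int pageId) {
              Future<String> child = POOL.submit(() -> fetchFromJira(pageId));
              try {
                  // Blocks a pool thread for up to 5 seconds while the child
                  // sits in the queue behind the other blocked parents.
                  String metadata = child.get(5, TimeUnit.SECONDS);
                  System.out.println("page " + pageId + ": " + metadata);
              } catch (TimeoutException e) {
                  System.out.println("page " + pageId + ": gave up after 5 seconds");
              } catch (InterruptedException | ExecutionException e) {
                  Thread.currentThread().interrupt();
              }
          }

          // Child task: in the real plugin this would be the remote call to Jira.
          private static String fetchFromJira(int pageId) {
              return "metadata for page " + pageId;
          }
      }

      Run as-is, the first 10 parents each waste their full 5-second wait before any child can start, which is exactly how Confluence HTTP threads end up parked in FutureTask.get in the thread dump above.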

      Workaround

      Disabling the "Confluence Jira Metadata" plugin should bring Confluence back.

      The only functionality that will be lost is the "Jira Links" list at the top of the Confluence page if the page contains links to Jira. The actual links within the content of the page will still work.
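
      For admins who prefer to script the workaround, the plugin can also be toggled through the Universal Plugin Manager REST API. The sketch below is one way to do that from Java; the base URL, the credentials, and especially the plugin key are placeholders to confirm against your own instance (Manage apps shows the real key), and some UPM versions may expect the full plugin JSON rather than this minimal body.

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Base64;

      public class DisableJiraMetadataPlugin {

          public static void main(String[] args) throws Exception {
              // All three values are placeholders; in particular, verify the plugin
              // key under General configuration > Manage apps before relying on it.
              String baseUrl = "http://confluence.example.com";
              String pluginKey = "com.atlassian.confluence.plugins.confluence-jira-metadata";
              String credentials = Base64.getEncoder()
                      .encodeToString("admin-user:admin-password".getBytes());

              // Partial plugin representation with "enabled": false; some UPM versions
              // may require the full representation returned by a prior GET.
              String body = "{\"key\":\"" + pluginKey + "\",\"enabled\":false}";

              HttpRequest request = HttpRequest.newBuilder(
                      URI.create(baseUrl + "/rest/plugins/1.0/" + pluginKey + "-key"))
                      .header("Authorization", "Basic " + credentials)
                      .header("Content-Type", "application/vnd.atl.plugins.plugin+json")
                      .PUT(HttpRequest.BodyPublishers.ofString(body))
                      .build();

              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.statusCode() + " " + response.body());
          }
      }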


            Quan Pham added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 7.0.5.
            Upgrade now or check out the Release Notes to see what other issues are resolved.


            Bradley Hyde added a comment -

            If you're running the Confluence 6.13 Enterprise release, a fix for this issue is now available in Confluence 6.13.9, which you can find in the Download Archives.

            Nhan Dang added a comment - edited

            Hi philippe.perez,

            This fix has just been released in 6.15.8. We normally let newly released changes that are deemed ER-worthy soak for a couple of weeks before backporting them to the Enterprise release.

            Hope this helps.

            Cheers,
            Nhan


            Philippe PEREZ added a comment - edited

            Hello @Quan Pham,

            When (and in which version) will this bugfix be backported to the Enterprise Release?
            We requested it from our TAM a long time ago and are still waiting for the fix before migrating to 6.13.

            Thanks.


            Quan Pham added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.15.8.
            Upgrade now or check out the Release Notes to see what other issues are resolved.

