[CONFSERVER-74533] Unchecking many inline tasks exhausts all HTTP threads

      Atlassian Update - 4th June 2024

      Hi Everyone,

      Thank you for your patience with this issue as we are still working on a complete solution.

      During development, we were able to release some optimisations early, which should help reduce the impact. They should greatly improve processing efficiency when checking many tasks on the same page.

      These optimisations have been released in version 8.5.10.

      Thanks!

      Steps to reproduce

      1. On a test instance, create a Confluence page with many inline tasks and publish it.
      2. Untick as many inline tasks as you can, as quickly as you can.

      Results

      1. Observe the stuck-thread count in the catalina.out log, which grows rapidly:
        27-Aug-2021 16:25:20.259 - [1] thread(s) stuck.
        27-Aug-2021 16:25:30.265 - [2] thread(s) stuck.
        27-Aug-2021 16:25:30.300 - [3] thread(s) stuck.
        27-Aug-2021 16:25:30.302 - [4] thread(s) stuck.
        .
        .
        .
        27-Aug-2021 16:27:00.662 - [44] thread(s) stuck.
        27-Aug-2021 16:27:00.664 - [45] thread(s) stuck.
        27-Aug-2021 16:27:10.669 - [46] thread(s) stuck.
        27-Aug-2021 16:27:10.671 - [47] thread(s) stuck.
        
      2. Eventually, all HTTP threads are exhausted.
      3. The Catalina logs show inline-task request threads in a STUCK state:
        27-Aug-2021 16:35:36.650 WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8080-exec-48] (id=[1331]) has been active for [61,172] milliseconds (since [8/27/21 4:34 PM]) to serve the same request for [https://localhost:8090/rest/inlinetasks/1/task/1039232143/905/] and may be stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). There is/are [48] thread(s) in total that are monitored by this Valve and may be stuck.
        	java.lang.Throwable
        		at java.base@11.0.7/jdk.internal.misc.Unsafe.park(Native Method)
                        . . .
        		at com.atlassian.confluence.plugins.merge.TDMMerger.mergeContent(TDMMerger.java:103)
        		at com.atlassian.confluence.plugins.merge.TDMMerger.mergeContent(TDMMerger.java:118)
        		at com.atlassian.confluence.util.diffs.PageLayoutAwareMerger.mergeContent(PageLayoutAwareMerger.java:53)
        		at com.atlassian.confluence.pages.DefaultDraftManager.mergeContent(DefaultDraftManager.java:149)
        .
        .
        		at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
        		at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        		at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
        		at com.sun.proxy.$Proxy2032.mergeContent(Unknown Source)
        		at com.atlassian.confluence.plugins.tasklist.service.DefaultInlineTaskService.updatePage(DefaultInlineTaskService.java:361)
        		at com.atlassian.confluence.plugins.tasklist.service.DefaultInlineTaskService.setTaskStatus(DefaultInlineTaskService.j
        

      On Data Center (DC), this causes a temporary outage of the affected node.
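
      To see how many of the stuck threads are serving inline-task requests, the warnings can be counted directly from the log. The following is a minimal sketch, assuming the warning lines follow the exact format of the excerpt above; the function name `summarize_stuck_threads` and the `SAMPLE` constant are illustrative and not part of Confluence or Tomcat.

```python
import re

# Pulls the thread name, request URL and running stuck-thread total out of a
# StuckThreadDetectionValve warning formatted like the excerpt in this report.
STUCK_RE = re.compile(
    r"notifyStuckThreadDetected Thread \[(?P<thread>[^\]]+)\].*?"
    r"to serve the same request for \[(?P<url>[^\]]+)\].*?"
    r"There is/are \[(?P<total>\d+)\] thread\(s\) in total"
)

# One warning line copied from the report, used here as sample input.
SAMPLE = (
    "27-Aug-2021 16:35:36.650 WARNING [Catalina-utility-1] "
    "org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected "
    "Thread [http-nio-8080-exec-48] (id=[1331]) has been active for [61,172] "
    "milliseconds (since [8/27/21 4:34 PM]) to serve the same request for "
    "[https://localhost:8090/rest/inlinetasks/1/task/1039232143/905/] and may be "
    "stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). "
    "There is/are [48] thread(s) in total that are monitored by this Valve and may be stuck."
)


def summarize_stuck_threads(lines, path_fragment="/rest/inlinetasks/"):
    """Return (warning_count, peak_total) for stuck-thread warnings whose
    request URL contains path_fragment."""
    warning_count = 0
    peak_total = 0
    for line in lines:
        match = STUCK_RE.search(line)
        if match and path_fragment in match.group("url"):
            warning_count += 1
            peak_total = max(peak_total, int(match.group("total")))
    return warning_count, peak_total


print(summarize_stuck_threads([SAMPLE]))
```

      Any iterable of lines works as input, so an open file handle on catalina.out can be passed in place of the sample list.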

      Workaround

      The first workaround (listed below) is by far the most effective, as it eliminates the problem completely. If it is not possible to influence user behaviour, Workaround 2 can reduce the impact, but the problem will remain to some extent.

      Workaround-1 (Recommended workaround)

      Open the page in edit mode and update the tasks there. Editing this way does not create multiple drafts in the background, which should avoid tying up additional HTTP threads.

      Workaround-2

      Reduce the page size, which has a measurable effect on inline task thread execution time. The following values were averaged over approximately 10 tests per item.

      • The impact of page version count is negligible:

        Versions    Task execution time
        20          169 ms
        5800        122 ms
        17000       155 ms

      • However, page size has an impact on task change execution time:

        Size             Task execution time
        1280 bytes       93 ms
        522075 bytes     1.32 sec
        1003467 bytes    1.85 sec
        1890955 bytes    4.20 sec
        2815461 bytes    6.45 sec
        4120091 bytes    8.93 sec
        4837173 bytes    11.20 sec
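
      To put a rough number on the size effect, the measurements above can be fitted with a straight line. This is only a sketch: the report does not claim the relationship is linear, so the linear model is an assumption, and `fit_line` is an illustrative helper rather than anything from Confluence.

```python
# (page size in bytes, task execution time in seconds), copied from the table above.
MEASUREMENTS = [
    (1280, 0.093),
    (522075, 1.32),
    (1003467, 1.85),
    (1890955, 4.20),
    (2815461, 6.45),
    (4120091, 8.93),
    (4837173, 11.20),
]


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(MEASUREMENTS)
seconds_per_mb = slope * 1_000_000  # 1 MB taken as 10**6 bytes
print(f"approx. {seconds_per_mb:.2f} s of task execution time per additional MB")
```

      The slope gives a quick estimate of how much execution time could be saved per task change by trimming a page, under the linearity assumption stated above.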


            George Lipatov made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 970062 ]
            Rob made changes -
            Remote Link Original: This issue links to "Page (Confluence)" [ 920040 ]
            Rob made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 959919 ]
            James Whitehead made changes -
            Fix Version/s Original: 9.0.0 [ 106328 ]
            James Whitehead made changes -
            Fix Version/s New: 9.0.1 [ 108911 ]
            Rob made changes -
            Labels Original: architectural fireball fireball-whl whl-fy24q3 New: architectural fireball fireball-whl whl-fy24q3 whl-fy24q4
            Nobuyuki Mukai made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 924494 ]
            Nobuyuki Mukai made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 923578 ]
            Akshay Rai made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Waiting for Release [ 12075 ] New: Closed [ 6 ]

              Affected customers: 33
              Watchers: 58