  1. Confluence Data Center
  2. CONFSERVER-101168

The userlister plugin in Confluence can cause excessive Hazelcast topic traffic for cache invalidation, leading to excessive heap usage which results in an OutOfMemoryError


      Issue Summary

      With heavy user activity on a single account, the userlister plugin can generate excessive Hazelcast topic traffic for cache invalidation. The resulting heap usage can end in an OutOfMemoryError.

      Steps to Reproduce

      Atlassian support is still refining these steps and we'll remove this line once we've confirmed the steps below are guaranteed to trigger the problem:

      1. Create a user account for REST API calls
      2. Make large volumes of REST API calls to any endpoint with standard auth or a personal access token (PAT)
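
      Step 2 could be scripted roughly as follows. This is a minimal sketch, not a confirmed reproduction: the base URL and token are hypothetical, /rest/api/content is used only as an example of "any endpoint", and the PAT is assumed to be sent as a Bearer token. Run it only against a test cluster.

```python
import urllib.request


def build_request(base_url: str, token: str,
                  path: str = "/rest/api/content") -> urllib.request.Request:
    """Build one authenticated GET request; the PAT is sent as a Bearer token."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


def send_many(base_url: str, token: str, count: int) -> int:
    """Send `count` short requests in sequence; return how many got HTTP 200."""
    ok = 0
    for _ in range(count):
        req = build_request(base_url, token)
        with urllib.request.urlopen(req, timeout=10) as resp:
            ok += resp.status == 200
    return ok


# Example with hypothetical values (test cluster only):
# send_many("https://confluence.example.com", "<personal-access-token>", 10_000)
```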

      Expected Results

      Heap usage should fluctuate normally with usage. Users or scripts performing short REST API requests should not trigger any performance problems.

      Actual Results

      Heap usage can suddenly spike, filling the JVM heap space and causing an OutOfMemoryError, Full GC pauses, a cluster panic, or all of those combined.

      If an OutOfMemoryError occurs, the following error is logged in the catalina.out file:

      java.lang.OutOfMemoryError: Java heap space
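
      To check an existing log for this error, a small script along these lines may help. It is a sketch only: the function names are illustrative, and the path to catalina.out depends on the installation.

```python
from pathlib import Path

OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def find_oom_lines(log_text: str) -> list[int]:
    """Return 1-based line numbers that contain the heap-space OOM marker."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if OOM_MARKER in line
    ]


def scan_log(path: str) -> list[int]:
    """Read a log file (for example catalina.out) and locate OOM entries."""
    return find_oom_lines(Path(path).read_text(errors="replace"))
```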
      

      When the problem occurs, heap usage spikes suddenly, often triggering the symptoms described above (Full GC pauses, OutOfMemoryError, and so on).

      If a heap dump is available, a large amount of heap space (at least several gigabytes, often tens of gigabytes) will be associated with the class "com.hazelcast.util.executor.StripedExecutor$Worker" for the hz.confluence.event-3 thread, which handles messages from other cluster nodes.

      In this example, 9.6 GB of heap space was occupied by these objects.

      The leak suspects report shows the linked list used for storing the Hazelcast topic messages.

      The dominator tree shows the com.hazelcast.util.executor.StripedExecutor$Worker objects for the hz.confluence.event-3 thread.

      The linked list storing the payloads is evident once the largest classes in the dominator tree are expanded.

      In this case, the queued messages are cache invalidation and replication messages. The Hazelcast payloads contain hashes for AsyncInvalidationCache.Replication.com.atlassian.confluence.extra.userlister.DefaultUserListManager, with values like this:

      ....................(.saA..]]...p.....hz:impl:topicService..3......................cAsyncInvalidationCache.Replication.com.atlassian.confluence.extra.userlister.DefaultUserListManager....a.........................10.20.166.46.3..............sr..io.atlassian.fugue.Pair*b=l..u....L..leftt..Ljava/lang/Object;L..rightq.~..xpt..svc-acct@atlassian.comsr..java.util.HashSet.D.....4...xpw.....?@......t. E7C3489B7688935A4848A86D8ACC0EC4t. 65FF63678579FC564980BBBB1A0D526Ct. F1EFE41BA13CEC47638AD4E41EFD5A9At. D000F6EE04D7AD923E28DF09A24EAEB2t. C0D624C499EC02B9D03ED92E0A7CF6C3t. 3A4B7718F2B5A033C53C1BDF0D6E2957t. 722FF83DB9EEB17CB75E2D959752CC46t. FD4A0A41A092A1B7F0159B52341A8CA0t. AF3B974E44608B4647B3AA44E03B4586t. 5022B3440765C1483A76ACBBE2BF7D17t. 0778180A4CE46BDC43745ECBEF308B20t. 39EA611F68DB8C32FA98122CD9DE7AB8t. DB33E1CCF2FC73FAE58ACB57669BE32Et. CBBC7F1B875292F5CB6847EE03A5D212t. A47D09394CC1D07822977B6FC828E6B7t. 4323441D261C0A9CAF1D74EB7BB75FE2t. 1F0BCBEC08C3360D25C9394D1A8AEB66t. 76B06542C35018DE19E22DE683C6042Et. D
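
      When comparing payloads copied out of a heap dump, it can help to pull out the cache name and the hashes. The sketch below assumes the payload has been copied as text in the form shown above and that the hashes are 32-character uppercase hex strings, as in this sample. The function name is illustrative.

```python
import re

# 32 uppercase hex characters, not embedded in a longer hex run.
HASH_RE = re.compile(r"(?<![0-9A-F])[0-9A-F]{32}(?![0-9A-F])")
# Dotted class name that follows the replication topic prefix.
CACHE_RE = re.compile(r"AsyncInvalidationCache\.Replication\.((?:\w+\.)*\w+)")


def summarize_payload(text: str) -> dict:
    """Extract the replicated cache name and the hashes from one payload."""
    cache = CACHE_RE.search(text)
    hashes = HASH_RE.findall(text)
    return {
        "cache": cache.group(1) if cache else None,
        "hash_count": len(hashes),
        "distinct_hashes": len(set(hashes)),
    }
```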
      

      The IP addresses in these payloads belong to other cluster nodes (not one specific node).

      The 'User Lister' plugin in Confluence tracks user details such as the session ID when users log in and log out. The session details and a hash are stored in a cache, which Hazelcast then tries to replicate across the other cluster nodes. In this case, that replication triggers the OutOfMemoryError.
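
      As a rough illustration of how one busy account could fill the heap, the sketch below gives a back-of-the-envelope upper bound. It rests on two assumptions drawn only from the payload sample, neither confirmed by Atlassian: each new session adds one hash for the user, and each replication message carries the full set of hashes tracked so far. The byte sizes are illustrative defaults.

```python
def queued_bytes(sessions: int, bytes_per_hash: int = 34, overhead: int = 200) -> int:
    """Total payload bytes if message k carries k hashes, for k = 1..sessions.

    Models the worst case where none of the messages has been consumed yet.
    """
    return sum(overhead + k * bytes_per_hash for k in range(1, sessions + 1))


# Example: estimated gigabytes queued on one node after 20,000 sessions.
# print(queued_bytes(20_000) / 1e9)
```

      Under these assumptions the total grows quadratically with the number of sessions, which would be consistent with a sudden spike rather than a gradual climb.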

      Workaround

      Disable the userlister plugin using the following steps:

      1. Log in to Confluence as an administrator
      2. From the UI, select Confluence Administration > Manage Apps, then change the dropdown from 'User installed' to 'All apps'
      3. Locate the 'User Lister' plugin (app key confluence.extra.userlister) and click 'Disable'

      Note: Disabling this plugin will prevent the User List macro from functioning correctly. Use the link below to check which pages may contain this macro:
      <base_url>/dosearchsite.action?cql=macro+%3D+%22userlister%22
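
      The search link above is the URL-encoded form of the CQL query macro = "userlister". A minimal sketch for building it from a base URL (the example host is hypothetical):

```python
from urllib.parse import quote_plus


def macro_search_url(base_url: str, macro_name: str = "userlister") -> str:
    """Build the site-search URL that lists pages containing the given macro."""
    cql = f'macro = "{macro_name}"'
    return f"{base_url.rstrip('/')}/dosearchsite.action?cql={quote_plus(cql)}"


# Example with a hypothetical host:
# macro_search_url("https://confluence.example.com")
```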

      Atlassian support is also investigating the effect of frequently flushing the 'User Lister Plugin cache'. More will be added here after further testing.

      Attachments

        1. userlister-domtree.jpeg (279 kB)
        2. userlister-domtree-linkedblockingqueue.jpeg (150 kB)
        3. userlister-heap.png (2.65 MB)
        4. userlister-leaksuspects.jpeg (95 kB)
        5. userlister-overview.jpeg (30 kB)
        6. userlister-topicservice-hz.jpeg (106 kB)

              03cb0c04aa4f Irina Tiapchenko
              mninnes@atlassian.com Malcolm Ninnes