Bug
Resolution: Unresolved
Low
None
8.6.2, 9.2.3, 9.2.5, 9.2.6, 9.2.7
Severity 2 - Major
Issue Summary
With heavy user activity on a specific account, the userlister plugin can generate excessive Hazelcast topic traffic for cache invalidation. The resulting heap usage can grow until the JVM throws an OutOfMemoryError.
Steps to Reproduce
Atlassian support is still refining these steps. This line will be removed once the steps below are confirmed to reliably trigger the problem:
- Create a user account for REST API calls
- Make large volumes of REST API calls to any endpoint with standard auth or a personal access token (PAT)
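The steps above could be scripted along the following lines. This is only a rough sketch for a test environment, not a confirmed reproduction: the endpoint path `/rest/api/user/current`, the helper names, and the injectable `opener` parameter are assumptions made for illustration, and the `Authorization: Bearer` header is the PAT variant of the two auth options mentioned.

```python
import urllib.request


def build_request(base_url, token, path="/rest/api/user/current"):
    """Build one authenticated REST request (path is an illustrative assumption)."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def send_requests(base_url, token, count, opener=urllib.request.urlopen):
    """Send `count` sequential requests and return the HTTP status codes.

    `opener` is injectable so the loop can be exercised without a live server.
    """
    statuses = []
    for _ in range(count):
        with opener(build_request(base_url, token)) as resp:
            statuses.append(resp.status)
    return statuses
```

Calling `send_requests("<base_url>", "<token>", 10000)` against a non-production cluster would approximate the "large volumes" step; the volume needed to trigger the problem is not stated in this report.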
Expected Results
Heap usage should fluctuate normally with usage. Users or scripts performing short REST API requests should not trigger any performance problems.
Actual Results
Heap usage can suddenly spike, filling the JVM heap space and causing an OutOfMemoryError, Full GC pauses, a cluster panic, or all of those combined.
If an OutOfMemoryError occurs, the following error is logged in the catalina.out file:
java.lang.OutOfMemoryError: Java heap space
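To check whether a node has already hit this error, the log can be scanned for that message. The sketch below is a minimal example; the function names are hypothetical and the path to catalina.out depends on the installation.

```python
OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def find_oom_lines(lines):
    """Return (line_number, text) pairs for every line containing the OOM marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOM_MARKER in line
    ]


def scan_log(path):
    """Scan a log file such as catalina.out; undecodable bytes are replaced."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_oom_lines(handle)
```

For example, `scan_log("/path/to/catalina.out")` returns an empty list when the error has not been logged.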
When the problem occurs, heap usage spikes suddenly, often triggering the symptoms described above (Full GC pauses, OutOfMemoryError, and so on).
If a heap dump is available, a large amount of heap space (at least several GB, often tens of GB) will be associated with the class com.hazelcast.util.executor.StripedExecutor$Worker for the hz.confluence.event-3 thread, which handles messages from other cluster nodes.
In this example, 9.6 GB of heap space was occupied by these objects.

The leak suspects report shows the linked list used to store the Hazelcast topic messages.
The dominator tree shows the com.hazelcast.util.executor.StripedExecutor$Worker objects for the hz.confluence.event-3 thread.
Expanding the largest classes reveals the linked list that stores the payloads.
In this case, these are cache invalidation and replication messages. The Hazelcast payloads contain hashes for AsyncInvalidationCache.Replication.com.atlassian.confluence.extra.userlister.DefaultUserListManager, with values like this:
....................(.saA..]]...p.....hz:impl:topicService..3......................cAsyncInvalidationCache.Replication.com.atlassian.confluence.extra.userlister.DefaultUserListManager....a.........................10.20.166.46.3..............sr..io.atlassian.fugue.Pair*b=l..u....L..leftt..Ljava/lang/Object;L..rightq.~..xpt..svc-acct@atlassian.comsr..java.util.HashSet.D.....4...xpw.....?@......t. E7C3489B7688935A4848A86D8ACC0EC4t. 65FF63678579FC564980BBBB1A0D526Ct. F1EFE41BA13CEC47638AD4E41EFD5A9At. D000F6EE04D7AD923E28DF09A24EAEB2t. C0D624C499EC02B9D03ED92E0A7CF6C3t. 3A4B7718F2B5A033C53C1BDF0D6E2957t. 722FF83DB9EEB17CB75E2D959752CC46t. FD4A0A41A092A1B7F0159B52341A8CA0t. AF3B974E44608B4647B3AA44E03B4586t. 5022B3440765C1483A76ACBBE2BF7D17t. 0778180A4CE46BDC43745ECBEF308B20t. 39EA611F68DB8C32FA98122CD9DE7AB8t. DB33E1CCF2FC73FAE58ACB57669BE32Et. CBBC7F1B875292F5CB6847EE03A5D212t. A47D09394CC1D07822977B6FC828E6B7t. 4323441D261C0A9CAF1D74EB7BB75FE2t. 1F0BCBEC08C3360D25C9394D1A8AEB66t. 76B06542C35018DE19E22DE683C6042Et. D
The IP addresses in these payloads belong to other cluster nodes (not one specific node).
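When reviewing many payloads from a heap dump, it can help to summarize each one instead of reading it by eye. The sketch below is an assumption-laden example: it works on the printable rendering shown above (non-printable bytes shown as dots), treats any standalone 32-character uppercase hex value as one of the hashes, and uses hypothetical names throughout.

```python
import re

# A standalone run of exactly 32 uppercase hex characters.
HASH_PATTERN = re.compile(r"(?<![0-9A-F])[0-9A-F]{32}(?![0-9A-F])")

# The dotted Java class name that follows the replication topic prefix.
CACHE_PATTERN = re.compile(
    r"AsyncInvalidationCache\.Replication\."
    r"((?:[A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*)"
)


def summarize_payload(text):
    """Report the cache name and how many hash values one payload carries."""
    cache = CACHE_PATTERN.search(text)
    hashes = HASH_PATTERN.findall(text)
    return {
        "cache": cache.group(1) if cache else None,
        "hash_count": len(hashes),
        "unique_hashes": len(set(hashes)),
    }
```

A payload whose `cache` is `com.atlassian.confluence.extra.userlister.DefaultUserListManager` and whose `hash_count` is large matches the pattern described in this report.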

The 'User Lister' plugin within Confluence tracks user details, such as the session ID, when users log in and log out. The session details and a hash are stored in a cache, which Hazelcast then tries to replicate across the other cluster nodes. In this case, that replication triggers the OutOfMemoryError.
Workaround
Disable the userlister plugin using the following steps:
- Log in to Confluence as an administrator
- From the UI, select Confluence Administration >> Manage Apps, then change the dropdown from 'User installed' to 'All apps'
- Locate the 'User Lister' plugin (app key confluence.extra.userlister) and click 'Disable'
Note: Disabling this plugin will prevent the User list macro from functioning correctly. Use the link below to check which pages contain this macro:
<base_url>/dosearchsite.action?cql=macro+%3D+%22userlister%22
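The encoded query in that link is the CQL expression `macro = "userlister"`. As a small sketch, the link can be built for a given base URL like this (the function name is hypothetical):

```python
from urllib.parse import urlencode


def userlister_search_url(base_url):
    """Build the site-search URL that lists pages using the userlister macro."""
    query = urlencode({"cql": 'macro = "userlister"'})
    return f"{base_url.rstrip('/')}/dosearchsite.action?{query}"
```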
Atlassian support is also investigating the effect of frequent cache flushing of the 'User Lister Plugin cache' - more will be added here after further testing.