- Type: Bug
- Resolution: Unresolved
- Priority: High
- Affects Version/s: 3.3.0
- Component/s: Clustering
- Severity 2 - Major
Issue Summary
HazelcastBucketedExecutor$BucketProcessingBootstrapper creates excessive Hazelcast Packet objects when multiple bootstrappers contend for the same bucket. In some cases this could lead to Java heap space exhaustion (i.e. OutOfMemoryError).
Details
Bitbucket has a framework for running background tasks such that they scale across the cluster, providing more processing capability as cluster nodes are added.
When tasks are submitted to a BucketedExecutor, each submission calls scheduleLocally(), which creates a new BucketProcessingBootstrapper, i.e. a Java Runnable task. If many tasks are submitted (e.g., a hierarchy reindex triggering commit-indexing for hundreds of repositories), hundreds of bootstrappers are created for that bucket.
Only one bootstrapper can hold the bucket lock at a time and actually be processed. All others receive null from ClaimTasksFromBucketProcessor and reschedule themselves with a fixed 1-second delay. On each retry, bucketMap.executeOnKey() serializes a new ClaimTasksFromBucketProcessor into a Hazelcast Packet object (typically about 1 KB in size). With hundreds of bootstrappers per bucket all retrying every second, this generates tens of thousands of packets per second.
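As a back-of-envelope illustration of the volumes involved (the class and method names below are invented for illustration and are not Bitbucket code):

```java
// Rough arithmetic for the retry storm described above: each contending
// bootstrapper that fails to claim the bucket lock sends one new ~1 KB
// ClaimTasksFromBucketProcessor packet per second, at a fixed 1s delay.
final class RetryStorm {

    // Packets generated over `seconds` of sustained contention on one bucket,
    // with `bootstrappers` contenders each retrying once per second.
    static long packetsGenerated(int bootstrappers, int seconds) {
        return (long) bootstrappers * seconds;
    }

    // Approximate heap retained by pending packets if none are drained,
    // assuming ~1 KB per serialized processor.
    static long pendingHeapBytes(long packets) {
        return packets * 1024L;
    }
}
```

For example, 500 contending bootstrappers retrying for one minute generate on the order of 30,000 packets, roughly 30 MB of heap if none of them are drained.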
Under normal conditions Hazelcast sends/drains these packets fast enough. However, if any factor slows packet processing, packets accumulate in Hazelcast's internal queues faster than they can be drained. This creates a feedback loop: accumulated packets increase heap pressure, which triggers longer GC pauses, which further slows packet processing, which causes more packets to accumulate. Within minutes, the heap can be entirely consumed by millions of Packet objects that are still pending dispatch and thus not eligible for GC.
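The feedback loop can be sketched as a toy model (all names, rates, and the halving rule below are invented for illustration; this is not the actual Hazelcast dispatch logic):

```java
// Toy model of the feedback loop: packet generation stays constant while the
// drain rate is cut in half once the backlog crosses a threshold, standing in
// for longer GC pauses slowing the packet-dispatch threads.
final class FeedbackLoop {

    // Pending packets after `seconds` of simulated time.
    static long pendingAfter(int seconds, long genPerSec, long drainPerSec, long gcThreshold) {
        long pending = 0;
        for (int t = 0; t < seconds; t++) {
            long drain = pending > gcThreshold ? drainPerSec / 2 : drainPerSec;
            pending = Math.max(0, pending + genPerSec - drain);
        }
        return pending;
    }
}
```

With generation only modestly above the drain rate, the backlog grows linearly until it crosses the threshold, after which it grows at triple the earlier rate in this model; the real system degrades in the same qualitative way.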
I said this happens if "any factor slows packet processing"; some common examples include:
- a transient network interruption
- a permanent network interruption; it takes time for a node to be ejected from the cluster after a lost heartbeat, and during this window heap growth can occur due to the problem described here
- one or more nodes in the cluster under significant memory pressure and spending time in GC, and thus not receiving Packet objects from other cluster nodes fast enough
- one or more nodes swapping heavily, and thus not receiving Packet objects from other cluster nodes fast enough
- perhaps simply a large volume of concurrent operations in the BucketedExecutor, such that it exceeds the cluster's ability to send packets
Note that I've set the affects version as 3.3.0, as this looks to be the first version that shipped the BucketedExecutor, and this behaviour has been present since then.
Suggested fix
One of the following approaches could be used to reduce packet generation:
- Exponential backoff on lock-contention retries: When ClaimTasksFromBucketProcessor returns null (bucket locked), the bootstrapper should back off exponentially (e.g., 1s, 2s, 4s, ... capped at 60s) rather than retrying at a fixed 1-second interval. This would significantly reduce the packet generation rate under contention.
- Deduplicate bootstrapper scheduling: scheduleLocally() could track active bootstrappers per bucket and avoid scheduling a new one if one is already running or queued. This bounds the number of bootstrappers per bucket to one, regardless of how many tasks are submitted.
- Verify whether IMap backups are valuable for this use case. If they can be disabled, this could reduce or eliminate the inter-node messages that need to be sent (needs to be verified).
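The first suggestion could look something like the following minimal sketch. ClaimBackoff and delayFor are hypothetical names, not part of Bitbucket's code; only the 1s initial delay and 60s cap come from the suggestion above.

```java
import java.time.Duration;

// Hedged sketch of the suggested exponential backoff: the retry delay doubles
// on each failed claim attempt, capped at 60 seconds.
final class ClaimBackoff {

    private static final Duration INITIAL = Duration.ofSeconds(1);
    private static final Duration CAP = Duration.ofSeconds(60);

    // attempt is 0-based: attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 60s.
    static Duration delayFor(int attempt) {
        // 2^6 = 64s already exceeds the cap, so clamp the shift to avoid overflow.
        long seconds = INITIAL.getSeconds() << Math.min(attempt, 6);
        return seconds >= CAP.getSeconds() ? CAP : Duration.ofSeconds(seconds);
    }
}
```

A bootstrapper would then reschedule itself with delayFor(attempt) instead of a fixed one-second delay, cutting steady-state packet generation per contending bootstrapper from one per second to one per minute once the cap is reached.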
Workaround
Currently there is no known workaround for this behaviour. A workaround will be added here when available.