Details
-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
8.13.0, 8.20.0, 9.0.0
-
8.13
-
7
-
Severity 2 - Major
-
5
-
Description
Issue Summary
Context behind this bug
In a cluster of Jira nodes, scheduled jobs can be picked up and executed by any Jira node.
Examples of scheduled jobs include, but are not limited to:
- the Jira Batched Notification job (responsible for generating Jira batched Notifications and adding them to the Mail Queue)
- the Jira Service Management Notification job (responsible for generating Customer notifications and adding them to the Mail Queue)
Ideally, the execution of a scheduled job should be fairly distributed across all the nodes, so that:
- Node 1 runs job A
- Then next time job A is scheduled, Node 2 should run it
- Then next time job A is scheduled, Node 3 should run it
- Etc...
With scheduled jobs distributed fairly across nodes, whenever a job adds mails to the Mail Queue, the emails piling up in the queue are also distributed fairly across nodes (each node has its own mail queue).
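The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the node names, the number of runs, and the figure of 100 mails per run are hypothetical, and the functions do not reflect how Jira's scheduler is implemented.

```python
from itertools import cycle


def distribute_round_robin(nodes, runs, mails_per_run):
    """Give each job run to the next node in turn; return mails queued per node."""
    queues = {node: 0 for node in nodes}
    node_cycle = cycle(nodes)
    for _ in range(runs):
        queues[next(node_cycle)] += mails_per_run
    return queues


def distribute_sticky(nodes, runs, mails_per_run, sticky_node):
    """Model the reported problem: one node runs the job every time."""
    queues = {node: 0 for node in nodes}
    queues[sticky_node] += runs * mails_per_run
    return queues


# Hypothetical cluster and workload, chosen only for illustration.
nodes = ["node1", "node2", "node3"]
fair = distribute_round_robin(nodes, runs=9, mails_per_run=100)
skewed = distribute_sticky(nodes, runs=9, mails_per_run=100, sticky_node="node1")

print(fair)    # each node's queue receives the same share
print(skewed)  # all mails land on node1's queue
```

With round-robin execution each node's mail queue receives a third of the mails; when one node keeps running the job, its queue receives all of them.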
Problem
For unknown reasons, it has been observed in some Jira instances with heavy daily operations generating a lot of emails (Jira batched Notifications or Customer notifications) that the same Jira node tends to run the same job over and over, instead of letting the other nodes run the job on a Round Robin basis.
If the job that the node keeps running on its own is the Jira Batched Notification job or the JSM Customer Notification job, the following will happen:
- all the emails will end up in the Mail Queue of a single node
- on a busy Jira instance, the emails will be sent with a long delay, since all emails are sent by a single node, and emails are sent by the Mail Queue Service job, which uses a single Caesium thread
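A rough back-of-the-envelope calculation shows why the delay grows. This is a sketch under stated assumptions: the total of 9000 mails and the 0.5 seconds per mail are hypothetical figures, not measurements, and the model assumes each node sends its own queue sequentially on one thread, as described above.

```python
def drain_time_seconds(mails_in_queue, seconds_per_mail):
    """Time for one sender thread to empty a queue, sending mails one at a time."""
    return mails_in_queue * seconds_per_mail


# Hypothetical workload, chosen only for illustration.
total_mails = 9000
node_count = 3
seconds_per_mail = 0.5

# All mails queued on a single node: one thread sends everything.
single_node_delay = drain_time_seconds(total_mails, seconds_per_mail)

# Mails spread evenly: each node's thread sends its share in parallel.
balanced_delay = drain_time_seconds(total_mails // node_count, seconds_per_mail)

print(single_node_delay, balanced_delay)
```

Under these assumptions the last email leaves after 4500 seconds when everything is queued on one node, against 1500 seconds when the same mails are spread over three nodes.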
Steps to replicate
Unknown. We haven't been able to replicate this behavior in our local Jira environment.
Suggestion
Improve the way scheduled jobs are picked up by the nodes in a Jira Cluster, in order to ensure that jobs are executed fairly across all Jira nodes.
Note: this is not exactly a bug, since technically scheduled jobs are picked up "randomly" by any Jira node that is available. We are nevertheless raising this ticket as a bug because the current algorithm does not prevent the same node from picking up the same job repeatedly, which causes the Jira Mail Queue to pile up on that node.
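One possible way to express the fairness goal is to prefer the available node that ran the job least recently. The sketch below is purely illustrative: `pick_node`, the `last_run_at` bookkeeping, and the node names are hypothetical and are not part of Jira or of its scheduler.

```python
def pick_node(available_nodes, last_run_at):
    """Return the available node that ran the job least recently.

    Nodes that have never run the job (absent from last_run_at) are preferred.
    """
    return min(available_nodes, key=lambda node: last_run_at.get(node, -1))


# Hypothetical cluster; "tick" stands in for successive scheduled runs of job A.
nodes = ["node1", "node2", "node3"]
last_run_at = {}
history = []
for tick in range(6):
    node = pick_node(nodes, last_run_at)
    last_run_at[node] = tick
    history.append(node)

print(history)
```

With all three nodes available on every run, this selection rule yields the round-robin order described in the context section, and it still picks a sensible node when one of them is temporarily unavailable.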