Status: Closed
Affects Version/s: 6.4.12, 7.5.1, 7.2.12, 7.6.7, 7.13.3, 8.7.0, 8.5.3
Component/s: Data Center - Other
Fixed in Long Term Support Release/s:
Introduced in Version:6.04
Support reference count:70
Symptom Severity:Severity 2 - Major
Bug Fix Policy:
Current Status: Atlassian Update – 06 Feb 2020

Hi everyone,

I’m glad to announce that Jira 8.5.4, 8.7.1, 8.8.0, and later contain a remedy for this issue.

First, when a node starts up, it removes any tasks that were assigned to it before the restart. This is a generalized version of the mechanism Jira already uses to clean up stuck re-index and user anonymization jobs.

Second, a new scheduled service detects offline nodes and clears the tasks assigned to them. For this mechanism, a node is considered offline if it has not written a heartbeat to the database in the past 30 minutes.

You can disable the second mechanism by setting the "jira.dc.cleanup.cluser.tasks.disabled" feature flag. You can follow our documentation to learn how to set it. You can also customize its behavior (how often the service runs and how long it takes for a node to be considered offline) by modifying two properties:
- cluster.task.cleanup.run.interval
- cluster.task.cleanup.offline.node.threshold

Refer to the jpm.xml file for up-to-date documentation. Currently, the first property defaults to 60 seconds and the second to 30 minutes (values below 10 minutes are not allowed).

We also created a product improvement suggestion, JRASERVER-70584, to notify users (e.g. via email) that a task they submitted has been removed. Right now, only a log message like the one below is added:
2020-02-06 02:58:35,292+1100 Caesium-1-4 ERROR ServiceRunner [c.a.jira.cluster.ClusterTaskCleanupService] Removing stale 'Jira Indexing' task '10500' started on node 'node-id'

If you’re interested in the ability to remove stuck global tasks manually, please see the suggestion JRASERVER-66722.

Thank you,
Daniel Rauf
Jira Server Developer
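The offline-node rule described in the update can be illustrated with a small sketch. This is not Jira's implementation: the function name, the dictionary shapes, and the handling of nodes without any heartbeat are assumptions made for illustration. Only the 30-minute default and the 10-minute lower bound come from the update itself.

```python
from datetime import datetime, timedelta

# Values quoted in the update: default threshold and the stated lower bound.
OFFLINE_THRESHOLD = timedelta(minutes=30)
MIN_OFFLINE_THRESHOLD = timedelta(minutes=10)


def find_stale_tasks(tasks, heartbeats, now, threshold=OFFLINE_THRESHOLD):
    """Return the ids of tasks whose owning node has no recent heartbeat.

    tasks:      dict of task id -> node id (hypothetical shape)
    heartbeats: dict of node id -> datetime of last heartbeat (hypothetical shape)
    """
    if threshold < MIN_OFFLINE_THRESHOLD:
        raise ValueError("threshold below 10 minutes is not allowed")
    cutoff = now - threshold
    stale = []
    for task_id, node_id in tasks.items():
        last_seen = heartbeats.get(node_id)
        # Assumption: a node with no recorded heartbeat counts as offline.
        if last_seen is None or last_seen < cutoff:
            stale.append(task_id)
    return sorted(stale)
```

With one node last seen 45 minutes ago and another seen 5 minutes ago, only the task owned by the first node would be reported as stale under the default threshold.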
Assume a node in JIRA Data Center is executing a long-running task which has cluster-wide status. If progress stops abnormally before the job is complete, the job remains stuck for the whole cluster.
The stuck job is stored in an in-memory cache and is replicated to other nodes when they start. All nodes must be shut down at the same time in order for this job to be removed from the cache.
Steps to reproduce:
- JIRA Data Center with 2+ nodes
- Perform a change that triggers a "Bulk Operation" action.
- Monitor the "Bulk Operation Progress" bar.
- Restart the node executing the job (or create a database connection failure).
Actual results:
- The progress bar appears in a stuck state on each node.
- Restarting one node has no impact; the progress bar continues to appear.
- The stuck job appears when trying to make other changes. Other changes cannot be made while it is stuck.
- All nodes must be shut down at the same time in order for this job to be removed from the cache.
Expected results:
- The JIRA cluster detects that the job is no longer progressing, throws an error, and no longer shows the stuck "Bulk Operation Progress" bar.
- Or the JIRA cluster detects that the job is no longer progressing and continues the job.
In either case, a stuck job on one node should not require a restart of the entire cluster.
- A stuck bulk edit can be reproduced by setting a breakpoint to stop the thread at BulkEditOperation.java line 184.
- The bulk edit task is held in memory and is communicated to all nodes.
- Nodes keep this task in memory until it is deleted, or until all nodes are down (which clears the tasks in memory).
- The task is deleted when the operation completes. The operation runs on the node where the task started.
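The notes above explain why restarting a single node does not clear the task. A toy model makes the effect visible. It is a deliberately simplified sketch with hypothetical class and method names, not a description of how Jira's cache replication is actually implemented.

```python
class Cluster:
    """Toy model: each running node holds its own copy of the task cache."""

    def __init__(self, node_ids):
        # Only running nodes have an entry here.
        self.caches = {node_id: set() for node_id in node_ids}

    def add_task(self, task_id):
        # The task is communicated to every running node.
        for cache in self.caches.values():
            cache.add(task_id)

    def stop(self, node_id):
        # Stopping a node discards its in-memory copy.
        self.caches.pop(node_id)

    def start(self, node_id):
        # A starting node copies the cache from a running peer, if there is one.
        peers = list(self.caches.values())
        self.caches[node_id] = set(peers[0]) if peers else set()

    def visible_tasks(self):
        return set().union(*self.caches.values())
```

In this model, stopping and starting one node of a two-node cluster brings the task straight back from the surviving peer. The task disappears only once every node has been stopped before any is started again.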
In some cases, it may be possible to manually delete the stuck job. This should only be done after being absolutely certain that the job is no longer running.
Example for and :
- This needs to be run by the user who executed the task. In addition, the user must be a JIRA Administrator in order to delete the task.
- If necessary, grant temporary admin permission, delete the task, then remove the admin permission.
- URL below applies to bulk edits as well as project changes
There is an internal REST endpoint to stop tasks.
- DELETE /rest/projectconfig/1/migrationStatus/#id
- "#id" can be taken from the progress page's URL.
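As a rough illustration, the DELETE call could be prepared as below. The path is the one listed above. The base URL, the helper name, and the use of HTTP Basic authentication are assumptions, so adapt them to how your instance authenticates. The sketch only builds the request and does not send it.

```python
import base64
import urllib.request


def build_stop_task_request(base_url, task_id, username, password):
    """Build (but do not send) a DELETE request for a stuck task.

    base_url, username and password are placeholders; Basic auth is an
    assumption and may not match your instance's configuration.
    """
    url = f"{base_url.rstrip('/')}/rest/projectconfig/1/migrationStatus/{task_id}"
    request = urllib.request.Request(url, method="DELETE")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send it. Do so only as the user who started the task, and only after confirming that the job is no longer running.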