Bug
Resolution: Fixed
High
8.7.1
8.07
13
Severity 2 - Major
27
Summary
A mechanism that cleans up stale cluster tasks (JRASERVER-70585) contains a bug: it reads a property whose value is set in minutes and passes it to a method that expects milliseconds, without converting between the units.
As a result, a task can be cleaned up much sooner than it should be: a threshold meant as N minutes is treated as N milliseconds, which is 60,000 times shorter.
Besides the task's progress no longer being visible in the UI, errors like the one below may appear in the logs:
JiraTaskExecutionThread-2 ERROR /secure/admin/IndexReIndex!reindex.jspa [c.a.j.util.index.CompositeIndexLifecycleManager] Reindex All FAILED. Indexer: SharedEntityIndexManager: paths: []
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at com.atlassian.jira.task.TaskManagerImpl.withTaskMap(TaskManagerImpl.java:132)
    at com.atlassian.jira.task.TaskManagerImpl.refreshTaskInTaskCache(TaskManagerImpl.java:573)
    at com.atlassian.jira.task.TaskManagerImpl.onProgressMade(TaskManagerImpl.java:481)
    at com.atlassian.jira.task.TaskProgressAdapter.makeProgress(TaskProgressAdapter.java:44)
    at com.atlassian.jira.task.context.TaskProgressPercentageContextSink.updateProgress(TaskProgressPercentageContextSink.java:41)
    at com.atlassian.jira.task.context.CompositeSink.updateProgress(CompositeSink.java:36)
    at com.atlassian.jira.task.context.PercentageContext$Progress.update(PercentageContext.java:64)
    at com.atlassian.jira.task.context.PercentageContext.setName(PercentageContext.java:34)
    at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndex(DefaultSharedEntityIndexManager.java:118)
    at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:93)
    at com.atlassian.jira.util.index.CompositeIndexLifecycleManager.reIndexAll(CompositeIndexLifecycleManager.java:66)
    at com.atlassian.jira.util.index.CompositeIndexLifecycleManager.reIndexAll(CompositeIndexLifecycleManager.java:49)
    at com.atlassian.jira.web.action.admin.index.ReIndexAsyncIndexerCommand.doReindex(ReIndexAsyncIndexerCommand.java:27)
    at com.atlassian.jira.web.action.admin.index.AbstractAsyncIndexerCommand.call(AbstractAsyncIndexerCommand.java:63)
    at com.atlassian.jira.web.action.admin.index.ReIndexAsyncIndexerCommand.call(ReIndexAsyncIndexerCommand.java:18)
    at com.atlassian.jira.web.action.admin.index.AbstractAsyncIndexerCommand.call(AbstractAsyncIndexerCommand.java:26)
    at com.atlassian.jira.task.TaskManagerImpl$TaskCallableDecorator.call(TaskManagerImpl.java:533)
    at com.atlassian.jira.task.TaskManagerImpl$TaskCallableDecorator.call(TaskManagerImpl.java:491)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.atlassian.jira.task.ForkedThreadExecutor$ForkedRunnableDecorator.run(ForkedThreadExecutor.java:216)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.atlassian.jira.task.TaskManagerImpl.withTaskMap(TaskManagerImpl.java:130)
    ... 22 more
Caused by: java.lang.NullPointerException
    at com.atlassian.jira.task.TaskManagerImpl.lambda$withTaskMap$1(TaskManagerImpl.java:128)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
In the scenario above, the re-index itself proceeds normally; the errors affect only progress reporting.
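The unit mismatch can be sketched in a few lines of Java. This is a hypothetical illustration, not Jira's code: the class ThresholdUnits, its methods, and the simple "elapsed time greater than threshold" staleness check are invented for the example. The fix uses the standard java.util.concurrent.TimeUnit conversion.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration of a minutes value consumed as milliseconds.
 * None of these names come from Jira's source code.
 */
final class ThresholdUnits {

    // Hypothetical check that expects its threshold in milliseconds.
    static boolean isStale(long lastSeenMillis, long nowMillis, long thresholdMillis) {
        return nowMillis - lastSeenMillis > thresholdMillis;
    }

    // Buggy: the value configured in minutes is passed through unchanged.
    static long thresholdBuggy(long configuredMinutes) {
        return configuredMinutes;
    }

    // Fixed: convert minutes to milliseconds before passing the value on.
    static long thresholdFixed(long configuredMinutes) {
        return TimeUnit.MINUTES.toMillis(configuredMinutes);
    }

    public static void main(String[] args) {
        long configuredMinutes = 30;   // assumed example value
        long now = 10_000_000L;
        long lastSeen = now - 5_000L;  // node last seen 5 seconds ago

        // prints "buggy: stale=true" (threshold collapses to 30 ms)
        System.out.println("buggy: stale=" + isStale(lastSeen, now, thresholdBuggy(configuredMinutes)));
        // prints "fixed: stale=false" (threshold is 1,800,000 ms)
        System.out.println("fixed: stale=" + isStale(lastSeen, now, thresholdFixed(configuredMinutes)));
    }
}
```

With a configured value of 30, the buggy path already treats a node seen 5 seconds ago as stale, which mirrors how tasks end up being cleaned up far too early.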
Environment
- Jira Data Center with two or more nodes, running Jira 8.7.1 (the only affected version)
Note: Neither Jira 8.5.4 nor 8.8.0 is affected by this bug.
Steps to Reproduce
- Perform a change that triggers a Bulk Operation, or start a re-index.
- Cause a brief database connection failure.
- The task running the job is cleaned up.
The task keeps executing, but its progress can no longer be checked, and NullPointerExceptions may appear in the logs.
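Why the work continues while progress reporting fails can be illustrated with a shared task registry. This is a hypothetical model, not Jira's TaskManagerImpl: the class TaskRegistry, its methods, and the assumption that progress updates look up the task by ID in a shared map are all invented for the example. It only shows the general pattern of an entry being removed by cleanup while its owner still tries to update it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a shared task registry; not Jira's implementation. */
final class TaskRegistry {

    // Mutable per-task progress record.
    static final class TaskState {
        private int percent;
        void update(int p) { percent = p; }
        int percent() { return percent; }
    }

    private final Map<Long, TaskState> tasks = new ConcurrentHashMap<>();

    void register(long id) { tasks.put(id, new TaskState()); }

    // Stands in for another node's cleanup deciding the task is stale.
    void cleanUp(long id) { tasks.remove(id); }

    // Naive update: throws NullPointerException once the entry is gone.
    void reportProgressNaive(long id, int percent) {
        tasks.get(id).update(percent);
    }

    // Defensive update: returns false instead of throwing when the entry is gone.
    boolean reportProgressSafe(long id, int percent) {
        TaskState state = tasks.get(id);
        if (state == null) {
            return false;
        }
        state.update(percent);
        return true;
    }

    // What a progress page would see: empty once the entry was removed.
    Optional<Integer> progress(long id) {
        return Optional.ofNullable(tasks.get(id)).map(TaskState::percent);
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        registry.register(1L);
        registry.reportProgressNaive(1L, 10);
        registry.cleanUp(1L); // premature cleanup; the actual work is unaffected
        System.out.println("progress visible: " + registry.progress(1L).isPresent());
        try {
            registry.reportProgressNaive(1L, 20);
        } catch (NullPointerException e) {
            System.out.println("naive update failed with NullPointerException");
        }
        System.out.println("safe update applied: " + registry.reportProgressSafe(1L, 20));
    }
}
```

The work itself lives outside the registry, so only the bookkeeping breaks, which matches the behaviour described above.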
Actual Results
The property is not converted from minutes to milliseconds.
Expected Results
The property is converted from minutes to milliseconds.
Workaround
Apply any one of the following:
- Disable the feature by setting the "jira.dc.cleanup.cluser.tasks.disabled" feature flag and restarting the node. You can follow our documentation to learn how to set it.
- Set the cluster.task.cleanup.offline.node.threshold property to a high value, e.g. 1800000. Because the affected version reads the value as milliseconds, this equals 30 minutes.
- Remember to remove this override before upgrading Jira. Otherwise, once the value is correctly read as minutes, the mechanism will only clean up tasks from nodes that have been offline for roughly 3.4 years (1,800,000 minutes).
- Upgrade Jira to a version containing a fix once it's available, i.e. 8.7.2, 8.8.0, or higher.
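The arithmetic behind the second workaround can be checked with java.time.Duration. The class WorkaroundValue and its method names are hypothetical; the sketch only shows the same number interpreted in the two units mentioned above.

```java
import java.time.Duration;
import java.util.Locale;

/** Worked arithmetic for the workaround value 1800000 (hypothetical helper). */
final class WorkaroundValue {

    // Interpretation on affected 8.7.1 nodes: the value is read as milliseconds.
    static long minutesIfReadAsMillis(long value) {
        return Duration.ofMillis(value).toMinutes();
    }

    // Intended interpretation (as on fixed versions): the value is read as minutes.
    static long daysIfReadAsMinutes(long value) {
        return Duration.ofMinutes(value).toDays();
    }

    public static void main(String[] args) {
        long value = 1_800_000L;
        // prints "as milliseconds: 30 minutes"
        System.out.println("as milliseconds: " + minutesIfReadAsMillis(value) + " minutes");
        long days = daysIfReadAsMinutes(value);
        // prints "as minutes: 1250 days, ~3.42 years"
        System.out.println("as minutes: " + days + " days, ~"
                + String.format(Locale.ROOT, "%.2f", days / 365.25) + " years");
    }
}
```

This is why the override must be removed before upgrading: the same number that means 30 minutes on 8.7.1 means more than three years once the unit conversion is fixed.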
Please note that the workaround must be applied to every node in the cluster, because tasks are removed by the other nodes, not by the node that is executing the task.
To confirm that the first workaround is in effect, enable DEBUG logging for com.atlassian.jira.cluster.ClusterTaskCleanupService.
During service startup (when the node starts or after a plugin system restart), you should see the "ClusterTaskCleanupService is disabled" message. If the "Registering ClusterTaskCleanupService" message is logged instead, the cleanup service is still enabled.
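Checking a node's log for these two messages can be automated with a small helper. This is a sketch under assumptions: the class CleanupServiceLogCheck is hypothetical, it assumes the log lines contain the two message strings exactly as quoted above, and the default file name atlassian-jira.log is an assumption; pass the node's actual log file path as the first argument. Run it on every node, since the workaround must be applied cluster-wide.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper that reports the last cleanup-service startup state found in a log. */
final class CleanupServiceLogCheck {

    static final String DISABLED_MSG = "ClusterTaskCleanupService is disabled";
    static final String REGISTERING_MSG = "Registering ClusterTaskCleanupService";

    enum Status { DISABLED, ACTIVE, UNKNOWN }

    // Keeps the last matching message, because the service can start several times.
    static Status check(List<String> lines) {
        Status status = Status.UNKNOWN;
        for (String line : lines) {
            if (line.contains(DISABLED_MSG)) {
                status = Status.DISABLED;
            } else if (line.contains(REGISTERING_MSG)) {
                status = Status.ACTIVE;
            }
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; override with the real log path as args[0].
        Path log = Paths.get(args.length > 0 ? args[0] : "atlassian-jira.log");
        List<String> lines = Files.readAllLines(log);
        System.out.println(check(lines));
    }
}
```

A result of ACTIVE after the last startup suggests the feature flag has not taken effect on that node; UNKNOWN usually means DEBUG logging was not yet enabled when the service started.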
is caused by
- JRASERVER-66204 Bulk Operation can get stuck in JIRA Data Center (Closed)
- JRASERVER-70585 Periodically clean-up cluster tasks from offline nodes (Closed)