Jira Data Center / JRASERVER-70663

Tasks can be cleaned too early in Jira DC 8.7.1


Details

    • 8.07
    • 13
    • Severity 2 - Major
    • 27
    • Atlassian Update – 23 Mar 2020

      Hi everyone,

      Jira 8.8.0, released on 19 Mar 2020, contains the fix for this bug.
      We therefore advise all affected customers to upgrade to 8.8.0 or later.

      Ignat Alexeyenko
      Senior Engineer, Jira Server Team


    Description

      Summary

      The mechanism that cleans up stale cluster tasks (JRASERVER-70585) contains a bug: it reads a property set in minutes and passes it to a method expecting milliseconds, without converting between units.

      As a result, a task can be cleaned up much sooner than it should be.
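      The unit mix-up described above can be illustrated with a minimal sketch. This is not Jira's actual code: the class, the method names, the "offline longer than threshold" comparison, and the 30-minute example value are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this is an illustration, not Jira's implementation.
final class CleanupThresholdSketch {

    // Stand-in for a method that expects its threshold in milliseconds.
    static boolean isStale(long offlineForMillis, long thresholdMillis) {
        return offlineForMillis > thresholdMillis;
    }

    // Buggy call: the configured minutes value is passed through unchanged.
    static boolean isStaleBuggy(long offlineForMillis, long thresholdMinutes) {
        return isStale(offlineForMillis, thresholdMinutes);
    }

    // Fixed call: minutes are converted to milliseconds first.
    static boolean isStaleFixed(long offlineForMillis, long thresholdMinutes) {
        return isStale(offlineForMillis, TimeUnit.MINUTES.toMillis(thresholdMinutes));
    }

    public static void main(String[] args) {
        long thresholdMinutes = 30;     // assumed example value
        long offlineForMillis = 1_000;  // node unreachable for one second

        System.out.println(isStaleBuggy(offlineForMillis, thresholdMinutes)); // true
        System.out.println(isStaleFixed(offlineForMillis, thresholdMinutes)); // false
    }
}
```

      With the missing conversion, a threshold of 30 is read as 30 milliseconds, so a one-second outage is already enough for the task to be treated as stale.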

      Besides making it impossible to check the task's progress in the UI, this can cause errors like the one below to appear in the logs:

      JiraTaskExecutionThread-2 ERROR /secure/admin/IndexReIndex!reindex.jspa [c.a.j.util.index.CompositeIndexLifecycleManager] Reindex All FAILED.  Indexer: SharedEntityIndexManager: paths: []
      java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
              at com.atlassian.jira.task.TaskManagerImpl.withTaskMap(TaskManagerImpl.java:132)
              at com.atlassian.jira.task.TaskManagerImpl.refreshTaskInTaskCache(TaskManagerImpl.java:573)
              at com.atlassian.jira.task.TaskManagerImpl.onProgressMade(TaskManagerImpl.java:481)
              at com.atlassian.jira.task.TaskProgressAdapter.makeProgress(TaskProgressAdapter.java:44)
              at com.atlassian.jira.task.context.TaskProgressPercentageContextSink.updateProgress(TaskProgressPercentageContextSink.java:41)
              at com.atlassian.jira.task.context.CompositeSink.updateProgress(CompositeSink.java:36)
              at com.atlassian.jira.task.context.PercentageContext$Progress.update(PercentageContext.java:64)
              at com.atlassian.jira.task.context.PercentageContext.setName(PercentageContext.java:34)
              at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndex(DefaultSharedEntityIndexManager.java:118)
              at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:93)
              at com.atlassian.jira.util.index.CompositeIndexLifecycleManager.reIndexAll(CompositeIndexLifecycleManager.java:66)
              at com.atlassian.jira.util.index.CompositeIndexLifecycleManager.reIndexAll(CompositeIndexLifecycleManager.java:49)
              at com.atlassian.jira.web.action.admin.index.ReIndexAsyncIndexerCommand.doReindex(ReIndexAsyncIndexerCommand.java:27)
              at com.atlassian.jira.web.action.admin.index.AbstractAsyncIndexerCommand.call(AbstractAsyncIndexerCommand.java:63)
              at com.atlassian.jira.web.action.admin.index.ReIndexAsyncIndexerCommand.call(ReIndexAsyncIndexerCommand.java:18)
              at com.atlassian.jira.web.action.admin.index.AbstractAsyncIndexerCommand.call(AbstractAsyncIndexerCommand.java:26)
              at com.atlassian.jira.task.TaskManagerImpl$TaskCallableDecorator.call(TaskManagerImpl.java:533)
              at com.atlassian.jira.task.TaskManagerImpl$TaskCallableDecorator.call(TaskManagerImpl.java:491)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at com.atlassian.jira.task.ForkedThreadExecutor$ForkedRunnableDecorator.run(ForkedThreadExecutor.java:216)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
              at java.util.concurrent.FutureTask.report(FutureTask.java:122)
              at java.util.concurrent.FutureTask.get(FutureTask.java:192)
              at com.atlassian.jira.task.TaskManagerImpl.withTaskMap(TaskManagerImpl.java:130)
              ... 22 more
      Caused by: java.lang.NullPointerException
              at com.atlassian.jira.task.TaskManagerImpl.lambda$withTaskMap$1(TaskManagerImpl.java:128)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              ... 1 more 

      In the scenario above, the re-index itself proceeds as it should; the errors concern only progress reporting.

      Environment

      • Jira DC with 2+ nodes, running Jira 8.7.1 (the only affected Jira version)

      Note: Neither Jira 8.5.4 nor 8.8.0 is affected by this bug.

      Steps to Reproduce

      1. Perform a change that triggers a "Bulk Operation" action, or start a re-index.
      2. Cause a brief database connection failure.
      3. The task associated with the job will be cleaned up.
        The task will still be executing. However, it will not be possible to check its progress, and NullPointerExceptions can appear in the logs.

      Actual Results

      The property is not converted from minutes to milliseconds.

      Expected Results

      The property is converted from minutes to milliseconds.

      Workaround

      Any one of these:

      1. Disable the feature by setting the "jira.dc.cleanup.cluser.tasks.disabled" feature flag and restarting the node. You can follow our documentation to learn how to set it.
      2. Set the cluster.task.cleanup.offline.node.threshold property to a high value, e.g. 1800000. Because of the bug, this value is treated as milliseconds, so it equals 30 minutes.
        • Remember to remove this override before upgrading Jira, or else the mechanism will only clean up tasks from nodes that have been offline for roughly 3.4 years (1800000 minutes).
      3. Upgrade Jira to a version containing the fix (8.7.2, 8.8.0, or higher) once it's available.

      Please note that the workaround needs to be applied to every node in the cluster, because the tasks are being removed by other nodes, not the node that's executing the task.
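      The arithmetic behind the second workaround can be checked with a short sketch. The class and method names are hypothetical; the only values taken from this issue are the override 1800000 and the fact that the affected version reads it as milliseconds while a fixed version reads it as minutes.

```java
import java.time.Duration;

// Hypothetical helper for checking the numbers; not part of Jira.
final class WorkaroundArithmetic {

    // Affected version: the configured value ends up being used as milliseconds.
    static Duration asReadByAffectedVersion(long configuredValue) {
        return Duration.ofMillis(configuredValue);
    }

    // Fixed version: the configured value is interpreted as minutes.
    static Duration asReadByFixedVersion(long configuredValue) {
        return Duration.ofMinutes(configuredValue);
    }

    public static void main(String[] args) {
        long override = 1_800_000L; // value suggested in the workaround

        System.out.println(asReadByAffectedVersion(override).toMinutes()); // 30
        System.out.println(asReadByFixedVersion(override).toDays());       // 1250
    }
}
```

      1250 days is a little under three and a half years, which is why the override should be removed before upgrading.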

      To verify that the first workaround is in effect, enable DEBUG logging for com.atlassian.jira.cluster.ClusterTaskCleanupService.
      During the service startup (when the node starts or after a plugin system restart) you should see the "ClusterTaskCleanupService is disabled" message logged. If you see the "Registering ClusterTaskCleanupService" message instead, the cleanup service is still enabled.
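      As a rough aid for checking a node's log, the sketch below scans log lines for the two messages quoted above and reports the most recent one. The class, enum, and method names are hypothetical, and the sample lines in main contain only the message text from this issue, not real Jira log output.

```java
import java.util.List;

// Hypothetical log check; only the two message strings come from this issue.
final class CleanupServiceLogCheck {

    enum State { DISABLED, REGISTERED, UNKNOWN }

    // Returns the state implied by the last matching message in the log.
    static State lastKnownState(List<String> logLines) {
        State state = State.UNKNOWN;
        for (String line : logLines) {
            if (line.contains("ClusterTaskCleanupService is disabled")) {
                state = State.DISABLED;
            } else if (line.contains("Registering ClusterTaskCleanupService")) {
                state = State.REGISTERED;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // Made-up sample: the service was registered, then disabled after a restart.
        List<String> sample = List.of(
                "Registering ClusterTaskCleanupService",
                "ClusterTaskCleanupService is disabled");
        System.out.println(lastKnownState(sample)); // DISABLED
    }
}
```

      In practice the lines would be read from each node's log file, since the workaround has to be confirmed on every node.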

            People

              drauf Daniel Rauf
              Votes: 3
              Watchers: 25