Multithreaded index catch-up may fail if it encounters comment versions or worklogs versions without corresponding jiraaction entries


    • 9.12
    • 16
    • Severity 1 - Critical
    • 208

      Issue Summary

      If the multithreaded index catch-up feature introduced in JRASERVER-76237 ("Index catch-up of Comments and Worklogs isn't multi-threaded during Jira startup") encounters comment versions or worklog versions without corresponding rows in the jiraaction table, it can fail and trigger a full reindex, significantly prolonging node startup.

      If full reindexing on startup is disabled, the situation is even worse: Jira marks the node startup as failed, so other services such as the task scheduler do not start. Running a full reindex afterwards does not help; when the node is restarted to bring up the task scheduler, the index catch-up runs again and fails again. The consequences are particularly severe during a non-ZDU upgrade, when all nodes have been shut down and the first node to start fails to recover the index.

      Steps to Reproduce

      1. Set up a clustered Jira environment with two nodes.
      2. Stop node 2.
      3. Add the startup parameter -Dcom.atlassian.jira.startup.allow.full.reindex=false to node 2.
      4. Log into node 1 and create a new Jira issue.
      5. Add a comment to the newly created Jira issue.
      6. Immediately run the following SQL query on the Jira database:
        DELETE FROM jiraaction WHERE id = (SELECT MAX(comment_id) FROM comment_version);

      7. Repeat steps 5 and 6 ten times.
      8. Start node 2.
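
The database state that steps 5 and 6 produce can be sketched outside Jira. The following is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the Jira database and toy tables that hold just the columns named in this report; the helper names `add_comment` and `orphan_latest_comment` are hypothetical.

```python
import sqlite3

# Toy stand-ins for the two tables involved. Assumption: only the columns
# referenced in this report are modelled; the real tables have more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jiraaction (id INTEGER PRIMARY KEY);
    CREATE TABLE comment_version (
        comment_id INTEGER PRIMARY KEY,
        parent_issue_id INTEGER,
        deleted TEXT
    );
""")


def add_comment(comment_id, issue_id):
    """Step 5 (simplified): a comment yields a jiraaction row and a version row."""
    conn.execute("INSERT INTO jiraaction VALUES (?)", (comment_id,))
    conn.execute(
        "INSERT INTO comment_version VALUES (?, ?, 'N')", (comment_id, issue_id)
    )


def orphan_latest_comment():
    """Step 6: the DELETE statement from the reproduction steps, unchanged."""
    conn.execute(
        "DELETE FROM jiraaction WHERE id = (SELECT MAX(comment_id) FROM comment_version)"
    )


# Steps 5 and 6 once, then repeated ten times: eleven comments in total.
for comment_id in range(1, 12):
    add_comment(comment_id, issue_id=100)
    orphan_latest_comment()

# Version rows whose jiraaction row is gone are what the catch-up trips over.
orphans = conn.execute("""
    SELECT COUNT(*) FROM comment_version cv
    LEFT JOIN jiraaction ja ON cv.comment_id = ja.id
    WHERE ja.id IS NULL
""").fetchone()[0]
print(orphans)
```

Because the DELETE always targets the highest comment_id in comment_version, each pass removes the jiraaction row of the comment that was just added, leaving its version row behind.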

      Expected Results

      Jira should recover its index on startup normally.

      Actual Results

      Jira fails to recover its index. If full reindexing on startup has been disabled, the Jira node will then fail to start:

      2025-02-25 06:57:08,044+0000 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] Reindexing comments.
      2025-02-25 06:57:08,047+0000 main WARN      [c.a.jira.index.AccumulatingResultBuilder] java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot invoke "com.atlassian.jira.issue.comments.Comment.getIssue()" because "entity" is null
      java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot invoke "com.atlassian.jira.issue.comments.Comment.getIssue()" because "entity" is null
              at com.atlassian.jira.index.FutureResult.await(FutureResult.java:40)
              at com.atlassian.jira.index.AccumulatingResultBuilder$CompositeResult.await(AccumulatingResultBuilder.java:177)
              at com.atlassian.jira.issue.index.IndexLock.obtain(IndexLock.java:50)
              at com.atlassian.jira.issue.index.DefaultIndexManager.await(DefaultIndexManager.java:936)
              at com.atlassian.jira.issue.index.DefaultIndexManager.lambda$reindexRelatedEntityUnderWriteLock$18(DefaultIndexManager.java:1044)
              at com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
              at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:431)
              at com.atlassian.jira.issue.index.DefaultIndexManager.reindexRelatedEntityUnderWriteLock(DefaultIndexManager.java:1041)
              at com.atlassian.jira.issue.index.DefaultIndexManager.reindexCommentsInParallel(DefaultIndexManager.java:1028)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:569)
              at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
              at jdk.proxy3/jdk.proxy3.$Proxy98.reindexCommentsInParallel(Unknown Source)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexCommentsByIds(DefaultIndexRecoveryManager.java:363)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexOutdatedEntities(DefaultIndexRecoveryManager.java:310)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:247)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.catchUp(DefaultIndexRecoveryManager.java:536)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:494)
              at com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
              at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:431)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:569)
              at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
              at jdk.proxy3/jdk.proxy3.$Proxy83.withReindexLock(Unknown Source)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:172)
              at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.safeRecoverIndexFromBackup(DefaultIndexRecoveryManager.java:139)
              at com.atlassian.jira.index.DefaultIndexFetcher.recoverIndexFromMostRecentSnapshot(DefaultIndexFetcher.java:108)
              at com.atlassian.jira.cluster.DefaultClusterManager.pickIndexSnapshotFromSharedHome(DefaultClusterManager.java:439)
              at com.atlassian.jira.cluster.DefaultClusterManager.checkIndexOnStart(DefaultClusterManager.java:221)
              at com.atlassian.jira.startup.ClusteringLauncher.clusterSynchronizedCheckIndex(ClusteringLauncher.java:99)
              at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:130)
              at com.atlassian.jira.startup.DefaultJiraLauncher.postDBActivated(DefaultJiraLauncher.java:177)
              at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$postDbLaunch$2(DefaultJiraLauncher.java:154)
              at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:305)
              at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:202)
              at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:144)
              at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$start$0(DefaultJiraLauncher.java:109)
              at com.atlassian.jira.util.devspeed.JiraDevSpeedTimer.run(JiraDevSpeedTimer.java:31)
              at com.atlassian.jira.startup.DefaultJiraLauncher.start(DefaultJiraLauncher.java:107)
              at com.atlassian.jira.startup.LauncherContextListener.initSlowStuff(LauncherContextListener.java:162)
              at com.atlassian.jira.startup.LauncherContextListener.initSlowStuffInBackground(LauncherContextListener.java:147)
              at com.atlassian.jira.startup.LauncherContextListener.contextInitialized(LauncherContextListener.java:105)
              ... 5 filtered
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75)
              at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
              ... 5 filtered
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75)
              at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
              ... 8 filtered
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:569)
              ... 2 filtered
      Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot invoke "com.atlassian.jira.issue.comments.Comment.getIssue()" because "entity" is null
              at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
              at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
              at com.atlassian.jira.index.FutureResult.await(FutureResult.java:38)
              ... 75 more
      Caused by: java.lang.NullPointerException: Cannot invoke "com.atlassian.jira.issue.comments.Comment.getIssue()" because "entity" is null
              ... 2 filtered
              at com.atlassian.jira.issue.index.DefaultIssueIndexer$CommentOperation.shouldDeindex(DefaultIssueIndexer.java:1202)
              at com.atlassian.jira.issue.index.DefaultIssueIndexer$CommentOperation.shouldDeindex(DefaultIssueIndexer.java:1195)
              at java.base/java.util.stream.Collectors.lambda$partitioningBy$62(Collectors.java:1396)
              at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
              at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1707)
              at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
              at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
              at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
              at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
              at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
              at com.atlassian.jira.issue.index.DefaultIssueIndexer.lambda$processInnerBatch$6(DefaultIssueIndexer.java:339)
              at com.atlassian.jira.index.SimpleIndexingStrategy.apply(SimpleIndexingStrategy.java:7)
              at com.atlassian.jira.index.SimpleIndexingStrategy.apply(SimpleIndexingStrategy.java:5)
              at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:47)
              at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:43)
              at com.atlassian.jira.util.concurrent.BoundedExecutor$2.call(BoundedExecutor.java:68)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
              at java.base/java.lang.Thread.run(Thread.java:840)
      2025-02-25 06:57:08,057+0000 main ERROR      [c.a.jira.cluster.DefaultClusterManager] Current node: jira_node2. Couldn't recover index even though it had been found in shared. Current list of other nodes: [jira_node1]
      com.atlassian.jira.index.IndexingFailureException: Indexing completed with 1 errors
      [...]
      2025-02-25 06:57:08,064+0000 main INFO      [c.a.jira.cluster.DefaultClusterManager] Current node: jira_node2. Will not trigger full foreground reindex, because system property com.atlassian.jira.startup.allow.full.reindex is set to false.
      2025-02-25 06:57:08,064+0000 main ERROR      [c.a.jira.cluster.DefaultClusterManager] Failed to prepare local index. Jira is in an unhealthy state.
      2025-02-25 06:57:08,070+0000 main INFO      [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Created node re-index service, paused=true, running period=5sec, delay=10sec
      2025-02-25 06:57:08,070+0000 main INFO      [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Pausing node re-index service
      com.atlassian.jira.index.ha.DefaultNodeReindexService$StackCollector
      [...]
      2025-02-25 06:57:08,073+0000 main INFO      [c.a.jira.cluster.DefaultClusterManager] Failed to get index on this node. Blocking the start of this instance.
      2025-02-25 06:57:08,092+0000 main INFO      [c.a.jira.cluster.DefaultClusterManager] Done checkIndexOnStart in: PT3.717612766S
      [...]
      2025-02-25 06:57:08,352+0000 main ERROR      [c.a.jira.upgrade.UpgradeLauncher] Skipping, JIRA is locked.
      2025-02-25 06:57:08,354+0000 main INFO      [c.a.jira.scheduler.JiraSchedulerLauncher] JIRA Scheduler not started: JIRA startup failed.
      2025-02-25 06:57:08,407+0000 main INFO      [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve.
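
To check whether a node's startup log shows this failure, the messages above can be searched for. The following is a rough sketch, assuming the log is available as a list of lines; the marker strings are copied from the log excerpt above, while `FAILURE_MARKERS` and `find_failure_markers` are hypothetical names, not part of Jira.

```python
# Substrings taken verbatim from the log excerpt in this report.
FAILURE_MARKERS = (
    'because "entity" is null',
    "Failed to prepare local index. Jira is in an unhealthy state.",
    "JIRA Scheduler not started: JIRA startup failed.",
)


def find_failure_markers(log_lines):
    """Return the markers, in FAILURE_MARKERS order, that occur in any line."""
    lines = list(log_lines)
    return [m for m in FAILURE_MARKERS if any(m in line for line in lines)]


sample_log = [
    '2025-02-25 06:57:08,047+0000 main WARN      [c.a.jira.index.AccumulatingResultBuilder] '
    'java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot invoke '
    '"com.atlassian.jira.issue.comments.Comment.getIssue()" because "entity" is null',
    "2025-02-25 06:57:08,064+0000 main ERROR      [c.a.jira.cluster.DefaultClusterManager] "
    "Failed to prepare local index. Jira is in an unhealthy state.",
    "2025-02-25 06:57:08,354+0000 main INFO      [c.a.jira.scheduler.JiraSchedulerLauncher] "
    "JIRA Scheduler not started: JIRA startup failed.",
]

found = find_failure_markers(sample_log)
print(len(found), "of", len(FAILURE_MARKERS), "markers found")
```

Seeing all three markers together suggests the node hit the failed catch-up with full reindexing on startup disabled; any one of them alone may have other causes.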
      

      Workaround

      The problem can also be worked around by taking a database backup, then clearing the affected rows from the comment_version table:

      MySQL / MS SQL

      DELETE cv FROM comment_version cv LEFT JOIN jiraaction ja ON cv.comment_id = ja.id JOIN jiraissue ji ON cv.parent_issue_id = ji.id WHERE cv.deleted = 'N' AND ja.id IS NULL AND (ji.archived IS NULL OR ji.archived != 'Y');
      

      PostgreSQL / Oracle

      DELETE FROM comment_version WHERE comment_id IN (SELECT cv.comment_id FROM comment_version cv LEFT JOIN jiraaction ja ON cv.comment_id = ja.id JOIN jiraissue ji ON cv.parent_issue_id = ji.id WHERE cv.deleted = 'N' AND ja.id IS NULL AND (ji.archived IS NULL OR ji.archived != 'Y'));
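
Before deleting anything, it may help to list the rows the statement would remove. The following is a minimal sketch of that dry run, assuming an in-memory SQLite database and toy tables in place of the real Jira schema; the filter is the one used in the statements above, and the sample rows are invented for illustration.

```python
import sqlite3

# The filter from the workaround, written as a SELECT so it can be previewed.
AFFECTED_ROWS = """
    SELECT cv.comment_id FROM comment_version cv
    LEFT JOIN jiraaction ja ON cv.comment_id = ja.id
    JOIN jiraissue ji ON cv.parent_issue_id = ji.id
    WHERE cv.deleted = 'N' AND ja.id IS NULL
      AND (ji.archived IS NULL OR ji.archived != 'Y')
"""

conn = sqlite3.connect(":memory:")
# Assumption: toy tables with only the columns the workaround references.
conn.executescript("""
    CREATE TABLE jiraissue (id INTEGER PRIMARY KEY, archived TEXT);
    CREATE TABLE jiraaction (id INTEGER PRIMARY KEY);
    CREATE TABLE comment_version (
        comment_id INTEGER PRIMARY KEY,
        parent_issue_id INTEGER,
        deleted TEXT
    );
    INSERT INTO jiraissue VALUES (1, NULL), (2, 'Y');
    INSERT INTO jiraaction VALUES (10);
    INSERT INTO comment_version VALUES
        (10, 1, 'N'),  -- has a jiraaction row: untouched
        (11, 1, 'N'),  -- orphan on an active issue: affected
        (12, 2, 'N'),  -- orphan on an archived issue: excluded by the filter
        (13, 1, 'Y');  -- marked deleted: excluded by the filter
""")

# Dry run: see which rows would go before touching anything.
affected = [row[0] for row in conn.execute(AFFECTED_ROWS)]
print("would delete:", affected)

# Apply the PostgreSQL / Oracle form of the workaround in one transaction.
with conn:
    conn.execute(
        f"DELETE FROM comment_version WHERE comment_id IN ({AFFECTED_ROWS})"
    )

remaining = [
    row[0]
    for row in conn.execute("SELECT comment_id FROM comment_version ORDER BY comment_id")
]
print("remaining:", remaining)
```

On a real instance, the same SELECT can be run first against the Jira database to gauge how many rows the workaround will remove.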
      

            Assignee:
            Unassigned
            Reporter:
            Marcus Fong
            Votes:
            17
            Watchers:
            46
