Jira Data Center / JRASERVER-74298

Jira node fails to start due to a cluster lock on Active Objects


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: High
    • Fix Version/s: None
    • Affects Version/s: 8.22.6
    • Component/s: Data Center - Other
    • 8.22
    • 12
    • Symptom Severity: Severity 2 - Major
    • 41
      Atlassian Update - 23rd June 2023

      Having read the description of this issue, as well as the related Atlassian Support tickets, we are unable to say if cluster locks were indeed responsible for the reported problems. Therefore, we decided to close this issue as Cannot Reproduce.

      For more details, see this comment.

      Kamil Cichy
      Jira Data Center


    Description

      A Jira node may fail to release a cluster lock if a database connectivity problem occurs while it holds the lock.

      When another node then starts up, its localhost-startStop thread sees the stale lock and waits for it indefinitely, so that node's startup hangs.

      Environment

       Jira Data Center

      Steps to Reproduce

      1. Set up a Jira Data Center instance with two nodes
      2. Initiate an action that obtains a cluster lock
      3. Break the connection to the database while this node holds the cluster lock and is trying to release it; the stuck lock can then be observed with the query below
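      While reproducing, the stuck lock can be inspected directly in the database. A minimal check against the clusterlockstatus table, filtering on the ao-plugin.upgrade prefix from the example later in this report (adjust the filter to whichever lock your action obtains):

        -- Find AO plugin-upgrade locks and the node holding them
        select id, lock_name, locked_by_node, update_time
        from clusterlockstatus
        where lock_name like 'ao-plugin.upgrade.%'
          and locked_by_node is not null;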

      Expected Results

      Jira warns that the Active Objects lock is held by another node,

      OR

      the other node releases the lock once it reconnects to the database.

      Actual Results

      The node does not start.
      Thread dumps collected during node startup show the localhost-startStop thread waiting for the Active Objects lock:

      10:21:15 - localhost-startStop-1
      State:WAITING
      CPU usage:0.0%
      Running for: 0:56.95
      Waiting for
      This thread is waiting for notification on lock [0x73ba00030] without an owner
      Locks held
      This thread holds [0x73ba002b0, 0x500a991f8, 0x500008af8, 0x500008af8]
      
      Stack trace
      jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
      java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/Unknown Source)
      java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/Unknown Source)
      java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/Unknown Source)
      java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.13/Unknown Source)
      java.util.concurrent.CompletableFuture.get(java.base@11.0.13/Unknown Source)
      io.atlassian.util.concurrent.Promises$OfStage.claim(Promises.java:280)
      com.atlassian.activeobjects.osgi.TenantAwareActiveObjects.flushAll(TenantAwareActiveObjects.java:247)
      jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.13/Native Method)
      jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
      jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
      java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source)
      org.joor.Reflect.on(Reflect.java:673)
      org.joor.Reflect.call(Reflect.java:379)
      org.joor.Reflect.call(Reflect.java:332)
      com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.invokeAo(DatabaseSchemaCreationImpl.java:86)
      com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$2360/0x0000000803c4e440.apply(Unknown Source)
      io.atlassian.fugue.Effect.accept(Effect.java:43)
      io.atlassian.fugue.Option$Some.forEach(Option.java:468)
      io.atlassian.fugue.Option$Some.foreach(Option.java:464)
      com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.lambda$primeImpl$0(DatabaseSchemaCreationImpl.java:66)
      com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$1532/0x0000000802169040.apply(Unknown Source)
      com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.lambda$get$0(MemoizingResettingReference.java:59)
      com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$$Lambda$2359/0x0000000803c4e040.get(Unknown Source)
      com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$SmarterMemoizingSupplier.get(MemoizingResettingReference.java:150)
      com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.safelyGetT(MemoizingResettingReference.java:71)
      com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.get(MemoizingResettingReference.java:63)
      com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.prime(DatabaseSchemaCreationImpl.java:60)
      com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.execute(DatabaseAccessorImpl.java:62)
      com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:43)
      com.atlassian.ratelimiting.db.internal.dao.QDSLSystemRateLimitingSettingsDao.initializeDbIfNeeded(QDSLSystemRateLimitingSettingsDao.java:39)
      com.atlassian.ratelimiting.internal.configuration.DefaultSystemPropertiesService.initializeData(DefaultSystemPropertiesService.java:64)
      com.atlassian.ratelimiting.internal.settings.RateLimitModificationSettingsService.onPluginEnabled(RateLimitModificationSettingsService.java:96)
      jdk.internal.reflect.GeneratedMethodAccessor331.invoke(Unknown Source)
      jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
      java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source)
      com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:42)
      com.atlassian.event.internal.ComparableListenerInvoker.invoke(ComparableListenerInvoker.java:48)
      com.atlassian.event.internal.AsynchronousAbleEventDispatcher.lambda$null$0(AsynchronousAbleEventDispatcher.java:37)
      com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$707/0x0000000800a60c40.run(Unknown Source)
      com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$180/0x0000000800376440.execute(Unknown Source)
      com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:85)
      com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:114)
      com.atlassian.event.internal.LockFreeEventPublisher.publish(LockFreeEventPublisher.java:40)
      com.atlassian.plugin.event.impl.DefaultPluginEventManager.broadcast(DefaultPluginEventManager.java:90)
      com.atlassian.plugin.manager.DefaultPluginManager.broadcastIgnoreError(DefaultPluginManager.java:1972)
      com.atlassian.plugin.manager.DefaultPluginManager.lambda$broadcastPluginEnabled$41(DefaultPluginManager.java:1782)
      com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$2192/0x0000000803982040.run(Unknown Source)
      com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
      com.atlassian.plugin.manager.DefaultPluginManager.broadcastPluginEnabled(DefaultPluginManager.java:1781)
      com.atlassian.plugin.manager.DefaultPluginManager.lambda$enableDependentPlugins$24(DefaultPluginManager.java:1260)
      com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1018/0x0000000800dd2440.run(Unknown Source)
      com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
      com.atlassian.plugin.manager.DefaultPluginManager.enableDependentPlugins(DefaultPluginManager.java:1229)
      com.atlassian.plugin.manager.DefaultPluginManager.lambda$addPlugins$22(DefaultPluginManager.java:1214)
      com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1014/0x0000000800dd3440.run(Unknown Source)
      com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
      com.atlassian.plugin.manager.DefaultPluginManager.addPlugins(DefaultPluginManager.java:1114)
      com.atlassian.jira.plugin.JiraPluginManager.addPlugins(JiraPluginManager.java:157)
      com.atlassian.plugin.manager.DefaultPluginManager.lambda$earlyStartup$5(DefaultPluginManager.java:593)
      com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$756/0x0000000800a83840.run(Unknown Source)
      com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
      com.atlassian.plugin.manager.DefaultPluginManager.earlyStartup(DefaultPluginManager.java:528)
      

      The Active Objects init thread is waiting for the lock:

      10:21:04 - active-objects-init-JiraTenantImpl{id='system'}-0
      State:TIMED-WAITING
      CPU usage:6.7%
      Running for: 0:03.75
      Waiting for
      This thread is not waiting for notification on any lock
      Locks held
      This thread does not hold any locks
      
      Stack trace
      java.lang.Thread.sleep(java.base@11.0.13/Native Method)
      com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:621)
      com.atlassian.beehive.db.DatabaseClusterLock.tryLockWaitWithTimeout(DatabaseClusterLock.java:472)
      com.atlassian.beehive.db.DatabaseClusterLock.tryLock(DatabaseClusterLock.java:453)
      com.atlassian.activeobjects.internal.AbstractActiveObjectsFactory.create(AbstractActiveObjectsFactory.java:57)
      com.atlassian.activeobjects.internal.DelegatingActiveObjectsFactory.create(DelegatingActiveObjectsFactory.java:32)
      com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:91)
      com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:86)
      com.atlassian.sal.core.executor.ThreadLocalDelegateCallable.call(ThreadLocalDelegateCallable.java:38)
      java.util.concurrent.FutureTask.run(java.base@11.0.13/Unknown Source)
      java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/Unknown Source)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/Unknown Source)
      java.lang.Thread.run(java.base@11.0.13/Unknown Source)
      

      However, the clusterlockstatus table shows a lock from another node that has been held for more than a week, since database connection problems occurred on that node:

      ID      | LOCK_NAME                                                             | LOCKED_BY_NODE | UPDATE_TIME
      1747827 | ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin | node4          | 1660313500762
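      UPDATE_TIME is an epoch timestamp in milliseconds. To confirm how stale a lock is, it can be converted to a readable timestamp; a sketch using PostgreSQL's to_timestamp (the conversion function varies by database):

        -- List held locks with a readable timestamp
        -- (1660313500762 ms corresponds to roughly 2022-08-12 UTC)
        select id, lock_name, locked_by_node,
               to_timestamp(update_time / 1000.0) as last_updated
        from clusterlockstatus
        where locked_by_node is not null;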

      Workaround

      Workaround 1
      Restart the problematic node. See the following KB article for full instructions on this workaround:
      https://confluence.atlassian.com/jirakb/jira-data-center-functionalities-loss-due-to-cluster-wide-lock-942860754.html

      Workaround 2

      • Run the following query to find the lock:

        select * from clusterlockstatus where locked_by_node is not NULL;

      • The query will return something like:

        ID      | LOCK_NAME                                                             | LOCKED_BY_NODE | UPDATE_TIME
        1747827 | ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin | node4          | 1660313500762

      • Remove the lock, using the ID returned by the query above:

        delete from clusterlockstatus where id = 1747827;
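      If the numeric ID differs in your environment, the same row can be targeted by lock name and node instead; a sketch based on the example values above:

        -- Remove the stale AO upgrade lock held by node4
        -- (substitute the LOCK_NAME and LOCKED_BY_NODE values from your own query results)
        delete from clusterlockstatus
        where lock_name = 'ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin'
          and locked_by_node = 'node4';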
      

People

    Assignee: Kamil Cichy (kcichy)
    Reporter: Murakami (imurakami@atlassian.com)
    Votes: 5
    Watchers: 10
