Create logging event for clusterlockstatus during startup


      Problem:

      In JIRA Data Center, a node can fail to start up because another node is holding a cluster lock.

      Symptom:

      JIRA is unable to start up, and the log shows it stuck repeatedly calling Current.getStatus():

      2017-09-03 19:50:07,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,584 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,586 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,587 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,095 localhost-startStop-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,097 localhost-startStop-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,581 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus() 
      

      From the CLUSTERLOCKSTATUS table, we can see that one of the nodes is holding the cluster lock.

      node-2 is unable to start up, and the following appears in the thread dump of node-2:

      "localhost-startStop-1" #36 daemon prio=5 os_prio=0 tid=0x00007f732c002000 nid=0x666f waiting on condition [0x00007f73454e1000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:530)
      	at com.atlassian.beehive.db.DatabaseClusterLock.uninterruptibleWait(DatabaseClusterLock.java:102)
      	at com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:82)
      	at com.atlassian.beehive.compat.delegate.DelegatingClusterLock.lock(DelegatingClusterLock.java:34)
      	at com.atlassian.upm.impl.Locks.runWithLock(Locks.java:114)
      	at com.atlassian.upm.impl.Locks.writeWithLock(Locks.java:81)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.launch(AsynchronousTaskStatusStoreImpl.java:459)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onLifecycleEvent(AsynchronousTaskStatusStoreImpl.java:428)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:277)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:274)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecyleAware(DefaultLifecycleManager.java:303)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStartIfStartedAndEnabled(DefaultLifecycleManager.java:273)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.access$300(DefaultLifecycleManager.java:49)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:235)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:232)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwares(DefaultLifecycleManager.java:258)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyStartableLifecycleAwares(DefaultLifecycleManager.java:231)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.startIfApplicationSetup(DefaultLifecycleManager.java:219)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:210)
      	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:64)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:36)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$1$1.run(AsynchronousAbleEventDispatcher.java:48)
      	at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:107)
      	at com.atlassian.event.internal.EventPublisherImpl.invokeListeners(EventPublisherImpl.java:160)
      	at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:79) 
      

      In particular, note the following line:

      com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)

      Recommendation:

      Add logging when another node is holding a cluster-wide lock. In this case, com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl is waiting on such a lock and preventing startup from completing.
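
      The kind of logging requested above could look roughly like the sketch below. It is not the beehive or UPM implementation: the class LockWaitLogger, the method lockWithLogging, and the lock name passed in are hypothetical names chosen for illustration, and a plain java.util.concurrent.locks.Lock stands in for the cluster lock. The idea is only to replace a silent wait with a timed tryLock loop that emits a warning each time the lock is still unavailable. A real fix would presumably also report which node holds the lock, which this sketch does not attempt.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.logging.Logger;

// Hypothetical helper, not part of JIRA, UPM, or beehive.
final class LockWaitLogger {
    private static final Logger LOG = Logger.getLogger(LockWaitLogger.class.getName());

    private LockWaitLogger() {
    }

    /**
     * Waits until the lock is acquired, logging a warning after every
     * interval in which the lock could not be obtained.
     *
     * @return the number of warnings logged before the lock was acquired
     */
    static int lockWithLogging(Lock lock, String lockName, long interval, TimeUnit unit) {
        int warnings = 0;
        try {
            // tryLock with a timeout returns false if the lock is still held elsewhere.
            while (!lock.tryLock(interval, unit)) {
                warnings++;
                LOG.warning("Still waiting for cluster lock '" + lockName + "' after "
                        + warnings + " attempt(s) of " + interval + " " + unit
                        + "; it may be held by another node");
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and fail loudly instead of waiting forever.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for lock '" + lockName + "'", e);
        }
        return warnings;
    }
}
```

      A caller would use it in place of a bare lock() call and release the lock in a finally block as usual. With a message like this in the log at WARN level, a node stuck in startup would point directly at the lock name rather than only showing repeated Current.getStatus() DEBUG lines.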

        1. clusterlock.png
          211 kB
          vkharisma

            Assignee:
            Unassigned
            Reporter:
            vkharisma
            Votes:
            10
            Watchers:
            13