Jira Data Center / JRASERVER-65890

Create logging event for clusterlockstatus during startup


      Problem:

      In JIRA Data Center, a node can fail to start up because another node is holding a cluster lock.

      Symptom:

      JIRA is unable to start up, and the log shows it stuck repeatedly calling Current.getStatus():

      2017-09-03 19:50:07,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,584 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,586 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:07,587 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,095 localhost-startStop-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,097 localhost-startStop-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,581 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus()
      2017-09-03 19:50:10,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG      [o.objectweb.jotm.jta] Current.getStatus() 
      

      From the CLUSTERLOCKSTATUS table, we can see that one of the nodes is holding the cluster lock.

      node-2 is unable to start up, and its thread dump contains the following:

      "localhost-startStop-1" #36 daemon prio=5 os_prio=0 tid=0x00007f732c002000 nid=0x666f waiting on condition [0x00007f73454e1000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:530)
      	at com.atlassian.beehive.db.DatabaseClusterLock.uninterruptibleWait(DatabaseClusterLock.java:102)
      	at com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:82)
      	at com.atlassian.beehive.compat.delegate.DelegatingClusterLock.lock(DelegatingClusterLock.java:34)
      	at com.atlassian.upm.impl.Locks.runWithLock(Locks.java:114)
      	at com.atlassian.upm.impl.Locks.writeWithLock(Locks.java:81)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.launch(AsynchronousTaskStatusStoreImpl.java:459)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onLifecycleEvent(AsynchronousTaskStatusStoreImpl.java:428)
      	at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:277)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:274)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecyleAware(DefaultLifecycleManager.java:303)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStartIfStartedAndEnabled(DefaultLifecycleManager.java:273)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.access$300(DefaultLifecycleManager.java:49)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:235)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:232)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwares(DefaultLifecycleManager.java:258)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyStartableLifecycleAwares(DefaultLifecycleManager.java:231)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.startIfApplicationSetup(DefaultLifecycleManager.java:219)
      	at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:210)
      	at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:64)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:36)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$1$1.run(AsynchronousAbleEventDispatcher.java:48)
      	at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:107)
      	at com.atlassian.event.internal.EventPublisherImpl.invokeListeners(EventPublisherImpl.java:160)
      	at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:79) 
      

      In particular, note the following frame:

      com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)
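
The check performed by hand on the thread dump above can be sketched in Java. This is only an illustration: the class name ClusterLockWaiters and the method findWaiters are hypothetical, and the sketch assumes a HotSpot-style dump in which each thread starts with a quoted name and frames start with "at". It reports every thread whose stack contains a DatabaseClusterLock.lock frame.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterLockWaiters {

    /**
     * Returns the names of threads whose stack contains a
     * DatabaseClusterLock.lock(...) frame. Hypothetical helper, not part of Jira.
     */
    public static List<String> findWaiters(String threadDump) {
        List<String> waiters = new ArrayList<>();
        String current = null;     // name of the thread block being scanned
        boolean recorded = false;  // avoid listing the same thread twice
        for (String raw : threadDump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A thread header looks like: "name" #36 daemon prio=5 ...
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                recorded = false;
            } else if (current != null && !recorded
                    && line.startsWith("at ")
                    && line.contains("DatabaseClusterLock.lock(")) {
                waiters.add(current);
                recorded = true;
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        String dump =
              "\"localhost-startStop-1\" #36 daemon prio=5 waiting on condition\n"
            + "   java.lang.Thread.State: TIMED_WAITING (sleeping)\n"
            + "\tat java.lang.Thread.sleep(Native Method)\n"
            + "\tat com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:82)\n"
            + "\n"
            + "\"other-thread\" #40 prio=5 runnable\n"
            + "\tat java.lang.Object.wait(Native Method)\n";
        System.out.println(findWaiters(dump));
    }
}
```

A thread reported by such a scan is waiting for a cluster lock; which node holds the lock still has to be read from CLUSTERLOCKSTATUS.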

      Recommendation:

      Add logging when another node is holding a cluster-wide lock that a starting node is waiting for. In this case, the lock requested by com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl is what prevents startup from completing.
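
One possible shape for such logging, as a minimal sketch and not the actual Beehive or Jira implementation: wait for the lock in bounded slices and emit a message each time a slice expires without acquiring it. The class LoggingLockWait, the method lockWithLogging, the lock name, and the message text are all hypothetical, and a plain java.util.concurrent.locks.Lock stands in for the cluster lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

public class LoggingLockWait {

    /**
     * Blocks until the lock is acquired. Every time intervalMillis passes
     * without success, one message is sent to log. Returns the number of
     * messages emitted. Hypothetical helper for illustration only.
     */
    public static int lockWithLogging(Lock lock, String lockName,
                                      long intervalMillis, Consumer<String> log) {
        long start = System.nanoTime();
        int warnings = 0;
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    if (lock.tryLock(intervalMillis, TimeUnit.MILLISECONDS)) {
                        return warnings;
                    }
                    warnings++;
                    long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    log.accept("Still waiting for cluster lock '" + lockName + "' after "
                            + waitedMs + " ms; it may be held by another node");
                } catch (InterruptedException e) {
                    // Keep waiting, but remember to restore the interrupt flag.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        // Simulate another holder that keeps the lock for a while.
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                Thread.sleep(300);
            } catch (InterruptedException ignored) {
                // Demo only: fall through and release the lock.
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        held.await();

        List<String> messages = new ArrayList<>();
        lockWithLogging(lock, "demo-lock", 50, messages::add);
        lock.unlock();
        messages.forEach(System.out::println);
    }
}
```

In a real fix the message would ideally also name the node recorded as the holder in CLUSTERLOCKSTATUS, which this sketch does not attempt to look up.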

            Assignee: Unassigned
            Reporter: vkharisma (Inactive)
            Votes: 10
            Watchers: 13
