Suggestion
Resolution: Unresolved
Problem:
In JIRA Data Center, a node can fail to start up because another node is holding a cluster lock.
Symptom:
JIRA fails to start up, and the log shows it stuck on repeated Current.getStatus() calls:
2017-09-03 19:50:07,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:07,584 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:07,585 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:07,586 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:07,587 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:10,095 localhost-startStop-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:10,097 localhost-startStop-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:10,581 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
2017-09-03 19:50:10,583 ClusterMessageHandlerServiceThread:thread-1 DEBUG [o.objectweb.jotm.jta] Current.getStatus()
The CLUSTERLOCKSTATUS table shows that one of the nodes is holding the cluster lock.
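The following is a minimal sketch of that check. It reads the held locks over plain JDBC and keeps only those owned by a node other than the one trying to start. Several details are assumptions for illustration and are not a documented JIRA API: the column names lock_name and locked_by_node, the query text, and the LockRow and ClusterLockStatusCheck classes. Verify the schema of your own database before relying on it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One row of the CLUSTERLOCKSTATUS table (hypothetical mapping).
// The column names used below are an assumption; check them against your schema.
final class LockRow {
    final String lockName;
    final String lockedByNode; // null when the lock is free

    LockRow(String lockName, String lockedByNode) {
        this.lockName = lockName;
        this.lockedByNode = lockedByNode;
    }
}

final class ClusterLockStatusCheck {
    // Hypothetical query: rows whose lock currently has an owner.
    static final String HELD_LOCKS_SQL =
        "SELECT lock_name, locked_by_node FROM clusterlockstatus WHERE locked_by_node IS NOT NULL";

    // Reads held locks over a plain JDBC connection (driver and URL are up to the caller).
    static List<LockRow> fetchHeldLocks(Connection conn) throws SQLException {
        List<LockRow> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(HELD_LOCKS_SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                rows.add(new LockRow(rs.getString("lock_name"), rs.getString("locked_by_node")));
            }
        }
        return rows;
    }

    // Keeps only locks owned by a node other than the one that is trying to start.
    static List<LockRow> heldByOtherNodes(List<LockRow> rows, String localNodeId) {
        List<LockRow> result = new ArrayList<>();
        for (LockRow row : rows) {
            if (row.lockedByNode != null && !row.lockedByNode.equals(localNodeId)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative rows only; these lock names and node IDs are made up.
        List<LockRow> rows = Arrays.asList(
            new LockRow("example.lock.a", "node-1"),
            new LockRow("example.lock.b", null),
            new LockRow("example.lock.c", "node-2"));
        for (LockRow row : heldByOtherNodes(rows, "node-2")) {
            System.out.println(row.lockName + " is held by " + row.lockedByNode);
        }
    }
}
```

Filtering on the local node ID separates locks that block this node from locks the node already owns itself.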
node-2 cannot start up, and its thread dump contains the following:
"localhost-startStop-1" #36 daemon prio=5 os_prio=0 tid=0x00007f732c002000 nid=0x666f waiting on condition [0x00007f73454e1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:530)
    at com.atlassian.beehive.db.DatabaseClusterLock.uninterruptibleWait(DatabaseClusterLock.java:102)
    at com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:82)
    at com.atlassian.beehive.compat.delegate.DelegatingClusterLock.lock(DelegatingClusterLock.java:34)
    at com.atlassian.upm.impl.Locks.runWithLock(Locks.java:114)
    at com.atlassian.upm.impl.Locks.writeWithLock(Locks.java:81)
    at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.launch(AsynchronousTaskStatusStoreImpl.java:459)
    at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onLifecycleEvent(AsynchronousTaskStatusStoreImpl.java:428)
    at com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:277)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$4.consume(DefaultLifecycleManager.java:274)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecyleAware(DefaultLifecycleManager.java:303)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyOnStartIfStartedAndEnabled(DefaultLifecycleManager.java:273)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.access$300(DefaultLifecycleManager.java:49)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:235)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager$3.evaluate(DefaultLifecycleManager.java:232)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyLifecycleAwares(DefaultLifecycleManager.java:258)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.notifyStartableLifecycleAwares(DefaultLifecycleManager.java:231)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.startIfApplicationSetup(DefaultLifecycleManager.java:219)
    at com.atlassian.sal.core.lifecycle.DefaultLifecycleManager.start(DefaultLifecycleManager.java:210)
    at com.atlassian.sal.jira.lifecycle.JiraLifecycleManager.onJiraStart(JiraLifecycleManager.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:36)
    at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$1$1.run(AsynchronousAbleEventDispatcher.java:48)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
    at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:107)
    at com.atlassian.event.internal.EventPublisherImpl.invokeListeners(EventPublisherImpl.java:160)
    at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:79)
In particular, note the following frame, where UPM's AsynchronousTaskStatusStoreImpl requests the cluster lock during startup:
com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.onStart(AsynchronousTaskStatusStoreImpl.java:389)
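In a full dump with many threads, a small scanner can pick out the threads waiting in DatabaseClusterLock.lock and report which code asked for the lock. ClusterLockWaitFinder below is a hypothetical sketch, not an Atlassian tool, and it makes three assumptions:

- Frames use the jstack-style "at class.method(location)" form.
- Threads are separated by blank lines, as in typical jstack output.
- The packages com.atlassian.beehive and com.atlassian.upm.impl.Locks count as lock plumbing. This list is based only on the trace above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClusterLockWaitFinder {
    // Matches frames such as "at com.example.Foo.bar(Foo.java:42)" or "at x.Y.z(Native Method)".
    private static final Pattern FRAME =
        Pattern.compile("\\bat\\s+([\\w.$]+)\\.([\\w$<>]+)\\(([^)]*)\\)");

    // Prefixes treated as lock plumbing rather than the code that requested the lock.
    // Assumed from the trace above; not an exhaustive list.
    private static final List<String> LOCK_PLUMBING = Arrays.asList(
        "com.atlassian.beehive.",
        "com.atlassian.upm.impl.Locks.");

    // Returns one thread's frames as "class.method(location)", top of stack first.
    static List<String> frames(String threadStack) {
        List<String> result = new ArrayList<>();
        Matcher m = FRAME.matcher(threadStack);
        while (m.find()) {
            result.add(m.group(1) + "." + m.group(2) + "(" + m.group(3) + ")");
        }
        return result;
    }

    static boolean isWaitingOnDatabaseClusterLock(List<String> frames) {
        for (String f : frames) {
            if (f.startsWith("com.atlassian.beehive.db.DatabaseClusterLock.lock(")) {
                return true;
            }
        }
        return false;
    }

    private static boolean isPlumbing(String frame) {
        for (String prefix : LOCK_PLUMBING) {
            if (frame.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // First frame below the lock plumbing, i.e. the caller that asked for the lock; null if none.
    static String lockRequester(List<String> frames) {
        boolean seenPlumbing = false;
        for (String f : frames) {
            if (isPlumbing(f)) {
                seenPlumbing = true;
            } else if (seenPlumbing) {
                return f;
            }
        }
        return null;
    }

    // Usage: java ClusterLockWaitFinder <thread-dump-file>
    public static void main(String[] args) throws IOException {
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String thread : dump.split("\\r?\\n\\s*\\r?\\n")) {
            List<String> fr = frames(thread);
            if (isWaitingOnDatabaseClusterLock(fr)) {
                System.out.println(thread.trim().split("\\r?\\n", 2)[0]); // thread header line
                System.out.println("    lock requested by " + lockRequester(fr));
            }
        }
    }
}
```

For the trace above, the reported requester would be AsynchronousTaskStatusStoreImpl.launch. That frame sits directly above the onStart frame, because launch is the first frame below Locks.writeWithLock.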
Recommendation
Add logging that reports when another node is holding a cluster-wide lock that prevents startup from completing. In this case, that is the lock requested by com.atlassian.upm.core.async.AsynchronousTaskStatusStoreImpl.
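The following is a minimal sketch of what the recommended logging could look like. It assumes a database-backed lock that is acquired by polling, as the DatabaseClusterLock.sleep and uninterruptibleWait frames suggest. LockStore, LoggingClusterLock and their methods are hypothetical names, not the beehive API. The only point is that a waiting node periodically logs which node currently owns the lock.

```java
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical view of a database-backed lock table (not the beehive API).
interface LockStore {
    // Tries once to take the lock for nodeId; returns true on success.
    boolean tryAcquire(String lockName, String nodeId);

    // Returns the node currently holding the lock, or null if it is free.
    String currentOwner(String lockName);
}

final class LoggingClusterLock {
    private final LockStore store;
    private final String lockName;
    private final String nodeId;
    private final long pollMillis;
    private final long warnEveryMillis;
    private final Consumer<String> warn;

    LoggingClusterLock(LockStore store, String lockName, String nodeId,
                       long pollMillis, long warnEveryMillis, Consumer<String> warn) {
        this.store = store;
        this.lockName = lockName;
        this.nodeId = nodeId;
        this.pollMillis = pollMillis;
        this.warnEveryMillis = warnEveryMillis;
        this.warn = warn;
    }

    // Polls until the lock is acquired, sleeping between attempts, and reports
    // the current owner each time another warnEveryMillis of waiting has passed.
    void lock() throws InterruptedException {
        long start = System.nanoTime();
        long nextWarnAt = warnEveryMillis;
        while (!store.tryAcquire(lockName, nodeId)) {
            long waitedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (waitedMillis >= nextWarnAt) {
                warn.accept("Node " + nodeId + " has waited " + waitedMillis + " ms for cluster lock '"
                        + lockName + "', currently held by " + store.currentOwner(lockName));
                nextWarnAt = waitedMillis + warnEveryMillis;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Logger log = Logger.getLogger(LoggingClusterLock.class.getName());
        // In-memory stand-in: "node-1" holds the lock until the 20th attempt.
        LockStore store = new LockStore() {
            private int attempts = 0;
            public boolean tryAcquire(String lockName, String nodeId) { return ++attempts >= 20; }
            public String currentOwner(String lockName) { return attempts >= 20 ? null : "node-1"; }
        };
        new LoggingClusterLock(store, "example.lock", "node-2", 50, 250, log::warning).lock();
        log.info("node-2 acquired the lock");
    }
}
```

The warning interval is kept separate from the poll interval, so the log is not flooded with a message on every retry. Unlike the uninterruptibleWait frame in the trace, this sketch lets InterruptedException propagate to the caller.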
- is related to: JRASERVER-66596 JIRA Datacenter - Add Cluster lock status page which doesn't use locks (Gathering Interest)
- relates to: JRASERVER-66597 JIRA DC might lose Cluster lock due database connectivity problems (Closed)