Issue Summary
Bamboo primary node status is not getting updated when the secondary node takes over.
Both nodes are shown in the RUNNING state on the Bamboo administration > Clustering page.
In this case, node1 experienced thread starvation or a clock leap, and as a result its primary lock was lost. Node2 then acquired the primary lock and its status changed from 'RUNNING_AS_SECONDARY' to 'RUNNING', whereas node1 was still shown as RUNNING.
Steps to Reproduce
No known steps to reproduce.
Expected Results
When the secondary node is able to take the primary lock, the other node should be marked as secondary.
Actual Results
Node2 should show as RUNNING.
Node1 should be shutting down and should show as offline on the Bamboo administration > Clustering page after 5 minutes.
We saw that the "acquireClusterLock.quartz_Worker-1" thread was stuck in the PrimaryNodeServiceImpl.shutdown method:
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait0(java.base@21.0.7/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait(java.base@21.0.7/Object.java:366)
    at java.lang.Thread.join(java.base@21.0.7/Thread.java:2079)
    - locked <xxxxxxx> (a org.quartz.core.QuartzSchedulerThread)
    at java.lang.Thread.join(java.base@21.0.7/Thread.java:2155)
    at org.quartz.core.QuartzSchedulerThread.halt(QuartzSchedulerThread.java:182)
    at org.quartz.core.QuartzScheduler.shutdown(QuartzScheduler.java:694)
    at org.quartz.impl.StdScheduler.shutdown(StdScheduler.java:206)
    at com.atlassian.bamboo.beehive.PrimaryNodeServiceImpl.shutdown(PrimaryNodeServiceImpl.java:77)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl$$Lambda/xxxxxx.accept(Unknown Source)
    at java.util.Optional.ifPresent(java.base@21.0.7/Optional.java:178)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.renouncePrimaryRoleInternal(BambooClusterNodeHeartbeatServiceImpl.java:255)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.panicAndKillNode(BambooClusterNodeHeartbeatServiceImpl.java:268)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.lambda$setCurrentNodePrimary$1(BambooClusterNodeHeartbeatServiceImpl.java:235)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl$$Lambda/xxxxx.run(Unknown Source)
    at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:302)
    at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.setCurrentNodePrimary(BambooClusterNodeHeartbeatServiceImpl.java:221)
    at com.atlassian.bamboo.beehive.AcquirePrimaryNodeLockJob.execute(AcquirePrimaryNodeLockJob.java:39)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
    - locked <xxxxx> (a java.lang.Object)
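The top frames of the trace show the worker thread parked in an untimed Thread.join(), which returns only once the joined thread terminates. The following is a minimal, JDK-only sketch of that blocking behavior. It does not use Quartz or Bamboo code, the class, method, and thread names are hypothetical, and it makes no claim about why the scheduler thread did not exit in the reported incident.

```java
import java.util.concurrent.CountDownLatch;

public class JoinBlockDemo {

    /**
     * Returns true if the worker is still blocked in join() after waitMillis
     * because the thread it joined has not terminated.
     */
    static boolean callerStaysBlocked(long waitMillis) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);

        // Hypothetical stand-in for a scheduler thread that does not exit until released.
        Thread scheduler = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-scheduler");

        // Hypothetical stand-in for the worker that performs the shutdown.
        Thread worker = new Thread(() -> {
            try {
                scheduler.join(); // untimed join: waits until the scheduler thread terminates
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker");

        scheduler.start();
        worker.start();

        // Observe the worker with a bounded wait instead of blocking on it forever.
        worker.join(waitMillis);
        boolean blocked = worker.isAlive();

        // Clean up: let the scheduler finish so the worker's join() can return.
        release.countDown();
        scheduler.join();
        worker.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker still blocked: " + callerStaysBlocked(200));
    }
}
```

While the worker is parked this way, any work that follows the join() call in the same thread cannot run.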
Workaround
Restart the Bamboo application on Node2.