Bamboo primary node status is not getting updated when the secondary node takes over


    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Fix Version/s: 12.0.0, 10.2.8, 11.0.5
    • Affects Version/s: 10.2.0, 11.0.0
    • Component/s: Builds
    • None
    • 2
    • Severity 3 - Minor
    • 1

      Issue Summary

      Bamboo primary node status is not getting updated when the secondary node takes over.
      Both nodes are in the RUNNING state on the Bamboo administration -> Clustering page.

      In this case node1 experienced thread starvation or a clock leap, and as a result its primary lock was lost. Node2 then acquired the primary lock and its status changed from 'RUNNING_AS_SECONDARY' to 'RUNNING', whereas node1 was still showing as RUNNING.
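
      The symptom described above (more than one node reporting RUNNING at the same time) can be expressed as a simple check. The Java sketch below is an illustration only: the class PrimaryStatusCheck, its methods, and the node-name-to-status map are hypothetical and are not part of any Bamboo API. Only the status names RUNNING and RUNNING_AS_SECONDARY come from this report.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not Bamboo code: flags the state seen in this issue,
// where two nodes both report the primary status RUNNING.
public class PrimaryStatusCheck {

    // Returns the names of all nodes whose reported status is RUNNING.
    static List<String> nodesReportingPrimary(Map<String, String> statusByNode) {
        return statusByNode.entrySet().stream()
                .filter(e -> "RUNNING".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // True when more than one node claims to be primary.
    static boolean hasConflictingPrimaries(Map<String, String> statusByNode) {
        return nodesReportingPrimary(statusByNode).size() > 1;
    }

    public static void main(String[] args) {
        // Statuses as described in this report, entered by hand (assumption).
        Map<String, String> observed = new LinkedHashMap<>();
        observed.put("node1", "RUNNING");
        observed.put("node2", "RUNNING");

        System.out.println("conflict=" + hasConflictingPrimaries(observed));
        System.out.println("nodes=" + nodesReportingPrimary(observed));
    }
}
```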

      Steps to Reproduce

      No

      Expected Results

      When the secondary node is able to take the primary lock, it should make the other node secondary.

      Actual Results

      Node2 should show as RUNNING.
      Node1 should be shutting down and should show as offline on the Bamboo administration -> Clustering page after 5 minutes.

      We saw that the "acquireClusterLock.quartz_Worker-1" thread was stuck in the PrimaryNodeServiceImpl.shutdown method:

      java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait0(java.base@21.0.7/Native Method)
      	- waiting on <no object reference available>
      	at java.lang.Object.wait(java.base@21.0.7/Object.java:366)
      	at java.lang.Thread.join(java.base@21.0.7/Thread.java:2079)
      	- locked <xxxxxxx> (a org.quartz.core.QuartzSchedulerThread)
      	at java.lang.Thread.join(java.base@21.0.7/Thread.java:2155)
      	at org.quartz.core.QuartzSchedulerThread.halt(QuartzSchedulerThread.java:182)
      	at org.quartz.core.QuartzScheduler.shutdown(QuartzScheduler.java:694)
      	at org.quartz.impl.StdScheduler.shutdown(StdScheduler.java:206)
      	at com.atlassian.bamboo.beehive.PrimaryNodeServiceImpl.shutdown(PrimaryNodeServiceImpl.java:77)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl$$Lambda/xxxxxx.accept(Unknown Source)
      	at java.util.Optional.ifPresent(java.base@21.0.7/Optional.java:178)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.renouncePrimaryRoleInternal(BambooClusterNodeHeartbeatServiceImpl.java:255)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.panicAndKillNode(BambooClusterNodeHeartbeatServiceImpl.java:268)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.lambda$setCurrentNodePrimary$1(BambooClusterNodeHeartbeatServiceImpl.java:235)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl$$Lambda/xxxxx.run(Unknown Source)
      	at io.atlassian.util.concurrent.ManagedLocks$ManagedLockImpl.withLock(ManagedLocks.java:302)
      	at com.atlassian.bamboo.beehive.BambooClusterNodeHeartbeatServiceImpl.setCurrentNodePrimary(BambooClusterNodeHeartbeatServiceImpl.java:221)
      	at com.atlassian.bamboo.beehive.AcquirePrimaryNodeLockJob.execute(AcquirePrimaryNodeLockJob.java:39)
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
      	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
      	- locked <xxxxx> (a java.lang.Object)
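
      The trace shows the worker waiting in Thread.join, called from QuartzSchedulerThread.halt, in state WAITING rather than TIMED_WAITING, which indicates a join with no timeout. As a general illustration only (this is not Bamboo or Quartz code, and not necessarily how the issue was fixed), the Java sketch below shows how a bounded join lets the caller regain control when the joined thread never finishes. The class BoundedJoinSketch, the helper joinWithTimeout, and the thread name are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: a join with a timeout returns even if the target never ends.
public class BoundedJoinSketch {

    // Waits at most timeoutMillis (must be > 0) for the target thread to terminate.
    // Returns true if the thread has terminated, false if it is still alive.
    static boolean joinWithTimeout(Thread target, long timeoutMillis) throws InterruptedException {
        target.join(timeoutMillis);
        return !target.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // This latch is never counted down, so the thread below never finishes,
        // standing in for a scheduler thread that does not stop.
        CountDownLatch neverReleased = new CountDownLatch(1);

        Thread stuck = new Thread(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-scheduler-thread");
        stuck.setDaemon(true); // let the JVM exit even though this thread is stuck
        stuck.start();

        // An untimed stuck.join() would block here forever; the bounded join returns.
        boolean stopped = joinWithTimeout(stuck, 200);
        System.out.println("stopped=" + stopped);
    }
}
```

      With a bounded wait the caller can log the failure and carry on with the rest of its shutdown path instead of blocking indefinitely.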

      Workaround

      Restart the Bamboo application on Node2.

            Assignee:
            Mateusz Szmal
            Reporter:
            Vani
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: