  1. Bamboo Data Center
  2. BAM-25145

On Bamboo instances with a large number of Agents, AllAgentsUpdatedEvent may cause "720 seconds" errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: High
    • Fix Version/s: 9.4.0, 9.3.4
    • Affects Version/s: 9.2.4, 9.3.3
    • Component/s: Build Queues
    • Labels: None

    Description

      Problem

      On a very large Bamboo environment with thousands of Agents, whenever a Plan, Agent, or Capability set is added or modified, an AllAgentsUpdatedEvent recalculation kicks in. If the recalculation takes too long to finish, it may block the queue and cause queued builds to be purged.

      Environment

      • Bamboo 9.2.x LTS and 9.3.x (may manifest on later and earlier releases) 

      Steps to Reproduce

      1. Create a very large Bamboo instance with thousands of Agents
      2. Modify the Agents' capabilities while the queue is very busy
      3. Observe that during the BuildQueueManagerImpl recalculation no further Queue activity is processed
      4. Apply the Agent/Capability/Plan changes serially so that the AgentAssignmentsUpdatedEvent is triggered multiple times

      Expected Results

      Queued objects should not get locked during Agent calculations. If something has already been dispatched, it should continue its flow and be picked up by an available Agent. The OrphanedBuildMonitorJob should exclude the time the BuildQueueManagerImpl spent recalculating the queue before considering a build orphaned or hanging.

      Actual Results

      The OrphanedBuildMonitorJob counts the lockup time during recalculations toward a build's total queue time. If that time exceeds the default limit of 720 seconds, legitimate builds may be killed before being dispatched to an Agent.
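
      The gap between the expected and actual results can be illustrated with a minimal Python sketch. This is not Bamboo's code: the function name is_considered_orphaned, its parameters, and the sample timings are hypothetical, and only the 720-second limit comes from this report.

```python
DEFAULT_LIMIT_SECONDS = 720  # default limit quoted in this report


def is_considered_orphaned(time_in_queue, time_blocked_by_recalculation,
                           ignore_recalculation_time,
                           limit=DEFAULT_LIMIT_SECONDS):
    """Hypothetical model of the orphan check, not Bamboo's implementation."""
    effective_time = time_in_queue
    if ignore_recalculation_time:
        # Expected behaviour: time lost to the queue recalculation is not counted.
        effective_time -= time_blocked_by_recalculation
    return effective_time > limit


# Illustrative numbers: a build queued for 800 s, 500 s of which were spent
# waiting while the queue was being recalculated.
actual = is_considered_orphaned(800, 500, ignore_recalculation_time=False)
expected = is_considered_orphaned(800, 500, ignore_recalculation_time=True)
print(actual, expected)  # True False
```

      With the lockup time counted, the build crosses the limit and is treated as orphaned; with it excluded, the same build stays well under the limit.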

      Workaround

      1. Reduce the number of Agents if possible (data hygiene for inactive Agents). You can run some SQL statements to find unused Agents:
      2. From Bamboo 9.3.4 and 9.4.0, you can increase the value of ORPHANED_BUILD_MONITOR_JOB_SCHEDULER_REACTION_DELAY_MULTIPLIER. To do so, add the -Dbamboo.orphaned.build.monitor.reaction.delay.multiplier system property to Bamboo with a value between 3 and 20.
        The multiplier is applied in the following formula (shown with the default values):
        • heartbeatTimeoutSeconds (600s) + ( orphaned.build.monitor.reaction.delay.multiplier (2) * heartbeatTimeout (60s) ) = 720 seconds
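
      As a rough guide to what a given multiplier buys, the formula above can be evaluated in a short Python sketch. It assumes the other two terms stay at the values quoted above (600s and 60s); the function and parameter names are made up for illustration and are not Bamboo identifiers.

```python
def orphaned_build_limit_seconds(multiplier=2,
                                 heartbeat_timeout_seconds=600,
                                 heartbeat_seconds=60):
    """Evaluate the formula from this report; names here are illustrative only."""
    return heartbeat_timeout_seconds + multiplier * heartbeat_seconds


# Default multiplier (2) plus a few values from the suggested 3..20 range.
for multiplier in (2, 3, 10, 20):
    print(multiplier, orphaned_build_limit_seconds(multiplier))
# 2 720
# 3 780
# 10 1200
# 20 1800
```

      Under these assumptions, each step of the multiplier adds 60 seconds to the limit, so the suggested range of 3 to 20 corresponds to limits from 780 to 1800 seconds.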

            People

              851f15845f55 Mateusz Szmal
              73868399605e Eduardo Alvarenga