  1. Bamboo Data Center
  2. BAM-13362

Automatic elastic instance management spun off multiple instances for every build

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Low
    • Fix Version/s: 4.4.5
    • Affects Version/s: 4.4.0, 4.4.2
    • Component/s: Elastic Bamboo

    Description

      Configure Bamboo to use elastic agents with the following settings for instance management:

          Bamboo will shut down elastic instances which have been idle for more than 10 minutes.
          A maximum of 1 elastic instance can be started each minute.
          New elastic instances will be started when all of the following conditions have been met:
              There is at least 1 build in a queue.
              There is at least 1 build in a queue that are executable on elastic images.
              The average time builds have been waiting in a queue is at least 1 minute.
              The number of elastic instances on your AWS account that are not controlled by Bamboo does not exceed 100.
      

      With the above settings, Bamboo is expected to spin up only one agent instance for each build. Instead, Bamboo starts 2 instances, which incurs additional cost because two instances are started to service a single build. The following can be seen in the instance startup log:

      27-Jun-2013 16:41:36 Requested that new elastic instance be created for configuration: Default Image S3 x86_64 (linux)
      27-Jun-2013 16:41:37 Elastic instance [i-76023f17] transitioned from STARTING to IDENTIFIED.
      27-Jun-2013 16:43:21 Elastic instance [i-76023f17] transitioned from IDENTIFIED to RUNNING.
      27-Jun-2013 16:44:06 An elastic agent is loading on instance: [i-76023f17]
      27-Jun-2013 16:44:39 Elastic Agent "Elastic Agent on i-76023f17" started on instance i-76023f17
      27-Jun-2013 16:58:30 Elastic Agent "Elastic Agent on i-76023f17" stopped on instance i-76023f17
      27-Jun-2013 16:58:30 Requested termination of elastic instance: i-76023f17
      27-Jun-2013 16:58:34 Requested termination of elastic instance: i-76023f17
      27-Jun-2013 16:58:41 Elastic Agent "Elastic Agent on i-7bac7d16" stopped on instance i-7bac7d16
      27-Jun-2013 16:58:41 Requested termination of elastic instance: i-7bac7d16
      27-Jun-2013 16:58:47 Requested termination of elastic instance: i-76023f17
      27-Jun-2013 16:58:50 Elastic instance [i-7bac7d16] transitioned from RUNNING to SHUTTING_DOWN.
      27-Jun-2013 16:58:50 Elastic instance [i-76023f17] transitioned from RUNNING to SHUTTING_DOWN.
      27-Jun-2013 16:59:10 Elastic instance [i-7bac7d16] transitioned from SHUTTING_DOWN to TERMINATED.
      27-Jun-2013 16:59:10 Detected that the elastic instance [i-7bac7d16] has been terminated.
      27-Jun-2013 17:01:53 Elastic instance [i-76023f17] transitioned from SHUTTING_DOWN to TERMINATED.
      27-Jun-2013 17:01:53 Detected that the elastic instance [i-76023f17] has been terminated.
      27-Jun-2013 17:02:34 Currently there are no agents that can build: TESTA-TSTA-JOB1. New elastic instance(s) will be started (providing the configuration allows it).
      27-Jun-2013 17:02:34 1 elastic instance(s) will be started for those builds that cannot be build on currently connected agents.
      27-Jun-2013 17:02:35 Requested that new elastic instance be created for configuration: Default Image S3 x86_64 (linux)
      27-Jun-2013 17:02:37 Elastic instance [i-4ea4ac2d] transitioned from STARTING to IDENTIFIED.
      27-Jun-2013 17:04:25 Elastic instance [i-4ea4ac2d] transitioned from IDENTIFIED to RUNNING.
      27-Jun-2013 17:05:57 An elastic agent is loading on instance: [i-4ea4ac2d]
      27-Jun-2013 17:07:04 Elastic Agent "Elastic Agent on i-4ea4ac2d" started on instance i-4ea4ac2d
      27-Jun-2013 17:07:34 1 elastic instance(s) will be started to run builds that are waiting in a queue. Current queue size is 1, number of builds executable on elastic agents is 1. Bamboo is currently starting 0 elastic instances.
      27-Jun-2013 17:07:35 Requested that new elastic instance be created for configuration: Default Image S3 x86_64 (linux)
      27-Jun-2013 17:07:36 Elastic instance [i-54a1a937] transitioned from STARTING to IDENTIFIED.
      27-Jun-2013 17:09:36 Elastic instance [i-54a1a937] transitioned from IDENTIFIED to RUNNING.
      27-Jun-2013 17:10:29 An elastic agent is loading on instance: [i-54a1a937]
      27-Jun-2013 17:11:09 Elastic Agent "Elastic Agent on i-54a1a937" started on instance i-54a1a937 
      

      If we enable DEBUG level logging for com.amazonaws, more information is provided. I am attaching a section of the logs.

      Workaround

      The workaround is to set "The average time builds have been waiting in a queue" to something unreasonably large, for example 28,500 minutes (or even more).

      The reason for this is that the Bamboo EC2 optimizer (the mechanism in Bamboo that spins up or shuts down EC2 instances according to the build queue load) works roughly as follows (details omitted):
      1) every minute, check whether an EC2 instance needs to be started (or shut down):
      1.1) (more or less) if there are builds in the queue that cannot be built by any of the current agents, start the relevant EC2 images
      1.2) if there are builds in the queue that can be built by an EC2 agent and Bamboo is not currently starting any EC2 instances:
      1.2.1) check the thresholds (time in the queue, total builds in the queue, etc.), and if they are crossed, start the relevant EC2 images
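
The check described in steps 1.1 to 1.2.1 can be sketched as a small decision function. This is only an illustration of the logic as described in this report, not Bamboo's actual code: the names `QueuedBuild`, `Thresholds`, and `optimizer_decision` are hypothetical, and the thresholds are limited to the three settings listed at the top of the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedBuild:
    key: str
    waiting_minutes: float
    runnable_on_connected_agent: bool  # some currently connected agent can build it
    runnable_on_elastic: bool          # an elastic image can build it


@dataclass
class Thresholds:
    min_queued_builds: int = 1
    min_elastic_builds: int = 1
    min_avg_wait_minutes: float = 1.0


def optimizer_decision(queue: List[QueuedBuild],
                       instances_starting: int,
                       t: Thresholds) -> Optional[str]:
    """Return the reason for starting an instance, or None if none is started."""
    elastic = [b for b in queue if b.runnable_on_elastic]
    # Step 1.1: a build that no connected agent can run at all.
    if any(not b.runnable_on_connected_agent for b in elastic):
        return "no-capable-agent"
    # Step 1.2: elastic builds are queued and nothing is currently starting.
    if elastic and instances_starting == 0:
        avg_wait = sum(b.waiting_minutes for b in queue) / len(queue)
        # Step 1.2.1: all thresholds must be crossed.
        if (len(queue) >= t.min_queued_builds
                and len(elastic) >= t.min_elastic_builds
                and avg_wait >= t.min_avg_wait_minutes):
            return "thresholds-crossed"
    return None
```

Under this sketch, a build that has already waited several minutes and has just become runnable on a freshly registered agent still yields "thresholds-crossed" when no instance is starting, whereas a very large `min_avg_wait_minutes` makes step 1.2.1 fail.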

      The EC2 optimizer runs this algorithm every 60 seconds.
      The problem occurs when the newly started EC2 instance comes up and registers itself as a ready agent. Some time elapses before the new agent picks up the build (communication, initialization, etc.), so the build that triggered the EC2 instance startup remains in the queue, most probably through the next optimizer run. In step (1.2), that run finds that Bamboo is not starting any EC2 instances (because the EC2 agent has just transitioned to a ready state), so it checks the thresholds. The build has certainly been in the queue for at least 1 minute, since EC2 agents take longer than 1 minute to start, and as a result Bamboo starts a second EC2 instance.
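
The race can be reproduced with a toy, tick-based simulation of a single queued build. This is a simplified model under stated assumptions, not Bamboo's implementation: one tick stands for one optimizer run, and `simulate_optimizer`, `boot_ticks`, and `pickup_delay_ticks` are hypothetical names introduced for illustration.

```python
def simulate_optimizer(avg_wait_threshold, boot_ticks=5,
                       pickup_delay_ticks=1, max_ticks=20):
    """Return the ticks at which an instance start is requested.

    A single build enters the queue at tick 0. The first instance needs
    boot_ticks runs to register as an agent, and the build leaves the
    queue pickup_delay_ticks runs after that.
    """
    starts = []
    pickup_tick = None
    for tick in range(max_ticks):
        if pickup_tick is not None and tick >= pickup_tick:
            break  # the build has left the queue, nothing more to do
        booting = any(tick < s + boot_ticks for s in starts)
        ready = any(tick >= s + boot_ticks for s in starts)
        waited = tick  # the build was queued at tick 0
        if not ready and not booting:
            # Step 1.1: no agent can build the job at all.
            starts.append(tick)
            pickup_tick = tick + boot_ticks + pickup_delay_ticks
        elif ready and not booting and waited >= avg_wait_threshold:
            # Steps 1.2 and 1.2.1: an agent is ready but the build is still
            # queued, nothing is starting, and the wait threshold is crossed.
            starts.append(tick)
    return starts


print(simulate_optimizer(avg_wait_threshold=1))      # two starts
print(simulate_optimizer(avg_wait_threshold=28500))  # one start (workaround)
```

With a threshold of 1 the model requests a second instance on the first run after the agent registers, which resembles the pair of requests at 17:02:35 and 17:07:35 in the log above. With the workaround value, or with no pickup delay, only one instance is requested.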

      This appears to have been resolved in 4.4.5.

      Attachments

        1. log.txt
          68 kB

          People

            Assignee: Unassigned
            Reporter: Sultan Maiyaki (smaiyaki, Inactive)