
Elastic instance scheduler kills elastic agents running builds

      Issue Summary

      Elastic instance scheduler kills elastic agents running builds

      Environment

      Bamboo with an EC2 account enabled.

      Steps to Reproduce

      1. Create a plan with a long-running task. In this example, a Script task with the following content:
        sleep 3000
        
      2. Enable concurrent builds for the plan
      3. Run the plan twice
      4. Start two elastic agents to run the two builds
      5. Configure elastic scheduling as described in our documentation, using the cron expression "0 0/1 * 1/1 * ? *" (every minute) to set the number of active instances to exactly 1
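
The cron expression in step 5 appears to use the Quartz format, which adds a leading seconds field and an optional trailing year field to the usual five cron fields. The following is a minimal sketch that labels each field so the schedule is easier to read. The helper name `label_quartz_fields` is hypothetical and is not part of Bamboo.

```python
# Field names follow the Quartz cron format: 6 mandatory fields plus an optional year.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_quartz_fields(expression: str) -> dict:
    """Map each whitespace-separated cron field to its Quartz field name.

    Hypothetical helper for illustration only; it labels the fields and
    does not validate or evaluate the expression.
    """
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_quartz_fields("0 0/1 * 1/1 * ? *")
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```

Here `0/1` in the minutes field means "starting at minute 0, every 1 minute", and `?` in the day-of-week field means "no specific value", so the schedule fires at second 0 of every minute.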

      Expected Results

      The agent is not shut down, because its build is still running.
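
The expected behaviour can be illustrated with a small sketch: when the schedule asks for fewer instances than are running, only idle agents are candidates for shutdown. This is an assumption about the intended rule, not Bamboo's actual implementation. The `Agent` class, `pick_instances_to_stop`, the second instance ID and the second build key are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical model of an elastic agent (not a Bamboo class)."""
    instance_id: str
    current_build: Optional[str] = None  # None means the agent is idle

    @property
    def is_idle(self) -> bool:
        return self.current_build is None


def pick_instances_to_stop(agents: List[Agent], target_count: int) -> List[Agent]:
    """Return the agents that may be stopped to approach target_count.

    Busy agents are never selected, so the result may contain fewer
    agents than the excess if too many of them are running builds.
    """
    excess = len(agents) - target_count
    if excess <= 0:
        return []
    idle = [agent for agent in agents if agent.is_idle]
    return idle[:excess]


# Two agents, both running builds, schedule wants exactly 1 instance.
agents = [
    Agent("i-07e3eaad67f987836", current_build="BAM-TEST-JOB1-27"),
    Agent("i-0000000000example", current_build="BAM-TEST-JOB1-28"),  # hypothetical
]
to_stop = pick_instances_to_stop(agents, target_count=1)
print([agent.instance_id for agent in to_stop])
```

With both agents busy, nothing is selected, and the scheduler would have to retry on a later run once an agent becomes idle.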

      Actual Results

      The agent is shut down while its build is still running. The logs show the scheduler running and shutting down the agent:

      2019-05-03 15:58:00,008 INFO [scheduler_Worker-10] [ElasticFunctionalityFacadeImpl] Adjusting elastic agents with schedule: com.atlassian.bamboo.agent.elastic.schedule.ElasticInstanceScheduleImpl@22d2d777[2031617,0 0/1 * 1/1 * ? *,com.atlassian.bamboo.agent.elastic.server.ElasticImageConfigurationImpl@be651a6b,EQUALS,1,true]
      2019-05-03 15:58:00,008 INFO [scheduler_Worker-10] [ElasticFunctionalityFacadeImpl] Attempting to shutdown 1 of 'Ubuntu stock image' elastic instances
      2019-05-03 15:58:00,008 INFO [scheduler_Worker-10] [AgentManagerImpl] No deployments running on agent Elastic Agent on i-07e3eaad67f987836
      2019-05-03 15:58:00,009 ERROR [scheduler_Worker-10] [AgentManagerImpl] Agent 'Elastic Agent on i-07e3eaad67f987836' went offline while building BAM-TEST-JOB1-27. The results of that build will not be available. 
      2019-05-03 15:58:00,009 INFO [scheduler_Worker-10] [DefaultErrorHandler] Recording an error: Agent 'Elastic Agent on i-07e3eaad67f987836' went offline while building BAM-TEST-JOB1-27. The results of that build will not be available.  : BAM-TEST-JOB1
      2019-05-03 15:58:00,015 INFO [scheduler_Worker-10] [CurrentlyBuildingContainer] removeCurrentlyBuilding called for [BAM-TEST-JOB1-27]
      2019-05-03 15:58:00,020 INFO [scheduler_Worker-10] [PlanStatePersisterImpl] Updating delta states of build following BAM-TEST-JOB1-27
      2019-05-03 15:58:00,022 INFO [scheduler_Worker-10] [AgentManagerImpl] Elastic Agent "Elastic Agent on i-07e3eaad67f987836" stopped on instance i-07e3eaad67f987836
      2019-05-03 15:58:00,025 INFO [elastic-pool-3-thread-2] [InstanceTerminator] Requesting that EC2 instance i-07e3eaad67f987836 be shut down.
      2019-05-03 15:58:00,025 INFO [scheduler_Worker-10] [ElasticFunctionalityFacadeImpl] Requested termination of elastic instance: i-07e3eaad67f987836
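
To check whether an instance is affected, the server log can be scanned for the "went offline while building" message shown above. The following is a minimal sketch that assumes the message wording matches this excerpt. The function name `find_killed_builds` is hypothetical.

```python
import re

# Matches the message emitted when an agent disappears mid-build,
# based on the wording in the log excerpt above.
OFFLINE_PATTERN = re.compile(
    r"Agent '(?P<agent>[^']+)' went offline while building (?P<build>[\w-]+)\."
)


def find_killed_builds(log_lines):
    """Return the set of (agent name, build key) pairs found in the log lines."""
    found = set()
    for line in log_lines:
        match = OFFLINE_PATTERN.search(line)
        if match:
            found.add((match.group("agent"), match.group("build")))
    return found


sample = [
    "2019-05-03 15:58:00,008 INFO [scheduler_Worker-10] [ElasticFunctionalityFacadeImpl] "
    "Attempting to shutdown 1 of 'Ubuntu stock image' elastic instances",
    "2019-05-03 15:58:00,009 ERROR [scheduler_Worker-10] [AgentManagerImpl] "
    "Agent 'Elastic Agent on i-07e3eaad67f987836' went offline while building "
    "BAM-TEST-JOB1-27. The results of that build will not be available. ",
]
killed = find_killed_builds(sample)
print(killed)
```

Using a set means the same event is reported once, even though the log records it twice (once by `AgentManagerImpl` and once by `DefaultErrorHandler`).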
      

      Notes

      If the plan instead uses the following Script task, the elastic agent is not killed:

      echo hello
      sleep 30
      <repeat the above commands 100 times>
      

      Workaround

      No workaround is currently available. If you encounter this problem, disable elastic instance scheduling until a fix is delivered.

            [BAM-20423] Elastic instance scheduler kills elastic agents running builds

            SET Analytics Bot made changes -
            UIS Original: 2 New: 1
            Mateusz Szmal made changes -
            Fix Version/s New: 11.0.0 [ 110791 ]
            Mateusz Szmal made changes -
            Status Original: In Progress [ 3 ] New: Waiting for Release [ 12075 ]
            Mateusz Szmal made changes -
            Status Original: Gathering Impact [ 12072 ] New: In Progress [ 3 ]
            Mateusz Szmal made changes -
            Assignee New: Mateusz Szmal [ 851f15845f55 ]
            Mateusz Szmal made changes -
            Remote Link New: This issue links to "+core+ Dogfooding › TBD Test Git Branch Detection › issue-BAM-20423-do-not-kill-elastic-instances-with-live-builds (tardigrade-bamboo)" [ 984776 ]
            Zaro made changes -
            Status Original: Needs Triage [ 10030 ] New: Gathering Impact [ 12072 ]
            Zaro made changes -
            Priority Original: Highest [ 1 ] New: Low [ 4 ]
            SET Analytics Bot made changes -
            UIS Original: 22 New: 2
            Bugfix Automation Bot made changes -
            Status Original: Long Term Backlog [ 12073 ] New: Needs Triage [ 10030 ]

              851f15845f55 Mateusz Szmal
              pdemitrio Patricio
              Affected customers: 5
              Watchers: 13