Details
Bug
Resolution: Timed out
Low
None
None
Severity 3 - Minor
Description
When:
- an elastic image configuration has an invalid (e.g. non-existent) subnet configured, and
- there is a build queued that requires this image configuration
then Bamboo stops automatically starting instances from any image configuration, so none of the queued builds get new elastic agents.
In our case we changed all our VPC subnets and forgot to update one of the image configurations with the new values.
The server logs reveal the following:
2015-06-18 01:36:17,588 INFO [scheduler_Worker-10] [ElasticRunningInstancesOptimizerImpl] 5 elastic instance(s) will be started for those builds that cannot be build on currently connected agents.
2015-06-18 01:36:17,588 INFO [scheduler_Worker-10] [ElasticRunningInstancesOptimizerImpl] 5 elastic instance(s) will be started to run builds that are waiting in a queue. Current queue size is 198, number of builds executable on elastic agents is 183. Bamboo is currently starting 14 elastic instances.
2015-06-18 01:36:17,588 INFO [scheduler_Worker-10] [ElasticRunningInstancesOptimizerImpl] AWS account has 588 elastic instances started by Bamboo server(s) and has 0 spot requests pending, 588 in total. Of these, Bamboo controls 67.
2015-06-18 01:36:17,691 INFO [scheduler_Worker-10] [SubnetCache] query for a non-existing resource [subnet-64fd5a13]
2015-06-18 01:36:17,691 ERROR [scheduler_Worker-10] [ElasticInstancesMonitorJob] Failed to adjust the number of elastic agents.
com.atlassian.aws.AWSException: Error when starting a new instance
    at com.atlassian.bamboo.agent.elastic.server.ElasticFunctionalityFacadeImpl.startupAgents(ElasticFunctionalityFacadeImpl.java:218)
    at com.atlassian.bamboo.agent.elastic.schedule.ElasticInstancesMonitorJob.execute(ElasticInstancesMonitorJob.java:50)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:49)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:31)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:20)
    at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:52)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: com.atlassian.aws.ec2.caches.Ec2CacheMissException: The resource with ID 'subnet-64fd5a13' does not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidSubnetID.NotFound; Request ID: none)
    at com.atlassian.aws.ec2.caches.SubnetCache.onResourceLookupFailure(SubnetCache.java:48)
    at com.atlassian.aws.AwsOmeCache.filterResources(AwsOmeCache.java:90)
    at com.atlassian.aws.AwsOmeCache.filterResources(AwsOmeCache.java:85)
    at com.atlassian.aws.AwsOmeCache.describe(AwsOmeCache.java:114)
    at com.atlassian.aws.AwsOmeCache.describeResources(AwsOmeCache.java:127)
    at com.atlassian.bamboo.agent.elastic.server.ElasticFunctionalityFacadeImpl.getVpcsAndSubnets(ElasticFunctionalityFacadeImpl.java:278)
    at com.atlassian.bamboo.agent.elastic.server.ElasticFunctionalityFacadeImpl.ensureSecurityGroupsExist(ElasticFunctionalityFacadeImpl.java:290)
    at com.atlassian.bamboo.agent.elastic.server.ElasticFunctionalityFacadeImpl.startupAgents(ElasticFunctionalityFacadeImpl.java:179)
    ... 7 more
Valid image configurations could still be started manually and would pick up builds during this time. Correcting the image configuration to use valid subnets immediately fixed the problem.
The impact on us was that one unnoticed back-end mistake caused many of our Bamboo servers to lose most of their build capacity, and it was not obvious what the problem was.
I would have expected:
- Bamboo to continue being able to spin up instances from other image configurations to handle the builds that didn't require this one particular (invalid) image config; and
- a visible error in the UI reporting that there was a problem when trying to launch that image configuration.
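To illustrate the two expectations above, here is a minimal sketch of per-configuration failure isolation. It is not Bamboo's actual code: `start_instances`, `LaunchError`, `fake_launch`, and the configuration names are all hypothetical, and Python is used only for brevity. The idea is that a failure to launch one image configuration is recorded (so it could be surfaced in a UI) while the remaining configurations are still started.

```python
from typing import Callable, Dict, List, Tuple


class LaunchError(Exception):
    """Hypothetical error raised when one image configuration cannot be launched."""


def start_instances(
    wanted: Dict[str, int],
    launch: Callable[[str, int], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Try every image configuration; a failure in one does not abort the rest.

    Returns the configurations that started, plus a map of
    configuration name -> error message for the ones that failed.
    """
    started: List[str] = []
    failures: Dict[str, str] = {}
    for config_name, count in wanted.items():
        try:
            launch(config_name, count)
            started.append(config_name)
        except LaunchError as exc:
            # Record the problem so it can be reported to an administrator,
            # then carry on with the remaining configurations.
            failures[config_name] = str(exc)
    return started, failures


def fake_launch(config_name: str, count: int) -> None:
    """Stand-in launcher (an assumption): one configuration has a stale subnet."""
    if config_name == "linux-old-subnet":
        raise LaunchError("subnet 'subnet-64fd5a13' does not exist")


started, failures = start_instances(
    {"linux-default": 3, "linux-old-subnet": 1, "windows": 1},
    fake_launch,
)
print("started:", started)
print("failed:", failures)
```

In this sketch the two valid configurations are still started, and the broken one ends up in `failures` with a message that could be shown next to that image configuration.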
Attachments
Issue Links
- was cloned as BDEV-9471