Details
Type: Bug
Resolution: Unsolved Mysteries
Priority: High
Fix Version/s: None
Affects Version/s: 2.3 M7
Component/s: None
Description
I've just switched from M6 to M7 and have spotted a problem: compared to M6, M7 seems to mark remote agents as "offline" too eagerly and, as a result, shuts them down (about 5-10 instances per 4-5 hours). I logged in to each of the "offline" instances before they went down; there was nothing suspicious in the elastic agent's logs, and each agent appeared to be connecting fine over the SSH-tunnelled connection back to the master.
Do you know whether any additional functionality (apart from the issues officially fixed for M7: http://jira.atlassian.com/browse/BAM/fixforversion/14643) was added between M6 and M7 that could introduce this kind of behaviour? I've also noticed that when an elastic agent is marked "offline", it happens just after a build completes on that agent (though not after every completed build).
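For what it's worth, the check I ran on each suspect instance looked roughly like the sketch below. The broker port (54663) and the use of netstat are assumptions on my side; substitute the port and tooling from your own agent configuration.

```shell
#!/bin/sh
# Rough agent-side check: is there still a live TCP session back to the
# Bamboo master through the SSH tunnel?
# MASTER_PORT=54663 is an assumption -- use the broker port configured
# for your elastic agents.
MASTER_PORT="${MASTER_PORT:-54663}"
if netstat -tn 2>/dev/null | grep ":${MASTER_PORT} " | grep -q ESTABLISHED; then
    echo "tunnel up"
else
    echo "tunnel down"
fi
```

On every instance I inspected, this kind of check reported the connection as established right up until the master shut the agent down.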
Nothing unusual appeared in the Bamboo master's logs either; the only sign of the node being taken down is simply:
2009-06-26 06:31:57,120 INFO [pool-2-thread-1] [ElasticInstanceManagerImpl] Elastic instance i-c3fedfaa transitioned from RUNNING to SHUTTING_DOWN.
2009-06-26 06:31:57,293 INFO [pool-2-thread-1] [RemoteEC2InstanceImpl] EC2 instance i-c3fedfaa transitioned from running (16) to shutting-down (32)
2009-06-26 06:32:15,333 INFO [pool-2-thread-3] [RemoteEC2InstanceImpl] EC2 instance i-c3fedfaa has terminated.
2009-06-26 06:32:15,333 INFO [pool-2-thread-3] [ElasticInstanceManagerImpl] Elastic instance i-c3fedfaa transitioned from SHUTTING_DOWN to TERMINATED.
2009-06-26 06:32:15,333 INFO [pool-2-thread-3] [ElasticInstanceManagerImpl] Detected that the elastic instance i-c3fedfaa has stopped.
2009-06-26 06:32:15,344 INFO [pool-2-thread-3] [ElasticInstanceManagerImpl] Elastic Agent "Elastic Agent on i-c3fedfaa" stopped on instance i-c3fedfaa
2009-06-26 06:32:15,345 INFO [pool-2-thread-3] [EBSVolumeSupervisorImpl] Deleting EBS volume vol-b48c62dd
2009-06-26 06:32:15,346 ERROR [pool-10-thread-1] [AgentOfflineEventListener] Elastic instance i-c3fedfaa does not exist
2009-06-26 06:32:15,412 WARN [pool-2-thread-3] [EBSVolumeImpl] Attempt to detach EBS volume vol-b48c62dd from EC2 instance i-c3fedfaa failed. Proceeding with deletion.
com.xerox.amazonws.ec2.EC2Exception: Client error : The volume 'vol-b48c62dd' is not 'attached'.
	at com.xerox.amazonws.ec2.Jec2.makeRequestInt(Jec2.java:1680)
	at com.xerox.amazonws.ec2.Jec2.detachVolume(Jec2.java:1569)
	at com.atlassian.aws.ec2.EBSVolumeImpl.delete(EBSVolumeImpl.java:38)
	at com.atlassian.bamboo.agent.elastic.server.EBSVolumeSupervisorImpl.purge(EBSVolumeSupervisorImpl.java:110)
	at com.atlassian.bamboo.agent.elastic.server.RemoteElasticInstanceImpl$1.ec2InstanceStateChanged(RemoteElasticInstanceImpl.java:291)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl$4.run(RemoteEC2InstanceImpl.java:494)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl$CatchingRunnableDecorator.run(RemoteEC2InstanceImpl.java:96)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl.setState(RemoteEC2InstanceImpl.java:489)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl.terminated(RemoteEC2InstanceImpl.java:325)
	at com.atlassian.aws.ec2.EC2InstanceState$3.supervise(EC2InstanceState.java:125)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl.backgroundSupervise(RemoteEC2InstanceImpl.java:413)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl.access$300(RemoteEC2InstanceImpl.java:25)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl$2.run(RemoteEC2InstanceImpl.java:125)
	at com.atlassian.aws.ec2.RemoteEC2InstanceImpl$CatchingRunnableDecorator.run(RemoteEC2InstanceImpl.java:96)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)