  Bamboo / BAM-13482

Avoid "starvation" for jobs pinned to EC2 images

    Details

    • Type: Suggestion
    • Status: Gathering Interest
    • Priority: Low
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Elastic Bamboo
    • Labels:
      None
    • Last commented by user?:
      true
    • Comments:
      0
    • UIS:
      0

      Description

      We very often see situations where:

      • all EC2 slots are used and some (or even most) instances are idle but kept running until the end of the paid hour,
      • some jobs are blocked in the queue because they must run on specific EC2 instances that are either not running or not idle.

      While I understand the need to optimise cost and startup speed, this causes a long feedback loop.

      In the worst case, a job can be stuck for more than 30 minutes while all EC2 agents are idle. The EC2 agent it needs can then take as much as 20 minutes to start (spot instance bidding time + VM startup). So we create nearly an hour of wait time to save a few cents...

      Please consider the following improvement to ease the pain in these corner cases:
      IF all the EC2 slots are used
      AND there is a job in the queue that can only run in EC2
      AND no EC2 image currently running meets the job's requirements
      THEN
      disable and kill an idle EC2 image

      The exact algorithm for idle EC2 image selection is left as an exercise to the reader: Most represented type of idle images? Longest idle?
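
To make the proposal concrete, here is a minimal Python sketch of the rule above together with the two selection heuristics mentioned (longest idle, most represented image type). It is only an illustration: `Instance`, `Job`, `is_starved`, `instance_to_kill` and the `max_instances` parameter are hypothetical names invented for this sketch and do not correspond to any Bamboo API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical model of a running EC2 instance (not a Bamboo class)."""
    instance_id: str
    image: str                     # image the instance was started from
    capabilities: FrozenSet[str]   # what the instance can build
    idle_minutes: Optional[float]  # None means the instance is busy


@dataclass(frozen=True)
class Job:
    """Hypothetical model of a queued job (not a Bamboo class)."""
    key: str
    requirements: FrozenSet[str]
    ec2_only: bool


def is_starved(job: Job, instances: List[Instance], max_instances: int) -> bool:
    """True when all three conditions of the proposed rule hold for this job."""
    all_slots_used = len(instances) >= max_instances
    # Busy instances count too: a matching one will eventually free up.
    some_match = any(job.requirements <= i.capabilities for i in instances)
    return all_slots_used and job.ec2_only and not some_match


def pick_longest_idle(idle: List[Instance]) -> Instance:
    """Heuristic 1: kill the instance that has been idle the longest."""
    return max(idle, key=lambda i: i.idle_minutes)


def pick_most_represented(idle: List[Instance]) -> Instance:
    """Heuristic 2: kill an instance of the most common idle image type,
    breaking ties by idle time."""
    counts = Counter(i.image for i in idle)
    return max(idle, key=lambda i: (counts[i.image], i.idle_minutes))


def instance_to_kill(
    queue: List[Job],
    instances: List[Instance],
    max_instances: int,
    pick: Callable[[List[Instance]], Instance] = pick_longest_idle,
) -> Optional[Instance]:
    """Return the idle instance to shut down, or None if nothing should change."""
    idle = [i for i in instances if i.idle_minutes is not None]
    if not idle:
        return None
    if any(is_starved(job, instances, max_instances) for job in queue):
        return pick(idle)
    return None
```

Passing the heuristic as the `pick` argument keeps the starvation check separate from the selection policy, so either answer to the question above (or a different one) can be plugged in without touching the rule itself.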

              People

              • Assignee:
                Unassigned
              • Reporter:
                pbruski Przemek Bruski
              • Votes:
                2
              • Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Last commented:
                  5 years, 40 weeks, 2 days ago