Bamboo Data Center / BAM-13482

Avoid "starvation" for jobs pinned to EC2 images

    Description

      We very often see situations where:

      • all EC2 slots are used and some (or even most) instances are idle but kept running until the end of the paid hour,
      • some jobs are blocked in the queue because they need to run on specific EC2 instances which are either not running or not idle.

      While I understand the need to optimise cost and startup speed, this causes a long feedback loop.

      In the worst case, a job can be stuck for more than 30 minutes while all EC2 agents are idle. The EC2 agent can then take as much as 20 minutes to start (spot instance bidding time + VM startup). So we create nearly an hour of wait time to save a few cents...

      Please consider the following improvement to ease the pain for those corner cases:
      IF all the EC2 slots are used
      AND there is a job in the queue that can only run in EC2
      AND no currently running EC2 image meets the job's requirements
      THEN
      disable and kill an idle EC2 image

      The exact algorithm for selecting the idle EC2 image is left as an exercise for the reader: the most represented type of idle image? The longest idle?
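To make the proposal concrete, here is a minimal sketch of the rule and of the two selection ideas mentioned above. It is not Bamboo code: `Instance`, `Job`, `pick_instance_to_kill`, the `strategy` names and the `max_slots` parameter are all hypothetical, and the sketch assumes that every job passed in `queue` can only run in EC2.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image: str         # hypothetical name of the EC2 image configuration
    idle_seconds: int  # 0 means the instance is currently building

    @property
    def idle(self) -> bool:
        return self.idle_seconds > 0


@dataclass(frozen=True)
class Job:
    key: str
    allowed_images: frozenset  # images that satisfy the job's requirements


def pick_instance_to_kill(instances, queue, max_slots, strategy="longest_idle"):
    """Return the idle instance to disable and kill, or None if the rule does not fire."""
    # IF all the EC2 slots are used
    if len(instances) < max_slots:
        return None
    # AND a queued EC2-only job has no matching image currently running
    running_images = {i.image for i in instances}
    starved = [j for j in queue if not (j.allowed_images & running_images)]
    if not starved:
        return None
    idle = [i for i in instances if i.idle]
    if not idle:
        return None  # nothing can be freed without interrupting a build
    if strategy == "most_represented":
        # Restrict candidates to the image type with the most idle instances
        # (ties broken by image name so the choice is deterministic).
        counts = Counter(i.image for i in idle)
        top_image = max(counts, key=lambda img: (counts[img], img))
        idle = [i for i in idle if i.image == top_image]
    # THEN pick the candidate that has been idle the longest
    return max(idle, key=lambda i: i.idle_seconds)
```

With three idle instances (two "linux", one "windows") and a queued job that needs an image that is not running, `strategy="longest_idle"` picks whichever instance has been idle longest, while `strategy="most_represented"` picks the longest-idle "linux" instance, which keeps more variety among the remaining agents.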

            People

              Unassigned Unassigned
              pbruski Przemek Bruski
              Votes: 2
              Watchers: 3
