Hi

      We've recently noticed that all our builds are suddenly failing with

      unexpected EOF
      

      (during our Dockerised Stack-based Haskell compilation)

      (e.g. this build)

      This is not reproducible either with local builds using the native tooling, or even by running the Docker-based builder locally (we use the Docker-based builder largely to make our builds repeatable and investigable!)

      Can we get some more information about what is happening in the Docker daemon here? It sounds to me like a networking failure or an out-of-disk-space situation, but again we have no way of replicating it, and all our builds are blocked.

      Thanks

            [BCLOUD-15557] EOF errors in Docker-based pipelines builds

            Sebastian Cole (Inactive) added a comment -

            Hey y'all,

            We recently shipped custom memory limits (see BCLOUD-14752),
            which can be used to provide additional memory allocations to your Docker engine.

            This should resolve all the issues reported so far, as the EOF is almost always linked to Docker running out of memory.

            I'd encourage everyone to take advantage of the feature so as not to be impacted when the Docker memory limits are enforced.
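
            For reference, a rough sketch of what that configuration can look like in bitbucket-pipelines.yml (illustrative values only; check the custom memory limits documentation linked from BCLOUD-14752 for the exact syntax and allowed ranges):

            #!yaml
            options:
              size: 2x                 # doubles the total memory available to each step
            definitions:
              services:
                docker:
                  memory: 3072         # MB allocated to the Docker service, taken out of the step total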

            thanks,
            Seb

            lassian added a comment -

            Hi

            Sorry for not commenting on this sooner; let me provide an explanation as to why these errors are now occurring.

            We rolled out this change last week and have since had support cases raised, and this issue created, by more people experiencing EOF errors. However, these are different from the previously created issue.

            All the EOF errors mentioned here, and in the support cases from the last week, are due to users exceeding the now correctly applied 1 GB limit on the Docker container that your builds or docker commands run in. We can see this from the docker build logs: all the EOF errors occur during a layer's creation, not during an image pull/push. These layer creations are triggered by a RUN instruction in a Dockerfile and coincide with a command the user runs to generate a layer, where that command exceeds the 1 GB RAM limit.
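
            To make that concrete, here is a hypothetical Dockerfile of the kind that tends to trip the limit (using a Stack-based Haskell build as in the original report): the FROM pull succeeds, but the compile inside the RUN step can easily use more than 1 GB of RAM, and the layer creation then dies with an EOF.

            FROM haskell:8.6
            WORKDIR /app
            COPY . .
            # The whole project is compiled inside the container here;
            # GHC/Stack can comfortably exceed 1 GB of resident memory.
            RUN stack build --system-ghc --copy-bins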

            Now, as to what is consuming the RAM as part of the RUN command, it can be broken down into a few different things (as multiple factors influence how containers, and specifically cgroups, handle memory accounting and limits):

            • Application memory required by the command(s) being run
            • disk cache (pagecache) being flooded if the command(s) download/upload large amounts of data from the internet
            • dentry/inode cache being flooded if the command(s) create a large number of files/directories
            • a combination of all three

            To replicate this locally (as users have mentioned above that they haven't been able to), you need to apply the same limits we apply to the Docker service container, using the following command for your docker build:

            • docker image build --memory 1GB --memory-swap 1GB ... if using docker 1.13+
            • docker build --memory 1GB --memory-swap 1GB ... if using docker 1.12
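
            As a rough illustration (hypothetical image tag; docker stats is just one way to watch how close the build gets to the cap), a local replication might look like this:

            #!bash
            # Build under the same 1 GB memory + swap cap that Pipelines applies
            docker image build --memory 1GB --memory-swap 1GB -t myapp:test .

            # In a second terminal, snapshot container memory usage while the build runs
            docker stats --no-stream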

            I have also just updated the Debug pipelines locally documentation to reflect this:

            Debug pipelines locally

            However, depending on the environment of your local machine, you may not hit the limits, as pagecache and dentry/inode cache usage depends on how fast your network/disk is and how close you are to the sources you are uploading to/downloading from.

            Going forward, and to allow users to still build images in Pipelines, we have prioritised the following issue:

            Custom memory allocation

            This will allow users to give more/less RAM to the Docker service container, with that allocation taken from the build container to keep within the 4 GB limit per step (for example, giving the Docker service 3 GB would leave 1 GB for the build container).

            In the meantime, we are going to disable the Docker memory limits again to allow your steps to pass, and will re-enable them when we release the custom memory allocation feature.

            As to the other issue that was created previously and referenced here, and why we made this change:

            The Bitbucket Pipelines team discovered a bug in our system last year wherein we were not applying memory limits to the Docker service container. This allowed users to consume more than their overall 4 GB step allowance and their 1 GB service allowance, causing contention for resources on the nodes where builds run and impacting other customers.

            We released and rolled out a fix for this and found a large number of EOF errors occurring, and the following issue was raised to capture these:

            Docker build EOF

            Upon investigation we found that pulls and pushes of large images (whether initiated via docker image pull, docker pull, or a Dockerfile's FROM instruction) were failing due to the pagecache being saturated by our faster network speeds (we run a Docker image proxy close to the nodes, and are also closer to Docker Hub with a fast uplink in our cloud infrastructure). We spent some time investigating a few solutions before rolling out one that allows users to still push/pull large images whilst the memory limit is applied.

            Kind Regards,
            Nathan Burrell

            michaelvm added a comment -

            We'll be dropping Pipelines on Monday if this is not fixed. Our builds have been failing since last week, and if we're going to take on the cost of working around the problem, we may as well take on the cost of moving to a more stable service.

            Paying for a service where a stable build suddenly stops working, without notice, is not an enjoyable experience.

            nick-rez added a comment -

            We've moved to CircleCI. Time is money, and there have been no useful responses from Atlassian here.

            matthewpclark added a comment -

            I have the same issue. My builds are also failing with:

            #!bash
            Creating an optimized production build...
            unexpected EOF
            

            It was working fine until last week. I see the workarounds above, but they're not really a satisfactory solution.

            Tomáš Klapka added a comment -

            It looks like a regression: BCLOUD-15280 https://status.bitbucket.org/incidents/jqp6vn8b3vlj

            nick-rez added a comment -

            Update for everyone else, for transparency - Atlassian have told me the Pipelines team have said this:

            ...the size of the Docker engine is consuming more than 1 GB of memory. At this time we have a limit of 1 GB for Docker container and it's not dynamic at this time.

            You will need to make the build outside of the container.

            But... if this limit isn't new, then why have all these builds suddenly stopped working? :thinking:

            Swapnil Deshpande added a comment -

            Totally agree. I have already started moving to Jenkins hosted on Azure.

            Tomáš Klapka added a comment -

            We don't want to spend time changing our build scripts, Dockerfiles, etc. It makes more sense to move everything to Amazon services if this is not going to be fixed soon. This is not the first issue with Pipelines.

            Basile Beldame added a comment -

            Hello guys.

            Here is the workaround for the moment:

            Make your build inside the Bitbucket Pipelines step, then take the build directory it produces and create your Docker image with that build directory copied in.

            Size: 2x works for the Bitbucket Pipelines build container, but not for the Docker container inside. The Docker containers are capped at 1 GB of RAM, so you need to run your heavy build outside of a Docker container.
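
            A minimal sketch of that idea (hypothetical commands and image names, assuming a Node-style production build like the one in matthewpclark's log; adapt to your own stack) - run the heavy build as a normal step script first, then let the image build just copy the result in:

            #!bash
            # Pipelines step script: the compile runs in the build container, which gets the full step memory
            npm ci && npm run build            # hypothetical build command; produces ./build outside of Docker
            docker build -t myapp:latest .     # the image build itself now does no heavy work inside the 1 GB container

            The Dockerfile then contains no heavy RUN step, e.g. only FROM nginx:alpine and COPY build/ /usr/share/nginx/html/.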

            There is a feature request we need to upvote so they can make the Docker container size dynamic: BCLOUD-14752
