
[BCLOUD-22851] Self-hosted runners do not always clear the docker mount directory

    • Type: Bug
    • Resolution: Unresolved
    • Priority: High
    • Component: Pipelines - Runners

      Issue Summary

      • Occasionally, users report receiving the following error when executing their runner builds:
        bash: docker: command not found

      • This error occurs when the local Runner directory /tmp/<runner_uuid>/ already contains an empty docker folder.
      • We are not yet sure why the Runner directory sometimes already contains an empty docker folder; the current theory is that an unsuccessful runner configuration failed to clean up after itself.
      • Because the docker folder already exists, the Runner fails to mount the new docker directory, so the docker binary is not present during the build (a detection sketch follows this list).
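
      A minimal detection sketch, assuming the standard dockerized runner setup in which /tmp is shared between the host and the runner container; the UUID value is a placeholder.

        # Check whether a stale, empty docker folder was left behind for this runner.
        RUNNER_UUID="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # placeholder: your runner's UUID
        DOCKER_DIR="/tmp/$RUNNER_UUID/docker"
        if [ -d "$DOCKER_DIR" ] && [ -z "$(ls -A "$DOCKER_DIR")" ]; then
            echo "Stale empty docker folder at $DOCKER_DIR; the docker mount will fail on the next build"
        fi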

      This is reproducible on Data Center: no

      Steps to Reproduce

      1. Execute a build within a dockerized Self-hosted Runner
      2. Observe the error described above (the failure is intermittent and hard to reproduce)

      Expected Results

      • The build executes successfully

      Actual Results

      • The build fails because the docker directory cannot be mounted and the docker binary is not available

      Workaround

      • Delete the empty docker folder from the runner's /tmp/<runner_uuid>/ directory and restart the affected runner (see the comments below); the docker binary is then mounted correctly on the next build. A command sketch follows.
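
      A hedged sketch of that manual remediation, run on the runner host; the UUID and container name are placeholders for your own setup.

        # Stop the affected runner, remove the leftover empty folder, then restart.
        RUNNER_UUID="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # placeholder: the affected runner's UUID
        RUNNER_CONTAINER="runner-$RUNNER_UUID"               # placeholder: whatever name the runner container was given
        docker stop "$RUNNER_CONTAINER"
        rmdir "/tmp/$RUNNER_UUID/docker"                     # rmdir only succeeds if the folder really is empty
        docker start "$RUNNER_CONTAINER"                     # the docker binary should be mounted again on the next build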

      Comments

            Radu Cristescu added a comment - edited

            Today, two of my self-hosted runners started showing this error for pipe: trigger-pipeline calls. It was all out of the blue. Both of them left behind an empty docker directory around the same time: 15:57 and 16:00 GMT.

            Both runners are sharing the same server, which has 4 runners on it. I'm guessing that if I had enough load, all 4 of them would have shown the symptom.

            I deleted both empty docker directories, reran the pipeline, and the pipe failed again, exactly the same way, leaving behind an empty docker directory.

            Restarting the runners after deleting the empty directory fixes the issue, and I see an executable file called docker in there.

            Do you know what this looks like? It looks like what happens when I run docker run -v /file/that/does/not/exist:/dest, which creates /file/that/does/not/exist as a directory on the host and mounts it as such.
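
            The bind-mount behaviour described above can be confirmed with an arbitrary path (this is only an illustration; the paths are placeholders):

                docker run --rm -v /tmp/does-not-exist:/dest alpine ls -ld /dest
                ls -ld /tmp/does-not-exist   # the daemon created it on the host as an empty directory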


            Tom Emerson added a comment -

            Our self-hosted runner continues to experience this failure; it varies between a couple of times a week and every day when we have very active projects approaching deadlines.

            Surprised it has not been solved yet.


            Dan Milman added a comment -

            This is a major issue for us; it constantly causes failed builds that require manual remediation.
            Please fix!


            Alex Figliolia added a comment -

            Jumping in here to add to the frustration. This is a major issue for us and constantly causes failed builds that require manual remediation, on servers that only a few people can access.


            Curtis added a comment -

            One thing we've noticed regarding the failures is that they seem to occur when a Pipeline with a Pipe integration is executed. Our Runners that execute Pipelines with no Pipe configuration are all running fine.


            Ben added a comment -

            This is causing a big issue for our organisation. We have many agents that only a few people are allowed to administer. When this occurs, the fix, whilst straightforward, takes a long time to roll out to all agents.


            Bojan Kopanja added a comment -

            I can confirm that this issue occurs relatively frequently and is quite frustrating when it happens, as not everyone has access to runners. As a result, the entire company’s pipeline remains blocked until a manual fix is applied. Describing the severity as ‘minor’ is, in my opinion, an understatement.


            Dejan Čabrilo added a comment -

            I would like to appeal to reclassify symptom severity to something much higher than minor. While there is a workaround, it's extremely frustrating to users and it requires pager duty to get a runner unstuck, blocking the deploy process while it gets manually fixed. It happens relatively frequently (once every 40-50 builds in our experience so far).


            Jefferson Fermo added a comment -

            Hi, as I mentioned here https://community.atlassian.com/t5/Bitbucket-questions/Bitbucket-self-hosted-runner-failed-to-create-shim-task-OCI/qaq-p/2459381#U2685295, this issue usually happens at the most unfortunate times. Do we expect tier 1 support to log in to runners that usually deal with escalated, privileged automation scripts? I wouldn't consider this low priority, as it forces us to compromise security just to be able to use this feature.


            Franz QT added a comment -

            Any update on this? This is really frustrating and is affecting teams across our organization. 


              Assignee: Unassigned
              Reporter: Ben
              Affected customers: 32
              Watchers: 30