- Type: Bug
- Resolution: Unresolved
- Priority: Medium
- None
- Affects Version/s: 7.17.1
- Component/s: Git Hosting
- 10
- Severity 3 - Minor
- 5
Issue Summary
Bitbucket uses NuProcess for external process execution.
A child Git process may not be reaped properly and can end up as a defunct (zombie) process because of a race condition in the following scenario:
The child Git process is timed out from a separate thread while the NuProcess thread is still waiting for the Git process to signal that it is done.
The NuProcess thread is waiting for the state of the tracked process to change in the deadpool list in com.zaxxer.nuprocess.linux.ProcessEpoll.
If the waiting thread is interrupted (e.g. by a timeout), it drops the remaining process and returns, leaving a zombie behind.
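For readers less familiar with the term, the following minimal sketch shows what "not reaped" means at the operating-system level. It is not NuProcess or Bitbucket code. It assumes Linux with /proc mounted, and the helper names proc_state and observe_zombie are hypothetical. A child that has exited stays in state Z until its parent waits on it, which is the step the interrupted thread skips.

```python
import subprocess
import sys
import time
from pathlib import Path


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first token after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def observe_zombie(max_wait=5.0):
    """Start a short-lived child, watch it become a zombie, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + max_wait
    state = proc_state(child.pid)
    # The child exits almost immediately, but nobody has waited on it yet.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(child.pid)
    child.wait()  # reaping: the kernel can now discard the process entry
    return state, proc_state(child.pid)
```

Calling observe_zombie() returns the state seen before the wait and the state seen after it. If the wait never happens, the entry remains until the parent itself exits.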
Sample data
Git zombie process - PID 46341
atlbitb+ 46341 57558 57517 0.0 0.0 0 0 ? Z Wed May 18 00:20:21 2022 00:00:00 [git] <defunct>
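A line like the one above can be checked mechanically. The sketch below is only an illustration: it assumes the PID is the second whitespace-separated column, as in this sample, and that a zombie is marked by a trailing `<defunct>` token. The function name parse_ps_line is hypothetical, and other ps output formats will need different indices.

```python
def parse_ps_line(line):
    """Extract the PID, command and defunct flag from one ps output line.

    Assumes the column layout of the sample in this ticket:
    user first, PID second, command name and optional <defunct> marker last.
    """
    fields = line.split()
    defunct = fields[-1] == "<defunct>"
    # When the marker is present, the command name is the token before it.
    command = fields[-2] if defunct else fields[-1]
    return {"pid": int(fields[1]), "command": command, "defunct": defunct}


sample = (
    "atlbitb+ 46341 57558 57517 0.0 0.0 0 0 ? Z "
    "Wed May 18 00:20:21 2022 00:00:00 [git] <defunct>"
)
info = parse_ps_line(sample)
```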
Logs captured with trace and debug logging enabled for com.atlassian.bitbucket.dmz.process.NioProcess and com.zaxxer.nuprocess.linux.ProcessEpoll:
PID: 46341, request id: *78LZ5Xx20x110740121x34
2022-05-18 00:20:22,143 TRACE [threadpool:thread-2] USER1 *78LZ5Xx20x110740121x34 qtv9f8 137.201.17.50,0:0:0:0:0:0:0:1 "GET /rest/api/latest/projects/PROJ1/repos/repo1/branches HTTP/1.0" c.a.bitbucket.dmz.process.NioProcess 46341: [/usr/bin/git rev-list --format=%H%x02%P%x02%aN%x02%aE%x02%at%x02%cN%x02%cE%x02%ct -21 --no-min-parents --stdin --no-walk=unsorted --] started (cwd: /var/atlassian/application-data/bitbucket/shared/data/repositories/101)
2022-05-18 00:20:24,138 INFO [http-nio-7990-exec-21] USER1 *78LZ5Xx20x110740121x34 qtv9f8 137.201.17.50,0:0:0:0:0:0:0:1 "GET /rest/api/latest/projects/PROJ1/repos/repo1/branches HTTP/1.0" c.a.s.i.r.PluginRefMetadataMapProvider Timed out when retrieving ref metadata for com.atlassian.bitbucket.server.bitbucket-branch:latest-commit-metadata
2022-05-18 00:20:25,194 DEBUG [threadpool:thread-2] USER1 *78LZ5Xx20x110740121x34 qtv9f8 137.201.17.50,0:0:0:0:0:0:0:1 "GET /rest/api/latest/projects/PROJ1/repos/repo1/branches HTTP/1.0" c.z.nuprocess.linux.ProcessEpoll 46341: Added to deadpool
2022-05-18 00:20:25,194 DEBUG [threadpool:thread-2] USER1 *78LZ5Xx20x110740121x34 qtv9f8 137.201.17.50,0:0:0:0:0:0:0:1 "GET /rest/api/latest/projects/PROJ1/repos/repo1/branches HTTP/1.0" c.z.nuprocess.linux.ProcessEpoll No processes left to pump
2022-05-18 00:20:25,195 DEBUG [threadpool:thread-2] USER1 *78LZ5Xx20x110740121x34 qtv9f8 137.201.17.50,0:0:0:0:0:0:0:1 "GET /rest/api/latest/projects/PROJ1/repos/repo1/branches HTTP/1.0" c.z.nuprocess.linux.ProcessEpoll Interrupted with 1 processes still in the deadpool
On the threadpool:thread-2 thread, NuProcess added the Git process to the deadpool list (processes that are dead but not yet reaped) and waited for the process to reach its final state.
During that wait, the timeout on the http-nio-7990-exec-21 thread fired and interrupted it. NuProcess then dropped the process and stopped waiting for it, which left the zombie process.
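The same pattern can be searched for in a larger log file. The sketch below is a heuristic, not an official diagnostic. It relies only on the two ProcessEpoll messages visible in the sample ("Added to deadpool" and "Interrupted with N processes still in the deadpool"), and it assumes that the most recently added PIDs on a thread are the ones still pending when that thread is interrupted. The function name find_dropped_pids is hypothetical.

```python
import re

# Patterns taken from the two ProcessEpoll messages in the sample log.
ADDED = re.compile(
    r"\[(?P<thread>[^\]]+)\].*ProcessEpoll (?P<pid>\d+): Added to deadpool"
)
INTERRUPTED = re.compile(
    r"\[(?P<thread>[^\]]+)\].*ProcessEpoll "
    r"Interrupted with (?P<count>\d+) processes still in the deadpool"
)


def find_dropped_pids(log_lines):
    """Return PIDs that were in a thread's deadpool when that thread was interrupted."""
    pending = {}  # thread name -> PIDs added to the deadpool on that thread
    dropped = []
    for line in log_lines:
        added = ADDED.search(line)
        if added:
            pending.setdefault(added["thread"], []).append(int(added["pid"]))
            continue
        interrupted = INTERRUPTED.search(line)
        if interrupted:
            count = int(interrupted["count"])
            pids = pending.pop(interrupted["thread"], [])
            if count:
                # Assumption: the last `count` PIDs added are the ones left behind.
                dropped.extend(pids[-count:])
    return dropped
```

PIDs returned by this function are candidates to cross-check against the ps output for defunct Git processes.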
Steps to Reproduce
N/A
Expected Results
Child Git processes are properly reaped.
Actual Results
Defunct/zombie Git processes are observed.
Workaround
- Terminate the parent process: restarting the Bitbucket Server instance cleans up the zombie processes.
- Adjust the timeout value, after confirming from the trace/debug logs that a timeout occurred and interrupted the cleanup.
For the specific sample above, where the timeout occurred while retrieving ref metadata, the ref.metadata.timeout value can be raised (e.g. from 2 to 3 seconds).