- Type: Bug
- Resolution: Unsolved Mysteries
- Priority: Medium
- Affects Version/s: 3.11.2
- Component/s: SSH
We use the Google/Android "repo" utility to manage the set of git repositories needed for our product.
When using "repo sync" with the -j (parallel) option, repo opens one SSH connection with the ControlMaster option, then multiplexes SSH sessions used to fetch/update each git repository over that connection rather than establishing a new connection for each fetch. This is to avoid SSH handshake overhead.
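For readers unfamiliar with SSH multiplexing, the pattern described above can be sketched roughly as follows. This is not repo's actual code; it only builds the kind of OpenSSH command lines involved. The function names, the host `git@stash.example.com`, the repository path, and the socket location are all hypothetical placeholders.

```python
import os
import tempfile
from typing import List


def master_command(host: str, control_path: str) -> List[str]:
    """argv for one long-lived master connection.

    -M puts ssh into ControlMaster mode, -N runs no remote command.
    """
    return ["ssh", "-M", "-N", "-o", f"ControlPath={control_path}", host]


def client_command(host: str, control_path: str, remote_cmd: str) -> List[str]:
    """argv for a session that reuses the master's control socket
    instead of performing a fresh SSH handshake."""
    return [
        "ssh",
        "-o", "ControlMaster=no",
        "-o", f"ControlPath={control_path}",
        host,
        remote_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical socket path; %r, %h and %p are expanded by ssh itself.
    sock = os.path.join(tempfile.gettempdir(), "ssh-master-%r@%h:%p")
    host = "git@stash.example.com"  # placeholder, not our real server
    print(" ".join(master_command(host, sock)))
    # Each parallel fetch would then run something like this over the same socket:
    print(" ".join(client_command(host, sock, "git-upload-pack 'PROJECT/repo.git'")))
```

With `-j 8`, up to eight such client sessions share the single master connection, which is where the `mux_client_request_session` messages in the log below come from.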
However, since upgrading Stash from 3.4 to 3.11.2 last week, we've had intermittent sync failures in our CI builds, with errors like this:
repo has been initialized in /data/build-eqx-02.2/xml-data/build-dir/XXXXX
Write failed: Broken pipe
mux_client_request_session: read from master failed: Broken pipe
mux_client_request_session: read from master failed: Broken pipe
mux_client_request_session: read from master failed: Broken pipe
mux_client_request_session: read from master failed: Broken pipe
fatal: The remote end hung up upon initial contact
In other cases, the repo sync eventually succeeded but took an hour, where it would normally complete in a few minutes.
The sync-j option in repo's default.xml is set to 8, Stash's throttle.resource.scm-hosting is set to 8, and our build plan has 4 variants. The CI build alone could therefore issue up to 32 simultaneous fetches (8 × 4), which over-subscribes the limit of 8, but Stash's throttling should handle this.
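The over-subscription figure can be checked with a few lines of arithmetic. This is only a worst-case estimate and assumes all four variants run their syncs at the same time; the constant and function names are made up for the illustration.

```python
SYNC_J = 8               # sync-j in repo's default.xml
BUILD_VARIANTS = 4       # variants in our build plan
SCM_HOSTING_LIMIT = 8    # Stash's throttle.resource.scm-hosting


def peak_fetches(sync_j: int, variants: int) -> int:
    """Worst case: every variant runs all of its parallel fetches at once."""
    return sync_j * variants


def oversubscription(peak: int, limit: int) -> float:
    """How many times the peak demand exceeds the configured limit."""
    return peak / limit


if __name__ == "__main__":
    peak = peak_fetches(SYNC_J, BUILD_VARIANTS)
    print(peak)                                        # 32
    print(oversubscription(peak, SCM_HOSTING_LIMIT))   # 4.0
```

So in the worst case roughly three out of every four requested fetches would have to wait for a free slot, assuming the throttle behaves as intended.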
One reason I suspect a problem with Stash's SSH multiplexing is that yesterday we changed the CI build plan to set the GIT_SSH environment variable to "ssh" before invoking repo sync, which disables multiplexing. We haven't seen the issue since.
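For anyone wanting to try the same workaround, a minimal sketch of what our build step does is below. The helper names are hypothetical and the real build plan sets the variable in its own way; the only parts taken from this report are `GIT_SSH=ssh` and the `repo sync` invocation.

```python
import os
import subprocess
from typing import Dict, Optional


def env_without_multiplexing(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and point GIT_SSH at plain ssh.

    In our setup, having GIT_SSH set before repo starts stops it from
    multiplexing fetches over a shared master connection.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_SSH"] = "ssh"
    return env


def sync(jobs: int = 8) -> "subprocess.CompletedProcess":
    """Run `repo sync -j <jobs>` with multiplexing disabled.

    Raises subprocess.CalledProcessError if the sync exits non-zero.
    """
    return subprocess.run(
        ["repo", "sync", "-j", str(jobs)],
        env=env_without_multiplexing(),
        check=True,
    )
```

Calling `sync()` from inside the initialized repo checkout gives each fetch its own SSH connection, at the cost of one handshake per fetch.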