Bitbucket Data Center / BSERV-6887

Stale refs cause hosting operations to fail over HTTP(S)



    Description

      When using HTTP or HTTPS for hosting operations, a ref can change between the ref advertisement request and the subsequent upload-pack or receive-pack request. When that happens, the upload-pack or receive-pack operation fails, which can break CI builds and other clients, especially when the server is under heavy load.

      Ideally, refs should not be modified after they are advertised or, if they are, the modification should not prevent the subsequent request from succeeding. This is difficult to achieve because HTTP(S) hosting requires multiple distinct requests to the server. Two possible approaches are deleting refs in a delayed fashion or applying some sort of transaction. The transactional approach is hampered by the fact that the upload-pack or receive-pack request might never be made if the remote client decides everything is up to date (or is only running an operation like git ls-remote).
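To make the "delayed deletion" idea concrete, here is a minimal Python sketch of one possible shape for it, assuming a hypothetical ref store in which a deleted ref stops being advertised immediately but its old commit stays acceptable as a "want" for a grace period. The class name, the `grace_seconds` parameter and the injectable clock are illustrative assumptions, not part of Stash or Git.

```python
import time
from typing import Callable


class DelayedDeleteRefs:
    """Hypothetical ref store: deleted refs remain valid "wants" for a grace period."""

    def __init__(self, grace_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.refs: dict[str, str] = {}                      # ref name -> commit id
        self.tombstones: dict[str, tuple[str, float]] = {}  # ref name -> (old commit id, deletion time)

    def delete(self, name: str) -> None:
        # Stop advertising the ref, but remember its last value and when it was deleted.
        self.tombstones[name] = (self.refs.pop(name), self.clock())

    def advertised(self) -> dict[str, str]:
        # New ref advertisements only show live refs.
        return dict(self.refs)

    def want_allowed(self, want: str) -> bool:
        now = self.clock()
        # Drop tombstones whose grace period has expired.
        self.tombstones = {
            name: (oid, deleted_at)
            for name, (oid, deleted_at) in self.tombstones.items()
            if now - deleted_at < self.grace_seconds
        }
        live = want in self.refs.values()
        recently_deleted = any(oid == want for oid, _ in self.tombstones.values())
        return live or recently_deleted
```

The injectable clock keeps the sketch testable. It only covers deletion; a ref that is moved to a different commit is a separate case, which the analysis covers.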

      Analysis

      Regarding protocols

      The Git (git://) and SSH protocols process the full hosting operation (the ref advertisement and any subsequent upload-pack or receive-pack) on the same network connection and, more importantly, in the same process. That allows Git and SSH hosting to read in all the refs once for the ref advertisement and hold on to them. They don't lock the refs, per se; they simply cache the values the refs had. That improves performance, because the refs don't have to be read in again, but it also means that when the next step of the hosting operation runs, the requested refs can be validated against the cached state. So even if a ref has moved on disk in the meantime, it won't cause the hosting operation to fail.
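The difference between validating against a cached advertisement and validating against a fresh read of the refs can be modeled with a toy Python sketch. `RefStore`, `StatefulSession` and `stateless_validate_wants` are hypothetical names. The stateless check deliberately ignores Git's reachability fallback (check_non_tip, discussed in the analysis) and compares only against current ref tips.

```python
from dataclasses import dataclass, field


@dataclass
class RefStore:
    """Toy stand-in for the refs on disk: ref name -> commit id."""
    refs: dict[str, str] = field(default_factory=dict)

    def snapshot(self) -> dict[str, str]:
        return dict(self.refs)


class StatefulSession:
    """Models git:// and SSH: advertisement and negotiation share one process."""

    def __init__(self, store: RefStore):
        self.store = store
        self.advertised = store.snapshot()  # read once, at advertisement time

    def validate_wants(self, wants: list[str]) -> bool:
        # Validate against the cached values, not whatever is on disk now.
        tips = set(self.advertised.values())
        return all(w in tips for w in wants)


def stateless_validate_wants(store: RefStore, wants: list[str]) -> bool:
    """Models HTTP: a fresh process re-reads the refs when the POST arrives."""
    tips = set(store.snapshot().values())
    return all(w in tips for w in wants)
```

For example, if a session is created while `refs/heads/main` points at `a1` and a push then moves it to `b2`, `session.validate_wants(["a1"])` still succeeds, while `stateless_validate_wants(store, ["a1"])` fails.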

      HTTP, on the other hand, uses separate requests for the ref advertisement and the upload-pack or receive-pack, so no cached state can be retained between the two; each request runs its own fork of git http-backend. That means changes to the refs on disk between the two requests can cause the hosting operation to fail.
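For illustration, this Python sketch shows the two requests a smart HTTP fetch makes: a GET to `info/refs?service=git-upload-pack` for the ref advertisement, then a separate POST to `git-upload-pack` whose body lists the wanted commits as pkt-lines (a 4-hex-digit length that counts itself, followed by the payload). It is simplified: real clients append a capability list to the first `want` line and may send `have` lines, both omitted here, and the helper names and example URL are made up.

```python
FLUSH_PKT = b"0000"


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: total length (payload + 4 prefix bytes) as 4 hex digits, then the payload."""
    data = payload.encode("utf-8")
    return f"{len(data) + 4:04x}".encode("ascii") + data


def smart_http_urls(repo_url: str, service: str = "git-upload-pack") -> tuple[str, str]:
    """Return the GET (ref discovery) and POST (negotiation) URLs for a smart HTTP service."""
    base = repo_url.rstrip("/")
    return f"{base}/info/refs?service={service}", f"{base}/{service}"


def upload_pack_request_body(wants: list[str]) -> bytes:
    """Minimal POST body: one want pkt-line per commit, a flush packet, then 'done'."""
    body = b"".join(pkt_line(f"want {oid}\n") for oid in wants)
    return body + FLUSH_PKT + pkt_line("done\n")
```

Because the POST is an independent request, the server re-reads its refs before checking the `want` lines, which is where a ref that moved after the GET gets noticed.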

      Stash and Git functionality

      Stash does not advertise the available refs itself, and has no control over the payload of that advertisement; it forks out to git-http-backend (HTTP) or to git-receive-pack or git-upload-pack (SSH), and those commands do the work. Similarly, when a clone, fetch, push or pull request is received, Stash forks out to one of the same commands to service it. Stash has no control over that processing or its outcome; it just streams I/O between the remote client and the forked process. For this race condition, it steps in only to replace Git's error message with one that makes what happened more explicit. The rather cryptic "fatal: git upload-pack: not our ref" message, which few people other than Git experts are likely to understand, is replaced with a clearer one: "A ref was requested that is no longer valid. The ref may have been updated while the git-upload-pack request was processing. Please try again."
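The message replacement can be pictured as a simple filter over the forked process's error output. This Python sketch is a hypothetical illustration, not Stash's code: it assumes the output arrives as text lines and matches on the "not our ref" substring of Git's error.

```python
from typing import Iterable, Iterator

# Substring of Git's error, e.g. "fatal: git upload-pack: not our ref <commit id>".
NOT_OUR_REF = "not our ref"

FRIENDLY_MESSAGE = (
    "A ref was requested that is no longer valid. The ref may have been "
    "updated while the git-upload-pack request was processing. Please try again."
)


def rewrite_errors(lines: Iterable[str]) -> Iterator[str]:
    """Pass error output through unchanged, except the 'not our ref' error."""
    for line in lines:
        yield FRIENDLY_MESSAGE if NOT_OUR_REF in line else line
```

The filter changes only the text the client sees; the hosting operation has already failed by the time the message is produced.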

      Released versions of Git do not have any form of transaction around ref operations, especially not across concurrent or nearly concurrent operations performed in separate processes. Git does have locking to prevent concurrent modifications, but that locking is per-ref. Patches that add ref transactions have been posted to the mailing list, but they have not yet graduated into a released version.
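Per-ref locking in Git works by creating a `<ref>.lock` file exclusively, writing the new value into it and renaming it over the ref. The Python sketch below imitates that pattern for a single loose ref; it is an approximation rather than Git's code, and `update_ref_with_lock` is a hypothetical name. Note what it does not provide: each ref is locked on its own, and nothing ties an update to a view of the refs that an earlier request already advertised.

```python
import os
import tempfile


def update_ref_with_lock(ref_path: str, new_id: str) -> bool:
    """Update one loose ref via '<ref>.lock'; return False if another writer holds the lock."""
    lock_path = ref_path + ".lock"
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists, so only one writer wins.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    try:
        with os.fdopen(fd, "w", encoding="ascii") as lock_file:
            lock_file.write(new_id + "\n")
        # Rename the lock file over the ref so readers see either the old or the new value.
        os.replace(lock_path, ref_path)
    except BaseException:
        # Release the lock if anything went wrong before the rename completed.
        if os.path.exists(lock_path):
            os.remove(lock_path)
        raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        main_ref = os.path.join(repo, "main")
        print(update_ref_with_lock(main_ref, "a" * 40))  # True: lock acquired
        open(main_ref + ".lock", "w").close()            # simulate a concurrent writer
        print(update_ref_with_lock(main_ref, "b" * 40))  # False: lock already held
```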

      Simple deletion is not the only trigger, either. Even if Stash built in a workaround where it took control of the ref advertisement and guarded recently advertised refs so they couldn't be deleted, it wouldn't prevent the race condition. A ref can still exist and yet, between the ref advertisement and the actual hosting request, a push can move it to point to a different commit. That can trigger exactly the same "not our ref" behavior that a deletion triggers, because all that error really says is "I don't have any ref that points to commit abc1234". In practice, that scenario generally works because of check_non_tip (discussed in this thread), which accepts a requested commit that is no longer a ref tip as long as it is still reachable from one. Rebase-heavy workflows can evade that ancestry check, though: after a rebase and force push, the previously advertised commit is no longer reachable from any ref, so the request still fails with "not our ref".
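The ancestry check can be sketched in Python over a toy commit graph. This is a simplified model of the behavior described above, not Git's implementation; `reachable_from` and `want_is_valid` are hypothetical names, and commits are plain strings.

```python
from collections import deque

# Toy commit graph: commit id -> parent commit ids.
Graph = dict[str, list[str]]


def reachable_from(graph: Graph, tips: set[str]) -> set[str]:
    """Return every commit reachable from the given tips by walking parent links."""
    seen: set[str] = set()
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(graph.get(commit, []))
    return seen


def want_is_valid(graph: Graph, current_tips: set[str], want: str) -> bool:
    """Accept a want that is a current ref tip or, failing that, reachable from one."""
    if want in current_tips:
        return True  # still a tip: the common case
    return want in reachable_from(graph, current_tips)


# History: a <- b. The client was advertised "b" as the tip of main.
graph: Graph = {"a": [], "b": ["a"], "c": ["b"], "b_rebased": ["a"]}
print(want_is_valid(graph, {"c"}, "b"))          # fast-forward to c: True
print(want_is_valid(graph, {"b_rebased"}, "b"))  # rebase + force push: False
```

A fast-forward keeps the advertised commit in the new tip's history, so the request survives; a rebase rewrites that history, so it does not.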

      For further reference, the protocol declares, in http-protocol.txt:

      Clients MUST send at least one "want" command in the request body.
      Clients MUST NOT reference an id in a "want" command which did not
      appear in the response obtained through ref discovery unless the
      server advertises capability `allow-tip-sha1-in-want`.
      

      Note that this does not require the server to retain refs between the advertisement and a subsequent git-upload-pack or git-receive-pack request, which may never come. It only states that the client must not request anything the server didn't advertise. The document goes on to describe the git-upload-pack processing as follows:

      Verify all objects in `want` are directly reachable from refs.
      
      The server MAY walk backwards through history or through
      the reflog to permit slightly stale requests.
      
      If no "want" objects are received, send an error:
      TODO: Define error if no "want" lines are requested.
      
      If any "want" object is not reachable, send an error:
      TODO: Define error if an invalid "want" is requested.
      

      (Unfortunately, the documentation doesn't yet define the exact errors; the TODO lines above are copied verbatim from the current Git source.)

      As noted, the server may walk backwards through the reflogs. However, the implementation of git-upload-pack in the standard Git distribution does not do so, and there is no option to enable it because Git contains no code to perform it. (Stash does not enable reflogs by default, so even if Git did walk them, the logs wouldn't be available until reflogs were turned on, although that is not difficult to do.) bturner sent a question to the mailing list about this behavior; you can read more discussion there. Even if reflog walking did happen, it would not fully fix the race condition: reflogs are deleted along with their associated refs, so it would only address the rebase-workflow aspect.
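A toy Python model shows why reflog walking would help with rebases but not with deletions. `RefsWithReflog` and its relaxed `want_allowed` check are hypothetical; as stated above, standard Git does not perform this walk.

```python
class RefsWithReflog:
    """Toy ref store that records every value each ref has held (its reflog)."""

    def __init__(self) -> None:
        self.refs: dict[str, str] = {}
        self.reflogs: dict[str, list[str]] = {}

    def update(self, name: str, new_id: str) -> None:
        self.refs[name] = new_id
        self.reflogs.setdefault(name, []).append(new_id)

    def delete(self, name: str) -> None:
        # Mirrors Git: deleting a ref deletes its reflog too.
        del self.refs[name]
        self.reflogs.pop(name, None)

    def want_allowed(self, want: str) -> bool:
        # Hypothetical relaxed check: accept any value a surviving ref has held.
        # Each reflog's last entry is the ref's current tip, so tips are covered.
        return any(want in history for history in self.reflogs.values())
```

After `update("refs/heads/topic", "old")` followed by a force-pushed `update("refs/heads/topic", "rebased")`, a want for `old` is accepted; once the ref is deleted, neither value is.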

      Until ref transactions become a part of a released version of Git, there's very little Stash can do to fix this race condition. It is documented here on this issue to explain what's happening and why, and to allow interested people to watch for possible solutions being applied in the future.

      Workaround

      Use SSH for CI servers and other clients, since it does not suffer from this race condition. You can also vote for STASH-2508 to request git:// protocol support. While the issue is currently closed, it could be reassessed if sufficient use cases were present.
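Switching a CI job to SSH mostly means changing its clone URL. The Python helper below is a hypothetical sketch that assumes Stash's usual URL layouts: an HTTP(S) clone URL containing `/scm/<project>/<repo>.git`, and SSH served as the `git` user on port 7999. Check both assumptions against the clone URLs your server actually shows.

```python
from urllib.parse import urlsplit


def http_to_ssh_clone_url(http_url: str, ssh_port: int = 7999) -> str:
    """Map a Stash HTTP(S) clone URL to its assumed SSH equivalent."""
    parts = urlsplit(http_url)
    marker = "/scm/"
    idx = parts.path.find(marker)
    if idx < 0:
        raise ValueError(f"not a Stash HTTP clone URL: {http_url}")
    # Keep '<project>/<repo>.git'; drop any context path and the '/scm/' prefix.
    repo_path = parts.path[idx + len(marker):]
    return f"ssh://git@{parts.hostname}:{ssh_port}/{repo_path}"
```

The result can be applied to an existing clone with `git remote set-url origin <ssh-url>`; the CI agent also needs an SSH key registered with Stash.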

      People

        Assignee: Unassigned
        Reporter: Daniel R (drohan)
        Votes: 0
        Watchers: 8
