Bitbucket Server / BSERV-13312

Mesh: Support automatic self-healing of nodes after DR

Details

    • Suggestion
    • Status: In Progress (View Workflow)
    • Resolution: Unresolved
    • Mesh

    Description

      Mesh does not yet support self-healing after disaster recovery (DR). When Mesh nodes are restored from snapshots taken at different times, any repository that received one or more writes between the earliest and the latest snapshot may end up with inconsistent replicas across nodes: a write can be captured in the snapshot of one Mesh node but not in the snapshot of another. Repositories that received no writes during this "snapshot window" are unaffected by this issue.
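To make the "snapshot window" concrete, here is a minimal Python sketch that flags the writes at risk. It is an illustration only, not part of Bitbucket or Mesh: the function name is hypothetical, and it assumes each snapshot is taken at a single instant, that a snapshot contains every write made up to and including that instant, and that clocks are comparable across nodes.

```python
from datetime import datetime


def writes_in_snapshot_window(write_times, snapshot_times):
    """Return the writes made after the earliest snapshot and up to the latest.

    Hypothetical helper. Writes at or before the earliest snapshot are in
    every snapshot; writes after the latest snapshot are in none. Only the
    writes in between are captured by some snapshots but not by others.
    """
    if not snapshot_times:
        return []
    start, end = min(snapshot_times), max(snapshot_times)
    return [t for t in write_times if start < t <= end]


# One snapshot per Mesh node, taken a few minutes apart (made-up times).
snapshots = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 12),
]
writes = [
    datetime(2024, 1, 1, 9, 58),   # present in all three snapshots
    datetime(2024, 1, 1, 10, 7),   # present only in the 10:12 snapshot
    datetime(2024, 1, 1, 10, 20),  # present in no snapshot
]
at_risk = writes_in_snapshot_window(writes, snapshots)
```

Under these assumptions only the 10:07 write leaves the replicas in disagreement; a repository with no writes inside the window is restored identically on every node.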

      Symptoms

      If the repository is in such an inconsistent state, the following problems may occur:

      • Git operations, such as cloning a repository or listing its refs, can return different results depending on which Mesh node serves the request, until the inconsistent replica(s) are repaired.
      • A successful write (e.g. a push to the repository) triggers a repair of the repository to the state held by the majority of the replicas, even if the minority has a newer version of the repository. As a result, a write that was captured only in the newest of the snapshots may be lost.
      • Writes may fail, and continue to fail, if each of the three replicas captured a different state of the repository.
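The last two symptoms follow from majority-based repair, which can be sketched in a few lines of Python. This is a simplified model, not Mesh's actual implementation: the function name, the node names, and the use of an opaque state identifier per replica (for example a digest of the repository's refs) are all assumptions made for illustration.

```python
from collections import Counter


def plan_repair(replica_states):
    """Pick the majority state and the replicas that would be repaired to it.

    replica_states maps a node name to an opaque state identifier.
    Returns (majority_state, nodes_to_repair), or (None, []) when no state
    is held by more than half of the replicas. Hypothetical helper.
    """
    if not replica_states:
        return None, []
    counts = Counter(replica_states.values())
    state, votes = counts.most_common(1)[0]
    if votes <= len(replica_states) // 2:
        return None, []
    outdated = sorted(n for n, s in replica_states.items() if s != state)
    return state, outdated


# Two replicas were restored from older snapshots, one from the newest.
majority_case = plan_repair({"mesh-1": "old", "mesh-2": "old", "mesh-3": "new"})

# Every replica captured a different state: there is no majority to repair to.
split_case = plan_repair({"mesh-1": "a", "mesh-2": "b", "mesh-3": "c"})
```

In the first case the replica holding the newest state is the one that gets repaired, which is how a write captured only in the newest snapshot can be lost. In the second case no state wins, which matches the symptom of writes that keep failing.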

      Workarounds

      If a repository is suspected to be in an inconsistent state because a push is failing with the error "Ref update could not be replicated" or "error: remote unpack failed: unpack-objects abnormal exit", try pushing a new branch or tag to the repository. This push may still fail, but it will cause the outdated replicas to be marked as inconsistent and subsequently repaired.
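The workaround can be scripted. The following Python sketch creates a throwaway tag in a local clone and pushes it; the tag name, the default remote name `origin`, and both function names are assumptions for illustration, and only standard `git tag` and `git push` invocations are used. Because the push itself may fail, the script reports exit codes instead of raising.

```python
import subprocess
import time


def probe_push_commands(remote="origin", tag=None):
    """Build the git commands that create and push a throwaway tag.

    Hypothetical helper: the tag name is arbitrary, any new ref would do.
    """
    tag = tag or f"mesh-repair-probe-{int(time.time())}"
    return [
        ["git", "tag", tag, "HEAD"],
        ["git", "push", remote, f"refs/tags/{tag}"],
    ]


def run_probe(repo_dir, remote="origin"):
    """Run the probe in an existing clone and return each command's exit code."""
    exit_codes = []
    for cmd in probe_push_commands(remote):
        # No check=True: per the workaround, the push may legitimately fail.
        result = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True)
        print(" ".join(cmd), "-> exit", result.returncode)
        exit_codes.append(result.returncode)
    return exit_codes


commands = probe_push_commands("origin", "probe-1")
```

Calling `run_probe("/path/to/clone")` from a machine with push access would perform the probe; the path is a placeholder. The probe tag remains in the repository afterwards and can be deleted once pushes succeed again.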

      Planned improvements

      Enhance Bitbucket Server to scan for such inconsistencies automatically upon recovery. For any detected inconsistencies, mark the outdated replica(s) as inconsistent to prevent requests from being sent to them, and schedule repairs.
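As a rough illustration of the planned flow (scan, mark, stop routing, schedule repair), consider the Python sketch below. Nothing here reflects the eventual implementation: the `Replica` type, the per-replica `version` counter used to decide which replica is outdated, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    version: int              # assumed monotonically increasing per repository
    inconsistent: bool = False


def scan_after_recovery(repositories):
    """Mark replicas that lag behind the newest one and queue them for repair.

    repositories maps a repository id to its list of Replica objects.
    Returns a list of (repo_id, node) pairs to repair. Hypothetical sketch.
    """
    repair_queue = []
    for repo_id, replicas in repositories.items():
        if not replicas:
            continue
        newest = max(r.version for r in replicas)
        for replica in replicas:
            if replica.version < newest:
                replica.inconsistent = True
                repair_queue.append((repo_id, replica.node))
    return repair_queue


def routable_nodes(replicas):
    """Nodes that may still receive requests for a repository."""
    return [r.node for r in replicas if not r.inconsistent]


repos = {
    "repo-1": [Replica("mesh-1", 7), Replica("mesh-2", 7), Replica("mesh-3", 9)],
    "repo-2": [Replica("mesh-1", 3), Replica("mesh-2", 3), Replica("mesh-3", 3)],
}
queue = scan_after_recovery(repos)
```

Unlike the majority-based repair that a write triggers today, this sketch treats the newest replica as authoritative, so a write captured only in the newest snapshot would be kept. Whether the actual improvement takes that approach is not stated in this issue.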

            People

              mheemskerk Michael Heemskerk
              agenkin Anton Genkin
              Votes: 4
              Watchers: 8
