
Handle database/filesystem conflicts (due to incorrect backup/restore) more gracefully


      Original summary: "Add check on creating/deleting a Repository"
      Hi.
      In my backup/restore tests I wanted to find out what happens if I take the backup "online" (hot backup).
      I found that the database always holds the main knowledge of the system configuration, and it seems the filesystem (stash_home) only holds some extra information about plugins and repositories.

      Consider a scenario where you take a database backup followed by a backup of stash_home, with a long (unnatural) period of time between the two:
      If you create a repository during this window and later restore the backup, the repository is gone. This is okay, since the repository was unknown at the time of the database backup. But sadly the repository still exists in the stash_home filesystem. At first nobody notices, BUT:
      If you now try to create a new repository on the restored system, you get an error: Stash says the repository "repoID" (the numeric repository ID) already exists... Looking into the filesystem, you will see that this repository does exist...

      I came to the conclusion that Stash increments the ID by 1 when creating a new repository...
      I'd like to ask you to improve this by first checking whether a repository with the next ID already exists and, if so, creating the new repository with the ID after that. Also show a big warning that there seems to be a "ghost repository".
      This would keep our users from being blocked from creating repositories once we have a huge Stash instance.
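The requested behaviour can be illustrated with a minimal Python sketch. It assumes, hypothetically, that every repository lives in a directory named after its numeric ID under one common root; `next_free_repo_id`, `repo_root` and the logger name are illustrative names, not part of Stash.

```python
import logging
from pathlib import Path

log = logging.getLogger("ghost_repo_check")  # hypothetical logger name


def next_free_repo_id(last_db_id: int, repo_root: Path) -> int:
    """Return the first ID above last_db_id that has no directory on disk.

    Assumption: repositories are stored as <repo_root>/<numeric id>.
    Every ID that is skipped is reported as a possible "ghost repository".
    """
    candidate = last_db_id + 1
    while (repo_root / str(candidate)).exists():
        log.warning(
            "Possible ghost repository on disk, skipping ID %d: %s",
            candidate,
            repo_root / str(candidate),
        )
        candidate += 1
    return candidate
```

Skipping leaves the ghost directory untouched, so an administrator can still inspect it after the warning.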

      You can imagine that the counterpart obviously occurs when a repository is deleted within that time span.

      Maybe, as an overarching idea, I'd suggest storing some meta information (which Stash project, which repository name) inside the bulk-repository on the filesystem. (I'll look for an open issue for that or open another one later.)
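One way to picture the meta information idea is a small JSON file stored next to the repository data. This is only a sketch: the file name `repository-metadata.json`, the field names and both helper functions are hypothetical and do not describe anything Stash actually writes.

```python
import json
from pathlib import Path
from typing import Optional

METADATA_FILE = "repository-metadata.json"  # hypothetical file name


def write_repo_metadata(repo_dir: Path, project_key: str, repo_name: str) -> Path:
    """Store the owning project and repository name inside the repository directory."""
    path = repo_dir / METADATA_FILE
    payload = {"project": project_key, "name": repo_name}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_repo_metadata(repo_dir: Path) -> Optional[dict]:
    """Return the stored metadata, or None if the directory carries none."""
    path = repo_dir / METADATA_FILE
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

With such a file in place, a directory that the database no longer knows about could still be traced back to its project and name.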

      Thanks!

            [BSERV-4057] Handle database/filesystem conflicts (due to incorrect backup/restore) more gracefully

            Roger Barnes (Inactive) added a comment - We'll track this suggestion in BSERV-4255 which we believe incorporates this.

            herzog@t-systems.com_match added a comment -

            Hi, thank you for looking into that.
            Well, your question is a good one. I can't really answer it, but I'll think about it.

            From other applications we run in our company I know that they have some "consistency checks". Some run on demand, others can be configured to run on application start-up. They compare the data held in the database with the data stored in the filesystem. How they resolve conflicts differs from application to application, and I don't really know what they do internally when there are conflicts...
            But since there is an unwritten general rule for application backups (database first, filesystem later), it makes sense to follow it by checking the consistency of DB vs FS while treating the DB as the "baseline information".
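A consistency check of the kind described here, with the database as the baseline, might look roughly like the following Python sketch. It assumes the set of repository IDs has already been read from the database and that each repository is a directory named after its numeric ID; `ConsistencyReport`, `repo_ids_on_disk` and `check_consistency` are hypothetical names.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class ConsistencyReport:
    orphaned_on_disk: FrozenSet[int]  # on disk, but unknown to the database
    missing_on_disk: FrozenSet[int]   # in the database, but no directory on disk

    @property
    def consistent(self) -> bool:
        return not self.orphaned_on_disk and not self.missing_on_disk


def repo_ids_on_disk(repo_root: Path) -> Set[int]:
    """Collect the IDs of all directories whose name is purely numeric."""
    return {
        int(entry.name)
        for entry in repo_root.iterdir()
        if entry.is_dir() and re.fullmatch(r"[0-9]+", entry.name)
    }


def check_consistency(db_repo_ids: Iterable[int], repo_root: Path) -> ConsistencyReport:
    """Compare the filesystem against the database, treating the DB as the baseline."""
    in_db = set(db_repo_ids)
    on_disk = repo_ids_on_disk(repo_root)
    return ConsistencyReport(
        orphaned_on_disk=frozenset(on_disk - in_db),
        missing_on_disk=frozenset(in_db - on_disk),
    )
```

The sketch only reports differences. Whether to repair them, warn, or refuse to start is a separate policy decision.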


            Michael Heemskerk (Inactive) added a comment -

            Felix,

            When the database and STASH_HOME backups are out of sync, you can run into all sorts of issues:

            If the database is older than STASH_HOME:

            • Repositories exist on disk that are unknown in the database because they were created after the db backup completed.
            • Repositories no longer exist on disk, but still exist in the database because they were deleted after the db backup completed.
            • The audit log (STASH_HOME/log/audit) contains entries for changes that have been 'undone' because the db has been rolled back to an earlier version.
            • Pull requests have been created on disk (internal git refs), notifications have been sent out, but they don't exist in the database backup. ID conflicts will arise when the next pull request is created. Links in emails will be incorrect.

            Depending on how you do your 'hot backup', there may be a host of other, even more serious problems that you should consider. If the database backup isn't atomic and write operations are being processed while the database is being backed up, the database backup may be internally inconsistent. Likewise, if you're using something like rsync to create a 'hot backup' of STASH_HOME while write operations are proceeding in STASH_HOME, you'll very likely end up with corrupt repositories. Tools like rsync work on a file-by-file basis and don't guarantee a consistent 'snapshot' of the file system. It is for these reasons that Stash puts the system in 'maintenance mode' during backup, blocking all SCM and database write operations for the duration of the backup.

            This list of issues is incomplete and as the time gap between the db and STASH_HOME backup increases, problems will be more prevalent.

            If I understand correctly, your suggestion is to make Stash handle some of the conflict cases more gracefully (and still warn). However, conflicts like these can indicate other problems (e.g. pointing a test server to the production database or vice versa), in which case you probably want to fail fast?
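The 'maintenance mode' idea from the comment above, where no write operation may run while the backup is taken, can be sketched in Python as follows. This is not Stash's implementation: `WriteGate`, `MaintenanceError` and the choice to reject writes (rather than queue them) are assumptions made for illustration.

```python
import threading
from contextlib import contextmanager


class MaintenanceError(RuntimeError):
    """Raised when a write is attempted while the gate is in maintenance mode."""


class WriteGate:
    """Hypothetical gate that all write operations must pass through."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._maintenance = False
        self._in_flight = 0  # writes currently executing

    @contextmanager
    def maintenance_mode(self):
        with self._cond:
            if self._maintenance:
                raise MaintenanceError("already in maintenance mode")
            self._maintenance = True      # new writes are rejected from now on
            while self._in_flight:        # wait for running writes to finish
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._maintenance = False

    def write(self, operation):
        """Run operation() unless a backup is in progress."""
        with self._cond:
            if self._maintenance:
                raise MaintenanceError("writes are blocked during backup")
            self._in_flight += 1
        try:
            return operation()
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()
```

A backup would then run both steps inside one `with gate.maintenance_mode():` block, so the database dump and the copy of the home directory describe the same state.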

              Assignee: Unassigned
              Reporter: herzog@t-systems.com_match
              Votes: 1
              Watchers: 7