  1. Bitbucket Data Center
  2. BSERV-9246

Provide a way to perform collection of unreferenced Git LFS objects


Details

    • Suggestion
    • Resolution: Unresolved
    • None
    • Git LFS
    • None
    • 11
    • We collect Bitbucket feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

    Description

      About "garbage collection" and Git LFS objects

      Operations such as removing a branch or rewriting history in Git can leave unreferenced objects behind. The same can happen with "Git LFS" objects. Whilst "Git" has git gc which, amongst other actions, removes unreferenced objects, "Git LFS" does not implement a similar process. Because of the large files it stores, considerable disk space might be taken up on its "content server" for teams whose Git workflow fits this description.

      Why Bitbucket Server does not implement this, and the problem in more detail

      The main issue is that the "Git LFS" specification does not describe "garbage collection". This is why we didn't implement it in the first place. When we wrote Git LFS we thought about that, and it all boils down to how "Git LFS" works. Unlike "Git", it is built around a centralised "Git LFS store": large files in the commit history are checked out on demand by developers, because the files tracked by Git LFS in the repository simply contain instructions on how to fetch the "Large File" from the server.
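To make the "instructions on how to fetch" idea concrete, here is a minimal sketch of reading a Git LFS pointer file, assuming the v1 pointer layout (a `version` line, an `oid sha256:<hex>` line and a `size` line). The names `LfsPointer` and `parse_lfs_pointer` and the example digest are hypothetical and exist only for illustration; this is not how Bitbucket itself reads pointers.

```python
from typing import NamedTuple

LFS_V1_VERSION = "https://git-lfs.github.com/spec/v1"


class LfsPointer(NamedTuple):
    oid: str   # hex SHA-256 digest of the real file content
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the text of a Git LFS v1 pointer file (illustrative helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value

    if fields.get("version") != LFS_V1_VERSION:
        raise ValueError("not a Git LFS v1 pointer")
    if "oid" not in fields or "size" not in fields:
        raise ValueError("pointer is missing the oid or size line")

    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return LfsPointer(oid=digest, size=int(fields["size"]))


# Made-up pointer content: this small text is what Git stores in the commit,
# while the large file itself lives in the Git LFS store under its oid.
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "ab" * 32 + "\n"
    "size 12345\n"
)

pointer = parse_lfs_pointer(EXAMPLE_POINTER)
```

The commit only carries the pointer text, so the large file can be fetched from the server only for as long as the server still holds the object with that oid.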

      To put that in perspective, consider two developers (A and B) who have checked out a "Git LFS" repository. Developer A branches off one of Developer B's branches and works on a feature locally. Developer B then decides to delete the branch that Developer A branched off. If we were to garbage collect "Git LFS" objects on the server, we would have to remove all the objects referenced only by that branch, potentially breaking Developer A's work: Developer A would be left without access to those objects when, for example, attempting to check out a previous commit on the branch. In Git, that wouldn't be a problem because everyone has a full copy of the repository (i.e. fetching a branch pulls down all the files for every commit on it; there is no such concept as "go to the content server and give me a certain file"). See our Git LFS Tutorial for a more detailed explanation of "Git LFS".
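The scenario above can be sketched as a toy mark-and-sweep over the refs the server can see. Everything here is an assumption made for illustration: the commit ids, ref names and object ids are invented, and `reachable_lfs_objects` and `sweep` are hypothetical helpers, not part of Git LFS or Bitbucket.

```python
# Toy history: commit -> LFS object ids referenced by that commit's pointers.
COMMIT_LFS_OBJECTS = {
    "c1": {"oid-logo"},                   # on master
    "c2": {"oid-logo", "oid-video-v1"},   # older commit on Developer B's branch
    "c3": {"oid-logo", "oid-video-v2"},   # tip of Developer B's branch
    "c4": {"oid-logo", "oid-video-v2"},   # Developer A's local commit on top of c3
}
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}


def reachable_lfs_objects(refs):
    """Mark phase: LFS object ids referenced by commits reachable from refs."""
    seen, objects = set(), set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        objects |= COMMIT_LFS_OBJECTS[commit]
        stack.extend(PARENTS[commit])
    return objects


def sweep(store, refs):
    """Sweep phase: objects a naive server-side collection would delete."""
    return store - reachable_lfs_objects(refs)


server_store = {"oid-logo", "oid-video-v1", "oid-video-v2"}

# While Developer B's branch exists, nothing on the server is unreferenced.
refs_before = {"master": "c1", "feature-b": "c3"}
deleted_before = sweep(server_store, refs_before)

# Developer B deletes the branch; the server no longer sees c2 or c3.
refs_after = {"master": "c1"}
deleted_after = sweep(server_store, refs_after)

# Developer A still has c4 locally, but only downloaded the objects for the
# commits actually checked out (on-demand fetching).
local_refs_a = {"feature-a": "c4"}
local_cache_a = {"oid-logo", "oid-video-v2"}
needed_by_a = reachable_lfs_objects(local_refs_a)
lost_for_a = (needed_by_a - local_cache_a) & deleted_after
```

In this sketch the server cannot know about Developer A's unpushed branch, so checking out `c2` later would ask the server for an object that the collection has already removed.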

      The second main reason is a further complicating factor: how our application is architected around forks.

      Why would customers benefit from this?

      At the moment there is no way to do this (apart from very complicated and unsupported manual operations). Customers whose workflows generate many unreferenced "Git LFS" objects could benefit from such a feature in the future, even though every "Git LFS" garbage collection would be a risky operation.

            People

              Assignee: Unassigned
              Reporter: ThiagoBomfim (Inactive)
              Votes: 28
              Watchers: 27
