Suggestion
Resolution: Unresolved
About "garbage collection" and Git LFS objects
Operations such as removing a branch or rewriting history in Git can leave unreferenced objects behind. The same can happen for "Git LFS" objects. Whilst "Git" has git gc which, amongst other actions, removes unreferenced objects, "Git LFS" does not implement a similar process. Because large files are kept on its "content server", considerable disk space can be taken up there for teams whose Git workflow regularly deletes branches or rewrites history.
Why Bitbucket Server does not implement this, and the problem in more detail
The main issue is that the "Git LFS" specification does not describe "garbage collection", which is why we didn't implement it in the first place. We thought about it when we wrote Git LFS, and it all boils down to how "Git LFS" works. Unlike "Git", it relies on a centralised "Git LFS store". The files tracked by Git LFS are kept in the commit history as small files that simply contain instructions on how to fetch the "Large File" from the server, and the actual content is downloaded on demand when a developer checks out a commit.
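To make "instructions on how to fetch the Large File" more concrete, here is a minimal Python sketch that reads a Git LFS pointer file, the small text file that Git stores in place of the large file. The pointer layout (version, oid and size lines) follows the public Git LFS specification. The function name parse_lfs_pointer and the sample values are assumptions made for illustration only, and this is not how Bitbucket Server itself handles pointers.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line looks like "sha256:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Illustrative pointer content; the digest and size are sample values only.
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)

pointer = parse_lfs_pointer(EXAMPLE_POINTER)
print(pointer["hash_algorithm"], pointer["oid"], pointer["size"])
```

The oid is the only link between a commit and the large file on the server, so if the server deletes the object with that oid, every commit whose pointer names it can no longer be fully checked out.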
To put that in perspective, let's think about two developers (A and B) who have checked out a "Git LFS" repository. Developer A branches off a certain branch (originally Developer B's branch) and is working on a feature locally. Developer B then decides to delete the branch that Developer A had branched off. If we were to garbage collect "Git LFS" objects on the server, we would remove the objects that were referenced only by the deleted branch, and we could potentially break Developer A's work: Developer A would be left without access to those objects when attempting, for example, to check out a previous commit on the branch. In Git, that wouldn't be a problem, as everyone has a full copy of the repository (i.e. fetching a branch pulls down all the files for every commit on it; there is no such concept as "go to the content server and give me a certain file"). See our Git LFS Tutorial for a more detailed explanation of "Git LFS".
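The scenario above can be sketched as a small Python simulation. Everything in it is hypothetical: the commit IDs, object names and helper functions are invented for illustration, and a naive "delete whatever no server ref can reach" policy is assumed, since no such policy is defined by Git LFS or implemented in Bitbucket Server.

```python
# Hypothetical history: c1 is on main, c2 is on Developer B's branch,
# c3 is Developer A's local, unpushed commit based on c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}

# Hypothetical mapping of each commit to the LFS object IDs its pointers name.
commit_lfs_objects = {
    "c1": {"obj-logo-v1"},
    "c2": {"obj-logo-v1", "obj-video-v1"},
    "c3": {"obj-logo-v1", "obj-video-v2"},
}


def reachable_commits(tips):
    """Return every commit reachable from the given branch tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def referenced_objects(tips):
    """Return every LFS object referenced by commits reachable from the tips."""
    objects = set()
    for commit in reachable_commits(tips):
        objects |= commit_lfs_objects[commit]
    return objects


# Objects held by the content server (obj-video-v2 was never pushed).
server_store = {"obj-logo-v1", "obj-video-v1"}

# Developer B deletes the branch, so only main (c1) remains on the server.
server_refs = {"main": "c1"}

# A naive server-side collector would delete whatever no server ref reaches.
garbage = server_store - referenced_objects(server_refs.values())

# Developer A still has c2 in local history and may check it out later.
broken_for_a = commit_lfs_objects["c2"] & garbage

print("would be collected:", sorted(garbage))
print("missing when A checks out c2:", sorted(broken_for_a))
```

The server can only see its own refs, so it has no way of knowing that c2 still matters to Developer A. That gap is what makes any server-side collection risky.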
The second main reason is a further complicating factor: the way our application is architected around forks.
Why would customers benefit from this?
At the moment there is no way to garbage collect "Git LFS" objects (apart from very complicated and unsupported manual operations). Customers whose workflows generate many unreferenced "Git LFS" objects could benefit from such a feature in the future, even though every "Git LFS" garbage collection would be a risky operation.