Adding or Deleting a Mesh node can lead to disk space exhaustion on the node


    • Severity 3 - Minor
    • CtB - Improve Existing

      Issue Summary

      This is reproducible on Data Center: Yes

      On installations with large repository hierarchies managed by Mesh (hierarchies with a large number of repositories, where each repository has a large packfile in excess of 10 GiB), adding or deleting a Mesh node can fail because disk space is exhausted on that node.
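
      As an illustration of the scale involved, the following is a minimal, hypothetical sketch (not part of Bitbucket or Mesh) that sums the size of the Git packfiles under a Mesh node's repository store, giving a rough estimate of how much data has to be copied to a newly added node. The directory path used as a default is an assumption and must be adjusted for the installation.

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.stream.Stream;

      // Sketch only: estimates the total packfile footprint a new Mesh node would
      // need to receive during repair/replication. The default path is an assumption.
      public class PackfileFootprint {
          public static void main(String[] args) throws IOException {
              Path meshRepositories = Path.of(args.length > 0 ? args[0] : "/var/atlassian/mesh/repositories");
              long totalBytes;
              try (Stream<Path> files = Files.walk(meshRepositories)) {
                  totalBytes = files
                          .filter(p -> p.toString().endsWith(".pack")) // count only packfiles
                          .mapToLong(p -> p.toFile().length())
                          .sum();
              }
              System.out.printf("Total packfile footprint: %.2f GiB%n", totalBytes / (1024.0 * 1024 * 1024));
          }
      }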

      Steps to Reproduce

      1. Have a Bitbucket instance with large repository hierarchies managed by Mesh.
      2. Add a new Mesh node, or delete an existing Mesh node (deletion is only possible when enough nodes remain to satisfy the configured replication factor, which is 3 by default); a disk-space pre-check sketch follows these steps.
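
      Before step 2, the free space on the node can be compared against the packfile estimate above. This is a minimal sketch, assuming the Mesh home path and a pre-computed size estimate are supplied by the caller; it is illustrative only and not a supported pre-check in the product.

      import java.io.IOException;
      import java.nio.file.FileStore;
      import java.nio.file.Files;
      import java.nio.file.Path;

      // Sketch only: compares the usable space on the filesystem backing the Mesh home
      // against an estimated repository footprint. The default path is an assumption.
      public class MeshDiskPreCheck {
          public static void main(String[] args) throws IOException {
              Path meshHome = Path.of(args.length > 0 ? args[0] : "/var/atlassian/mesh");
              long requiredBytes = args.length > 1 ? Long.parseLong(args[1]) : 0L;

              FileStore store = Files.getFileStore(meshHome);
              long usableBytes = store.getUsableSpace();

              System.out.printf("Usable: %.2f GiB, required (estimate): %.2f GiB%n",
                      usableBytes / (1024.0 * 1024 * 1024), requiredBytes / (1024.0 * 1024 * 1024));
              if (usableBytes < requiredBytes) {
                  System.out.println("WARNING: node may run out of disk space during repair/replication.");
              }
          }
      }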

      Expected Results

      • When adding a new node, the node should be added, a portion of the existing Mesh-managed repositories should be placed on it, and it should start serving requests for them.
      • When removing an existing node, the repositories it hosts should be moved to other nodes, and the node should then be deleted successfully.

      Actual Results

      The following exception may be logged in the atlassian-mesh.log file on the node:

      io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Invalid pack: UNKNOWN (sha1 file 'objects/pack/tmp_idx_VeGbd7' write error. Out of diskspace)
      	at io.grpc.Status.asRuntimeException(Status.java:533)
      	at com.atlassian.bitbucket.mesh.AbstractStatusException.toStatusException(AbstractStatusException.java:44)
      	at com.atlassian.bitbucket.mesh.repair.RepairTarget.handleError(RepairTarget.java:226)
      	at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:182)
      	at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:60)
      	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
      	at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
      	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:657)
      	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:644)
      	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
      	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at java.base/java.lang.Thread.run(Thread.java:840) 

      Workaround

      Currently, there is no known workaround for this behavior. A workaround will be added here when one becomes available.

            Assignee:
            Unassigned
            Reporter:
            Chandravadan
            Votes:
            0
            Watchers:
            3
