Description
Issue Summary
When attempting to migrate a repository back to the remote Mesh after it was previously migrated from the remote Mesh back to the NFS share, the migration fails with the following error message:
2024-04-03 14:45:12,004 ERROR [dc-migration:thread-1] danny *M4OZ1Vx846x420925x13 nb7hm4 10.151.208.210,10.150.3.43 "POST /rest/api/latest/migration/mesh HTTP/1.1" c.a.s.i.m.DefaultMeshMigrationService Migration of hierarchy 4b112484deaa515877c5 failed
com.atlassian.bitbucket.dmz.migration.MeshMigrationFailedException: [TEST/server[21436]] Sidecar failed to repair the primary Mesh repository. Aborting migration
This is reproducible on Data Center: (no)
Steps to Reproduce
Couldn't reproduce the issue locally
Expected Results
The repository is successfully migrated back to the remote Mesh.
Actual Results
The migration fails with the errors below.
The following exception is thrown in the atlassian-bitbucket.log file:
2024-04-03 14:45:11,992 ERROR [mesh-grpc-request:thread-130] danny *M4OZ1Vx846x420925x13 nb7hm4 10.151.208.210,10.150.3.43 "POST /rest/api/latest/migration/mesh HTTP/1.1" c.a.s.i.s.g.m.DefaultErrorTranslator ABORTED: Repair of p/000c/h/4b112484deaa515877c5/r/21436 is already running
2024-04-03 14:45:12,004 ERROR [dc-migration:thread-1] danny *M4OZ1Vx846x420925x13 nb7hm4 10.151.208.210,10.150.3.43 "POST /rest/api/latest/migration/mesh HTTP/1.1" c.a.s.i.m.DefaultMeshMigrationService Migration of hierarchy 4b112484deaa515877c5 failed
com.atlassian.bitbucket.dmz.migration.MeshMigrationFailedException: [~TEST/server[21436]] Sidecar failed to repair the primary Mesh repository. Aborting migration
    at com.atlassian.stash.internal.scm.git.mesh.RepositoryMeshMigrator$GitHierarchyMigration.repairFromSidecar(RepositoryMeshMigrator.java:382)
    at com.atlassian.stash.internal.scm.git.mesh.RepositoryMeshMigrator$GitHierarchyMigration.stage(RepositoryMeshMigrator.java:315)
    at com.atlassian.stash.internal.migration.DefaultMeshMigrationService$MeshMigrationVisitor.stageRepository(DefaultMeshMigrationService.java:665)
    at com.atlassian.stash.internal.migration.DefaultMeshMigrationService$MeshMigrationVisitor.visit(DefaultMeshMigrationService.java:459)
    at com.atlassian.stash.internal.migration.DefaultMeshMigrationService$MeshMigrationVisitor.visit(DefaultMeshMigrationService.java:372)
    at com.atlassian.bitbucket.scope.RepositoryScope.accept(RepositoryScope.java:26)
    at com.atlassian.stash.internal.migration.DefaultMeshMigrationService.lambda$migrateRepositories$7(DefaultMeshMigrationService.java:224)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.LinkedList$LLSpliterator.forEachRemaining(LinkedList.java:1235)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1580)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1723)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at com.atlassian.stash.internal.migration.DefaultMeshMigrationService.migrateRepositories(DefaultMeshMigrationService.java:222)
    at com.atlassian.stash.internal.migration.DefaultMigrationService.lambda$startMeshMigration$11(DefaultMigrationService.java:505)
    at com.atlassian.sal.core.executor.ThreadLocalDelegateRunnable.run(ThreadLocalDelegateRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.lang.Thread.run(Thread.java:750)
    ... 14 frames trimmed
Caused by: com.atlassian.bitbucket.scm.CommandFailedException: 'Unknown' exited with code -1
    at com.atlassian.stash.internal.scm.git.mesh.DefaultErrorTranslator.translateDefault(DefaultErrorTranslator.java:167)
    at com.atlassian.stash.internal.scm.git.mesh.DefaultErrorTranslator.translate(DefaultErrorTranslator.java:104)
    at com.atlassian.stash.internal.scm.git.mesh.DefaultErrorTranslator.translateIfKnownCause(DefaultErrorTranslator.java:269)
    at com.atlassian.stash.internal.scm.git.mesh.DefaultErrorTranslator.maybeTranslate(DefaultErrorTranslator.java:57)
    at com.atlassian.stash.internal.scm.git.mesh.AbstractFutureResponseObserver.maybeTranslate(AbstractFutureResponseObserver.java:209)
    at com.atlassian.stash.internal.scm.git.mesh.AbstractFutureResponseObserver.lambda$asFuture$1(AbstractFutureResponseObserver.java:123)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
    at com.atlassian.stash.internal.scm.git.mesh.AbstractFutureResponseObserver.onError(AbstractFutureResponseObserver.java:99)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at com.atlassian.stash.internal.scm.git.mesh.LastSeenClientInterceptor$LastSeenClientListener.onClose(LastSeenClientInterceptor.java:40)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at com.atlassian.stash.internal.scm.git.mesh.StatefulClientCallListener.onClose(StatefulClientCallListener.java:34)
    at com.atlassian.stash.internal.scm.git.mesh.ErrorHandlingClientInterceptor$ErrorHandlingCall$1.onClose(ErrorHandlingClientInterceptor.java:149)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    ... 3 common frames omitted
Caused by: io.grpc.StatusRuntimeException: ABORTED: Repair of p/000c/h/4b112484deaa515877c5/r/21436 is already running
    at io.grpc.Status.asRuntimeException(Status.java:535)
    ... 19 common frames omitted
Remote Mesh Log Errors
2024-04-03 14:45:11,977 DEBUG [grpc-server:thread-3860] danny 5J57O7SDx885x17403530x9 *M4OZ1Vx846x420925x13,3HI0PCCJx885x135654x4 10.150.3.45 "RepositoryService/Repair" (>1 <0) c.a.b.mesh.repair.RepairTarget [p/000c/h/4b112484deaa515877c5/r/21436] Starting repair from ds/0/h/4b112484deaa515877c5/r/21436
2024-04-03 14:45:11,978 WARN [grpc-server:thread-3860] danny 5J57O7SDx885x17403530x9 *M4OZ1Vx846x420925x13,3HI0PCCJx885x135654x4 10.150.3.45 "RepositoryService/Repair" (>1 <0) c.a.b.mesh.repair.RepairTarget [p/000c/h/4b112484deaa515877c5/r/21436] Repair failed
io.grpc.StatusRuntimeException: ABORTED: Repair of p/000c/h/4b112484deaa515877c5/r/21436 is already running
    at io.grpc.Status.asRuntimeException(Status.java:535)
    at com.atlassian.bitbucket.mesh.AbstractStatusException.toStatusException(AbstractStatusException.java:44)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.handleError(RepairTarget.java:225)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:181)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:61)
    at com.atlassian.bitbucket.mesh.grpc.GrpcServiceAdvice$ErrorTranslatingStreamObserver.onNext(GrpcServiceAdvice.java:133)
    at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
    at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
    at com.atlassian.bitbucket.mesh.request.RequestServerCallListener.onMessage(RequestServerCallListener.java:29)
    at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
    at com.atlassian.bitbucket.mesh.grpc.ExecutionContextServerCallListener.lambda$onMessage$3(ExecutionContextServerCallListener.java:36)
    at io.grpc.Context.run(Context.java:536)
    at com.atlassian.bitbucket.mesh.execution.GrpcExecutionManager$GrpcExecutionContext.run(GrpcExecutionManager.java:232)
    at com.atlassian.bitbucket.mesh.grpc.ExecutionContextServerCallListener.onMessage(ExecutionContextServerCallListener.java:36)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:330)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:313)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.atlassian.bitbucket.mesh.repair.RepositoryRepairAlreadyRunningException: Repair of p/000c/h/4b112484deaa515877c5/r/21436 is already running
    at com.atlassian.bitbucket.mesh.repair.RepairGate.acquireTicket(RepairGate.java:55)
    at com.atlassian.bitbucket.mesh.repair.DefaultInteractiveRepairHelper.startRepair(DefaultInteractiveRepairHelper.java:136)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.startRepair(RepairTarget.java:199)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:142)
    ... 18 common frames omitted
2024-04-03 14:45:11,978 DEBUG [grpc-server:thread-3860] danny 5J57O7SDx885x17403530x9 *M4OZ1Vx846x420925x13,3HI0PCCJx885x135654x4 10.150.3.45 "RepositoryService/Repair" (>1 <0) c.a.b.mesh.grpc.GrpcServiceAdvice The RPC was closed twice
java.lang.IllegalStateException: call already closed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
    at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:216)
    at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:209)
    at io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
    at io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
    at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
    at com.atlassian.bitbucket.mesh.grpc.LoggingServerInterceptor$LoggingServerCall.close(LoggingServerInterceptor.java:37)
    at io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
    at io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
    at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
    at com.atlassian.bitbucket.mesh.request.RequestServerCall.close(RequestServerCall.java:30)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onError(ServerCalls.java:389)
    at com.atlassian.bitbucket.mesh.grpc.BackoffStreamObserver.onError(BackoffStreamObserver.java:57)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.handleError(RepairTarget.java:231)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:181)
    at com.atlassian.bitbucket.mesh.repair.RepairTarget.onNext(RepairTarget.java:61)
    at com.atlassian.bitbucket.mesh.grpc.GrpcServiceAdvice$ErrorTranslatingStreamObserver.onNext(GrpcServiceAdvice.java:133)
    at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
    at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
    at com.atlassian.bitbucket.mesh.request.RequestServerCallListener.onMessage(RequestServerCallListener.java:29)
    at io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
    at com.atlassian.bitbucket.mesh.grpc.ExecutionContextServerCallListener.lambda$onMessage$3(ExecutionContextServerCallListener.java:36)
    at io.grpc.Context.run(Context.java:536)
    at com.atlassian.bitbucket.mesh.execution.GrpcExecutionManager$GrpcExecutionContext.run(GrpcExecutionManager.java:232)
    at com.atlassian.bitbucket.mesh.grpc.ExecutionContextServerCallListener.onMessage(ExecutionContextServerCallListener.java:36)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:330)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:313)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Workaround
Move the existing hierarchy directory out of the partition on all Mesh nodes, then retry the Mesh migration.
In the logs above, the repository path is p/000c/h/4b112484deaa515877c5/r/21436, so the hierarchy directory is 4b112484deaa515877c5, under partition 000c on the filesystem.
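To identify the partition and hierarchy directory for another affected repository, the same values can be read out of the "Repair of ... is already running" log line. The following is a minimal sketch, not an Atlassian tool: the function and type names are hypothetical, and the pattern assumes the `p/<partition>/h/<hierarchy>/r/<repository id>` form seen in the logs above, with hexadecimal partition and hierarchy identifiers. It only parses the log text and does not touch the filesystem.

```python
import re
from typing import NamedTuple, Optional

# Assumed path format, taken from the log lines in this report:
#   p/<partition>/h/<hierarchy>/r/<repository id>
# Treating partition and hierarchy as lowercase hex is an assumption
# based on the single example "p/000c/h/4b112484deaa515877c5/r/21436".
REPO_PATH = re.compile(
    r"\bp/(?P<partition>[0-9a-f]+)/h/(?P<hierarchy>[0-9a-f]+)/r/(?P<repo_id>\d+)"
)


class MeshRepoPath(NamedTuple):
    """Hypothetical container for the three path components."""
    partition: str
    hierarchy: str
    repo_id: int


def parse_repo_path(log_line: str) -> Optional[MeshRepoPath]:
    """Return the first Mesh repository path found in log_line, or None."""
    match = REPO_PATH.search(log_line)
    if match is None:
        return None
    return MeshRepoPath(
        partition=match["partition"],
        hierarchy=match["hierarchy"],
        repo_id=int(match["repo_id"]),
    )


if __name__ == "__main__":
    line = (
        "ABORTED: Repair of p/000c/h/4b112484deaa515877c5/r/21436 "
        "is already running"
    )
    print(parse_repo_path(line))
```

Run against the error line from this report, the sketch yields partition 000c and hierarchy 4b112484deaa515877c5, matching the values named in the workaround.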