  1. Bitbucket Data Center
  2. BSERV-12393

Streaming directory contents or last modified details can hang


Details

    Description

      Issue Summary

      Streaming directory contents or last modified details (the latest commits for the files in a given directory) can randomly hang.

      Loading contributing guidelines (or even just checking whether any exist) streams directory contents to look for a CONTRIBUTING file, so this issue can also cause loading contributing guidelines to hang. Because contributing guidelines are displayed on the pull request creation screen, this can block creating pull requests in the affected repositories.

      Steps to Reproduce

      This issue is caused by a race condition in the new non-blocking handling for git cat-file --batch, which shipped in Bitbucket Server 7.0. Because it is a race, there is no simple reproduction. Instead, it can occur on any ContentService.streamDirectory call, depending on the cadence with which git cat-file --batch outputs data.

      One requirement for the race to be possible is that the directory being streamed contains subdirectories. If it contains only files, the race will not occur.
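
      As a generic illustration of how this kind of hang manifests (this is not Bitbucket's code, and it is not the fix that was shipped), the sketch below shows a consumer waiting on an ArrayBlockingQueue that no producer ever fills. An untimed take() on such a queue parks the thread indefinitely, while a timed poll() gives up and returns null. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a consumer that bounds its wait instead of parking forever. */
final class BlockedTakeSketch {

    /** Waits up to timeoutMillis for an item; returns null if nothing arrives in time. */
    static String awaitItem(BlockingQueue<String> queue, long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report "no item" to the caller.
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        // Nothing ever offers to this queue, which stands in for a producer that
        // lost the race. Calling queue.take() here would never return.
        String item = awaitItem(queue, 200);
        System.out.println(item == null ? "timed out waiting for producer" : item);
    }
}
```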

      Expected Results

      • Loading directory contents works reliably
      • Loading last modified details works reliably
      • Loading contributing guidelines works reliably

      Actual Results

      Streaming directory contents sporadically hangs, which can eventually consume all of Tomcat's HTTP threads and block access to the server. If streaming directory contents hangs while searching for contributing guidelines, it can block access to pull requests for the repositories in which it happens.

      Because it is a race condition, this issue can produce several different errors in the logs. However, the easiest way to identify it is by checking thread dumps and looking for threads blocked in call stacks like these:

      "http-nio-7990-exec-1" #421 daemon prio=5 os_prio=0 tid=0x00007f8e7408e800 nid=0x4d9 waiting on condition [0x00007f8e0dcf0000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x0000000097f1f9e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      	at java.util.concurrent.locks.LockSupport.park(Unknown Source)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
      	at java.util.concurrent.ArrayBlockingQueue.take(Unknown Source)
      	at com.atlassian.stash.internal.scm.git.command.lstree.DirectoryCollapsingContentTreeCallback$DirectoryCollapsingBatchCatFileStdioHandler.collapse(DirectoryCollapsingContentTreeCallback.java:399)
      	at com.atlassian.stash.internal.scm.git.command.lstree.DirectoryCollapsingContentTreeCallback.onTreeNode(DirectoryCollapsingContentTreeCallback.java:103)
      	at com.atlassian.stash.internal.scm.git.command.lstree.CallbackLsTreeStdoutHandler.processLine(CallbackLsTreeStdoutHandler.java:87)
      	at com.atlassian.stash.internal.scm.git.command.lstree.CallbackLsTreeStdoutHandler.onStdout(CallbackLsTreeStdoutHandler.java:67)
      	at com.atlassian.bitbucket.dmz.process.AbstractLineHandler$$Lambda$2175/1852549262.accept(Unknown Source)
      	at com.atlassian.bitbucket.dmz.process.LinePump.offerLine(LinePump.java:139)
      	at com.atlassian.bitbucket.dmz.process.LinePump.forEach(LinePump.java:94)
      	at com.atlassian.bitbucket.dmz.process.AbstractLineHandler.onStdout(AbstractLineHandler.java:100)
      	at com.atlassian.bitbucket.internal.process.CompositeNioStdioHandler.onStdout(CompositeNioStdioHandler.java:171)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler$$Lambda$1756/20986451.accept(Unknown Source)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.lambda$handleOutput$4(NioNuProcessHandler.java:364)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler$$Lambda$1758/1646640953.run(Unknown Source)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.lambda$invokeCallback$5(NioNuProcessHandler.java:372)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler$$Lambda$1748/627663807.getAsBoolean(Unknown Source)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.invokeCallback(NioNuProcessHandler.java:380)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.invokeCallback(NioNuProcessHandler.java:371)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.handleOutput(NioNuProcessHandler.java:358)
      	at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.onStdout(NioNuProcessHandler.java:184)
      	at com.zaxxer.nuprocess.internal.BasePosixProcess.readStdout(BasePosixProcess.java:363)
      	at com.zaxxer.nuprocess.linux.ProcessEpoll.process(ProcessEpoll.java:243)
      	at com.zaxxer.nuprocess.internal.BaseEventProcessor.run(BaseEventProcessor.java:81)
      	at com.zaxxer.nuprocess.linux.ProcessEpoll.run(ProcessEpoll.java:188)
      	at com.zaxxer.nuprocess.linux.LinuxProcess.run(LinuxProcess.java:114)
      	at com.zaxxer.nuprocess.linux.LinProcessFactory.runProcess(LinProcessFactory.java:50)
      	at com.zaxxer.nuprocess.NuProcessBuilder.run(NuProcessBuilder.java:273)
      	at com.atlassian.bitbucket.internal.process.nu.NuNioProcessHelper.run(NuNioProcessHelper.java:75)
      	at com.atlassian.bitbucket.internal.process.NioCommand.call(NioCommand.java:52)
      	at com.atlassian.stash.internal.content.DefaultContentService.streamDirectory(DefaultContentService.java:197)
      
      "http-nio-7990-exec-2" #20 daemon prio=5 os_prio=0 tid=0x00007f8f4483d000 nid=0xa9 waiting on condition [0x00007f8eb155d000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x0000000083307f88> (a com.google.common.util.concurrent.SettableFuture)
      	at java.util.concurrent.locks.LockSupport.park(Unknown Source)
      	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:502)
      	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
      	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
      	at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3581)
      	at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2174)
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2038)
      	at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
      	at com.atlassian.cache.memory.DelegatingCache.get(DelegatingCache.java:178)
      	at com.atlassian.cache.hazelcast.HazelcastAsyncHybridCache.get(HazelcastAsyncHybridCache.java:86)
      	at com.atlassian.bitbucket.server.internal.contributing.DefaultContributingGuidelinesService.getPath(DefaultContributingGuidelinesService.java:77)
      	at com.atlassian.bitbucket.server.internal.contributing.DefaultContributingGuidelinesService.getPath(DefaultContributingGuidelinesService.java:97)
      	at com.atlassian.bitbucket.server.internal.contributing.PullRequestCreateContributingGuidelinesFormFragment.doView(PullRequestCreateContributingGuidelinesFormFragment.java:47)
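
      Checking a large thread dump by eye is tedious, so a small scanner can help. The following is a minimal sketch, assuming the dump is plain text in the usual jstack layout where each thread starts with a quoted name. It lists threads whose stacks contain either of the two frames shown in the traces above. The class name and method names are hypothetical, and the frame strings are copied from the traces above, so they may differ in other Bitbucket versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds threads parked in the frames shown in the traces above. */
final class ThreadDumpScanner {

    // Frame from the first trace (thread stuck collapsing directories).
    static final String COLLAPSE_FRAME =
            "DirectoryCollapsingContentTreeCallback$DirectoryCollapsingBatchCatFileStdioHandler.collapse";
    // Frame from the second trace (thread waiting on the contributing guidelines lookup).
    static final String GUIDELINES_FRAME = "DefaultContributingGuidelinesService.getPath";

    /** Returns the names of threads whose stack contains either marker frame. */
    static List<String> findSuspectThreads(String dump) {
        List<String> suspects = new ArrayList<>();
        String currentThread = null;
        boolean matched = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // A header such as "http-nio-7990-exec-1" #421 daemon ... starts a new thread.
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line;
                matched = false;
            } else if (currentThread != null && !matched
                    && (line.contains(COLLAPSE_FRAME) || line.contains(GUIDELINES_FRAME))) {
                // Record each thread once, even if several frames match.
                suspects.add(currentThread);
                matched = true;
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpScanner.java <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        findSuspectThreads(dump).forEach(System.out::println);
    }
}
```

      Seeing the same thread names in several dumps taken some time apart suggests those threads are genuinely hung rather than momentarily waiting.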
      

      After some delay, the git cat-file --batch process will be terminated by the system (which will not release the deadlocked HTTP threads). When that happens, the stack trace will include an exception like this:

      	Suppressed: com.atlassian.bitbucket.scm.CommandFailedException: '/usr/bin/git cat-file --batch' exited with code 15
      		at com.atlassian.bitbucket.scm.DefaultCommandExitHandler.onError(DefaultCommandExitHandler.java:43)
      		at com.atlassian.bitbucket.scm.git.command.GitCommandExitHandler.evaluateThrowable(GitCommandExitHandler.java:111)
      		at com.atlassian.bitbucket.scm.git.command.GitCommandExitHandler.onError(GitCommandExitHandler.java:208)
      		at com.atlassian.bitbucket.scm.DefaultCommandExitHandler.onExit(DefaultCommandExitHandler.java:32)
      		at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.callExitHandler(NioNuProcessHandler.java:286)
      		at com.atlassian.bitbucket.internal.process.nu.NioNuProcessHandler.finish(NioNuProcessHandler.java:327)
      

      Workaround

      For affected versions, the workaround is to disable NIO handling by setting process.nio.enabled=false in bitbucket.properties and restarting the server.

      *Warning*: Support for disabling NIO handling will be removed in a future version, so after upgrading to a fixed version the process.nio.enabled=false setting should be removed.
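
      Because the setting has to be removed again after upgrading, it can be handy to check whether it is still present. The following is a minimal sketch, assuming bitbucket.properties is a standard Java properties file that java.util.Properties can parse. The class name is hypothetical, and treating the value case-insensitively is an assumption rather than documented Bitbucket behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper: reports whether the NIO workaround is still configured. */
final class NioWorkaroundCheck {

    // Property name taken from the workaround described above.
    static final String KEY = "process.nio.enabled";

    /** True when the given properties text explicitly sets process.nio.enabled to false. */
    static boolean workaroundActive(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        // Assumption: "false" is matched case-insensitively.
        return value != null && "false".equalsIgnoreCase(value.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NioWorkaroundCheck.java <path-to-bitbucket.properties>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(workaroundActive(text)
                ? KEY + "=false is set; remove it after upgrading to a fixed version"
                : KEY + "=false is not set");
    }
}
```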

            People

              bturner Bryan Turner (Inactive)