
Too many open files caused by a deadlock in SSH threads due to race condition with PuTTY SSH clients

      BSERV-10100 introduced throttling of SSH writes when the remote SSH client specifies a large SSH window size. Recent versions of PuTTY request a window size of 2 GB, and the throttling was necessary to prevent OutOfMemoryErrors.

      Unfortunately, the fix for BSERV-10100 introduced a race condition that can result in a deadlock when the remote client terminates an SSH connection while pending writes to that client are being throttled. The deadlock affects the SSH I/O processing threads. Each SSH session is mapped to an I/O processing thread, so any new session mapped to a deadlocked I/O processing thread hangs without any I/O being processed. Because these hung sessions are never cleaned up, their connections stay open, and the process can eventually run out of file handles.
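      The general hazard described above can be illustrated with a small, hypothetical Java sketch. It is not the Bitbucket or Apache MINA SSHD implementation: the class ThrottledOutput, its pending-byte cap and its flushed()/close() methods are assumptions made for illustration. It shows one general rule for throttled writers: a writer parked because too much data is pending must be woken, and must fail, when the channel is closed. Otherwise it can wait forever for buffer space that a disconnected client will never free.

```java
import java.io.IOException;

// Hypothetical illustration only: this is not Bitbucket or Apache MINA SSHD code.
public class ThrottledOutput {
    private final long maxPending; // assumed cap on queued but unsent bytes
    private long pending;          // bytes queued and not yet sent to the peer
    private boolean closed;

    public ThrottledOutput(long maxPending) {
        this.maxPending = maxPending;
    }

    // Queues `length` bytes, blocking while doing so would exceed the cap.
    public synchronized void write(int length) throws IOException, InterruptedException {
        while (!closed && pending + length > maxPending) {
            wait(); // releases the monitor, so flushed() and close() can still run
        }
        if (closed) {
            throw new IOException("channel closed while the write was throttled");
        }
        pending += length;
    }

    // Called when the transport has sent `length` bytes to the peer.
    public synchronized void flushed(int length) {
        pending = Math.max(0, pending - length);
        notifyAll(); // let throttled writers re-check the cap
    }

    // Marks the stream closed and wakes every throttled writer so none waits forever.
    public synchronized void close() {
        closed = true;
        notifyAll();
    }

    // Demo: a writer stuck on the cap is released (with an IOException) by close().
    public static boolean closeReleasesThrottledWriter() {
        ThrottledOutput out = new ThrottledOutput(1024);
        boolean[] failedWithIOException = {false};
        Thread writer = new Thread(() -> {
            try {
                out.write(1024); // fills the cap
                out.write(1);    // blocks: nothing is ever flushed
            } catch (IOException e) {
                failedWithIOException[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true); // never keep the JVM alive if the demo goes wrong
        writer.start();
        try {
            Thread.sleep(100); // give the writer time to block (demo only)
            out.close();       // simulates the remote client disconnecting
            writer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !writer.isAlive() && failedWithIOException[0];
    }
}
```

      The sketch waits with wait(), which releases the monitor so close() can still acquire it. A writer that instead waited for window space while holding a lock that close() also needs would leave the closing thread blocked, which is one plausible shape of the kind of hang described in this issue.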

      Symptoms
      Some Git operations over SSH hang without any output and fail once a timeout occurs.

      The atlassian-bitbucket.log contains "Too many open files" errors:

      java.io.IOException: Too many open files
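      To watch the "Too many open files" symptom build up before it causes failures, one option is to track how many file descriptors a JVM has open relative to its limit. The sketch below is an illustration under stated assumptions, not a Bitbucket feature. The class FileDescriptorUsage is hypothetical. It relies on com.sun.management.UnixOperatingSystemMXBean, which OpenJDK/HotSpot JVMs provide on Unix-like systems, and it only measures the JVM it runs in. Reading the same values from the Bitbucket process would require a connection to that JVM, for example over JMX. There we assume they appear as the OpenFileDescriptorCount and MaxFileDescriptorCount attributes of the java.lang:type=OperatingSystem MBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalDouble;

// Hypothetical helper: fraction of the file-descriptor limit used by this JVM.
public final class FileDescriptorUsage {
    private FileDescriptorUsage() {}

    // Returns open/max descriptors, or empty if the platform MXBean lacks these counters.
    public static OptionalDouble usedFraction() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            if (open >= 0 && max > 0) {
                return OptionalDouble.of((double) open / max);
            }
        }
        return OptionalDouble.empty(); // e.g. Windows or a non-HotSpot JVM
    }
}
```

      A fraction that keeps rising while Git operations over SSH hang would be consistent with the deadlock described here, but only a thread dump can confirm it.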
      

      Diagnosis
      Take a thread dump of the Bitbucket Server instance and look for threads whose names start with NioProcessor-. If any of these threads has a stack trace similar to the following, that thread is deadlocked:

       - org.apache.sshd.common.channel.ChannelOutputStream.close() @bci=0, line=232 (Compiled frame)
       - org.apache.sshd.common.util.io.IoUtils.closeQuietly(java.io.Closeable[]) @bci=29, line=127 (Compiled frame)
       - org.apache.sshd.server.channel.ChannelSession.doCloseImmediately() @bci=155, line=209 (Compiled frame)
       - org.apache.sshd.common.util.closeable.AbstractCloseable.close(boolean) @bci=65, line=81 (Compiled frame)
       - org.apache.sshd.common.channel.AbstractChannel.close(boolean) @bci=37, line=448 (Compiled frame)
       - com.atlassian.bitbucket.internal.ssh.server.ScmHostingChannelSession.close(boolean) @bci=2, line=47 (Compiled frame)
       - org.apache.sshd.common.util.closeable.ParallelCloseable.doClose(boolean) @bci=117, line=65 (Compiled frame)
       - org.apache.sshd.common.util.closeable.SimpleCloseable.close(boolean) @bci=14, line=63 (Compiled frame)
       - org.apache.sshd.common.util.closeable.AbstractInnerCloseable.doCloseImmediately() @bci=5, line=47 (Compiled frame)
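      The check described above can be automated with a small text scan. This is a hypothetical helper: the class NioProcessorDeadlockScanner is not part of Bitbucket. It assumes the dump is in the standard jstack format, where each thread starts with a quoted header line such as "NioProcessor-1" #764366 daemon .... It reports NioProcessor- threads whose stack contains org.apache.sshd.common.channel.ChannelOutputStream.close(, the frame named in the diagnosis. A match is only a candidate, so check its thread state (BLOCKED) by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags NioProcessor-* threads stuck in ChannelOutputStream.close().
public final class NioProcessorDeadlockScanner {
    private static final String THREAD_PREFIX = "\"NioProcessor-";
    private static final String SUSPECT_FRAME =
            "org.apache.sshd.common.channel.ChannelOutputStream.close(";

    private NioProcessorDeadlockScanner() {}

    // Returns the names of NioProcessor threads whose stack contains the suspect frame.
    public static List<String> findSuspectThreads(String dump) {
        List<String> suspects = new ArrayList<>();
        String current = null;   // NioProcessor thread currently being read, if any
        boolean flagged = false; // whether `current` has already been reported
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // New thread header, e.g. "NioProcessor-1" #764366 daemon prio=5 ...
                int end = line.indexOf('"', 1);
                current = (line.startsWith(THREAD_PREFIX) && end > 0)
                        ? line.substring(1, end) : null;
                flagged = false;
            } else if (current != null && !flagged && line.contains(SUSPECT_FRAME)) {
                suspects.add(current);
                flagged = true;
            }
        }
        return suspects;
    }
}
```

      For example, a healthy NioProcessor thread that is RUNNABLE in sun.nio.ch.EPollArrayWrapper.epollWait (an idle selector waiting for I/O) produces no match. To scan a dump saved to a file, read the file into a String and pass it to findSuspectThreads.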
      

            [BSERV-10575] Too many open files caused by a deadlock in SSH threads due to race condition with PuTTY SSH clients

            Paul Thompson (Inactive) added a comment:

            Hi all,

            The custom SSH plugin that was attached has been removed from this ticket as we've just released the new official versions of Bitbucket that include this fix. All the fix versions listed in this issue are available for download and will ensure that you do not run into this issue.

            Thomas Beck added a comment:

            Okay, thanks for your quick response.

            Michael Heemskerk (Inactive) added a comment:

            Hi Thomas,

            To be clear, this issue is very much a race condition that only occurs when remote Windows clients (using PuTTY) terminate the SSH connection early (Ctrl-C). When deadlocked, the thread dumps show the "NioProcessor-<num>" threads in a BLOCKED state, with the stack trace showing the thread blocked in "org.apache.sshd.common.channel.ChannelOutputStream.close()".

            Your thread dump snippets do not show a deadlock, but if you'd like more help, I suggest opening a support request so our support engineers can advise you.

            Thomas Beck added a comment:

            Hello Michael,
            I do not understand how I can check my system for threads that are deadlocked.

            I created a thread dump based on this description:

            https://confluence.atlassian.com/bitbucketserverkb/generate-a-thread-dump-externally-779171716.html

            Then I grepped the dump for NioProcessor:

                "NioProcessor-1" #764366 daemon prio=5 os_prio=0 tid=0x00007f8e1005d000 nid=0x8441 runnable [0x00007f8d41f62000]
                   java.lang.Thread.State: RUNNABLE
                        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
                        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
                        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
                        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
                        - locked <0x000000009351d868> (a sun.nio.ch.Util$3)
                        - locked <0x000000009351d858> (a java.util.Collections$UnmodifiableSet)
                        - locked <0x000000009351d878> (a sun.nio.ch.EPollSelectorImpl)
                        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
                        at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:98)
                        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1075)
                        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                        at java.lang.Thread.run(Thread.java:748)

            How can I identify deadlocked threads on my nodes?

            Can you please give me some details on how to find them?

            Michael Heemskerk (Inactive) added a comment:

            Is this a usable fall-back solution?

            If you want or need to revert to the original version, all you need to do is remove BITBUCKET_HOME/shared/plugins/installed-plugins/plugin_<some-hash>_bitbucket-ssh-4.14.11-70941489b2fda6.jar and then restart the nodes one by one. There's no need to back up the original jar, since the bundled plugin is still available to the system; the newer version of the plugin simply 'masks' the bundled version. Remove the newer version and the bundled version will be re-activated.

            How can I verify that the bugfix works?

            You can verify that the correct version of the plugin is installed by going to Administration > Manage add-ons, selecting "All add-ons" from the dropdown and checking the version on the "Bitbucket Server - SSH" entry.

            Other than that, NioProcessor threads should no longer get stuck in the close() operation listed in the issue description. You can verify that by taking a thread dump and inspecting it.

            I have a Data Center with 4 nodes running.

            You only need to upload the patched plugin once and it will be activated on all 4 nodes.
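            Before falling back as described above, it can help to confirm which patched jar is actually installed. The helper below is hypothetical (the class name is made up). It lists files in BITBUCKET_HOME/shared/plugins/installed-plugins that match the glob plugin_*_bitbucket-ssh-*.jar. That glob is an assumption standing in for the plugin_<some-hash>_ naming. The helper deliberately deletes nothing: removing the jar and restarting the nodes one by one remains a manual step.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists uploaded SSH plugin jars; it never deletes anything.
public final class PatchedSshPluginFinder {
    // Assumed pattern for plugin_<some-hash>_bitbucket-ssh-<version>.jar
    private static final String GLOB = "plugin_*_bitbucket-ssh-*.jar";

    private PatchedSshPluginFinder() {}

    public static List<Path> find(Path bitbucketHome) throws IOException {
        Path installed = bitbucketHome.resolve("shared")
                .resolve("plugins")
                .resolve("installed-plugins");
        List<Path> matches = new ArrayList<>();
        if (!Files.isDirectory(installed)) {
            return matches; // no uploaded plugins at all
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(installed, GLOB)) {
            for (Path jar : stream) {
                matches.add(jar);
            }
        }
        return matches;
    }
}
```

            An empty result means no uploaded SSH plugin is masking the bundled one, so the bundled version is already the one in use.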

            Thomas Beck added a comment:

            Sorry once more,
            I have a Data Center with 4 nodes running.
            Must I back up the file ./atlassian-bitbucket/WEB-INF/atlassian-bundled-plugins/bitbucket-ssh-4.14.10.jar on each node?
            Does the upload work for each node?

            That's not clear to me.

            Thomas Beck added a comment:

            Thanks Michael,
            here are my last questions:

            1. Suppose I make a safety copy of
              ./atlassian-bitbucket/WEB-INF/atlassian-bundled-plugins/bitbucket-ssh-4.14.10.jar
              and then do the upload. If there are problems afterwards, I would do the same upload with my safety copy. Is this a usable fall-back solution?
              I need a fall-back, because this is our production system with ~7000 users.
            2. How can I verify that the bugfix works?

            Best regards, Thomas

            Michael Heemskerk (Inactive) added a comment:

            Here's a (very) short description:

            • Go to Administration > Manage Add-ons
            • Choose "Upload add-on" and upload the patched ssh plugin
            • Wait for the upload and install to finish.

            Thomas Beck added a comment:

            Hello Michael,
            this sounds very good. Can you please give me a short step-by-step description of how to install this patched plugin correctly?
            I just want to make sure that my system is updated the right way.
            Thanks again, and I'll buy you a cup of coffee if we ever meet.

            Michael Heemskerk (Inactive) added a comment:

            Hi thomas.beck21956276,

            We do have a patched version of the SSH plugin that can be installed on either 4.14.10 or 4.14.11. Installing the patched plugin should resolve the issue for these versions. We currently do not have patched versions for other releases, but expect new releases for them in the coming days.

            Attachment: bitbucket-ssh-4.14.11-70941489b2fda6.jar

              mheemskerk Michael Heemskerk (Inactive)
              mheemskerk Michael Heemskerk (Inactive)
              Affected customers: 0
              Watchers: 14
