Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-66450

Two Synchrony instances started during Synchrony restart

    XMLWordPrintable

Details

    Description

      Issue Summary

      When Synchrony is restarted it could be that two new instances of Synchrony will be started. First one will be started by the code that performs restart. Second one will be started by code that is triggered when Synchrony process has stopped. This is a raise condition and all depends from timings. We have been able to reproduce this problem on Windows, but it could appear on Linux as well. It became more noticeable after fix for https://jira.atlassian.com/browse/CONFSERVER-60014 was added.

      Steps to Reproduce

      1. Get id of running Synchrony process (pid)
      2. Open Confluence Administration -> General Configuration -> Collaborative editing
      3. Press "Restart Synchrony" button and press "Restart" in dialog box
      4. Check atlassian-confluence.log file and find records about starting synchrony from two different threads at almost the same time
        2021-06-29 01:34:17,528 DEBUG [synchrony-interop-executor:thread-1] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
        2021-06-29 01:34:17,529 DEBUG [synchrony-interop-executor:thread-2] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
        
      5. Check that there are more than one synchrony process created by Confluence and their pids are different from those one that we have seen at step 1

      Expected Results

      One Synchrony instance is started by code which performs restart

      Actual Results

      Two Synchrony instances started and collaborative editing stops working properly.

      atlassian-confluence.log file contains records about starting synchrony from two different threads at almost the same time:

      2021-06-29 01:34:17,528 DEBUG [synchrony-interop-executor:thread-1] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
      2021-06-29 01:34:17,529 DEBUG [synchrony-interop-executor:thread-2] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
      

      The below exception is thrown in the atlassian-synchrony.log file:

      2021-06-25 00:50:12,846 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 2021-06-25 00:50:12,846 ERROR [main] [synchrony.logging] uncaught exception {:throwable #error {
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	 :cause "Address already in use: bind"
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	 :via
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	 [{:type clojure.lang.ExceptionInfo
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	   :message "error while starting state #'synchrony.server.http-server/http-server"
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	   :data {:var #'synchrony.server.http-server/http-server}
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	   :at [mount.lite$start$fn__24142 invoke "lite.clj" 153]}
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  {:type java.net.BindException
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	   :message "Address already in use: bind"
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	   :at [sun.nio.ch.Net bind0 nil -2]}]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	 :trace
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	 [[sun.nio.ch.Net bind0 nil -2]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [sun.nio.ch.Net bind nil -1]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [sun.nio.ch.Net bind nil -1]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [sun.nio.ch.ServerSocketChannelImpl bind nil -1]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.socket.nio.NioServerSocketChannel doBind "NioServerSocketChannel.java" 134]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.AbstractChannel$AbstractUnsafe bind "AbstractChannel.java" 550]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.DefaultChannelPipeline$HeadContext bind "DefaultChannelPipeline.java" 1334]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.AbstractChannelHandlerContext invokeBind "AbstractChannelHandlerContext.java" 506]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.AbstractChannelHandlerContext bind "AbstractChannelHandlerContext.java" 491]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.DefaultChannelPipeline bind "DefaultChannelPipeline.java" 973]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.AbstractChannel bind "AbstractChannel.java" 248]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.bootstrap.AbstractBootstrap$2 run "AbstractBootstrap.java" 356]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.util.concurrent.AbstractEventExecutor safeExecute "AbstractEventExecutor.java" 164]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.util.concurrent.SingleThreadEventExecutor runAllTasks "SingleThreadEventExecutor.java" 472]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.channel.nio.NioEventLoop run "NioEventLoop.java" 500]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.util.concurrent.SingleThreadEventExecutor$4 run "SingleThreadEventExecutor.java" 989]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [io.netty.util.internal.ThreadExecutorMap$2 run "ThreadExecutorMap.java" 74]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [manifold.executor$thread_factory$reify__14792$f__14793 invoke "executor.clj" 47]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [clojure.lang.AFn run "AFn.java" 22]
      2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 	  [java.lang.Thread run nil -1]]}}
      

      Workaround

      Find process id (pid) of all synchrony processes and kill them. It's a big chance that new instance will start after that and error message will disappear from Collaborative editing page. If error message haven't disappeared, then check that there are no new synchrony processes present (kill them as well if they are present) and then press Restart Synchrony button.

      How to find and kill synchrony PID on Linux

      1. Execute next command to find Synchrony PID
        ps -ef | grep "synchrony.core"
        

        For e.g in this case Synchrony PID is 79585

          501 79585 66133   0  1:14pm ??         0:26.55 /Users/itiapchenko/.sdkman/candidates/java/8.0.252.hs-adpt/jre/bin/java -classpath /usr/local/Cellar/tomcat/9.0.36/libexec/temp/4.0.0-master-f680ea21.jar:/Users/itiapchenko/Workspace/confluence-server-master/confluence-core/confluence-webapp/target/confluence/WEB-INF/lib/postgresql-42.2.18.jar -Xss2048k -Xmx2g *synchrony.core* sql
        
      2. Execute next command to kill synchrony (replace SYNCHRONY_PID with Synchrony PID that you have found on previous step)
        kill -9 SYNCHRONY_PID
        

      How to find and kill synchrony PID on Windows

      1. Execute the following command against your command prompt (If your instance is running on a port other than 8090, substitute 8090 for your port number). This command will tell you the PID of your Confluence instance.
        netstat -ona | findstr ":8090"
        

        For e.g in this case Confluence PID was 1908:

        C:\Users\Administrator>netstat -ona | findstr ":8090"
          TCP    0.0.0.0:8090           0.0.0.0:0              LISTENING       1908
          TCP    10.9.31.115:8090       10.9.30.89:37640       ESTABLISHED     1908
          TCP    10.9.31.115:8090       10.9.30.89:37660       TIME_WAIT       0
          TCP    10.9.31.115:8090       10.9.30.89:37702       TIME_WAIT       0
          TCP    10.9.31.115:8090       10.9.30.89:37708       ESTABLISHED     1908
          TCP    10.9.31.115:8090       10.9.30.89:37716       TIME_WAIT       0
          TCP    10.9.31.115:8090       10.9.30.89:37730       TIME_WAIT       0
          TCP    10.9.31.115:8090       10.9.30.89:37748       TIME_WAIT       0
          TCP    10.9.31.115:8090       10.9.30.89:37762       TIME_WAIT       0
          ...
          TCP    [::]:8090              [::]:0                 LISTENING       1908
        
      2. Use this PID in the command below (replace PID with confluence pid).
        wmic process get processid,parentprocessid,executablepath|find "PID"
        
      3. From the result of the last command get PID of processes whose name ends with java.exe and use next command to kill them (replace SYNCHRONY_PID with Synchrony PID):
        taskkill /f /pid SYNCHRONY_PID
        

        For e.g. in this case Synchrony PID was 6864

        C:\Users\Administrator>wmic process get processid,parentprocessid,executablepath|find "1908"
        C:\Atlassian\Confluence\atlassian-confluence-7.12.3\bin\Tomcat9.exe    776              1908
        C:\Windows\system32\conhost.exe    1908             1704
        C:\Java\adopt\jdk8u202-b08\jre\bin\java.exe    1908             6864
        

      Attachments

        Issue Links

          Activity

            People

              03cb0c04aa4f Irina Tiapchenko
              03cb0c04aa4f Irina Tiapchenko
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: