-
Bug
-
Resolution: Fixed
-
High
-
7.4.10, 7.12.3, 7.13.0
-
Severity 2 - Major
-
37
-
Issue Summary
When Synchrony is restarted it could be that two new instances of Synchrony will be started. First one will be started by the code that performs restart. Second one will be started by code that is triggered when Synchrony process has stopped. This is a raise condition and all depends from timings. We have been able to reproduce this problem on Windows, but it could appear on Linux as well. It became more noticeable after fix for https://jira.atlassian.com/browse/CONFSERVER-60014 was added.
Steps to Reproduce
- Get id of running Synchrony process (pid)
- Open Confluence Administration -> General Configuration -> Collaborative editing
- Press "Restart Synchrony" button and press "Restart" in dialog box
- Check atlassian-confluence.log file and find records about starting synchrony from two different threads at almost the same time
2021-06-29 01:34:17,528 DEBUG [synchrony-interop-executor:thread-1] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process 2021-06-29 01:34:17,529 DEBUG [synchrony-interop-executor:thread-2] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
- Check that there are more than one synchrony process created by Confluence and their pids are different from those one that we have seen at step 1
Expected Results
One Synchrony instance is started by code which performs restart
Actual Results
Two Synchrony instances started and collaborative editing stops working properly.
atlassian-confluence.log file contains records about starting synchrony from two different threads at almost the same time:
2021-06-29 01:34:17,528 DEBUG [synchrony-interop-executor:thread-1] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process 2021-06-29 01:34:17,529 DEBUG [synchrony-interop-executor:thread-2] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] logAndExtract Starting Synchrony process
The below exception is thrown in the atlassian-synchrony.log file:
2021-06-25 00:50:12,846 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] 2021-06-25 00:50:12,846 ERROR [main] [synchrony.logging] uncaught exception {:throwable #error { 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :cause "Address already in use: bind" 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :via 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [{:type clojure.lang.ExceptionInfo 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :message "error while starting state #'synchrony.server.http-server/http-server" 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :data {:var #'synchrony.server.http-server/http-server} 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :at [mount.lite$start$fn__24142 invoke "lite.clj" 153]} 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] {:type java.net.BindException 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :message "Address already in use: bind" 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :at [sun.nio.ch.Net bind0 nil -2]}] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] :trace 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [[sun.nio.ch.Net bind0 nil -2] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [sun.nio.ch.Net bind nil -1] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [sun.nio.ch.Net bind nil -1] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [sun.nio.ch.ServerSocketChannelImpl bind nil -1] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.socket.nio.NioServerSocketChannel doBind "NioServerSocketChannel.java" 134] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.AbstractChannel$AbstractUnsafe bind "AbstractChannel.java" 550] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.DefaultChannelPipeline$HeadContext bind "DefaultChannelPipeline.java" 1334] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.AbstractChannelHandlerContext invokeBind "AbstractChannelHandlerContext.java" 506] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.AbstractChannelHandlerContext bind "AbstractChannelHandlerContext.java" 491] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.DefaultChannelPipeline bind "DefaultChannelPipeline.java" 973] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.AbstractChannel bind "AbstractChannel.java" 248] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.bootstrap.AbstractBootstrap$2 run "AbstractBootstrap.java" 356] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.util.concurrent.AbstractEventExecutor safeExecute "AbstractEventExecutor.java" 164] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.util.concurrent.SingleThreadEventExecutor runAllTasks "SingleThreadEventExecutor.java" 472] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.channel.nio.NioEventLoop run "NioEventLoop.java" 500] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.util.concurrent.SingleThreadEventExecutor$4 run "SingleThreadEventExecutor.java" 989] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [io.netty.util.internal.ThreadExecutorMap$2 run "ThreadExecutorMap.java" 74] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [manifold.executor$thread_factory$reify__14792$f__14793 invoke "executor.clj" 47] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [clojure.lang.AFn run "AFn.java" 22] 2021-06-25 00:50:12,847 DEBUG [691:StdOutHandler [C:\Ganesh\jre\bin\java.exe]] [java.lang.Thread run nil -1]]}}
Workaround
Find process id (pid) of all synchrony processes and kill them. It's a big chance that new instance will start after that and error message will disappear from Collaborative editing page. If error message haven't disappeared, then check that there are no new synchrony processes present (kill them as well if they are present) and then press Restart Synchrony button.
How to find and kill synchrony PID on Linux
- Execute next command to find Synchrony PID
ps -ef | grep "synchrony.core"
For e.g in this case Synchrony PID is 79585
501 79585 66133 0 1:14pm ?? 0:26.55 /Users/itiapchenko/.sdkman/candidates/java/8.0.252.hs-adpt/jre/bin/java -classpath /usr/local/Cellar/tomcat/9.0.36/libexec/temp/4.0.0-master-f680ea21.jar:/Users/itiapchenko/Workspace/confluence-server-master/confluence-core/confluence-webapp/target/confluence/WEB-INF/lib/postgresql-42.2.18.jar -Xss2048k -Xmx2g *synchrony.core* sql
- Execute next command to kill synchrony (replace SYNCHRONY_PID with Synchrony PID that you have found on previous step)
kill -9 SYNCHRONY_PID
How to find and kill synchrony PID on Windows
- Execute the following command against your command prompt (If your instance is running on a port other than 8090, substitute 8090 for your port number). This command will tell you the PID of your Confluence instance.
netstat -ona | findstr ":8090"
For e.g in this case Confluence PID was 1908:
C:\Users\Administrator>netstat -ona | findstr ":8090" TCP 0.0.0.0:8090 0.0.0.0:0 LISTENING 1908 TCP 10.9.31.115:8090 10.9.30.89:37640 ESTABLISHED 1908 TCP 10.9.31.115:8090 10.9.30.89:37660 TIME_WAIT 0 TCP 10.9.31.115:8090 10.9.30.89:37702 TIME_WAIT 0 TCP 10.9.31.115:8090 10.9.30.89:37708 ESTABLISHED 1908 TCP 10.9.31.115:8090 10.9.30.89:37716 TIME_WAIT 0 TCP 10.9.31.115:8090 10.9.30.89:37730 TIME_WAIT 0 TCP 10.9.31.115:8090 10.9.30.89:37748 TIME_WAIT 0 TCP 10.9.31.115:8090 10.9.30.89:37762 TIME_WAIT 0 ... TCP [::]:8090 [::]:0 LISTENING 1908
- Use this PID in the command below (replace PID with confluence pid).
wmic process get processid,parentprocessid,executablepath|find "PID"
- From the result of the last command get PID of processes whose name ends with java.exe and use next command to kill them (replace SYNCHRONY_PID with Synchrony PID):
taskkill /f /pid SYNCHRONY_PID
For e.g. in this case Synchrony PID was 6864
C:\Users\Administrator>wmic process get processid,parentprocessid,executablepath|find "1908" C:\Atlassian\Confluence\atlassian-confluence-7.12.3\bin\Tomcat9.exe 776 1908 C:\Windows\system32\conhost.exe 1908 1704 C:\Java\adopt\jdk8u202-b08\jre\bin\java.exe 1908 6864
Form Name |
---|
A fix for this issue is available to Server and Data Center customers in Confluence 7.13.0
Upgrade now or check out the Release Notes to see what other issues are resolved.