  Confluence Data Center / CONFSERVER-58217

Synchrony websocket error: No HttpDecoder and no HttpServerCodec in the pipeline

      Issue Summary

      Confluence becomes unresponsive after all HTTP worker threads get stuck waiting for a response while trying to establish a WebSocket connection with Synchrony. The Synchrony process shows close to maximum CPU usage, and the top output indicates that many of its threads are long-running and consuming a lot of CPU.

      Environment

      • Confluence Server 6.15.2
      • Synchrony-proxy running in front of Synchrony

      Notes

      Synchrony logs show the following error:

      {
        "synchrony":{
          "message":"tilt.bayeux.server [error] websocket error",
          "ns":"tilt.bayeux.server",
          "throwable":"java.lang.IllegalStateException: No HttpDecoder and no HttpServerCodec in the pipeline
                  at io.netty.handler.codec.http.websocketx.WebSocketServerHandshaker.handshake(WebSocketServerHandshaker.java:181)
                  at aleph.http.server$initialize_websocket_handler$fn__15817.invoke(server.clj:560)
                  at aleph.http.server$initialize_websocket_handler.invokeStatic(server.clj:559)
                  at aleph.http$websocket_connection.invokeStatic(http.clj:176)
                  at tilt.bayeux.server$handle_websocket$fn__51728.invoke(server.clj:489)
                  at tilt.bayeux.server.(take?)(server.clj:489)
                  at tilt.bayeux.server$handle_websocket$fn__51728.invoke(server.clj:484)
                  at clojure.lang.AFn.run(AFn.java:22)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at java.lang.Thread.run(Thread.java:748)",
          "level":"error"
        }
      }
      

      Synchrony-proxy logs also indicate that something has gone wrong during the handshake:

      2019-05-01 04:58:28 ERROR c.a.s.p.w.WebSocketProxy:49 afterConnectionEstablished - Handshake error
      2019-05-01 04:58:29 ERROR c.a.s.p.w.WebSocketProxy:49 afterConnectionEstablished - Handshake error
      2019-05-01 04:58:30 ERROR c.a.s.p.w.WebSocketProxy:49 afterConnectionEstablished - Handshake error
      2019-05-01 04:58:31 ERROR c.a.s.p.w.WebSocketProxy:49 afterConnectionEstablished - Handshake error
      

      Confluence's catalina.out shows the WebSocket connection attempt to Synchrony timing out:

      01-May-2019 04:58:32.362 SEVERE [SimpleAsyncTaskExecutor-1] org.springframework.web.socket.client.WebSocketConnectionManager.onFailure Failed to connect
       javax.websocket.DeploymentException: The HTTP request to initiate the WebSocket connection failed
      	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:486)
      	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServer(WsWebSocketContainer.java:195)
      	at org.springframework.web.socket.client.standard.StandardWebSocketClient$1.call(StandardWebSocketClient.java:152)
      	at org.springframework.web.socket.client.standard.StandardWebSocketClient$1.call(StandardWebSocketClient.java:149)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.util.concurrent.TimeoutException
      	at sun.nio.ch.PendingFuture.get(PendingFuture.java:197)
      	at org.apache.tomcat.websocket.WsWebSocketContainer.processResponse(WsWebSocketContainer.java:803)
      	at org.apache.tomcat.websocket.WsWebSocketContainer.connectToServerRecursive(WsWebSocketContainer.java:348)
      	... 5 more
      

      Thread dumps show most of the HTTP threads waiting for the Synchrony proxy to obtain a Synchrony session:

      at sun.misc.Unsafe.park(Native Method) 
      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
      at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695) 
      at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) 
      at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775) 
      at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) 
      at com.atlassian.synchrony.proxy.websocket.WebSocketUpstreamHandler.getSynchronySession(WebSocketUpstreamHandler.java:64)
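
      To confirm this is the failure mode being hit, capture the per-thread CPU usage of the Synchrony process and a few Confluence thread dumps while the problem is occurring. A rough capture sketch is below; the PIDs are placeholders, and jstack assumes a JDK is available on the host:

      # Per-thread CPU usage of the Synchrony JVM (substitute the real Synchrony PID)
      top -b -H -n 1 -p <synchrony-pid> > synchrony_top_threads.txt

      # A few Confluence thread dumps, to look for HTTP threads stuck in
      # WebSocketUpstreamHandler.getSynchronySession as shown above
      for i in 1 2 3; do
          jstack <confluence-pid> > confluence_threaddump_$i.txt
          sleep 10
      done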
      

      Workaround

      Option 1

      Until the bug is fixed, a temporary workaround is to switch Synchrony into "Limited" mode.

      Follow the steps below to perform the workaround (if Confluence is unresponsive, restart it before proceeding).

      1. Go to Administration (cog icon) > General Configuration > Collaborative editing.
      2. Choose Change mode.
      3. Select "Limited" and choose Change.

       Synchrony can also be changed to "Off" mode, but any shared drafts will become unavailable while in "Off" mode. 

      Option 2

      If you want to keep collaborative editing enabled, point your load balancer or reverse proxy directly to Synchrony instead of going through the internal Synchrony proxy. More details are in the Confluence documentation on possible Confluence and Synchrony configurations.
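
      For example, with nginx terminating client connections, a minimal sketch of a location block that sends editor traffic straight to Synchrony could look like the following. It assumes Synchrony is reachable on its default port 8091 on the Confluence host and is served under the /synchrony context path; adjust hosts, ports, and paths for your deployment:

      # Route /synchrony directly to the Synchrony process, bypassing the internal synchrony-proxy
      location /synchrony {
          proxy_set_header X-Forwarded-Host $host;
          proxy_set_header X-Forwarded-Server $host;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_pass http://localhost:8091/synchrony;

          # WebSocket upgrade headers so editor connections are not forced onto the XHR fallback
          proxy_http_version 1.1;
          proxy_set_header Upgrade $http_upgrade;
          proxy_set_header Connection "Upgrade";
      }

      When bypassing the internal proxy this way, Confluence also needs to be told where Synchrony is reachable (on Confluence 6.x this is typically done by setting the synchrony.service.url system property to the externally reachable /synchrony/v1 URL); otherwise the editor keeps trying to reach Synchrony through the internal proxy.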

      Comments

            zhihe an added a comment -

            Same problem in Confluence 6.8.1.

            smile smile added a comment -

            Confluence 6.15.8 Server is also affected.

            Sameer V added a comment -

            Confluence 6.13.4 is also affected by this issue.


            Ales Huzik (Inactive) added a comment -

            We have upgraded Aleph in Confluence 7.0 and made a few other changes that might fix this issue. Please feel free to reopen this ticket if the issue still occurs for you in 7.0.

            Stella Chang added a comment -

            Hi Team,

            May we know if there's an ETA for this? We encounter this issue quite frequently these days, and our users keep complaining that it prevents them from doing their daily work in Confluence.

            Also, switching Synchrony into "Limited" mode is not a workaround for us. Our users are used to working on a single page with teammates at the same time.

            Please kindly raise the priority of this bug and let us know the ETA for it. Thanks.

            Kelvin Huang added a comment -

            We are seeing the same issue on our Confluence server (6.15.2).

            Carl Ferlatte added a comment - edited

            We've had daily crashes because of this bug. The load goes high, rendering our instance unusable. The only way to bring it back is to restart Synchrony. Limited mode isn't really an option, since it defeats the purpose of collaborative editing. I hope you can fix this bug ASAP.

            I made a simple bash script that runs every minute, gets the PID of Synchrony from "confluence/etc/temp/synchrony.pid", checks the CPU load of that PID using "ps", and if it is above a threshold variable, sends a "kill -HUP <pid>" to restart Synchrony and emails me when it does. Works well so far.

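
            For reference, a minimal sketch of the watchdog approach described in the comment above, intended to run from cron every minute. The PID file path is the one given in the comment, while the threshold value, notification address, and use of "mail" are assumptions that would need to be adapted to the actual installation:

            #!/bin/bash
            # Watchdog sketch for the approach described in the comment above.
            PID_FILE="confluence/etc/temp/synchrony.pid"   # path from the comment; adjust to your install
            THRESHOLD=90                                   # restart when Synchrony CPU usage (%) exceeds this (assumed value)
            ADMIN_EMAIL="admin@example.com"                # hypothetical notification address; assumes a working "mail" command

            pid=$(cat "$PID_FILE" 2>/dev/null) || exit 0   # nothing to do if the PID file is missing
            cpu=$(ps -p "$pid" -o %cpu= | awk '{printf "%d", $1}')   # integer CPU usage of the Synchrony process

            if [ -n "$cpu" ] && [ "$cpu" -gt "$THRESHOLD" ]; then
                kill -HUP "$pid"                           # signal Synchrony so it gets restarted, as the comment describes
                echo "Synchrony (pid $pid) was at ${cpu}% CPU and was sent SIGHUP at $(date)" \
                    | mail -s "Synchrony restarted on $(hostname)" "$ADMIN_EMAIL"
            fi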

            Dheerajkumar Patel added a comment -

            Same problem in Confluence 6.7.1.

            Grant A. Kirkman added a comment -

            We have the issue on 6.15.1.

            Rilwan_Ahmed_NC added a comment -

            Same problem in Confluence 6.11.2.

              Assignee: Ales Huzik (Inactive)
              Reporter: Vivian Hu (Inactive)
              Affected customers: 18
              Watchers: 40
