Confluence Data Center cluster panics on new install when adding a second node

XMLWordPrintable

    • 14
    • Severity 3 - Minor
    • 32

      Issue Summary

      When deploying a new instance of a clustered Confluence Data Center with TCP/IP discovery, the second node startup cannot join the Cluster and results with one of the nodes encountering a Cluster panic.

      This is reproducible on Data Center: Yes

      Steps to Reproduce

      1. Install Confluence Data Center node with Clustered TCP/IP discover
        • Specify the two IP addresses of both the the nodes on the Clustering setup screen
      2. Complete the installation on this first node
      3. Check that it is working as expected and also check Confluence Administration » General Configuration » Clustering screen looks good
      4. Deploy a second Confluence Data Center node, copying across the confluence.cfg.xml from Node 1
      5. Start the second node

      Expected Results

      The second node should seamlessly join the existing Confluence DC Cluster.

      Actual Results

      The second node is unable to join the Confluence DC Cluster with the following shown in the logs:

      Node 1 (192.168.56.30)

      atlassian-confluence.log / com.hazelcast DEBUG
      2022-11-08 04:49:50,439 DEBUG [hz.confluence.IO.thread-Acceptor] [hazelcast.nio.tcp.TcpIpAcceptor] log [192.168.56.30]:5801 [testcluster] [3.12.11] Accepting socket connection from /192.168.56.50:58123
      2022-11-08 04:49:50,461 DEBUG [hz.confluence.cached.thread-4] [hazelcast.nio.tcp.TcpIpEndpointManager] log [192.168.56.30]:5801 [testcluster] [3.12.11] Established socket connection between /192.168.56.30:5801 and /192.168.56.50:58123
      2022-11-08 04:49:50,472 WARN [hz.confluence.IO.thread-in-2] [hazelcast.nio.tcp.TcpIpConnection] log [192.168.56.30]:5801 [testcluster] [3.12.11] Connection[id=1, /192.168.56.30:5801->/192.168.56.50:58123, qualifier=null, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=1, /192.168.56.30:5801->/192.168.56.50:58123, qualifier=null, endpoint=null, alive=true, type=NONE], thread=hz.confluence.IO.thread-in-2
       -- referer: http://vm30:8090/setup/setupcluster-start.action | url: /setup/setupcluster.action | traceId: 0072a0e2e77a368c | userName: anonymous | action: setupcluster
      java.lang.IllegalStateException: Unknown protocol: 
      	at com.hazelcast.nio.tcp.UnifiedProtocolDecoder.onRead(UnifiedProtocolDecoder.java:107)
      	at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:135)
      	at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:369)
      	at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:354)
      	at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:280)
      	at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:235)
      

      shortly later:

      ...
      2022-11-08 04:51:01,077 DEBUG [hz.confluence.event-5] [hazelcast.internal.cluster.ClusterService] log [192.168.56.30]:5801 [testcluster] [3.12.11] Removed connection to [192.168.56.50]:5801
      2022-11-08 04:51:04,490 WARN [Caesium-1-2] [confluence.cluster.hazelcast.HazelcastClusterSafetyManager] onNumbersAreDifferent Detected different number in database [ 13132033 ] and cache [ 2249473 ]. Cache number last set by [ 545ee60c ]
       -- referer: http://vm30:8090/setup/setupcluster-start.action | url: /setup/setupcluster.action | traceId: 0072a0e2e77a368c | userName: anonymous | action: setupcluster
      2022-11-08 04:51:04,491 WARN [Caesium-1-2] [confluence.cluster.hazelcast.HazelcastClusterSafetyManager] onNumbersAreDifferent Triggering panic on current node. This cluster size [ 1 ], cluster size from db: [ 1 ]
      ...
      

      Node 2 (192.168.56.50)

      2022-11-08 04:49:48,166 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.19.2 [build 8804 based on commit hash 4a8abee6f65948bdad19f90ec8049c38342c2851] - synchrony version 5.0.0
      2022-11-08 04:49:49,329 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
      2022-11-08 04:49:49,648 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 192.168.56.30|192.168.56.50, network interfaces [192.168.56.50, fe80:0:0:0:20c:29ff:fe1b:7c72%ens192] and local port 5801
      2022-11-08 04:49:49,649 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
      2022-11-08 04:49:49,653 WARN [Catalina-utility-1] [com.hazelcast.instance.HazelcastInstanceFactory] log Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
       --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
      2022-11-08 04:49:49,902 WARN [Catalina-utility-1] [com.hazelcast.nio.UnsafeHelper] log sun.misc.Unsafe has been disabled because your platform does not support unaligned access to memory, some features might be not available.
      2022-11-08 04:49:50,480 INFO [hz.confluence.event-5] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] memberAdded [192.168.56.50]:5801 joined the cluster
      2022-11-08 04:49:50,484 INFO [hz.confluence.event-5] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] logClusterMembers Cluster now has 1 members: [[192.168.56.50]:5801]
      2022-11-08 04:49:50,538 WARN [hz.confluence.cached.thread-3] [hazelcast.interceptor.authenticator.DefaultClusterJoinManager] checkNodeAuthenticationEnabled CONNECT(192.168.56.50:58123 -> 192.168.56.30:5801): Node authentication failed: Unexpected bytes from remote node, closing socket 
      2022-11-08 04:49:51,560 WARN [Catalina-utility-1] [atlassian.hazelcast.micrometer.JmxBinder] bind No objects found for pattern com.hazelcast:type=HazelcastInstance.ConnectionManager,*
      2022-11-08 04:49:51,560 WARN [Catalina-utility-1] [atlassian.hazelcast.micrometer.JmxBinder] bind No objects found for pattern com.hazelcast:type=HazelcastInstance.EventService,*
      2022-11-08 04:49:51,560 WARN [Catalina-utility-1] [atlassian.hazelcast.micrometer.JmxBinder] bind No objects found for pattern com.hazelcast:type=HazelcastInstance.OperationService,*
      2022-11-08 04:49:51,561 WARN [Catalina-utility-1] [atlassian.hazelcast.micrometer.JmxBinder] bind No objects found for pattern com.hazelcast:type=HazelcastInstance.PartitionServiceMBean,*
      2022-11-08 04:49:51,561 WARN [Catalina-utility-1] [atlassian.hazelcast.micrometer.JmxBinder] bind No objects found for pattern com.hazelcast:type=HazelcastInstance.ManagedExecutorService,*
      2022-11-08 04:49:51,561 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [545ee878]
      2022-11-08 04:49:51,594 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext: initialization started
      2022-11-08 04:49:53,136 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] <init> Loading EhCache cache manager
      2022-11-08 04:49:58,300 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster ClusterService{address=[192.168.56.50]:5801}
      2022-11-08 04:49:58,300 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster contains Member [192.168.56.50]:5801 - 4400fea0-6163-4272-9eb5-91b3b420772c this
      ...
      ..
      2022-11-08 04:50:43,774 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] init Confluence is ready to serve
      2022-11-08 04:51:01,126 WARN [hz.confluence.cached.thread-5] [hazelcast.interceptor.authenticator.ParanoidObjectDataInputStream] readUTF Rejecting request to read 1213874948 UTF characters
      2022-11-08 04:51:01,127 WARN [hz.confluence.cached.thread-5] [hazelcast.interceptor.authenticator.DefaultClusterJoinManager] checkNodeAuthenticationEnabled ACCEPT(192.168.56.50:5801 <- 192.168.56.30:57215): Node authentication failed: Unexpected bytes from remote node, closing socket 
      2022-11-08 04:51:01,128 WARN [hz.confluence.cached.thread-5] [hazelcast.nio.tcp.TcpIpAcceptor] log [192.168.56.50]:5801 [testcluster] [3.12.11] com.atlassian.confluence.impl.cluster.hazelcast.interceptor.authenticator.NodeConnectionException: Unexpected bytes from remote node, closing socket
      com.atlassian.confluence.impl.cluster.hazelcast.interceptor.authenticator.NodeConnectionException: Unexpected bytes from remote node, closing socket
      	at com.atlassian.confluence.impl.cluster.hazelcast.interceptor.authenticator.DefaultClusterJoinManager.checkNodeAuthenticationEnabled(DefaultClusterJoinManager.java:68)
      	at com.atlassian.confluence.impl.cluster.hazelcast.interceptor.authenticator.DefaultClusterJoinManager.accept(DefaultClusterJoinManager.java:47)
      	at com.atlassian.confluence.impl.cluster.hazelcast.interceptor.ClusterJoinSocketInterceptor.onAccept(ClusterJoinSocketInterceptor.java:49)
      	at com.hazelcast.nio.NodeIOService.interceptSocket(NodeIOService.java:300)
      	at com.hazelcast.nio.tcp.TcpIpAcceptor$AcceptorIOThread.configureAndAssignSocket(TcpIpAcceptor.java:316)
      	at com.hazelcast.nio.tcp.TcpIpAcceptor$AcceptorIOThread.access$1400(TcpIpAcceptor.java:138)
      	at com.hazelcast.nio.tcp.TcpIpAcceptor$AcceptorIOThread$1.run(TcpIpAcceptor.java:305)
      	at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at java.base/java.lang.Thread.run(Thread.java:829)
      	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64)
      	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80)
      
      • Node 2 also doesn't appear on Node 1's Clustering screen

      Workaround

      A restart of Confluence DC Node 1 after the initial install has completed is required before starting up node 2. i.e.

      1. Deploy Confluence DC Node 1
      2. Once it is up and running, shut it down, start it up again
      3. Then start Confluence DC Node 2

            Assignee:
            Arun Antony
            Reporter:
            Eric L
            Votes:
            1 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: