CONFSERVER-78321 (Confluence Data Center)

Confluence node can join the cluster even if not listed as a cluster peer


      Problem

      When a new node that is not listed as a peer on the existing nodes is started, it is added to the cluster and no cluster panic is triggered. This behavior contradicts the documentation, which implies that the peers list must be updated on all nodes:

      If the discovery mode is set to TCP/IP, you’ll need to update the confluence.cluster.peers property in the confluence.cfg.xml file for each node so the file lists all nodes in your cluster:

      Environment

      • Confluence DC clustered
      • 2 or more nodes

      Steps to Reproduce

      1. Create a three-node cluster with all members listed as peers:
        <property name="confluence.cluster.peers">10.232.39.186,10.232.39.128,10.232.39.129</property>
        
      2. Stop all the nodes and remove the IP of one of the servers from the peers list of all members:
        <property name="confluence.cluster.peers">10.232.39.186,10.232.39.128</property>
        
      3. Start the nodes one by one, keeping the removed member as the last one to be started

      Expected Results

      The nodes listed as peers are able to join the cluster. The node removed from the list cannot join the cluster unless the list of peers is updated as described in the documentation.

      Actual Results

      The member excluded from the list is able to join the cluster without issues. From the example above:

      2022-04-07 14:33:48,213 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAdded memberAdded: Member [10.232.39.129]:5801 - 805735de-b9ca-4b8d-b308-349c8299fba6
      2022-04-07 14:33:48,213 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAdded memberAdded: cluster contains Member [10.232.39.186]:5801 - 4c87772f-6976-4f8d-bd7b-e2fb062b83c9
      2022-04-07 14:33:48,213 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAdded memberAdded: cluster contains Member [10.232.39.128]:5801 - ef71849c-bb29-49b0-9cf1-1ba8b734c882 this
      2022-04-07 14:33:48,213 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAdded memberAdded: cluster contains Member [10.232.39.129]:5801 - 805735de-b9ca-4b8d-b308-349c8299fba6
      2022-04-07 14:33:48,215 INFO [hz.confluence.event-5] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] memberAdded [10.232.39.129]:5801 joined the cluster
      2022-04-07 14:33:48,215 INFO [hz.confluence.event-5] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] logClusterMembers Cluster now has 3 members: [[10.232.39.128]:5801, [10.232.39.129]:5801, [10.232.39.186]:5801]
      2022-04-07 14:33:48,347 INFO [alert-dispatch:thread-1] [atlassian-monitor] log 2022-04-07T14:33:48.219Z Component 'Hazelcast' alerted 'Node joined the cluster' (details: {"member":"Member [10.232.39.129]:5801 - 805735de-b9ca-4b8d-b308-349c8299fba6"}, trigger: {"pluginKey": "not-detected"})
      

      The TCP/IP join configuration printed in the logs confirms that member 10.232.39.129 is not on the list:

      configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.232.39.186|10.232.39.128, network interfaces [10.232.39.128, fe80:0:0:0:c02:9eff:fed7:a817%eth0] and local port 5801
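
      The comparison above can also be done mechanically. The following is a minimal sketch, not part of Confluence or Hazelcast, that extracts the configured peers and the actual members from the two log lines quoted in this report and lists any member that joined without being configured as a peer. The function names are hypothetical, and the regular expressions assume the log format shown here.

```python
import re

# Log lines copied from this report (timestamps and logger names trimmed).
JOIN_CONFIG_LINE = (
    "configure Configuring Hazelcast with instanceName [confluence], "
    "join configuration TCP/IP member addresses: 10.232.39.186|10.232.39.128, "
    "network interfaces [10.232.39.128, fe80:0:0:0:c02:9eff:fed7:a817%eth0] "
    "and local port 5801"
)
MEMBERS_LINE = (
    "logClusterMembers Cluster now has 3 members: "
    "[[10.232.39.128]:5801, [10.232.39.129]:5801, [10.232.39.186]:5801]"
)


def configured_peers(line):
    """Return the peer IPs from the 'TCP/IP member addresses' log line."""
    # Assumption: addresses are '|'-separated and followed by a comma.
    match = re.search(r"TCP/IP member addresses: ([^,]+),", line)
    return set(match.group(1).split("|")) if match else set()


def cluster_members(line):
    """Return the member IPs from the 'Cluster now has N members' log line."""
    _, _, tail = line.partition("members:")
    # Assumption: members are printed as [IPv4]:port.
    return set(re.findall(r"\[(\d+(?:\.\d+){3})\]:\d+", tail))


def unlisted_members(config_line, members_line):
    """Members present in the cluster but absent from the peers list."""
    return sorted(cluster_members(members_line) - configured_peers(config_line))


print(unlisted_members(JOIN_CONFIG_LINE, MEMBERS_LINE))  # ['10.232.39.129']
```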
      

      We can also confirm the three members are part of the cluster on the Clustering management page.

      Workaround

      Edit the cluster name in confluence.cfg.xml to isolate nodes that should not be part of the same cluster:

          <property name="confluence.cluster.name">confcluster</property>
      

      When the cluster name does not match, the node cannot join the cluster and a panic is triggered on it, even if its peers configuration lists the current members.
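
      To check the workaround across nodes, the cluster name and peers list can be read from each node's configuration file. The sketch below is illustrative only: the property names come from this report, while the wrapper elements in the sample XML, the sample values, and the helper names are assumptions. Comparing cluster names is a simplification of the behavior described above, not a reimplementation of Hazelcast's join check.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down confluence.cfg.xml; only the two <property>
# entries are taken from this report, the surrounding elements are assumed.
SAMPLE_CFG = """<confluence-configuration>
  <properties>
    <property name="confluence.cluster.name">confcluster</property>
    <property name="confluence.cluster.peers">10.232.39.186,10.232.39.128</property>
  </properties>
</confluence-configuration>"""


def cluster_settings(cfg_xml):
    """Return (cluster name, list of peer IPs) from a config document."""
    root = ET.fromstring(cfg_xml)
    # Collect every <property name="..."> regardless of nesting depth.
    props = {p.get("name"): (p.text or "").strip() for p in root.iter("property")}
    raw_peers = props.get("confluence.cluster.peers", "")
    peers = [ip.strip() for ip in raw_peers.split(",") if ip.strip()]
    return props.get("confluence.cluster.name"), peers


def same_cluster(cfg_a, cfg_b):
    """True when both configs use the same cluster name."""
    return cluster_settings(cfg_a)[0] == cluster_settings(cfg_b)[0]


name, peers = cluster_settings(SAMPLE_CFG)
print(name, peers)

# A node given a different cluster name is isolated even with identical peers.
isolated_cfg = SAMPLE_CFG.replace("confcluster", "isolated-cluster")
print(same_cluster(SAMPLE_CFG, isolated_cfg))
```

      For a real node, the file contents would be read from disk and passed to `cluster_settings` in place of `SAMPLE_CFG`.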

      Notes

      The behavior is only reproducible when the node that is not listed as a member is started after the other nodes are already running. In addition, the new node must have the other members in its own list of peers in order to find the cluster.

      This matches what is described in the Hazelcast documentation, since the mechanism is intended for node discovery, not for control or security purposes:

      Note that all of the cluster members don't have to be listed there but at least one of them has to be active in cluster when a new member joins
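
      The discovery rule described in these notes can be expressed as a toy model. This is a simplified illustration of the observed behavior, not Hazelcast's actual join logic; the function name is hypothetical. The point it makes is that only the joining node's own peers list matters, and the peers lists of the running members are never consulted.

```python
def can_discover_cluster(new_node_peers, active_members):
    """Toy model: a starting node finds the cluster if at least one address
    in its own peers list belongs to a member that is already running."""
    return any(peer in active_members for peer in new_node_peers)


# Peers list shared by all nodes after step 2 (10.232.39.129 removed).
peers_list = ["10.232.39.186", "10.232.39.128"]

# Step 3: the removed node starts last, while both listed nodes are running.
print(can_discover_cluster(peers_list, {"10.232.39.186", "10.232.39.128"}))

# If no listed member is running yet, there is nothing to discover.
print(can_discover_cluster(peers_list, set()))
```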

      The Confluence documentation needs to be updated to state that updating the peers list is recommended but not mandatory.

      Assignee: Unassigned
      Reporter: Bernardo Andreeti (bandreeti)