Confluence Data Center / CONFSERVER-39396

Node rejoining cluster can cause cluster panic. Configure cluster safety cache to flush value on merge.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Fix Version/s: 5.8.5
    • Affects Version/s: 5.6, 5.7
    • None

      The scenario we encountered on EAC is:
      1. A node gets kicked from the cluster.
      2. The node then rejoins, and the IMap states are merged.
      3. The merging process gets the cluster safety number out of sync with the database.
      4. (boom) Cluster panic. Even worse, this can leave the IMap in a bad state in which all nodes panic.

      To avoid this, the cluster safety number map should clear its stored entry on a cluster merge. The next cluster safety check then runs fresh, without the (potentially) polluted value.

      For the same reason, this IMap should not have backups.
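
      The scenario and the proposed fix can be illustrated with a small toy model. This is only a sketch and not Confluence or Hazelcast code: the class, the key name, the two merge policies, and the numbers 100 and 90 are all hypothetical. It also assumes that the safety check panics when the cached number differs from the database, and that an empty cache is simply re-seeded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ClusterSafetyMergeSketch {
    // Hypothetical key under which the safety number is cached.
    public static final String KEY = "safetyNumber";

    // A merge policy receives (mergingValue, existingValue); returning null drops the entry.
    public static final BinaryOperator<Integer> MERGING_WINS = (merging, existing) -> merging;
    public static final BinaryOperator<Integer> DISCARD_ON_MERGE = (merging, existing) -> null;

    // Toy stand-in for the map merge that runs when a node rejoins the cluster.
    public static void merge(Map<String, Integer> clusterCache,
                             Map<String, Integer> rejoiningCache,
                             BinaryOperator<Integer> policy) {
        Integer merged = policy.apply(rejoiningCache.get(KEY), clusterCache.get(KEY));
        if (merged == null) {
            clusterCache.remove(KEY);
        } else {
            clusterCache.put(KEY, merged);
        }
    }

    // Toy safety check (assumed behaviour): a cached value that differs from the
    // database means panic; an empty cache is re-seeded and the check passes.
    public static boolean safetyCheckPasses(Map<String, Integer> clusterCache, int databaseNumber) {
        Integer cached = clusterCache.get(KEY);
        if (cached == null) {
            clusterCache.put(KEY, databaseNumber);
            return true;
        }
        return cached == databaseNumber;
    }

    // Replays the rejoin scenario under the given policy; false means cluster panic.
    public static boolean rejoinSurvives(BinaryOperator<Integer> policy) {
        int databaseNumber = 100;

        Map<String, Integer> clusterCache = new HashMap<>();
        clusterCache.put(KEY, 100); // in sync with the database

        Map<String, Integer> rejoiningCache = new HashMap<>();
        rejoiningCache.put(KEY, 90); // stale value held by the kicked node

        merge(clusterCache, rejoiningCache, policy);
        return safetyCheckPasses(clusterCache, databaseNumber);
    }

    public static void main(String[] args) {
        System.out.println("merging value wins: " + rejoinSurvives(MERGING_WINS));
        System.out.println("discard on merge:   " + rejoinSurvives(DISCARD_ON_MERGE));
    }
}
```

      In this model the stale value survives the merge under the first policy and the check fails, while the discarding policy leaves the cache empty so the next check starts clean. Keeping no backups follows the same reasoning: a backup copy is another place where a polluted value could be held.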


            Stephen Gramm added a comment - Since this has been an ongoing issue since May 2015, I feel it should have been made publicly available or at least available to customers who are trying to upgrade or install the Confluence Data Center versions. Jay Virgil and Chuck Talk have worked tirelessly to help us, and withholding this critical detail is not excusable.

              drizzuto David Rizzuto
              alwang Alice Wang (Inactive)