Details
- Type: Bug
- Resolution: Fixed
- Priority: Medium
- Affects Version/s: 5.8.8, 5.8.16
- 9
- Severity: Severity 2 - Major
Description
The cluster safety mechanism is not working for 5.8.x Server editions of Confluence. This removes a safety net that prevents multiple Confluence instances from inadvertently updating the same database, which could lead to unexpected data corruption.
For example, this can happen when cloning the production environment to create a test environment, if the step of updating the database connection to point at the test database is missed. In this scenario, both the production and test Confluence instances would be updating the same database.
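The safety net described above can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, not Confluence's actual implementation. Each node periodically runs a job that compares the safety number it last wrote to the database against the value currently stored there; if another node has rewritten the number, two instances are sharing one database and the node should panic.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a cluster-safety job, modeled on the behaviour
// described above (one shared "safety number" row per database).
class ClusterSafetyJob {
    private Long cachedSafetyNumber;   // value this node last wrote, null before first run
    private final FakeDatabase db;     // stands in for the clustersafety table

    ClusterSafetyJob(FakeDatabase db) { this.db = db; }

    /** Returns true if this node must panic (another node owns the database). */
    boolean runCheck() {
        Long stored = db.readSafetyNumber();
        if (cachedSafetyNumber != null && stored != null
                && !stored.equals(cachedSafetyNumber)) {
            // Someone else rewrote the number since our last run:
            // two instances are updating the same database.
            return true;
        }
        long next = ThreadLocalRandom.current().nextLong();
        db.writeSafetyNumber(next);
        cachedSafetyNumber = next;
        return false;
    }
}

// Minimal stand-in for the single-row clustersafety table.
class FakeDatabase {
    private Long safetyNumber;
    synchronized Long readSafetyNumber() { return safetyNumber; }
    synchronized void writeSafetyNumber(long n) { safetyNumber = n; }
}
```

With two jobs sharing one FakeDatabase, the second node's write makes the first node's next check detect the mismatch and panic, which is exactly the behaviour that is missing in 5.8.x Server.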
Symptoms
- The clustersafety table contains no rows (expected behavior: contains 1 row)
- Multiple 5.8.x Server instances can connect to the same Confluence database and not fail (expected behavior: cluster panic)
Other notes
- Confluence 5.7 and below Server and Data Center editions have working cluster safety mechanisms, as expected
- Confluence 5.8.x Data Center edition has a working cluster safety mechanism as expected
- This affects 5.8.1 Server and above
Testing notes:
- Start 2 standalone Confluence nodes and point them at the same database
- Ensure that cluster safety job is scheduled and running
- Ensure that one node panics when cluster safety job runs
Cluster configuration check:
- Set up Confluence Data Center (CDC) locally (2 nodes)
- Ensure that safety job runs and nodes don't panic (you may want to set logging level to debug for com.atlassian.confluence.cluster.hazelcast.HazelcastClusterSafetyManager)
- Emulate a network outage between the nodes. I used the pfctl utility for that:
sudo pfctl -e
(sudo pfctl -sr; echo "block drop quick on lo0 proto tcp from any to any port = 5802") | sudo pfctl -f -
(sudo pfctl -sr; echo "block drop quick on lo0 proto tcp from any to any port = 5801") | sudo pfctl -f -
sudo pfctl -v -s rules
To clean up the filtering rules afterwards:
sudo pfctl -f /etc/pf.conf
- One node should panic when the safety job runs.
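The debug logging suggested above can be enabled with a log4j fragment along these lines. This assumes a stock log4j.properties-style configuration; the exact file name and location vary by Confluence installation, so treat this as a sketch rather than a drop-in change.

```properties
# Verbose output from the cluster safety manager while testing
log4j.logger.com.atlassian.confluence.cluster.hazelcast.HazelcastClusterSafetyManager=DEBUG
```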