[CONFSERVER-58260] Synchrony fails to reestablish connection after a fail over when configured with a clustered database

Type: Bug
Resolution: Unresolved
Priority: Low
Fix Version/s: None
Affects Version/s: 6.15.4
Component/s: Editor - Synchrony
Labels:
- scale-team

Support reference count: 5
Symptom Severity: Severity 2 - Major
UIS: 2

Issue Summary

When Confluence runs against a clustered database and a failover is performed, Synchrony fails to reestablish its connection to the database after the database engine rejoins.

All Synchrony transactions fail after the failover completes.

Environment

Verified on Aurora and RDS Postgres with external Synchrony.

Steps to Reproduce

1. Run Confluence with external Synchrony on an RDS Postgres database
2. Open the collaborative editor in two sessions
3. Fail over RDS Postgres
4. Attempt to make changes in the collaborative editor and observe that the Synchrony connection cannot be established
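
Step 3 can be scripted rather than triggered from the console. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is used; the helper name `trigger_failover` and the identifiers are hypothetical and not part of the original report. It takes an RDS client as a parameter, so it does not create any AWS resources by itself.

```python
def trigger_failover(rds_client, identifier, aurora=False):
    """Ask RDS to fail over a database (illustrative helper, not from the report).

    rds_client: an RDS client, e.g. boto3.client("rds")
    identifier: the DB cluster identifier (Aurora) or DB instance identifier
    aurora:     True for an Aurora cluster, False for a Multi-AZ RDS instance
    """
    if aurora:
        # Aurora clusters expose a dedicated failover operation.
        return rds_client.failover_db_cluster(DBClusterIdentifier=identifier)
    # For a Multi-AZ instance, a reboot with ForceFailover switches to the standby.
    return rds_client.reboot_db_instance(
        DBInstanceIdentifier=identifier,
        ForceFailover=True,
    )
```

With boto3 installed and credentials configured, a call might look like `trigger_failover(boto3.client("rds"), "my-confluence-db")`, where the instance name is a placeholder.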

Expected Results

Synchrony automatically reestablishes the connection after a failover.

Actual Results

Synchrony does not automatically reestablish the connection after a failover.

Workaround

Restarting the Synchrony node will force the connection to be reestablished.


varun added a comment - 15/May/2019 5:26 AM - edited

Connection tuning seems to mitigate this problem. Setting reasonable values for the following properties (or environment variables) helps Synchrony reconnect to the database after a failover event:

Property	Environment Variable
synchrony.database.idle.connection.test.period	SYNCHRONY_DATABASE_IDLE_CONNECTION_TEST_PERIOD
synchrony.database.max.idle.time.excess.connections	SYNCHRONY_DATABASE_MAX_IDLE_TIME_EXCESS_CONNECTIONS
synchrony.database.max.idle.time	SYNCHRONY_DATABASE_MAX_IDLE_TIME
synchrony.database.test.connection.on.checkin	SYNCHRONY_DATABASE_TEST_CONNECTION_ON_CHECKIN

We need to test collaborative editing with different values for these settings, failing over each time, and then document the recommended settings.
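
The property and environment variable names in the table above follow a regular pattern (dots become underscores, letters become upper case). As a rough sketch of how a set of candidate values could be rendered in either form for a test run, the snippet below assumes the properties are passed to Synchrony as Java system properties (`-D` flags). The helper names are hypothetical, and the values are placeholders only, since recommended settings have not been determined in this issue.

```python
# Property names are copied from the table above.
# Values are PLACEHOLDERS for experimentation, not recommendations.
CANDIDATE_SETTINGS = {
    "synchrony.database.idle.connection.test.period": "30",
    "synchrony.database.max.idle.time.excess.connections": "60",
    "synchrony.database.max.idle.time": "300",
    "synchrony.database.test.connection.on.checkin": "true",
}


def to_env_var(prop: str) -> str:
    """Map a property name to the environment variable name used in the table."""
    return prop.replace(".", "_").upper()


def as_jvm_args(settings: dict) -> list:
    """Render settings as -D system property arguments (assumed launch style)."""
    return [f"-D{name}={value}" for name, value in settings.items()]


def as_env(settings: dict) -> dict:
    """Render settings as an environment variable mapping."""
    return {to_env_var(name): value for name, value in settings.items()}


if __name__ == "__main__":
    for arg in as_jvm_args(CANDIDATE_SETTINGS):
        print(arg)
    for key, value in as_env(CANDIDATE_SETTINGS).items():
        print(f"{key}={value}")
```

Keeping the candidate values in one mapping makes it easier to vary a single setting between failover tests and to record which combination was in effect.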

Adam Brokes added a comment - 10/May/2019 6:22 AM

After the failover finished, we could see 3 connections still using the reader Aurora node, and this exception was thrown in the browser console when trying to edit collaboratively:

11:06:50.741 VM219:1251 warn reinit Caught BatchUpdateException for insert into "EVENTS" ("history", "rev", "partition", "sequence", "event") values (?, ?, ?, ?, ?)
Error: Caught BatchUpdateException for insert into "EVENTS" ("history", "rev", "partition", "sequence", "event") values (?, ?, ?, ?, ?)
 at new Xj (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-CDN/en_US/7901/5add2ffb254089f9b2b4da47cac4a1fe5d074b7a/1388f45a017d5e2ce90810891801350e/_/download/contextbatch/js/_super/batch.js?locale=en-US:409:128), <anonymous>:489:26)
 at Yj (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-CDN/en_US/7901/5add2ffb254089f9b2b4da47cac4a1fe5d074b7a/1388f45a017d5e2ce90810891801350e/_/download/contextbatch/js/_super/batch.js?locale=en-US:409:128), <anonymous>:490:75)
 at eval (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-
....

Edit history: varun's comment of 15/May/2019 5:26 AM was edited by Ales Huzik - 20/May/2019 1:19 AM