
Synchrony fails to reestablish connection after a fail over when configured with a clustered database

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Low
    • Fix Version/s: None
    • Affects Version/s: 6.15.4
    • Component/s: Editor - Synchrony

      Issue Summary

      When Confluence runs against a clustered database and a failover is performed, Synchrony fails to reestablish its database connection after the database engine rejoins.

      All Synchrony transactions fail once the failover has completed.

      Environment

      Verified on Aurora and Postgres RDS running external Synchrony.

      Steps to Reproduce

      1. Run Confluence with external Synchrony on an RDS Postgres instance
      2. Open the collaborative editor in two sessions
      3. Fail over the RDS Postgres instance
      4. Attempt to make changes in the collaborative editor and observe that the Synchrony connection cannot be established

      Expected Results

      Synchrony automatically reestablishes the connection after a failover.

      Actual Results

      Synchrony does not automatically reestablish the connection after a failover.

      Workaround

      Restarting the Synchrony node will force the connection to be reestablished.

            [CONFSERVER-58260] Synchrony fails to reestablish connection after a fail over when configured with a clustered database

            varun added a comment (edited)

            Connection tuning seems to mitigate this problem. Setting reasonable values for the following properties (or their environment variables) helps Synchrony reconnect to the database after a failover event:

            Property                                              Environment Variable
            synchrony.database.idle.connection.test.period        SYNCHRONY_DATABASE_IDLE_CONNECTION_TEST_PERIOD
            synchrony.database.max.idle.time.excess.connections   SYNCHRONY_DATABASE_MAX_IDLE_TIME_EXCESS_CONNECTIONS
            synchrony.database.max.idle.time                      SYNCHRONY_DATABASE_MAX_IDLE_TIME
            synchrony.database.test.connection.on.checkin         SYNCHRONY_DATABASE_TEST_CONNECTION_ON_CHECKIN

            We still need to test collaborative editing with these settings at different values, failing over the database each time, and then document the recommended settings.
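The property and environment variable names listed in this comment differ only mechanically (dots become underscores and letters are upper-cased), so the environment for an external Synchrony process can be generated rather than typed by hand. The Python sketch below shows that mapping. The values are placeholders for illustration only, not tested recommendations, and this issue does not state their units; `to_env_var`, `build_env`, and `PLACEHOLDER_SETTINGS` are hypothetical names.

```python
# Placeholder values only: the issue does not state recommended settings or units.
PLACEHOLDER_SETTINGS = {
    "synchrony.database.idle.connection.test.period": "30",
    "synchrony.database.max.idle.time.excess.connections": "60",
    "synchrony.database.max.idle.time": "300",
    "synchrony.database.test.connection.on.checkin": "true",
}


def to_env_var(prop: str) -> str:
    """Convert a dotted property name to its environment variable form."""
    return prop.replace(".", "_").upper()


def build_env(settings: dict) -> dict:
    """Return a dict of environment variable names to values."""
    return {to_env_var(name): value for name, value in settings.items()}


if __name__ == "__main__":
    # Print shell export lines that could be reviewed before use.
    for name, value in build_env(PLACEHOLDER_SETTINGS).items():
        print(f"export {name}={value}")
```

Generating the names from the property list keeps the two forms in sync while different values are tried across failover tests.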


            Adam Brokes added a comment

            After the failover was finished, we were able to see 3 connections still using the reader Aurora node, and this exception was thrown in the browser console when trying to edit collaboratively:

            11:06:50.741 VM219:1251 warn reinit Caught BatchUpdateException for insert into "EVENTS" ("history", "rev", "partition", "sequence", "event") values (?, ?, ?, ?, ?)
            Error: Caught BatchUpdateException for insert into "EVENTS" ("history", "rev", "partition", "sequence", "event") values (?, ?, ?, ?, ?)
             at new Xj (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-CDN/en_US/7901/5add2ffb254089f9b2b4da47cac4a1fe5d074b7a/1388f45a017d5e2ce90810891801350e/_/download/contextbatch/js/_super/batch.js?locale=en-US:409:128), <anonymous>:489:26)
             at Yj (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-CDN/en_US/7901/5add2ffb254089f9b2b4da47cac4a1fe5d074b7a/1388f45a017d5e2ce90810891801350e/_/download/contextbatch/js/_super/batch.js?locale=en-US:409:128), <anonymous>:490:75)
             at eval (eval at <anonymous> (http://confl-loadb-3z05bq85m17r-1494529760.us-west-2.elb.amazonaws.com/s/2a828c72c2d24752e89e028afd5fe809-
            ....
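One way to read this observation is that connections opened before the failover stay in the pool and keep pointing at the demoted node until something validates or retires them. The toy Python model below illustrates that idea only. It is not Synchrony's connection pool, and `FakeConnection`, `ToyPool`, and the `max_idle_time` field are hypothetical stand-ins for the real settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeConnection:
    node: str         # node this connection was opened against
    last_used: float  # toy clock value when it was last handed out


@dataclass
class ToyPool:
    writer: str                            # current writer node
    max_idle_time: Optional[float] = None  # None means idle connections are never retired
    now: float = 0.0                       # toy clock, advanced by the caller
    conns: List[FakeConnection] = field(default_factory=list)

    def checkout(self) -> FakeConnection:
        # Retire connections that have been idle longer than max_idle_time.
        if self.max_idle_time is not None:
            self.conns = [c for c in self.conns
                          if self.now - c.last_used <= self.max_idle_time]
        # Open a fresh connection against the current writer if none remain.
        if not self.conns:
            self.conns.append(FakeConnection(self.writer, self.now))
        conn = self.conns[0]
        conn.last_used = self.now
        return conn

    def insert(self) -> str:
        conn = self.checkout()
        if conn.node != self.writer:
            # Mimics a write failing because it reached the demoted node.
            raise RuntimeError("write sent to read-only node " + conn.node)
        return "ok"
```

In this model, a pool with no idle limit keeps failing after `writer` changes, while a pool with `max_idle_time` set drops the stale connection on the next checkout and reconnects to the new writer.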

              Assignee: Unassigned
              Reporter: smadi Said
              Affected customers: 1
              Watchers: 5