Confluence Data Center / CONFSERVER-21986

Database deadlock during initialisation of plugins

    Description

      This issue is also described in http://confluence.atlassian.com/display/CONFKB/Unable+to+Start+Confluence+or+Some+Plugins+are+Disabled+upon+Startup

      This issue can affect Confluence installed on SQL Server, Oracle, or MySQL, but it most commonly affects SQL Server.

      Confluence may fail to start because of database deadlocks on the bandana table, caused by multi-threaded Spring context initialisation. The sequence appears to be:

      • The main plugin thread looks up the general state of the plugin system in bandana, thereby locking the whole bandana table.
      • That thread spawns separate threads, each in its own transaction, to initialise the Spring contexts of individual plugins.
      • Some plugins try to look up their own settings in the bandana table during that initialisation, but must wait for the main plugin thread to release the lock.
      • Eventually the initialisation of the plugin times out, with a log entry like this:
        2011-02-28 14:49:29,366 ERROR [Thread-1] [internal.util.concurrent.RunnableTimedExecution] execute Closing runnable for context NonValidatingOsgiBundleXmlApplicationContext(bundle=com.atlassian.confluence.extra.officeconnector, config=osgibundle:/META-INF/spring/*.xml) did not finish in 10000ms; consider taking a snapshot and then shutdown the VM in case the thread still hangs
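
The wait described above can be modelled in a few lines. This is a toy sketch in Python, not Confluence code: a `threading.Lock` stands in for the lock on the bandana table, and the function name `simulate_startup`, the plugin names, and the shortened timeout are all hypothetical.

```python
import threading


def simulate_startup(plugin_names, init_timeout=0.2):
    """Toy model: the main thread holds the table lock while it waits for plugins."""
    table_lock = threading.Lock()  # stands in for the lock on the bandana table
    initialised = []               # plugins that eventually read their settings
    timed_out = []                 # plugins that missed the start-up deadline

    def init_plugin(name):
        # Each plugin thread needs the same lock to read its own settings.
        with table_lock:
            initialised.append(name)

    with table_lock:  # main plugin thread has read the plugin state; lock still held
        threads = [threading.Thread(target=init_plugin, args=(n,)) for n in plugin_names]
        for t in threads:
            t.start()
        for name, t in zip(plugin_names, threads):
            t.join(timeout=init_timeout)  # analogous to the 10000ms limit in the log
            if t.is_alive():
                timed_out.append(name)

    # Only once the main thread lets go of the lock can the plugin threads proceed.
    for t in threads:
        t.join()
    return timed_out, sorted(initialised)
```

In this model every plugin misses the deadline, yet each one completes as soon as the lock is released, which mirrors the symptom of plugins timing out during start-up rather than failing outright.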
        

      As a result, Confluence may be completely unusable, or only the affected plugins may fail to work.

      Patch

      We strongly recommend upgrading to Confluence 3.5.9 or newer, which contains a proper fix for this issue. If upgrading is not an option, you may follow the steps below to patch your version of Confluence. The patch is unsupported and has only been tested against Confluence 3.3, but it should also work for other versions up to 3.5.7.

      First, shut down Confluence.

      Then make the following changes to the bandana table:

      • Remove duplicate rows.
      • Remove or update any rows with a null bandanacontext value.
      • Remove or update any rows with a null bandanakey value.
      • Alter the bandanacontext and bandanakey columns and set them to not null.
      • Add a unique constraint to the bandana table on the bandanacontext and bandanakey columns.
      • Drop the band_key_idx index.
      • Create a new index called band_cont_key_idx on the bandanacontext and bandanakey columns.
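
Before changing the table, it can help to see how many rows the clean-up steps will touch. The following is a minimal sketch, assuming a `sqlite3`-style connection object and demonstrated against an in-memory SQLite table rather than a real Confluence database. The function name `bandana_precheck` and the sample rows are hypothetical, and the queries have not been tested on every database named in this issue.

```python
import sqlite3


def bandana_precheck(conn):
    """Count the rows that the clean-up steps would need to touch."""
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        "null_context": scalar(
            "SELECT COUNT(*) FROM BANDANA WHERE BANDANACONTEXT IS NULL"),
        "null_key": scalar(
            "SELECT COUNT(*) FROM BANDANA WHERE BANDANAKEY IS NULL"),
        # Number of (context, key) pairs that occur more than once.
        "duplicate_groups": scalar(
            "SELECT COUNT(*) FROM ("
            " SELECT 1 FROM BANDANA"
            " WHERE BANDANACONTEXT IS NOT NULL AND BANDANAKEY IS NOT NULL"
            " GROUP BY BANDANACONTEXT, BANDANAKEY HAVING COUNT(*) > 1"
            ") AS dup"),
    }


if __name__ == "__main__":
    # Toy table with only the three columns this issue mentions.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE BANDANA (BANDANAID INTEGER PRIMARY KEY,"
                 " BANDANACONTEXT TEXT, BANDANAKEY TEXT)")
    demo.executemany("INSERT INTO BANDANA VALUES (?, ?, ?)",
                     [(1, "_GLOBAL", "a"), (2, "_GLOBAL", "a"), (3, None, "b")])
    print(bandana_precheck(demo))
```

If all three counts are zero, the first three steps in the list are already satisfied and only the constraint and index changes remain.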

      If running PostgreSQL, you may run the following:

      -- Find duplicate rows (this query only lists them; delete the extras so that one of each remains)
      SELECT b1.* 
      FROM BANDANA b1, BANDANA b2 
      WHERE b1.BANDANAID <> b2.BANDANAID AND 
      b1.BANDANACONTEXT = b2.BANDANACONTEXT AND 
      b1.BANDANAKEY = b2.BANDANAKEY 
      ORDER BY b1.BANDANACONTEXT, b1.BANDANAKEY, b1.BANDANAID DESC;
      -- Remove null values
      update BANDANA set BANDANACONTEXT = 'CONF22428-NULL-BACKUP' where BANDANACONTEXT is null;
      update BANDANA set BANDANAKEY = 'CONF22428-NULL-BACKUP' where BANDANAKEY is null;
      -- Add not null constraints
      alter table BANDANA alter column BANDANACONTEXT set not null;
      alter table BANDANA alter column BANDANAKEY set not null;
      -- Add unique constraint
      alter table BANDANA add constraint bandana_unique_key unique (BANDANACONTEXT, BANDANAKEY);
      -- Optimise indexes
      drop index band_key_idx;
      create index band_cont_key_idx on BANDANA (BANDANACONTEXT, BANDANAKEY);
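
One way to carry out the duplicate-removal step, which must be done before the unique constraint is added, is a single DELETE that keeps one row per (BANDANACONTEXT, BANDANAKEY) pair. This is a sketch and not part of the original patch. Keeping the row with the highest BANDANAID is an assumption, so check which copy holds the value you want before deleting anything. The function name `remove_duplicates` is hypothetical, and the statement is demonstrated against in-memory SQLite rather than PostgreSQL. Back up the table first.

```python
import sqlite3

# Rows with a NULL context or key are left alone here; the separate
# "remove null values" step deals with those.
DELETE_DUPLICATES = """
DELETE FROM BANDANA
WHERE BANDANACONTEXT IS NOT NULL
  AND BANDANAKEY IS NOT NULL
  AND BANDANAID NOT IN (
      SELECT MAX(BANDANAID)
      FROM BANDANA
      WHERE BANDANACONTEXT IS NOT NULL AND BANDANAKEY IS NOT NULL
      GROUP BY BANDANACONTEXT, BANDANAKEY
  )
"""


def remove_duplicates(conn):
    """Keep the highest-BANDANAID row of each duplicate group; return rows deleted."""
    cursor = conn.execute(DELETE_DUPLICATES)
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    # Toy table with only the three columns this issue mentions.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE BANDANA (BANDANAID INTEGER PRIMARY KEY,"
                 " BANDANACONTEXT TEXT, BANDANAKEY TEXT)")
    demo.executemany("INSERT INTO BANDANA VALUES (?, ?, ?)",
                     [(1, "_GLOBAL", "a"), (2, "_GLOBAL", "a"), (3, "_GLOBAL", "b")])
    print(remove_duplicates(demo), "duplicate row(s) removed")
```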
      
      Warning

      After altering the bandana table as mentioned above, you may have problems upgrading to Confluence 3.5.9 and 3.5.11. We recommend you upgrade to 3.5.12 or above instead.

      Next, extract the attached patch zip file to your confluence/WEB-INF/classes directory. Ensure that the following files are created:

      • com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$3.class
      • com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$2.class
      • com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao.class
      • com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$1.class
      • com/atlassian/confluence/setup/bandana/ConfluenceCachingBandanaPersister.class
      • com/atlassian/confluence/setup/bandana/ConfluenceBandanaRecord.hbm.xml
      • com/atlassian/confluence/setup/bandana/ConfluenceCachingBandanaPersister$EmptyCachePlaceholder.class
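
The extraction can be checked with a short script. This is a sketch in Python; the function name `missing_patch_files` is hypothetical, and the file list is copied from the bullets above.

```python
from pathlib import Path

# Relative paths expected under confluence/WEB-INF/classes after extraction.
PATCH_FILES = [
    "com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$3.class",
    "com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$2.class",
    "com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao.class",
    "com/atlassian/confluence/setup/bandana/persistence/dao/hibernate/HibernateConfluenceBandanaRecordDao$1.class",
    "com/atlassian/confluence/setup/bandana/ConfluenceCachingBandanaPersister.class",
    "com/atlassian/confluence/setup/bandana/ConfluenceBandanaRecord.hbm.xml",
    "com/atlassian/confluence/setup/bandana/ConfluenceCachingBandanaPersister$EmptyCachePlaceholder.class",
]


def missing_patch_files(classes_dir):
    """Return the expected patch files that are absent under classes_dir."""
    root = Path(classes_dir)
    return [rel for rel in PATCH_FILES if not (root / rel).is_file()]
```

Calling `missing_patch_files("confluence/WEB-INF/classes")` from the installation directory returns an empty list when every file is in place.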

      The patch is now applied and you can start Confluence again.

      Known workaround for SQL Server

      Setting the isolation level of the database as follows is known to fix the issue:

      ALTER DATABASE <database name>
         SET READ_COMMITTED_SNAPSHOT ON
         WITH ROLLBACK IMMEDIATE;
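
To confirm that the setting took effect, SQL Server exposes the `is_read_committed_snapshot_on` column in `sys.databases`. The sketch below only builds the two T-SQL strings for a given database name, bracket-quoting the name the way `QUOTENAME` does. The helper names `quote_identifier` and `rcsi_statements` are hypothetical, and running the statements is left to whichever SQL client you use.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def rcsi_statements(database_name):
    """Return (alter_statement, verification_query) for the given database."""
    alter = (
        f"ALTER DATABASE {quote_identifier(database_name)} "
        "SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;"
    )
    # Escape single quotes because the name is used as a string literal here.
    literal = database_name.replace("'", "''")
    verify = (
        "SELECT is_read_committed_snapshot_on FROM sys.databases "
        f"WHERE name = N'{literal}';"
    )
    return alter, verify
```

The verification query returns 1 once read committed snapshot isolation is enabled for that database.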
      

      There may be another fix, such as making the plugin system close its transaction at some stage, or adding an index to the bandana table (a lack of appropriate indices in SQL Server has caused similar issues before).

            People

              Assignee: Unassigned
              Reporter: Don Willis (don.willis@atlassian.com)