  1. Confluence Data Center
  2. CONFSERVER-60142

Changing distributed cache settings prevents Confluence cluster node restart due to a Hazelcast exception

Details

    Description

      Issue Summary

      If distributed cache sizes are modified, cluster nodes cannot be restarted without first shutting down the whole cluster. On startup, the restarted node tries to register a cache size that differs from the size the running nodes are using, and Hazelcast throws an exception when it detects the conflict.

      This only affects Confluence 7.0.1 Data Center onwards.

      Only caches marked as Distributed in the advanced view trigger this issue. Size changes to local or hybrid caches do not.

      Steps to Reproduce

      1. Start with all nodes in the cluster running
      2. Modify one of the distributed caches:
        • Confluence Administration >> General Configuration >> Cache Management >> Show Advanced View, find 'Content Properties', and change the size from 10000 to 10050.
      3. Shut down Confluence on one of the nodes
      4. Start Confluence on the node that was shut down. On startup, it attempts to configure the distributed cache with the new size of 10050 (specified in cache-settings-overrides.properties in the shared home directory). This does not match the size the other nodes are still using (10000), so the error shown under Actual Results occurs.
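
      The mismatch in step 4 can be illustrated with a minimal Python sketch. It assumes the overrides file holds simple key=value lines, which is a simplification of the Java properties format. The key name content.properties.cache is hypothetical, and the sizes in use on the running nodes are supplied by hand rather than read from the cluster.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_size_conflicts(overrides_text, sizes_in_use):
    """Return {cache: (size_in_use, override_size)} for every differing size."""
    conflicts = {}
    for cache, value in parse_properties(overrides_text).items():
        if not value.isdigit():
            continue  # ignore anything that is not a plain integer size
        in_use = sizes_in_use.get(cache)
        if in_use is not None and in_use != int(value):
            conflicts[cache] = (in_use, int(value))
    return conflicts


# Hypothetical key name; the real file may identify the cache differently.
overrides = "# edited via Cache Management\ncontent.properties.cache=10050\n"
running = {"content.properties.cache": 10000}
print(find_size_conflicts(overrides, running))
```

      Any cache reported here would be rejected by the running nodes when the restarted node tries to register it with the new size.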

      Expected Results

      As in versions before 7.0.1 (Hazelcast 3.8.6 and below), Confluence cluster nodes should start with no errors after cache tuning.

      Actual Results

      From 7.0.1 onwards (Hazelcast 3.11.4 and 3.11.6), cluster nodes will not restart after distributed cache sizes are modified. The following exception is logged in atlassian-confluence.log:

      com.hazelcast.config.ConfigurationException: Cannot add a dynamic configuration 'MapConfig{name='atlassian-cache.Cache.com.atlassian.confluence.user.ConfluenceUserPropertySetFactory.propertysets', inMemoryFormat=BINARY', backupCount=0, asyncBackupCount=0, timeToLiveSeconds=3600, maxIdleSeconds=3600, evictionPolicy='LFU', mapEvictionPolicy='null', evictionPercentage=25, minEvictionCheckMillis=100, maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=10050}, readBackupData=false, hotRestart=HotRestartConfig{enabled=false, fsync=false}, nearCacheConfig=NearCacheConfig{name=default, inMemoryFormat=OBJECT, invalidateOnChange=true, timeToLiveSeconds=3600, maxIdleSeconds=3600, maxSize=10050, evictionPolicy='LFU', evictionConfig=EvictionConfig{size=10050, maxSizePolicy=ENTRY_COUNT, evictionPolicy=LFU, comparatorClassName=null, comparator=null}, cacheLocalEntries=true, localUpdatePolicy=INVALIDATE, preloaderConfig=NearCachePreloaderConfig{enabled=false, directory=, storeInitialDelaySeconds=600, storeIntervalSeconds=600}}, mapStoreConfig=MapStoreConfig{enabled=false, className='null', factoryClassName='null', writeDelaySeconds=0, writeBatchSize=1, implementation=null, factoryImplementation=null, properties={}, initialLoadMode=LAZY, writeCoalescing=true}, mergePolicyConfig=MergePolicyConfig{policy='com.atlassian.confluence.cluster.hazelcast.AlwaysNullMapMergePolicy', batchSize=100}, wanReplicationRef=null, entryListenerConfigs=[], mapIndexConfigs=[], mapAttributeConfigs=[], quorumName=null, queryCacheConfigs=[], cacheDeserializedValues=INDEX_ONLY}' as there is already a conflicting configuration 'MapConfig{name='atlassian-cache.Cache.com.atlassian.confluence.user.ConfluenceUserPropertySetFactory.propertysets', inMemoryFormat=BINARY', backupCount=0, asyncBackupCount=0, timeToLiveSeconds=3600, maxIdleSeconds=3600, evictionPolicy='LFU', mapEvictionPolicy='null', evictionPercentage=25, minEvictionCheckMillis=100, maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=10000}, readBackupData=false, hotRestart=HotRestartConfig{enabled=false, fsync=false}, nearCacheConfig=NearCacheConfig{name=default, inMemoryFormat=OBJECT, invalidateOnChange=true, timeToLiveSeconds=3600, maxIdleSeconds=3600, maxSize=10000, evictionPolicy='LFU', evictionConfig=EvictionConfig{size=10000, maxSizePolicy=ENTRY_COUNT, evictionPolicy=LFU, comparatorClassName=null, comparator=null}, cacheLocalEntries=true, localUpdatePolicy=INVALIDATE, preloaderConfig=NearCachePreloaderConfig{enabled=false, directory=, storeInitialDelaySeconds=600, storeIntervalSeconds=600}}, mapStoreConfig=MapStoreConfig{enabled=false, className='null', factoryClassName='null', writeDelaySeconds=0, writeBatchSize=1, implementation=null, factoryImplementation=null, properties={}, initialLoadMode=LAZY, writeCoalescing=true}, mergePolicyConfig=MergePolicyConfig{policy='com.atlassian.confluence.cluster.hazelcast.AlwaysNullMapMergePolicy', batchSize=100}, wanReplicationRef=null, entryListenerConfigs=[], mapIndexConfigs=[], mapAttributeConfigs=[], quorumName=null, queryCacheConfigs=[], cacheDeserializedValues=INDEX_ONLY}'
      

      The actual 'name' in MapConfig will differ, depending on which cache value was adjusted.
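
      Because the two MapConfig dumps are long and differ only in their size fields, it can help to extract the relevant values from the log message. The following Python sketch is one way to do that. It assumes the message has the layout shown above, with the rejected configuration printed first and the existing one second. The sample message is abbreviated for illustration and is not real log output.

```python
import re

# Matches the top-level map name; NearCacheConfig{name=default is unquoted
# and has a different prefix, so it is not picked up.
NAME_RE = re.compile(r"MapConfig\{name='([^']+)'")
# \s* tolerates a line break between the policy and the size.
SIZE_RE = re.compile(r"MaxSizeConfig\{maxSizePolicy='[^']*',\s*size=(\d+)\}")


def summarize_conflict(message):
    """Return the cache name and the two conflicting sizes, or None."""
    names = NAME_RE.findall(message)
    sizes = [int(s) for s in SIZE_RE.findall(message)]
    if not names or len(sizes) < 2:
        return None
    return {"cache": names[0], "new_size": sizes[0], "existing_size": sizes[1]}


# Abbreviated sample: most MapConfig fields are omitted for readability.
sample = (
    "com.hazelcast.config.ConfigurationException: Cannot add a dynamic "
    "configuration 'MapConfig{name='example.cache', inMemoryFormat=BINARY', "
    "maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=10050}}' "
    "as there is already a conflicting configuration "
    "'MapConfig{name='example.cache', inMemoryFormat=BINARY', "
    "maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', \nsize=10000}}'"
)
print(summarize_conflict(sample))
```

      The reported cache name identifies which setting to revert if the first workaround is chosen.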

      Workaround

      Use one of the following workarounds to temporarily fix the problem:

      1. Change the modified cache setting back to its previous value.
      2. Shut down all cluster nodes, then restart them one by one.
      3. Add the JVM parameter '-Dhazelcast.dynamicconfig.ignore.conflicts=true' to the startup command of every cluster node (in setenv.sh or setenv.bat).
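
      Since the third workaround must be applied on every node, a quick check of each node's setenv file can catch one that was missed. The Python sketch below assumes the file contents are already available as text. The CATALINA_OPTS line in the example is an assumption about how the parameter is typically added and may differ in a given installation.

```python
FLAG = "-Dhazelcast.dynamicconfig.ignore.conflicts=true"


def sets_ignore_conflicts(setenv_text):
    """Return True if a non-comment line of the setenv file contains the flag."""
    for raw in setenv_text.splitlines():
        line = raw.strip()
        # Skip shell comments (#) and batch comments (rem, ::).
        if line.startswith(("#", "::")) or line.lower().startswith("rem "):
            continue
        if FLAG in line:
            return True
    return False


# Assumed example of a setenv.sh line; adjust to the actual file layout.
example = 'CATALINA_OPTS="-Dhazelcast.dynamicconfig.ignore.conflicts=true ${CATALINA_OPTS}"\n'
print(sets_ignore_conflicts(example))
```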

            People

              ttranminh Tam Tran
              mninnes@atlassian.com Malcolm Ninnes