Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-58181

Confluence cluster can become unresponsive when hazelcast must return information related to many labels

XMLWordPrintable

      Issue Summary

      Confluence can become unresponsive, waiting on hazelcast to return information relating to labels. This issue is only seen when using the Midori Better Archiving plugin on a Confluence cluster with ~70000 users.

      Environment

      Confluence 6.6.9 Data Center with 4 nodes
      Midori Better Archiving plugin v 8.1.0

      Steps to Reproduce

      The steps to reproduce are difficult to articulate, as the issue occurs while Confluence is under normal usage, albeit with a large number of users. jxie and vhu@atlassian.com have both looked at this previously, and they determined that when the issue is occurring, Hazelcast appears to be overwhelmed. It is believed that the plugin is overwhelming hazelcast with additional requests to the labels cache. Confluence already does a labels fetch on page load (as indicated by the threads similarly waiting on the hazelcast cache for labels information after the plugin had been disabled), but the midori plugin is at least doubling the number of requests to the cache for the same information by checking labels for every web item that might be displayed (the list provided to getDisplayableItems).

      It does seem like threads are being processed (moving from WAITING to RUNNABLE), but it's happening quite slowly, which could indicate Hazelcast is backed up or that some nodes holding label cache data are responsive while others aren't.

      Expected Results

      Confluence should remain stable while the Midori Better Archiving plugin is enabled, even when Confluence is under high load

      Actual Results

      Here are some sample threads that show the behavior we see when the issue is occurring:

      "http-nio-8090-exec-216 url:/display/EXAM/Page+Name username:charlie url:/pages/viewpage.action username:charlie" #84425 daemon prio=5 tid=0x00007effcc04a000 nid=0x1efee waiting on condition [0x00007efd98398000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
      	at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:153)
      	at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:376)
      	at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:307)
      	at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.getInternal(NearCachedMapProxyImpl.java:101)
      	at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:112)
      	at com.atlassian.cache.hazelcast.HazelcastCache.getOrLoad(HazelcastCache.java:243)
      	at com.atlassian.cache.hazelcast.HazelcastCache.get(HazelcastCache.java:95)
      	at com.atlassian.confluence.cache.DefaultConfluenceCache.get(DefaultConfluenceCache.java:47)
      	at com.atlassian.confluence.cache.ConfluenceMonitoringCache.get(ConfluenceMonitoringCache.java:45)
      	at com.atlassian.confluence.cache.DefaultConfluenceCache.get(DefaultConfluenceCache.java:47)
      	at com.atlassian.confluence.cache.ConfluenceMonitoringCache.get(ConfluenceMonitoringCache.java:45)
      	at com.atlassian.confluence.cache.hazelcast.hibernate.HazelcastHibernateCache.get(HazelcastHibernateCache.java:32)
      	at com.atlassian.hibernate.extras.tangosol.CoherenceCacheStrategy.get(CoherenceCacheStrategy.java:68)
      	at com.atlassian.confluence.cache.hibernate.SwitchableCacheConcurrencyStrategy.get(SwitchableCacheConcurrencyStrategy.java:29)
      	at com.atlassian.confluence.cache.hibernate.ConfluenceCacheStrategy.get(ConfluenceCacheStrategy.java:49)
      	at com.atlassian.hibernate.adapter.adapters.cache.EntityRegionAccessStrategyV5Adapter.get(EntityRegionAccessStrategyV5Adapter.java:51)
      	at org.hibernate.engine.internal.CacheHelper.fromSharedCache(CacheHelper.java:32)
      	at org.hibernate.event.internal.DefaultLoadEventListener.getFromSharedCache(DefaultLoadEventListener.java:651)
      	at org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:595)
      	at org.hibernate.event.internal.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:462)
      	at org.hibernate.event.internal.DefaultLoadEventListener.load(DefaultLoadEventListener.java:219)
      	...
      	at com.atlassian.confluence.util.LabelUtil.extractLabelsFromLabellings(LabelUtil.java:625)
      	at com.atlassian.confluence.core.AbstractLabelableEntityObject.getLabels(AbstractLabelableEntityObject.java:26)
      	...
      	at com.atlassian.confluence.spaces.SpaceDescription_$$_jvstabe_15.getLabels(SpaceDescription_$$_jvstabe_15.java)
      	at com.midori.confluence.plugin.archiving.service.support.ContentLifecycleLabelHelper.findSpaceLabel(ContentLifecycleLabelHelper.java:156)
      	at com.midori.confluence.plugin.archiving.service.support.ContentLifecycleLabelHelper.isBlacklistedSpace(ContentLifecycleLabelHelper.java:112)
      	at com.midori.confluence.plugin.archiving.util.NonBlacklistedGlobalFreshSpacePredicate.apply(NonBlacklistedGlobalFreshSpacePredicate.java:15)
      	at com.midori.confluence.plugin.archiving.web.condition.NonBlacklistedGlobalFreshSpaceCondition.shouldDisplay(NonBlacklistedGlobalFreshSpaceCondition.java:15)
      	at com.atlassian.confluence.plugin.descriptor.web.conditions.BaseConfluenceCondition.shouldDisplay(BaseConfluenceCondition.java:33)
      	at com.atlassian.plugin.web.conditions.AndCompositeCondition.shouldDisplay(AndCompositeCondition.java:14)
      	at com.atlassian.plugin.web.conditions.AndCompositeCondition.shouldDisplay(AndCompositeCondition.java:14)
      	at com.atlassian.plugin.web.DefaultWebInterfaceManager.filterFragmentsByCondition(DefaultWebInterfaceManager.java:154)
      	at com.atlassian.plugin.web.DefaultWebInterfaceManager.getDisplayableItems(DefaultWebInterfaceManager.java:107)
      	at com.atlassian.confluence.plugin.descriptor.web.ConfluenceWebInterfaceManager.getDisplayableItems(ConfluenceWebInterfaceManager.java:74)
      	...
      	at com.atlassian.confluence.plugins.pagebanner.PageBannerContextProvider.getBannerItems(PageBannerContextProvider.java:72)
      	at com.atlassian.confluence.plugins.pagebanner.PageBannerContextProvider.getContextMap(PageBannerContextProvider.java:57)
      	at com.atlassian.plugin.web.descriptors.DefaultWebPanelModuleDescriptor$ContextAwareWebPanel.getHtml(DefaultWebPanelModuleDescriptor.java:144)
      	...
      	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
      	- locked <0x00007f03be400550> (a org.apache.tomcat.util.net.NioChannel)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
      	at java.lang.Thread.run(Thread.java:748)
      "http-nio-8090-exec-203 url:/display/EXAM/Page+Name username:charlie url:/pages/viewpage.action username:charlie" #14472 daemon prio=5 tid=0x00007ffac00fd800 nid=0x1c1e4 waiting on condition [0x00007fec4e98d000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
      	at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:153)
      	at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:376)
      	at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:307)
      	at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.getInternal(NearCachedMapProxyImpl.java:101)
      	at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:112)
      	at com.atlassian.cache.hazelcast.HazelcastCache.getOrLoad(HazelcastCache.java:243)
      	at com.atlassian.cache.hazelcast.HazelcastCache.get(HazelcastCache.java:95)
      	at com.atlassian.confluence.cache.DefaultConfluenceCache.get(DefaultConfluenceCache.java:47)
      	at com.atlassian.confluence.cache.ConfluenceMonitoringCache.get(ConfluenceMonitoringCache.java:45)
      	at com.atlassian.confluence.cache.DefaultConfluenceCache.get(DefaultConfluenceCache.java:47)
      	at com.atlassian.confluence.cache.ConfluenceMonitoringCache.get(ConfluenceMonitoringCache.java:45)
      	at com.atlassian.confluence.cache.hazelcast.hibernate.HazelcastHibernateCache.get(HazelcastHibernateCache.java:32)
      	at com.atlassian.hibernate.extras.tangosol.CoherenceCacheStrategy.get(CoherenceCacheStrategy.java:68)
      	at com.atlassian.confluence.cache.hibernate.SwitchableCacheConcurrencyStrategy.get(SwitchableCacheConcurrencyStrategy.java:29)
      	at com.atlassian.confluence.cache.hibernate.ConfluenceCacheStrategy.get(ConfluenceCacheStrategy.java:49)
      	at com.atlassian.hibernate.adapter.adapters.cache.EntityRegionAccessStrategyV5Adapter.get(EntityRegionAccessStrategyV5Adapter.java:51)
      	at org.hibernate.engine.internal.CacheHelper.fromSharedCache(CacheHelper.java:32)
      	at org.hibernate.event.internal.DefaultLoadEventListener.getFromSharedCache(DefaultLoadEventListener.java:651)
      	at org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:595)
      	at org.hibernate.event.internal.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:462)
      	at org.hibernate.event.internal.DefaultLoadEventListener.load(DefaultLoadEventListener.java:219)
      	...
      	at com.atlassian.confluence.user.ConfluenceUserImpl_$$_jvstb24_3.getName(ConfluenceUserImpl_$$_jvstb24_3.java)
      	at com.atlassian.confluence.labels.Label.getOwner(Label.java:90)
      	at com.atlassian.confluence.labels.LabelParser.create(LabelParser.java:189)
      	at com.atlassian.confluence.labels.LabelParser.render(LabelParser.java:193)
      	at com.atlassian.confluence.labels.Label.getDisplayTitle(Label.java:170)
      	at com.atlassian.confluence.util.LabelUtil.extractLabelsFromLabellings(LabelUtil.java:626)
      	at com.atlassian.confluence.core.AbstractLabelableEntityObject.getLabels(AbstractLabelableEntityObject.java:26)
      	...
      	at com.atlassian.confluence.spaces.SpaceDescription_$$_jvstb24_15.getLabels(SpaceDescription_$$_jvstb24_15.java)
      	at com.atlassian.confluence.userstatus.DefaultFavouriteManager.isUserFavourite(DefaultFavouriteManager.java:48)
      	...
      	at com.atlassian.confluence.plugins.ia.ui.DefaultSoySidebarContextProvider.populateWithMainSidebarData(DefaultSoySidebarContextProvider.java:201)
      	at com.atlassian.confluence.plugins.ia.ui.DefaultSoySidebarContextProvider.getContextMap(DefaultSoySidebarContextProvider.java:116)
      	...
      	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
      	- locked <0x00007ff290402f40> (a org.apache.tomcat.util.net.NioChannel)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
      	at java.lang.Thread.run(Thread.java:748)
      "http-nio-8090-exec-220 url:/display/EXAM/Page+Name username:charlie url:/pages/viewpage.action username:charlie" #84429 daemon prio=5 tid=0x00007effcc163800 nid=0x1eff2 runnable [0x00007efd8dfa9000]
         java.lang.Thread.State: RUNNABLE
      	at java.lang.System.identityHashCode(Native Method)
      	at java.util.IdentityHashMap.hash(IdentityHashMap.java:295)
      	at java.util.IdentityHashMap.get(IdentityHashMap.java:329)
      	at org.hibernate.engine.internal.EntityEntryContext.getAssociatedManagedEntity(EntityEntryContext.java:177)
      	at org.hibernate.engine.internal.EntityEntryContext.addEntityEntry(EntityEntryContext.java:91)
      	at org.hibernate.engine.internal.StatefulPersistenceContext.addEntry(StatefulPersistenceContext.java:544)
      	at org.hibernate.engine.internal.StatefulPersistenceContext.addEntity(StatefulPersistenceContext.java:471)
      	at org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedCachedEntity(TwoPhaseLoad.java:385)
      	at org.hibernate.event.internal.DefaultLoadEventListener.convertCacheEntryToEntity(DefaultLoadEventListener.java:738)
      	at org.hibernate.event.internal.DefaultLoadEventListener.processCachedEntry(DefaultLoadEventListener.java:625)
      	at org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:602)
      	at org.hibernate.event.internal.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:462)
      	at org.hibernate.event.internal.DefaultLoadEventListener.load(DefaultLoadEventListener.java:219)
      	at org.hibernate.event.internal.DefaultLoadEventListener.doOnLoad(DefaultLoadEventListener.java:116)
      	at org.hibernate.event.internal.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:89)
      	at org.hibernate.internal.SessionImpl.fireLoad(SessionImpl.java:1230)
      	at org.hibernate.internal.SessionImpl.immediateLoad(SessionImpl.java:1088)
      	at org.hibernate.proxy.AbstractLazyInitializer.initialize(AbstractLazyInitializer.java:155)
      	at org.hibernate.proxy.AbstractLazyInitializer.getImplementation(AbstractLazyInitializer.java:259)
      	at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.invoke(JavassistLazyInitializer.java:73)
      	at com.atlassian.hibernate.adapter.proxy.JavassistLazyInitializer_ImplementV2Proxy.invoke(JavassistLazyInitializer_ImplementV2Proxy.java:68)
      	at com.atlassian.confluence.user.ConfluenceUserImpl_$$_jvstabe_3.getName(ConfluenceUserImpl_$$_jvstabe_3.java)
      	at com.atlassian.confluence.labels.Label.getOwner(Label.java:90)
      	at com.atlassian.confluence.labels.LabelParser.create(LabelParser.java:189)
      	at com.atlassian.confluence.labels.LabelParser.render(LabelParser.java:193)
      	at com.atlassian.confluence.labels.Label.getDisplayTitle(Label.java:170)
      	at com.atlassian.confluence.util.LabelUtil.extractLabelsFromLabellings(LabelUtil.java:626)
      	at com.atlassian.confluence.core.AbstractLabelableEntityObject.getLabels(AbstractLabelableEntityObject.java:26)
      	...
      	at com.atlassian.confluence.spaces.SpaceDescription_$$_jvstabe_15.getLabels(SpaceDescription_$$_jvstabe_15.java)
      	at com.midori.confluence.plugin.archiving.service.support.ContentLifecycleLabelHelper.findSpaceLabel(ContentLifecycleLabelHelper.java:156)
      	at com.midori.confluence.plugin.archiving.service.support.ContentLifecycleLabelHelper.isBlacklistedSpace(ContentLifecycleLabelHelper.java:112)
      	at com.midori.confluence.plugin.archiving.util.NonBlacklistedGlobalFreshSpacePredicate.apply(NonBlacklistedGlobalFreshSpacePredicate.java:15)
      	at com.midori.confluence.plugin.archiving.web.condition.NonBlacklistedGlobalFreshSpaceCondition.shouldDisplay(NonBlacklistedGlobalFreshSpaceCondition.java:15)
      	at com.atlassian.confluence.plugin.descriptor.web.conditions.BaseConfluenceCondition.shouldDisplay(BaseConfluenceCondition.java:33)
      	at com.atlassian.plugin.web.conditions.AndCompositeCondition.shouldDisplay(AndCompositeCondition.java:14)
      	at com.atlassian.plugin.web.conditions.AndCompositeCondition.shouldDisplay(AndCompositeCondition.java:14)
      	at com.atlassian.plugin.web.DefaultWebInterfaceManager.filterFragmentsByCondition(DefaultWebInterfaceManager.java:154)
      	at com.atlassian.plugin.web.DefaultWebInterfaceManager.getDisplayableItems(DefaultWebInterfaceManager.java:107)
      	at com.atlassian.confluence.plugin.descriptor.web.ConfluenceWebInterfaceManager.getDisplayableItems(ConfluenceWebInterfaceManager.java:74)
      	...
      	at com.atlassian.confluence.plugins.pagebanner.PageBannerContextProvider.getBannerItems(PageBannerContextProvider.java:72)
      	at com.atlassian.confluence.plugins.pagebanner.PageBannerContextProvider.getContextMap(PageBannerContextProvider.java:57)
      	at com.atlassian.plugin.web.descriptors.DefaultWebPanelModuleDescriptor$ContextAwareWebPanel.getHtml(DefaultWebPanelModuleDescriptor.java:144)
      	...
      	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
      	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
      	- locked <0x00007f03af801258> (a org.apache.tomcat.util.net.NioChannel)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
      	at java.lang.Thread.run(Thread.java:748)

      Notes

      Is it possible that this is a bug in hazelcast? Perhaps https://github.com/hazelcast/hazelcast/issues/13170

      Workaround

      Disable Midori Better Archiving

              Unassigned Unassigned
              jkennemer Jonathon (Inactive)
              Votes:
              4 Vote for this issue
              Watchers:
              11 Start watching this issue

                Created:
                Updated: