Caches in JIRA Data Center are self invalidating


    • 7.02
    • 10
    • Severity 1 - Critical
    • 1,547

      Summary

      Caches in JIRA Data Center are self invalidating

      Environment

      • JIRA Data Center. Only Data Center customers are affected by this issue.

      Symptoms

      1. JIRA is unstable, with high CPU usage and heavy GC activity; requests are dropped.
      2. At the database level, heavy traffic to the projectroleactor table can be observed:
        SELECT ID, PID, PROJECTROLEID, ROLETYPE, ROLETYPEPARAMETER FROM projectroleactor WHERE PROJECTROLEID=:1
        
      3. The following error messages may be present in the logs:
        • Looking up rmiUrl NODE_ADDRESS/com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore.projectRoleActors threw a connection exception.
          
        • [c.a.jira.cluster.ReplicatorExecutorServiceFactory] Cache replication thread pool is too small. Try increasing number of threads by setting system property com.atlassian.cache.parallelReplicationThreadCount.
          
        • Stack traces showing symptoms of database connection pool (DBCP) exhaustion.

      Steps to reproduce (to be verified)

      1. Set up a JIRA Data Center cluster with at least two nodes; the more, the better.
      2. Create thousands of projects.
      3. Start multiple threads that load the JIRA Agile dashboard.
        • These calls must be spread across all the nodes.
      4. Create a project.
      5. JIRA should start to behave unstably.

       

      Root cause

      1. JIRA 6.4 used com.atlassian.cache.CacheFactory#getCache(java.lang.String, com.atlassian.cache.CacheLoader<K,V>, com.atlassian.cache.CacheSettings) with cache loaders.
        • A cache constructed this way is a LoadingCache, which is a BlockingCache. This means that the CacheLoader loads the value exactly once for each key.
        • The value is reloaded only after it has been invalidated.
      2. In JIRA 7.2 the caches were migrated to caches without a CacheLoader, and values are now fetched by calling the method
        com.atlassian.vcache.LocalCacheOperations#get(K, java.util.function.Supplier<? extends V>)
        • Caches constructed this way are not blocking, and neither is the get-with-supplier operation.
        • This means that multiple threads may try to load the same value through the supplier at the same time.
        • If two threads load the value for the same key, one of them executes a put on ehcache when a value already exists in the cache. In other words, one thread overwrites the value stored by the other.
        • ehcache treats this as a value update and triggers invalidation on all the other nodes.
        • After the invalidation, the other nodes are very likely to go through the same sequence and trigger further invalidations, which makes the problem even worse. This effectively removes caching for those values and multiplies invalidation traffic across the nodes.
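
      The race described above can be illustrated with a minimal, self-contained sketch. It does not use the Atlassian cache API or ehcache: a plain java.util.concurrent.ConcurrentHashMap stands in for the node-local cache, and the class name, method names, the "role-1" key and the counters are all hypothetical. A put that finds an existing value is counted as an "overwrite", which is the event that a replicated cache would be expected to propagate as an invalidation. A barrier inside the loader forces every thread to miss before any of them stores a value, so the non-blocking case is reproduced deterministically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative only: ConcurrentHashMap is a stand-in for the real cache.
class CacheRaceSketch {
    static final AtomicInteger loads = new AtomicInteger();      // loader executions
    static final AtomicInteger overwrites = new AtomicInteger(); // puts over an existing value

    // Non-blocking get-with-supplier: every thread that misses runs the
    // supplier and then stores its result unconditionally.
    static <K, V> V nonBlockingGet(ConcurrentHashMap<K, V> cache, K key, Supplier<V> supplier) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = supplier.get();
        if (cache.put(key, loaded) != null) {
            // A replicated cache would see this as an update of an existing entry.
            overwrites.incrementAndGet();
        }
        return loaded;
    }

    // Blocking variant: computeIfAbsent applies the loader at most once per key.
    static <K, V> V blockingGet(ConcurrentHashMap<K, V> cache, K key, Supplier<V> supplier) {
        return cache.computeIfAbsent(key, k -> supplier.get());
    }

    // Returns {number of loads, number of overwrites} for one run.
    static int[] run(boolean blocking, int threads) throws Exception {
        loads.set(0);
        overwrites.set(0);
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        CyclicBarrier allMissed = new CyclicBarrier(threads);
        Supplier<String> loader = () -> {
            loads.incrementAndGet();
            if (!blocking) {
                // Hold every thread inside the loader until all have missed.
                try {
                    allMissed.await();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            }
            return "actors";
        };
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                if (blocking) {
                    blockingGet(cache, "role-1", loader);
                } else {
                    nonBlockingGet(cache, "role-1", loader);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] { loads.get(), overwrites.get() };
    }

    public static void main(String[] args) throws Exception {
        int[] racy = run(false, 4);
        int[] safe = run(true, 4);
        System.out.println("non-blocking: loads=" + racy[0] + " overwrites=" + racy[1]);
        System.out.println("blocking:     loads=" + safe[0] + " overwrites=" + safe[1]);
    }
}
```

      With four threads, the non-blocking path runs the loader four times and overwrites the stored value three times, while the blocking path loads once and never overwrites. In a cluster, each of those overwrites would correspond to one invalidation message sent to every other node, under the assumption stated above about how the replicated cache treats an update.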

      This issue affects 21 caches in total, the most significant being CachingProjectRoleAndActorStore.

      List of caches

      com.atlassian.jira.issue.managers.CachingCustomFieldManager
      com.atlassian.jira.issue.fields.screen.CachingFieldScreenStore
      com.atlassian.jira.project.util.CachingProjectKeyStore
      com.atlassian.jira.project.CachingProjectManager
      com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore
      com.atlassian.jira.security.auth.trustedapps.CachingTrustedApplicationStore
      com.atlassian.jira.workflow.CachingWorkflowDescriptorStore
      com.atlassian.jira.config.DefaultConstantsManager
      com.atlassian.jira.util.DefaultDurationFormatterProvider
      com.atlassian.jira.event.type.DefaultEventTypeManager
      com.atlassian.jira.config.feature.DefaultFeatureManager
      com.atlassian.jira.issue.link.DefaultIssueLinkTypeManager
      com.atlassian.jira.config.DefaultReindexMessageManager
      com.atlassian.jira.service.DefaultServiceManager
      com.atlassian.jira.workflow.DefaultWorkflowSchemeManager
      com.atlassian.jira.workflow.EagerWorkflowSchemeManager
      com.atlassian.jira.issue.context.persistence.FieldConfigContextPersisterWorker
      com.atlassian.jira.issue.fields.config.persistence.FieldConfigPersisterImpl
      com.atlassian.jira.issue.fields.config.persistence.FieldConfigSchemePersisterImpl
      com.atlassian.jira.issue.index.managers.FieldIndexerManagerImpl
      com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl
      
      

        1. jira-7.2-caches-jdc.jmx
          34 kB
          Adam Jakubowski

            Assignee:
            Unassigned
            Reporter:
            Adam Jakubowski (Inactive)
            Votes:
            13
            Watchers:
            43
