- Type: Bug
- Resolution: Fixed
- Priority: Highest
- Affects Version/s: 7.2.0, 7.2.7, 7.3.1
- Component/s: Issue - View Issue
- Severity 1 - Critical
Summary
Caches in JIRA Data Center are self-invalidating
Environment
- JIRA Data Center. Only Data Center customers are affected by this issue.
Symptoms
- JIRA is unstable, with high CPU usage and heavy garbage collection; requests are dropped.
- At the database level, heavy traffic to the projectroleactor table can be observed:
SELECT ID, PID, PROJECTROLEID, ROLETYPE, ROLETYPEPARAMETER FROM projectroleactor WHERE PROJECTROLEID=:1
- The following error messages may be present in the logs:
Looking up rmiUrl NODE_ADDRESS/com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore.projectRoleActors threw a connection exception.
[c.a.jira.cluster.ReplicatorExecutorServiceFactory] Cache replication thread pool is too small. Try increasing number of threads by setting system property com.atlassian.cache.parallelReplicationThreadCount.
- Stack traces showing symptoms of an exhausted database connection pool.
Steps to reproduce (to be verified)
- Set up a JIRA Data Center cluster with at least two nodes; the more nodes, the better.
- Create thousands of projects.
- Start multiple threads that repeatedly load the JIRA Agile dashboard.
- These requests must be spread across all the nodes.
- Create a project.
- JIRA should start to behave unstably.
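The load-generation steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the node addresses, the dashboard path, the thread count and the request count are hypothetical placeholders, authentication is left out, and the class name DashboardLoadSketch is invented for this example. It uses the standard java.net.http client (Java 11 or later) and spreads requests round-robin so that every node is hit.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DashboardLoadSketch {

    /** Spreads count requests over the nodes in round-robin order so every node is hit. */
    static List<URI> roundRobinTargets(List<String> nodeBaseUrls, String path, int count) {
        List<URI> targets = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            targets.add(URI.create(nodeBaseUrls.get(i % nodeBaseUrls.size()) + path));
        }
        return targets;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical node addresses and dashboard path; replace with real values.
        List<String> nodes = List.of("http://node1.example:8080", "http://node2.example:8080");
        String path = "/secure/RapidBoard.jspa?rapidView=1";

        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(16); // "multiple threads"
        for (URI target : roundRobinTargets(nodes, path, 1000)) {
            pool.submit(() -> {
                HttpRequest request = HttpRequest.newBuilder(target).GET().build();
                try {
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                } catch (Exception e) {
                    // Dropped requests are one of the reported symptoms; just log them.
                    System.err.println(target + " failed: " + e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```

While this is running, the "Create a project" step would be performed separately through the UI or another client.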
Root cause
- JIRA 6.4 used com.atlassian.cache.CacheFactory#getCache(java.lang.String, com.atlassian.cache.CacheLoader<K,V>, com.atlassian.cache.CacheSettings) with cache loaders.
- Caches constructed this way are LoadingCache instances, and a LoadingCache is a BlockingCache. This means that the CacheLoader loads the value exactly once for each key.
- The value is reloaded only after it has been invalidated.
- In JIRA 7.2, caches were migrated to a cache without a CacheLoader and started calling the method com.atlassian.vcache.LocalCacheOperations#get(K, java.util.function.Supplier<? extends V>).
- Caches constructed this way are not blocking, and neither is the get-with-supplier operation.
- This means that multiple threads may try to load the same value from the supplier at the same time.
- If two threads load the value for the same key, one of them executes a put on Ehcache when a value already exists in the cache. In other words, one thread overwrites the value stored by the other.
- Ehcache treats this as a value update and triggers invalidation on all the other nodes.
- After the invalidation, the other nodes are very likely to trigger the same behaviour themselves, making the problem even worse. This effectively removes caching for those values and multiplies invalidation traffic across nodes.
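The difference between the two loading styles can be illustrated with a small, self-contained simulation. This is a sketch only: it does not use the Atlassian cache or vcache APIs or Ehcache. ToyCache and all of its members are hypothetical names built on a plain ConcurrentHashMap, and the overwrites counter merely stands in for the puts that a replicated cache would treat as updates and broadcast as invalidations. A CyclicBarrier forces both threads to miss before either one stores a value, so the race is reproduced on every run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CacheRaceSketch {

    /** Toy local cache; "overwrites" stands in for updates that would be replicated as invalidations. */
    static final class ToyCache<K, V> {
        private final Map<K, V> store = new ConcurrentHashMap<>();
        final AtomicInteger loads = new AtomicInteger();
        final AtomicInteger overwrites = new AtomicInteger();

        /** Non-blocking get-with-supplier: check, load, put. Concurrent callers can all miss. */
        V getNonBlocking(K key, Supplier<? extends V> supplier) {
            V cached = store.get(key);
            if (cached != null) {
                return cached;
            }
            V loaded = supplier.get();
            loads.incrementAndGet();
            if (store.put(key, loaded) != null) {
                overwrites.incrementAndGet(); // put over an existing value looks like an update
            }
            return loaded;
        }

        /** Blocking variant: computeIfAbsent runs the loader at most once per absent key. */
        V getBlocking(K key, Supplier<? extends V> supplier) {
            return store.computeIfAbsent(key, k -> {
                loads.incrementAndGet();
                return supplier.get();
            });
        }
    }

    /** Runs the same task on two threads and waits for both to finish. */
    static void runOnTwoThreads(Runnable task) {
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns {loads, overwrites} when two threads race through the non-blocking path. */
    static int[] raceNonBlocking() {
        ToyCache<String, String> cache = new ToyCache<>();
        CyclicBarrier bothLoading = new CyclicBarrier(2); // both threads must miss before either puts
        Supplier<String> slowLoader = () -> {
            try {
                bothLoading.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            return "actors";
        };
        runOnTwoThreads(() -> cache.getNonBlocking("role-1", slowLoader));
        return new int[] {cache.loads.get(), cache.overwrites.get()};
    }

    /** Returns {loads, overwrites} when two threads use the blocking path. */
    static int[] raceBlocking() {
        ToyCache<String, String> cache = new ToyCache<>();
        runOnTwoThreads(() -> cache.getBlocking("role-1", () -> "actors"));
        return new int[] {cache.loads.get(), cache.overwrites.get()};
    }

    public static void main(String[] args) {
        int[] nb = raceNonBlocking();
        int[] b = raceBlocking();
        System.out.println("non-blocking: loads=" + nb[0] + " overwrites=" + nb[1]);
        System.out.println("blocking:     loads=" + b[0] + " overwrites=" + b[1]);
    }
}
```

In this toy model the non-blocking path loads the value twice and overwrites it once, while the blocking path loads it once and never overwrites. In a cluster, each such overwrite would correspond to one invalidation sent to every other node.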
This issue affects 21 caches in total, the most significant being CachingProjectRoleAndActorStore.
List of caches
- com.atlassian.jira.issue.managers.CachingCustomFieldManager
- com.atlassian.jira.issue.fields.screen.CachingFieldScreenStore
- com.atlassian.jira.project.util.CachingProjectKeyStore
- com.atlassian.jira.project.CachingProjectManager
- com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore
- com.atlassian.jira.security.auth.trustedapps.CachingTrustedApplicationStore
- com.atlassian.jira.workflow.CachingWorkflowDescriptorStore
- com.atlassian.jira.config.DefaultConstantsManager
- com.atlassian.jira.util.DefaultDurationFormatterProvider
- com.atlassian.jira.event.type.DefaultEventTypeManager
- com.atlassian.jira.config.feature.DefaultFeatureManager
- com.atlassian.jira.issue.link.DefaultIssueLinkTypeManager
- com.atlassian.jira.config.DefaultReindexMessageManager
- com.atlassian.jira.service.DefaultServiceManager
- com.atlassian.jira.workflow.DefaultWorkflowSchemeManager
- com.atlassian.jira.workflow.EagerWorkflowSchemeManager
- com.atlassian.jira.issue.context.persistence.FieldConfigContextPersisterWorker
- com.atlassian.jira.issue.fields.config.persistence.FieldConfigPersisterImpl
- com.atlassian.jira.issue.fields.config.persistence.FieldConfigSchemePersisterImpl
- com.atlassian.jira.issue.index.managers.FieldIndexerManagerImpl
- com.atlassian.jira.issue.security.IssueSecurityLevelManagerImpl