- Bug
- Resolution: Fixed
- High
- 8.13.3
- 8.13
- 4
- Severity 1 - Critical
- 44
Issue Summary
ProjectRoleActorsZduSafeCache uses a composite key of <projectId, roleId>. Some components (such as JQL validation, used in the test below) iterate over the entries of this cache for all projects. As a result, when the cache is fully flushed, multiple threads running such components aggressively reload it. This produces many concurrent cache load requests, which can temporarily block one another because of lock contention on the cache stripes. The resulting delay may become noticeable on large instances with high load, a large web-container thread pool, and a large number of projects.
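The effect of striping on a composite key can be illustrated with a small sketch. It assumes that a lock stripe is chosen by taking the key's hash modulo a fixed stripe count; this is a general model of striped locking, not the exact selection function used by Ehcache, and the class and method names are hypothetical. With more keys than stripes, several <projectId, roleId> pairs must share a stripe, so a load on one key can delay access to another.

```java
import java.util.Objects;

// Hypothetical illustration of striped locking; not Jira or Ehcache code.
class StripeContentionSketch {

    // Composite cache key, mirroring the <projectId, roleId> pair described above.
    static final class Key {
        final long projectId;
        final long roleId;

        Key(long projectId, long roleId) {
            this.projectId = projectId;
            this.roleId = roleId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return projectId == other.projectId && roleId == other.roleId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(projectId, roleId);
        }
    }

    // Assumed stripe selection: hash modulo stripe count (floorMod keeps it non-negative).
    static int stripeFor(Key key, int stripes) {
        return Math.floorMod(key.hashCode(), stripes);
    }

    // Largest number of keys that end up sharing a single stripe.
    static int maxKeysPerStripe(int projects, int roles, int stripes) {
        int[] counts = new int[stripes];
        int max = 0;
        for (long p = 0; p < projects; p++) {
            for (long r = 0; r < roles; r++) {
                int idx = stripeFor(new Key(p, r), stripes);
                counts[idx]++;
                max = Math.max(max, counts[idx]);
            }
        }
        return max;
    }
}
```

For example, `maxKeysPerStripe(1000, 10, 64)` distributes 10,000 keys over 64 stripes, so at least one stripe guards 157 or more keys. The stripe count is a parameter of the sketch only.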
Steps to Reproduce
The issue may be reproduced with a load test. The amount of delay that can be observed depends on the size of test data, load and test hardware.
- Set up a load test that generates concurrent issue search load on the system, for example by calling the issue search REST API at /rest/api/2/search?jql=comment~test. Use a JQL query that refers to a component (this triggers a permission check); in the example provided, the component is comment.
- After the test warmup period, flush the ProjectRoleActorsZduSafeCache by creating a new empty Group at Administration/User management/Groups, and then deleting this Group
- You may observe stalled requests in the load test tool; verify the issue with thread dumps taken during this period.
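The search load in the first step could be generated with a sketch like the following. It is a minimal example, assuming Java 11 or later, a reachable Jira base URL, and an endpoint that accepts the request without extra authentication headers; the class name, method names, thread count and request count are all hypothetical. It reports the slowest request seen, which is where a stall after the cache flush would show up.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load generator; authentication and error reporting are omitted.
class SearchLoadSketch {

    // Build the issue search URL from the steps above, URL-encoding the JQL.
    static URI searchUri(String baseUrl, String jql) {
        String encoded = URLEncoder.encode(jql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/rest/api/2/search?jql=" + encoded);
    }

    // Fire requests from `threads` workers and return the slowest request in milliseconds.
    static long runLoad(String baseUrl, String jql, int threads, int requestsPerThread)
            throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(searchUri(baseUrl, jql)).GET().build();
        AtomicLong slowestMillis = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    long start = System.nanoTime();
                    try {
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                    } catch (IOException e) {
                        // Failed requests still count toward the observed latency.
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    slowestMillis.accumulateAndGet(elapsed, Math::max);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return slowestMillis.get();
    }
}
```

A call such as `SearchLoadSketch.runLoad("http://localhost:8080", "comment~test", 50, 200)` would be started before the group is created and deleted, so that the flush happens while the load is running. The base URL and the numbers are placeholders.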
Operations that are known to trigger this bug
- deleting a User
- deleting a Group
- anonymizing a User
- Project import
Sample stack traces indicating the problem
- many blocked threads with the stack
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
net.sf.ehcache.concurrent.ReadWriteLockSync.lock(ReadWriteLockSync.java:50)
net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey(BlockingCache.java:196)
net.sf.ehcache.constructs.blocking.BlockingCache.get(BlockingCache.java:158)
com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:80)
com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:108)
com.atlassian.jira.cache.DeferredReplicationCache.get(DeferredReplicationCache.java:48)
com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get(ProjectRoleActorsZduSafeCache.java:48)
com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore.getProjectRoleActors(CachingProjectRoleAndActorStore.java:123)
- a few threads with active loaders
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:457)
sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1095)
sun.security.ssl.SSLSocketImpl.access$200(SSLSocketImpl.java:72)
sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:815)
org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
org.postgresql.core.PGStream.receiveChar(PGStream.java:441)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2057)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:114)
org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
com.atlassian.jira.ofbiz.sql.PreparedStatementWrapper.executeQuery(PreparedStatementWrapper.java:42)
com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.lambda$executeQuery$5(DiagnosticPreparedStatement.java:59)
com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement$$Lambda$1921/659876343.execute(Unknown Source)
com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.recordExecutionTime(DefaultDatabaseDiagnosticsCollector.java:69)
com.atlassian.jira.diagnostic.connection.DatabaseDiagnosticsCollectorDelegate.recordExecutionTime(DatabaseDiagnosticsCollectorDelegate.java:55)
com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeQuery(DiagnosticPreparedStatement.java:59)
org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:527)
org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:881)
org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:861)
org.ofbiz.core.entity.GenericDAO.selectByAnd(GenericDAO.java:733)
org.ofbiz.core.entity.GenericHelperDAO.findByAnd(GenericHelperDAO.java:166)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:913)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:891)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:860)
com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:83)
com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:62)
com.atlassian.jira.security.roles.OfBizProjectRoleAndActorStore.getRoleActors(OfBizProjectRoleAndActorStore.java:323)
com.atlassian.jira.security.roles.OfBizProjectRoleAndActorStore.getProjectRoleActors(OfBizProjectRoleAndActorStore.java:144)
com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore.loadProjectRoleActorsFromDelegate(CachingProjectRoleAndActorStore.java:205)
com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore$$Lambda$267/981565593.load(Unknown Source)
com.atlassian.cache.ehcache.wrapper.ValueProcessorAtlassianCacheLoaderDecorator.load(ValueProcessorAtlassianCacheLoaderDecorator.java:26)
com.atlassian.cache.ehcache.LoadingCache.getFromLoader(LoadingCache.java:134)
com.atlassian.cache.ehcache.LoadingCache$$Lambda$215/888134930.apply(Unknown Source)
com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.synchronizedLoad(SynchronizedLoadingCacheDecorator.java:29)
com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:102)
com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:81)
com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:108)
com.atlassian.jira.cache.DeferredReplicationCache.get(DeferredReplicationCache.java:48)
com.atlassian.jira.security.roles.ProjectRoleActorsZduSafeCache.get(ProjectRoleActorsZduSafeCache.java:48)
com.atlassian.jira.security.roles.CachingProjectRoleAndActorStore.getProjectRoleActors(CachingProjectRoleAndActorStore.java:123)
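To check a thread dump for this pattern, the two kinds of stacks can be counted with a short sketch. It assumes the dump is available as text in which thread entries are separated by blank lines (as jstack output usually is); the class and method names are hypothetical, and the two marker frames are taken from the stack traces above. Many waiting threads alongside a few loading threads would match the reported symptom.

```java
// Hypothetical helper for scanning a textual thread dump; not part of Jira.
class ThreadDumpSketch {

    // Frame seen in the blocked threads above.
    static final String WAITING_FRAME =
            "net.sf.ehcache.constructs.blocking.BlockingCache.acquiredLockForKey";

    // Frame seen in the threads that are actively loading a cache entry.
    static final String LOADING_FRAME =
            "com.atlassian.cache.ehcache.LoadingCache.getFromLoader";

    // Count thread entries (blocks separated by blank lines) that contain the given frame.
    static int countThreadsWithFrame(String dump, String frame) {
        int count = 0;
        for (String block : dump.split("\\R\\s*\\R")) {
            if (block.contains(frame)) {
                count++;
            }
        }
        return count;
    }
}
```

Comparing `countThreadsWithFrame(dump, WAITING_FRAME)` with `countThreadsWithFrame(dump, LOADING_FRAME)` across several dumps taken a few seconds apart gives a rough picture of how long the contention lasts.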
Expected Results
ProjectRoleActorsZduSafeCache is repopulated after a flush without causing a noticeable impact on server responsiveness
Actual Results
Temporary lock contention may occur after the cache is flushed, causing a noticeable hit to responsiveness.
Note on fix
The problem was mitigated by changing the cache flush logic: only entries associated with projects where the user/group is a role actor will be invalidated.
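The difference between a full flush and the targeted invalidation described here can be sketched as follows. This is an illustration only, assuming a simple in-memory map keyed by <projectId, roleId> whose values are sets of actor names; it is not the actual Jira implementation, and every name in it is hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the cache; not Jira code.
class TargetedFlushSketch {

    static final class Key {
        final long projectId;
        final long roleId;

        Key(long projectId, long roleId) {
            this.projectId = projectId;
            this.roleId = roleId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return projectId == other.projectId && roleId == other.roleId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(projectId, roleId);
        }
    }

    private final Map<Key, Set<String>> actorsByKey = new ConcurrentHashMap<>();

    void put(long projectId, long roleId, Set<String> actors) {
        actorsByKey.put(new Key(projectId, roleId), actors);
    }

    int size() {
        return actorsByKey.size();
    }

    // Full flush: every entry must be reloaded afterwards.
    void flushAll() {
        actorsByKey.clear();
    }

    // Targeted flush: drop only entries in which the user or group is a role actor.
    // The returned count is approximate if other threads modify the map concurrently.
    int flushForActor(String actor) {
        int before = actorsByKey.size();
        actorsByKey.entrySet().removeIf(e -> e.getValue().contains(actor));
        return before - actorsByKey.size();
    }
}
```

In this model, deleting a newly created empty group removes no entries at all, so the reproduction steps above would no longer trigger a mass reload.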
Workaround
Currently, there is no known workaround for this behaviour. A workaround will be added here when available
- is related to
  - JRASERVER-69446 Removing actor from project role can make Jira unresponsive (Closed)
  - JRASERVER-70518 Increase number of cache stripes for EHCache cache (Closed)
- is duplicated by
  - RAID-2462