Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-64839

JIRA generates cache flush event for each default group after enabling Application access

    XMLWordPrintable

Details

    Description

      Summary

      JIRA generates cache flush event for each default group after enabling Application access. These frequent cache flush put a lot of pressure to application as all threads needs to wait for the cache and especially together with JRA-64384 can make application unusable. Basically for each addUserMembershipsForGroup action JIRA flushes the cache.

      Steps to Reproduce

      1. Add many groups and make then default for JIRA Software (same problem for ServiceDesk)
        • Application access page: [ Administration > Application > Application access]
      2. Create user
      3. Enable debug logging for com.atlassian.jira.application.DefaultApplicationRoleManager
      4. Give user Application access

      Expected Results

      • JIRA adds user to groups.
      • None or one cache flush will be generated.

      Actual Results

      • JIRA will add user to groups.
      • Many cache flush events generated, one for each added group (each group edit event generates the cache flush).
      • Check logs:
        2017-03-15 18:19:41,883 http-nio-8081-exec-11 DEBUG admin 1099x2859x1 1tt78a8 127.0.0.1 /rest/internal/2/viewuser/application/jira-software [c.a.jira.application.DefaultApplicationRoleManager] Clearing the user counts on com.atlassian.crowd.event.group.GroupMembershi
        pDeletedEvent.
        2017-03-15 18:19:41,898 http-nio-8081-exec-11 DEBUG admin 1099x2859x1 1tt78a8 127.0.0.1 /rest/internal/2/viewuser/application/jira-software [c.a.jira.application.DefaultApplicationRoleManager] Clearing the user counts on com.atlassian.crowd.event.group.GroupMembershi
        pDeletedEvent.
        
      • Thread dump, thread trying to flush the cache: com.atlassian.cache.ehcache.LoadingCache.removeAll
        "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
           java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x000000055b994878> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        	at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
        	at com.atlassian.cache.ehcache.DelegatingCache.removeAll(DelegatingCache.java:148)
        	at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
        	at com.atlassian.jira.application.DefaultApplicationRoleManager.onGroupMembershipCreated(DefaultApplicationRoleManager.java:510)
        	at sun.reflect.GeneratedMethodAccessor8945.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:498)
        	at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:36)
        	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$1$1.run(AsynchronousAbleEventDispatcher.java:48)
        	at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
        	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:107)
        	at com.atlassian.event.internal.EventPublisherImpl.invokeListeners(EventPublisherImpl.java:160)
        	at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:79)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.addUserToGroup(DirectoryManagerGeneric.java:727)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.addUserToGroup(ApplicationServiceGeneric.java:1297)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.addUserToGroup(CrowdServiceImpl.java:711)
        	at com.atlassian.crowd.embedded.core.DelegatingCrowdService.addUserToGroup(DelegatingCrowdService.java:193)
        	at com.atlassian.crowd.embedded.core.FilteredCrowdServiceImpl.addUserToGroup(FilteredCrowdServiceImpl.java:324)
        	at com.atlassian.jira.user.util.UserUtilImpl.doAddUserToGroup(UserUtilImpl.java:353)
        	at com.atlassian.jira.user.util.UserUtilImpl.addUserToGroups(UserUtilImpl.java:367)
        	at com.atlassian.jira.bc.user.DefaultUserService.addUserToApplication(DefaultUserService.java:608)
        ...
        	at com.atlassian.jira.rest.util.UpdateUserApplicationHelper.doApplicationUpdate(UpdateUserApplicationHelper.java:123)
        	at com.atlassian.jira.rest.util.UpdateUserApplicationHelper.addUserToApplication(UpdateUserApplicationHelper.java:57)
        	at com.atlassian.jira.rest.internal.v2.ViewUserResource.addUserToApplication(ViewUserResource.java:90)
        

      Notes

      Problem will be caused by combination of 2 factors:

      • Too many cache flush events, current ticket
        • caused by allowing some user Application access
        • For each default group there will be additional cache flush event when user is given Application access
      • Slow cache population, see JRA-64384
      • Those two thread will fight for cache, meantime all threads will wait (this instability could be 5+ min), see related JRA-64847
        • one will flush
          "http-nio-8080-exec-828" #365452 daemon prio=5 os_prio=0 tid=0x00007fda18083000 nid=0x38e0 waiting on condition [0x00007fde368df000]
             java.lang.Thread.State: WAITING (parking)
          ...
          	at com.atlassian.cache.ehcache.LoadingCache.removeAll(LoadingCache.java:320)
          ...
          	at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:651)
          
        • another will populate:
          "http-nio-8080-exec-831" #365455 daemon prio=5 os_prio=0 tid=0x00007fda540a8800 nid=0x38e3 runnable [0x00007fde36485000]
             java.lang.Thread.State: RUNNABLE
          ..
          	at com.atlassian.jira.crowd.embedded.ofbiz.DelegatingUserDao.processUsers(DelegatingUserDao.java:58).
          ...
          	at com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:100)
          	at com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:76)
          	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:97)
          	at com.atlassian.jira.application.DefaultApplicationRoleManager.getUserCount(DefaultApplicationRoleManager.java:407)
          

      Workaround

      • Reduce number of default groups to 1.
      Note on Fix

      We largely mitigated the problem by implementing JRASERVER-65342, that makes cache flushes non-blocking, so they will not stack-up in case is many events.

      Attachments

        1. Application_access.png
          Application_access.png
          60 kB
        2. Edit_user_2.png
          Edit_user_2.png
          184 kB

        Issue Links

          Activity

            People

              lwlodarczyk Lukasz Wlodarczyk
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
              Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: