Jira Data Center / JRASERVER-61759

JIRA performance degradation during directory sync

Details

    Description

      Summary

      JIRA performance degrades significantly during both full and incremental directory sync. CPU usage spikes to 100% and page loads take between 0.5 and 5 minutes.

      Environment

      • JIRA 7.0 or later
      • JIRA configured with AD LDAP directory (type Connected)

      Steps to Reproduce

      1) Configure Microsoft AD as a user directory in JIRA 7.1.4. Make sure AD has enough users, e.g. 10k users.
      2) Import a few users.
      3) Log in to JIRA as a local user and stay on the directory page.
      4) Configure JMeter HTTP sessions using one of the AD users. The sessions should repeat continuously for 30 minutes.
      5) Start 20 concurrent sessions from JMeter, e.g. browsing boards and issues.
      6) About 10 seconds after the JMeter sessions start, initiate an LDAP sync.
      7) Log in to JIRA in another browser and keep navigating pages. After 3-8 minutes you will notice slowness, and the LDAP sync keeps running without completing.
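
If JMeter is not at hand, the concurrent load in steps 4 and 5 could be approximated with a small driver like the one below. This is only a sketch: `LoadDriver` and its `run` method are hypothetical names, and the task passed in is assumed to perform one authenticated page request against the test instance.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the JMeter plan: N sessions repeat a task until a deadline. */
final class LoadDriver {
    private LoadDriver() {}

    /**
     * Runs {@code task} repeatedly on {@code sessions} threads for {@code duration}
     * and returns the total number of completed iterations.
     */
    static long run(int sessions, Duration duration, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(sessions);
        AtomicLong completed = new AtomicLong();
        long deadline = System.nanoTime() + duration.toNanos();
        for (int i = 0; i < sessions; i++) {
            pool.submit(() -> {
                // Overflow-safe comparison of nanoTime values.
                while (System.nanoTime() - deadline < 0
                        && !Thread.currentThread().isInterrupted()) {
                    try {
                        task.run();
                        completed.incrementAndGet();
                    } catch (RuntimeException e) {
                        // A failed request should not end the session; keep looping.
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Give in-flight iterations up to one extra minute to finish.
            if (!pool.awaitTermination(duration.toMillis() + 60_000, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

For the scenario in this report the call would look like `LoadDriver.run(20, Duration.ofMinutes(30), task)`, where `task` issues a request for a board or issue page as one of the AD users.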

      Expected Results

      Directory sync should complete without degrading performance.

      Actual Result

      JIRA performance degrades significantly while a full or incremental sync is running.

      Verification

      To verify whether an instance is affected by this bug, collect thread dumps while performance is slow, as described in "Generate a Thread Dump". On an affected instance, a Caesium thread shows the stack trace below across several thread dumps:

      "Caesium-1-3" #189 daemon prio=5 tid=0x00007f2eae212000 nid=0x3540 waiting on condition [0x00007f2f1cff9000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x0000000543279b50> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
      	at com.atlassian.cache.memory.DelegatingCache.removeAll(DelegatingCache.java:256)
      	at com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts(DefaultApplicationRoleManager.java:632)
      	at com.atlassian.jira.application.DefaultApplicationRoleManager.onUserDeleted(DefaultApplicationRoleManager.java:529)
      	at sun.reflect.GeneratedMethodAccessor2429.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:36)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$1$1.run(AsynchronousAbleEventDispatcher.java:48)
      	at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
      	at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:107)
      	at com.atlassian.event.internal.EventPublisherImpl.invokeListeners(EventPublisherImpl.java:160)
      	at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:79)
      	at com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.publishEvent(DbCachingRemoteChangeOperations.java:1062)
      	at com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.deleteCachedUsersByName(DbCachingRemoteChangeOperations.java:318)
      	at com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.deleteCachedUsersByGuid(DbCachingRemoteChangeOperations.java:285)
      	at com.atlassian.crowd.directory.DirectoryCacheImplUsingChangeOperations.deleteCachedUsersByGuid(DirectoryCacheImplUsingChangeOperations.java:72)
      	at com.atlassian.crowd.directory.ldap.cache.UsnChangedCacheRefresher.synchroniseUserChangesGuid(UsnChangedCacheRefresher.java:356)
      	at com.atlassian.crowd.directory.ldap.cache.UsnChangedCacheRefresher.synchroniseUserChanges(UsnChangedCacheRefresher.java:381)
      	at com.atlassian.crowd.directory.ldap.cache.UsnChangedCacheRefresher.synchroniseChanges(UsnChangedCacheRefresher.java:124)
      	at com.atlassian.crowd.directory.DbCachingRemoteDirectory.synchroniseCache(DbCachingRemoteDirectory.java:1097)
      	at com.atlassian.crowd.manager.directory.DirectorySynchroniserImpl.synchronise(DirectorySynchroniserImpl.java:76)
      	at com.atlassian.jira.crowd.embedded.JiraDirectorySynchroniser.synchronizeDirectory(JiraDirectorySynchroniser.java:77)
      	at com.atlassian.jira.crowd.embedded.JiraDirectorySynchroniser.runJob(JiraDirectorySynchroniser.java:52)
      	at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:153)
      	at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:118)
      	at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:97)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:401)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:396)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:349)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:255)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:252)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:65)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34)
      	at java.lang.Thread.run(Thread.java:745)
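
The check against the trace above can be automated when many dumps have been collected. The following is a minimal sketch, assuming HotSpot-style dump text in which each thread starts with a quoted name. `SyncStallDetector` is a hypothetical helper; only the two frame names are taken from this report.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds Caesium threads parked in the cache-clearing path. */
final class SyncStallDetector {
    private SyncStallDetector() {}

    // Frames copied from the stack trace in this report.
    private static final String CLEAR_FRAME =
            "com.atlassian.jira.application.DefaultApplicationRoleManager.clearUserCounts";
    private static final String REMOVE_ALL_FRAME =
            "com.atlassian.cache.memory.DelegatingCache.removeAll";

    /** Returns the names of Caesium threads in one dump that match the pattern. */
    static List<String> stalledCaesiumThreads(String dump) {
        List<String> hits = new ArrayList<>();
        String current = null; // Caesium thread being read, or null for other threads
        boolean waiting = false, clear = false, removeAll = false;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A new thread header: record the previous thread if it matched.
                if (current != null && waiting && clear && removeAll) {
                    hits.add(current);
                }
                int end = line.indexOf('"', 1);
                String name = end > 0 ? line.substring(1, end) : line.substring(1);
                current = name.startsWith("Caesium-") ? name : null;
                waiting = clear = removeAll = false;
            } else if (current != null) {
                if (line.contains("java.lang.Thread.State: WAITING")) waiting = true;
                if (line.contains(CLEAR_FRAME)) clear = true;
                if (line.contains(REMOVE_ALL_FRAME)) removeAll = true;
            }
        }
        // Record the last thread in the dump if it matched.
        if (current != null && waiting && clear && removeAll) {
            hits.add(current);
        }
        return hits;
    }
}
```

Running this over each collected dump and checking that a Caesium thread is reported in several of them corresponds to the manual verification described above.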
      

      In addition, a number of Tomcat worker threads will be waiting on futures inside that cache (see the attached screenshot). Essentially, the sync invalidates a cache that a large number of actions rely on, and each time it is invalidated those threads block until it is repopulated.
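
The effect can be illustrated with a toy model. This is not Atlassian's cache implementation: `UserCountCacheModel` is a hypothetical class, the constant returned by the loader stands in for an expensive user count, and the assumption that every user-deleted event clears the cache is based only on the `onUserDeleted` and `clearUserCounts` frames in the trace above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/** Toy model of a lazily loaded value that is cleared on every user-deleted event. */
final class UserCountCacheModel {
    private final IntSupplier loader; // stands in for the expensive count query
    private final AtomicInteger loads = new AtomicInteger();
    private Integer cached; // null means "invalidated"

    UserCountCacheModel(IntSupplier loader) {
        this.loader = loader;
    }

    /** Readers hold the monitor while loading, so concurrent readers wait for the reload. */
    synchronized int get() {
        if (cached == null) {
            loads.incrementAndGet();
            cached = loader.getAsInt();
        }
        return cached;
    }

    /** Called once per deleted user in this model. */
    synchronized void removeAll() {
        cached = null;
    }

    int loadCount() {
        return loads.get();
    }

    /** Simulates a sync that deletes users one by one while requests keep reading. */
    static int loadsDuringSync(int deletedUsers, int readsPerDeletion) {
        UserCountCacheModel cache = new UserCountCacheModel(() -> 10_000);
        cache.get(); // warm the cache before the sync starts
        for (int u = 0; u < deletedUsers; u++) {
            cache.removeAll();
            for (int r = 0; r < readsPerDeletion; r++) {
                cache.get();
            }
        }
        return cache.loadCount() - 1; // exclude the warm-up load
    }
}
```

In this model, `loadsDuringSync(500, 20)` performs 500 reloads, one per deleted user, no matter how many reads happen in between. With a slow loader, every request arriving during a reload waits for it, which matches the blocked worker threads described above.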

      Workaround 

      (If using Active Directory)

      1. Set the JIRA system property:
        -Dcrowd.use.legacy.ad.incremental.sync=true
        

      With this workaround, users who no longer exist in AD but own content in JIRA are not deleted from the cache during an INCREMENTAL sync, so the issue is not triggered. A FULL sync is still affected, however.

      Note on partial fix:

      We significantly reduced the performance problem by resolving issue JRA-62742, which was about skipping active user counting in cases where it was not required.

      After gathering feedback, we decided to reopen this issue to look further into how we can improve user synchronisation performance.

      Attachments

        1. sync.png (81 kB)
        2. Waiting on futures.jpg (627 kB)

            People

              izinoviev Ilya Zinoviev (Inactive)
              vkharisma vkharisma (Inactive)
              Votes: 19
              Watchers: 50
