Crowd Data Center / CWD-5098

Directory syncs with large cache changes can take a very long time to complete


Details

    Description

      Problem

      After an LDAP directory has already synced and users/groups/memberships have been cached to the database, subsequent syncs can take an extremely long time to complete if there is a large change in user management data. This can be caused by a misconfiguration (e.g. a directory that initially pulled in too many users and was later adjusted to pull in fewer), or simply by large changes on the LDAP server itself.

      This has the potential to tie up CPU and database resources and can cause performance issues in the application.

      Example/Steps to Reproduce

      1. Configure an LDAP directory with 50000 users and 10000 groups, with an average of 100 group memberships per person
      2. Allow the directory to be synced to the database
      3. Subsequently, edit the LDAP directory so that the user search filter now only pulls in 1000 users.

      This will result in the directory needing to delete 49000 users from cwd_user. To do that, it also needs to look up all of the memberships of each of the 49000 users from the cwd_membership table and issue DELETE statements to remove them. In the logs and the UI, there will be some indication that a massive number of users are being removed:

      2018-03-26 17:51:06,822 INFO [Caesium-1-3] [atlassian.crowd.directory.DbCachingRemoteChangeOperations] deleteCachedUsersByName deleting [ 49000 ] users
      

      This process will likely take a long time to complete.
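      To put the workload in perspective, a rough sizing query can be run against the Crowd schema. This is a hypothetical illustration: it assumes cwd_membership.child_id references cwd_user.id for user memberships, which is not shown in the logs above.

```sql
-- Hypothetical sizing query: count the membership rows that must also be
-- deleted for the users being removed from a given directory. At an
-- average of 100 memberships per user, deleting 49000 users means on the
-- order of 4,900,000 membership rows, each currently removed with its own
-- DELETE statement.
select count(*)
from cwd_membership m
join cwd_user u on u.id = m.child_id
where u.directory_id = ?;
```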

      Suggestions

      It would actually be quicker for Crowd to sync "from scratch" in this case instead of reconciling the cached dataset in the database. Working with a huge delta is expensive and taxing on the database. In other words, sync time can be improved in these cases by simply adding 1000 users from zero instead of deleting 49000 users out of 50000. This will require some pre-sync statistics gathering, so that the directory can decide whether to use the "from scratch" approach or the original incremental approach.
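      As a minimal sketch of the pre-sync statistics step (a hypothetical query, not something Crowd currently runs), the cached size could be compared with the size of the remote result set before choosing a strategy:

```sql
-- Hypothetical pre-sync check: count the users currently cached for the
-- directory. Comparing this with the number of users returned by the LDAP
-- search gives the size of the delta. If the delta is large relative to
-- the remote result (49000 deletions vs 1000 additions in the example
-- above), dropping the cached data and syncing from scratch is the
-- cheaper strategy.
select count(*) from cwd_user where directory_id = ?;
```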

      Additionally/alternatively, in the case of large deletes, we might be able to improve the way some of the changes are handled from the database perspective. For example, if the user to be removed is in 100 groups, the sync will first need to delete these 100 memberships. Currently, it will issue 100 individual DELETE statements using the primary key column of the cwd_membership table ("id"):

      2018-03-26 18:51:06,822 DEBUG [Caesium-1-1] [org.hibernate.SQL] logStatement delete from cwd_membership where id=?
      2018-03-26 18:51:06,822 TRACE [Caesium-1-1] [type.descriptor.sql.BasicBinder] bind binding parameter [1] as [BIGINT] - [294934]
      

      It should be more efficient to look up all of the membership rows associated with the user ID in one go, using the child_id column, and issue a single DELETE statement instead.
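      Concretely, the suggestion amounts to replacing the per-row deletes with a single statement keyed on the user. Column names are taken from the statements above; depending on the schema, an additional membership-type filter may be needed to restrict the delete to user (rather than nested-group) memberships.

```sql
-- Current behaviour: one DELETE per membership row, by primary key,
-- repeated ~100 times for each user being removed.
delete from cwd_membership where id=?;

-- Suggested: remove all of a user's memberships in one round trip,
-- keyed on the child_id column.
delete from cwd_membership where child_id=?;
```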


            People

              Assignee: Unassigned
              Reporter: Robert Chang (rchang)