Massive amount of inter-node traffic to retrieve nested group membership

    • Severity 3 - Minor

      Issue Summary

      When groups are nested too deeply in LDAP, Confluence cluster nodes send and receive a massive amount of distributed cache traffic for nested group membership, which results in network congestion.

      This is reproducible on Data Center: yes

      Steps to Reproduce

      1. Set up a Confluence cluster with 2 nodes, a Crowd instance, and an OpenLDAP server.
      2. Create some users and put them in a group that is itself nested inside many other groups on the LDAP server. Sample Python script:
        from ldap3 import Server, Connection, ALL

        BASE_DN = 'ou=atlassian,dc=ldap,dc=atlassian,dc=internal'

        server = Server('ldap:389', get_info=ALL)
        conn = Connection(server, user='cn=admin,dc=ldap,dc=atlassian,dc=internal', password='password', check_names=True, auto_bind=True)
        conn.add(BASE_DN, ['organizationalUnit', 'top'])
        
        # Create users ldapuser1..ldapuser99.
        users = [f'cn=ldapuser{i},{BASE_DN}' for i in range(1, 100)]
        for index, user_dn in enumerate(users, start=1):
            conn.add(user_dn, ['inetOrgPerson', 'organizationalPerson', 'person', 'top'], {'sn': f'ldapuser{index}', 'userPassword': 'password'})
        
        # Create LDAPgroup-1..LDAPgroup-2999. The first group holds all users;
        # every later group has the previous group as its only member, forming one long nesting chain.
        groups = [f'LDAPgroup-{i}' for i in range(1, 3000)]
        for index, name in enumerate(groups):
            members = users if index == 0 else f'cn={groups[index - 1]},{BASE_DN}'
            conn.add(f'cn={name},{BASE_DN}', ['groupOfUniqueNames', 'top'], {'cn': name, 'uniqueMember': members})
        
        conn.unbind()
        
      3. Add a directory in Crowd and associate it with the Confluence application.
      4. Add a directory in Confluence and associate it with the above application in Crowd, with nested groups enabled.
      5. Synchronize the directory. This creates users that belong to a large number of nested groups.
      6. Log in as any user created in LDAP, e.g. ldapuser1 in the above case.
      7. Load any single page.

      Expected Results

      Loading a single page generates little or no inter-node Hazelcast traffic, and the user can view the page without any performance impact.

      Actual Results

      Messages for the CachedCrowdMembershipDao caches are sent and received very frequently between the nodes via Hazelcast, which can cause network congestion.
      Running tcpdump on any of the Confluence nodes shows a tremendous number of messages like the one below.

      > tcpdump -A | grep CachedCrowdMembershipCacheKey
       ...
      .7.....T......"...7............................".... '.....`...jatlassian-cache.Cache.com.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipDao.GROUP_PARENT_CACHE................sr.Fcom.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipCacheKey...........J..directoryIdL..namet..Ljava/lang/String;L..typet.SLcom/atlassian/confluence/impl/user/crowd/CachedCrowdMembershipCacheKey$MemberType;xp........t..ldapgroup-333~r.Qcom.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipCacheKey$MemberType...........xr..java.lang.Enum...........xpt..GROUPS_FOR_GROUP......."
      ...
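
      To quantify the traffic per cache rather than eyeballing the output, the captured text can be tallied by cache name. This is a minimal sketch, assuming the tcpdump -A output has first been saved to a text file; the function name count_cache_messages is hypothetical and not part of Confluence or tcpdump.

```python
import re
from collections import Counter

# Matches cache names as they appear in the capture above, e.g.
# "...CachedCrowdMembershipDao.GROUP_PARENT_CACHE".
CACHE_NAME = re.compile(r"CachedCrowdMembershipDao\.([A-Z_]+)")


def count_cache_messages(capture_text):
    """Count how often each CachedCrowdMembershipDao cache name occurs."""
    return Counter(CACHE_NAME.findall(capture_text))
```

      Passing the saved capture text to count_cache_messages returns a Counter keyed by cache name (for example GROUP_PARENT_CACHE), which makes it easy to compare a capture taken before and after a configuration change.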
      

      The cache management screen shows more than 50,000 cache hits/misses recorded from loading just a single page.
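
      The volume is easier to appreciate with a rough model of how nested membership resolution walks a chain like the one built by the repro script. The following is a simplified sketch, not Confluence's actual implementation: parents_of stands in for a per-group parent lookup (one cache access per visited group in this model), and all function names are hypothetical.

```python
def build_chain(depth):
    """Map each group to its direct parents, mirroring the repro script:
    LDAPgroup-1 is a member of LDAPgroup-2, which is a member of LDAPgroup-3, ..."""
    return {f"LDAPgroup-{i}": [f"LDAPgroup-{i + 1}"] for i in range(1, depth)}


def resolve_all_groups(direct_groups, parents_of):
    """Collect every direct and inherited group, counting parent lookups."""
    seen = set(direct_groups)
    pending = list(direct_groups)
    lookups = 0
    while pending:
        group = pending.pop()
        lookups += 1  # one parent lookup per visited group
        for parent in parents_of.get(group, []):
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return seen, lookups


# A user who is a direct member of LDAPgroup-1 in a 2999-group chain.
groups, lookups = resolve_all_groups(["LDAPgroup-1"], build_chain(2999))
print(len(groups), lookups)
```

      In this model a single membership resolution for one user costs one lookup per nested group. If each lookup is served by a distributed cache, a handful of such resolutions per page load would be consistent with the hit/miss counts reported above, although the actual call pattern inside Confluence is not shown in this report.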

      Workaround

      Any of the following:

      • Avoid or reduce the use of nested groups.
      • If you are using Active Directory, consider flattening the nested groups in Crowd using the workaround suggested in https://jira.atlassian.com/browse/CWD-2082
      • Tweak the cache settings. This is not recommended because it may break the consistency of the instance, but you can reduce the inter-node traffic by switching the CachedCrowdMembershipDao caches from distributed to hybrid in cache-settings-overrides.properties:
        cache.replicateViaInvalidation.com.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipDao.GROUP_CHILD_CACHE=true
        cache.replicateViaInvalidation.com.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipDao.GROUP_PARENT_CACHE=true
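
      If the override has to be rolled out repeatedly, it may help to apply it idempotently rather than editing the file by hand. This is a minimal sketch, assuming the file contains only plain key=value lines; the helper apply_overrides is hypothetical, and because the location of cache-settings-overrides.properties is not stated in this report, the sketch operates on the file's text only.

```python
PREFIX = ("cache.replicateViaInvalidation."
          "com.atlassian.confluence.impl.user.crowd.CachedCrowdMembershipDao.")

# The two settings from the workaround above.
OVERRIDES = {
    PREFIX + "GROUP_CHILD_CACHE": "true",
    PREFIX + "GROUP_PARENT_CACHE": "true",
}


def apply_overrides(properties_text, overrides=OVERRIDES):
    """Return the properties text with every override key set to its value.

    Existing lines for an override key are rewritten in place; keys that are
    missing are appended. Applying the function twice changes nothing further.
    """
    seen = set()
    out = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            out.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            out.append(line)  # unrelated settings and comments are kept as-is
    out.extend(f"{k}={v}" for k, v in overrides.items() if k not in seen)
    return "\n".join(out) + "\n"
```

      The caller reads the current file content, passes it through apply_overrides, and writes the result back; whether and when the nodes pick up the change is outside the scope of this sketch.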
        

              Assignee:
              Kenny MacLeod
              Reporter:
              Nobuyuki Mukai