Uploaded image for project: 'Crowd Data Center'
  1. Crowd Data Center
  2. CWD-3799

Performance Problems due to O(n) group lookups and other perfomance problems

    • Our product teams collect and evaluate feedback from a number of different sources. To learn more about how we use customer feedback in the planning process, check out our new feature policy.

      Our customer faces quite a big (sometimes large) performance problem when synchronizing their Confluence against their Crowd. A full sync (which sometimes happens although incremental sync is enabled) takes 400-600s on an idle, production-like (i.e. powerful) test system, and can take more than an hour in production.
      During most of the time, there are no requests to crowd, and only one thread is using a CPU core (100%), there are no I/O waits.

      Thread dumps show that the culprit is com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.findUserMembershipForGroupChanges

      This method shows a few weaknesses:

      • contains lookups are performed that are O(n)
      • ~n lookups are performed, so we have O(n^2)
        first example is internalMembers:
        List<String> internalMembers = ...
        ...
        for (String remoteUser : remoteUsers)
        {
            if (!internalMembers.contains(remoteUser))
                usersToAdd.add(remoteUser);
        }
        

        second is remoteUsers:

        ... Collection<String> remoteUsers ...
        ...
        remoteUsers = Collections2.transform(remoteUsers, IdentifierUtils.TO_LOWER_CASE);
        ...
        for (String internalUser : internalMembers)
        {
            if (!remoteUsers.contains(internalUser))
                usersToRemove.add(internalUser);
        }
        
      • the contains lookups are performed against a life view made with Collections2.transform resp. Lists.transform, although
        • the documentation of Collections2.transform states that

          When a live view is not needed, it may be faster to copy the transformed collection and use the copy.

        • the documentation of Lists.transform even states

          The function is applied lazily, invoked when needed. This is necessary for the returned list to be a view, but it means that the function will be applied many times for bulk operations like List.contains(java.lang.Object)

        • which is exactly what is done here
        • so we are not only dealing with O(n^2) comparisons, but also with n^2 invocations of toLowerCase, while only n would be needed

      So I did what Google suggests, copy both collections into HashSets and use them for both iteration and contains lookup.

      Result:
      On the above mentioned test system, the sync takes about 15 seconds (30 times speedup).

      Patch is attached, it was taken from Confluence 5.1.5 (Crowd 2.6.2), but applies cleanly against Crowd 2.7.0 (Confluence 5.4.2), too.

            [CWD-3799] Performance Problems due to O(n) group lookups and other perfomance problems

            Katherine Yabut made changes -
            Workflow Original: JAC Suggestion Workflow [ 3388593 ] New: JAC Suggestion Workflow 3 [ 3630693 ]
            Status Original: RESOLVED [ 5 ] New: Closed [ 6 ]
            Monique Khairuliana (Inactive) made changes -
            Workflow Original: Simplified Crowd Development Workflow v2 [ 1390472 ] New: JAC Suggestion Workflow [ 3388593 ]
            Issue Type Original: Improvement [ 4 ] New: Suggestion [ 10000 ]
            vkharisma made changes -
            Link New: This issue is related to CONFCLOUD-32661 [ CONFCLOUD-32661 ]
            Owen made changes -
            Workflow Original: Crowd Development Workflow v2 [ 623383 ] New: Simplified Crowd Development Workflow v2 [ 1390472 ]
            Patryk made changes -
            Fix Version/s New: 2.8.3 [ 53696 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Patryk added a comment -

            Hello,

            The issue has been resolved in Crowd 2.8.3, but was tracked as CWD-4249.

            Best regards,
            Patryk Petrowski

            Patryk added a comment - Hello, The issue has been resolved in Crowd 2.8.3, but was tracked as CWD-4249 . Best regards, Patryk Petrowski
            Patryk made changes -
            Link Original: This issue is related to CWD-4249 [ CWD-4249 ]
            Patryk made changes -
            Link New: This issue is duplicated by CWD-4249 [ CWD-4249 ]
            Lukasz Pater made changes -
            Remote Link New: This issue links to "Page (Extranet)" [ 170359 ]

            Thank you, but I'm not looking to apply modifications to the source and I doubt my customer is either, thank you for the tip.

            Steven F Behnke added a comment - Thank you, but I'm not looking to apply modifications to the source and I doubt my customer is either, thank you for the tip.

              Unassigned Unassigned
              101ec24b4e2e Martin Sander
              Votes:
              32 Vote for this issue
              Watchers:
              30 Start watching this issue

                Created:
                Updated:
                Resolved: