Loading...

XML

Word

Printable

Type: Suggestion
Resolution: Fixed
Fix Version/s: 2.8.3
Component/s: Directory - Remote Crowd, Embedded, Integration - Confluence
Labels:
- pse-request
Environment:
Tested in Atlassian Confluence 5.1.5 Standalone
synchronizing against Crowd 2.6.4
large userbase (~ 30000 users, 3 large groups containging 27000, 20000, 18000 users)

Our customer faces quite a big (sometimes large) performance problem when synchronizing their Confluence against their Crowd. A full sync (which sometimes happens although incremental sync is enabled) takes 400-600s on an idle, production-like (i.e. powerful) test system, and can take more than an hour in production.
During most of the time, there are no requests to crowd, and only one thread is using a CPU core (100%), there are no I/O waits.

Thread dumps show that the culprit is com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.findUserMembershipForGroupChanges

This method shows a few weaknesses:

contains lookups are performed that are O(n)

~n lookups are performed, so we have O(n^2)
first example is internalMembers:

List<String> internalMembers = ...
...
for (String remoteUser : remoteUsers)
{
    if (!internalMembers.contains(remoteUser))
        usersToAdd.add(remoteUser);
}

second is remoteUsers:

... Collection<String> remoteUsers ...
...
remoteUsers = Collections2.transform(remoteUsers, IdentifierUtils.TO_LOWER_CASE);
...
for (String internalUser : internalMembers)
{
    if (!remoteUsers.contains(internalUser))
        usersToRemove.add(internalUser);
}

the contains lookups are performed against a life view made with Collections2.transform resp. Lists.transform, although
- the documentation of Collections2.transform states that
  When a live view is not needed, it may be faster to copy the transformed collection and use the copy.
- the documentation of Lists.transform even states
  The function is applied lazily, invoked when needed. This is necessary for the returned list to be a view, but it means that the function will be applied many times for bulk operations like List.contains(java.lang.Object)
- which is exactly what is done here
- so we are not only dealing with O(n^2) comparisons, but also with n^2 invocations of toLowerCase, while only n would be needed

So I did what Google suggests, copy both collections into HashSets and use them for both iteration and contains lookup.

Result:
On the above mentioned test system, the sync takes about 15 seconds (30 times speedup).

Patch is attached, it was taken from Confluence 5.1.5 (Crowd 2.6.2), but applies cleanly against Crowd 2.7.0 (Confluence 5.4.2), too.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List

sync_performance.diff
4 kB
12/Feb/2014 4:51 PM

is duplicated by

CWD-4249 DbCachingRemoteChangeOperations methods to find membership changes use List.contains check inside a loop

Closed

is related to

CONFSERVER-32661 (Full) Crowd Sync is very slow

Closed

mentioned in: Page Loading...; Page Loading...; Page Loading...

Assignee:: Unassigned
Reporter:: Martin Sander
Votes:: 32 Vote for this issue
Watchers:: 30 Start watching this issue

Created:: 12/Feb/2014 4:51 PM
Updated:: 19/Sep/2019 5:51 AM
Resolved:: 26/Apr/2016 2:27 PM

Details

Description

Attachments

Attachments

Issue Links

Forms

Activity

People

Dates