[CWD-3799] Performance Problems due to O(n) group lookups and other performance problems

Type: Suggestion
Resolution: Fixed
Fix Version/s: 2.8.3
Component/s: Directory - Remote Crowd, Embedded, Integration - Confluence
Labels:
- pse-request
Environment:
Tested in Atlassian Confluence 5.1.5 Standalone
synchronizing against Crowd 2.6.4
large userbase (~ 30000 users, 3 large groups containing 27000, 20000, 18000 users)

Our customer faces a severe performance problem when synchronizing their Confluence against their Crowd. A full sync (which sometimes happens even though incremental sync is enabled) takes 400-600s on an idle, production-like (i.e. powerful) test system, and can take more than an hour in production.
For most of that time there are no requests to Crowd, only one thread is using a CPU core (at 100%), and there are no I/O waits.

Thread dumps show that the culprit is com.atlassian.crowd.directory.DbCachingRemoteChangeOperations.findUserMembershipForGroupChanges

This method shows a few weaknesses:

- the contains lookups it performs are O(n) each
- ~n such lookups are performed, so we have O(n^2) overall

The first example is internalMembers:

List<String> internalMembers = ...
...
for (String remoteUser : remoteUsers)
{
    if (!internalMembers.contains(remoteUser))
        usersToAdd.add(remoteUser);
}

The second is remoteUsers:

... Collection<String> remoteUsers ...
...
remoteUsers = Collections2.transform(remoteUsers, IdentifierUtils.TO_LOWER_CASE);
...
for (String internalUser : internalMembers)
{
    if (!remoteUsers.contains(internalUser))
        usersToRemove.add(internalUser);
}

The contains lookups are performed against a live view created with Collections2.transform or Lists.transform, although
- the documentation of Collections2.transform states that
  When a live view is not needed, it may be faster to copy the transformed collection and use the copy.
- the documentation of Lists.transform even states
  The function is applied lazily, invoked when needed. This is necessary for the returned list to be a view, but it means that the function will be applied many times for bulk operations like List.contains(java.lang.Object)
- which is exactly what is done here
- so we are dealing not only with O(n^2) comparisons, but also with n^2 invocations of toLowerCase, where only n would be needed
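
To illustrate the last point, here is a minimal sketch that counts how often the transform function runs. It does not use Guava: lazyView is a hypothetical, hand-written stand-in for a lazily transformed list, and the class and method names are made up for this example, so the exact call counts inside Guava may differ in detail.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

final class LazyViewCost {
    /** Hypothetical stand-in for a lazy view: fn is re-applied on every get(). */
    static <A, B> List<B> lazyView(List<A> source, Function<A, B> fn) {
        return new AbstractList<B>() {
            @Override public B get(int index) { return fn.apply(source.get(index)); }
            @Override public int size() { return source.size(); }
        };
    }

    /** Runs n failing contains() lookups and returns how often toLowerCase was invoked. */
    static long countCalls(int n, boolean copyFirst) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("User" + i);
        }
        AtomicLong calls = new AtomicLong();
        List<String> lowered = lazyView(names, s -> {
            calls.incrementAndGet();
            return s.toLowerCase(Locale.ROOT);
        });
        // Either search the view directly, or copy it once into a HashSet.
        Collection<String> haystack = copyFirst ? new HashSet<>(lowered) : lowered;
        for (int i = 0; i < n; i++) {
            haystack.contains("absent" + i); // a miss scans the whole view
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("lazy view: " + countCalls(100, false)); // 10000
        System.out.println("copied:    " + countCalls(100, true));  // 100
    }
}
```

With n = 100, searching the view directly applies the function n^2 times, while copying first applies it only n times.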

So I did what the Guava documentation suggests: I copied both collections into HashSets and used them for both iteration and the contains lookups.
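
The idea can be sketched roughly as follows. This is not the attached patch: the class and method names are hypothetical, and Locale.ROOT lower-casing is only an assumed stand-in for IdentifierUtils.TO_LOWER_CASE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MembershipDiff {
    /** Copies the names into a HashSet, lower-casing each name exactly once. */
    static Set<String> toLowerCaseSet(Collection<String> names) {
        Set<String> result = new HashSet<>();
        for (String name : names) {
            // Assumption: stands in for IdentifierUtils.TO_LOWER_CASE.
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    /** Returns every element of candidates that is not contained in existing. */
    static List<String> missingFrom(Set<String> candidates, Set<String> existing) {
        List<String> missing = new ArrayList<>();
        for (String candidate : candidates) {
            if (!existing.contains(candidate)) { // expected O(1) per lookup
                missing.add(candidate);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> remoteUsers = toLowerCaseSet(Arrays.asList("Alice", "BOB", "carol"));
        Set<String> internalMembers = toLowerCaseSet(Arrays.asList("bob", "dave"));

        List<String> usersToAdd = missingFrom(remoteUsers, internalMembers);
        List<String> usersToRemove = missingFrom(internalMembers, remoteUsers);
        System.out.println("add: " + usersToAdd + ", remove: " + usersToRemove);
    }
}
```

Both directions of the comparison then cost expected O(n) in total instead of O(n^2). The order of the resulting lists follows HashSet iteration order and is therefore unspecified.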

Result:
On the above-mentioned test system, the sync now takes about 15 seconds (roughly a 30-fold speedup).

The patch is attached. It was taken from Confluence 5.1.5 (Crowd 2.6.2), but it also applies cleanly against Crowd 2.7.0 (Confluence 5.4.2).

sync_performance.diff
4 kB
12/Feb/2014 4:51 PM

is duplicated by

CWD-4249 DbCachingRemoteChangeOperations methods to find membership changes use List.contains check inside a loop

Closed

is related to

CONFSERVER-32661 (Full) Crowd Sync is very slow

Closed

Patryk added a comment - 26/Apr/2016 2:27 PM

Hello,

The issue has been resolved in Crowd 2.8.3, but was tracked as CWD-4249.

Best regards,
Patryk Petrowski

Steven F Behnke added a comment - 05/Nov/2015 2:59 PM

Thank you, but I'm not looking to apply modifications to the source, and I doubt my customer is either. Thank you for the tip.

Martin Sander added a comment - 05/Nov/2015 10:56 AM

Hey Steve, this patch is not provided by Atlassian at all; it is just an improvement suggestion by me.

You have to apply it to the Crowd sources and compile them yourself. If you don't know how to do it, a friendly Atlassian Expert will gladly assist you in exchange for some cash.

I don't work for an Atlassian Expert company anymore, so unfortunately you cannot hire me to do it.

Steve Behnke [DiscoverEquip.com] added a comment - 04/Nov/2015 8:39 PM

Is this patch provided in new installers or is it still provided on a support basis?

Martin Sander added a comment - 28/Sep/2015 11:48 AM

The patch is attached to this issue: sync_performance.diff. Beware that it needs to be applied to the Crowd sources, not the JIRA/Confluence sources. It can be done with just the source jar and the pom from the Maven repository, but it involves some manual effort.

Tom Bell added a comment - 23/Sep/2015 4:32 PM

We're seeing long sync times between Confluence and Crowd here as well with roughly 10k users. Our syncs are only taking about 30 minutes each compared to 24 hours for others but that still seems way too long when waiting for simple group adjustments in Crowd to propagate down to Confluence. Our Jira service syncs with Crowd in under 1 minute. Can someone from the Confluence team talk to the Jira team to see how they've managed to keep the Crowd sync times reasonable? Meanwhile where is this patch discussed above?

Thanks!

David Adam added a comment - 07/May/2015 9:12 AM

It would be really great to see if Atlassian is going to invest some time on this. Possibly, this bug affects synchronization from a remote LDAP directory as well. I've been in touch with people who own a huge Active Directory (70k+ users, 100k+ groups) and they say that synchronization usually takes more than 24 hours. This sync time seems ridiculous, as the AD DB size is only approximately 6GB.

David Yu added a comment - 08/Apr/2015 5:15 PM

This is a great low hanging fruit to pick! We see fresh new syncs of 1,400 seconds. Not as bad as the folks above but definitely noticeable when you're restarting or recovering a system.

Martin Sander added a comment - 07/Apr/2015 12:28 PM

acsjira@honeywell.com: You have to apply the patch to the Crowd sources. If you have a Crowd license, download the sources via my.atlassian.com.

You then have to switch out the crowd-core jar in your application.

Honeywell JIRA Admin added a comment - 07/Apr/2015 6:50 AM

We are facing a similar issue in JIRA and Confluence.

Where can I apply this patch?

Assignee: Unassigned

Reporter: Martin Sander

Votes: 32

Watchers: 30

Created: 12/Feb/2014 4:51 PM

Updated: 19/Sep/2019 5:51 AM

Resolved: 26/Apr/2016 2:27 PM