[CONFSERVER-43156] Sessions are never cleared during LDAP user directory sync

Type: Bug
Resolution: Fixed
Priority: Medium
Fix Version/s: 6.1.0
Affects Version/s: 5.8.18, 5.10.0
Component/s: Server - Authentication
Labels:
Support reference count: 3
Symptom Severity: Severity 2 - Major
Bug Fix Policy: View Atlassian Server bug fix policy

Summary

Confluence does not clear its Hibernate session during a full sync, so Hibernate keeps tracking an ever-growing pool of objects. As a result, synchronizing a large LDAP directory is virtually impossible: it takes days to complete, even on very good hardware.
With more than 60K users, the synchronization can take up to 5 days; with the session flushed and cleared, the same synchronization can complete in 10 minutes.

Environment

All Confluence versions with crowd-api-2.8.3
All supported databases are impacted

Steps to Reproduce

Install Confluence 5.8 and set up a connection to a large LDAP instance
Perform an upgrade to 5.10

Expected Results

Confluence is able to finish the synchronization in a couple of hours

Actual Results

The synchronization takes up to 6 days

Workaround

No verified workaround available.
bjarne.holen843903303 proposed calling session.flush() and session.clear() via the HibernateTemplate after each user is synchronized.
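The effect of the proposed flush-and-clear can be illustrated with a small, self-contained Java simulation. TrackingSession below is a hypothetical stand-in, not Hibernate's Session or any Confluence class; it only mimics the two behaviours that matter here: a flush inspects every entity still held by the session, and a clear empties it. The assumption that each synchronized user triggers exactly one flush is a simplification for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulation of the flush/clear workaround; all names here are hypothetical. */
public class BatchedSyncSketch {

    /** Hypothetical session: keeps every saved entity until clear() is called. */
    static final class TrackingSession {
        private final List<Object> tracked = new ArrayList<>();
        private long dirtyChecks = 0;

        void save(Object entity) {
            tracked.add(entity);
        }

        /** Models a flush: every entity still held by the session is inspected for changes. */
        void flush() {
            dirtyChecks += tracked.size();
        }

        /** Models a clear: all entities are detached, so the next flush starts from empty. */
        void clear() {
            tracked.clear();
        }

        long dirtyChecks() {
            return dirtyChecks;
        }
    }

    /**
     * Saves users one at a time, flushing after each (assumed one flush per user)
     * and clearing every batchSize users. batchSize <= 0 means "never clear".
     * Returns the total number of entity dirty checks performed.
     */
    static long sync(List<String> users, int batchSize) {
        TrackingSession session = new TrackingSession();
        int sinceClear = 0;
        for (String user : users) {
            session.save(user);
            session.flush(); // flush before clear, so no pending change is lost
            sinceClear++;
            if (batchSize > 0 && sinceClear == batchSize) {
                session.clear();
                sinceClear = 0;
            }
        }
        return session.dirtyChecks();
    }

    /** Hypothetical helper that builds n synthetic user names. */
    static List<String> syntheticUsers(int n) {
        List<String> users = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            users.add("user" + i);
        }
        return users;
    }

    public static void main(String[] args) {
        List<String> users = syntheticUsers(1000);
        System.out.println("never cleared:     " + sync(users, 0));
        System.out.println("cleared every 100: " + sync(users, 100));
        System.out.println("cleared every 1:   " + sync(users, 1));
    }
}
```

For 1,000 users, main prints 500500, 50500 and 1000 dirty checks for the three cases: without clearing, the work grows quadratically with the number of users. In real Hibernate, clear() evicts all loaded instances and cancels pending saves, updates and deletions that were not flushed, which is why flush() has to come first; entities loaded before the clear also become detached, so code holding references across a batch boundary would need to re-read them.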

Attachments

com.atlassian.crowd.embedded.hibernate2.HibernateUserDao.java.diff (07/Jul/2016 12:51 PM, 0.1 kB, Anton Shaleev)
jvmtop_2016_07_15.png (15/Jul/2016 1:56 PM, 508 kB, bjarne holen)

Issue Links

is incorporated by: CONFSERVER-44221 Slow user synchronisation from Crowd directory (Closed)
is related to: CONFSERVER-45698 Confluence 5.9+ much slower to sync users and remove users from groups compared to 5.8 (Closed)


Minh Tran added a comment - 19/Mar/2017 11:27 PM

A fix for this issue is now available for Confluence Server customers.
Upgrade now or check out the Release Notes to see what other issues are resolved.

Richard Atkins added a comment - 17/Mar/2017 12:02 AM

Resolved while fixing CONF-44221, through additional session flush and clear invocations around the crowd event publisher

bjarne holen added a comment - 24/Jan/2017 1:59 PM

This seems to be the exact same issue:

https://jira.atlassian.com/browse/CONF-44221

bjarne holen added a comment - 13/Sep/2016 12:42 PM

Possible workaround

We have applied LDAP filtering to synchronize smaller sections of the organization at a time: we started with all users matching (uid=a*), then added more users by extending the filter, e.g. (|(uid=a*)(uid=b*)), and so on until all users were populated. For our organization this can be done within a day or so.
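The staged filters described in this workaround can be generated mechanically. The following is a minimal Java sketch; the class and method names are hypothetical and not part of Confluence or Crowd. It builds one cumulative filter per stage, escaping prefix values as RFC 4515 requires. The comment does not say where the filter was entered or whether it was combined with an existing user filter, so that step is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds cumulative uid-prefix LDAP filters for a staged sync (hypothetical helper). */
public class StagedLdapFilters {

    /** Escapes the characters RFC 4515 requires to be escaped in an assertion value. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Filter matching uid values that start with any of the given prefixes. */
    static String prefixFilter(List<String> prefixes) {
        if (prefixes.isEmpty()) {
            throw new IllegalArgumentException("need at least one prefix");
        }
        if (prefixes.size() == 1) {
            return "(uid=" + escape(prefixes.get(0)) + "*)";
        }
        StringBuilder sb = new StringBuilder("(|");
        for (String p : prefixes) {
            sb.append("(uid=").append(escape(p)).append("*)");
        }
        return sb.append(')').toString();
    }

    /** One filter per stage: stage k covers the first k prefixes. */
    static List<String> stages(List<String> prefixes) {
        List<String> result = new ArrayList<>();
        for (int k = 1; k <= prefixes.size(); k++) {
            result.add(prefixFilter(prefixes.subList(0, k)));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String filter : stages(List.of("a", "b", "c"))) {
            System.out.println(filter);
        }
    }
}
```

Running main prints (uid=a*), then (|(uid=a*)(uid=b*)), then (|(uid=a*)(uid=b*)(uid=c*)), matching the progression in the comment. Each stage is a superset of the previous one, so users already synchronized stay in scope while new ones are added.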

A search of mine also turned up a system property that may be related to this, -Dcrowd.use.legacy.ad.incremental.sync=true, found here:

https://confluence.atlassian.com/confkb/confluence-incremental-synchronisation-failed-and-falls-back-to-a-full-sync-when-connecting-to-ldap-812320145.html

bjarne holen added a comment - 07/Jul/2016 1:42 PM - edited

Note that we are upgrading from 4.3.7 to 5.8.18 (we initially tried to upgrade to 5.10, but its lack of support for Oracle 11.2 made us switch to 5.8.18). However, I'm fairly sure this affects LDAP synchronization in all versions of Confluence; we had identical problems when we tried our upgrade to Confluence 5.10.

The problem lies in when Hibernate sessions are cleared (i.e. flushed and emptied). The easiest way to see the issue live is to start a full synchronization of an LDAP directory with logging for net.sf.hibernate.impl.SessionImpl set to DEBUG. It then becomes clear that 99.99% of the time is spent looking for dirty (i.e. changed) values among the same massive set of objects held in a session that is never cleared.
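A simplified cost model shows why this dirty checking comes to dominate. It assumes (an illustration, not something measured in this issue) that every synchronized user triggers one flush and that each flush inspects every entity still held in the session. With $n$ users and a session that is never cleared, the total number of inspections is $C_{\text{never}}$; clearing every $b$ users (assuming $b$ divides $n$) gives $C_b$:

```latex
C_{\text{never}}(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
C_{b}(n) = \frac{n}{b} \sum_{k=1}^{b} k = \frac{n(b+1)}{2}
```

For the 60K users mentioned in the Summary ($n = 60{,}000$), $C_{\text{never}} = 1{,}800{,}030{,}000$ inspections, while clearing after each user ($b = 1$) gives $C_1 = n = 60{,}000$. The model counts only dirty checks and ignores LDAP and database I/O, so it is not meant to reproduce the reported timings.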

Note that this "patch" does not fully solve the issue when upgrading from one version to another, as there are other bottlenecks as well; I suspect there are numerous places where Hibernate/Confluence has the same problem.

Support ticket for reference:

https://support.atlassian.com/servicedesk/customer/portal/14/CSP-178586

I uploaded a picture of the call stack of a running synchronization task with the patch in place (updateUsers is now OK, but other jobs have similar problems). The synchronization now finishes in roughly 25 hours, and almost all of that time is spent by Hibernate looking for changes in the cache.

The time-consuming part is now located in:
com.atlassian.confluence.user.persistence.dao.hibernate.HibernatePersonalInformationDao.getByUser, in the call to findNamedQueryStringParam

Confluence Data Center
