[CONFSERVER-22342] Synchronising LDAP/Crowd can completely fail because transactions are not properly rolled back in Hibernate2BatchProcessor

Type: Bug
Resolution: Fixed
Priority: Medium
Fix Version/s: 3.5.5
Affects Version/s: None
Component/s: None
Labels:

Hibernate2BatchProcessor#commitTransaction() clears the transaction from ThreadLocal before trying to flush the Hibernate session. If the flush fails, rollbackTransaction() does not clear the session because it can no longer find the transaction. After this the whole batch operation fails, as the offending operation fails again on every subsequent flush.
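
The ordering problem described above can be illustrated with a minimal sketch. This is not the actual Hibernate2BatchProcessor source: FakeSession, BatchProcessorSketch and the forgetBeforeFlush switch are hypothetical stand-ins, and a plain list replaces the real Hibernate session so the example runs without any libraries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a Hibernate session: objects stay pending until flushed or cleared. */
class FakeSession {
    final List<String> pending = new ArrayList<>();
    final List<String> stored = new ArrayList<>();

    void save(String user) { pending.add(user); }

    /** Fails if any pending user already exists; the pending list is left untouched on failure. */
    void flush() {
        for (String user : pending) {
            if (stored.contains(user)) {
                throw new IllegalStateException("duplicate user: " + user);
            }
        }
        stored.addAll(pending);
        pending.clear();
    }

    void clear() { pending.clear(); }
}

/** Hypothetical sketch of the commit/rollback ordering, not the real class. */
class BatchProcessorSketch {
    private final ThreadLocal<Object> currentTx = new ThreadLocal<>();
    private final FakeSession session;
    private final boolean forgetBeforeFlush; // true reproduces the ordering described in this issue

    BatchProcessorSketch(FakeSession session, boolean forgetBeforeFlush) {
        this.session = session;
        this.forgetBeforeFlush = forgetBeforeFlush;
    }

    void beginTransaction() { currentTx.set(new Object()); }

    void commitTransaction() {
        if (forgetBeforeFlush) {
            currentTx.remove();   // transaction forgotten first...
            session.flush();      // ...so a failure here leaves nothing to roll back
        } else {
            session.flush();      // flush while the transaction is still tracked
            currentTx.remove();
        }
    }

    void rollbackTransaction() {
        if (currentTx.get() == null) {
            return;               // no transaction found: becomes a no-op
        }
        session.clear();          // drop the offending pending objects
        currentTx.remove();
    }

    /** Returns true if the batch was committed, false if it was rolled back. */
    boolean processBatch(List<String> users) {
        beginTransaction();
        try {
            for (String user : users) {
                session.save(user);
            }
            commitTransaction();
            return true;
        } catch (IllegalStateException e) {
            rollbackTransaction();
            return false;
        }
    }
}

class CommitOrderDemo {
    public static void main(String[] args) {
        for (boolean buggy : new boolean[] {true, false}) {
            FakeSession session = new FakeSession();
            session.stored.add("alice"); // user that already exists locally
            BatchProcessorSketch processor = new BatchProcessorSketch(session, buggy);
            boolean first = processor.processBatch(Arrays.asList("alice", "bob"));
            boolean second = processor.processBatch(Arrays.asList("carol"));
            System.out.println("forgetBeforeFlush=" + buggy + " first=" + first + " second=" + second);
        }
    }
}
```

With forgetBeforeFlush set to true, the second batch fails as well, because the duplicate from the first batch is still pending in the session. With the order reversed, the rollback clears the session and the second batch commits.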

The directory synchronisation algorithm is not atomic, so it can sometimes try to add users that already exist in the database, which causes the session flush to fail. AbstractBatchProcessor has logic to handle these failures gracefully, but it does not work in Confluence because the session is not cleared properly.
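
As a rough illustration of what handling a failed batch gracefully can look like, the sketch below retries each item on its own after a batch failure. It is an assumption about the general pattern only, not the AbstractBatchProcessor code: FallbackBatchSketch, processWithFallback and insertInto are hypothetical names, and a Set stands in for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class FallbackBatchSketch {
    /**
     * Runs batchOp on the whole batch; if that throws, retries each item alone.
     * Returns the items that still failed when processed individually.
     */
    static <T> List<T> processWithFallback(List<T> batch, Consumer<List<T>> batchOp) {
        try {
            batchOp.accept(batch);
            return new ArrayList<>();
        } catch (RuntimeException batchFailure) {
            List<T> failed = new ArrayList<>();
            for (T item : batch) {
                try {
                    batchOp.accept(Collections.singletonList(item));
                } catch (RuntimeException itemFailure) {
                    failed.add(item);
                }
            }
            return failed;
        }
    }

    /** All-or-nothing insert: a failed call leaves the store unchanged, like a clean rollback. */
    static Consumer<List<String>> insertInto(Set<String> store) {
        return users -> {
            for (String user : users) {
                if (store.contains(user)) {
                    throw new IllegalStateException("duplicate user: " + user);
                }
            }
            store.addAll(users);
        };
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>(Arrays.asList("bob")); // bob already exists
        List<String> failed = processWithFallback(
                Arrays.asList("alice", "bob", "carol"), insertInto(store));
        System.out.println("failed individually: " + failed);
    }
}
```

The fallback only helps if a failed attempt leaves nothing behind. In this sketch the insert is all-or-nothing by construction, so only bob is reported as failed while alice and carol are added.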

In a large Confluence instance the following behaviour could trigger this issue:

1. A new directory is added with a large number of users.
2. A sync is started. The synchronisation algorithm finds that all users in the new directory need to be added and starts adding them.
3. A user from the large user set logs in before the sync completes (this triggers user creation in the local instance).
4. The sync operation tries to add the user who was created in the previous step, causing the flush to fail.
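
The sequence above is a check-then-act race, which can be sketched as follows. LocalUserStoreSketch and SyncRaceSketch are hypothetical names, and the in-memory set with a uniqueness check is only an assumed stand-in for the local user table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical local user table that rejects duplicate names. */
class LocalUserStoreSketch {
    private final Set<String> users = new HashSet<>();

    boolean contains(String name) { return users.contains(name); }

    void insert(String name) {
        if (!users.add(name)) {
            throw new IllegalStateException("duplicate user: " + name);
        }
    }
}

class SyncRaceSketch {
    /** The sync decides up front which remote users are missing locally (a snapshot). */
    static List<String> findUsersToAdd(List<String> remoteUsers, LocalUserStoreSketch store) {
        List<String> toAdd = new ArrayList<>();
        for (String name : remoteUsers) {
            if (!store.contains(name)) {
                toAdd.add(name);
            }
        }
        return toAdd;
    }

    public static void main(String[] args) {
        LocalUserStoreSketch store = new LocalUserStoreSketch();
        List<String> toAdd = findUsersToAdd(Arrays.asList("alice", "bob"), store);

        store.insert("bob"); // bob logs in mid-sync, which creates him locally

        for (String name : toAdd) {
            try {
                store.insert(name); // the snapshot is now stale for bob
                System.out.println("added " + name);
            } catch (IllegalStateException e) {
                System.out.println("sync failed: " + e.getMessage());
            }
        }
    }
}
```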

At this stage no new users can be added as all flushes will fail. Membership synchronisation will proceed very slowly as some users have not been added, so batch operations will fall back to individual processing.

Patch

Attached is an updated version of the atlassian-embedded-crowd-hibernate2 jar that patches this issue. When patched, transactions roll back correctly, allowing the synchronisation to complete. All records in the rolled-back transaction will be ignored until the next synchronisation attempt (or until the affected users log in).

It is known to work in Confluence 3.5.4, and might work in earlier versions, but these have not been tested. It is not needed in Confluence 3.5.6, as that version already contains this fix.

This patch also addresses CONF-22631, so that any records that fail to synchronise are logged correctly. Users with Confluence 3.5.5 should install this patch to avoid that issue.

Installation

To install the patch:

1. Stop Confluence.
2. Move the old atlassian-embedded-crowd-hibernate2 jar out of <confluence install dir>/confluence/WEB-INF/lib.
3. Copy the new jar into the same directory.
4. Start Confluence.

Attachments

- CONF-22342__Testcase.patch (4 kB), attached 18/Apr/2011 4:37 AM by Olli Nevalainen
- atlassian-embedded-crowd-hibernate2-1.2.9-m3.jar (66 kB), attached 01/Jun/2011 4:20 AM by Richard Atkins

Issue Links

causes

- CONFSERVER-22631 Null pointer when detecting duplicate memberships while synchronising (Closed)
- CONFSERVER-22644 Synchronisation failures can affect more users or groups than necessary (Closed)

is related to

- CONFSERVER-22593 Active Directory user/membership updates lost due to duplicate entries until restart (Closed)

Michael S added a comment - 06/Jul/2011 7:19 AM

"It is known to work in Confluence 3.5.4, and might work in earlier versions, but these have not been tested."

A customer has reported that 3.5.4 was working for a while and then stopped. Sounds like the effects may not be seen straight away.

Richard Atkins added a comment - 01/Jun/2011 4:20 AM

The previous patch (atlassian-embedded-crowd-hibernate2-1.2.9-m1.jar) had an issue that prevented XML backups from restoring group memberships correctly. I've updated the patch to resolve this issue.

Richard Atkins added a comment - 30/May/2011 1:47 AM

We've opted to release 3.5.5 with this fix as is for now, but I'll also update the patch attached to this issue with the fix for CONF-22631.

Richard Atkins added a comment - 30/May/2011 1:08 AM

Thanks to Colin Goudie, we've found an issue with the patch that will cause a null pointer exception if a duplicate membership is detected while synchronising memberships. This exception will cause the synchronisation attempt to abort, preventing all memberships after the affected batch from being synchronised until the next synchronisation attempt. I'll attach an updated patch to fix this additional issue shortly.

Richard Atkins added a comment - 27/May/2011 2:32 AM - edited

(Comment deleted, obsolete)

Matt Ryall added a comment - 18/Apr/2011 8:57 AM

An IM comment from Olli:

"I wrote a test, and I think I understand it now. In Hibernate2BatchProcessor, the transaction is forgotten on line 107, so when the flush on line 108 fails, rollbackTransaction() on line 121 becomes a no-op and so fails to clear the session."

Sounds like it's pretty straightforward to fix.

Olli Nevalainen added a comment - 18/Apr/2011 4:37 AM

Attached a test case that triggers this problem.
