Type: Bug
Resolution: Unresolved
Priority: Low
Fix Version/s: None
Affects Version/s: 5.3.2
Component/s: Directories
Labels:
None

Support reference count:
2
Symptom Severity:
Severity 3 - Minor
UIS:
1

Issue Summary

When a node fails to release the lock for a directory synchronisation it started, the synchronisation process can keep running in the background. This can happen if the node holding the lock becomes non-operational or if its node ID changes.

As a result, the synchronisation remains active but never completes properly, and messages like the following appear repeatedly in the application logs:

2024-07-12 15:10:00,103 Caesium-2-1 INFO [crowd.manager.directory.FailedSynchronisationManagerImpl] Found 1 stalled synchronisations for directories [ [xxxxxx] ]. Rescheduling them to run again

2024-07-12 15:15:00,026 Caesium-2-4 INFO [crowd.manager.directory.FailedSynchronisationManagerImpl] Found 1 stalled synchronisations for directories [ [xxxxx] ]. Rescheduling them to run again

Other locks in the cwd_cluster_lock table may be affected by a similar situation.

This is reproducible on Data Center: Yes

Steps to Reproduce

1. Start a Crowd cluster with two nodes and begin a synchronisation of the User Directory on one of the nodes.
2. Bring down the node on which the User Directory synchronisation was started, leaving only one available node in the cluster.
3. The inactive node still holds the directory synchronisation lock in the cwd_cluster_lock table, and the synchronisation remains in progress. Both can be checked with:

select * from cwd_cluster_lock where lock_name like '%com.atlassian.crowd.embedded.api.Directory:DIRECTORY_ID%'; -- Replace DIRECTORY_ID with the directory ID

select * from cwd_synchronisation_status where directory_id = 'xxxxx';

Expected Results

Once the node becomes inactive or its node ID changes, the directory synchronisation lock should be released and the User Directory synchronisation should continue.
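The expected behaviour can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not Crowd's implementation. It assumes a simplified, hypothetical `cwd_cluster_lock` table holding only the `lock_name` and `node_id` columns used in the queries above. It also assumes that the caller supplies the set of live node IDs and that a lock counts as released once its row is deleted. The helper names, node IDs and directory IDs are hypothetical as well.

```python
import sqlite3
from typing import Iterable, List

# Lock-name prefix taken from the lock_name format in this report.
LOCK_PREFIX = "com.atlassian.crowd.embedded.api.Directory:"


def make_lock_demo_db() -> sqlite3.Connection:
    """Build an in-memory, simplified (hypothetical) cwd_cluster_lock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cwd_cluster_lock (lock_name TEXT PRIMARY KEY, node_id TEXT)")
    conn.executemany(
        "INSERT INTO cwd_cluster_lock (lock_name, node_id) VALUES (?, ?)",
        [
            (LOCK_PREFIX + "1", "node-a"),  # held by a node that has gone down
            (LOCK_PREFIX + "2", "node-b"),  # held by a live node
        ],
    )
    conn.commit()
    return conn


def find_stale_locks(conn: sqlite3.Connection, live_node_ids: Iterable[str]) -> List[str]:
    """Return the names of locks whose holder is not a live node."""
    live = set(live_node_ids)
    rows = conn.execute(
        "SELECT lock_name, node_id FROM cwd_cluster_lock ORDER BY lock_name"
    ).fetchall()
    # Rows with a NULL node_id are treated as unheld and skipped.
    return [name for name, node in rows if node is not None and node not in live]


def release_stale_locks(conn: sqlite3.Connection, live_node_ids: Iterable[str]) -> List[str]:
    """Delete locks held by non-live nodes in one transaction; return their names."""
    stale = find_stale_locks(conn, live_node_ids)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "DELETE FROM cwd_cluster_lock WHERE lock_name = ?",
            [(name,) for name in stale],
        )
    return stale
```

For example, with node-b as the only live node, `release_stale_locks(make_lock_demo_db(), {"node-b"})` releases only the lock for directory 1. The lock held by the live node is left untouched, and the synchronisation could then be picked up again.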

Actual Results

The following messages are logged in the Crowd application log (atlassian-crowd.log), and the User Directory synchronisation remains stuck:

2024-07-12 15:10:00,103 Caesium-2-1 INFO [crowd.manager.directory.FailedSynchronisationManagerImpl] Found 1 stalled synchronisations for directories [ [xxxxxx] ]. Rescheduling them to run again 
2024-07-12 15:15:00,026 Caesium-2-4 INFO [crowd.manager.directory.FailedSynchronisationManagerImpl] Found 1 stalled synchronisations for directories [ [xxxxx] ]. Rescheduling them to run again

Workaround

Stop one of the nodes and update the node ID in the following tables to the ID of the current (live) node in the cluster. Then start the stopped node again:

update cwd_cluster_lock set node_id = '*******' where lock_name = 'com.atlassian.crowd.embedded.api.Directory:DIRECTORY_ID'; -- Replace DIRECTORY_ID with the directory ID

update cwd_synchronisation_status set node_id = '*******' where directory_id = 'xxxxxx';
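The two workaround updates can also be applied in a single transaction, so the lock and the synchronisation status are never left pointing at different nodes. The following is a minimal sketch using Python's sqlite3 module. It assumes simplified, hypothetical versions of both tables containing only the columns the statements above use. `reassign_directory_sync`, the node IDs and the directory ID are hypothetical placeholders. Against a real Crowd database, use that database's own client and transaction handling.

```python
import sqlite3
from typing import Tuple

# Lock-name prefix taken from the lock_name format in this report.
LOCK_PREFIX = "com.atlassian.crowd.embedded.api.Directory:"


def make_workaround_demo_db() -> sqlite3.Connection:
    """Build simplified (hypothetical) tables where both rows point at a dead node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cwd_cluster_lock (lock_name TEXT PRIMARY KEY, node_id TEXT)")
    conn.execute("CREATE TABLE cwd_synchronisation_status (directory_id TEXT, node_id TEXT)")
    conn.execute("INSERT INTO cwd_cluster_lock VALUES (?, ?)", (LOCK_PREFIX + "42", "old-node"))
    conn.execute("INSERT INTO cwd_synchronisation_status VALUES (?, ?)", ("42", "old-node"))
    conn.commit()
    return conn


def reassign_directory_sync(
    conn: sqlite3.Connection, directory_id: str, live_node_id: str
) -> Tuple[int, int]:
    """Point the directory's lock and sync status at live_node_id atomically.

    Returns the number of rows updated in cwd_cluster_lock and
    cwd_synchronisation_status, respectively.
    """
    lock_name = LOCK_PREFIX + directory_id
    with conn:  # both UPDATEs commit together, or neither does
        lock_rows = conn.execute(
            "UPDATE cwd_cluster_lock SET node_id = ? WHERE lock_name = ?",
            (live_node_id, lock_name),
        ).rowcount
        status_rows = conn.execute(
            "UPDATE cwd_synchronisation_status SET node_id = ? WHERE directory_id = ?",
            (live_node_id, directory_id),
        ).rowcount
    return lock_rows, status_rows
```

For example, `reassign_directory_sync(make_workaround_demo_db(), "42", "live-node")` returns `(1, 1)`. Returning the row counts makes it easy to confirm that each UPDATE matched exactly one row. A `0` usually means the directory ID or lock name was mistyped.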

is related to

CWD-5923 Release lock for backup process in cwd_cluster_lock if assigned to an invalid node ID

Long Term Backlog


Assignee: Unassigned
Reporter: Nitin Rastogi
Votes: 0
Watchers: 3

Created: 26/Aug/2024 2:52 PM
Updated: 17/Jun/2025 4:05 AM