Type: Bug
Resolution: Fixed
Priority: High
Affects Version/s: 7.13.0, 7.6.9, 8.3.1, 7.13.12, 8.13.0
Severity: Severity 2 - Major
Issue Summary
A data inconsistency results in users being unable to log in. There are a number of related cases; the root cause has not been found for any of them, but in most of them a DB manipulation, a cache refresh, or re-adding the directory and syncing it solved the problem.
The end result is that users cannot log in, but there are many flows/actions that could lead to this state; the main ones are user login and directory synchronisation.
For some customers the workaround only works temporarily, so it has to be repeated over and over again.
The inconsistency exists in the DB between the cwd_membership and cwd_user tables.
The most probable cause is cache corruption; this was confirmed in one case.
There is a Knowledge Base article for this, and a number of cases with similar symptoms and the same workaround steps:
https://confluence.atlassian.com/jirakb/ldap-users-and-groups-display-unexpectedly-in-jira-server-776799341.html
The biggest problem with pinpointing the root cause is that the problem shows up with a delay (sometimes a couple of months later), when there are no logs left showing how the data inconsistency was created (a failed CRUD operation, or cache corruption). That is why there are different workarounds: a manual DB update, or a new user directory plus resync and cache refresh.
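To illustrate the kind of inconsistency described above, here is a minimal diagnostic sketch (not part of the original report; the username 'jsmith' is a placeholder) that compares a single user's cwd_user record with the cwd_membership rows referencing it. On a healthy instance, child_id always equals the user's id.
-- Placeholder example: replace 'jsmith' with an affected username.
select U.id as user_id, M.child_id, M.parent_name, M.membership_type
from cwd_user U
left join cwd_membership M
  on M.lower_child_name = U.lower_user_name
 and M.directory_id = U.directory_id
 and M.membership_type = 'GROUP_USER'
where U.lower_user_name = 'jsmith';
-- On a corrupted instance, some rows return M.child_id different from U.id.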
Steps to Reproduce
- The problem is not yet reproducible
- On affected/corrupted instances, users are not able to log in
Expected Results
An active user is able to log in. If a problem occurs, syncing AD with Jira resolves it; there is no data inconsistency.
Actual Results
The exception below is thrown in the atlassian-jira.log file:
... 2019-11-11 21:10:26,238 http-nio-9280-exec-17 ERROR [c.a.j.web.servlet.InternalServerErrorServlet] {errorId=a0e52747-b31d-4200-a2ab-5da0eedddf5d, interpretedMsg=, cause=com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: while inserting: [GenericEntity:Membership][lowerChildName,atariq][membershipType,GROUP_USER][parentName,SG-NETACCESS-CORP][childName,atariq][directoryId,10000][id,571581][childId,78974][lowerParentName,sg-netaccess-corp][parentId,14392] (SQL Exception while executing the following:INSERT INTO cwd_membership (ID, parent_id, child_id, membership_type, group_type, parent_name, lower_parent_name, child_name, lower_child_name, directory_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Duplicate entry '14392-78974-GROUP_USER' for key 'uk_mem_parent_child_type')), stacktrace=com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: while inserting: [GenericEntity:Membership][lowerChildName,atariq][membershipType,GROUP_USER][parentName,SG-NETACCESS-CORP][childName,atariq][directoryId,10000][id,571581][childId,78974][lowerParentName,sg-netaccess-corp][parentId,14392] (SQL Exception while executing the following:INSERT INTO cwd_membership (ID, parent_id, child_id, membership_type, group_type, parent_name, lower_parent_name, child_name, lower_child_name, directory_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Duplicate entry '14392-78974-GROUP_USER' for key 'uk_mem_parent_child_type'))
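The duplicate entry in the error above encodes parent_id-child_id-membership_type (here 14392-78974-GROUP_USER). As a diagnostic sketch (an illustration, not part of the original report), the existing row blocking the insert can be inspected directly:
-- Look up the cwd_membership row that already holds the conflicting key
-- from the "Duplicate entry '14392-78974-GROUP_USER'" error above.
select *
from cwd_membership
where parent_id = 14392
  and child_id = 78974
  and membership_type = 'GROUP_USER';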
Notes
We have reports from a group of customers suggesting that the problem was caused by a staging/testing Jira instance sending cache updates to the production cluster. In other words, production data in the cwd_membership cache was poisoned by data from another environment.
- This usually happens when a production data backup snapshot is restored to lower environments without cleaning the cluster node information.
- To prevent this, remove all production nodes from lower environments after the restore; see Remove abandoned or offline nodes in JIRA Data Center (a sketch for checking the registered nodes follows this list).
- We have a KB article explaining the problem in detail and covering diagnostic steps: Inconsistency in group membership and user status on one or multiple nodes in Jira Datacenter
- The problem is largely mitigated in Jira 8.10+ with the introduction of the node clean-up job, see
JRASERVER-42916
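As a minimal sketch of the node check mentioned above (assuming the standard Jira Data Center schema, where registered nodes are stored in the clusternode table; column names may vary by version), the following query lists the nodes an instance still knows about. Production node IDs showing up in a lower environment after a restore are candidates for removal per the KB article:
-- Assumption: Jira Data Center keeps registered nodes in the clusternode table.
-- Stale production node IDs appearing on a staging/test instance should be
-- cleaned up per "Remove abandoned or offline nodes in JIRA Data Center".
select node_id, node_state, ip
from clusternode;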
Workaround
- Refresh the caches with a full Jira restart (all nodes in the case of Data Center)
- A full restart in Data Center is required because nodes with a corrupted cache will replicate it to other starting nodes. This means shutting down all nodes, validating that the service is stopped, and starting them back up; a rolling restart will not resolve this issue.
- Add a new directory, sync it, then remove the first one
- Create a new user directory with exactly the same settings as Directory ID 10000 (name: Active Directory server)
- Synchronize with this new directory
- Move this new directory to the top position (the first directory to be checked)
- Test user access
- Remove the old directory (ID 10000)
- Remove the blocking (inconsistent) user data from cwd_membership and consequently update cwd_user:
We should be able to see if the DB is affected by running:
-- users
select distinct M.lower_child_name
from cwd_membership M
join cwd_user U
  on U.lower_user_name = M.lower_child_name
 and U.directory_id = M.directory_id
 and M.membership_type = 'GROUP_USER'
where M.child_id != U.id;
-- groups
select distinct M.lower_child_name
from cwd_membership M
join cwd_group G
  on G.lower_group_name = M.lower_child_name
 and G.directory_id = M.directory_id
 and M.membership_type = 'GROUP_GROUP'
where M.child_id != G.id;
The following updates will remove any inconsistent data. Always have a backup of your database before making any changes to it.
-- For Postgres
DROP INDEX uk_mem_parent_child_type;

update cwd_membership
set child_id = U.id
from cwd_user U
where cwd_membership.lower_child_name = U.lower_user_name
  and cwd_membership.directory_id = U.directory_id
  and cwd_membership.membership_type = 'GROUP_USER'
  and child_id != U.id;

update cwd_membership
set child_id = G.id
from cwd_group G
where cwd_membership.lower_child_name = G.lower_group_name
  and cwd_membership.directory_id = G.directory_id
  and cwd_membership.membership_type = 'GROUP_GROUP'
  and child_id != G.id;

update cwd_membership
set parent_id = G.id
from cwd_group G
where cwd_membership.lower_parent_name = G.lower_group_name
  and cwd_membership.directory_id = G.directory_id
  and parent_id != G.id;

delete from cwd_membership
where lower_child_name not in (select lower_user_name from cwd_user)
  and membership_type = 'GROUP_USER';

delete from cwd_membership
where lower_child_name not in (select lower_group_name from cwd_group)
  and membership_type = 'GROUP_GROUP';

delete from cwd_membership
where lower_parent_name not in (select lower_group_name from cwd_group);

create unique index uk_mem_parent_child_type
  on public.cwd_membership using btree (parent_id, child_id, membership_type);
A cold restart is needed to rebuild the cache with the correct data.
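As a usage note on the Postgres clean-up above (an optional sketch, not part of the original workaround): running the statements inside a single transaction lets you re-check the diagnostic queries before committing, and roll back if the results look wrong.
begin;

-- run the updates, deletes, and the index re-creation from the block above here

-- verification: the earlier "users" diagnostic query should now return zero rows
-- (repeat the "groups" variant the same way)
select count(*) as bad_user_rows
from cwd_membership M
join cwd_user U
  on U.lower_user_name = M.lower_child_name
 and U.directory_id = M.directory_id
 and M.membership_type = 'GROUP_USER'
where M.child_id != U.id;

commit;   -- or: rollback;  if the count is not zero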
- causes
  - JRASERVER-74171 Synchronisation fails with Active Directory because of duplicate records in cwd_membership table - Closed
- is related to
  - JRASERVER-74911 User/group cache not fully replicated on node start - Closed
  - PSR-372
- relates to
  - JRASERVER-74215 Instances using custom authentication plugins might fail to log in users due to corrupt cache - Closed
  - JRASERVER-39768 'ERROR: duplicate key value violates unique constraint "uk_mem_parent_child_type"' thrown when renaming user - Closed
  - JRASERVER-71483 As a Jira admin I would like to recover from user cache corruption without a whole cluster restart - Closed
  - HL-1629
  - LOGIN-3
  - LOGIN-27