Loading...

XML

Word

Printable

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 8.22.0, 8.20.8
Affects Version/s: 7.13.0, 7.6.9, 8.3.1, 7.13.12, 8.13.0
Component/s: Login, User Management - Others
Labels:

Introduced in Version:
7.06
Support reference count:
122
Symptom Severity:
Severity 2 - Major
UIS:
120
Bug Fix Policy:
View Atlassian Server bug fix policy
Current Status:

Hide

Please check this KB to check some details on how this bug was fixed and what other problems related to user-login were fixed: https://confluence.atlassian.com/jirakb/user-login-jira-stats-logs-1108675859.html

Show
Please check this KB to check some details on how this bug was fixed and what other problems related to user-login were fixed: https://confluence.atlassian.com/jirakb/user-login-jira-stats-logs-1108675859.html

Issue Summary

Data inconsistency result in users not able to login. There is a number of related cases when for all of them the root cause isn’t found, for most of them DB manipulation/refreshing cache/re-adding directory and syncing it solved the problem.

This results in users unable to login, but flows/actions that could result in this state are plenty of. Mainly those are user login, directory Synchronisation.

For some customers that works only temporary so the workaround is done over and over again.

Inconsistency exists in DB between cwd_membership and cwd_user tables.
The most probable reason of it is cache corruption. It was confirmed in 1 case.

There is a KnowledgeBase article for that, number of cases with similar symptoms and the same workaround steps.
https://confluence.atlassian.com/jirakb/ldap-users-and-groups-display-unexpectedly-in-jira-server-776799341.html

The biggest problem with pointing out the root cause is that problem show off with the delay(like a couple of months later), when there are no available logs providing information how data inconsistency was created (due to failed CRUD operation, or cache corruption -> that's why there are different workaround - manual DB update / new user-directory + resync & cache refresh)

Steps to Reproduce

Problem is not yet reproducible
On affected/corrupted instances users are not able to login

Expected Results

An active user is able to login. If a problem occurs, sync AD with Jira solves the problem.
-> There is no data inconsistency

Actual Results

The below exception is thrown in the Atlassian-jira.log file:

...
2019-11-11 21:10:26,238 http-nio-9280-exec-17 ERROR      [c.a.j.web.servlet.InternalServerErrorServlet] {errorId=a0e52747-b31d-4200-a2ab-5da0eedddf5d, interpretedMsg=, cause=com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: 
while inserting: [GenericEntity:Membership][lowerChildName,atariq][membershipType,GROUP_USER][parentName,SG-NETACCESS-CORP][childName,atariq][directoryId,10000][id,571581][childId,78974][lowerParentName,sg-netaccess-corp][parentId,14392] 
(SQL Exception while executing the following:INSERT INTO cwd_membership (ID, parent_id, child_id, membership_type, group_type, parent_name, lower_parent_name, child_name, lower_child_name, directory_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Duplicate entry '14392-78974-GROUP_USER' for key 'uk_mem_parent_child_type')), 
stacktrace=com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericEntityException: while inserting: [GenericEntity:Membership][lowerChildName,atariq][membershipType,GROUP_USER][parentName,SG-NETACCESS-CORP][childName,atariq][directoryId,10000][id,571581][childId,78974][lowerParentName,sg-netaccess-corp][parentId,14392] (SQL Exception while executing the following:INSERT INTO cwd_membership (ID, parent_id, child_id, membership_type, group_type, parent_name, lower_parent_name, child_name, lower_child_name, directory_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) (Duplicate entry '14392-78974-GROUP_USER' for key 'uk_mem_parent_child_type'))

Notes

We have reports from a group of customers that suggests that problem was caused by staging/testing Jira instance sending the cache updates to the production cluster. In other words, production data in cwd_membership cache got poisoned by data from another environment.

This usually happens when production data backup snapshot is restored to lower environments without cleaning the cluster nodes information.
To prevent this, please remove all production nodes from lower environments after restore, please check Remove abandoned or offline nodes in JIRA Data Center
We have a KB explaining problem in detail and covering diagnostic steps: Inconsistency in group membership and user status on one or multiple nodes in Jira Datacenter
Problem is lagerly mitigated in Jira 8.10+ with introduction of node clean-up job , see ~~JRASERVER-42916~~

Workaround

Refresh caches by full Jira restart (all nodes in case of DC)
- Full restart in Data Center is required because nodes with a corrupted cache will replicate it to other starting nodes, this means shutting down all nodes, validating the service is stopped and starting them back up, a rolling restart will not resolve this issue.

Add new directory, sync it, remove the 1st one
- Create a new user directory with exact same settings as Directory ID: 10000 (name: Active Directory server
- Synchronize with this new directory
- Move this new directory to top position (the first directory to be checked)
- Test user access
- Remove old directory(id:10000)

Remove blocking(inconsistent data) user from cwd_membership and consequently update cwd_user:
We should be able to see if the DB is affected by running:

-- users
select distinct M.lower_child_name
from cwd_membership M
join cwd_user U 
  on U.lower_user_name = M.lower_child_name 
  and U.directory_id = M.directory_id 
  and M.membership_type = 'GROUP_USER'
where M.child_id != U.id;

-- groups
select distinct M.lower_child_name
from cwd_membership M 
join cwd_group G
  on G.lower_group_name = M.lower_child_name 
  and G.directory_id = M.directory_id 
  and M.membership_type = 'GROUP_GROUP'
where M.child_id != G.id;

Following updates will remove any inconsistent data. Always have a backup of your database before doing any changes to it

-- For Postgres
DROP INDEX UK_MEM_PARENT_CHILD_TYPE;

update cwd_membership
set child_id = U.id
from cwd_user U
where cwd_membership.lower_child_name = U.lower_user_name
and cwd_membership.directory_id = U.directory_id
and cwd_membership.membership_type = 'GROUP_USER'
and child_id != U.id;

update cwd_membership
set child_id = G.id
from cwd_group G
where cwd_membership.lower_child_name = G.lower_group_name
and cwd_membership.directory_id = G.directory_id
and cwd_membership.membership_type = 'GROUP_GROUP'
and child_id != G.id;

update cwd_membership
set parent_id = G.id
from cwd_group G
where cwd_membership.lower_parent_name = G.lower_group_name
and cwd_membership.directory_id = G.directory_id
and parent_id != G.id;

delete from cwd_membership 
where lower_child_name not in
(
  select lower_user_name 
  from cwd_user
)
and membership_type = 'GROUP_USER';

delete from cwd_membership 
where lower_child_name not in
(
  select lower_group_name
  from cwd_group
)
and membership_type = 'GROUP_GROUP';

delete from cwd_membership 
where lower_parent_name not in 
(
  select lower_group_name
  from cwd_group
);

create unique index uk_mem_parent_child_type
on public.cwd_membership 
using
btree (parent_id, child_id, membership_type)

A cold restart is needed to rebuilt the cache with the correct data.

causes

JRASERVER-74171 Synchronisation fails with Active Directory because of duplicate records in cwd_membership table.

Closed

is related to

JRASERVER-74911 User/group cache not fully replicated on node start.

Closed

PSR-372 Loading...

relates to

JRASERVER-74215 Instances using custom authentication plugins might fail to log in users due to corrupt cache

Closed

JRASERVER-39768 'ERROR: duplicate key value violates unique constraint "uk_mem_parent_child_type"' thrown when renaming user

Closed

JRASERVER-71483 As a Jira admin I would like to recover from user cache corruption without a whole cluster restart

Closed

HL-1629 Loading...

Mentioned in

CSS Top Asks - Jira On Prem: JRASERVER-71483 | recover from the user cache corruption without a whole cluster restart

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

(4 relates to, 1 Mentioned in, 51 mentioned in)

Assignee:: Maciej Swinarski (Inactive)

Reporter:: Artur Faruga

Votes:: 25 Vote for this issue

Watchers:: 60 Start watching this issue

Created:: 24/Feb/2020 8:32 AM

Updated:: 23/Oct/2023 1:30 PM

Resolved:: 18/Feb/2022 1:47 PM

Details

Description

Issue Summary

Steps to Reproduce

Expected Results

Actual Results

Notes

Workaround

Attachments

Issue Links

Forms

Activity

People

Dates