[JRASERVER-70995] Inefficient full table scans when deleting from cwd_membership during directory syncs

Type: Bug
Resolution: Unresolved
Priority: Low (View bug fix roadmap)
Fix Version/s: None
Affects Version/s: 8.5.4, 7.13.13
Component/s: User Management - LDAP Integration
Labels:

Introduced in Version:
7.13
Support reference count:
3
Symptom Severity:
Severity 2 - Major
UIS:
1
Bug Fix Policy:
View Atlassian Server bug fix policy

Issue Summary

At present, user deletions during directory syncs do not utilize any indexes on the cwd_membership table, meaning each deletion requires in a full table scan. When the table is large the DB rapidly exhausts any in-memory buffers and has to constantly re-scan the table.

Examining the SQL logging underneath the sync, the deletion has three predicates on membership_type, child_name, and directory_id:

delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=<id>;

No existing index is suitable for this query. There is an index that would be suitable if the query utilized lower_child_name instead:

"idx_mem_dir_child" btree (lower_child_name, membership_type, directory_id)

list of all indexes for the table cwd_membership:

    "pk_cwd_membership" PRIMARY KEY, btree (id)
    "uk_mem_parent_child_type" UNIQUE, btree (parent_id, child_id, membership_type)
    "idx_mem_dir_child" btree (lower_child_name, membership_type, directory_id)
    "idx_mem_dir_parent" btree (lower_parent_name, membership_type, directory_id)
    "idx_mem_dir_parent_child" btree (lower_parent_name, lower_child_name, membership_type, directory_id)

Steps to Reproduce

Generate a large number of users and groups in local Jira directory or remote directory; ensure users have one or more group memberships
If using a remote directory, perform a full sync and observe elapsed time for deletions

If using a local directory, delete group memberships for a single user using the following query and benchmark the time:

delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=<id>;

Expected Results

Deletion should utilize a covering index to be efficient.

Actual Results

Deletion results in a full table scan. Here's a practical example from a customer with over 3 million rows in their cwd_membership table, and a deletion of a single user's group memberships that took over 7 seconds:

explain analyze delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=10000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Delete on cwd_membership (cost=0.00..103367.51 rows=86 width=6) (actual time=7311.263..7311.263 rows=0 loops=1)
-> Seq Scan on cwd_membership (cost=0.00..103367.51 rows=86 width=6) (actual time=755.912..7308.464 rows=95 loops=1)
Filter: (((membership_type)::text = 'GROUP_USER'::text) AND ((child_name)::text = '<username>'::text) AND (directory_id = '10000'::numeric))
Rows Removed by Filter: 3003405
Planning time: 0.155 ms
Execution time: 7311.305 ms

When deleting several users this drastically slows down sync time. A sync example from the same customer that took 243 seconds to delete 134 users' group memberships:

 Caesium-1-3 INFO [c.a.crowd.directory.DbCachingRemoteChangeOperations] deleted [ 134 ] users in [ 243102ms ]

Adding a covering index drastically reduces these sync times, deleting 243 users' group memberships in 1.7 seconds:

Caesium-1-1 INFO [c.a.crowd.directory.DbCachingRemoteChangeOperations] deleted [ 243 ] users in [ 1724ms ]

Workaround

Add a new composite indexing covering query predicates:

create index idx_mem_child_dir on cwd_membership (membership_type,child_name,directory_id);

With this the query planner will use an index scan, rather than a full table scan:

explain (analyze,buffers) delete from cwd_membership where membership_type='GROUP_USER' and child_name ='<username>' and directory_id =1;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on cwd_membership  (cost=0.28..11.84 rows=3 width=6) (actual time=0.096..0.096 rows=0 loops=1)
   Buffers: shared hit=9 dirtied=3
   ->  Index Scan using idx_mem_child_dir on cwd_membership  (cost=0.28..11.84 rows=3 width=6) (actual time=0.016..0.019 rows=3 loops=1)
         Index Cond: (((membership_type)::text = 'GROUP_USER'::text) AND ((child_name)::text = '<username>'::text) AND (directory_id = 1::numeric))
         Buffers: shared hit=4
 Planning time: 0.113 ms
 Execution time: 0.114 ms

mentioned in: Page Failed to load; Page Failed to load

Jason D Smith added a comment - 01/May/2020 10:03 PM

I wonder if there is a similar workaround for the query that searches for groups

020-04-30 15:32:57,180-0700 CrowdUsnChangedCacheRefresher:thread-2 INFO ServiceRunner [c.a.c.d.ldap.cache.UsnChangedCacheRefresher] found [ 32294 ] remote groups in [ 101477ms ]

Jason D Smith added a comment - 01/May/2020 10:03 PM I wonder if there is a similar workaround for the query that searches for groups 020-04-30 15:32:57,180-0700 CrowdUsnChangedCacheRefresher:thread-2 INFO ServiceRunner [c.a.c.d.ldap.cache.UsnChangedCacheRefresher] found [ 32294 ] remote groups in [ 101477ms ]

TJ Royall added a comment - 30/Apr/2020 12:38 PM

mdoar2: It is indeed the size of the cwd_membership table that matters; the easiest way to inflate that is to add a large number of users and add them to groups. You should see the effect at any scale (which can you can verify via explain timings), but it does become more pronounced and evident the larger the table becomes.

I though I had raised this as Jira Core; thanks for catching that! I'll move it now.

TJ Royall added a comment - 30/Apr/2020 12:38 PM mdoar2 : It is indeed the size of the cwd_membership table that matters; the easiest way to inflate that is to add a large number of users and add them to groups. You should see the effect at any scale (which can you can verify via explain timings), but it does become more pronounced and evident the larger the table becomes. I though I had raised this as Jira Core; thanks for catching that! I'll move it now.

Matt Doar added a comment - 30/Apr/2020 12:36 AM

Should this one be in JRASERVER since it looks like core DB stuff?

Matt Doar added a comment - 30/Apr/2020 12:36 AM Should this one be in JRASERVER since it looks like core DB stuff?

Matt Doar added a comment - 30/Apr/2020 12:32 AM

"Generate a large number of users and groups " - how many is large here? Isn't it the size of the cwd_membership table that matters?

Matt Doar added a comment - 30/Apr/2020 12:32 AM "Generate a large number of users and groups " - how many is large here? Isn't it the size of the cwd_membership table that matters?

Jira Data Center

Details

Description

Issue Summary

Steps to Reproduce

Expected Results

Actual Results

Workaround

Attachments

Issue Links

Forms

Activity

Collapse comment: Jason D Smith added a comment - 01/May/2020 10:03 PM

Expand comment: Jason D Smith added a comment - 01/May/2020 10:03 PM

Collapse comment: TJ Royall added a comment - 30/Apr/2020 12:38 PM

Expand comment: TJ Royall added a comment - 30/Apr/2020 12:38 PM

Collapse comment: Matt Doar added a comment - 30/Apr/2020 12:36 AM

Expand comment: Matt Doar added a comment - 30/Apr/2020 12:36 AM

Collapse comment: Matt Doar added a comment - 30/Apr/2020 12:32 AM

Expand comment: Matt Doar added a comment - 30/Apr/2020 12:32 AM

People

Dates