-
Bug
-
Resolution: Unresolved
-
Low (View bug fix roadmap)
-
None
-
8.5.4, 7.13.13
-
7.13
-
3
-
Severity 2 - Major
-
1
-
Issue Summary
At present, user deletions during directory syncs do not utilize any indexes on the cwd_membership table, meaning each deletion requires in a full table scan. When the table is large the DB rapidly exhausts any in-memory buffers and has to constantly re-scan the table.
Examining the SQL logging underneath the sync, the deletion has three predicates on membership_type, child_name, and directory_id:
delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=<id>;
No existing index is suitable for this query. There is an index that would be suitable if the query utilized lower_child_name instead:
"idx_mem_dir_child" btree (lower_child_name, membership_type, directory_id)
list of all indexes for the table cwd_membership:
"pk_cwd_membership" PRIMARY KEY, btree (id) "uk_mem_parent_child_type" UNIQUE, btree (parent_id, child_id, membership_type) "idx_mem_dir_child" btree (lower_child_name, membership_type, directory_id) "idx_mem_dir_parent" btree (lower_parent_name, membership_type, directory_id) "idx_mem_dir_parent_child" btree (lower_parent_name, lower_child_name, membership_type, directory_id)
Steps to Reproduce
- Generate a large number of users and groups in local Jira directory or remote directory; ensure users have one or more group memberships
- If using a remote directory, perform a full sync and observe elapsed time for deletions
- If using a local directory, delete group memberships for a single user using the following query and benchmark the time:
delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=<id>;
Expected Results
Deletion should utilize a covering index to be efficient.
Actual Results
Deletion results in a full table scan. Here's a practical example from a customer with over 3 million rows in their cwd_membership table, and a deletion of a single user's group memberships that took over 7 seconds:
explain analyze delete from cwd_membership where membership_type='GROUP_USER' and child_name='<username>' and directory_id=10000; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Delete on cwd_membership (cost=0.00..103367.51 rows=86 width=6) (actual time=7311.263..7311.263 rows=0 loops=1) -> Seq Scan on cwd_membership (cost=0.00..103367.51 rows=86 width=6) (actual time=755.912..7308.464 rows=95 loops=1) Filter: (((membership_type)::text = 'GROUP_USER'::text) AND ((child_name)::text = '<username>'::text) AND (directory_id = '10000'::numeric)) Rows Removed by Filter: 3003405 Planning time: 0.155 ms Execution time: 7311.305 ms
When deleting several users this drastically slows down sync time. A sync example from the same customer that took 243 seconds to delete 134 users' group memberships:
Caesium-1-3 INFO [c.a.crowd.directory.DbCachingRemoteChangeOperations] deleted [ 134 ] users in [ 243102ms ]
Adding a covering index drastically reduces these sync times, deleting 243 users' group memberships in 1.7 seconds:
Caesium-1-1 INFO [c.a.crowd.directory.DbCachingRemoteChangeOperations] deleted [ 243 ] users in [ 1724ms ]
Workaround
Add a new composite indexing covering query predicates:
create index idx_mem_child_dir on cwd_membership (membership_type,child_name,directory_id);
With this the query planner will use an index scan, rather than a full table scan:
explain (analyze,buffers) delete from cwd_membership where membership_type='GROUP_USER' and child_name ='<username>' and directory_id =1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Delete on cwd_membership (cost=0.28..11.84 rows=3 width=6) (actual time=0.096..0.096 rows=0 loops=1) Buffers: shared hit=9 dirtied=3 -> Index Scan using idx_mem_child_dir on cwd_membership (cost=0.28..11.84 rows=3 width=6) (actual time=0.016..0.019 rows=3 loops=1) Index Cond: (((membership_type)::text = 'GROUP_USER'::text) AND ((child_name)::text = '<username>'::text) AND (directory_id = 1::numeric)) Buffers: shared hit=4 Planning time: 0.113 ms Execution time: 0.114 ms
I wonder if there is a similar workaround for the query that searches for groups