-
Suggestion
-
Resolution: Unresolved
-
None
-
None
-
8
-
7
-
We are using the below SQL query for finding duplicates while running the 'LabelUniqueIndexUpgradeTask':
public static final String SQL_SELECT_DUPLICATE_LABELS = "SELECT l1.* " + "FROM LABEL l1, LABEL l2 " + "WHERE l1.LABELID <> l2.LABELID " + " AND (l1.NAME = l2.NAME OR (l1.NAME IS NULL AND l2.NAME IS NULL)) " + " AND (l1.OWNER = l2.OWNER OR (l1.OWNER IS NULL AND l2.OWNER IS NULL)) " + " AND (l1.NAMESPACE = l2.NAMESPACE OR (l1.NAMESPACE IS NULL AND l2.NAMESPACE IS NULL)) " + "ORDER BY l1.NAME, l1.OWNER, l1.NAMESPACE, l1.LABELID ASC";
In a PostgreSQL database with 80k records in LABEL table it took nearly 5minutes to execute this query. In customer's instance it took +10 minutes.
When we checked the SQL query with EXPLAIN, it turned out that the indexes are not used effectively.
(Execution Time: 304666.246 ms)
Nested Loop (cost=1001.35..73786.65 rows=159723919 width=71) (actual time=20.236..304665.775 rows=324 loops=1) -> Gather Merge (cost=1001.06..18070.10 rows=88091 width=71) (actual time=20.143..620.023 rows=88168 loops=1) Workers Planned: 1 Workers Launched: 1 -> Incremental Sort (cost=1.05..7159.85 rows=51818 width=71) (actual time=0.874..397.286 rows=44084 loops=2) " Sort Key: l1.name, l1.owner, l1.namespace, l1.labelid" Presorted Key: l1.name Full-sort Groups: 105 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB " Pre-sorted Groups: 14 Sort Methods: quicksort, external merge Average Memory: 23kB Peak Memory: 25kB Average Disk: 241kB Peak Disk: 3384kB" Worker 0: Full-sort Groups: 542 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB " Pre-sorted Groups: 17 Sort Methods: quicksort, external merge Average Memory: 23kB Peak Memory: 25kB Average Disk: 120kB Peak Disk: 2048kB" -> Parallel Index Scan using l_name_idx on label l1 (cost=0.29..5026.71 rows=51818 width=71) (actual time=0.040..7.619 rows=44084 loops=2) -> Index Scan using l_name_idx on label l2 (cost=0.29..0.62 rows=1 width=55) (actual time=3.445..3.448 rows=0 loops=88168) Index Cond: ((name)::text = (l1.name)::text) Filter: ((l1.labelid <> labelid) AND (((l1.owner)::text = (owner)::text) OR ((l1.owner IS NULL) AND (owner IS NULL))) AND ((l1.namespace)::text = (namespace)::text)) Rows Removed by Filter: 51714 Planning Time: 3.087 ms Execution Time: 304666.246 ms
Based on the our research, 'OR' condition is possibly confusing the query planner.
When the query adjusted as below, the execution time drastically drops to milliseconds
SELECT l1.* FROM label l1 JOIN label l2 ON l1.name = l2.name AND l1.owner = l2.owner AND l1.namespace = l2.namespace AND l1.labelid <> l2.labelid UNION ALL SELECT l1.* FROM label l1 JOIN label l2 ON l1.name = l2.name AND l1.owner IS NULL AND l2.owner IS NULL AND l1.namespace = l2.namespace AND l1.labelid <> l2.labelid ORDER BY name, owner, namespace, labelid;
EXPLAIN output: (Execution Time: 93.398 ms)
Gather Merge (cost=31598083.15..50527758.37 rows=161522001 width=71) (actual time=91.720..93.302 rows=324 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=31597083.13..31883111.67 rows=114411417 width=71) (actual time=83.133..83.137 rows=108 loops=3)
" Sort Key: l1.name, l1.owner, l1.namespace, l1.labelid"
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 70kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Append (cost=0.29..2205342.37 rows=114411417 width=71) (actual time=41.869..82.713 rows=108 loops=3)
-> Parallel Hash Join (cost=2966.26..6933.39 rows=14827 width=71) (actual time=28.958..50.164 rows=320 loops=1)
Hash Cond: (((l1.name)::text = (l2.name)::text) AND ((l1.owner)::text = (l2.owner)::text) AND ((l1.namespace)::text = (l2.namespace)::text))
Join Filter: (l1.labelid <> l2.labelid)
Rows Removed by Join Filter: 67527
-> Parallel Seq Scan on label l1 (cost=0.00..1551.64 rows=51864 width=71) (actual time=0.003..2.940 rows=88168 loops=1)
-> Parallel Hash (cost=1551.64..1551.64 rows=51864 width=55) (actual time=28.532..28.532 rows=88168 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6848kB
-> Parallel Seq Scan on label l2 (cost=0.00..1551.64 rows=51864 width=55) (actual time=0.008..11.059 rows=88168 loops=1)
-> Nested Loop (cost=0.29..11131.89 rows=94998115 width=71) (actual time=41.911..65.984 rows=1 loops=3)
-> Parallel Seq Scan on label l1_1 (cost=0.00..1551.64 rows=12238 width=71) (actual time=0.017..1.705 rows=6880 loops=3)
Filter: (owner IS NULL)
Rows Removed by Filter: 22509
-> Index Scan using l_name_idx on label l2_1 (cost=0.29..0.77 rows=1 width=22) (actual time=0.009..0.009 rows=0 loops=20641)
Index Cond: ((name)::text = (l1_1.name)::text)
Filter: ((owner IS NULL) AND (l1_1.labelid <> labelid) AND ((l1_1.namespace)::text = (namespace)::text))
Rows Removed by Filter: 4
Planning Time: 1.089 ms
Execution Time: 93.398 ms
Please edit the SQL query as above (or any other form that improves the execution time) to make it more performant for the customers that have large LABEL table.
- relates to
-
CONFSERVER-99168 Confluence upgrade fails on Oracle db that using VARCHAR2 when executing label upgrade task
-
- Closed
-
-
CONFSERVER-99185 9.2 LTS Backport - Confluence upgrade fails on Oracle db that using VARCHAR2 when executing label upgrade task
-
- Closed
-
-
CONFSERVER-99419 A Confluence instance with an Oracle or MSSQL DB backend fails to upgrade to an affected Confluence version if the label table has multiple rows of NULL names and NULL owners
-
- Closed
-