Loading...

Type: Suggestion
Resolution: Unresolved
Fix Version/s: None
Component/s: Content - Tasks
Labels:
None

UIS:
2
Support reference count:
7

We are using the below SQL query for finding duplicates while running the 'LabelUniqueIndexUpgradeTask':

    public static final String SQL_SELECT_DUPLICATE_LABELS = "SELECT l1.* " +
            "FROM LABEL l1, LABEL l2 " +
            "WHERE l1.LABELID <> l2.LABELID " +
            "   AND (l1.NAME = l2.NAME OR (l1.NAME IS NULL AND l2.NAME IS NULL)) " +
            "   AND (l1.OWNER = l2.OWNER OR (l1.OWNER IS NULL AND l2.OWNER IS NULL)) " +
            "   AND (l1.NAMESPACE = l2.NAMESPACE OR (l1.NAMESPACE IS NULL AND l2.NAMESPACE IS NULL)) " +
            "ORDER BY l1.NAME, l1.OWNER, l1.NAMESPACE, l1.LABELID ASC";

In a PostgreSQL database with 80k records in LABEL table it took nearly 5minutes to execute this query. In customer's instance it took +10 minutes.

When we checked the SQL query with EXPLAIN, it turned out that the indexes are not used effectively.

(Execution Time: 304666.246 ms)

Nested Loop  (cost=1001.35..73786.65 rows=159723919 width=71) (actual time=20.236..304665.775 rows=324 loops=1)
  ->  Gather Merge  (cost=1001.06..18070.10 rows=88091 width=71) (actual time=20.143..620.023 rows=88168 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Incremental Sort  (cost=1.05..7159.85 rows=51818 width=71) (actual time=0.874..397.286 rows=44084 loops=2)
"              Sort Key: l1.name, l1.owner, l1.namespace, l1.labelid"
              Presorted Key: l1.name
              Full-sort Groups: 105  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
"              Pre-sorted Groups: 14  Sort Methods: quicksort, external merge  Average Memory: 23kB  Peak Memory: 25kB  Average Disk: 241kB  Peak Disk: 3384kB"
              Worker 0:  Full-sort Groups: 542  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
"                Pre-sorted Groups: 17  Sort Methods: quicksort, external merge  Average Memory: 23kB  Peak Memory: 25kB  Average Disk: 120kB  Peak Disk: 2048kB"
              ->  Parallel Index Scan using l_name_idx on label l1  (cost=0.29..5026.71 rows=51818 width=71) (actual time=0.040..7.619 rows=44084 loops=2)
  ->  Index Scan using l_name_idx on label l2  (cost=0.29..0.62 rows=1 width=55) (actual time=3.445..3.448 rows=0 loops=88168)
        Index Cond: ((name)::text = (l1.name)::text)
        Filter: ((l1.labelid <> labelid) AND (((l1.owner)::text = (owner)::text) OR ((l1.owner IS NULL) AND (owner IS NULL))) AND ((l1.namespace)::text = (namespace)::text))
        Rows Removed by Filter: 51714
Planning Time: 3.087 ms
Execution Time: 304666.246 ms

Based on the our research, 'OR' condition is possibly confusing the query planner.

When the query adjusted as below, the execution time drastically drops to milliseconds

SELECT l1.*
FROM label l1
JOIN label l2
  ON l1.name = l2.name
 AND l1.owner = l2.owner
 AND l1.namespace = l2.namespace
 AND l1.labelid <> l2.labelid
UNION ALL
SELECT l1.*
FROM label l1
JOIN label l2
  ON l1.name = l2.name
 AND l1.owner IS NULL
 AND l2.owner IS NULL
 AND l1.namespace = l2.namespace
 AND l1.labelid <> l2.labelid
ORDER BY name, owner, namespace, labelid;

EXPLAIN output: (Execution Time: 93.398 ms)

Gather Merge  (cost=31598083.15..50527758.37 rows=161522001 width=71) (actual time=91.720..93.302 rows=324 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=31597083.13..31883111.67 rows=114411417 width=71) (actual time=83.133..83.137 rows=108 loops=3)
"        Sort Key: l1.name, l1.owner, l1.namespace, l1.labelid"
        Sort Method: quicksort  Memory: 25kB
        Worker 0:  Sort Method: quicksort  Memory: 70kB
        Worker 1:  Sort Method: quicksort  Memory: 25kB
        ->  Parallel Append  (cost=0.29..2205342.37 rows=114411417 width=71) (actual time=41.869..82.713 rows=108 loops=3)
              ->  Parallel Hash Join  (cost=2966.26..6933.39 rows=14827 width=71) (actual time=28.958..50.164 rows=320 loops=1)
                    Hash Cond: (((l1.name)::text = (l2.name)::text) AND ((l1.owner)::text = (l2.owner)::text) AND ((l1.namespace)::text = (l2.namespace)::text))
                    Join Filter: (l1.labelid <> l2.labelid)
                    Rows Removed by Join Filter: 67527
                    ->  Parallel Seq Scan on label l1  (cost=0.00..1551.64 rows=51864 width=71) (actual time=0.003..2.940 rows=88168 loops=1)
                    ->  Parallel Hash  (cost=1551.64..1551.64 rows=51864 width=55) (actual time=28.532..28.532 rows=88168 loops=1)
                          Buckets: 131072  Batches: 1  Memory Usage: 6848kB
                          ->  Parallel Seq Scan on label l2  (cost=0.00..1551.64 rows=51864 width=55) (actual time=0.008..11.059 rows=88168 loops=1)
              ->  Nested Loop  (cost=0.29..11131.89 rows=94998115 width=71) (actual time=41.911..65.984 rows=1 loops=3)
                    ->  Parallel Seq Scan on label l1_1  (cost=0.00..1551.64 rows=12238 width=71) (actual time=0.017..1.705 rows=6880 loops=3)
                          Filter: (owner IS NULL)
                          Rows Removed by Filter: 22509
                    ->  Index Scan using l_name_idx on label l2_1  (cost=0.29..0.77 rows=1 width=22) (actual time=0.009..0.009 rows=0 loops=20641)
                          Index Cond: ((name)::text = (l1_1.name)::text)
                          Filter: ((owner IS NULL) AND (l1_1.labelid <> labelid) AND ((l1_1.namespace)::text = (namespace)::text))
                          Rows Removed by Filter: 4
Planning Time: 1.089 ms
Execution Time: 93.398 ms

Please edit the SQL query as above (or any other form that improves the execution time) to make it more performant for the customers that have large LABEL table.

relates to

CONFSERVER-99168 Confluence upgrade fails on Oracle db that using VARCHAR2 when executing label upgrade task

Closed

CONFSERVER-99185 9.2 LTS Backport - Confluence upgrade fails on Oracle db that using VARCHAR2 when executing label upgrade task

Closed

CONFSERVER-99419 A Confluence instance with an Oracle or MSSQL DB backend fails to upgrade to an affected Confluence version if the label table has multiple rows of NULL names and NULL owners

Closed

mentioned in: Page Loading...; Page Loading...

Details

Description

Attachments

Issue Links

Activity

People

Dates