Improve query in HibernatePullRequestParticipantDao.searchUsers()

XMLWordPrintable

    • CtB - Improve Existing

      The database query made by HibernatePullRequestParticipantDao.searchUsers() can be slow for large table sizes, taking upwards of half a second. For example:

      => EXPLAIN ANALYZE SELECT sta_normal_user.user_id,
        sta_normal_user.deleted_timestamp,
        sta_normal_user.locale,
        sta_normal_user.slug,
        sta_normal_user.time_zone,
        sta_normal_user.name
      FROM sta_normal_user
      JOIN stash_user ON sta_normal_user.user_id=stash_user.id
      WHERE sta_normal_user.user_id IN (
        SELECT DISTINCT sta_pr_participant.user_id
          FROM sta_pr_participant join sta_pull_request ON sta_pull_request.id=sta_pr_participant.pr_id AND sta_pull_request.to_repository_id=7462
         WHERE sta_pr_participant.pr_role=0
      )
      ORDER BY sta_normal_user.name OFFSET 0 ROWS FETCH FIRST 25 ROWS ONLY;
                                                                                                QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Limit  (cost=282185.51..282185.57 rows=25 width=51) (actual time=532.760..534.357 rows=25 loops=1)
         ->  Sort  (cost=282185.51..282197.39 rows=4752 width=51) (actual time=532.758..534.353 rows=25 loops=1)
               Sort Key: sta_normal_user.name
               Sort Method: top-N heapsort  Memory: 27kB
               ->  Nested Loop  (cost=277846.12..282051.41 rows=4752 width=51) (actual time=516.879..533.737 rows=2635 loops=1)
                     ->  Merge Join  (cost=277845.83..280507.07 rows=4752 width=55) (actual time=516.860..530.501 rows=2635 loops=1)
                           Merge Cond: (sta_normal_user.user_id = sta_pr_participant.user_id)
                           ->  Index Scan using pk_sta_normal_user on sta_normal_user  (cost=0.29..1372.28 rows=38885 width=51) (actual time=0.009..7.347 rows=34107 loops=1)
                           ->  Unique  (cost=277845.54..278978.18 rows=4752 width=4) (actual time=516.821..520.094 rows=2636 loops=1)
                                 ->  Gather Merge  (cost=277845.54..278954.42 rows=9504 width=4) (actual time=516.819..519.528 rows=6731 loops=1)
                                       Workers Planned: 2
                                       Workers Launched: 2
                                       ->  Sort  (cost=276845.52..276857.40 rows=4752 width=4) (actual time=514.029..514.168 rows=2244 loops=3)
                                             Sort Key: sta_pr_participant.user_id
                                             Sort Method: quicksort  Memory: 97kB
                                             Worker 0:  Sort Method: quicksort  Memory: 97kB
                                             Worker 1:  Sort Method: quicksort  Memory: 97kB
                                             ->  HashAggregate  (cost=276507.79..276555.31 rows=4752 width=4) (actual time=513.379..513.668 rows=2244 loops=3)
                                                   Group Key: sta_pr_participant.user_id
                                                   Batches: 1  Memory Usage: 337kB
                                                   Worker 0:  Batches: 1  Memory Usage: 337kB
                                                   Worker 1:  Batches: 1  Memory Usage: 337kB
                                                   ->  Parallel Hash Join  (cost=153181.70..276310.87 rows=78768 width=4) (actual time=132.619..501.635 rows=63647 loops=3)
                                                         Hash Cond: (sta_pr_participant.pr_id = sta_pull_request.id)
                                                         ->  Parallel Seq Scan on sta_pr_participant  (cost=0.00..121612.17 rows=577903 width=12) (actual time=0.005..246.722 rows=464030 loops=3)
                                                               Filter: (pr_role = 0)
                                                               Rows Removed by Filter: 2429101
                                                         ->  Parallel Hash  (cost=152193.46..152193.46 rows=79059 width=8) (actual time=128.775..128.775 rows=63647 loops=3)
                                                               Buckets: 262144  Batches: 1  Memory Usage: 9536kB
                                                               ->  Parallel Seq Scan on sta_pull_request  (cost=0.00..152193.46 rows=79059 width=8) (actual time=0.041..111.677 rows=63647 loops=3)
                                                                     Filter: (to_repository_id = 7462)
                                                                     Rows Removed by Filter: 400383
                     ->  Index Only Scan using pk_stash_user on stash_user  (cost=0.29..0.32 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=2635)
                           Index Cond: (id = sta_normal_user.user_id)
                           Heap Fetches: 2
       Planning Time: 0.551 ms
       Execution Time: 534.450 ms
      (37 rows)
      

      a equivalent but better performing alternative would be:

      => EXPLAIN ANALYZE SELECT DISTINCT sta_normal_user.user_id,
             sta_normal_user.deleted_timestamp,
             sta_normal_user.locale,
             sta_normal_user.slug,
             sta_normal_user.time_zone,
             sta_normal_user.name
      FROM sta_normal_user
      JOIN stash_user ON sta_normal_user.user_id = stash_user.id
      JOIN sta_pr_participant ON sta_pr_participant.user_id = sta_normal_user.user_id
      JOIN sta_pull_request ON sta_pull_request.id = sta_pr_participant.pr_id
      WHERE sta_pull_request.to_repository_id = 7462
        AND sta_pr_participant.pr_role = 0
      ORDER BY sta_normal_user.name
      OFFSET 0 ROWS FETCH FIRST 25 ROWS ONLY;
                                                                                           QUERY PLAN
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Limit  (cost=72.31..1846.60 rows=25 width=51) (actual time=6.670..67.875 rows=25 loops=1)
         ->  Unique  (cost=72.31..2759802.96 rows=38885 width=51) (actual time=6.669..67.866 rows=25 loops=1)
               ->  Incremental Sort  (cost=72.31..2757740.33 rows=137509 width=51) (actual time=6.668..67.646 rows=1277 loops=1)
                     Sort Key: sta_normal_user.name, sta_normal_user.user_id, sta_normal_user.deleted_timestamp, sta_normal_user.locale, sta_normal_user.slug, sta_normal_user.time_zone
                     Presorted Key: sta_normal_user.name
                     Full-sort Groups: 14  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
                     Pre-sorted Groups: 19  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
                     ->  Nested Loop  (cost=1.45..2753990.89 rows=137509 width=51) (actual time=3.015..66.817 rows=1330 loops=1)
                           ->  Nested Loop  (cost=1.01..1905098.86 rows=1008871 width=59) (actual time=0.172..38.184 rows=7682 loops=1)
                                 Join Filter: (sta_normal_user.user_id = sta_pr_participant.user_id)
                                 ->  Nested Loop  (cost=0.58..15000.33 rows=38885 width=55) (actual time=0.020..1.316 rows=394 loops=1)
                                       ->  Index Scan using uq_normal_user_name on sta_normal_user  (cost=0.29..2363.22 rows=38885 width=51) (actual time=0.012..0.338 rows=394 loops=1)
                                       ->  Index Only Scan using pk_stash_user on stash_user  (cost=0.29..0.32 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=394)
                                             Index Cond: (id = sta_normal_user.user_id)
                                             Heap Fetches: 44
                                 ->  Index Scan using idx_sta_pr_participant_user on sta_pr_participant  (cost=0.43..44.96 rows=292 width=12) (actual time=0.005..0.091 rows=19 loops=394)
                                       Index Cond: (user_id = stash_user.id)
                                       Filter: (pr_role = 0)
                                       Rows Removed by Filter: 90
                           ->  Memoize  (cost=0.44..0.90 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7682)
                                 Cache Key: sta_pr_participant.pr_id
                                 Cache Mode: logical
                                 Hits: 0  Misses: 7682  Evictions: 0  Overflows: 0  Memory Usage: 593kB
                                 ->  Index Scan using pk_sta_pull_request on sta_pull_request  (cost=0.43..0.89 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=7682)
                                       Index Cond: (id = sta_pr_participant.pr_id)
                                       Filter: (to_repository_id = 7462)
                                       Rows Removed by Filter: 1
       Planning Time: 1.164 ms
       Execution Time: 67.931 ms
      (29 rows)
      

            Assignee:
            Ben Humphreys
            Reporter:
            Ben Humphreys
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: