Bitbucket Data Center / BSERV-9289

Improve CommitIndex.search() performance for large commit indexes on PostgreSQL


Details

    • Type: Suggestion
    • Resolution: Fixed
    • Fix Version: 4.11.0
    • Component: None
    • Labels: None
    • We collect Bitbucket feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

    Description

      This is something of an offshoot of https://jira.atlassian.com/browse/BSERV-8649.

      Environment:

      • OS: Ubuntu 15.04 (unlikely to matter)
      • DB: Postgres 9.4.5

      I have set up an instance with a single repository containing the Linux kernel source code; as a result, there are ~550,000 commits in the changeset table. A request for a single commit results in the following query being executed:

      LOG:  duration: 52.547 ms  execute S_5: select this_.id as id1_7_0_, this_.author_timestamp as author_t2_7_0_
          from changeset this_ 
          where this_.id like $1
          limit $2
      DETAIL:  parameters: $1 = 'a202fbbf56e819de83876827c4bf5da2bfbac5ec%', $2 = '3'
      
      

      This lookup does not filter by repository ID, most likely so it can also find commits in forks, so for instances with many repositories and large numbers of commits it could be extremely slow.

      This relates to BSERV-8649 because in that case a PR comment contains many objects that look like commit hashes. That issue will likely be addressed by limiting the number of commits we try to look up, or by applying a timeout. This issue is about the general inefficiency of the service itself.
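A minimal sketch of the capping mitigation mentioned above, assuming a hypothetical `candidate_hashes` helper and an illustrative limit (neither is a real Bitbucket name or setting):

```python
import re

# Hash-like tokens: 7 to 40 lowercase hex characters, as Git abbreviates SHAs.
HASH_PATTERN = re.compile(r"\b[0-9a-f]{7,40}\b")
MAX_LOOKUPS = 50  # illustrative cap, not an actual Bitbucket configuration value


def candidate_hashes(comment_text, limit=MAX_LOOKUPS):
    """Return at most `limit` hash-like tokens from a comment to resolve
    against the commit index, so one comment cannot trigger unbounded
    database lookups."""
    return HASH_PATTERN.findall(comment_text)[:limit]
```

With a cap like this, a comment containing hundreds of hash-like strings costs at most `limit` index queries rather than one per token.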

      For Postgres at least (so far Postgres is the only database I've confirmed to be slow), this can be improved by adding an index that specifically handles LIKE-based queries that match the start of the string.

      I have a test case where I took 100 commits (git rev-list --all | head -n 100 | cut -c1-12) and added them to a PR comment. The page load time is ~6s:

      $ time curl -u admin:admin http://localhost:7990/bitbucket/projects/PROJECT_1/repos/linux-clone/pull-requests/1/overview
      ..
      real	0m6.016s
      
      

      Then I added an appropriate index on `changeset.id`:

      CREATE INDEX changeset_id_text_pattern_ops_idx ON changeset(id text_pattern_ops);
      
      

      The same request then takes ~0.3s.
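The same prefix-to-range behaviour can be observed in SQLite, used here only because it is easy to script in a self-contained way (the report itself concerns Postgres): with case-sensitive LIKE, an ordinary index on `id` turns the anchored LIKE into a range SEARCH instead of a full-table SCAN:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE changeset (id TEXT)")
con.execute("CREATE INDEX changeset_id_idx ON changeset (id)")
# SQLite's LIKE optimization only applies when LIKE is case-sensitive.
con.execute("PRAGMA case_sensitive_like = ON")

rows = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM changeset WHERE id LIKE 'a202fbbf%' LIMIT 3"
).fetchall()
# The plan detail is the fourth column of each EXPLAIN QUERY PLAN row.
plan = " ".join(row[3] for row in rows)
```

The resulting plan reports a SEARCH using `changeset_id_idx` with range bounds on `id`, which is the same shape of query the `text_pattern_ops` index enables in Postgres.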

      Obviously, if we were to address this performance problem by adding a specialised index, it would involve something different for each database.




            People

              Assignee: behumphreys Ben Humphreys
              Reporter: behumphreys Ben Humphreys
              Votes: 0
              Watchers: 5
