Bitbucket Data Center / BSERV-7518

Git clone hang during stress test


Details

    Description

      A "git clone" stress test of Stash can result in failures under some circumstances. Below are the broad steps used to reproduce this problem:

      1) Create an EC2 instance for testing

      • Instance Type: c4.8xlarge
      • OS: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type - ami-69631053
      • Storage: EBS Provisioned SSD 3000 IOPS

      2) Install openjdk-7-jdk and git using apt-get install

      3) Install Stash 3.9.2 via installer

      4) Create a repository with a clone of the Linux kernel

      5) Restart Stash (Why? This will become evident later)

      6) Execute, locally on the same server as Stash, 20 concurrent git clones of the repository. These need to be executed at the same instant so a script like this works best:

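      #!/bin/bash
      # Fork 20 clones in the background at (nearly) the same instant.
      # Each clone gets its own target directory and its own log file.
      for i in $(seq 1 20)
      do
          git clone http://localhost:7990/scm/project/linux-clone.git "linux$i" > "log$i" 2>&1 &
      done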
      

      As a result, some clones run to completion and others hang (apparently forever). The atlassian-stash-access.log shows the same split. The failed clones produce the following entries (note that the second-to-last column, the request duration in milliseconds, shows that these requests ended very quickly, after roughly 200 ms):

      127.0.0.1 | http | o@T6YXXFx30x63x5 | admin | 2015-06-18 00:30:18,643 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 215 | - |
      127.0.0.1 | http | o@T6YXXFx30x67x9 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 203 | - |
      127.0.0.1 | http | o@T6YXXFx30x77x19 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 199 | - |
      127.0.0.1 | http | o@T6YXXFx30x70x12 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 203 | - |
      127.0.0.1 | http | o@T6YXXFx30x62x19 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 240 | - |
      127.0.0.1 | http | o@T6YXXFx30x60x17 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 242 | - |
      127.0.0.1 | http | o@T6YXXFx30x68x11 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 203 | - |
      127.0.0.1 | http | o@T6YXXFx30x59x16 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 243 | - |
      127.0.0.1 | http | o@T6YXXFx30x73x15 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 202 | - |
      127.0.0.1 | http | o@T6YXXFx30x71x13 | admin | 2015-06-18 00:30:18,644 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 203 | - |
      

      The successful clones produce the following entries, each taking roughly 78 seconds:

      127.0.0.1 | http | o@T6YXXFx30x76x18 | admin | 2015-06-18 00:31:36,323 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 77881 | - |
      127.0.0.1 | http | o@T6YXXFx30x64x6 | admin | 2015-06-18 00:31:36,490 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78054 | - |
      127.0.0.1 | http | o@T6YXXFx30x69x10 | admin | 2015-06-18 00:31:36,528 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78087 | - |
      127.0.0.1 | http | o@T6YXXFx30x58x15 | admin | 2015-06-18 00:31:36,618 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78217 | - |
      127.0.0.1 | http | o@T6YXXFx30x74x16 | admin | 2015-06-18 00:31:36,746 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78304 | - |
      127.0.0.1 | http | o@T6YXXFx30x65x7 | admin | 2015-06-18 00:31:36,856 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78420 | - |
      127.0.0.1 | http | o@T6YXXFx30x75x17 | admin | 2015-06-18 00:31:36,933 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 78491 | - |
      127.0.0.1 | http | o@T6YXXFx30x72x14 | admin | 2015-06-18 00:31:36,938 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:miss, clone | 78497 | - |
      127.0.0.1 | http | o@T6YXXFx30x61x18 | admin | 2015-06-18 00:31:37,745 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 79341 | - |
      127.0.0.1 | http | o@T6YXXFx30x66x8 | admin | 2015-06-18 00:31:37,869 | "POST /scm/lt/linux-clone.git/git-upload-pack HTTP/1.1" | "" "git/1.9.1" | cache:hit, clone | 79432 | - |
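
The split between the fast-failing and the long-running requests can be counted mechanically from the access log. The following is a minimal sketch, assuming the pipe-delimited layout of the entries above, where field 6 is the request line and field 9 is the duration in milliseconds. The function name `summarize_clones` and the 1000 ms default threshold are hypothetical choices for illustration and are not part of Stash.

```shell
#!/bin/bash
# Hypothetical helper: count git-upload-pack requests in an access log
# that finished below or at/above a duration threshold (default 1000 ms).
# Assumes the pipe-delimited layout shown in the entries above:
#   field 6 = request line, field 9 = duration in milliseconds.
summarize_clones() {
    local logfile=$1 threshold_ms=${2:-1000}
    awk -F'|' -v t="$threshold_ms" '
        $6 ~ /git-upload-pack/ {
            ms = $9 + 0        # numeric conversion drops the padding spaces
            if (ms < t) fast++; else slow++
        }
        END { printf "fast=%d slow=%d\n", fast, slow }
    ' "$logfile"
}

# Usage (the path is a placeholder):
#   summarize_clones /path/to/atlassian-stash-access.log
```

Run against the twenty entries listed here, the "fast" count should correspond to the hung clones and the "slow" count to the successful ones.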
      

      Other observations about the problem:

      • This only occurs when the load test is executed immediately after starting Stash. Runs of the test after the first failed one succeed. Running a single git clone before executing the test case also results in the test case succeeding.
      • I have not been able to reproduce the problem when the SCM upload-pack cache is disabled. I disabled the cache using the REST interface:
        curl -v -u adminuser:password -X PUT localhost:7990/rest/scm-cache/latest/config/upload-pack/enabled/false
      • With a one-second sleep added to the loop that forks the "git clone" operations (that is, a delay between the start of each clone), the problem does not occur. Even a 200ms delay appears to avoid the problem.
      • The problem can be reproduced using HTTP but not SSH
      • Problem can be reproduced with the git clients either running locally or remotely
      • The number of successful versus failed clones varies. Have seen failures in the range 7-16 of 20.
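
The delayed variant of the test mentioned in the observations can be sketched as below. This is only an illustration of the staggered start, not a fix. The names `clone_one` and `staggered_run` are hypothetical, the URL is the one from the reproduction script, and a fractional delay such as 0.2 assumes a `sleep` that accepts it (GNU coreutils does).

```shell
#!/bin/bash
# Hypothetical names throughout; REPO_URL is the URL from the reproduction script.
REPO_URL=${REPO_URL:-http://localhost:7990/scm/project/linux-clone.git}

# Clone into linux<N> and log to log<N>, as in the original loop.
clone_one() {
    git clone "$REPO_URL" "linux$1" > "log$1" 2>&1
}

# Start <count> background jobs, pausing <delay> seconds between forks.
# <job> is the name of a command or function that takes the job index.
staggered_run() {
    local count=$1 delay=$2 job=$3 i
    for i in $(seq 1 "$count"); do
        "$job" "$i" &
        sleep "$delay"
    done
    wait    # returns once every background job has exited
}

# Usage: staggered_run 20 0.2 clone_one
```

Unlike the original script, this sketch waits for all clones to exit, so it will block for as long as any clone hangs. Passing a delay of 0 approximates the original simultaneous start.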

            People

              rfriend rikf
              behumphreys Ben Humphreys