• Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Highest Highest
    • 5.0.0, 4.14.5
    • 4.12.1, 4.14.3, 4.14.4
    • None
    • None

      Summary

      Bitbucket fails to release SSH SCM hosting tickets if SSH commands terminate early. Hosting tickets will be leaked until a restart is required.

      Expected Results

      Bitbucket will remove hosting tickets when necessary.

      Actual Results

      The ticket pool becomes exhausted.

      Workaround

      Restart Bitbucket when it becomes overloaded.

      Solution

      Upgrade to 4.14.5 or above.

          Form Name

            [BSERV-9729] Bitbucket doesn't always release SSH hosting tickets

            Ranjith Thettallil added a comment - - edited

            The description: "4) Analyse the log file to determine what the minimum scm-hosting ticket usage was during the period monitored. If usage reached zero then there is no ticket leak" is misleading.  "If usage reached zero then there is no ticket leak." is this true? I am getting a feeling that if is reaches(0/12), then isn't it bug?

            Further below it says "So obviously if you ever see a log message where after release the two numbers are equal there is certainly no leak." ?

             

             

            Ranjith Thettallil added a comment - - edited The description: "4) Analyse the log file to determine what the minimum scm-hosting ticket usage was during the period monitored. If usage reached zero then there is no ticket leak" is misleading.  "If usage reached zero then there is no ticket leak." is this true? I am getting a feeling that if is reaches(0/12), then isn't it bug? Further below it says "So obviously if you ever see a log message where after release the two numbers are equal there is certainly no leak." ?    

            Ben Humphreys added a comment - - edited

            There have been some requests for instructions to help determine if a system is impacted by this bug; versus exhausting available tickets due to real load.

            It is possible to rule out this bug (i.e. a ticket leak) by using the fact that:

            • An idle system should have 0 used hosting tickets; and
            • At some point (perhaps in the early hours of the morning, or over a weekend) the system will be idle, even if only for a few seconds

            The second point may not be true for all systems; but lets say you have reason to believe system load is low/negligible at some point during the night:

            1) Before you go home, enable detailed logging for ticket acquire & release (see instructions below)

            2) Confirm you are seeing in the atlassian-bitbucket.log file this new logging. The output will look like this:

            "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Acquired [scm-hosting] ticket (57/58)
            "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Released [scm-hosting] ticket (58/58)

            3) When you arrive in the morning, disable detailed logging (see instructions below)

            4) Analyse the log file to determine what the minimum scm-hosting ticket usage was during the period monitored. If usage reached zero then there is no ticket leak. See instructions below for a description of the logging output and how to interpret it.

            How to ENABLE logging for ticket acquire/release

            curl -u <username>:<password> -v -X PUT -d "" -H "Content-Type: application/json" http://bitbucket.example.com:7990/rest/api/latest/logs/logger/com.atlassian.stash.internal.throttle/trace
            

            How to DISABLE logging for ticket acquire/release

            curl -u <username>:<password> -v -X PUT -d "" -H "Content-Type: application/json" http://bitbucket.example.com:7990/rest/api/latest/logs/logger/com.atlassian.stash.internal.throttle/info
            

            How to interpret logging for ticket acquire/release
            Once detailed logging is enabled per the above instructions each time a ticket is acquired you will see this log message:

            "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Acquired [scm-hosting] ticket (57/58)
            

            The above says after the ticket was acquired there were 57 remaining out of a maximum of 58 still available.

            and each time one is released:

            "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Released [scm-hosting] ticket (58/58)
            

            The above is essentially saying that after the ticket was released there were none used; 58 of 58 are available. So obviously if you ever see a log message where after release the two numbers are equal there is certainly no leak.

            Note that on Bitbucket Server 4.10 the second number may change. This is due to a new feature that dynamically tunes available hosting tickets based on CPU usage. Prior to Bitbucket Server 4.10 this number was always fixed. This however doesn't change your approach to determining if an scm-hosting ticket leak exists, what you are interested in is purely the difference between the two numbers (x/y) upon ticket release.

            Ben Humphreys added a comment - - edited There have been some requests for instructions to help determine if a system is impacted by this bug; versus exhausting available tickets due to real load. It is possible to rule out this bug (i.e. a ticket leak) by using the fact that: An idle system should have 0 used hosting tickets; and At some point (perhaps in the early hours of the morning, or over a weekend) the system will be idle, even if only for a few seconds The second point may not be true for all systems; but lets say you have reason to believe system load is low/negligible at some point during the night: 1) Before you go home, enable detailed logging for ticket acquire & release (see instructions below) 2) Confirm you are seeing in the atlassian-bitbucket.log file this new logging. The output will look like this: "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Acquired [scm-hosting] ticket (57/58) "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Released [scm-hosting] ticket (58/58) 3) When you arrive in the morning, disable detailed logging (see instructions below) 4) Analyse the log file to determine what the minimum scm-hosting ticket usage was during the period monitored. If usage reached zero then there is no ticket leak. See instructions below for a description of the logging output and how to interpret it. How to ENABLE logging for ticket acquire/release curl -u <username>:<password> -v -X PUT -d "" -H "Content-Type: application/json" http://bitbucket.example.com:7990/rest/api/latest/logs/logger/com.atlassian.stash.internal.throttle/trace How to DISABLE logging for ticket acquire/release curl -u <username>:<password> -v -X PUT -d "" -H "Content-Type: application/json" http://bitbucket.example.com:7990/rest/api/latest/logs/logger/com.atlassian.stash.internal.throttle/info How to interpret logging for ticket acquire/release Once detailed logging is enabled per the above instructions each time a ticket is acquired you will see this log message: "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Acquired [scm-hosting] ticket (57/58) The above says after the ticket was acquired there were 57 remaining out of a maximum of 58 still available. and each time one is released: "POST /scm/project_1/rep_1.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Released [scm-hosting] ticket (58/58) The above is essentially saying that after the ticket was released there were none used; 58 of 58 are available. So obviously if you ever see a log message where after release the two numbers are equal there is certainly no leak. Note that on Bitbucket Server 4.10 the second number may change. This is due to a new feature that dynamically tunes available hosting tickets based on CPU usage. Prior to Bitbucket Server 4.10 this number was always fixed. This however doesn't change your approach to determining if an scm-hosting ticket leak exists, what you are interested in is purely the difference between the two numbers (x/y) upon ticket release.

            Our Bitbucket instance ran into this issue, and we encountered 3 Bitbucket outages in 24h.  Ben Stuart from Atlassian support (the same person who reported this) recommended that we upgrade to 4.14.5 or later to fix it.  

            Question: Should I be concerned that SSH commands are terminating early in our instance enough to cause outages?  We are currently upgraded to 4.14.5 to get this fix, but I still don't know what recent change in our Bitbucket instance is now causing more SSH commands to be terminated early to "activate" this bug.  I want to be sure that I've truly solved root cause to prevent future outages.

            Andrew Snyder added a comment - Our Bitbucket instance ran into this issue, and we encountered 3 Bitbucket outages in 24h.  Ben Stuart from Atlassian support (the same person who reported this) recommended that we upgrade to 4.14.5 or later to fix it.   Question: Should I be concerned that SSH commands are terminating early in our instance enough to cause outages?  We are currently upgraded to 4.14.5 to get this fix, but I still don't know what recent change in our Bitbucket instance is now causing more SSH commands to be terminated early to "activate" this bug.  I want to be sure that I've truly solved root cause to prevent future outages.

              Unassigned Unassigned
              bstuart Ben Stuart (Inactive)
              Affected customers:
              0 This affects my team
              Watchers:
              12 Start watching this issue

                Created:
                Updated:
                Resolved: