Jira Platform Cloud / JRACLOUD-81846

HTTP Status 403 returned for DB connection error when calling REST APIs

      Summary

      Bulk REST API calls that fail with DB connection errors return a 403 response code to the client.

      1. The response content is even more misleading:

      <html>
      <head>
          <title>Forbidden (403)</title> 
      <!--[if IE]><![endif]-->
      <script type="text/javascript" >
      

      2. Jira application logs indicate a DB connection error caused by too many requests within a short period of time.

      ERROR	com.atlassian.plugins.rest.common.error.jersey.ThrowableExceptionMapper	
      Uncaught exception thrown by REST service: org.ofbiz.core.entity.GenericDataSourceException: 
      Unable to establish a connection with the database. (PSQL_TOO_MANY_CONNECTIONS Exception already occurred in this workcontext, skipping next getConnection)
      

      Jira is returning the wrong response code along with the error message. When an external app sends an API request to Jira and the request fails for this reason, Jira returns a 403 error where it could return a 429. If the app receives a 403, it stops the operation; if it receives a 429, it knows it should retry the request.
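
      As an illustration of why the status code matters to clients, a minimal sketch (Python, using the requests library) of typical client-side branching; the site URL, credentials, and JQL are placeholders, not values from this report:

      import requests

      # Placeholder site and credentials, for illustration only.
      JIRA_URL = "https://your-site.atlassian.net/rest/api/2/search"
      AUTH = ("user@example.com", "api-token")

      response = requests.get(JIRA_URL, auth=AUTH, params={"jql": "project = DEMO"})

      if response.status_code == 403:
          # Treated as a permission problem: the client gives up instead of retrying.
          raise PermissionError("Access denied - stopping the operation")
      elif response.status_code == 429:
          # Treated as throttling: the client knows it is safe to retry later.
          print("Rate limited - scheduling a retry with back-off")
      else:
          response.raise_for_status()
          data = response.json()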

      Steps to Reproduce

      1. Call REST APIs in continuous succession

      Expected Results

      If the rate limit is hit, status 429 should be returned, and the throttling limit state should be exposed in the response headers (for example, a standard Retry-After header).

      Actual Results

      An HTTP 403 response code is returned with the following content:

      <html>
      <head>
          <title>Forbidden (403)</title> 
      <!--[if IE]><![endif]-->
      <script type="text/javascript" >
      


            Mateusz Szerszyński added a comment -

            We acknowledge the existence of the problem with unexpected DB connection errors returned by the API, and realize how painful it is for Ecosystem vendors, Jira, and app users. We’d love to deliver a fix ASAP, but unfortunately it will take us at least a few more months, as the solution requires major updates to our rate-limiting mechanisms. The good news is that we are already working on it.

            403 errors indicate that Atlassian infrastructure is overloaded by the number of incoming requests, and in order to avoid a Jira outage we rate-limit some of them. The current rate-limiting mechanism is implemented at the Jira site level and is exposed to the "noisy neighbour" issue. Requests coming from users, your app, and other apps accessing our infrastructure are treated in the same manner, so a single misbehaving consumer can trigger rate limiting for the whole Jira site.

            We are actively working on per-consumer rate limiting, which should help us provide meaningful information in the response headers. We’ll let you know when we have more details.

            For the time being we suggest:

            • Retry failed requests using a back-off mechanism to lower the frequency of retries. In other words, retry responsibly, giving Jira more time to recover if the previous retry attempt failed (a sketch follows this list).
            • Use dynamic webhooks to be notified about changes that happen in Jira.
            • Use bulk REST APIs or Jira expressions to minimise the number of requests and the amount of data returned from Jira.
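
            As a rough illustration of the first suggestion above, a minimal back-off sketch (Python, using the requests library); the retryable status set, attempt count, base delay, and use of a Retry-After header are assumptions, not documented Jira Cloud behaviour:

            import random
            import time

            import requests

            def get_with_backoff(url, auth, max_attempts=5, base_delay=1.0):
                """Retry throttled or transient failures with exponential back-off and jitter."""
                response = None
                for attempt in range(max_attempts):
                    response = requests.get(url, auth=auth)
                    if response.status_code not in (429, 500, 502, 503, 504):
                        return response  # success, or a non-retryable error such as 403
                    # Honour Retry-After if the server sends one (an assumption here);
                    # otherwise back off exponentially with jitter.
                    retry_after = response.headers.get("Retry-After")
                    delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
                    time.sleep(delay + random.uniform(0, 1))
                return response  # still failing after max_attempts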


            Eric Psalmond added a comment -

            403s generally should not be retried.

            I don't agree this should be a 429. A 429 should trigger client-side exponential back-off, but unless there is an actual, published request rate limit, clients can't really know how they should act if they want to be well behaved and avoid it.

            This is system behavior that, from the client perspective, is totally dependent on hidden variables and effectively random.

            This exception should just return a 500. 500s should always be safe to retry (also assuming the request is atomic and idempotent, and clients use exponential back-off).

            Jess Carter added a comment -

            This is greatly impacting me and my team as well. Please fix ASAP!

            Evan Wolf added a comment -

            Affecting plugin functionality that used to work without issue. Cloud slowness may be acceptable, but not failures!


            Brydie McCoy (CB) added a comment - edited

            This is also affecting our app; we are starting to see more and more 403s as well. Customers are getting frustrated because they can't rely on our functionality.

            Also, previous complaints about 403s have resulted in people telling us to add retries; if the source truly is a rate limit, we may be making things worse for ourselves.

            (https://ecosystem.atlassian.net/browse/ACJIRA-1868 was our last 403 problem, though there they said it wasn't rate limiting causing it...)

             


            Maciej Dudziak [Deviniti] added a comment -

            We have hundreds of failing requests related to API rate limits. Some are relatively easy to identify, e.g. a request with status code 500 and the message "org.postgresql.util.PSQLException: FATAL: too many connections for role \"GL62fZE7FxApdF2RarwDJW\"" in the body.

            However, others, especially 403s, are extremely difficult to identify as API rate limits. I am collecting the various cases here: https://ecosystem.atlassian.net/browse/ACJIRA-1929
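
            A minimal sketch of that kind of classification, assuming the body text quoted above is the only available signal (the substring check is a heuristic, not a documented contract):

            def looks_like_rate_limit(status_code: int, body: str) -> bool:
                """Heuristically flag responses that are really rate-limit / connection-pool errors."""
                if status_code == 429:
                    return True
                if status_code == 500 and "too many connections" in body:
                    return True
                # 403s that return the generic "Forbidden (403)" HTML page cannot be
                # distinguished from genuine permission errors by body content alone.
                return False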


            Jon Bevan [Adaptavist] added a comment -

            What are vendors supposed to do to work around this problem while we wait for a fix? This was reported 10 months ago, and there is no indication from Atlassian that it is being addressed. Our apps are failing, making us look bad to our customers, and implementing retry mechanisms in our product will only aggravate the problem. In the last week alone, we have received unexpected 403s for over 16,000 requests.


            Please escalate!

            Jennifer Hagbi added a comment -

            David put it nicely. PLEASE escalate this issue!


            Bartłomiej Styczyński added a comment -

            Fully agree with David here; this greatly impacts all our Cloud apps as well.


              Assignee: Unassigned
              Reporter: Ramon M (rmacalinao)
              Affected customers: 45
              Watchers: 64