NOTE: This bug report is for JIRA Server. Using JIRA Cloud? See the corresponding bug report.

      Summary

      The ThreadPoolExecutor that Atlassian HTTP Client uses for callbacks does not queue them, so under high load it may reject a task when no thread is available at the moment the task is submitted.
      In that case the execute method throws java.util.concurrent.RejectedExecutionException in the submitting thread.
      Apache HTTP Client does not expect this: on catching the RejectedExecutionException it throws org.apache.http.nio.reactor.IOReactorException and changes the IOReactor status to STOPPED, which prevents this client bean from sending any further requests.
      One of the consequences is that WebHooks stop working once this happens.
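
      The rejection described above can be reproduced in isolation with plain java.util.concurrent classes. This is a minimal sketch, not the Atlassian HTTP Client code: the class name RejectionDemo, the pool size of 2, and the use of a SynchronousQueue to model "no queueing" are assumptions chosen for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {

    // Submits `tasks` blocking tasks to a fixed pool of `threads` threads that
    // uses the given work queue, and returns how many submissions were rejected.
    static int countRejections(BlockingQueue<Runnable> queue, int threads, int tasks) {
        // No handler is passed, so the default AbortPolicy applies:
        // execute() throws RejectedExecutionException in the submitting thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stands in for a slow callback
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // A SynchronousQueue holds no tasks: once both threads are busy,
        // every further submission is rejected. Prints "no queue: 8".
        System.out.println("no queue: "
                + countRejections(new SynchronousQueue<>(), 2, 10));
        // A bounded queue absorbs the burst. Prints "bounded queue: 0".
        System.out.println("bounded queue: "
                + countRejections(new LinkedBlockingQueue<>(256), 2, 10));
    }
}
```

      In this sketch the caller catches the exception and counts it. In the scenario reported here, the exception instead reaches Apache HTTP Client, which stops its IOReactor.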

      Environment

      Any JIRA up to 7.2.0

      Steps to Reproduce

      1. Enable DEBUG logging for com.atlassian.httpclient
      2. Set up an HTTP endpoint (preferably one that waits 30+ seconds before responding)
      3. Configure a WebHook against it
      4. Make a bulk edit of 1000 issues
      5. Cross your fingers
      6. Repeat the bulk edits until you see java.util.concurrent.RejectedExecutionException in the logs.
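
      For step 2, a slow endpoint can be sketched with the HTTP server bundled in the JDK. This is only an illustration under stated assumptions: the class name SlowWebhookEndpoint, the /webhook path, and port 8085 are hypothetical, and any endpoint that delays its response would serve the same purpose.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class SlowWebhookEndpoint {

    // Starts a server on `port` (0 picks a free port) whose /webhook handler
    // waits `delayMillis` before answering with 204 No Content.
    static HttpServer start(int port, long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/webhook", exchange -> {
                try {
                    // Drain the request body (the webhook payload) before replying.
                    InputStream in = exchange.getRequestBody();
                    byte[] buffer = new byte[8192];
                    while (in.read(buffer) != -1) {
                        // discard
                    }
                    Thread.sleep(delayMillis); // simulate a slow receiver
                    exchange.sendResponseHeaders(204, -1); // -1: no response body
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    exchange.sendResponseHeaders(500, -1);
                } finally {
                    exchange.close();
                }
            });
            // One thread per request, so slow responses do not block each other.
            server.setExecutor(Executors.newCachedThreadPool(runnable -> {
                Thread thread = new Thread(runnable, "slow-webhook");
                thread.setDaemon(true);
                return thread;
            }));
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(8085, 30_000L);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

      The WebHook in step 3 would then point at http://<host>:8085/webhook, where <host> is whatever machine runs the sketch.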

      Expected Results

      • No java.util.concurrent.RejectedExecutionException thrown
      • No org.apache.http.nio.reactor.IOReactorException thrown
      • All JIRA's outgoing HTTP requests (including webhooks) continue working

      Actual Results

      • Apache HTTP client's reactor status is STOPPED
      • Webhooks (as well as some other functionality) do not work

      Workaround

      The following may only reduce the likelihood of this issue; it does not fix it completely:

      • Decrease the number of webhooks
      • Decrease the number of errors when calling webhooks (e.g. get rid of webhooks often throwing TimeoutExceptions, etc.)
      • Better hardware might also help slightly

      Actions that help once the issue has occurred:

      • Disabling/enabling the Atlassian HTTP Client plugin
        OR
      • Restarting JIRA.

      Note:

      • Disabling/enabling the Atlassian HTTP Client Plugin will trigger a restart of other plugins. Schedule downtime to make sure it does not affect users who are using JIRA.
      • Disabling/enabling the Atlassian HTTP Client Plugin will disable other Jira apps (Jira Software, Jira Service Desk). You need to re-enable them manually from Administration > Add-ons > Manage add-ons afterwards.


            [JRASERVER-61937] Atlassian HTTP client might stop working at high load

            Jonathan Hult added a comment

            Adding that configuration did not work for JRE 11.

            The core issue is this (except for Jira instead of Bitbucket): BSERV-12131

            The root of the issue appears to be JDK-8214418 (which is not publicly published) but details on the fix in JRE 13 are located here.

            Atlassian Support advised the fixes (workarounds) in BSERV-12131.

            Jonathan Hult added a comment

            This still seems to be an issue (at least on our 8.7.1 Server instance). Support just advised me to put in the above config. I will do this and report back.

            Sean Yong added a comment (edited)

            Hi All,

            To add on top of ialexeyenko's comment above:

            There is no specific limit, nor one value that fits all environments, which is why the limit is configurable. The queue itself is not expensive and does not add much pressure unless you are operating with more than 100,000 or even millions of entries. As such, you can increase the value to 2048 as a start, or even double the original value if necessary.

            How to Set It

            Add the following system property in the setenv.sh file:

            -Dcom.atlassian.httpclient.options.threadWorkQueueLimit=2048
            

            Cheers,
            Sean Yong
            Atlassian Premier Support


            David Yu added a comment (edited)

            Correction: never mind, it seems the doc suggests that only the values that changed get sent. Still, it would be nice to have some customization of the body.

            Have you given any consideration to allowing users to customize the webhook payload? Today it's basically all or nothing, and most will go with all. This might be fine for some tickets until you hit the occasional ones with 100+ comments. I imagine you could fire them off quicker if the body wasn't so huge. See JRA-63205.


            Ignat (Inactive) added a comment (edited)

            Hi everyone,

            This issue is fixed in 7.2.6. Upgrading JIRA Platform (or JIRA Core) to version 7.2.6 Server is necessary to resolve the problem. In a typical scenario the problem should be gone after upgrading to 7.2.6.

            However, on large JIRA instances the defaults that JIRA now ships with might not be sufficient. Fortunately, JIRA now allows the HTTP client to be configured to suit any load pattern.

            Since JIRA 7.2.6, if JIRA drops an HTTP request, the following WARN log message is shown:

            Exceeded the limit of requests waiting for execution. Increase the value of the system property com.atlassian.httpclient.options.threadWorkQueueLimit to prevent these situations in the future. Current value of com.atlassian.httpclient.options.threadWorkQueueLimit = 256.
            

             

            This makes it possible to detect whether the JIRA HTTP client is operational and whether its current configuration matches the JIRA load patterns. If necessary, the value of the system property

            com.atlassian.httpclient.options.threadWorkQueueLimit
            

            may be increased to address the issue with missing HTTP requests.

            This property is present only since JIRA 7.2.6 and has no effect on earlier JIRA versions.


            Cheers,
            Ignat
            JIRA Bugmaster.
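
            To illustrate how a queue limit driven by a system property behaves, here is a minimal sketch. It is not the Atlassian HTTP Client implementation: the property name and the default of 256 come from the WARN message quoted above, while the class QueueLimitSketch, the method names, and the drop-and-count rejection handler are assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueLimitSketch {

    // Property name and default value as they appear in the WARN message.
    static final String LIMIT_PROPERTY =
            "com.atlassian.httpclient.options.threadWorkQueueLimit";
    static final int DEFAULT_LIMIT = 256;

    // Reads the limit from the JVM system properties, falling back to the
    // default when the property is unset or not a valid integer.
    static int queueLimit() {
        return Integer.getInteger(LIMIT_PROPERTY, DEFAULT_LIMIT);
    }

    // Hypothetical callback pool: tasks wait in a bounded queue, and when the
    // queue is full the task is dropped and counted instead of an exception
    // being thrown into the submitting thread.
    static ThreadPoolExecutor newCallbackPool(int threads, AtomicInteger dropped) {
        int limit = queueLimit();
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(limit),
                (task, pool) -> {
                    dropped.incrementAndGet();
                    System.err.println("WARN: exceeded the limit of requests waiting"
                            + " for execution; " + LIMIT_PROPERTY + " = " + limit);
                });
    }

    public static void main(String[] args) {
        System.out.println("threadWorkQueueLimit = " + queueLimit());
    }
}
```

            Running the sketch with -Dcom.atlassian.httpclient.options.threadWorkQueueLimit=2048 on the java command line prints threadWorkQueueLimit = 2048; without the flag it prints 256.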


            Ignat (Inactive) added a comment

            Hi IT Department,

            We need at least one week from today to run this release internally and verify that we can make it live. We usually ship bugfix releases bi-weekly; however, technical challenges or issues uncovered along the way might make the release slip the schedule.

            I hope this gives a good enough approximation for you to adjust your upgrade plans.

            Cheers,
            Ignat
            JIRA Bugmaster.

            IT Department added a comment

            Is there an ETA for the release of 7.2.5?

            Artur Pawelczyk (Inactive) added a comment

            purchases12, thank you for your interest.

            We are working on a solution. The fixed version will be released no earlier than two weeks from now. I will update you as soon as there is more information.

            IT Department added a comment

            Hi

            There doesn't seem to be much progress being made with this bug?

            Is there any ETA at all?

            Thanks

            Jacek Jaroczynski (Inactive) added a comment

            Hi purchases12,

            This is a very important issue and we have put it on our short-term backlog, but there is no ETA for the fix.

            Please watch this issue to receive notifications as soon as updates are available.

            Thanks,
            Jacek Jaroczynski
            JIRA Bugmaster
            [Atlassian]

              apawelczyk Artur Pawelczyk (Inactive)
              ibruzgin Ivan Bruzgin (Inactive)
