Jira Data Center / JRASERVER-74492

Enabling asynchronous WebHooks may result in the Web-Hook-Events-Processor queue overflowing with too many UserAttributeStoredEvent and other WebHooks not being triggered as a result


      Issue Summary

      Jira 8.13 notice 🚨

      Asynchronous WebHooks have been enabled by default on versions:

      8.13.21
      8.13.22
      8.13.23
      8.13.24
      

      If you were unaware of this, refer to the Workaround section.

      Asynchronous WebHooks availability

      Async WebHooks were shipped through JRASERVER-68174 and are fully supported only on the Jira 8.13, 8.20 and 9.2-onwards branches, according to this chart:

      8.13.21 – 8.13.24: Enabled by default and not optimally implemented. Upgrade to 8.13.25 or later when possible.
      
      8.13.25 – 8.13.99*: Fully supported
      8.20.12 – 8.20.99*: Fully supported
      9.2.0   – 9.99.99*: Fully supported
      
      * "hypothetical versions", meaning it's supported on all future releases of that branch.
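The chart above can be expressed as a small lookup. This is a minimal sketch, not an Atlassian tool: the function name `async_webhooks_status` and the returned strings are made up for illustration, and any version the chart does not list is reported as not covered rather than guessed at.

```python
import re


def async_webhooks_status(version: str) -> str:
    """Classify a Jira MAJOR.MINOR.PATCH version against the support chart."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in match.groups())

    if (major, minor) == (8, 13):
        if 21 <= patch <= 24:
            return "enabled by default, not optimally implemented"
        if patch >= 25:
            return "fully supported"
    elif (major, minor) == (8, 20) and patch >= 12:
        return "fully supported"
    elif major == 9 and minor >= 2:
        return "fully supported"
    # Anything else is simply not listed in the chart.
    return "not covered by the chart"


print(async_webhooks_status("8.13.22"))
print(async_webhooks_status("9.4.1"))
```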
      

      Asynchronous WebHooks use the Web-Hook-Events-Processor queue, which by default has a Thread pool of 10 processors and a maximum queue size of 200. These limits apply per node: a full queue on "node 1" does not route new events to an idle queue on "node 2", for example.

      In large instances, especially those using 3rd-party SAML/SSO apps, this queue is regularly flooded with UserAttributeStoredEvent objects.
      When the queue reaches its limit (200 by default), it rejects new events until it has room again. WebHooks listening to events that were rejected because the queue was full are not triggered.
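The overflow behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Jira's implementation: the class `NodeQueueModel` is hypothetical, and it assumes no worker finishes during the burst, so the 10 workers stay busy and the 200-slot queue only fills.

```python
from collections import deque


class NodeQueueModel:
    """Toy model of one node: a fixed worker pool plus a bounded queue."""

    def __init__(self, pool_size: int = 10, queue_size: int = 200) -> None:
        self.pool_size = pool_size
        self.queue_size = queue_size
        self.active = 0        # events currently held by a worker
        self.queued = deque()  # events waiting for a free worker
        self.rejected = []     # events dropped because the queue was full

    def submit(self, event: str) -> bool:
        """Return True if the event was accepted, False if it was rejected."""
        if self.active < self.pool_size:
            self.active += 1
            return True
        if len(self.queued) < self.queue_size:
            self.queued.append(event)
            return True
        self.rejected.append(event)
        return False


# Each node has its own pool and queue; nothing is shared between them.
node1 = NodeQueueModel()
node2 = NodeQueueModel()

# A flood of 300 SSO-related events, followed by one event a WebHook listens to.
burst = ["UserAttributeStoredEvent"] * 300 + ["CommentCreatedEvent"]
for event in burst:
    node1.submit(event)

print(node1.active, len(node1.queued), len(node1.rejected))  # 10 200 91
print(node2.active, len(node2.queued), len(node2.rejected))  # 0 0 0
```

In this model the comment event is rejected only because the flood arrived first, while the second node stays idle.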

      Steps to Reproduce

      1. Set up Jira with SAML/SSO (preferably using a 3rd-party app).
      2. Configure about 10 WebHooks to listen to all available events, and do not select "Exclude body".
      3. Set up JMeter or another request automation tool to simulate Issue updates, transitions and comment creations.

      It is suspected that even REST API requests allowlisted to bypass SAML/SSO trigger the UserAttributeStoredEvent and can flood the queue under enough load.

      To make the overflow easier to reproduce, reduce the default queue and/or Thread pool size, or populate the Issues being commented on, transitioned or updated with lots of data so the processors take longer. (ref.: How to find the Issues with most data in Jira)
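As a rough stand-in for the request automation in step 3, the sketch below builds and sends comment-creation requests against the `/rest/api/2/issue/{key}/comment` endpoint seen in the log excerpt. The base URL, issue key, credentials and the use of HTTP basic authentication are assumptions for illustration. It sends requests one at a time, so a real load test would still need a concurrent tool such as JMeter.

```python
import base64
import json
import urllib.request


def build_comment_request(base_url, issue_key, body, user, password):
    """Build (but do not send) a POST that adds a comment to an issue."""
    url = f"{base_url.rstrip('/')}/rest/api/2/issue/{issue_key}/comment"
    # Basic auth is an assumption; use whatever your instance allows.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send_comments(base_url, issue_key, count, user, password):
    """Send `count` comments sequentially and return the HTTP status codes."""
    statuses = []
    for i in range(count):
        request = build_comment_request(
            base_url, issue_key, f"load test comment {i}", user, password
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            statuses.append(response.status)
    return statuses
```

For example, `send_comments("http://localhost:8080", "JIRA-12345", 50, "charlie", "secret")` would post 50 comments to a local test instance. All of those values are placeholders.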

      Expected Results

      Jira is able to handle enterprise-level demand out of the box.
      Alternatively, events that WebHooks cannot listen to should not be put into the queue only to be discarded afterwards.

      Actual Results

      Rejected events are observed in atlassian-jira.log:

      2022-10-31 11:46:48,839-0700 http-nio-8080-exec-68 url: /rest/api/2/issue/JIRA-12345/comment; user: charlie ERROR charlie 000x0000000x0 session00 127.0.0.1 /rest/api/2/issue/JIRA-12345/comment [c.a.event.internal.AsynchronousAbleEventDispatcher] There was an exception thrown trying to dispatch event [com.atlassian.jira.event.comment.CommentCreatedEvent@431012fb] from the invoker [SingleParameterMethodListenerInvoker{method=public void com.atlassian.webhooks.plugin.WebHookEventsProcessor.onEvent(java.lang.Object), listener=com.atlassian.webhooks.plugin.WebHookEventsProcessor@1539bc1f}]
      java.lang.RuntimeException: Task com.atlassian.sal.core.executor.ThreadLocalDelegateRunnable@6a201cb8 rejected from java.util.concurrent.ThreadPoolExecutor@490244d8[Running, pool size = 10, active threads = 10, queued tasks = 200, completed tasks = 2246209]. Listener: com.atlassian.webhooks.plugin.WebHookEventsProcessor event: com.atlassian.jira.event.comment.CommentCreatedEvent
      

      We can parse the logs to "group by and count" the events rejected on that queue with this command (Linux):

      grep -E "pool size.*queued tasks.*WebHookEventsProcessor" "$JIRA_HOME"/log/atlassian-jira.log* | awk '{print $NF}' | sort | uniq -c | sort -nr | column -tx
      

      Sample output:

      2241  com.atlassian.crowd.event.user.UserAttributeStoredEvent
      144   com.atlassian.jira.event.issue.IssueEvent
      15    com.atlassian.jira.event.comment.CommentCreatedEvent
      7     com.atlassian.jira.event.issue.link.IssueLinkCreatedEvent
      

      We see a disproportionate number of rejected UserAttributeStoredEvent entries, and other Event types being rejected as well (probably as a consequence of the flood of the first event type).
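Where the shell pipeline is not available (for example on Windows), the same "group by and count" can be sketched in Python. It applies the same pattern as the command above and takes the last whitespace-separated field of each matching line as the event class. The function name `count_rejected_events` is made up for this example.

```python
import re
from collections import Counter

# Same pattern as the shell command: only the rejection lines match.
REJECTION = re.compile(r"pool size.*queued tasks.*WebHookEventsProcessor")


def count_rejected_events(lines):
    """Count rejected events per event class, like awk '{print $NF}' | uniq -c."""
    counts = Counter()
    for line in lines:
        if REJECTION.search(line):
            # The event class is the last field of the rejection line.
            counts[line.split()[-1]] += 1
    return counts


def print_report(counts):
    """Print the counts in descending order, like sort -nr."""
    for event, count in counts.most_common():
        print(f"{count:<6}{event}")
```

To use it, pass the lines of each `atlassian-jira.log*` file to `count_rejected_events` and merge the resulting counters with `+`.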

      Workaround

      Choose one of the workarounds listed here. You can implement both, but the Feature Flag takes precedence: once async WebHooks are disabled, the pool and queue sizes are irrelevant.

      Note: The workarounds work for Jira 8.x and 9.x only.

      A. Disabling async WebHooks

      If you're on Jira Core versions 8.13.21–8.13.24, or on Jira 8.20.x or 9.2.x with async WebHooks enabled, you can disable them by setting the Feature Flag below and restarting Jira (restarting one node at a time works):

      com.atlassian.jira.webhookEventsAsyncProcessing.disabled

      (ref.: How to manage dark features in Jira)

      This will cause WebHooks to be processed within the same request/Thread that triggered the event and may increase response times to end-users or API-clients.

      See Best practices on working with WebHooks in Jira for guidance on using WebHooks effectively.

      B. Increasing queue size

      The Web-Hook-Events-Processor Thread pool and queue size can be adjusted with the following JVM startup parameters (example with their default values):

      -Dwebhooks.executor.queue.size=200 -Dwebhooks.executor.thread.pool.size=10
      

      (ref.: Setting properties and options on startup)

      A restart is required for the new parameters to take effect. In Jira Data Center it can be a rolling restart (one node at a time).
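When several nodes need the same settings, a small helper can render the two startup parameters consistently. This is only an illustrative sketch: the function name `webhook_executor_args` is hypothetical, the defaults mirror the values shown above, and the larger values in the usage line are arbitrary examples rather than recommendations.

```python
def webhook_executor_args(queue_size: int = 200, thread_pool_size: int = 10) -> str:
    """Render the JVM startup parameters for the Web-Hook-Events-Processor."""
    for name, value in (("queue_size", queue_size), ("thread_pool_size", thread_pool_size)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return (
        f"-Dwebhooks.executor.queue.size={queue_size} "
        f"-Dwebhooks.executor.thread.pool.size={thread_pool_size}"
    )


# Arbitrary example values, chosen only to show the output format.
print(webhook_executor_args(queue_size=1000, thread_pool_size=20))
```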



              Assignee: Unassigned
              Reporter: Rodrigo Martinez (rmartinez3@atlassian.com)