-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
8.13.21, 8.13.22, 8.13.24, 8.20.12, 9.2.0, 8.13.25, 8.13.27, 8.20.14, 9.3.1
-
8.13
-
21
-
Severity 2 - Major
-
14
-
Issue Summary
Jira 8.13 notice
Asynchronous WebHooks have been enabled by default on versions:
8.13.21, 8.13.22, 8.13.23, 8.13.24
If you were unaware of this, refer to the Workaround section.
Asynchronous WebHooks availability
Async WebHooks were shipped through JRASERVER-68174 and are fully supported only in Jira 8.13, 8.20 and 9.2 onwards following this chart:
- 8.13.21 to 8.13.24: Enabled by default and not optimally implemented. Upgrade to 8.13.25 or later when possible.
- 8.13.25 to 8.13.99*: Fully supported
- 8.20.12 to 8.20.99*: Fully supported
- 9.2.0 to 9.99.99*: Fully supported
* "Hypothetical versions", meaning it's supported on all future releases of that branch.
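As a quick way to check a given version against the chart, the ranges can be written as a small lookup. This is only a sketch: the function name async_webhooks_status and the returned strings are made up for illustration, and versions that the chart does not mention are reported as "not listed" rather than guessed.

```python
def async_webhooks_status(version: str) -> str:
    """Classify a Jira version string such as "8.13.24" against the chart.

    Hypothetical helper for illustration; not part of Jira.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor) == (8, 13):
        if 21 <= patch <= 24:
            return "enabled by default, not optimally implemented"
        if patch >= 25:
            return "fully supported"
    elif (major, minor) == (8, 20) and patch >= 12:
        return "fully supported"
    elif major == 9 and minor >= 2:
        return "fully supported"
    # Anything else is simply absent from the chart.
    return "not listed in the chart"


for v in ("8.13.22", "8.13.27", "8.20.14", "9.3.1", "9.1.0"):
    print(v, "->", async_webhooks_status(v))
```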
Enabling asynchronous WebHooks makes use of the Web-Hook-Events-Processor queue, with a default Thread pool of 10 processors and maximum queue size of 200. This is for each node, and a full queue in "node 1" doesn't route new events to the idle queue in "node 2", for example.
In large instances, especially those using 3rd party SAML/SSO apps, this queue is regularly flooded with too many UserAttributeStoredEvent objects.
When the queue reaches its limit (200 by default), it starts rejecting new events until it's able to accommodate more. WebHooks listening to events that have been rejected because of a full queue won't be triggered.
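The rejection behaviour can be illustrated with a toy model. This is not Jira's implementation: it only assumes a fixed pool of 10 workers in front of a bounded queue of 200 per node, as described above, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeQueueModel:
    """Toy model of one node's Web-Hook-Events-Processor pool and queue."""

    pool_size: int = 10    # default Thread pool size
    queue_size: int = 200  # default maximum queue size
    active: int = 0
    queued: int = 0
    rejected: int = 0

    def submit(self) -> bool:
        """Accept an event if a thread or a queue slot is free, else reject it."""
        if self.active < self.pool_size:
            self.active += 1
            return True
        if self.queued < self.queue_size:
            self.queued += 1
            return True
        self.rejected += 1
        return False

    def complete_one(self) -> None:
        """A worker finishes; a queued event (if any) takes the freed thread."""
        if self.active == 0:
            return
        if self.queued > 0:
            self.queued -= 1
        else:
            self.active -= 1


node1, node2 = NodeQueueModel(), NodeQueueModel()

# A burst of 250 slow events hits node 1 before any worker finishes.
for _ in range(250):
    node1.submit()

print(node1.rejected)  # 40: only 10 + 200 events fit on the node
print(node2.rejected)  # 0: node 2 stays idle, nothing is routed to it
```

In this model, once one worker completes, exactly one more event can be accepted, which matches the "rejecting new events until it's able to accommodate more" behaviour described above.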
Steps to Reproduce
- Set up Jira with SAML/SSO (preferably using a 3rd party app).
- Configure about 10 WebHooks to listen to all the events available, and do not select "Exclude body".
- Set up JMeter or another request automation tool to simulate Issue updates, transitions and comment creations.
It is suspected that even REST API requests allowlisted to bypass the SAML/SSO will trigger the UserAttributeStoredEvent and flood the queue given enough load.
To help reproduce the queue overflow, the default queue and/or Thread pool size can be reduced, or the Issues being commented on, transitioned or updated can be populated with lots of data to slow down the processors. (ref.: How to find the Issues with most data in Jira)
Expected Results
Jira is able to handle enterprise-level demand out-of-the-box.
Alternatively, events that WebHooks cannot listen to shouldn't be put into the queue only to be discarded afterwards.
Actual Results
Rejected events are observed in atlassian-jira.log:
2022-10-31 11:46:48,839-0700 http-nio-8080-exec-68 url: /rest/api/2/issue/JIRA-12345/comment; user: charlie ERROR charlie 000x0000000x0 session00 127.0.0.1 /rest/api/2/issue/JIRA-12345/comment [c.a.event.internal.AsynchronousAbleEventDispatcher] There was an exception thrown trying to dispatch event [com.atlassian.jira.event.comment.CommentCreatedEvent@431012fb] from the invoker [SingleParameterMethodListenerInvoker{method=public void com.atlassian.webhooks.plugin.WebHookEventsProcessor.onEvent(java.lang.Object), listener=com.atlassian.webhooks.plugin.WebHookEventsProcessor@1539bc1f}] java.lang.RuntimeException: Task com.atlassian.sal.core.executor.ThreadLocalDelegateRunnable@6a201cb8 rejected from java.util.concurrent.ThreadPoolExecutor@490244d8[Running, pool size = 10, active threads = 10, queued tasks = 200, completed tasks = 2246209]. Listener: com.atlassian.webhooks.plugin.WebHookEventsProcessor event: com.atlassian.jira.event.comment.CommentCreatedEvent
We can parse the logs to "group by and count" the events rejected on that queue with this command (Linux):
egrep "pool size.*queued tasks.*WebHookEventsProcessor" $JIRA_HOME/log/atlassian-jira.log* | awk '{print $NF}' | sort | uniq -c | sort -nr | column -tx
Sample output:
2241 com.atlassian.crowd.event.user.UserAttributeStoredEvent
144 com.atlassian.jira.event.issue.IssueEvent
15 com.atlassian.jira.event.comment.CommentCreatedEvent
7 com.atlassian.jira.event.issue.link.IssueLinkCreatedEvent
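On systems where the shell pipeline isn't convenient, the same "group by and count" can be sketched in Python. The function name count_rejected_events is hypothetical; the sketch assumes, like the awk step above, that the event class is the last whitespace-separated field of each matching log line.

```python
import re
from collections import Counter
from typing import Iterable

# Same pattern as the egrep expression above.
REJECTION = re.compile(r"pool size.*queued tasks.*WebHookEventsProcessor")


def count_rejected_events(lines: Iterable[str]) -> Counter:
    """Count rejected events per event class (last field of each matching line)."""
    counts: Counter = Counter()
    for line in lines:
        if REJECTION.search(line):
            counts[line.split()[-1]] += 1
    return counts


# Shortened version of the log entry shown above, used as sample input.
sample = [
    "ERROR [c.a.event.internal.AsynchronousAbleEventDispatcher] ... rejected from "
    "java.util.concurrent.ThreadPoolExecutor@490244d8[Running, pool size = 10, "
    "active threads = 10, queued tasks = 200, completed tasks = 2246209]. "
    "Listener: com.atlassian.webhooks.plugin.WebHookEventsProcessor "
    "event: com.atlassian.jira.event.comment.CommentCreatedEvent",
    "INFO an unrelated log line",
]

for event, count in count_rejected_events(sample).most_common():
    print(count, event)
```

To run it against real logs, pass the lines of each atlassian-jira.log* file under $JIRA_HOME/log to count_rejected_events, for example by opening the files with errors="replace" and chaining them together.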
We see a disproportionate number of rejected UserAttributeStoredEvent entries, along with other event types being rejected (probably as a consequence of the flood of the first event type).
Workaround
The workarounds listed here are mutually exclusive: choose one of them. (You can implement both, but the Feature Flag takes precedence and the pool and queue sizes become irrelevant.)
Note: The workarounds work for Jira 8.x and 9.x only.
A. Disabling async WebHooks
If you're on Jira Core versions 8.13.21 to 8.13.24, or on Jira 8.20.x or Jira 9.2.x with async WebHooks enabled, you may disable them with the Feature Flag below and restart Jira (one node at a time will work):
com.atlassian.jira.webhookEventsAsyncProcessing.disabled
(ref.: How to manage dark features in Jira)
This will cause WebHooks to be processed within the same request/Thread that triggered the event and may increase response times to end-users or API-clients.
See Best practices on working with WebHooks in Jira on how to make best use of WebHooks in Jira.
B. Increasing queue size
The Web-Hook-Events-Processor Thread pool and queue size can be adjusted with the following JVM startup parameters (example with their default values):
-Dwebhooks.executor.queue.size=200 -Dwebhooks.executor.thread.pool.size=10
(ref.: Setting properties and options on startup)
A restart is required for the new params to work. In Jira Data Center it can be a rolling restart (one node at a time).
- is cloned by RAID-3194
- relates to JDCWUMAL-5, PSR-808