Jira Service Management Data Center
JSDSERVER-5736

Poor performance with high CPU and a high number of SdOffThreadEventJobRunner threads

    • Type: Bug
    • Resolution: Fixed
    • Priority: Highest
    • Fix Version: 4.4.0
    • Affects Versions: 3.9.0, 3.9.1, 3.9.2, 3.9.3, 3.9.4, 3.9.6, 3.9.7, 3.9.8, 3.9.9, 3.9.10, 3.9.11, 3.10.0, 3.10.1, 3.10.2, 3.11.0, 3.11.1, 3.16.0, 3.16.1, 4.0.0, 4.1.0, 4.2.0
    • Component: SLA

       

      Atlassian Update – 10 September 2021

      Hi everyone,

      Thank you for your feedback on the ticket and supporting our team in our investigation!

      After analysing the problem, we have identified that the issue "Poor performance with high CPU and a high number of SdOffThreadEventJobRunner threads" was fixed in JSD 4.4.0. The problem occurred because these threads were unbounded before 4.4.0, which resulted in high DB load on the instance.

      However, we have identified two other issues with the JSM async processing logic, related to the problems reported in this bug ticket, which we still need to resolve. These are tracked in their respective tickets:

      1. JSDSERVER-5730

      • This is related to threads deadlocking when there are frequent actions on one request. A fix for this is released behind a dark feature in 4.9.0, and the development team will be working on enabling it by default in future. More details on the fix: https://confluence.atlassian.com/jirakb/deadlocking-in-jira-service-desk-when-frequently-updating-the-same-issue-979428323.html

      2. JSDSERVER-8635

      • This is related to threads deadlocking when processing for one issue takes over 5 minutes. This ticket is currently gathering impact.

      If you have any further concerns with the above, please let us know by opening a support ticket via https://support.atlassian.com.

      Thank you,

      Alex

      Description 

      JSD 3.9.0 attempts to address some of the friction between the SLA system and automation (JSDSERVER-4743) and poor issue creation performance by introducing a wrapper event type (inspired by OnCommitEvent) and an “expectation” system.

      The expectation system gives features that are interested in one or more eligible event types a way to explicitly define the work that should be done before a wrapped event is dispatched. They do this by submitting "jobs" that are executed in strict cluster-wide order of submission (no more than one job at a time for each issue) on a thread pool, so that request threads are never blocked (if the submitting thread is not a request thread, the job simply runs on that thread).
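
      As a rough mental model of that per-issue ordering, here is a minimal single-node sketch in Java. The class, method and field names are ours and this is not the actual JSD implementation; in particular, the real system serializes jobs cluster-wide (via PSMQ) rather than per JVM, and cleans up finished work.

      import java.util.Map;
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      // Illustrative sketch only, not JSD code: jobs submitted for the same issue
      // run strictly one at a time, in submission order, on a bounded worker pool,
      // so the submitting (request) thread is never blocked.
      public class PerIssueJobRunner {

          // Bounded pool; the pre-4.4.0 behaviour described in this ticket was
          // effectively unbounded.
          private final ExecutorService pool = Executors.newFixedThreadPool(4);

          // Tail of the job chain for each issue; chaining preserves per-issue order.
          // (Completed tails are never removed here, which a real implementation
          // would have to handle.)
          private final Map<Long, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

          public void submit(long issueId, Runnable job) {
              tails.compute(issueId, (id, tail) ->
                      (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                              .thenRunAsync(job, pool));
          }
      }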

      The wrapper event type does the same for the work that should be done after what we refer to as "completion", by having features define @EventListener methods that take a single ServiceDeskWrappedOnCompletionEvent parameter and return void.
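
      For illustration, such a listener might look like the sketch below. Only the @EventListener annotation (from atlassian-event) and the event type name come from the description above; the listener class, method name and the stub interface are hypothetical.

      import com.atlassian.event.api.EventListener;

      // Stub for the JSD-internal event type named above; the real class ships
      // inside the Service Desk plugin.
      interface ServiceDeskWrappedOnCompletionEvent {}

      public class OnCompletionSlaListener {

          // Hypothetical listener: invoked only after "completion", i.e. after the
          // expectation jobs submitted for the wrapped event have finished.
          @EventListener
          public void onCompletion(ServiceDeskWrappedOnCompletionEvent event) {
              // e.g. SLA recalculation or other work that must see the finished state
          }
      }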

      At least two recent support cases have involved severe performance degradation of a node and/or of the database for an instance, which seems to have been caused or exacerbated by the expectation system, so we'll link potential causes to this issue as we find them.

      Diagnosis

      • High CPU usage on DB server
      • Increased number of threads used by the Jira process
      • High number of SdOffThreadEventJobRunner threads in thread dumps, connecting to the database (a quick in-process check is sketched below)
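
      As a rough cross-check of that last symptom, a snippet like the following counts live threads whose names contain the runner prefix. It has to run inside the Jira JVM (for example via a script console plugin); the class and method names are ours and are not part of Jira.

      import java.util.Map;

      public final class OffThreadRunnerThreadCheck {

          // Counts live threads in the current JVM whose name mentions the
          // SdOffThreadEventJobRunner prefix reported in this ticket's thread dumps.
          public static long countRunnerThreads() {
              Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
              return dump.keySet().stream()
                      .filter(t -> t.getName().contains("SdOffThreadEventJobRunner"))
                      .count();
          }
      }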

      Possible workaround (JSD 3.9+)

      These steps configure the expectation system so that jobs are always executed immediately on the submitting thread, without touching any OffThreadEventJobRunner or PSMQ code paths, as if the submitting threads were never request threads (JSDSERVER-5730).

      1. Go to the dark feature settings page (<baseURL>/secure/SiteDarkFeatures!default.jspa)
      2. Remove the feature flag sd.internal.base.off.thread.on.completion.events.enabled, if it exists
      3. Add the following feature flag: sd.internal.base.off.thread.on.completion.events.disabled
      4. Restart JIRA

      SLA accuracy shouldn't be negatively affected, but issue creation might take longer as a result. "WHEN: Issue created" automation rules with SLA-related JQL should still work (JSDSERVER-4743).

      Comments

            Alex Cooksey added a comment: Atlassian Update – 10 September 2021 (see the update at the top of this ticket).

            Stephan Vos added a comment:

            We are experiencing a crash that seems to be related to this (Service Management Server 4.15).

            There is contention on the Queue and Message tables, which seems to cause either regular deadlocks or, sometimes, a MySQL crash.

            Aug 19 07:05:10 localhost mysqld[11332]: Some pointers may be invalid and cause the dump to abort.
            Aug 19 07:05:10 localhost mysqld[11332]: Query (0x7f34040107c0): update `AO_319474_MESSAGE` set `CLAIMANT` = null, `CLAIMANT_TIME` = null where `AO_319474_MESSAGE`.`QUEUE_ID` = 1214021 and `AO_319474_MESSAGE`.`CLAIMANT` is not null
            Aug 19 07:05:10 localhost mysqld[11332]: Connection ID (thread ID): 12046760

            Alex Cooksey added a comment (edited):
            Atlassian Update – 30 June 2021

            Hi everyone,

            Thank you for your feedback regarding this bug and for bringing it to our attention. Since this bug was re-opened in May, we have resumed our investigation into this issue.

            Currently we're unable to reproduce this issue and are looking to work with customers directly affected who are able to reproduce the problem.

            If you're able to reproduce the issue, please open a support ticket via https://support.atlassian.com and let the support engineer know about this ticket and this message. We'll work with you to make sure the development team has the information they need to begin working on resolving it.

            Please let us know if you have any concerns with the above.

            Thank you,

            Alex


            Gonchik Tsymzhitov added a comment:

            Hi Denise,
            Thank you!
            Sorry for bothering you.

            Andrea Hakim added a comment:

            Issue occurring in JSD 4.15.

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment:

            To be clear, I am not on the Jira Development team, so I can't update the fix version, and I can't fix this problem myself. We are experiencing this on an internal Jira that my team owns, so I have reopened it for the Jira team to re-triage.

            Gonchik Tsymzhitov added a comment:

            dunterwurzacher Could you clear the fix version, please?

            Gonchik Tsymzhitov added a comment:

            dunterwurzacher Thank you!

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment:

            Reopening this for the Jira team to triage again, as it seems to still be occurring for a lot of folks in later versions.

            Kevin Dalton added a comment:

            Curious what the workaround would do for those not running Service Desk. Is there still a benefit to adding sd.internal.base.off.thread.on.completion.events.disabled?

              Assignee: Mohil Chandra (mchandra@atlassian.com)
              Reporter: Delan Azabani (Inactive) (dazabani)
              Affected customers: 195
              Watchers: 188
