[JSDSERVER-8635] Off-threading caused most threads to enter cluster lock wait

Type: Bug
Resolution: Fixed
Priority: Highest
Fix Version/s: 4.21.0, 4.20.3, 4.13.15
Affects Version/s: 4.13.9
Component/s: SLA
Labels: None

Support reference count: 17
Symptom Severity: Severity 1 - Critical
UIS: 434
Bug Fix Policy: View Atlassian Server bug fix policy

Issue Summary

With off-threading enabled, the AO_319474_MESSAGE database table keeps growing and most SdSerialisedOffThreadProcessor threads are stuck waiting for a cluster lock.

The clusterlockstatus database table shows that the cluster locks held by the Jira node are associated with issues that have a substantial number of comments (~5000), and the MESSAGE_COUNT for those issues is high.

This can happen when SLA events are processed off-thread with database-backed events.

Steps to Reproduce

Update issue comments that trigger an SLA update while multiple concurrent users are creating issues.

Expected Results

The relevant SLAs and automation are updated.

Actual Results

SLAs and automation are not updated.

Most SdSerialisedOffThreadProcessor threads are in cluster lock wait, for example:

"SdSerialisedOffThreadProcessor:thread-1" #1484 prio=5 os_prio=0 tid=0x00007efa7359f000 nid=0x3afde waiting on condition [0x00007ef809f8f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:621)
	at com.atlassian.beehive.db.DatabaseClusterLock.uninterruptibleWait(DatabaseClusterLock.java:127)
	at com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:107)
	at com.atlassian.servicedesk.internal.sla.customfield.SlaFieldUpdateLockManagerImpl.lockSlaUpdate(SlaFieldUpdateLockManagerImpl.java:24)

Workaround

One option is to disable off-thread processing for SLA events. This is not always recommended: SLAs are then calculated on the HTTP threads, so users may notice slower response times in the UI.

1. Go to the dark feature settings page (<baseURL>/secure/SiteDarkFeatures!default.jspa)
2. Remove the feature flag sd.internal.base.off.thread.on.completion.events.enabled, if it exists
3. Remove the feature flag sd.internal.base.db.backed.completion.events.enabled, if it exists
4. Add the following feature flag: sd.internal.base.off.thread.on.completion.events.disabled
5. Add the following feature flag: sd.internal.bounded.off.thread.on.completion.events.disabled
6. Restart Jira
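
The flag changes in the workaround amount to a simple set operation, which can be useful for working out the intended end state before editing the dark features page. This is a sketch only: `apply_workaround` is a hypothetical helper that works on a plain set of flag names and does not call any Jira API.

```python
# Flags the workaround removes, if present.
WORKAROUND_REMOVE = {
    "sd.internal.base.off.thread.on.completion.events.enabled",
    "sd.internal.base.db.backed.completion.events.enabled",
}

# Flags the workaround adds.
WORKAROUND_ADD = {
    "sd.internal.base.off.thread.on.completion.events.disabled",
    "sd.internal.bounded.off.thread.on.completion.events.disabled",
}


def apply_workaround(current_flags):
    """Return the set of dark features expected after the workaround."""
    return (set(current_flags) - WORKAROUND_REMOVE) | WORKAROUND_ADD
```

Unrelated flags pass through unchanged, so the result can be compared directly with what the dark features page lists after the edit.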

Additional fixVersion, only for 4.13.15+, 4.20.3+, and 4.21.0 onwards

A more complete fix, which improves the way cluster locks are handled with off-threading enabled, is available in 4.21.0, where it is enabled by default. It is also available in Jira 4.13.15+ and 4.20.3+, where it is enabled by setting the following dark features (first ensure that any dark features ending with on.completion.events.disabled are removed):

sd.internal.bounded.off.thread.on.completion.events.enabled
sd.internal.base.db.backed.completion.events.enabled
sd.internal.base.heartbeat.automatic.renewal.events.enabled
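
Because the fix depends on both adding flags and removing any leftover workaround flags, a quick consistency check on the configured dark features can catch a half-applied change. The sketch below assumes the flags are available as a plain collection of strings; `check_fix_flags` is a hypothetical helper, not a Jira API.

```python
# Dark features that enable the fix on 4.13.15+ and 4.20.3+.
FIX_FLAGS = {
    "sd.internal.bounded.off.thread.on.completion.events.enabled",
    "sd.internal.base.db.backed.completion.events.enabled",
    "sd.internal.base.heartbeat.automatic.renewal.events.enabled",
}

# Any dark feature ending with this suffix must be removed.
DISABLED_SUFFIX = "on.completion.events.disabled"


def check_fix_flags(current_flags):
    """Return (missing, conflicting) flag names, each as a sorted list."""
    flags = set(current_flags)
    missing = sorted(FIX_FLAGS - flags)
    conflicting = sorted(f for f in flags if f.endswith(DISABLED_SUFFIX))
    return missing, conflicting
```

Two empty lists indicate that the flag set matches the description above.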

causes: HOT-97221; HOT-97285; HOT-97419

is cloned by: JSMDC-10672

Assignee: Elton Santos
Reporter: KellyW (Inactive)
Affected customers: 17
Watchers: 43
Created: 13/Aug/2021 7:01 AM
Updated: 09/Jun/2023 2:13 PM
Resolved: 14/Dec/2021 3:37 AM
