Jira Service Management Data Center / JSDSERVER-8635

Off-threading caused most threads to enter cluster lock wait


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Highest
    • Fix Version/s: 4.21.0, 4.20.3, 4.13.15
    • Affects Version/s: 4.13.9
    • Component/s: SLA
    • Labels: None

    Description

      Issue Summary

      With off-threading enabled, the AO_319474_MESSAGE database table keeps growing and most SdSerialisedOffThreadProcessor threads are stuck waiting for a cluster lock.

      The clusterlockstatus database table shows that the cluster locks held by the Jira node are associated with issues that have a large number of comments (~5000), and the MESSAGE_COUNT for those issues is high.

      This can happen when SLA events are processed off-thread using database-backed events.
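      The contention pattern can be illustrated with a small, self-contained Java model. This is a minimal sketch under stated assumptions, not Jira's implementation: an in-process ReentrantLock per issue key stands in for the database-backed cluster lock, a fixed thread pool stands in for the SdSerialisedOffThreadProcessor workers, and the class name HotIssueContention, the issue keys HOT-1 and COLD-n, and the sleep that represents SLA recalculation are all hypothetical. It shows how queued events for a single busy issue can occupy every worker while events for other issues wait.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model, not Jira code: a ReentrantLock per issue key stands in for
 * the database-backed cluster lock, and a fixed thread pool stands in for the
 * SdSerialisedOffThreadProcessor workers.
 */
final class HotIssueContention {

    static final class Result {
        final int processed;
        final int peakBlockedWorkers;

        Result(int processed, int peakBlockedWorkers) {
            this.processed = processed;
            this.peakBlockedWorkers = peakBlockedWorkers;
        }
    }

    static Result run(int workers, int hotEvents, int coldEvents, long holdMillis) {
        Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
        AtomicInteger blocked = new AtomicInteger();
        AtomicInteger peakBlocked = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Events for one busy issue are queued first, so every worker picks one up.
        for (int i = 0; i < hotEvents + coldEvents; i++) {
            String issueKey = i < hotEvents ? "HOT-1" : "COLD-" + i;
            pool.submit(() -> {
                ReentrantLock lock = locks.computeIfAbsent(issueKey, k -> new ReentrantLock());
                if (!lock.tryLock()) {
                    // Another worker holds this issue's lock, so this worker idles here.
                    peakBlocked.accumulateAndGet(blocked.incrementAndGet(), Math::max);
                    lock.lock();
                    blocked.decrementAndGet();
                }
                try {
                    Thread.sleep(holdMillis); // stands in for slow SLA recalculation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
                processed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(processed.get(), peakBlocked.get());
    }
}
```

      Calling HotIssueContention.run(4, 8, 4, 50) should report all 12 events as processed, with several workers blocked at the peak; the four COLD events only start once every HOT-1 event has been dequeued, even though their own locks are free.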

      Steps to Reproduce

      Update an issue comment in a way that triggers an SLA update while multiple users are concurrently creating issues.

      Expected Results

      The relevant SLAs and automation are updated.

      Actual Results

      SLAs and automation are not updated.

      Most SdSerialisedOffThreadProcessor threads are waiting on a cluster lock, for example:

      "SdSerialisedOffThreadProcessor:thread-1" #1484 prio=5 os_prio=0 tid=0x00007efa7359f000 nid=0x3afde waiting on condition [0x00007ef809f8f000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:621)
      	at com.atlassian.beehive.db.DatabaseClusterLock.uninterruptibleWait(DatabaseClusterLock.java:127)
      	at com.atlassian.beehive.db.DatabaseClusterLock.lock(DatabaseClusterLock.java:107)
      	at com.atlassian.servicedesk.internal.sla.customfield.SlaFieldUpdateLockManagerImpl.lockSlaUpdate(SlaFieldUpdateLockManagerImpl.java:24)
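      The trace shows the waiting threads inside DatabaseClusterLock.lock, uninterruptibleWait, and Thread.sleep, which suggests a poll-and-sleep acquisition loop. The Java sketch below illustrates that general pattern only; it is an assumption-based model, not the beehive code. The hypothetical PollingClusterLock class uses a ConcurrentMap as a stand-in for a lock-status table (lock name mapped to owner), and the poll interval and lock names are arbitrary.

```java
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical poll-and-sleep lock, not the beehive implementation.
 * The ConcurrentMap stands in for a lock-status table: lock name -> owner id.
 * Each holder must use a distinct owner id.
 */
final class PollingClusterLock {
    private final ConcurrentMap<String, String> lockTable;
    private final String lockName;
    private final String ownerId;
    private final long pollMillis;

    PollingClusterLock(ConcurrentMap<String, String> lockTable, String lockName,
                       String ownerId, long pollMillis) {
        this.lockTable = lockTable;
        this.lockName = lockName;
        this.ownerId = ownerId;
        this.pollMillis = pollMillis;
    }

    /** Single attempt: succeeds only if nobody currently owns the lock row. */
    boolean tryLock() {
        return lockTable.putIfAbsent(lockName, ownerId) == null;
    }

    /** Blocks until acquired; between attempts the thread sleeps (TIMED_WAITING). */
    void lock() {
        boolean interrupted = false;
        while (!tryLock()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, as an uninterruptible wait would
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag for the caller
        }
    }

    /** Releases the row only if this owner still holds it. */
    void unlock() {
        lockTable.remove(lockName, ownerId);
    }
}
```

      A thread blocked in lock() spends nearly all of its time in Thread.sleep, so a thread dump reports it as TIMED_WAITING (sleeping), which is the same shape as the trace above.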

      Workaround

      One option is to disable off-thread processing for SLA events. This is not always recommended: SLAs are then calculated on the HTTP request threads, so users may notice slower UI response times.

      1. Go to the dark feature settings page (<baseURL>/secure/SiteDarkFeatures!default.jspa)
      2. Remove the feature flag sd.internal.base.off.thread.on.completion.events.enabled, if it exists
      3. Remove the feature flag sd.internal.base.db.backed.completion.events.enabled, if it exists
      4. Add the following feature flag: sd.internal.base.off.thread.on.completion.events.disabled
      5. Add the following feature flag: sd.internal.bounded.off.thread.on.completion.events.disabled
      6. Restart Jira.
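      Steps 2 to 5 amount to a simple set transformation on the site's dark features. The Java sketch below models only that transformation; the class and method names are hypothetical, and it does not call any Jira API (the flags are still changed on the dark feature settings page, followed by a restart).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: the dark-feature set after applying the workaround steps. */
final class OffThreadWorkaround {
    // Step 2 and step 3: flags to remove if present.
    static final List<String> REMOVE = List.of(
            "sd.internal.base.off.thread.on.completion.events.enabled",
            "sd.internal.base.db.backed.completion.events.enabled");

    // Step 4 and step 5: flags to add.
    static final List<String> ADD = List.of(
            "sd.internal.base.off.thread.on.completion.events.disabled",
            "sd.internal.bounded.off.thread.on.completion.events.disabled");

    /** Returns a new set; unrelated dark features are left untouched. */
    static Set<String> apply(Set<String> current) {
        Set<String> result = new LinkedHashSet<>(current);
        result.removeAll(REMOVE);
        result.addAll(ADD);
        return result;
    }
}
```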

      Additional fix versions: 4.13.15+, 4.20.3+, and 4.21.0 onwards

      A more complete fix, which improves how cluster locks are handled when off-threading is enabled, is available in 4.21.0, where it is enabled by default. It is also available in 4.13.15+ and 4.20.3+, where it is enabled by adding the following dark features (after removing any dark features ending with on.completion.events.disabled):

      sd.internal.bounded.off.thread.on.completion.events.enabled
      sd.internal.base.db.backed.completion.events.enabled
      sd.internal.base.heartbeat.automatic.renewal.events.enabled
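      The version rules and opt-in flag changes above can be summarised in a short Java sketch. The class and enum names are hypothetical, and two interpretations are assumptions: that "4.13.15+" and "4.20.3+" refer only to later patch releases on the 4.13 and 4.20 lines, and that versions after 4.21.0 keep the fix enabled by default. Version strings are assumed to be plain MAJOR.MINOR[.PATCH] numbers.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical summary of where the cluster-lock fix applies and how to opt in. */
final class ClusterLockFix {
    enum Availability { DEFAULT_ON, OPT_IN, NOT_AVAILABLE }

    static final List<String> FIX_FLAGS = List.of(
            "sd.internal.bounded.off.thread.on.completion.events.enabled",
            "sd.internal.base.db.backed.completion.events.enabled",
            "sd.internal.base.heartbeat.automatic.renewal.events.enabled");

    /** Assumes plain numeric MAJOR.MINOR[.PATCH] version strings. */
    static Availability availability(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        if (major > 4 || (major == 4 && minor >= 21)) {
            return Availability.DEFAULT_ON; // assumed to stay on after 4.21.0
        }
        if (major == 4 && ((minor == 13 && patch >= 15) || (minor == 20 && patch >= 3))) {
            return Availability.OPT_IN;     // enabled via dark features
        }
        return Availability.NOT_AVAILABLE;
    }

    /** Dark features after opting in: drop any "...disabled" overrides, add the fix flags. */
    static Set<String> optIn(Set<String> current) {
        Set<String> result = new LinkedHashSet<>(current);
        result.removeIf(flag -> flag.endsWith("on.completion.events.disabled"));
        result.addAll(FIX_FLAGS);
        return result;
    }
}
```

      For example, availability("4.20.3") returns OPT_IN, and optIn applied to a set containing only the two disabled flags from the workaround returns exactly the three flags listed above.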


People

    • Elton Santos (esantos2)
    • KellyW (Inactive) (kwong2@atlassian.com)
    • Votes: 17
    • Watchers: 42
