  1. Jira Service Management Data Center
  2. JSDSERVER-11770

PSMQ Processor threads can get stuck if database connection is lost

    • Bug
    • Resolution: Cannot Reproduce
    • High
    • None
    • 4.16.2
    • SLA
    • 2
    • Severity 2 - Major
    • 50
    • Engineering cannot currently reproduce the issue. We are working with CSS to run through the reproduction steps and determine next steps.

      Issue Summary

      When Jira loses its database connection, PSMQ processor threads can get stuck in the WAITING state and never recover.

      This issue is similar to the one observed at JRASERVER-73252.

      Steps to Reproduce

      1. Restart Jira's database while the application is running; or
      2. Experience a brief loss of connectivity to the database server.

      Expected Results

      If configured correctly, Jira reconnects to the database and continues to work normally.

      Actual Results

      All SdSerialisedOffThreadProcessor:thread-% threads are stuck in the WAITING state:

      "SdSerialisedOffThreadProcessor:thread-1" #2188 prio=5 os_prio=0 cpu=7377209.83ms elapsed=419831.37s tid=0x00007f67a6ab4000 nid=0x15d06 waiting on condition  [0x00007f40ecf48000]
         java.lang.Thread.State: WAITING (parking)
      	at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method)
      	- parking to wait for  <0x00007f69ee511a78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.14/AbstractQueuedSynchronizer.java:2081)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.14/ScheduledThreadPoolExecutor.java:1170)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.14/ScheduledThreadPoolExecutor.java:899)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.14/ThreadPoolExecutor.java:1054)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.14/ThreadPoolExecutor.java:1114)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.14/ThreadPoolExecutor.java:628)
      	at java.lang.Thread.run(java.base@11.0.14/Thread.java:834)
      

      While the threads are stuck, the PSMQ MESSAGE table keeps growing indefinitely. After a restart, working through the backlog can take a long time, depending on the cluster size.

      Ideally, the threads should recover after a database connection problem, or at least log an error.
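
One way a pool can end up idle in DelayedWorkQueue.take() without logging anything is documented behaviour of java.util.concurrent.ScheduledExecutorService: if a periodic task throws, its later executions are suppressed and the exception is held in the returned future rather than printed. Whether this is what happens inside the PSMQ processor has not been confirmed; the sketch below only illustrates the general mechanism and a guard against it. The class ResilientPolling, the helper guarded, and the simulated failure are hypothetical names for illustration and are not Jira code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Jira Service Management implementation.
final class ResilientPolling {

    /** Wraps a task so a failure is logged and the periodic schedule survives. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("Poll failed, retrying on next tick: " + e.getMessage());
            }
        };
    }

    /** Schedules a poll that fails on its second call; returns how often it was invoked. */
    static int runFor(boolean guard, long millis) {
        AtomicInteger calls = new AtomicInteger();
        Runnable poll = () -> {
            if (calls.incrementAndGet() == 2) {
                // Stand-in for a query failing because the database connection dropped.
                throw new IllegalStateException("simulated database connection loss");
            }
        };
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.scheduleWithFixedDelay(guard ? guarded(poll) : poll, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return calls.get();
    }

    public static void main(String[] args) {
        // Unguarded: the schedule is silently cancelled after the failing call.
        System.out.println("unguarded calls: " + runFor(false, 300));
        // Guarded: the failure is logged and polling continues.
        System.out.println("guarded calls:   " + runFor(true, 300));
    }
}
```

In the unguarded run the worker thread stays alive but has nothing left to execute, so a thread dump would show it parked in DelayedWorkQueue.take(), which is also what a healthy but idle pool looks like.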

      Workaround

      A rolling restart of the cluster is sometimes not sufficient to get the threads to recover. This might be related to a cache replication inconsistency in the cluster.

      If the threads do not recover after a rolling restart, perform a full restart of the cluster: stop all nodes, then start them one after the other.
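
To check the processor threads after a restart, a thread dump can be summarised per thread state. The following is a minimal sketch that assumes a HotSpot-style dump like the one shown under Actual Results; the class name ProcessorThreadStates is hypothetical. A WAITING thread is not proof of the problem on its own, since an idle pool parks in the same place, so it is more telling to compare several dumps taken while the PSMQ MESSAGE table is growing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for reading a thread dump; not part of Jira.
final class ProcessorThreadStates {

    // Thread header lines start with the quoted thread name.
    static final String PREFIX = "\"SdSerialisedOffThreadProcessor:thread-";
    static final String STATE = "java.lang.Thread.State:";

    /** Counts thread states of the processor threads found in the dump lines. */
    static Map<String, Integer> tally(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        boolean inProcessorThread = false;
        for (String raw : dumpLines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A new thread entry begins; remember whether it is a processor thread.
                inProcessorThread = line.startsWith(PREFIX);
            } else if (inProcessorThread && line.startsWith(STATE)) {
                counts.merge(line.substring(STATE.length()).trim(), 1, Integer::sum);
                inProcessorThread = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ProcessorThreadStates.java <path-to-thread-dump>
        System.out.println(tally(Files.readAllLines(Paths.get(args[0]))));
    }
}
```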

              Unassigned
              Rodrigo Baldasso (rbaldasso)