Type: Bug
Resolution: Cannot Reproduce
Priority: High
Fix Version/s: None
Affects Version/s: 4.16.2
Severity: Severity 2 - Major
Issue Summary
When Jira loses its database connection, PSMQ processor threads can become stuck in the WAITING state and fail to recover.
This issue is similar to the one reported in JRASERVER-73252.
Steps to Reproduce
- Restart Jira's database while the application is running; or
- Experience a brief loss of connectivity to the database server.
Expected Results
Jira reconnects to the database and continues to work normally, provided the database connection is configured correctly.
Actual Results
The SdSerialisedOffThreadProcessor:thread-% threads are all stuck in the WAITING state:
"SdSerialisedOffThreadProcessor:thread-1" #2188 prio=5 os_prio=0 cpu=7377209.83ms elapsed=419831.37s tid=0x00007f67a6ab4000 nid=0x15d06 waiting on condition [0x00007f40ecf48000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.14/Native Method) - parking to wait for <0x00007f69ee511a78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.14/LockSupport.java:194) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.14/AbstractQueuedSynchronizer.java:2081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.14/ScheduledThreadPoolExecutor.java:1170) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.14/ScheduledThreadPoolExecutor.java:899) at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.14/ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.14/ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.14/ThreadPoolExecutor.java:628) at java.lang.Thread.run(java.base@11.0.14/Thread.java:834)
This causes the PSMQ MESSAGE table to grow indefinitely; after a restart, recovery can take a long time, depending on the cluster size.
Ideally, the threads should recover after a database connection problem, or at least log an error.
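For illustration, a hedged sketch of what such recovery could look like, assuming the processor runs as a periodic task on a ScheduledExecutorService (swallowAndLog is a hypothetical helper, not part of Jira's code): wrapping the task body keeps the schedule alive across a transient database failure instead of silently cancelling it.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class ResilientPeriodicTask {
    // Hypothetical wrapper: log the failure and keep the schedule alive.
    // scheduleAtFixedRate only suppresses future runs when the Runnable
    // itself throws, so catching here prevents the silent cancellation.
    static Runnable swallowAndLog(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("PSMQ poll failed, will retry: " + t);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.scheduleAtFixedRate(
                swallowAndLog(() -> { throw new RuntimeException("DB down"); }),
                0, 1, TimeUnit.SECONDS);

        // Unlike the unwrapped task, this one keeps firing every second and
        // logs each failure, so the outage is visible and recovery automatic.
        Thread.sleep(5_000);
        pool.shutdownNow();
    }
}
{code}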
Workaround
It has been observed that a rolling restart of the cluster is sometimes not sufficient for the threads to recover. This may be related to a cache replication inconsistency in the cluster.
If the threads do not recover after a rolling restart, perform a full restart of the cluster: stop all nodes, then start them one after the other.