On a busy, large Jira instance (Jira 8.3.0+ with JSD 4.3.0 or higher), the SdOffThreadEventJobRunner threads can run into an active deadlock in which they constantly poll the PSMQ table without making progress.
Summary of the problem:
- As of JSD 4.3.0 the SdOffThreadEventJobRunner thread pool is limited to 5 threads by default.
- Upon an issue update, if none of the 5 threads has the issue context needed to process the first item in the queue, none of them can process that event, and they all continue polling the queue without making progress.
- This problem can be triggered on a large, very active instance, mainly when a single issue receives very frequent updates within a short time.
- This can result in heavy database pressure and can cause performance problems for Jira if the database can't keep up.
- Such a deadlock will be resolved eventually, but some issues might end up with a corrupted SLA.
- NA - was not able to reproduce locally.
- The issue was observed on large, very active client instances.
- The SdOffThreadEventJobRunner thread should not be polling messages over and over in a deadlock situation.
- The amount of traffic to the database should be normal.
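The polling behaviour described in the summary can be illustrated with a toy model. This is a minimal sketch, not JSD code: the `Worker` class, the `simulate` function, the issue keys, and the idea of representing "issue context" as a set of keys are all hypothetical, and each poll simply stands in for one query against the PSMQ table.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one SdOffThreadEventJobRunner thread."""
    name: str
    issue_contexts: set = field(default_factory=set)
    polls: int = 0
    processed: int = 0


def simulate(queue_items, workers, max_rounds):
    """Run polling rounds; return (total polls, items processed, items left)."""
    queue = deque(queue_items)
    for _ in range(max_rounds):
        if not queue:
            break
        for worker in workers:
            if not queue:
                break
            worker.polls += 1  # each poll models one database query
            head = queue[0]
            if head in worker.issue_contexts:
                queue.popleft()  # this worker can handle the head item
                worker.processed += 1
            # Otherwise the head item stays put and is polled again next round.
    total_polls = sum(w.polls for w in workers)
    total_processed = sum(w.processed for w in workers)
    return total_polls, total_processed, len(queue)


# Assumption: three queued events for one hot issue, "HOT-1".
events = ["HOT-1"] * 3

# Five workers, none holding the context for the head item.
stuck = simulate(events, [Worker(f"runner-{i}", {"OTHER-1"}) for i in range(5)], 1000)

# Twelve workers in the same state: more polls, still no progress.
stuck_12 = simulate(events, [Worker(f"runner-{i}", {"OTHER-1"}) for i in range(12)], 1000)

# Five workers that do hold the context drain the queue immediately.
healthy = simulate(events, [Worker(f"runner-{i}", {"HOT-1"}) for i in range(5)], 1000)

print(stuck, stuck_12, healthy)
```

In this model, a queue whose head item nobody can handle produces polls in proportion to the number of workers and rounds while processing nothing, which mirrors the query volume reported in this ticket. How the real threads acquire or lose issue context is not modelled here.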
The logs (with Debug enabled on com.querydsl) show a huge number of queries (40k+) hitting the database over and over.
- Increasing the number of SdOffThreadEventJobRunner threads from 5 to 12 did not help mitigate the issue.
- Increasing the thread count much further could trigger another issue, similar to the one fixed by "OffThreadEventJobRunner uses an unbounded ThreadPoolExecutor that can exhaust the DBCP" -
- Related KB - Deadlocking in Jira Service Desk when frequently updating the same issue
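The trade-off behind the thread-count notes above can be sketched as follows. This is an illustrative model only: it uses Python's `concurrent.futures.ThreadPoolExecutor`, not Jira's Java executor, and `peak_connection_demand` and the sleep that stands in for holding a database connection are hypothetical. It shows that the worker cap is also the cap on how many connections can be demanded at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_connection_demand(max_workers, tasks):
    """Return the highest number of tasks that wanted a connection at once."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def task():
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stands in for time spent holding a database connection
        with lock:
            in_flight -= 1

    # Leaving the with-block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(tasks):
            pool.submit(task)
    return peak


bounded_peak = peak_connection_demand(max_workers=5, tasks=40)
wide_peak = peak_connection_demand(max_workers=40, tasks=40)
print(bounded_peak, wide_peak)
```

With a cap of 5, the peak can never exceed 5, whatever the backlog. With a cap as large as the backlog, the peak can approach one connection per task, which is the kind of demand that could exhaust a fixed-size connection pool.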
There is no known effective workaround for this issue currently.