Service Desk notifications stop being sent entirely if a comment containing a very large number of links (~100k) is added to a request.
The job responsible for sending the notifications gets stuck, and restarting Jira does not resolve the issue.
If you are impacted by the bug, you'll see the following symptoms:
- if you run the following query in the database, you'll find that there are a lot of customer notifications waiting to be sent (SENT_TIME is null):
SELECT count(*) FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE" WHERE "SENT_TIME" IS NULL;
- if you run the following query in the database, you'll find that the job responsible for sending the notifications is stuck and shows as "Already running":
select * from rundetails where job_id = 'sd.custom.notification.batch.send';
- Example of results:
id | job_id | start_time | run_duration | run_outcome | info_message
71507893 | sd.custom.notification.batch.send | 2019-08-14 16:35:37.707+00 | 1 | A | Already running
- you can also verify that this job is stuck from the UI in ⚙ > System > Scheduler details, as it shows as "Already running":
- if you generate thread dumps, you'll see that there is a runnable thread stuck in the method com.atlassian.servicedesk.internal.feature.customer.request.IssueUrlConverterImpl.replaceIssueUrlsWithPortalRequestUrls:
"Caesium-2-1" #20668 daemon prio=5 tid=0x0000000004770000 nid=0x5b42 runnable [0x00007f740d93f000]
at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl$$Lambda$2582/1308621309.apply(Unknown Source)
- if you enable DEBUG logging for the package com.atlassian.servicedesk.plugins.notifications in ⚙ > System > Logging and profiling > Configure logging level for another package, you'll notice that the job responsible for sending the notifications keeps being skipped (because a stuck job is already defined):
2019-07-10 10:29:08,193 PsmqAsyncExecutors-job:thread-5747 DEBUG XXXXXXX XXXXXXX XXXXXX XX.XXX.X.XX /secure/CommentAssignIssue.jspa [c.a.s.p.n.internal.scheduler.NotificationBatchScheduler] Notification batch sending job already defined. Skip scheduling.
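The two database checks above can also be combined into one statement. The following is only a convenience sketch, assuming PostgreSQL syntax and using just the table and column names that appear in the queries and sample output above:

```sql
-- Number of pending customer notifications, shown next to the most recent
-- recorded run of the batch-send job.
SELECT
    (SELECT count(*)
       FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE"
      WHERE "SENT_TIME" IS NULL) AS pending_notifications,
    r.start_time,
    r.run_outcome,
    r.info_message
FROM rundetails r
WHERE r.job_id = 'sd.custom.notification.batch.send'
ORDER BY r.start_time DESC
LIMIT 1;
```

A pending count that keeps growing while info_message stays at "Already running" matches the symptoms described here. If rundetails has no row for this job, the statement returns no rows at all.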
The workaround consists of deleting, from the table "AO_4E8AE6_NOTIF_BATCH_QUEUE", all the customer notifications that are waiting to be sent by the stuck job. Unfortunately, this is the only workaround known so far.
Please be aware that if you apply this workaround, you will lose all these pending notifications as they will be deleted from the database.
The steps are:
- Stop Jira
- Back up your database
- Run the following DELETE query. Note that this query has only been tested on a PostgreSQL database:
delete from "AO_4E8AE6_NOTIF_BATCH_QUEUE" WHERE "SENT_TIME" is null;
- Start Jira
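As an optional extra precaution, the DELETE can be run inside a transaction that first copies the pending rows to a separate table, so they can still be inspected after the cleanup. This is a minimal sketch assuming PostgreSQL. The backup table name notif_batch_queue_pending_backup is hypothetical and not part of Jira, and the copy does not replace the full database backup from the steps above.

```sql
-- Run only while Jira is stopped and after a full database backup.
BEGIN;

-- Keep a copy of the rows that are about to be deleted.
-- "notif_batch_queue_pending_backup" is a hypothetical name; pick any unused one.
CREATE TABLE notif_batch_queue_pending_backup AS
SELECT *
  FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE"
 WHERE "SENT_TIME" IS NULL;

-- Delete the pending notifications (same statement as in the steps above).
DELETE FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE"
 WHERE "SENT_TIME" IS NULL;

-- Sanity check before committing: no pending rows should remain.
SELECT count(*)
  FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE"
 WHERE "SENT_TIME" IS NULL;

COMMIT;
```

If the sanity check shows anything unexpected, issue ROLLBACK instead of COMMIT. PostgreSQL rolls back the CREATE TABLE together with the DELETE, so the database is left unchanged. The backup table can be dropped once it is no longer needed.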