
Customer notifications stop being sent from any Service Desk issue

      Issue Summary

      Service Desk notifications stop being sent entirely once a comment containing a very large number of links (~100k) is added to a request.

      The job responsible for sending the notifications gets stuck, and restarting Jira does not resolve the issue.
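
As a rough illustration of why so many links can stall the job: the thread dump in the diagnosis steps shows the thread inside `StringUtils.replaceEach`, which, as far as the Commons Lang 2.x source suggests, searches for every search string again after each replacement. The Python sketch below mimics that strategy and contrasts it with a single scan. It is not the Service Desk code: `replace_each_naive`, `replace_single_pass`, the URL shapes, and the idea that one search string is passed per link are all assumptions made for illustration.

```python
import re

# Hypothetical URL shape; the real base URL and issue-key format will differ.
ISSUE_URL = re.compile(r"https://jira\.example\.com/browse/[A-Z]+-\d+")


def replace_each_naive(text, pairs):
    """Replace every search string in `pairs`, rescanning after each hit.

    Returns the new text and the number of substring searches performed.
    """
    active = [(s, r) for s, r in pairs if s]  # ignore empty search strings
    out, start, scans = [], 0, 0
    while active:
        best_idx, best = -1, None
        still_active = []
        for search, repl in active:
            scans += 1
            idx = text.find(search, start)
            if idx == -1:
                continue  # no further match: drop this search string
            still_active.append((search, repl))
            if best_idx == -1 or idx < best_idx:
                best_idx, best = idx, (search, repl)
        active = still_active
        if best is None:
            break
        # Copy the untouched text, then the replacement, and move on.
        out.append(text[start:best_idx])
        out.append(best[1])
        start = best_idx + len(best[0])
    out.append(text[start:])
    return "".join(out), scans


def replace_single_pass(text, mapping, pattern=ISSUE_URL):
    """Scan the text once and look each matched URL up in a dict."""
    return pattern.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    n = 300  # kept small on purpose; the naive version slows down quickly
    links = [f"https://jira.example.com/browse/SD-{i:05d}" for i in range(n)]
    pairs = [(u, u.replace("/browse/", "/servicedesk/customer/portal/1/"))
             for u in links]
    text = " ".join(links)
    naive, scans = replace_each_naive(text, pairs)
    assert naive == replace_single_pass(text, dict(pairs))
    print(f"{n} links -> {scans} substring searches in the naive version")
```

In this sketch the number of substring searches grows roughly with the square of the number of links, and each search may itself scan a long stretch of text, which is consistent with a job that appears to hang on a comment with ~100k links.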

      Diagnosis steps

      If you are impacted by the bug, you'll see the following symptoms:

      • if you run the following query in the database, you'll find a large number of customer notifications waiting to be sent (SENT_TIME is null):
        SELECT count(*) FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE" WHERE "SENT_TIME" IS NULL;
        
      • if you run the following query in the database, you'll find that the job responsible for sending the notifications is stuck and shows as "Already running":
        • Query:
          select * from rundetails where job_id = 'sd.custom.notification.batch.send';
          
        • Example of results:
              id    |              job_id               |         start_time         | run_duration | run_outcome |  info_message   
          ----------+-----------------------------------+----------------------------+--------------+-------------+-----------------
           71507893 | sd.custom.notification.batch.send | 2019-08-14 16:35:37.707+00 |            1 | A           | Already running
          
      • you can also verify that this job is stuck from the UI in ⚙ > System > Scheduler details, where it shows as "Already running".
      • if you generate thread dumps, you'll see that there is a runnable thread stuck in the method com.atlassian.servicedesk.internal.feature.customer.request.IssueUrlConverterImpl.replaceIssueUrlsWithPortalRequestUrls:
        "Caesium-2-1" #20668 daemon prio=5 tid=0x0000000004770000 nid=0x5b42 runnable [0x00007f740d93f000]
           java.lang.Thread.State: RUNNABLE
        	at java.lang.String.indexOf(String.java:1769)
        	at java.lang.String.indexOf(String.java:1718)
        	at org.apache.commons.lang.StringUtils.replaceEach(StringUtils.java:4075)
        	at org.apache.commons.lang.StringUtils.replaceEach(StringUtils.java:3868)
        	at com.atlassian.servicedesk.internal.feature.customer.request.IssueUrlConverterImpl.replaceIssueUrlsWithPortalRequestUrls(IssueUrlConverterImpl.java:69)
        	at com.atlassian.servicedesk.internal.feature.customer.request.CustomerTextRendererImpl.updateCustomerTextIntertal(CustomerTextRendererImpl.java:159)
        	at com.atlassian.servicedesk.internal.feature.customer.request.CustomerTextRendererImpl.updateEmailTextForCustomer(CustomerTextRendererImpl.java:154)
        	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildMultiPartHtmlEmailBody(StylingBodyFinaliserImpl.java:79)
        	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildMessageBodyForRecipient(StylingBodyFinaliserImpl.java:72)
        	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.lambda$buildHtmlBody$0(StylingBodyFinaliserImpl.java:55)
        	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl$$Lambda$2582/1308621309.apply(Unknown Source)
        	at io.atlassian.fugue.Either$RightProjection.map(Either.java:872)
        	at io.atlassian.fugue.Either.map(Either.java:217)
        	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildHtmlBody(StylingBodyFinaliserImpl.java:55)
        
      • if you enable DEBUG logging for the package com.atlassian.servicedesk.plugins.notifications in ⚙ > System > Logging and profiling > Configure logging level for another package, you'll see that the job responsible for sending the notifications keeps being skipped (because a previous run of the job is stuck):
        2019-07-10 10:29:08,193 PsmqAsyncExecutors-job:thread-5747 DEBUG XXXXXXX XXXXXXX XXXXXX XX.XXX.X.XX /secure/CommentAssignIssue.jspa [c.a.s.p.n.internal.scheduler.NotificationBatchScheduler] Notification batch sending job already defined. Skip scheduling.
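
The two database checks above can be combined into one small script. The following Python sketch assumes a DB-API 2.0 connection (for example from a PostgreSQL driver) and the table, column, and job names quoted in the queries above. The function names `diagnose` and `demo_connection` are hypothetical, and the demo uses an in-memory SQLite database with made-up rows only so that it runs standalone.

```python
import sqlite3

PENDING_SQL = (
    'SELECT count(*) FROM "AO_4E8AE6_NOTIF_BATCH_QUEUE" '
    'WHERE "SENT_TIME" IS NULL'
)
JOB_SQL = (
    "SELECT run_outcome, info_message FROM rundetails "
    "WHERE job_id = 'sd.custom.notification.batch.send'"
)


def diagnose(conn):
    """Return the pending-notification count and whether the job looks stuck."""
    cur = conn.cursor()
    cur.execute(PENDING_SQL)
    pending = cur.fetchone()[0]
    cur.execute(JOB_SQL)
    # The symptom described above: outcome "A" with "Already running".
    stuck = any(
        outcome == "A" and message == "Already running"
        for outcome, message in cur.fetchall()
    )
    return {"pending": pending, "job_already_running": stuck}


def demo_connection():
    """In-memory SQLite stand-in with made-up rows (not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "AO_4E8AE6_NOTIF_BATCH_QUEUE" '
        '("ID" INTEGER PRIMARY KEY, "SENT_TIME" INTEGER)'
    )
    conn.executemany(
        'INSERT INTO "AO_4E8AE6_NOTIF_BATCH_QUEUE" ("ID", "SENT_TIME") '
        "VALUES (?, ?)",
        [(1, None), (2, None), (3, 1565800000000)],
    )
    conn.execute(
        "CREATE TABLE rundetails (id INTEGER, job_id TEXT, start_time TEXT, "
        "run_duration INTEGER, run_outcome TEXT, info_message TEXT)"
    )
    conn.execute(
        "INSERT INTO rundetails VALUES (?, ?, ?, ?, ?, ?)",
        (71507893, "sd.custom.notification.batch.send",
         "2019-08-14 16:35:37.707+00", 1, "A", "Already running"),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(diagnose(demo_connection()))
```

Against a real instance you would pass a connection from your own database driver instead of `demo_connection()`; both queries are read-only.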
        

      Workaround

      The workaround consists of deleting from the table "AO_4E8AE6_NOTIF_BATCH_QUEUE" all the customer notifications that are waiting to be sent by the stuck job. Unfortunately, this is the only known workaround at the moment.

      Please be aware that if you apply this workaround, you will lose all these pending notifications as they will be deleted from the database.

      The steps are:

      1. Stop Jira
      2. Back up your database
      3. Run the following DELETE query. Note that this query has been tested on a PostgreSQL database:
        delete from "AO_4E8AE6_NOTIF_BATCH_QUEUE" WHERE "SENT_TIME" is null;
        
      4. Start Jira
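
Step 3 could also be scripted so that the delete runs in one transaction and reports how many notifications were dropped. This is only a sketch, assuming a DB-API 2.0 connection and the table and column names from the query above. The function name `purge_pending` is hypothetical, and it does not replace the database backup in step 2.

```python
import sqlite3

TABLE = '"AO_4E8AE6_NOTIF_BATCH_QUEUE"'


def purge_pending(conn):
    """Delete all unsent notifications; return how many were removed.

    Rolls back if the number of deleted rows differs from the count taken
    just before the delete (Jira is expected to be stopped at this point).
    """
    cur = conn.cursor()
    try:
        cur.execute(f'SELECT count(*) FROM {TABLE} WHERE "SENT_TIME" IS NULL')
        expected = cur.fetchone()[0]
        cur.execute(f'DELETE FROM {TABLE} WHERE "SENT_TIME" IS NULL')
        if cur.rowcount != expected:
            raise RuntimeError(
                f"expected to delete {expected} rows, deleted {cur.rowcount}"
            )
        conn.commit()
        return expected
    except Exception:
        conn.rollback()
        raise


if __name__ == "__main__":
    # Made-up rows in an in-memory SQLite database, for demonstration only.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        f'CREATE TABLE {TABLE} ("ID" INTEGER PRIMARY KEY, "SENT_TIME" INTEGER)'
    )
    demo.executemany(
        f'INSERT INTO {TABLE} ("ID", "SENT_TIME") VALUES (?, ?)',
        [(1, None), (2, None), (3, 1565800000000)],
    )
    demo.commit()
    print("deleted", purge_pending(demo), "pending notifications")
```

Comparing the row count before and after the delete is a cheap sanity check that nothing else was writing to the queue while the workaround ran.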

            [JSDSERVER-6516] Customer notifications stop being sent from any Service Desk issue

            Conny Postma made changes -
            Remote Link Original: This issue links to "Page (Atlassian Documentation)" [ 548901 ]
            Julien Rey made changes -
            Link New: This issue is related to JSDSERVER-7346 [ JSDSERVER-7346 ]
            Julien Rey made changes -
            Remote Link New: This issue links to "Page (Atlassian Documentation)" [ 548901 ]
            Julien Rey made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 496838 ]
            set-jac-bot made changes -
            Julien Rey made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 472331 ]
            Vinicius Fontes made changes -
            Description: the markup of the "Example of results" block was changed from {code:java} to {noformat:java}; the text of the description is otherwise unchanged.
            Julien Rey made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 458671 ]
            Nhi Nguyen (Inactive) made changes -
            Fix Version/s New: 4.4.2 [ 89399 ]
            Fix Version/s Original: 4.4.0 [ 87496 ]
            Nhi Nguyen (Inactive) made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 449644 ]

              mreil1 Markus Reil (Inactive)
              jrey Julien Rey