Confluence Data Center: CONFSERVER-29033

Email sending stops due to mail job hanging in sendMessage()

      Mail periodically stops being sent, coinciding with a loss of connection between the Confluence server and the mail server. The emails back up in the Mail Queue, not the Error Queue, and cannot be sent. They are subsequently lost when the server is restarted.

      NB: In this scenario, sending a test email will still work, because it bypasses the mail queue and is sent directly.

      Cause

      Mail stops sending because the mail thread cannot connect to the mail server and no timeout is set. Consequently, the mail thread hangs and mail is not sent, even after the connection is re-established. Confluence sets the timeout to 0 programmatically, no matter what is specified in the Resource tag; a value of 0 means the thread never times out and can never recover.
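The effect of a missing read timeout can be reproduced outside Confluence. The sketch below is an analogy in Python rather than Confluence or JavaMail code: a connected socket pair stands in for the mail connection, and the function name `read_gives_up` is invented for illustration. Note that Python expresses "wait forever" as `settimeout(None)`, which plays the role of the 0 value described above.

```python
import socket


def read_gives_up(timeout_seconds):
    """Return True if a read from a silent peer is abandoned after the timeout."""
    # socketpair() returns two connected sockets; `peer` plays a mail server
    # that keeps the connection open but never answers.
    client, peer = socket.socketpair()
    try:
        # With settimeout(None) this recv() would block forever, which is the
        # Python counterpart of the "never time out" setting in the issue.
        client.settimeout(timeout_seconds)
        client.recv(1024)
        return False
    except socket.timeout:
        return True
    finally:
        client.close()
        peer.close()
```

With a finite timeout the read fails fast and the caller can retry or report an error; without one, the calling thread stays blocked for as long as the peer stays silent.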

      Diagnosis

      To find out if this issue is affecting you, take a couple of external thread dumps while the problem is occurring, spaced around 1 minute apart. If a thread appears in both dumps, looks similar to the one under 'Symptoms' below, and references SMTPMailServerImpl.send() (or something similar, e.g. SMTPMailServerImpl.connect()), then you are running into this bug:

      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      

      If you're unsure of the contents of the thread dump or if you need any help at any stage, feel free to contact Support.
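The comparison of two thread dumps can be partly automated. The following is a minimal sketch, assuming each dump is available as plain text in the usual JVM layout (a quoted thread name on the header line, a blank line between threads). The function names are hypothetical, and the frame list only contains the two methods named in this issue.

```python
import re

# Frames named in this issue as signs of a stuck mail thread.
SUSPECT_FRAMES = ("SMTPMailServerImpl.send", "SMTPMailServerImpl.connect")


def stuck_mail_threads(dump_text):
    """Return the names of threads whose stack contains a suspect frame."""
    names = set()
    current = None
    for line in dump_text.splitlines():
        header = re.match(r'^\s*"([^"]+)"', line)
        if header:
            current = header.group(1)  # start of a new thread's section
        elif not line.strip():
            current = None  # a blank line ends the current thread's stack
        elif current and any(frame in line for frame in SUSPECT_FRAMES):
            names.add(current)
    return names


def stuck_in_both(first_dump, second_dump):
    """Return threads that show a suspect frame in both dumps."""
    return stuck_mail_threads(first_dump) & stuck_mail_threads(second_dump)
```

A thread name returned by `stuck_in_both` is only a hint; the full stack should still be compared by eye against the trace under 'Symptoms'.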

      Workaround 1 - for mail configured directly through the front end of Confluence.

      You will need to add the timeout properties to the database entry directly.

      1. Shut down Confluence. Unfortunately, at this point you will lose any pending mail; there is currently no way to avoid this.
      2. Find the entry in the database:
        select * from BANDANA where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts'
        
      3. Copy the entire contents of the BANDANAVALUE column to a text file.
      4. Find the <property name="mail.smtp.timeout" value="0"/> property and change it to <property name="mail.smtp.timeout" value="10000"/> to set a timeout of 10 seconds.
      5. Add this property directly below it to set the connection timeout to 10 seconds: <property name="mail.smtp.connectiontimeout" value="10000"/>
      6. You should now have these two lines in the BANDANAVALUE text:
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
        
      7. Write the value back to the database. NB: do NOT copy the query below; it is an example only. You must use the full edited BANDANAVALUE text you extracted in Step 3:
        update BANDANA set BANDANAVALUE = '<linked-hash-map>
             ....
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
             ....
        </linked-hash-map>' where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
        
      8. Restart Confluence
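Steps 4 to 6 can be sketched as a small text transformation applied to the BANDANAVALUE you saved in Step 3. This is an illustration under stated assumptions, not an official tool: the function name `patch_bandana_value` is made up, and the code relies on the timeout property being written exactly as shown in Step 4. It works on the text as-is instead of parsing it, because the rest of the stored XML is not shown in this issue.

```python
import re

TIMEOUT_MS = "10000"  # 10 seconds, as in the workaround above

# Matches the property from Step 4 with any current value.
TIMEOUT_PROPERTY = re.compile(r'<property name="mail\.smtp\.timeout" value="[^"]*"/>')


def patch_bandana_value(xml_text):
    """Set mail.smtp.timeout and add mail.smtp.connectiontimeout below it."""
    match = TIMEOUT_PROPERTY.search(xml_text)
    if match is None:
        # Some versions do not store the property at all; do not guess.
        raise ValueError("mail.smtp.timeout property not found")

    # Reuse the indentation of the existing line for the added property.
    line_start = xml_text.rfind("\n", 0, match.start()) + 1
    prefix = xml_text[line_start:match.start()]
    indent = prefix if prefix.strip() == "" else ""

    replacement = f'<property name="mail.smtp.timeout" value="{TIMEOUT_MS}"/>'
    if 'name="mail.smtp.connectiontimeout"' not in xml_text:
        replacement += (
            f'\n{indent}<property name="mail.smtp.connectiontimeout" '
            f'value="{TIMEOUT_MS}"/>'
        )
    return xml_text[:match.start()] + replacement + xml_text[match.end():]
```

The function raises an error when the property is missing, since in that case the stored value does not match what this workaround expects and should be edited by hand or with help from Support. An existing connectiontimeout property is left untouched.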

      If a new username or password is set through the GUI, the timeout value will be reset back to 0. If you change any of the configuration of the server, you will need to reapply this workaround.

      Also, if after setting these timeouts, you encounter another connection problem which causes the mail processing to time out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.

      Workaround 2 - for JNDI mail server resources defined in the server.xml

      1. Edit the server.xml
      2. Add these two parameters to the resource tag:
        mail.smtp.timeout="10000"
        mail.smtp.connectiontimeout="10000"
        
      3. Restart Confluence
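Before restarting, it can be worth checking that both attributes actually ended up on the mail resource. The sketch below assumes the JNDI mail session is declared as a Tomcat Resource element with type="javax.mail.Session"; the function name `missing_timeouts` is hypothetical.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("mail.smtp.timeout", "mail.smtp.connectiontimeout")


def missing_timeouts(server_xml_text):
    """Map each mail Resource name to the timeout attributes it lacks."""
    root = ET.fromstring(server_xml_text)
    report = {}
    for resource in root.iter("Resource"):
        # Assumption: mail sessions are the Resources of this type.
        if resource.get("type") != "javax.mail.Session":
            continue
        missing = [attr for attr in REQUIRED if attr not in resource.attrib]
        report[resource.get("name", "<unnamed>")] = missing
    return report
```

An empty list for a resource means both timeouts are present; an empty report means no mail resource of the assumed type was found in the file.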

      NB: If, after setting these timeouts, you encounter another connection problem which causes the mail processing to time out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.

      Symptoms

      The thread that is stuck can look like this:

      "scheduler_Worker-4" daemon prio=10 tid=0x00007fce257b5000 nid=0xa39 runnable [0x00007fcda7bf9000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:150)
      	at java.net.SocketInputStream.read(SocketInputStream.java:121)
      	at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
      	- locked <0x0000000784507958> (a java.io.BufferedInputStream)
      	at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:89)
      	at com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:2188)
      	at com.sun.mail.smtp.SMTPTransport.rcptTo(SMTPTransport.java:1699)
      	at com.sun.mail.smtp.SMTPTransport.sendMessage(SMTPTransport.java:1120)
      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      	at com.atlassian.confluence.jmx.JmxSMTPMailServer.send(JmxSMTPMailServer.java:80)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.send(AbstractMailNotificationQueueItem.java:135)
      	at com.atlassian.confluence.mail.template.PreRenderedMailNotificationQueueItem.send(PreRenderedMailNotificationQueueItem.java:140)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.execute(AbstractMailNotificationQueueItem.java:106)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue$TaskDecorator.execute(AbstractErrorQueuedTaskQueue.java:107)
      	at com.atlassian.core.task.AbstractTaskQueue.flush(AbstractTaskQueue.java:45)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue.flush(AbstractErrorQueuedTaskQueue.java:37)
      	at com.atlassian.quartz.jobs.TaskQueueFlushJob.doExecute(TaskQueueFlushJob.java:32)
      	at com.atlassian.quartz.jobs.AbstractJob.executeInternal(AbstractJob.java:88)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at com.atlassian.confluence.setup.quartz.DelegatingClusterAwareQuartzJobBean.executeJob(DelegatingClusterAwareQuartzJobBean.java:16)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:64)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
      	at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
      	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
      


            Denise Unterwurzacher [Atlassian] (Inactive) added a comment -

            Hi atlassian370. This issue has not regressed in recent versions, so it's possible that you are running into another issue to do with mail sending. You can contact Support and get them to help you troubleshoot if you're still having problems.

            Sören Kornetzki added a comment - - edited

            Discovered this issue today, 2017-04-28, with Confluence 6.1.2. Is this issue back, or is it misdetected by Atlassian Support Tools? However, my Atlassian Support Request via Atlassian Support Tools was not sent and this error came up...

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment -

            Hi gajan.umapathy. This issue was resolved in 5.6, so if you are having problems on 5.6.6 you are likely running into some other issue. I'd recommend contacting Atlassian Support - they can help you troubleshoot the issue.

            Gajan Umapathy added a comment -

            We seem to be having similar issues on 5.6.6

            Mark Wittig added a comment -

            We are using Confluence version 5.6.3 and still have a problem with sending the daily update notification via email. We can manually push these emails from the email queue on the admin site. Only the daily update notifications hang; all other email notifications are sent.

            Aaron Wyatt added a comment -

            Thanks for the explanation, Denise. This is definitely a workaround but doesn't seem like a permanent solution.

            May I suggest that a better long-term solution is to either bundle and use an existing free product to send mail messages that respects SMTP RFCs, or build in the functionality to validate and appropriately respond to SMTP status codes (e.g. http://www.ietf.org/rfc/rfc1893.txt). In our case the problem stemmed from Confluence attempting to send mail to a user whose address no longer existed in our domain. Our MTA responded with the appropriate "invalid address" permanent failure code and terminated the connection; however, Confluence just kept trying indefinitely, thus hanging its mail queue.

            aaron

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment -

            Hi aaron.wyatt, sorry for the delay in confirming back to you. The fix that was implemented in 5.6 will fix your existing configuration from previous versions. An upgrade task runs and updates your mail config in the bandana table to insert the timeout value, so it's essentially the same as the workaround above. It doesn't provide a web interface to set the timeout, though you can always update the value in the bandana table if you have a need to set a different timeout. Otherwise, it will be set at 10000ms.

            Hope this helps!
            -Denise
            Atlassian Support

            Kristof Vandermeersch added a comment -

            Ok, thank you Brian for the excellent documentation.
            In the meantime, I realized that the workaround in this JIRA did not apply to me because there was no mail server configured on my Confluence test server, where I simulate my changes.
            Now it all makes sense to me.

            BrianB (Inactive) added a comment -

            Hi vdrmeer,

            I have added some steps and examples in the following article: Set SMTP Timeout.

            Cheers,

            Brian Boyle
            Atlassian Support

            Kristof Vandermeersch added a comment - - edited

            Can you post a complete example of the SQL in Workaround 1?
            There's an error in it anyway, the opening tag should be <linked-hash-map>... and not </linked-hash-map> I guess.
            Also, my initial value of the Bandanakey is an empty linkedhashmap: </linked-hash-map>
            So I'm really really interested what I should put instead of the dots in your SQL sample ....

            When I do this
            update BANDANA set BANDANAVALUE = '<linked-hash-map>
            <property name="mail.smtp.timeout" value="10000"/>
            <property name="mail.smtp.connectiontimeout" value="10000"/>
            </linked-hash-map>' where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
            my Confluence logs show this:

            2014-09-03 16:07:29,989 WARN [localhost-startStop-1] [confluence.setup.bandana.ConfluenceDaoBandanaPersister] getObjectFromValue Configuration could not be loaded because class could not be found (context: _GLOBAL, key: atlassian.confluence.smtp.mail.accounts).
            com.thoughtworks.xstream.converters.ConversionException: null
            ...


            Aaron Wyatt added a comment -

            Maybe I'm just not seeing it, but how exactly is this issue "fixed" in 5.6? Is it fixed by providing a way to specify a connection timeout in the web UI or is it truly fixed by providing a new functional mail queue that responds appropriately to SMTP connection errors?

            Kristof Vandermeersch added a comment -

            Thank you Denise, this helps.

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment - - edited

            Hi vdrmeer, I've actually just confirmed with the developers that the JVM parameters will not work as a workaround here (it seems to just have been luck that some people who applied them didn't run into the same issue again).

            You will need to apply the official workaround in the database, or upgrade to Confluence 5.6 when it is released to resolve the issue.

            Sorry for the confusion surrounding these parameters. Hope this helps.

            Kristof Vandermeersch added a comment -

            Hi Denise,
            can you confirm whether Jerry Qassar's suggestion works or not?
            ------------------------
            It seems to be a lot easier and is working, so far, to use the workaround of directly setting Java system properties as mentioned in one of the comments above:
            -Dmail.smtp.connectiontimeout=10000 -Dmail.smtp.timeout=10000
            ------------------------
            The idea is to add these parameters to my Confluence service startup parameters (Confluence runs as a Windows Service).
            We're on Confluence 5.5.4

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment -

            Hi robertn, it is fixed in Confluence 5.6, so 5.5.4 will still be affected. I have updated the Affects Versions to reflect this. The workaround should work for you, but if you're running into any trouble applying it, just contact Support and they will help you out.

            Ben Wardwell (Inactive) added a comment -

            I am having this problem on version 5.5.4 (I believe; the email queue is stuck and the timeout value is 0). Are you sure it is fixed?

            Dmitriy Varlamov added a comment -

            The workaround must be more informative. Explain steps #3 and #6: which lines of the command's output to copy and which to paste.

            Jennifer H. added a comment -

            If you update your setenv.sh with "-Dmail.smtp.connectiontimeout=10000 -Dmail.smtp.timeout=10000", how do you verify that the changes have been applied? Simply checking the Java Runtime Arguments in the UI?

            Craig Emery added a comment -

            As I understand it, sometimes the only fix is to restart and you lose all the pending outgoing emails.
            If this is true, it's very bad.

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment - - edited

            Hi jqassar,

            Regarding your comment here, thank you for your input! These parameters can help in some cases, but they are not picked up in all cases, so often the only workaround is to edit the database directly.

            So for everyone else, if the parameters do not work for you, apply the workaround above to modify the database.

            -Denise.

            UPDATE: We have confirmed with the developers that the JVM parameters will not work as a workaround. The only workaround is to make the above changes to the database. Alternatively, upgrading to 5.6 when it is available will resolve the problem.

            Steve Haffenden (Inactive) added a comment -

            Hi qingpei

            Thanks for getting in touch. This issue is currently on our backlog and we hope to be addressing it in the near future. Please keep following this issue for further updates.

            Regards
            Steve Haffenden
            Confluence Bugmaster
            Atlassian

            qingpei wang added a comment -

            This is really a critical issue; it currently blocks me from using "Atlassian Support Tools".
            Can anyone take a look at this?

            qingpei wang added a comment -

            @jerry,
            -Dmail.smtp.connectiontimeout=10000 -Dmail.smtp.timeout=10000
            does not work for me.
            I added this and restarted Confluence, but there are still no email notifications.

            Jerry Qassar added a comment -

            Step 4 is the output from the database query. You won't, however, see that property in some versions of Confluence. It seems to be a lot easier and is working, so far, to use the workaround of directly setting Java system properties as mentioned in one of the comments above:

            -Dmail.smtp.connectiontimeout=10000 -Dmail.smtp.timeout=10000

            Add these to your setenv.sh script.

            qingpei wang added a comment -

            Which file does "step 4" involve? I cannot find it.

            Matt Ryall added a comment -

            Workaround is to configure system properties as shown in the description.


            Matt Ryall added a comment -

            I noticed we don't seem to allow setting a connection timeout in AbstractMailServer, but I also couldn't see where we set a default read timeout in Confluence.


              dave@atlassian.com dave (Inactive)
              matt@atlassian.com Matt Ryall