CONFSERVER-29033 (Confluence Data Center)

Email sending stops due to mail job hanging in sendMessage()

      Mail periodically stops being sent, coinciding with a loss of connection between the Confluence server and the mail server. The emails back up in the Mail Queue, not the Error Queue, and cannot be sent. They are subsequently lost when the server is restarted.

      NB: In this scenario, sending a test email will still work, because it bypasses the mail queue and is sent directly.

      Cause

      Mail stops sending because the mail thread cannot connect to the server, and there is no timeout set. Consequently, the mail thread hangs and mail is not sent, even after the connection is re-established. Confluence sets the timeout to 0 programmatically no matter what is specified in the Resource tag, and a value of 0 means the thread never times out and can never recover.
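
The effect of a missing read timeout can be illustrated outside Confluence. The following is a minimal Python sketch, not Confluence or JavaMail code: a hypothetical local server accepts a connection and never replies, and the client recovers only because a read timeout is set. The function name `read_from_silent_server` is made up for this illustration. Note that Python expresses "wait forever" as a timeout of `None`, whereas the SMTP properties discussed here use 0 for that.

```python
import socket


def read_from_silent_server(timeout_seconds):
    """Connect to a local server that never replies and try to read one byte.

    With a positive timeout the read gives up, which is what a non-zero
    mail.smtp.timeout is meant to allow. Passing None would block forever,
    analogous to the timeout of 0 described above.
    """
    # Hypothetical stand-in for an unresponsive mail server.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.create_connection(server.getsockname(), timeout=timeout_seconds)
    conn, _ = server.accept()  # accept the connection, then stay silent
    try:
        client.recv(1)  # waits for a reply that never arrives
        return "got data"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        conn.close()
        server.close()
```

Calling `read_from_silent_server(0.2)` returns after roughly 0.2 seconds instead of hanging, which is the behaviour the workarounds below aim to restore for the mail thread.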

      Diagnosis

      To find out if this issue is affecting you, take a couple of external thread dumps while the problem is occurring, spaced around 1 minute apart. If you see a thread that exists across both dumps which looks similar to the one under 'Symptoms' below, and references this method (or something similar, e.g. SMTPMailServerImpl.connect()), then you are running into this bug:

      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      

      If you're unsure of the contents of the thread dump or if you need any help at any stage, feel free to contact Support.
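
The comparison of two thread dumps described above can be partly automated. This is a rough Python sketch under stated assumptions: each thread in the dump starts with a line containing its quoted name, stack frames start with `at`, and a thread counts as suspect if it has an `SMTPTransport` or `SMTPMailServerImpl` frame in both dumps. The function names are hypothetical, and matching on thread name alone can produce false positives, so treat the result as a hint rather than a diagnosis.

```python
import re

# Frames that indicate the thread is inside the SMTP sending code.
MAIL_FRAME = re.compile(
    r"\bat\s+com\.(sun\.mail\.smtp\.SMTPTransport"
    r"|atlassian\.mail\.server\.impl\.SMTPMailServerImpl)\."
)
# A thread header line begins with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^\s*"([^"]+)"')


def mail_threads(dump_text):
    """Return the names of threads whose stack contains an SMTP frame."""
    names = set()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
        elif current and MAIL_FRAME.search(line):
            names.add(current)
    return names


def stuck_mail_threads(first_dump, second_dump):
    """Return thread names that are inside SMTP code in both dumps."""
    return sorted(mail_threads(first_dump) & mail_threads(second_dump))
```

A non-empty result for two dumps taken about a minute apart suggests the same thread has been sitting in the mail code the whole time.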

      Workaround 1 - for mail configured directly through the front end of Confluence.

      You will need to add the timeout properties to the database entry directly.

      1. Shut down Confluence. Unfortunately at this point you will lose any mail that is pending. There is currently no way to avoid this.
      2. Find the entry in the database:
        select * from BANDANA where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts'
        
      3. Copy the entire contents of the BANDANAVALUE column to a text file
      4. Find the <property name="mail.smtp.timeout" value="0"/> property and edit it to <property name="mail.smtp.timeout" value="10000"/> to set a timeout of 10 seconds
      5. Add this property directly below it to set the connection timeout to 10 seconds: <property name="mail.smtp.connectiontimeout" value="10000"/>
      6. You should now have these two lines in the BANDANAVALUE text:
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
        
      7. Insert the value back into the database. NB: do NOT copy the query below; it is an example only. You must use the full edited BANDANAVALUE you extracted in Step 3:
        update BANDANA set BANDANAVALUE = '<linked-hash-map>
             ....
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
             ....
        </linked-hash-map>' where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
        
      8. Restart Confluence
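
Steps 4 to 6 can also be applied to the saved text file with a small script rather than by hand. The sketch below is an illustration only: it assumes the properties appear in exactly the `<property name="..." value="..."/>` form quoted in the steps, and the function name `set_smtp_timeouts` is hypothetical. It makes no assumption about the rest of the BANDANAVALUE content, which it leaves untouched.

```python
import re

# Matches the read-timeout property, capturing any indentation before it.
TIMEOUT = re.compile(
    r'([ \t]*)<property name="mail\.smtp\.timeout" value="\d+"/>'
)
# Matches an existing connection-timeout property plus its line ending.
CONNECT = re.compile(
    r'[ \t]*<property name="mail\.smtp\.connectiontimeout" value="\d+"/>\r?\n?'
)


def set_smtp_timeouts(bandana_value, millis=10000):
    """Return the BANDANAVALUE text with both SMTP timeouts set to `millis`."""
    if not TIMEOUT.search(bandana_value):
        raise ValueError("mail.smtp.timeout property not found; edit by hand")

    # Drop any existing connectiontimeout so it is not duplicated below.
    text = CONNECT.sub("", bandana_value)

    def both_properties(match):
        indent = match.group(1)
        return (
            f'{indent}<property name="mail.smtp.timeout" value="{millis}"/>\n'
            f'{indent}<property name="mail.smtp.connectiontimeout" value="{millis}"/>'
        )

    return TIMEOUT.sub(both_properties, text)
```

The output still has to be written back with the UPDATE statement from step 7. Check the result by eye before doing so, and keep a copy of the original value in case the change needs to be reverted.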

      If a new username or password is set through the GUI, the timeout value will be reset back to 0. If you change any of the configuration of the server, you will need to reapply this workaround.

      Also, if after setting these timeouts, you encounter another connection problem which causes the mail processing to time out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.

      Workaround 2 - for JNDI mail server resources defined in the server.xml

      1. Edit the server.xml
      2. Add these two parameters to the resource tag:
        mail.smtp.timeout="10000"
        mail.smtp.connectiontimeout="10000"
        
      3. Restart Confluence
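
After editing server.xml, it can be worth confirming that both attributes are actually present on the mail resource. The following Python sketch assumes the mail resource is a `Resource` element with `type="javax.mail.Session"` (the usual type for a Tomcat JNDI mail session); the function name `missing_timeouts` and the sample values in the usage note are hypothetical. It treats a value of 0 the same as a missing attribute, since 0 means the thread never times out.

```python
import xml.etree.ElementTree as ET

# The two attributes Workaround 2 adds to the Resource tag.
REQUIRED = ("mail.smtp.timeout", "mail.smtp.connectiontimeout")


def missing_timeouts(server_xml):
    """Map each mail Resource name to the timeout attributes it lacks.

    `server_xml` is the content of server.xml (str or bytes). Resources
    with both timeouts set to a non-zero value are left out of the result.
    """
    root = ET.fromstring(server_xml)
    report = {}
    for resource in root.iter("Resource"):
        if resource.get("type") != "javax.mail.Session":
            continue  # not a mail session, e.g. a datasource
        missing = [attr for attr in REQUIRED if resource.get(attr) in (None, "0")]
        if missing:
            report[resource.get("name", "?")] = missing
    return report
```

For a resource that has `mail.smtp.timeout="10000"` but no connection timeout, the function reports `mail.smtp.connectiontimeout` as missing; an empty dictionary means both timeouts are set on every mail resource found.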

      NB: If, after setting these timeouts, you encounter another connection problem which causes the mail processing to time out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.

      Symptoms

      The thread that is stuck can look like this:

      "scheduler_Worker-4" daemon prio=10 tid=0x00007fce257b5000 nid=0xa39 runnable [0x00007fcda7bf9000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:150)
      	at java.net.SocketInputStream.read(SocketInputStream.java:121)
      	at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
      	- locked <0x0000000784507958> (a java.io.BufferedInputStream)
      	at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:89)
      	at com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:2188)
      	at com.sun.mail.smtp.SMTPTransport.rcptTo(SMTPTransport.java:1699)
      	at com.sun.mail.smtp.SMTPTransport.sendMessage(SMTPTransport.java:1120)
      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      	at com.atlassian.confluence.jmx.JmxSMTPMailServer.send(JmxSMTPMailServer.java:80)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.send(AbstractMailNotificationQueueItem.java:135)
      	at com.atlassian.confluence.mail.template.PreRenderedMailNotificationQueueItem.send(PreRenderedMailNotificationQueueItem.java:140)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.execute(AbstractMailNotificationQueueItem.java:106)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue$TaskDecorator.execute(AbstractErrorQueuedTaskQueue.java:107)
      	at com.atlassian.core.task.AbstractTaskQueue.flush(AbstractTaskQueue.java:45)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue.flush(AbstractErrorQueuedTaskQueue.java:37)
      	at com.atlassian.quartz.jobs.TaskQueueFlushJob.doExecute(TaskQueueFlushJob.java:32)
      	at com.atlassian.quartz.jobs.AbstractJob.executeInternal(AbstractJob.java:88)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at com.atlassian.confluence.setup.quartz.DelegatingClusterAwareQuartzJobBean.executeJob(DelegatingClusterAwareQuartzJobBean.java:16)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:64)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
      	at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
      	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
      

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment:

            Hi atlassian370. This issue has not regressed in recent versions, so it's possible that you are running into another issue to do with mail sending. You can contact Support and get them to help you troubleshoot if you're still having problems.

            Sören Kornetzki added a comment (edited):

            Discovered this issue today, 2017-04-28, with Confluence 6.1.2. Is this issue back, or was it misdetected by Atlassian Support Tools? However, my Atlassian Support Request via Atlassian Support Tools was not sent, and this error came up...

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment:

            Hi gajan.umapathy. This issue was resolved in 5.6, so if you are having problems on 5.6.6 you are likely running into some other issue. I'd recommend contacting Atlassian Support - they can help you troubleshoot the issue.

            Gajan Umapathy added a comment:

            We seem to be having similar issues on 5.6.6.

            Mark Wittig added a comment:

            We are using Confluence version 5.6.3 and are still having a problem sending the daily update notification via email. We can manually push these emails through the mail queue in the admin site. Only the daily update notifications are hanging; all other email notifications have been sent.

            Aaron Wyatt added a comment:

            Thanks for the explanation, Denise. This is definitely a workaround but doesn't seem like a permanent solution.

            May I suggest that a better long-term solution is to either bundle and use an existing free product to send mail messages that respects SMTP RFCs, or build in the functionality to validate and appropriately respond to SMTP status codes (e.g. http://www.ietf.org/rfc/rfc1893.txt). In our case the problem stemmed from Confluence attempting to send mail to a user whose address no longer existed in our domain. Our MTA responded with the appropriate "invalid address" permanent failure code and terminated the connection; however, Confluence just kept trying indefinitely, thus hanging its mail queue.

            aaron

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment:

            Hi aaron.wyatt, sorry for the delay in confirming back to you. The fix that was implemented in 5.6 will fix your existing configuration from previous versions. An upgrade task runs and updates your mail config in the bandana table to insert the timeout value, so it's essentially the same as the workaround above. It doesn't provide a web interface to set the timeout, though you can always update the value in the bandana table if you have a need to set a different timeout. Otherwise, it will be set at 10000ms.

            Hope this helps!
            -Denise
            Atlassian Support

            Kristof Vandermeersch added a comment:

            Ok, thank you Brian for the excellent documentation.
            In the meantime, I realized that the workaround in this JIRA did not apply to me because there was no mail server configured on my Confluence test server, where I simulate my changes.
            Now it all makes sense to me.

            BrianB (Inactive) added a comment:

            Hi vdrmeer,

            I have added some steps and examples in the following article: Set SMTP Timeout.

            Cheers,

            Brian Boyle
            Atlassian Support

            Kristof Vandermeersch added a comment (edited):

            Can you post a complete example of the SQL in Workaround 1?
            There's an error in it anyway, the opening tag should be <linked-hash-map>... and not </linked-hash-map> I guess.
            Also, my initial value of the Bandanakey is an empty linkedhashmap: </linked-hash-map>
            So I'm really really interested what I should put instead of the dots in your SQL sample ....

            When I do this
            update BANDANA set BANDANAVALUE = '<linked-hash-map>
            <property name="mail.smtp.timeout" value="10000"/>
            <property name="mail.smtp.connectiontimeout" value="10000"/>
            </linked-hash-map>' where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
            my Confluence logs show this:

            2014-09-03 16:07:29,989 WARN [localhost-startStop-1] [confluence.setup.bandana.ConfluenceDaoBandanaPersister] getObjectFromValue Configuration could not be loaded because class could not be found (context: _GLOBAL, key: atlassian.confluence.smtp.mail.accounts).
            com.thoughtworks.xstream.converters.ConversionException: null
            ...


              dave@atlassian.com dave (Inactive)
              matt@atlassian.com Matt Ryall