Confluence Data Center / CONFSERVER-29033

Email sending stops due to mail job hanging in sendMessage()

Details

    Description

      Mail periodically stops being sent, coinciding with a loss of connection between the Confluence server and the mail server. Emails back up in the Mail Queue rather than the Error Queue and cannot be sent. They are subsequently lost when the server is restarted.

      NB: In this scenario, sending a test email will still work, because it bypasses the mail queue and is sent directly.

      Cause

      Mail stops sending because the mail thread cannot connect to the server and no timeout is set. The mail thread therefore hangs, and mail is not sent even after the connection is re-established. Confluence sets the timeout to 0 programmatically, no matter what is specified in the Resource tag; a value of 0 means the thread never times out and so can never recover.
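
      To illustrate why a missing read timeout leaves a thread stuck, here is a minimal sketch in Python rather than Java. It is not Confluence or JavaMail code: the helper names are hypothetical, and the "mail server" is just a local socket that accepts a connection and never replies. With a timeout set, the blocked read gives up; without one it would wait indefinitely. Note that Python expresses "no timeout" as None, whereas the JavaMail properties described above use 0.

```python
import socket
import threading

def start_silent_server():
    """Listen on a free local port, accept one connection, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    held = []  # keep the accepted socket open so the client sees silence, not EOF

    def accept_once():
        try:
            conn, _ = listener.accept()
            held.append(conn)
        except OSError:
            pass  # listener was closed before anyone connected

    threading.Thread(target=accept_once, daemon=True).start()
    return listener, held

def read_with_timeout(port, timeout_seconds):
    """Return 'timed out' if the peer sends nothing within timeout_seconds."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_seconds) as client:
        try:
            client.recv(1024)  # blocks until data arrives or the timeout expires
            return "got data"
        except socket.timeout:
            return "timed out"

listener, held = start_silent_server()
result = read_with_timeout(listener.getsockname()[1], 0.2)
print(result)

# Tidy up the sockets used by the demonstration.
for conn in held:
    conn.close()
listener.close()
```

      Passing None instead of 0.2 would make the recv call wait for as long as the peer stays silent, which mirrors the hung mail thread described in this issue.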

      Diagnosis

      To find out whether this issue is affecting you, take at least two external thread dumps while the problem is occurring, spaced about 1 minute apart. If a thread appears in both dumps that looks similar to the one under 'Symptoms' below and references this method (or something similar, e.g. SMTPMailServerImpl.connect()), then you are running into this bug:

      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      

      If you're unsure of the contents of the thread dump or if you need any help at any stage, feel free to contact Support.
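
      The comparison of two thread dumps can be sketched as a small script. This is only an illustration, assuming dumps in the format shown under 'Symptoms' (a quoted thread name followed by tid=0x... on the header line); the function names are hypothetical and the second thread in the sample is invented for contrast.

```python
import re

# Header line of a thread in a HotSpot-style dump: "name" ... tid=0x...
HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\btid=(?P<tid>0x[0-9a-f]+)')

def stuck_mail_threads(dump_text, marker="SMTPMailServerImpl."):
    """Return {tid: name} for threads whose stack contains the marker."""
    found = {}
    current = None
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = (match.group("tid"), match.group("name"))
        elif current and marker in line:
            found[current[0]] = current[1]
    return found

def stuck_in_both(first_dump, second_dump):
    """Threads that sit in the mail-sending code in both dumps."""
    first = stuck_mail_threads(first_dump)
    second = stuck_mail_threads(second_dump)
    return {tid: first[tid] for tid in first.keys() & second.keys()}

SAMPLE = '''"scheduler_Worker-4" daemon prio=10 tid=0x00007fce257b5000 nid=0xa39 runnable [0x00007fcda7bf9000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)

"http-8090-1" daemon prio=10 tid=0x00007fce25000000 nid=0xa40 waiting on condition [0x00007fcda7000000]
   java.lang.Thread.State: WAITING
	at sun.misc.Unsafe.park(Native Method)
'''

# In practice the two arguments would be the contents of two separate dump files.
print(stuck_in_both(SAMPLE, SAMPLE))
```

      A thread reported by this check is only a candidate; compare its full stack with the one under 'Symptoms' before concluding that this bug is the cause.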

      Workaround 1 - for mail configured directly through the front end of Confluence.

      You will need to add the timeout properties to the database entry directly.

      1. Shut down Confluence. Unfortunately at this point you will lose any mail that is pending. There is currently no way to avoid this.
      2. Find the entry in the database:
        select * from BANDANA where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
        
      3. Copy the entire contents of the BANDANAVALUE column to a text file
      4. Find the <property name="mail.smtp.timeout" value="0"/> property and edit it to <property name="mail.smtp.timeout" value="10000"/> to set a timeout of 10 seconds
      5. Add this property directly below it to set the connection timeout to 10 seconds: <property name="mail.smtp.connectiontimeout" value="10000"/>
      6. You should now have these two lines in the BANDANAVALUE text:
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
        
      7. Insert the value back into the database. NB: do NOT copy the query below; it is an example only. You must use the full edited BANDANAVALUE you extracted in Step 3:
        update BANDANA set BANDANAVALUE = '<linked-hash-map>
             ....
                    <property name="mail.smtp.timeout" value="10000"/>
                    <property name="mail.smtp.connectiontimeout" value="10000"/>
             ....
        </linked-hash-map>' where BANDANAKEY = 'atlassian.confluence.smtp.mail.accounts';
        
      8. Restart Confluence

      If a new username or password is set through the GUI, the timeout value will be reset to 0. If you change any of the mail server configuration, you will need to reapply this workaround.

      Also, if you encounter another connection problem after setting these timeouts and mail processing times out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.
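
      Steps 4 to 6 of Workaround 1 can also be done with a short script instead of by hand. The sketch below is an assumption-laden illustration, not an Atlassian tool: apply_timeouts is a hypothetical helper, and it expects the saved BANDANAVALUE text to contain the timeout property exactly as written in Step 4. It only edits the text; reading from and writing back to the database is still done with the queries above.

```python
import re

# The property as it appears in Step 4, before editing.
TIMEOUT_ZERO = '<property name="mail.smtp.timeout" value="0"/>'

def apply_timeouts(bandana_value, millis=10000):
    """Set mail.smtp.timeout and add mail.smtp.connectiontimeout below it."""
    timeout = f'<property name="mail.smtp.timeout" value="{millis}"/>'
    conn = f'<property name="mail.smtp.connectiontimeout" value="{millis}"/>'
    has_conn = 'name="mail.smtp.connectiontimeout"' in bandana_value

    def rewrite(match):
        indent = match.group("indent")
        if has_conn:
            # A connection timeout is already present: only change the read timeout.
            return indent + timeout
        return indent + timeout + "\n" + indent + conn

    pattern = re.compile(r"(?P<indent>[ \t]*)" + re.escape(TIMEOUT_ZERO))
    edited, count = pattern.subn(rewrite, bandana_value)
    if count == 0:
        raise ValueError("timeout property not found; edit the value by hand")
    return edited

# Shortened stand-in for the real column contents, which hold many more properties.
sample = (
    "<linked-hash-map>\n"
    '    <property name="mail.smtp.timeout" value="0"/>\n'
    "</linked-hash-map>"
)
edited = apply_timeouts(sample)
print(edited)
```

      Check the output against Step 6 before running the update in Step 7, and keep the original text file as a backup.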

      Workaround 2 - for JNDI mail server resources defined in the server.xml

      1. Edit the server.xml
      2. Add these two parameters to the Resource tag:
        mail.smtp.timeout="10000"
        mail.smtp.connectiontimeout="10000"
        
      3. Restart Confluence

      NB: If, after setting these timeouts, you encounter another connection problem which causes the mail processing to time out, the mails may end up in the Error Queue (Confluence Admin > Mail Queue). You may need to resend them from there.
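
      After editing server.xml for Workaround 2, a quick check like the one below can confirm that both parameters are present. It is a sketch under assumptions: missing_timeouts is a hypothetical helper, the mail resource is assumed to be declared with type="javax.mail.Session", and the resource names, host name, and surrounding elements in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("mail.smtp.timeout", "mail.smtp.connectiontimeout")

def missing_timeouts(server_xml_text):
    """Map each mail Resource name to the timeout attributes it lacks or leaves at 0."""
    root = ET.fromstring(server_xml_text)
    problems = {}
    for resource in root.iter("Resource"):
        if resource.get("type") != "javax.mail.Session":
            continue  # not a mail session resource
        bad = [attr for attr in REQUIRED if resource.get(attr) in (None, "0")]
        if bad:
            problems[resource.get("name", "?")] = bad
    return problems

SAMPLE = """<Server>
  <Service name="Catalina">
    <Engine name="Standalone">
      <Host name="localhost">
        <Context path="">
          <Resource name="mail/Good" auth="Container" type="javax.mail.Session"
                    mail.smtp.host="smtp.example.com"
                    mail.smtp.timeout="10000"
                    mail.smtp.connectiontimeout="10000"/>
          <Resource name="mail/Bad" auth="Container" type="javax.mail.Session"
                    mail.smtp.host="smtp.example.com"/>
        </Context>
      </Host>
    </Engine>
  </Service>
</Server>"""

print(missing_timeouts(SAMPLE))
```

      An empty result means every mail resource found has both timeouts set to a non-zero value; it does not prove that Confluence is using that resource.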

      Symptoms

      The thread that is stuck can look like this:

      "scheduler_Worker-4" daemon prio=10 tid=0x00007fce257b5000 nid=0xa39 runnable [0x00007fcda7bf9000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:150)
      	at java.net.SocketInputStream.read(SocketInputStream.java:121)
      	at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
      	- locked <0x0000000784507958> (a java.io.BufferedInputStream)
      	at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:89)
      	at com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:2188)
      	at com.sun.mail.smtp.SMTPTransport.rcptTo(SMTPTransport.java:1699)
      	at com.sun.mail.smtp.SMTPTransport.sendMessage(SMTPTransport.java:1120)
      	- locked <0x0000000784505360> (a com.sun.mail.smtp.SMTPTransport)
      	at com.atlassian.mail.server.impl.SMTPMailServerImpl.send(SMTPMailServerImpl.java:166)
      	at com.atlassian.confluence.jmx.JmxSMTPMailServer.send(JmxSMTPMailServer.java:80)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.send(AbstractMailNotificationQueueItem.java:135)
      	at com.atlassian.confluence.mail.template.PreRenderedMailNotificationQueueItem.send(PreRenderedMailNotificationQueueItem.java:140)
      	at com.atlassian.confluence.mail.template.AbstractMailNotificationQueueItem.execute(AbstractMailNotificationQueueItem.java:106)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue$TaskDecorator.execute(AbstractErrorQueuedTaskQueue.java:107)
      	at com.atlassian.core.task.AbstractTaskQueue.flush(AbstractTaskQueue.java:45)
      	at com.atlassian.core.task.AbstractErrorQueuedTaskQueue.flush(AbstractErrorQueuedTaskQueue.java:37)
      	at com.atlassian.quartz.jobs.TaskQueueFlushJob.doExecute(TaskQueueFlushJob.java:32)
      	at com.atlassian.quartz.jobs.AbstractJob.executeInternal(AbstractJob.java:88)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at com.atlassian.confluence.setup.quartz.DelegatingClusterAwareQuartzJobBean.executeJob(DelegatingClusterAwareQuartzJobBean.java:16)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:64)
      	at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
      	at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
      	at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
      	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
      

            People

              dave@atlassian.com dave (Inactive)
              matt@atlassian.com Matt Ryall