Problem
2 Jira Mail functionalities stop working and get stuck indefinitely due to an infinite timeout value used to fetch a new Oauth 2.0 access token.
The 2 features that are known to be impacted are:
- The Jira Incoming Mail Handler feature (used to fetch incoming emails from a mailbox and convert them into new issues/comments)
- The Jira Outgoing Mail feature (used to send any notification email from Jira, JSM, Automation For Jira, Filter subscriptiosn...)
This bug is similar to the bug https://jira.atlassian.com/browse/JSDSERVER-12620, which was impacting the JSM (Jira Service Management) Mail Handler/channels, also due to an infinite timeout value used to fetch a new Oauth 2.0 access token.
Environment
- Theoretically it affects all version of Jira which supports OAuth 2 for incoming mail. Investigation was done on 8.20.11.
- At least 1 Mail Handler needs to be configured using Oauth 2.0 as the authorization method for the bug to be replicable
- Alternatively, configure the Jira Outgoing Mail Feature with Oauth 2.0
Steps to Reproduce (in the case of the Incoming Jira Mail Handler functionality)
This issue has not been replicated in our environment since, to replicate it, we would need to create some "network latency" at the right time to trigger the bug.
Even though the steps to replicate the issue are not fully identified, they should look like the ones listed below:
- Install Jira
- Create a project
- Configure a Mail Handler in Settings > Incoming Mail using Oauth 2.0 as the authentication method as per this documentation
- Verify that the Mail Handler is able to fetch incoming emails from the mailbox
- Wait for about 1h
- This is usually the time when the Oauth 2.0 Access token expires (for Gmail and Microsoft)
- The Mail Handler Job will try to fetch a new Oauth 2.0 Access token, using the Oauth 2.0 Refresh token
- Create some "network latency/interruption" that will prevent Gmail or Microsoft (depending on which Mail Server is used for the test) from sending a response back to the Mail Puller Job
Expected Results
The HTTP request sent by the Mail Handler job should eventually timeout, since no reply was sent back from the Mail Server, and the Mail Puller job should stop its own execution, until it is scheduled again 1 minute later.
Then, 1 minute later, it should try again to request an Oauth 2.0 Access Token, get a new one, and pull new emails again.
Actual Results
The HTTP request does not timeout.
As a result:
- the Mail Handler job never completes its own execution
- the Mail Handler gets completely stuck and is no scheduled anymore
- No new emails are fetched by any Mail Handlers configured in any projects
- Depending on the situation, all other services might also stop running because all 4 Caesium threads are stuck in their Mail Handler job
Diagnosis steps to identify this bug
- All mail handler stops processing emails. No specific error in the logs. Looks like the incoming mail logs just stop printing.
- Other services (eg. outgoing mail, etc) might not be working as well
When collecting thread dumps while the issue is happening, we will observe different behaviors depending on the Jira version
- On Jira version below 9.4.13, 9.12.1, 9.13.0
- we should see a long running Caesium thread with the following stack trace:
"Caesium-1-1" #341 daemon prio=5 os_prio=0 cpu=6418578.13ms elapsed=365662.90s tid=0x000000004af71000 nid=0x18ec runnable [0x00000000772bd000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(java.base@11.0.13/Native Method) at java.net.SocketInputStream.socketRead(java.base@11.0.13/Unknown Source) at java.net.SocketInputStream.read(java.base@11.0.13/Unknown Source) at java.net.SocketInputStream.read(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketInputRecord.readFully(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketInputRecord.decode(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLTransport.decode(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketImpl.decode(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base@11.0.13/Unknown Source) at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.13/Unknown Source) at java.io.BufferedInputStream.fill(java.base@11.0.13/Unknown Source) at java.io.BufferedInputStream.read1(java.base@11.0.13/Unknown Source) at java.io.BufferedInputStream.read(java.base@11.0.13/Unknown Source) - locked <0x00000006aac28990> (a java.io.BufferedInputStream) at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.13/Unknown Source) at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.13/Unknown Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.13/Unknown Source) - locked <0x00000006a814f0a0> (a sun.net.www.protocol.https.DelegateHttpsURLConnection) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.13/Unknown Source) - locked <0x00000006a814f0a0> (a sun.net.www.protocol.https.DelegateHttpsURLConnection) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(java.base@11.0.13/Unknown Source) - locked <0x00000006a814f190> (a sun.net.www.protocol.https.HttpsURLConnectionImpl) at com.nimbusds.oauth2.sdk.http.HTTPRequest.send(HTTPRequest.java:899) at com.atlassian.oauth2.client.lib.token.DefaultTokenService.getToken(DefaultTokenService.java:139) at com.atlassian.oauth2.client.lib.token.DefaultTokenService.forceRefresh(DefaultTokenService.java:71) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.refreshToken(DefaultTokenHandler.java:144) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.refreshTokenIfNeeded(DefaultTokenHandler.java:131) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.lambda$getRefreshedToken$1(DefaultTokenHandler.java:110) at com.atlassian.oauth2.client.storage.DefaultTokenHandler$$Lambda$3405/0x0000000803cb3c40.call(Unknown Source) at com.atlassian.oauth2.client.util.concurrent.KeyedLocks.executeWithLock(KeyedLocks.java:33) - locked <0x00000006a814f2a0> (a java.util.concurrent.atomic.AtomicInteger) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:109) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:102) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:95) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:35) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.refreshAccessToken(JiraOauth2AuthenticationContext.java:104) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.recoverOrRethrow(JiraOauth2AuthenticationContext.java:82) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.connectService(JiraOauth2AuthenticationContext.java:75) at com.atlassian.mail.server.AbstractMailServer.smartConnect(AbstractMailServer.java:164) at com.atlassian.jira.service.services.mail.MailServicesHelper$$Lambda$3191/0x0000000803ac8840.connect(Unknown Source) at com.atlassian.jira.service.services.mail.MailServicesHelper.lambda$connectUsing$2(MailServicesHelper.java:81) at com.atlassian.jira.service.services.mail.MailServicesHelper$$Lambda$3195/0x0000000803ac9840.apply(Unknown Source) at java.util.Optional.map(java.base@11.0.13/Unknown Source) at com.atlassian.jira.service.services.mail.MailServicesHelper.handleAuthAwareMailServer(MailServicesHelper.java:52) at com.atlassian.jira.service.services.mail.MailServicesHelper.getConnectedStore(MailServicesHelper.java:46) at com.atlassian.jira.service.services.mail.MailFetcherService.getConnectedStore(MailFetcherService.java:424) at com.atlassian.jira.service.services.mail.MailFetcherService$MessageProviderImpl.getAndProcessMail(MailFetcherService.java:144) at com.atlassian.jira.service.services.mail.MailFetcherService.processMessages(MailFetcherService.java:388) at com.atlassian.jira.service.services.mail.MailFetcherService.runImpl(MailFetcherService.java:380) at com.atlassian.jira.service.services.file.AbstractMessageHandlingService.run(AbstractMessageHandlingService.java:229) at com.atlassian.jira.service.JiraServiceContainerImpl.run(JiraServiceContainerImpl.java:68) at com.atlassian.jira.service.ServiceRunner.runService(ServiceRunner.java:62) at com.atlassian.jira.service.ServiceRunner.runServiceId(ServiceRunner.java:44) at com.atlassian.jira.service.ServiceRunner.runJob(ServiceRunner.java:32) at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134) at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106) at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$3172/0x0000000803ab6040.accept(Unknown Source) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35) at java.lang.Thread.run(java.base@11.0.13/Unknown Source) Locked ownable synchronizers: - <0x0000000638f7fd70> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) - <0x00000006aac215c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) - <0x00000006aac22770> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- we should see a long running Caesium thread with the following stack trace:
- On Jira versions 9.4.13, 9.12.1, 9.13.0 and above:
- We should see a Caesium Thread running one of the Jira Incoming Mail Handler that is BLOCKED:
"Caesium-1-3" #1069 daemon prio=5 os_prio=0 cpu=3241798.36ms elapsed=366973.12s tid=0x00007f844bb3e800 nid=0x775d waiting for monitor entry [0x00007f7e31aaf000] java.lang.Thread.State: BLOCKED (on object monitor) at com.atlassian.oauth2.common.concurrent.KeyedLocks.executeWithLock(KeyedLocks.java:36) - waiting to lock <0x000000031b1d3b88> (a java.util.concurrent.atomic.AtomicInteger) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:109) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:102) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:95) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:35) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.refreshAccessToken(JiraOauth2AuthenticationContext.java:104) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.recoverOrRethrow(JiraOauth2AuthenticationContext.java:82) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.connectService(JiraOauth2AuthenticationContext.java:75) at com.atlassian.mail.server.AbstractMailServer.smartConnect(AbstractMailServer.java:165) at com.atlassian.jira.service.services.mail.MailServicesHelper$$Lambda$6807/0x0000000806638440.connect(Unknown Source) at com.atlassian.jira.service.services.mail.MailServicesHelper.lambda$connectUsing$2(MailServicesHelper.java:81) at com.atlassian.jira.service.services.mail.MailServicesHelper$$Lambda$6808/0x000000080663a040.apply(Unknown Source) at java.util.Optional.map(java.base@11.0.9.1/Optional.java:265) at com.atlassian.jira.service.services.mail.MailServicesHelper.handleAuthAwareMailServer(MailServicesHelper.java:52) at com.atlassian.jira.service.services.mail.MailServicesHelper.getConnectedStore(MailServicesHelper.java:46) at com.atlassian.jira.service.services.mail.MailFetcherService.getConnectedStore(MailFetcherService.java:485) at com.atlassian.jira.service.services.mail.MailFetcherService$MessageProviderImpl.getAndProcessMail(MailFetcherService.java:197) at com.atlassian.jira.service.services.mail.MailFetcherService.processMessages(MailFetcherService.java:449) at com.atlassian.jira.service.services.mail.MailFetcherService.runImpl(MailFetcherService.java:441) at com.atlassian.jira.service.services.file.AbstractMessageHandlingService.run(AbstractMessageHandlingService.java:234) at com.atlassian.jira.service.JiraServiceContainerImpl.run(JiraServiceContainerImpl.java:68) at com.atlassian.jira.service.ServiceRunner.runService(ServiceRunner.java:62) at com.atlassian.jira.service.ServiceRunner.runServiceId(ServiceRunner.java:44) at com.atlassian.jira.service.ServiceRunner.runJob(ServiceRunner.java:32) at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134) at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106) at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:500) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:495) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:519) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:415) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$5172/0x0000000805578840.accept(Unknown Source) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35) at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
- The lock is held by a ForkJoinPool.commonPool-worker-X thread that is indefinitely waiting for a response on from the server while performing an Oauth 2.0 Access Token Refresh operation:
"ForkJoinPool.commonPool-worker-7" #179 daemon prio=5 os_prio=0 cpu=428307.92ms elapsed=367174.10s tid=0x00007f8434503800 nid=0x7383 runnable [0x00007f841aa4e000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(java.base@11.0.9.1/Native Method) at java.net.SocketInputStream.socketRead(java.base@11.0.9.1/SocketInputStream.java:115) at java.net.SocketInputStream.read(java.base@11.0.9.1/SocketInputStream.java:168) at java.net.SocketInputStream.read(java.base@11.0.9.1/SocketInputStream.java:140) at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.9.1/SSLSocketInputRecord.java:476) at sun.security.ssl.SSLSocketInputRecord.readFully(java.base@11.0.9.1/SSLSocketInputRecord.java:459) at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(java.base@11.0.9.1/SSLSocketInputRecord.java:243) at sun.security.ssl.SSLSocketInputRecord.decode(java.base@11.0.9.1/SSLSocketInputRecord.java:181) at sun.security.ssl.SSLTransport.decode(java.base@11.0.9.1/SSLTransport.java:110) at sun.security.ssl.SSLSocketImpl.decode(java.base@11.0.9.1/SSLSocketImpl.java:1411) at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base@11.0.9.1/SSLSocketImpl.java:1376) at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.9.1/SSLSocketImpl.java:963) at java.io.BufferedInputStream.fill(java.base@11.0.9.1/BufferedInputStream.java:252) at java.io.BufferedInputStream.read1(java.base@11.0.9.1/BufferedInputStream.java:292) at java.io.BufferedInputStream.read(java.base@11.0.9.1/BufferedInputStream.java:351) - locked <0x000000031b1d3570> (a java.io.BufferedInputStream) at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.9.1/HttpClient.java:754) at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.9.1/HttpClient.java:689) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.9.1/HttpURLConnection.java:1615) - locked <0x000000031b1d35c8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.9.1/HttpURLConnection.java:1520) - locked <0x000000031b1d35c8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(java.base@11.0.9.1/HttpsURLConnectionImpl.java:250) - locked <0x000000031b1d3720> (a sun.net.www.protocol.https.HttpsURLConnectionImpl) at com.nimbusds.oauth2.sdk.http.HTTPRequest.send(HTTPRequest.java:997) at com.atlassian.oauth2.client.lib.token.DefaultTokenService.getToken(DefaultTokenService.java:143) at com.atlassian.oauth2.client.lib.token.DefaultTokenService.forceRefresh(DefaultTokenService.java:75) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.refreshToken(DefaultTokenHandler.java:144) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.refreshTokenIfNeeded(DefaultTokenHandler.java:131) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.lambda$getRefreshedToken$1(DefaultTokenHandler.java:110) at com.atlassian.oauth2.client.storage.DefaultTokenHandler$$Lambda$11056/0x00000008084ddc40.call(Unknown Source) at com.atlassian.oauth2.common.concurrent.KeyedLocks.executeWithLock(KeyedLocks.java:37) - locked <0x000000031b1d3b88> (a java.util.concurrent.atomic.AtomicInteger) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:109) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:102) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:95) at com.atlassian.oauth2.client.storage.DefaultTokenHandler.getRefreshedToken(DefaultTokenHandler.java:35) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.refreshAccessToken(JiraOauth2AuthenticationContext.java:104) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.recoverOrRethrow(JiraOauth2AuthenticationContext.java:82) at com.atlassian.mail.auth.JiraOauth2AuthenticationContext.connectService(JiraOauth2AuthenticationContext.java:75) at com.atlassian.mail.server.AbstractMailServer.smartConnect(AbstractMailServer.java:165) at com.atlassian.jira.ipd.mail.DefaultIncomingConnectionVerifier.doVerify(DefaultIncomingConnectionVerifier.java:86) at com.atlassian.jira.ipd.mail.DefaultIncomingConnectionVerifier.verifyConnection(DefaultIncomingConnectionVerifier.java:72) at com.atlassian.jira.ipd.mail.IncomingMailServerConnectionIpdJob.lambda$generateMetrics$4(IncomingMailServerConnectionIpdJob.java:102) at com.atlassian.jira.ipd.mail.IncomingMailServerConnectionIpdJob$$Lambda$3548/0x0000000804d86040.accept(Unknown Source) at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@11.0.9.1/ForEachOps.java:183) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@11.0.9.1/ArrayList.java:1655) at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.9.1/AbstractPipeline.java:484) at java.util.stream.ForEachOps$ForEachTask.compute(java.base@11.0.9.1/ForEachOps.java:290) at java.util.concurrent.CountedCompleter.exec(java.base@11.0.9.1/CountedCompleter.java:746) at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.9.1/ForkJoinTask.java:290) at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.9.1/ForkJoinPool.java:1020) at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.9.1/ForkJoinPool.java:1656) at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.9.1/ForkJoinPool.java:1594) at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.9.1/ForkJoinWorkerThread.java:183)
- We should see a Caesium Thread running one of the Jira Incoming Mail Handler that is BLOCKED:
Root Cause
The reason behind this bug is the fact that the Mail Handler uses a function from an external library HTTPRequest.java to fetch a new Oauth 2.0 Access token after it expires.
By default, this function is not using any timeout value, which means that the timeout will be infinite (the request will never time out if no response is sent back).
To fix this issue, we should use the method setConnectTimeout() from the external library to explicitly set the Timetout value before we send the HTTP Request.
- It has been identified that a readTimeout value will need to be set as well in order to stop the hung process retrieving the refresh token. Due to this finding we are reopening this bug report for additional review.
Workaround
The only way to fix this issue is to reset the Mail Handler job by re-starting the Jira application.
- is related to
-
JSDSERVER-12620 The JSM Mail Handlers (channels) stop pulling new emails due to an infinite timeout used to fetch Oauth 2.0 tokens
-
- Closed
-