Bug
Resolution: Fixed
Low
7.2.7, 7.2.9, 7.0.0
7
5
Severity 2 - Major
16
Summary
JIRA Data Center heartbeat jobs can be delayed when all Caesium threads (which run timed operations) are busy, causing the instance's heartbeat jobs to fail. Cluster nodes can then remain inconsistent for longer than normal, because sync jobs cannot proceed until the Caesium queue is cleared.
Instance Health Checks show two failed checks:
- Cluster Cache Replication
- Shared Home
The observed cases have involved Caesium threads busy processing incoming mail handlers, but any other long-running scheduled task (directory sync, email sending, etc.) could have the same effect.
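The starvation described above can be illustrated with a minimal sketch. It does not use Caesium or any JIRA code: a plain `java.util.concurrent` fixed thread pool stands in for the shared scheduler workers, and the class and method names (`HeartbeatStarvationSketch`, `heartbeatRanWhileBusy`) are hypothetical. The point is only that a short job queued behind long-running jobs cannot start while every worker is occupied.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HeartbeatStarvationSketch {

    /**
     * Submits longJobs blocking jobs and then one "heartbeat" job to a shared
     * pool of workerThreads threads. Returns true if the heartbeat ran within
     * 300 ms, that is, while the long jobs were still in progress.
     */
    static boolean heartbeatRanWhileBusy(int workerThreads, int longJobs)
            throws InterruptedException {
        // Stand-in for the shared scheduler workers (an assumption, not Caesium itself).
        ExecutorService sharedPool = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch release = new CountDownLatch(1);      // keeps the long jobs running
        CountDownLatch heartbeatRan = new CountDownLatch(1); // signalled by the heartbeat
        try {
            for (int i = 0; i < longJobs; i++) {
                // Stand-in for a mail handler stuck reading a huge inbox.
                sharedPool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The heartbeat shares the same queue and workers as the long jobs.
            sharedPool.execute(heartbeatRan::countDown);
            return heartbeatRan.await(300, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown(); // let the long jobs finish
            sharedPool.shutdown();
            sharedPool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("4 workers, 4 long jobs: heartbeat ran = "
                + heartbeatRanWhileBusy(4, 4));
        System.out.println("4 workers, 3 long jobs: heartbeat ran = "
                + heartbeatRanWhileBusy(4, 3));
    }
}
```

With as many long jobs as workers, the heartbeat stays queued until a worker is released. With one worker left free, it runs almost immediately.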
Environment
- JIRA Data Center
Steps to Reproduce
- Configure a mail handler to process an inbox containing a very large number of emails (thousands).
- Leave JIRA Data Center running for 10-15 minutes after a restart.
Expected Results
Heartbeat jobs are not blocked or delayed by other long-running scheduled jobs.
Actual Results
The heartbeat job does not run, which causes the cluster keep-alive to time out and the nodes to be marked offline. Cache replication then stops working. Health checks report errors for cache replication between nodes and for communication with the Shared Home directory.
Thread dumps show all 4 Caesium threads are busy handling emails. Example thread:
"Caesium-1-1" daemon prio=5 tid=0x00000000000000a3 nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    - locked <0x0000000032c2276e> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    ...
    at com.sun.mail.iap.ResponseInputStream.readResponse(ResponseInputStream.java:103)
    at com.sun.mail.iap.Response.<init>(Response.java:114)
    at com.sun.mail.imap.protocol.IMAPResponse.<init>(IMAPResponse.java:60)
    at com.sun.mail.imap.protocol.IMAPProtocol.readResponse(IMAPProtocol.java:390)
    at com.sun.mail.iap.Protocol.command(Protocol.java:354)
    - locked <0x000000005994ebc6> (a com.sun.mail.imap.protocol.IMAPProtocol)
    at com.sun.mail.imap.protocol.IMAPProtocol.fetch(IMAPProtocol.java:2113)
    ...
    at com.sun.mail.imap.protocol.IMAPProtocol.peekBody(IMAPProtocol.java:1705)
    at com.sun.mail.imap.IMAPMessage.getHeader(IMAPMessage.java:878)
    - locked <0x000000007ace4ecd> (a java.lang.Object)
    at com.atlassian.jira.plugins.mail.handlers.AbstractMessageHandler.getPrecedenceHeader(AbstractMessageHandler.java:1409)
    at com.atlassian.jira.plugins.mail.handlers.AbstractMessageHandler.checkBulk(AbstractMessageHandler.java:473)
    at com.atlassian.jira.plugins.mail.handlers.AbstractMessageHandler.canHandleMessage(AbstractMessageHandler.java:415)
    at com.atlassian.jira.plugins.mail.handlers.CreateOrCommentHandler.handleMessage(CreateOrCommentHandler.java:54)
    at com.atlassian.jira.service.services.mail.MailFetcherService$1.process(MailFetcherService.java:376)
    at com.atlassian.jira.service.services.mail.MailFetcherService$MessageProviderImpl.getAndProcessMail(MailFetcherService.java:255)
    at com.atlassian.jira.service.services.mail.MailFetcherService.runImpl(MailFetcherService.java:366)
    at com.atlassian.jira.service.services.file.AbstractMessageHandlingService.run(AbstractMessageHandlingService.java:229)
    at com.atlassian.jira.service.JiraServiceContainerImpl.run(JiraServiceContainerImpl.java:61)
    at com.atlassian.jira.service.ServiceRunner.runService(ServiceRunner.java:62)
    at com.atlassian.jira.service.ServiceRunner.runServiceId(ServiceRunner.java:44)
    at com.atlassian.jira.service.ServiceRunner.runJob(ServiceRunner.java:32)
    at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:153)
    at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:118)
    at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:97)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:443)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:438)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:462)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:390)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:285)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:282)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:65)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None
Suggested Fixes
- Implemented - Separate heartbeat service from shared Caesium service.
- Process only unread messages?
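The first suggested fix, separating the heartbeat from the shared scheduler, can be sketched as follows. This is not the actual JIRA implementation, which this issue does not describe. It is a minimal illustration using standard `java.util.concurrent` classes, and the names `DedicatedHeartbeatSketch` and `heartbeatsWhileSharedPoolBusy` are hypothetical. The heartbeat gets its own single-thread scheduler, so it keeps firing even when every shared worker is blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DedicatedHeartbeatSketch {

    /**
     * Saturates a shared pool with blocking jobs, runs a periodic heartbeat on
     * a separate single-thread scheduler, and returns how many heartbeats
     * fired during observeMillis.
     */
    static int heartbeatsWhileSharedPoolBusy(int workerThreads, long periodMillis,
                                             long observeMillis) throws InterruptedException {
        // Stand-in for the shared scheduler workers (an assumption, not Caesium itself).
        ExecutorService sharedPool = Executors.newFixedThreadPool(workerThreads);
        // Dedicated thread that only ever runs the heartbeat.
        ScheduledExecutorService heartbeatScheduler =
                Executors.newSingleThreadScheduledExecutor();
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger beats = new AtomicInteger();
        try {
            // Occupy every shared worker with a long-running job.
            for (int i = 0; i < workerThreads; i++) {
                sharedPool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The heartbeat does not compete with the shared pool for a thread.
            heartbeatScheduler.scheduleAtFixedRate(
                    beats::incrementAndGet, 0, periodMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(observeMillis);
            return beats.get();
        } finally {
            release.countDown();
            heartbeatScheduler.shutdownNow();
            sharedPool.shutdown();
            sharedPool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int beats = heartbeatsWhileSharedPoolBusy(4, 50, 500);
        System.out.println("Heartbeats while all shared workers were busy: " + beats);
    }
}
```

The trade-off in this sketch is one extra thread per node in exchange for a heartbeat whose timing no longer depends on how long other scheduled jobs take.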
Workaround
- If the heartbeat is blocked or delayed by mail handler jobs, delete or move the older, previously read messages out of the inbox. This speeds up email processing, frees the Caesium threads sooner, and avoids the problem.
- was split into: JRASERVER-65809 As a Jira Administrator I want to configure number of scheduler threads (Gathering Interest)
- is blocked by: PSR-44
- is cloned by: RUM-1858