Bug
Resolution: Fixed
Highest
7.1.9, Archived Jira Cloud, 8.3.4, 8.5.4, 8.13.0, 8.18.2, 8.20.0, 8.22.0, 9.0.0, 9.4.0
7.01
206
Severity 2 - Major
413
Summary
Scheduled jobs stop being triggered by the Jira Scheduler when a database operation fails at the exact moment the Scheduler tries to trigger them.
Impact of the bug
Examples of impacted features
The following features can be impacted because they rely on a scheduled job to run:
- Mail Queue (emails may keep piling up in the Mail Queue, since the Mail Queue Service may no longer be scheduled)
- Jira Incoming Mail Handler
- Jira Service Management (JSM) Mail Handler
- Jira Batched Notifications
- Jira Service Management (JSM) Notifications
- User directory (LDAP) sync
- Automation rules from "Automation for Jira"
In short, if a database operation fails (or database connectivity is temporarily lost) while the Jira Scheduler is trying to run one of these jobs, the Scheduler stops running that job until the Jira application (or the impacted Jira node) is restarted.
Note that the list above is not exhaustive: any other feature that relies on a scheduled job can be impacted if the database connection or operation error occurs at the moment the job is supposed to be triggered.
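To make the failure mode concrete, here is a toy model in Python of a queue-based scheduler worker. It is a minimal sketch under stated assumptions, not Caesium's actual code: the class names, the `FlakyDatabase` stand-in and the interval arithmetic are all hypothetical. It only mirrors the order of operations visible in the Example 1 stack trace further down (the next run time is calculated, which reads the default time zone from the database, before the job is put back in the queue), and shows that one database error at that moment removes the job from the queue for good, even though the next database read would succeed.

```python
import heapq
from dataclasses import dataclass, field


class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for a JDBC / OfBiz data-access failure."""


@dataclass(order=True)
class QueuedJob:
    deadline: int                       # epoch millis, as in QueuedJob[deadline=...]
    job_id: str = field(compare=False)


class FlakyDatabase:
    """Hypothetical database whose first `failures` reads raise an error."""

    def __init__(self, failures: int = 0) -> None:
        self.failures = failures

    def read_default_time_zone(self) -> str:
        if self.failures > 0:
            self.failures -= 1
            raise DatabaseUnavailable("The connection attempt failed.")
        return "UTC"


class NaiveLocalScheduler:
    """Toy model of the bug: a job is re-queued only after its next run
    time has been computed, and nothing re-queues it if that step fails."""

    def __init__(self, db: FlakyDatabase, interval_ms: int) -> None:
        self.db = db
        self.interval_ms = interval_ms
        self.queue: list[QueuedJob] = []
        self.triggered: list[str] = []
        self.errors: list[str] = []

    def schedule(self, job_id: str, first_run_ms: int) -> None:
        heapq.heappush(self.queue, QueuedJob(first_run_ms, job_id))

    def _next_run_time(self, job: QueuedJob) -> int:
        self.db.read_default_time_zone()    # database read that can fail
        return job.deadline + self.interval_ms

    def execute_next_job(self) -> None:
        job = heapq.heappop(self.queue)     # the job leaves the queue here
        try:
            next_run = self._next_run_time(job)
        except DatabaseUnavailable as exc:
            # Logged and swallowed: the job is never put back in the queue.
            self.errors.append(f"Unhandled exception thrown by job {job.job_id}: {exc}")
            return
        heapq.heappush(self.queue, QueuedJob(next_run, job.job_id))
        self.triggered.append(job.job_id)


sched = NaiveLocalScheduler(FlakyDatabase(failures=1), interval_ms=60_000)
sched.schedule("com.atlassian.jira.service.JiraService:10000", first_run_ms=0)
sched.execute_next_job()  # the database read fails at the trigger moment
print(sched.queue)        # [] : nothing is left to trigger until a restart re-schedules the job
```

The database in this model is healthy again after the single failure, yet the queue stays empty: the only thing that would bring the job back is re-registering it, which is what a restart does.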
Difference in impact between Jira Server and Data Center
The impact of the bug depends on whether you are using Jira Server (or Data Center single node) or Data Center multi node:
- For Jira Server / Jira Data Center (JDC) single node
  - any scheduled job can be impacted
- For Jira Data Center (JDC) multi node
  - only the scheduled jobs executed locally on each node, such as the Mail Queue Service, are impacted
This difference exists because, on JDC multi-node environments:
- most jobs are executed using the cluster lock system (so-called "beehive"). These jobs can only be run by one node at a time, and logic that automatically unlocks them if they get stuck due to a database operation failure was introduced in Jira 8.3.0, as part of the fix for the bug "JIRA DC might lose Cluster lock due database connectivity problems"
- some jobs (such as the Mail Queue Service) do not use the cluster lock system and are executed "locally". Each Jira node has its own instance of such a job, and any node can execute it, including at the same time as other nodes
For more information about the difference between jobs using the cluster lock (beehive) system and jobs run locally, refer to the developer page "Developing for high availability and clustering".
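The automatic unlock described for clustered jobs can be pictured with a generic lease (expiring lock) pattern. The sketch below is an illustration under assumptions, not the beehive implementation: `LeaseLockTable`, the lock name, the lease length and the in-memory dictionary standing in for a shared database table are all hypothetical. It shows why a node that takes a lock and then fails to release it (for example because its unlock query hit a database error) only blocks the other nodes until the lease expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockRow:
    owner: Optional[str] = None
    expires_at_ms: int = 0


class LeaseLockTable:
    """Hypothetical shared lock table; a real one would live in the database
    and rely on an atomic conditional UPDATE instead of a Python dict."""

    def __init__(self, lease_ms: int) -> None:
        self.lease_ms = lease_ms
        self.rows: dict[str, LockRow] = {}

    def try_acquire(self, lock_name: str, node: str, now_ms: int) -> bool:
        row = self.rows.setdefault(lock_name, LockRow())
        # Free, already ours (renewal), or lease expired: take or extend the lock.
        if row.owner is None or row.owner == node or now_ms >= row.expires_at_ms:
            row.owner = node
            row.expires_at_ms = now_ms + self.lease_ms
            return True
        return False

    def release(self, lock_name: str, node: str) -> None:
        row = self.rows.get(lock_name)
        if row is not None and row.owner == node:   # only the owner may unlock
            row.owner = None


locks = LeaseLockTable(lease_ms=30_000)
print(locks.try_acquire("batch-notifications", "node1", now_ms=0))       # True
# node1's release never reaches the database, so the row stays locked...
print(locks.try_acquire("batch-notifications", "node2", now_ms=10_000))  # False
# ...but only until the lease runs out.
print(locks.try_acquire("batch-notifications", "node2", now_ms=30_000))  # True
```

Locally executed jobs have no shared record of this kind, so in this model nothing on another node would notice when one node's copy of the job stops being scheduled.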
Consequence on the Mail Queue
Since every type of notification (Jira batched/non-batched notifications, JSM customer notifications) relies on the Mail Queue Service to be sent from the Mail Queue, the following happens if the Mail Queue job is impacted by this bug:
- For Jira Server / Jira Data Center (JDC) single node
  - The Mail Queue keeps piling up until it is manually flushed by a Jira admin
  - Notifications (of any type) stop being sent entirely
- For Jira Data Center (JDC) multi node
  - The Mail Queue keeps piling up only on the impacted Jira nodes (since each Jira node manages its own Mail Queue Service)
  - Notifications (of any type) are intermittently not sent, depending on which Jira node triggered the notification:
    - If the notification was triggered from a node with a functioning Mail Queue Service, it is sent as expected
    - If the notification was triggered from a node whose Mail Queue Service has stopped (due to this bug), it is not sent and stays stuck in the Mail Queue until it is manually flushed by a Jira admin
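The per-node behaviour on a multi-node cluster can be modelled in a few lines of Python. This is a hypothetical simplification for illustration only (the function name and the tuple format are made up, and real mail queues are not in-memory lists): each notification lands in the queue of the node that triggered it, and only nodes whose Mail Queue Service is still scheduled ever send their queue.

```python
from collections import defaultdict


def flush_mail_queues(notifications, nodes_with_running_service):
    """Route each notification to the mail queue of the node that triggered it,
    then flush only the queues whose Mail Queue Service is still scheduled.

    notifications: iterable of (origin_node, message) pairs.
    Returns (sent, stuck), where stuck maps node -> messages left in its queue.
    """
    queues = defaultdict(list)
    for node, message in notifications:
        queues[node].append(message)      # one mail queue per node
    sent, stuck = [], {}
    for node, messages in queues.items():
        if node in nodes_with_running_service:
            sent.extend(messages)
        else:
            stuck[node] = messages        # stays queued until a Jira admin flushes it
    return sent, stuck


sent, stuck = flush_mail_queues(
    [("node1", "issue created"), ("node2", "comment added"), ("node1", "issue resolved")],
    nodes_with_running_service={"node1"},
)
print(sent)   # ['issue created', 'issue resolved']
print(stuck)  # {'node2': ['comment added']}
```

From a user's point of view the result looks random: the same kind of notification is delivered or not depending only on which node handled the request.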
Environment
- Jira Server / Jira Data Center single node (any scheduled job can be impacted)
- Jira Data Center multi node (only jobs executed locally, such as the Mail Queue Service, can be impacted)
Steps to Reproduce
- Schedule a Job
- Introduce a breakpoint at CaesiumSchedulerService.executeClusteredJob
- Interrupt database connectivity
Actual Results
Jira loses track of the job and never executes it again until a restart is performed.
An error is also recorded in the Jira logs at the time when the Jira Scheduler tried to trigger the scheduled job but failed due to a database connection or operation failure, similar to the examples below:
- Example 1 (failure to schedule the Mail Queue Service, whose job ID is com.atlassian.jira.service.JiraService:10000):
2022-03-05 11:42:00,729 Caesium-1-3 ERROR ServiceRunner [c.a.s.caesium.impl.SchedulerQueueWorker] Unhandled exception thrown by job QueuedJob[jobId=com.atlassian.jira.service.JiraService:10000,deadline=1646509320000]
com.opensymphony.module.propertyset.PropertyImplementationException: Unable to load values for CacheKey[entityName=jira.properties,entityId=1]
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.propEx(CachingOfBizPropertyEntryStore.java:374)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:128)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.getEntry(CachingOfBizPropertyEntryStore.java:151)
    at com.atlassian.jira.propertyset.CachingOfBizPropertySet.get(CachingOfBizPropertySet.java:189)
    at com.opensymphony.module.propertyset.AbstractPropertySet.getString(AbstractPropertySet.java:305)
    at com.atlassian.jira.config.properties.ApplicationPropertiesStore.getStringFromDb(ApplicationPropertiesStore.java:234)
    at com.atlassian.jira.config.properties.ApplicationPropertiesImpl.getString(ApplicationPropertiesImpl.java:53)
    at com.atlassian.jira.scheduler.JiraCaesiumSchedulerConfiguration.getDefaultTimeZone(JiraCaesiumSchedulerConfiguration.java:30)
    at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.getTimeZone(RunTimeCalculator.java:115)
    at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.nextRunTime(RunTimeCalculator.java:96)
    at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.nextRunTime(RunTimeCalculator.java:70)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.calculateNextRunTime(CaesiumSchedulerService.java:444)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeLocalJob(CaesiumSchedulerService.java:401)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:380)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.atlassian.cache.CacheException: com.atlassian.jira.exception.DataAccessException: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
    at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:113)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:126)
    ... 16 more
Caused by: com.atlassian.jira.exception.DataAccessException: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
    at com.atlassian.jira.database.DatabaseAccessorImpl.borrowConnection(DatabaseAccessorImpl.java:167)
    at com.atlassian.jira.database.DefaultQueryDslAccessor$1.executeQuery(DefaultQueryDslAccessor.java:84)
    at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.query(CachingOfBizPropertyEntryStore.java:326)
    ...
    ... 17 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
    at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:283)
    at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:129)
    at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5333)
    ...
- Example 2 (failure to execute the Batched Notification job when using a PostgreSQL database):
2020-12-05 15:07:52,615-0600 Caesium-1-1 ERROR ServiceRunner [c.a.s.caesium.impl.SchedulerQueueWorker] Unhandled exception thrown by job QueuedJob[jobId=com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl.mentions,deadline=1607202432603]
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor404.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at ... com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:409)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to establish a connection with the database. (The connection attempt failed.)
    at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:408)
    at com.quisapps.jira.fieldsecurity.ofbiz.SecureOfBizDelegator.findListIteratorByCondition(SecureOfBizDelegator.java:309)
    ... 17 more
Caused by: org.ofbiz.core.entity.GenericDataSourceException: Unable to establish a connection with the database. (The connection attempt failed.)
    at org.ofbiz.core.entity.jdbc.SQLProcessor.getConnection(SQLProcessor.java:343)
    at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:870)
    at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:857)
    at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:216)
    at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1243)
    at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:405)
    ... 18 more
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:292)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
    at org.postgresql.Driver.makeConnection(Driver.java:458)
    ... com.atlassian.jira.ofbiz.sql.JiraSupportedDatabasesCompatibleJNDIFactory.getConnection(JiraSupportedDatabasesCompatibleJNDIFactory.java:38)
    at org.ofbiz.core.entity.TransactionFactory.getConnection(TransactionFactory.java:114)
    at org.ofbiz.core.entity.ConnectionFactory.getConnection(ConnectionFactory.java:59)
    at org.ofbiz.core.entity.jdbc.SQLProcessor.getConnection(SQLProcessor.java:340)
    ... 24 more
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    ...
- Example 3 (failure to execute the Batched Notification job when using a Microsoft SQL Server database):
2020-06-28 07:37:40,642+0200 Caesium-1-4 ERROR ServiceRunner [c.a.s.caesium.impl.CaesiumSchedulerService] Unhandled exception during the attempt to execute job 'com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl.mentions'; will attempt recovery in 60 seconds
com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM dbo.clusteredjob WHERE JOB_ID=? (Connection reset)
    at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:408)
    at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findListIteratorByCondition(WrappingOfBizDelegator.java:283)
    at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.forEach(SelectQueryImpl.java:227)
    at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.consumeWith(SelectQueryImpl.java:214)
    at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.singleValue(SelectQueryImpl.java:191)
    at com.atlassian.jira.scheduler.OfBizClusteredJobDao.find(OfBizClusteredJobDao.java:88)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:409)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM dbo.clusteredjob WHERE JOB_ID=? (Connection reset)
    at org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:533)
    at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:877)
    at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:857)
    at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:216)
    at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1243)
    ... 12 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2887)
    at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
    at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
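To find which jobs may have been dropped, it can help to scan the Jira log for lines like the three examples above. The following Python helper is a sketch, not an Atlassian tool: `find_failed_jobs` and both regular expressions are hypothetical and were derived only from the three example lines in this report, so other Jira versions or plugins may log these failures differently. The `deadline` value in `QueuedJob[...]` is treated as epoch milliseconds and converted to UTC, whereas the log timestamps themselves are in the server's local time.

```python
import re
from datetime import datetime, timedelta, timezone

# Both patterns are derived only from the example log lines in this report.
QUEUED_JOB_FAILURE = re.compile(
    r"Unhandled exception thrown by job "
    r"QueuedJob\[jobId=(?P<job>[^,\]]+),deadline=(?P<deadline>\d+)\]"
)
RECOVERY_FAILURE = re.compile(
    r"Unhandled exception during the attempt to execute job '(?P<job>[^']+)'"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def find_failed_jobs(lines):
    """Yield (job_id, deadline) for each scheduler failure line.

    deadline is a UTC datetime taken from QueuedJob[deadline=<epoch millis>],
    or None when the matching line does not carry a deadline.
    """
    for line in lines:
        match = QUEUED_JOB_FAILURE.search(line)
        if match:
            yield match["job"], EPOCH + timedelta(milliseconds=int(match["deadline"]))
            continue
        match = RECOVERY_FAILURE.search(line)
        if match:
            yield match["job"], None


sample = (
    "2022-03-05 11:42:00,729 Caesium-1-3 ERROR ServiceRunner "
    "[c.a.s.caesium.impl.SchedulerQueueWorker] Unhandled exception thrown by job "
    "QueuedJob[jobId=com.atlassian.jira.service.JiraService:10000,deadline=1646509320000] ..."
)
for job_id, deadline in find_failed_jobs([sample]):
    print(job_id, deadline)  # com.atlassian.jira.service.JiraService:10000 2022-03-05 19:42:00+00:00
```

To scan a real log, pass an open file object instead of the sample list, for example `find_failed_jobs(open(path, encoding="utf-8", errors="replace"))`; the main Jira log is normally `atlassian-jira.log` in the `log` directory of the Jira home.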
Expected Results
Jira runs the job according to its schedule again as soon as database connectivity is restored.
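For contrast with the failure mode, here is a minimal sketch of the expected behaviour: if the database read fails, the job is put back in the queue for a later retry instead of being dropped, so it resumes once connectivity returns. The 60-second delay mirrors the "will attempt recovery in 60 seconds" message in Example 3; everything else (class names, `FakeDatabase`, the scheduling arithmetic) is hypothetical and is not the actual fix shipped in Jira.

```python
import heapq

RECOVERY_DELAY_MS = 60_000  # mirrors "will attempt recovery in 60 seconds"


class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for a database connectivity failure."""


class FakeDatabase:
    def __init__(self, up: bool = True) -> None:
        self.up = up

    def read_default_time_zone(self) -> str:
        if not self.up:
            raise DatabaseUnavailable("Connection reset")
        return "UTC"


class GuardedLocalScheduler:
    """Toy scheduler that never lets a job fall out of its queue."""

    def __init__(self, db: FakeDatabase, interval_ms: int) -> None:
        self.db = db
        self.interval_ms = interval_ms
        self.queue: list[tuple[int, str]] = []   # (deadline_ms, job_id)
        self.triggered: list[str] = []

    def schedule(self, job_id: str, first_run_ms: int) -> None:
        heapq.heappush(self.queue, (first_run_ms, job_id))

    def execute_next_job(self, now_ms: int) -> None:
        deadline, job_id = heapq.heappop(self.queue)
        try:
            self.db.read_default_time_zone()       # may fail, as in the bug
            next_run = deadline + self.interval_ms
            self.triggered.append(job_id)
        except DatabaseUnavailable:
            next_run = now_ms + RECOVERY_DELAY_MS  # retry later instead of dropping the job
        heapq.heappush(self.queue, (next_run, job_id))


db = FakeDatabase(up=False)
sched = GuardedLocalScheduler(db, interval_ms=300_000)
sched.schedule("com.atlassian.jira.service.JiraService:10000", first_run_ms=0)
sched.execute_next_job(now_ms=0)       # database down: job re-queued for a retry at 60 000 ms
db.up = True
sched.execute_next_job(now_ms=60_000)  # connectivity is back: the job runs and is re-queued
print(sched.triggered)                 # ['com.atlassian.jira.service.JiraService:10000']
```

The key design choice is that the re-queue happens on both paths, so a transient database error can only delay a job, never remove it.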
Workaround
- For Jira Server / Jira Data Center (JDC) single node
  - Restart the Jira application
- For Jira Data Center (JDC) multi node
  - Restart the node impacted by the bug
For the Mail Queue Service specifically, you can also go to Administration > System > Services, click Edit for the Mail Queue Service, then click Update without changing anything.
Issue Links
- causes
  - JRASERVER-61954 Scheduled Jobs Unresponsive/Hangs e.g. mail handlers stop processing emails to create issues due to database connectivity interuptions (Closed)
  - JSDSERVER-3653 Jira Service Desk stops processing email following any connectivity issues between the instance and database (Closed)
  - TESLA-642
  - BOOM-19
- is duplicated by
  - JRASERVER-70479 Database connectivity issues break scheduled jobs (Closed)
- relates to
  - JRACLOUD-62072 Scheduled Jobs can be lost track of when a database operation fails at the moment the job is claimed (Closed)
  - JRASERVER-71876 DB connection failure causing threads stuck on user authentication (Closed)
  - JRASERVER-76750 Create a document with details about the Jira Scheduler Stats (Gathering Interest)
  - DBCON-1
- was cloned as
  - JDEV-37233
  - RAID-28