Details

Type: Bug
Resolution: Fixed
Priority: Highest
Fix Version/s: 9.11.0, 9.4.18
Affects Version/s: 7.1.9, Archived Jira Cloud, 8.3.4, 8.5.4, 8.13.0, 8.18.2, 8.20.0, 8.22.0, 9.0.0, 9.4.0
Component/s: Scheduled Tasks
Labels:

Introduced in Version:
7.01
Support reference count:
206
Symptom Severity:
Severity 2 - Major
UIS:
413
Bug Fix Policy:
View Atlassian Server bug fix policy
Current Status:

Atlassian Update – 20 February 2024

Jira 9.4 (JSM 5.4) (LTS) backport notice

Dear Customers,

We're happy to announce that this issue fix has been backported to Jira 9.4.18 (JSM 5.4.18).

Description

Summary

Scheduled jobs stop being triggered by the Jira Scheduler when a database operation fails at the precise moment the Scheduler tries to trigger them.

Impact of the bug

Examples of impacted functionalities

The following functionalities can be impacted, since they rely on a scheduled job to run:

Mail Queue (mails might keep piling up in the Mail Queue, since the Mail Queue Service might stop being scheduled)
Jira Incoming Mail Handler
Jira Service Management (JSM) Mail Handler
Jira Batched Notifications
Jira Service Management (JSM) Notifications
User directory (LDAP) sync
Automation rules from "Automation for Jira"

In short, if a database operation fails (or database connectivity is temporarily lost) while the Jira Scheduler is trying to run any of these jobs, the Scheduler stops triggering them until the Jira application (or the impacted Jira node) is restarted.

Note that the list above is not exhaustive: any other functionality that relies on a scheduled job might be impacted if the database connection/operation error occurs right at the time the job is supposed to be triggered.
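The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Caesium's actual code: the class `ToyScheduler`, its queue layout, and the `db_up` flag are all hypothetical. It assumes a worker that removes a job from its queue, needs a database read before it can put the job back, and does nothing to re-queue the job when that read fails.

```python
import heapq


class ToyScheduler:
    """Toy model of a queue-driven scheduler (hypothetical, NOT Caesium's real code)."""

    def __init__(self, interval):
        self.interval = interval
        self.queue = []  # min-heap of (deadline, job_id)
        self.runs = []   # (time, job_id) for every successful run

    def schedule(self, job_id, first_deadline):
        heapq.heappush(self.queue, (first_deadline, job_id))

    def tick(self, now, db_up):
        """Process every job whose deadline has passed."""
        while self.queue and self.queue[0][0] <= now:
            _, job_id = heapq.heappop(self.queue)  # the job leaves the queue here
            try:
                if not db_up:
                    # Stands in for the database read that fails in the logs.
                    raise ConnectionError("database unavailable")
                self.runs.append((now, job_id))
                heapq.heappush(self.queue, (now + self.interval, job_id))
            except ConnectionError:
                # The error is swallowed (logged, in a real system),
                # but nothing puts the job back on the queue.
                pass


scheduler = ToyScheduler(interval=60)
scheduler.schedule("mail-queue", first_deadline=0)
scheduler.tick(now=0, db_up=True)    # runs and is re-queued for t=60
scheduler.tick(now=60, db_up=False)  # outage at the exact deadline
scheduler.tick(now=120, db_up=True)  # database is back, but the job is gone
print(scheduler.runs)   # [(0, 'mail-queue')]
print(scheduler.queue)  # []
```

In this model the job only comes back if something schedules it again, which corresponds to the restart described in this report.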

Difference of the impact between Jira Server and Data Center

The impact of the bug differs depending on whether you are using Jira Server (or Data Center single node) or Data Center multi node:

For Jira Server / Jira Data Center (JDC) single node
- any scheduled jobs can be impacted
For Jira Data Center (JDC) multi node
- only the scheduled jobs executed locally on each node, such as the Mail Queue Service, will be impacted

This difference between single node and multi node exists because, on JDC multi-node environments:

Most jobs are executed using the cluster lock system (the so-called "beehive"). These jobs can only be run by one node at a time, and logic implemented in Jira 8.3.0 (as per the bug "JIRA DC might lose Cluster lock due database connectivity problems") automatically unlocks them if they get stuck due to a database operation failure.
Some jobs (such as the Mail Queue Service) do not use the cluster lock system and are executed "locally". Each Jira node has its own instance of such a job, and the instances can run simultaneously on any node.

For more information about the difference between the jobs using the cluster lock (beehive) system and the jobs run locally, you can refer to the developer page Developing for high availability and clustering.

Consequence on the Mail Queue

Since every type of notification (Jira batched/non-batched notifications, JSM customer notifications) relies on the Mail Queue Service to be sent from the Mail Queue, the following will happen if the Mail Queue job is impacted by this bug:

For Jira Server / Jira Data Center (JDC) single node
- The Mail Queue will keep piling up until it is manually flushed by a Jira admin
- Notifications (of any type) will completely stop being sent
For Jira Data Center (JDC) multi node
- The Mail Queue will keep piling up only on the impacted Jira nodes (since each Jira node manages its own Mail Queue Service)
- Notifications (of any type) will intermittently fail to be sent, depending on which Jira node the notification was triggered from:
  - If the notification was triggered from a node with a functioning Mail Queue Service, it will be sent as expected
  - If the notification was triggered from a node with a non-functioning Mail Queue Service (due to this bug), it will not be sent and will stay stuck in the Mail Queue until it is manually flushed by a Jira admin

Environment

Jira Server / Jira Data Center single node (any scheduled jobs can be impacted)
Jira Data Center multi node (only jobs executed locally, such as the Mail Queue Service, can be impacted)

Steps to Reproduce

Schedule a Job
Introduce a breakpoint at CaesiumSchedulerService.executeClusteredJob
Interrupt database connectivity

Actual Results

Jira will lose track of the job and will never execute it again, unless a restart is performed.

Also, an error will be recorded in the Jira logs at the time the Jira Scheduler tried to trigger the scheduled job but failed due to a DB connection/operation failure. The error will be similar to one of the examples below:

Example 1 (failure to schedule the Mail Queue Service, whose job ID is com.atlassian.jira.service.JiraService:10000):

2022-03-05 11:42:00,729 Caesium-1-3 ERROR ServiceRunner     [c.a.s.caesium.impl.SchedulerQueueWorker] Unhandled exception thrown by job QueuedJob[jobId=com.atlassian.jira.service.JiraService:10000,deadline=1646509320000]
com.opensymphony.module.propertyset.PropertyImplementationException: Unable to load values for CacheKey[entityName=jira.properties,entityId=1]
	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.propEx(CachingOfBizPropertyEntryStore.java:374)
	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:128)
	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.getEntry(CachingOfBizPropertyEntryStore.java:151)
	at com.atlassian.jira.propertyset.CachingOfBizPropertySet.get(CachingOfBizPropertySet.java:189)
	at com.opensymphony.module.propertyset.AbstractPropertySet.getString(AbstractPropertySet.java:305)
	at com.atlassian.jira.config.properties.ApplicationPropertiesStore.getStringFromDb(ApplicationPropertiesStore.java:234)
	at com.atlassian.jira.config.properties.ApplicationPropertiesImpl.getString(ApplicationPropertiesImpl.java:53)
	at com.atlassian.jira.scheduler.JiraCaesiumSchedulerConfiguration.getDefaultTimeZone(JiraCaesiumSchedulerConfiguration.java:30)
	at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.getTimeZone(RunTimeCalculator.java:115)
	at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.nextRunTime(RunTimeCalculator.java:96)
	at com.atlassian.scheduler.caesium.impl.RunTimeCalculator.nextRunTime(RunTimeCalculator.java:70)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.calculateNextRunTime(CaesiumSchedulerService.java:444)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeLocalJob(CaesiumSchedulerService.java:401)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:380)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.atlassian.cache.CacheException: com.atlassian.jira.exception.DataAccessException: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
	at com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:113)
	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.resolve(CachingOfBizPropertyEntryStore.java:126)
	... 16 more
Caused by: com.atlassian.jira.exception.DataAccessException: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
	at com.atlassian.jira.database.DatabaseAccessorImpl.borrowConnection(DatabaseAccessorImpl.java:167)
	at com.atlassian.jira.database.DefaultQueryDslAccessor$1.executeQuery(DefaultQueryDslAccessor.java:84)
	at com.atlassian.jira.propertyset.CachingOfBizPropertyEntryStore.query(CachingOfBizPropertyEntryStore.java:326)
...

	... 17 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot open database "MSSQLJiraDBprod" requested by the login. The login failed. ClientConnectionId:29ea07a4-6e17-4ffa-9602-940d44963f71
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:283)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:129)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5333)
...

Example 2 (failure to execute the Batched Notification job when using a Postgres DB):

2020-12-05 15:07:52,615-0600 Caesium-1-1 ERROR ServiceRunner     [c.a.s.caesium.impl.SchedulerQueueWorker] Unhandled exception thrown by job QueuedJob[jobId=com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl.mentions,deadline=1607202432603]
java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor404.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at 
...
com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:409)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to establish a connection with the database. (The connection attempt failed.)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:408)
	at com.quisapps.jira.fieldsecurity.ofbiz.SecureOfBizDelegator.findListIteratorByCondition(SecureOfBizDelegator.java:309)
	... 17 more
Caused by: org.ofbiz.core.entity.GenericDataSourceException: Unable to establish a connection with the database. (The connection attempt failed.)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.getConnection(SQLProcessor.java:343)
	at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:870)
	at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:857)
	at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:216)
	at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1243)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:405)
	... 18 more
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:292)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
	at org.postgresql.Driver.makeConnection(Driver.java:458)
...
com.atlassian.jira.ofbiz.sql.JiraSupportedDatabasesCompatibleJNDIFactory.getConnection(JiraSupportedDatabasesCompatibleJNDIFactory.java:38)
	at org.ofbiz.core.entity.TransactionFactory.getConnection(TransactionFactory.java:114)
	at org.ofbiz.core.entity.ConnectionFactory.getConnection(ConnectionFactory.java:59)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.getConnection(SQLProcessor.java:340)
	... 24 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
...

Example 3 (failure to execute the Batched Notification job when using a MS SQL Server DB):

2020-06-28 07:37:40,642+0200 Caesium-1-4 ERROR ServiceRunner     [c.a.s.caesium.impl.CaesiumSchedulerService] Unhandled exception during the attempt to execute job 'com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl.mentions'; will attempt recovery in 60 seconds
com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM dbo.clusteredjob WHERE JOB_ID=? (Connection reset)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:408)
	at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findListIteratorByCondition(WrappingOfBizDelegator.java:283)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.forEach(SelectQueryImpl.java:227)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.consumeWith(SelectQueryImpl.java:214)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.singleValue(SelectQueryImpl.java:191)
	at com.atlassian.jira.scheduler.OfBizClusteredJobDao.find(OfBizClusteredJobDao.java:88)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:409)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM dbo.clusteredjob WHERE JOB_ID=? (Connection reset)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:533)
	at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:877)
	at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:857)
	at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:216)
	at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1243)
	... 12 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2887)
	at com.microsoft.sqlserver.jdbc.TDSChannel.write(IOBuffer.java:2045)
	at com.microsoft.sqlserver.jdbc.TDSWriter.flush(IOBuffer.java:4146)
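To find out which jobs were affected, the log lines shown in the examples above can be searched for. The following is a minimal sketch, assuming the two message formats quoted in Examples 1 to 3 are representative; the function name `find_failed_jobs` and the shortened sample lines are illustrative only. The `deadline` field is treated as epoch milliseconds, which matches the timestamps in the examples.

```python
import re
from datetime import datetime, timezone

# Message format seen in Examples 1 and 2.
QUEUED_JOB = re.compile(
    r"Unhandled exception thrown by job "
    r"QueuedJob\[jobId=(?P<job_id>.+?),deadline=(?P<deadline>\d+)\]"
)
# Message format seen in Example 3.
RECOVERY = re.compile(
    r"Unhandled exception during the attempt to execute job '(?P<job_id>[^']+)'"
)


def find_failed_jobs(lines):
    """Yield (job_id, deadline_in_utc_or_None) for each matching log line."""
    for line in lines:
        match = QUEUED_JOB.search(line)
        if match:
            millis = int(match.group("deadline"))
            deadline = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
            yield match.group("job_id"), deadline
            continue
        match = RECOVERY.search(line)
        if match:
            yield match.group("job_id"), None


# Shortened copies of the first lines of Example 1 and Example 3.
sample = [
    "2022-03-05 11:42:00,729 Caesium-1-3 ERROR ServiceRunner [c.a.s.caesium.impl.SchedulerQueueWorker] "
    "Unhandled exception thrown by job QueuedJob[jobId=com.atlassian.jira.service.JiraService:10000,"
    "deadline=1646509320000]",
    "2020-06-28 07:37:40,642+0200 Caesium-1-4 ERROR ServiceRunner [c.a.s.caesium.impl.CaesiumSchedulerService] "
    "Unhandled exception during the attempt to execute job "
    "'com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl.mentions'; "
    "will attempt recovery in 60 seconds",
]

failed = list(find_failed_jobs(sample))
for job_id, deadline in failed:
    print(job_id, deadline)
```

For Example 1, the deadline 1646509320000 decodes to 2022-03-05 19:42:00 UTC. To scan a real log, pass an open file object (for example, the instance's atlassian-jira.log) instead of `sample`.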

Expected Results

Jira will run the job according to its schedule as soon as database connectivity is restored.
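The expected behaviour can be sketched with a small toy model. This is not the actual fix shipped by Atlassian: the function `tick`, the queue layout, and the `db_up` flag are hypothetical, and the 60-second retry delay is borrowed from the "will attempt recovery in 60 seconds" message in Example 3. The idea shown is that a job is always put back on the queue, even when the database read fails.

```python
import heapq

RETRY_DELAY = 60  # seconds; assumption taken from the Example 3 log message


def tick(queue, runs, now, db_up, interval=60):
    """Run due jobs; always re-queue the job, even if the attempt fails."""
    while queue and queue[0][0] <= now:
        _, job_id = heapq.heappop(queue)
        next_deadline = now + RETRY_DELAY  # fallback used when the attempt fails
        try:
            if not db_up:
                # Stands in for the failing database operation.
                raise ConnectionError("database unavailable")
            runs.append((now, job_id))
            next_deadline = now + interval
        except ConnectionError:
            pass  # a real scheduler would log the error here
        finally:
            # The job goes back on the queue whether or not it ran.
            heapq.heappush(queue, (next_deadline, job_id))


queue, runs = [(0, "mail-queue")], []
tick(queue, runs, now=0, db_up=True)    # runs, re-queued for t=60
tick(queue, runs, now=60, db_up=False)  # outage: not run, retried at t=120
tick(queue, runs, now=120, db_up=True)  # database is back: job runs again
print(runs)   # [(0, 'mail-queue'), (120, 'mail-queue')]
print(queue)  # [(180, 'mail-queue')]
```

With this design the job misses one run during the outage but resumes on its own once the database is reachable, without a restart.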

Workaround

For Jira Server / Jira Data Center (JDC) single node
- Restart the Jira application
For Jira Data Center (JDC) multi node
- Restart the node impacted by the bug

For the Mail Queue Service, go to Administration > System > Services, click Edit for the Mail Queue Service, then click Update without changing anything.

Attachments

Issue Links

causes

JRASERVER-61954 Scheduled Jobs Unresponsive/Hangs e.g. mail handlers stop processing emails to create issues due to database connectivity interuptions

Closed

JSDSERVER-3653 Jira Service Desk stops processing email following any connectivity issues between the instance and database

Closed

TESLA-642

BOOM-19

is duplicated by

JRASERVER-70479 Database connectivity issues break scheduled jobs

Closed

relates to

JRACLOUD-62072 Scheduled Jobs can be lost track of when a database operation fails at the moment the job is claimed

Closed

JRASERVER-71876 DB connection failure causing threads stuck on user authentication

Closed

JRASERVER-76750 Create a document with details about the Jira Scheduler Stats

Gathering Interest

DBCON-1

links to

Caesium #13

was cloned as: JDEV-37233; RAID-28

(4 relates to, 1 links to, 44 mentioned in, 2 was cloned as)

Activity

People

Assignee: Jakub Reczycki

Reporter: Oswaldo Hernandez (Inactive)

Votes: 61

Watchers: 100

Dates

Created: 29/Jul/2016 12:56 AM

Updated: 03/Apr/2024 4:11 PM

Resolved: 30/Aug/2023 2:48 PM