-
Bug
-
Resolution: Unresolved
-
Highest
-
None
-
8.0.0, 8.10.2, 8.5.19, 8.13.17, 8.20.8, 8.20.12, 9.4.9, 9.10.2, 9.12.2
-
8
-
129
-
Severity 2 - Major
-
98
-
-
Problem
Batched notifications execute periodical tasks (every 14 days) to clean up old events from the AO_733371_EVENT and AO_733371_EVENT_RECIPIENT tables. However, on unknown circumstances, this leads to extremely slow queries on the database if using Microsoft SQL Server. This issue seems to happen more often on older MSSQL versions such as 2012, 2014, 2016, but it was also reported with the version 2019.
The issue seems to be related to JRASERVER-65168 where it was described that single column indexes used by Jira might lead to this problem. Even though that specific bug is resolved, it wasn't created multi-column indexes for the batched notifications tables.
The same problem can cause delays in batched notifications delivery.
Example query:
select "AO_733371_EVENT"."ID", "AO_733371_EVENT"."EVENT_TYPE", "AO_733371_EVENT"."USER_KEY", "AO_733371_EVENT"."ACTION", "AO_733371_EVENT"."ACTION_ID", "AO_733371_EVENT"."EVENT_BUNDLE_ID", "AO_733371_EVENT"."CREATED" from "jiraschema"."AO_733371_EVENT" "AO_733371_EVENT" where exists ( select "AO_733371_EVENT_RECIPIENT"."ID" from "jiraschema"."AO_733371_EVENT_RECIPIENT" "AO_733371_EVENT_RECIPIENT" where "AO_733371_EVENT_RECIPIENT"."EVENT_ID" = "AO_733371_EVENT"."ID" and "AO_733371_EVENT_RECIPIENT"."STATUS" = 'NEW' and "AO_733371_EVENT_RECIPIENT"."CONSUMER_NAME" = 'mailMentionEventConsumer' and "AO_733371_EVENT_RECIPIENT"."SEND_DATE" <= '2020-04-29 06:14:41') and ("AO_733371_EVENT"."EVENT_TYPE" in (92638,92593,92594,92594,92595,92596,92597,92599,92599,92600)) and "AO_733371_EVENT"."ID" >= 85125 order by "AO_733371_EVENT"."ID" asc offset @P24 rows fetch next @P25 rows only
Steps to Reproduce
- Set up Batched Notifications (enabled by default on Jira 8.0)
- Have a large number of email notifications occurring daily
- Monitor the database CPU usage with graphs
Expected Results
- The cleanup tasks occur without affecting the database performance, as it does on other DBMSs such as Postgres.
- The Jira Batched Notifications are sent in a timely manner (based on the frequency setting)
Actual Results
- The cleanup tasks take several hours to complete, with the CPU spiking to 100% usage during that period. General slowness is experienced on Jira due to this.
- The Jira Batched Notifications are sent with hours of delay
Note
When analyzing thread dumps, we should see 2 Caesium threads running batched notification jobs:
- Thread 1
"Caesium-1-2" #276 daemon prio=5 os_prio=0 tid=0x00007feae318d000 nid=0x85f0 runnable [0x00007fe7c2578000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2023) at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6418) - locked <0x00007ff035c7c3d8> (a com.microsoft.sqlserver.jdbc.TDSReader) at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7579) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:590) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:522) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2930) - locked <0x00007feccf08a6e8> (a java.lang.Object) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:444) ... at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$searchEvents$10(BatchNotificationJob.java:266) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$7455/1356348673.apply(Unknown Source) at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.searchEvents(BatchNotificationJob.java:264) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:112) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$7454/2336314.apply(Unknown Source) at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.notifyUsers(BatchNotificationJob.java:109) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$runJob$0(BatchNotificationJob.java:86) ... at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$2167/528009948.accept(Unknown Source) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35) at java.lang.Thread.run(Thread.java:748)
- Thread 2:
"Caesium-1-1" #275 daemon prio=5 os_prio=0 tid=0x00007feae318c800 nid=0x85ef runnable [0x00007fe7c48a1000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2023) at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6418) - locked <0x00007ff476800db8> (a com.microsoft.sqlserver.jdbc.TDSReader) at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7579) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:590) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:522) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2930) - locked <0x00007feec8026738> (a java.lang.Object) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:444) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83) at com.atlassian.jira.ofbiz.sql.PreparedStatementWrapper.executeQuery(PreparedStatementWrapper.java:42) at com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.lambda$executeQuery$5(DiagnosticPreparedStatement.java:59) ... at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$searchEvents$10(BatchNotificationJob.java:266) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$7455/1356348673.apply(Unknown Source) at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39) ... at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39) at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.runJob(BatchNotificationJob.java:84) at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33) at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33 ... at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35) at java.lang.Thread.run(Thread.java:748)
When analyzing long running queries on the DB side, we should see long queries that look like this:
select "AO_733371_EVENT"."ID", "AO_733371_EVENT"."EVENT_TYPE", "AO_733371_EVENT"."USER_KEY", "AO_733371_EVENT"."ACTION", "AO_733371_EVENT"."ACTION_ID", "AO_733371_EVENT"."EVENT_BUNDLE_ID", "AO_733371_EVENT"."CREATED" from "dbo"."AO_733371_EVENT" "AO_733371_EVENT" where exists (select "AO_733371_EVENT_RECIPIENT"."ID" from "dbo"."AO_733371_EVENT_RECIPIENT" "AO_733371_EVENT_RECIPIENT" where "AO_733371_EVENT_RECIPIENT"."EVENT_ID" = "AO_733371_EVENT"."ID" and "AO_733371_EVENT_RECIPIENT"."STATUS" = @P0 and "AO_733371_EVENT_RECIPIENT"."CONSUMER_NAME" = @P1 and "AO_733371_EVENT_RECIPIENT"."SEND_DATE" <= @P2) and ("AO_733371_EVENT"."EVENT_TYPE" in (@P3, @P4, @P5, @P6, @P7, @P8, @P9, @P10, @P11, @P12, @P13, @P14, @P15, @P16, @P17, @P18, @P19, @P20, @P21, @P22, @P23, @P24, @P25, @P26, @P27)) and "AO_733371_EVENT"."ID" >= @P28 order by "AO_733371_EVENT"."ID" asc offset @P29 rows fetch next @P30 rows only
Workaround
No confirmed workaround available.
- Users that have upgrade to newer MSSQL versions (such as 2017) noticed that the occurrence of the problem is smaller.
- If you're using newer MSSQL versions and experiencing this problem, verify the compatibility mode set to this database. If it was upgraded from previous MSSQL versions, it might be set to mode 120/110, and there was improved query execution on MSSQL 2017 as can be verified on this Microsoft document
- is related to
-
JRASERVER-65168 Add Multi-Column Index to JIRA Tables
- Closed
- is mentioned by
-
BOAR-616 Loading...
- Mentioned in
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
- relates to
-
ACE-6368 Loading...
-
MNSTR-5967 Loading...
-
MNSTR-6171 Loading...
- was split into
-
MNSTR-4422 Loading...