[JRASERVER-74296] (Oracle DB only) Node with index older than 3 months fails to start in healthy state due to SQLDataException (ORA-01873)

Type: Bug
Resolution: Fixed
Priority: Low (View bug fix roadmap)
Fix Version/s: 9.3.0
Affects Version/s: 8.14.0, 8.20.0, 9.2.0, 8.20.13
Component/s: Data Center - Index
Labels:
- rbst

Introduced in Version: 8.14
Support reference count: 1
Symptom Severity: Severity 3 - Minor
Bug Fix Policy: View Atlassian Server bug fix policy

Issue Summary

A node with an index older than 3 months fails to start in a healthy state due to an SQLDataException (ORA-01873).
This issue occurs only when Jira is connected to an Oracle DB.

This is reproducible on Data Center: yes

This bug is caused by the particular way Oracle DB performs interval operations in queries.

Note that it happens in circumstances we generally do not support: recycling nodes with months-old indexes in active clusters (such a time period is far longer than our index replication data retention time).

Steps to Reproduce

1. Use a cluster connected to Oracle DB (the issue occurs only there).
2. Have one of the nodes shut down, with an index containing only data older than ~100 days.
3. Start the node.

Expected Results

Node starts successfully without any alarming messages and without unnecessary long-lasting operations.

Actual Results

In all cases the following stack trace can be found in the logs.

Other symptoms differ depending on the version (see below).

```
com.querydsl.core.QueryException: Caught SQLDataException for select ISSUE_VERSION.issue_id, ISSUE_VERSION.parent_issue_id, ISSUE_VERSION.update_time, ISSUE_VERSION.index_version, ISSUE_VERSION.deleted
from issue_version ISSUE_VERSION
where ISSUE_VERSION.update_time > current_timestamp + interval '-19496228' second
at com.querydsl.sql.DefaultSQLExceptionTranslator.translate(DefaultSQLExceptionTranslator.java:50)
at com.querydsl.sql.Configuration.translate(Configuration.java:459)
at com.querydsl.sql.AbstractSQLQuery.fetch(AbstractSQLQuery.java:502)
at com.atlassian.jira.versioning.VersioningDao.lambda$findVersionsUpdatedInTheLast$15(VersioningDao.java:279)
at com.atlassian.jira.database.DefaultQueryDslAccessor.lambda$executeQuery$0(DefaultQueryDslAccessor.java:64)
at com.atlassian.jira.database.DatabaseAccessorImpl.lambda$runInTransaction$0(DatabaseAccessorImpl.java:105)
at com.atlassian.jira.database.DatabaseAccessorImpl.executeQuery(DatabaseAccessorImpl.java:74)
at com.atlassian.jira.database.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:100)
at com.atlassian.jira.database.DefaultQueryDslAccessor.executeQuery(DefaultQueryDslAccessor.java:63)
at com.atlassian.jira.versioning.VersioningDao.findVersionsUpdatedInTheLast(VersioningDao.java:271)
at com.atlassian.jira.versioning.EntityVersioningManagerImpl.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerImpl.java:259)
at com.atlassian.jira.versioning.EntityVersioningManagerImpl.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerImpl.java:251)
at com.atlassian.jira.versioning.EntityVersioningManagerWithStats.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerWithStats.java:270)
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:227)
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:203)
at com.atlassian.jira.cluster.DefaultClusterManager.rebuildLocalIndex(DefaultClusterManager.java:354)
at com.atlassian.jira.cluster.DefaultClusterManager.checkIndexOnStart(DefaultClusterManager.java:207)
at com.atlassian.jira.startup.ClusteringLauncher.clusterSynchronizedCheckIndex(ClusteringLauncher.java:99)
at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:130)
...
Caused by: java.sql.SQLDataException: ORA-01873: the leading precision of the interval is too small
```
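
To relate the interval in the failing query to the ~100 day threshold mentioned in this report, the offset can be converted from seconds to days. The sketch below is only arithmetic. It assumes (this is not confirmed by the issue itself) that Oracle checks the offset against a day field with the default leading precision of two digits, which would explain why offsets of 100 days or more are rejected. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def whole_days(offset_seconds: int) -> int:
    """Whole days covered by an offset given in seconds (sign ignored)."""
    return abs(offset_seconds) // SECONDS_PER_DAY


def day_digits(offset_seconds: int) -> int:
    """Number of decimal digits needed to write the whole-day count."""
    return len(str(whole_days(offset_seconds)))


# Offset taken from the failing query in the stack trace.
failing_offset = -19496228
print(whole_days(failing_offset), day_digits(failing_offset))  # 225 3

# Hypothetical comparison values on either side of the ~100 day mark.
print(day_digits(-99 * SECONDS_PER_DAY))   # 2
print(day_digits(-100 * SECONDS_PER_DAY))  # 3
```

Under that assumption, the offset in the stack trace (about 225 days) needs three digits for its day count, while any offset below 100 days fits in two.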

Jira DC 8.14.0

The node starts in a seemingly healthy state, but it uses the original (possibly obsolete) local index, and replication of index changes from other nodes does not work. The second symptom can be confirmed by checking whether the logs contain the following recurring message after startup:

[INDEX-REPLAY] Node re-index service is not running: currentNode.isClustered=true, notRunningCounter=156, paused=true, lastPausedStacktrace=java.lang.Throwable
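
A minimal sketch of that log check follows. It counts lines containing the message prefix shown above; the function names, the threshold of two occurrences used to decide "recurring", and the log path in the usage comment are assumptions for illustration.

```python
from typing import Iterable

# Prefix of the recurring message quoted in this issue.
MARKER = "[INDEX-REPLAY] Node re-index service is not running"


def count_replay_warnings(lines: Iterable[str]) -> int:
    """Count log lines that contain the INDEX-REPLAY 'not running' message."""
    return sum(1 for line in lines if MARKER in line)


def replication_looks_paused(lines: Iterable[str], threshold: int = 2) -> bool:
    """Treat the message as recurring once it appears `threshold` times."""
    return count_replay_warnings(lines) >= threshold


# Usage (the path is a placeholder; adjust to the node's log location):
# with open("/path/to/atlassian-jira.log", encoding="utf-8", errors="replace") as log:
#     print(replication_looks_paused(log))
```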

Jira DC 9.2.0

The node fails to start; when accessed, it displays only a page with the stack trace above.

Jira DC 9.3.0

The node eventually starts in a healthy state, but it performs snapshot recovery on startup and, in some cases (if no snapshot exists in the shared home, or the snapshot also contains only issues not updated for 3+ months), a full foreground reindex. Depending on the number of issues, comments, fields, etc., this foreground reindex might take up to several hours.

Workaround

8.14.0 --> Impacted Version

As advised in: https://confluence.atlassian.com/jirakb/jira-fails-to-startup-due-to-a-corrupted-index-snapshot-causing-a-sqldataexception-1103435237.html

1. Remove the contents of the folder <JIRA_SHARED_HOME_FOLDER>/export/indexsnapshots (we recommend copying the contents to a temporary directory first).
2. Start the Jira application.
3. Verify that the Jira application starts successfully and that the SQL exception mentioned above no longer appears in the Jira logs.
4. If the startup is successful, you might want to either perform a full re-index on the node that was failing to start (via the page ⚙ > System > Indexing), or update or create at least one issue (to prevent the problematic scenario from repeating on the next restart).
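
The snapshot clean-up step above can be sketched as follows. This is an illustrative script, not an Atlassian tool: the function name and both directory arguments are hypothetical, and it moves the snapshot files to a backup location rather than deleting them, so they can be restored if needed.

```python
import shutil
from pathlib import Path
from typing import List


def move_snapshots(snapshot_dir: Path, backup_dir: Path) -> List[Path]:
    """Move every entry out of snapshot_dir into backup_dir.

    Returns the new paths of the moved entries. snapshot_dir itself is
    left in place (empty).
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(snapshot_dir.iterdir()):
        target = backup_dir / entry.name
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved


# Usage (both paths are placeholders for your environment):
# move_snapshots(
#     Path("<JIRA_SHARED_HOME_FOLDER>/export/indexsnapshots"),
#     Path("/tmp/indexsnapshots-backup"),
# )
```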

9.2.0 --> Impacted Version

Disable the local index rebuild phase of Jira startup using the system property below (see step 1 in this KB article for details):
```
-Dcom.atlassian.jira.startup.rebuild.local.index=false
```

Then see points 1-3 for 9.3.0 below.

9.3.0 --> Contains Partial fix.

Starting from 9.3.0, node startup does not fail even though the SQLDataException is still logged. The message can be safely ignored; the node is healthy.

However, while coming up, the node looks for a snapshot in the <JIRA_SHARED_HOME_FOLDER>/export/indexsnapshots directory.

There are three possibilities:

1. If the node finds a snapshot containing at least one issue updated within the last three months (counted from the current date), it copies the snapshot to the local directory, replays the index differences between the database and the snapshot, and starts up successfully. This is the happy scenario.

2. If the index snapshot contains no issues updated less than 100 days ago, the node proceeds with a foreground reindex during startup. This could take hours, depending on the instance size.

3. The same happens if there is no index snapshot in the shared directory. In that case, Jira also performs a foreground reindex.

To avoid reindexing during startup and instead run a full foreground reindex from the UI, add the JVM property below to setenv.sh, or to the Windows service configuration on Windows (you can find the steps here). Important: the node's index cannot be treated as up to date until a foreground reindex is eventually triggered manually.

```
-Dcom.atlassian.jira.startup.allow.full.reindex=false
```
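
The 9.3.0 startup outcomes described in this section can be summarised as a small decision function. This is a simplified model of the behaviour as described in this issue, not Jira's actual implementation; the function name, parameters, and returned strings are all hypothetical.

```python
def startup_index_action(snapshot_exists: bool,
                         snapshot_has_recent_issue: bool,
                         allow_full_reindex: bool = True) -> str:
    """Model which index action a 9.3.0 node takes on startup.

    snapshot_has_recent_issue: the snapshot contains at least one issue
    updated within roughly the last three months.
    allow_full_reindex: mirrors the value of
    -Dcom.atlassian.jira.startup.allow.full.reindex (assumed default: true).
    """
    if snapshot_exists and snapshot_has_recent_issue:
        return "restore snapshot and replay differences"
    if allow_full_reindex:
        return "full foreground reindex during startup"
    return "no reindex during startup; manual full reindex needed"


print(startup_index_action(True, True))
print(startup_index_action(True, False))
print(startup_index_action(False, False, allow_full_reindex=False))
```

With the property set to false, the last case applies: startup is fast, but the index stays stale until a full reindex is run from the UI.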

links to

ASCI-135
