  1. Jira Data Center
  2. JRASERVER-74296

(Oracle DB only) Node with index older than 3 months fails to start in healthy state due to SQLDataException (ORA-01873)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Low
    • Fix Version/s: 9.3.0
    • Affects Version/s: 8.14.0, 8.20.0, 9.2.0, 8.20.13
    • Component/s: Data Center - Index

    Description

      Issue Summary

      A node whose index is older than 3 months fails to start in a healthy state due to an SQLDataException (ORA-01873).
      This issue occurs only when Jira is connected to an Oracle DB.

      This is reproducible on Data Center: yes

      This bug is caused by the particular way Oracle DB performs interval operations in queries.

      It is worth noting that this happens in circumstances we generally do not support: recycling nodes with months-old indexes in active clusters (such a time period is far longer than our index replication data retention time).

      Steps to Reproduce

      1. The cluster uses Oracle DB (the issue does not occur with other databases).
      2. One of the nodes is shut down and its index contains only data older than ~100 days.
      3. Start the node.

      Expected Results

      The node starts successfully without any alarming messages and without unnecessary long-running operations.

      Actual Results

      In all cases, the following stack trace can be found in the logs.

      Other symptoms differ depending on the version (see below).

      com.querydsl.core.QueryException: Caught SQLDataException for select ISSUE_VERSION.issue_id, ISSUE_VERSION.parent_issue_id, ISSUE_VERSION.update_time, ISSUE_VERSION.index_version, ISSUE_VERSION.deleted
      from issue_version ISSUE_VERSION
      where ISSUE_VERSION.update_time > current_timestamp + interval '-19496228' second
      at com.querydsl.sql.DefaultSQLExceptionTranslator.translate(DefaultSQLExceptionTranslator.java:50)
      at com.querydsl.sql.Configuration.translate(Configuration.java:459)
      at com.querydsl.sql.AbstractSQLQuery.fetch(AbstractSQLQuery.java:502)
      at com.atlassian.jira.versioning.VersioningDao.lambda$findVersionsUpdatedInTheLast$15(VersioningDao.java:279)
      at com.atlassian.jira.database.DefaultQueryDslAccessor.lambda$executeQuery$0(DefaultQueryDslAccessor.java:64)
      at com.atlassian.jira.database.DatabaseAccessorImpl.lambda$runInTransaction$0(DatabaseAccessorImpl.java:105)
      at com.atlassian.jira.database.DatabaseAccessorImpl.executeQuery(DatabaseAccessorImpl.java:74)
      at com.atlassian.jira.database.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:100)
      at com.atlassian.jira.database.DefaultQueryDslAccessor.executeQuery(DefaultQueryDslAccessor.java:63)
      at com.atlassian.jira.versioning.VersioningDao.findVersionsUpdatedInTheLast(VersioningDao.java:271)
      at com.atlassian.jira.versioning.EntityVersioningManagerImpl.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerImpl.java:259)
      at com.atlassian.jira.versioning.EntityVersioningManagerImpl.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerImpl.java:251)
      at com.atlassian.jira.versioning.EntityVersioningManagerWithStats.findEntityVersionsUpdatedInTheLast(EntityVersioningManagerWithStats.java:270)
      at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:227)
      at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:203)
      at com.atlassian.jira.cluster.DefaultClusterManager.rebuildLocalIndex(DefaultClusterManager.java:354)
      at com.atlassian.jira.cluster.DefaultClusterManager.checkIndexOnStart(DefaultClusterManager.java:207)
      at com.atlassian.jira.startup.ClusteringLauncher.clusterSynchronizedCheckIndex(ClusteringLauncher.java:99)
      at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:130)
      ...
      Caused by: java.sql.SQLDataException: ORA-01873: the leading precision of the interval is too small
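
To gauge how far back the failing query reaches, the interval literal in the logged SQL can be converted from seconds to days. The sketch below is illustrative only: `interval_age_days` is a hypothetical helper, not part of Jira, and it assumes the interval appears in the log exactly as in the trace above.

```python
import re

# Matches the interval literal as it appears in the logged query,
# e.g. "interval '-19496228' second".
INTERVAL_RE = re.compile(r"interval\s+'-?(\d+)'\s+second", re.IGNORECASE)

SECONDS_PER_DAY = 86_400


def interval_age_days(log_line):
    """Return the look-back window in days, or None if no interval is found."""
    match = INTERVAL_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)) / SECONDS_PER_DAY


line = ("where ISSUE_VERSION.update_time > "
        "current_timestamp + interval '-19496228' second")
print(round(interval_age_days(line), 1))  # 225.7
```

For the query in the trace, the window is roughly 225 days, well beyond the ~100 days mentioned in the steps to reproduce.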
      

       

      Jira DC 8.14.0

      The node starts in a seemingly healthy state, but it uses the original (possibly obsolete) local index, and the replication of index changes from other nodes does not work. The second symptom can be confirmed by checking whether the logs contain the following recurring message after startup:

      [INDEX-REPLAY] Node re-index service is not running: currentNode.isClustered=true, notRunningCounter=156, paused=true, lastPausedStacktrace=java.lang.Throwable
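
Checking for this recurring message can be scripted. The following is a minimal sketch, assuming the log lines contain the message in the same format as the sample above; the function names and the `min_occurrences` threshold are hypothetical choices for illustration, not Jira behavior.

```python
import re

# Captures the counter from lines such as:
# "[INDEX-REPLAY] Node re-index service is not running: ..., notRunningCounter=156, ..."
NOT_RUNNING_RE = re.compile(
    r"\[INDEX-REPLAY\] Node re-index service is not running:"
    r".*?notRunningCounter=(\d+)"
)


def not_running_counters(log_lines):
    """Collect notRunningCounter values from matching log lines, in order."""
    counters = []
    for line in log_lines:
        match = NOT_RUNNING_RE.search(line)
        if match:
            counters.append(int(match.group(1)))
    return counters


def replication_looks_paused(log_lines, min_occurrences=2):
    """Heuristic: treat the message as recurring if it appears often enough."""
    return len(not_running_counters(log_lines)) >= min_occurrences
```

A call such as `replication_looks_paused(open(path, encoding="utf-8"))` would scan a log file line by line, where `path` points at the node's application log.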
      

      Jira DC 9.2.0

      The node fails to start and, when accessed, displays only a page with the stack trace above.

      Jira DC 9.3.0

      The node eventually starts in a healthy state, although it performs snapshot recovery on startup and in some cases (if a snapshot does not exist in the shared home, or if it also contains only issues not updated for 3+ months) performs a full foreground reindex. Depending on the number of issues, comments, fields, etc., this foreground reindex might take up to several hours.

       

      Workaround

      8.14.0 --> Impacted Version

      As advised in:  https://confluence.atlassian.com/jirakb/jira-fails-to-startup-due-to-a-corrupted-index-snapshot-causing-a-sqldataexception-1103435237.html

      • Remove the content of the folder <JIRA_SHARED_HOME_FOLDER>/export/indexsnapshots (we recommend copying its content to a temporary directory first)
      • Start the Jira application
      • Verify that the Jira application starts successfully and that the SQL exception mentioned above no longer appears in the Jira logs
      • If the startup is successful, you might want to perform a full re-index on the node that was failing to start via the page ⚙ > System > Indexing
      • Update or create at least one issue (to prevent the problematic scenario from recurring on the next restart)
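
The first step above (emptying the snapshot folder while keeping a copy) could be scripted roughly as follows. This is a sketch under assumptions: `move_snapshots_aside` is a hypothetical helper, the backup location is up to you, and it should only be run while the affected node is stopped.

```python
import shutil
from pathlib import Path


def move_snapshots_aside(shared_home, backup_dir):
    """Move everything under <shared_home>/export/indexsnapshots to backup_dir.

    Returns the list of new paths. Moving (rather than deleting) keeps a copy,
    as the workaround recommends.
    """
    snapshot_dir = Path(shared_home) / "export" / "indexsnapshots"
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    if not snapshot_dir.is_dir():
        return moved  # nothing to do if the folder does not exist

    for entry in sorted(snapshot_dir.iterdir()):
        target = backup_dir / entry.name
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved
```

After the node has started cleanly and the remaining steps are done, the backup directory can be removed.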

      9.2.0 --> Impacted Version

      1. Disable the local index rebuild attempt phase of Jira startup, using the system property
        -Dcom.atlassian.jira.startup.rebuild.local.index=false

        (see step 1 in this KB article for details).

      2. See points 1-3 for 9.3.0.

      9.3.0 --> Contains Partial Fix

      Starting from 9.3.0, the node startup does not fail even though the SQLDataException is still logged. The message can be safely ignored; the node is healthy.

      However, while coming up, the node will look for a snapshot in the <JIRA_SHARED_HOME_FOLDER>/export/indexsnapshots directory.

      There are 3 possibilities here.

      1. If it finds a snapshot containing at least one issue updated within the last three months (counting from the current date), it copies the snapshot to the local directory, replays the index differences between the database and the indexes within the snapshot, and starts up successfully. This is the happy scenario.

      2. If the index snapshot contains no issues updated less than 100 days ago, it proceeds with a foreground reindex during startup. This could take hours, depending on the instance size.

      3. The same happens if there is no index snapshot in the shared directory: Jira also performs a foreground reindex.
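
The three possibilities can be summarized as a small decision function. This is only a model of the behavior described above, not Jira's actual code: the names are hypothetical, and the 100-day threshold is an approximation taken from the "three months" / "100 days" wording in this description.

```python
from enum import Enum


class StartupPath(Enum):
    SNAPSHOT_RESTORE = "restore snapshot, then replay differences from the database"
    FOREGROUND_REINDEX = "full foreground reindex during startup"


def expected_startup_path(newest_update_age_days, threshold_days=100.0):
    """Model the 9.3.0 startup outcome.

    newest_update_age_days: age in days of the most recently updated issue
        in the snapshot, or None if no snapshot exists in the shared home.
    threshold_days: approximate cut-off (assumed, see the lead-in).
    """
    if newest_update_age_days is None:
        # Possibility 3: no snapshot at all.
        return StartupPath.FOREGROUND_REINDEX
    if newest_update_age_days < threshold_days:
        # Possibility 1: the snapshot is recent enough to be used.
        return StartupPath.SNAPSHOT_RESTORE
    # Possibility 2: the snapshot holds only stale issues.
    return StartupPath.FOREGROUND_REINDEX


print(expected_startup_path(12).name)    # SNAPSHOT_RESTORE
print(expected_startup_path(226).name)   # FOREGROUND_REINDEX
print(expected_startup_path(None).name)  # FOREGROUND_REINDEX
```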

      If you want to avoid the reindexing during startup and would rather run a full foreground reindex from the UI, you can add the JVM property below in setenv.sh or in the Windows service configuration (on Windows). (You can find the steps here.) Important: the node's index cannot be treated as up to date until the foreground reindex is eventually triggered manually.

      -Dcom.atlassian.jira.startup.allow.full.reindex=false

            People

              jreczycki Jakub Reczycki