Details
-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
9.4.8
-
9.04
-
9
-
Severity 2 - Major
-
86
-
Description
Issue Summary
Index recovery unhelpfully triggers a full lock reindex if an issue in the index catchup phase fails to index.
For example, the implementation of JSWSERVER-20133 results in the issue containing the immense value to be marked as failed.
During a full reindex, the reindex will "complete" to 100%, despite individual failures. During index recovery during node startup, the catch up process will also sustain individual failures, however, the catch-up process will be deemed failed and the node will then proceed to performing a full lock reindex.
This is unhelpful because the node startup will be delayed, potentially for a very long time, and, the full reindex will also fail to index these issues - the underlying problem is not fixed.
Enterprise Jira instance could take 24 hours or longer to perform full lock reindex. Startup is often a sensitive time for Jira Administrators. While other nodes may be typically online to mitigate the impact, this is not guaranteed, for example, environment failover, environment restore, etc.
Steps to Reproduce
- Configure a 2 node Jira v9.4.8 cluster. Have 1 node started, the other stopped.
- Add an issue with some sample data
- Perform a scheduled Jira snapshot to create a snapshot in shared_home
- Insert a customfield value over the UTF8 encoding limit of 32766 (we inserted a checklist item via Checklist for Jira v6.2.5 to do this)
- Delete Node 2's local index, then start the node
Expected Results
- Node 2 starts and initiates the index recovery procedure, picking the snapshot in shared_home
- The index catch up phase executes, but the issue with the immense term fails to index.
- An warning is logged for the admin, but the startup continues
Actual Results
- Node 2 starts and initiates the index recovery procedure, picking the snapshot in shared_home
[c.a.jira.cluster.DefaultClusterManager] Current node: <NODE_NAME>. Rebuilding local index will be skipped, because replicated index operation is missing necessary data. This is likely to happen after system restore. 2023-10-16 18:01:37,736-0400 main INFO [c.a.jira.index.DefaultIndexFetcher] [INDEX_FETCHER] Jira will try to recover the most recent snapshot: <SNAPSHOT>, chosen based on "last modified" file property from <N> existing snapshots: <SNAPSHOTS>, timestamp: <TIMESTAMP>). Only snapshots younger than 192 hours are considered usable. 2023-10-16 18:03:26,544-0400 main INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring search indexes from snapshot and catching up with changes - 1% complete... Starting 2023-10-16 18:03:26,734-0400 main INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring search indexes - 30% complete... Restored index backup
- The index catch up phase executes, but the issue with the immense term fails to index.
2023-10-16 18:03:27,646-0400 main INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Re-indexing issues updated in the last {1 days, 18 hours, 18 minutes, and 27 seconds}. (Note: it's an intentionally wider range) 2023-10-16 18:04:13,666-0400 IssueIndexer:thread-71 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 31% complete... Processing issue changes is 1% complete ... 2023-10-16 18:06:40,136-0400 IssueIndexer:thread-48 ERROR [c.a.j.issue.index.DocumentScrubber] A document contained a potential immense term in field customfield_12000. The field has been removed from the document. 2023-10-16 18:06:40,136-0400 IssueIndexer:thread-48 ERROR [c.a.j.issue.index.DocumentScrubber] A document contained a potential immense term in field customfield_12000_item. The field has been removed from the document. ... 2023-10-16 18:06:41,265-0400 IssueIndexer:thread-48 WARN [c.a.jira.index.AccumulatingResultBuilder] Indexing failed for Issue - '10002' 2023-10-16 18:06:41,265-0400 IssueIndexer:thread-48 WARN [c.a.jira.index.AccumulatingResultBuilder] Document contains at least one immense term in field="customfield_10002_item" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[73, 91, 116, 99, 114, 100, 95, 99, 104, 101, 99, 107, 109, 97, 114, 120, 95, 101, 120, 116, 101, 114, 110, 97, 108, 95, 102, 105, 110, 100]...', original message: bytes can be at most 32766 in length; got 68089 java.lang.IllegalArgumentException: Document contains at least one immense term in field="customfield_10002_item" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[73, 91, 116, 99, 114, 100, 95, 99, 104, 101, 99, 107, 109, 97, 114, 120, 95, 101, 120, 116, 101, 114, 110, 97, 108, 95, 102, 105, 110, 100]...', original message: bytes can be at most 32766 in length; got 68089 at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:796) 2023-10-16 18:06:42,760-0400 main WARN [c.a.jira.index.AccumulatingResultBuilder] Indexing completed with 1 errors com.atlassian.jira.index.IndexingFailureException: Indexing completed with 1 errors 2023-10-16 18:07:11,479-0400 IssueIndexer:thread-57 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 79% complete... Processing issue changes is 97% complete 2023-10-16 18:07:16,720-0400 IssueIndexer:thread-14 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 80% complete... Processing issue changes is 99% complete
- The failure is recognised by the snapshot restore process, and a full-lock reindex is commenced:
2023-10-16 18:07:47,800-0400 main ERROR [c.a.jira.cluster.DefaultClusterManager] Current node: <NODE_NAME>. Couldn't recover index even though it had been found in shared. Current list of other nodes: [<NODE_LIST>] com.atlassian.jira.index.IndexingFailureException: Indexing completed with 1 errors at com.atlassian.jira.index.AccumulatingResultBuilder$CompositeResult.await(AccumulatingResultBuilder.java:151) at com.atlassian.jira.issue.index.DefaultIssueIndexer.doIssuesOperationBatchMode(DefaultIssueIndexer.java:263) at com.atlassian.jira.issue.index.DefaultIssueIndexer.reindexIssuesBatchMode(DefaultIssueIndexer.java:250) at com.atlassian.jira.issue.index.DefaultIndexManager.lambda$reindexIssuesBatchMode$15(DefaultIndexManager.java:1024) at com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156) at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:423) at com.atlassian.jira.issue.index.DefaultIndexManager.reindexIssuesBatchMode(DefaultIndexManager.java:1024) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38) at com.sun.proxy.$Proxy86.reindexIssuesBatchMode(Unknown Source) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexIssueByIds(DefaultIndexRecoveryManager.java:326) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexOutdatedEntities(DefaultIndexRecoveryManager.java:306) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.reindexWithVersionCheckEntitiesUpdatedInTheLast(DefaultIndexRecoveryManager.java:245) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.access$300(DefaultIndexRecoveryManager.java:74) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.catchUp(DefaultIndexRecoveryManager.java:529) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:487) <SNIP> at com.sun.proxy.$Proxy68.withReindexLock(Unknown Source) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:171) at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.safeRecoverIndexFromBackup(DefaultIndexRecoveryManager.java:138) at com.atlassian.jira.index.DefaultIndexFetcher.recoverIndexFromMostRecentSnapshot(DefaultIndexFetcher.java:108) at com.atlassian.jira.cluster.DefaultClusterManager.pickIndexSnapshotFromSharedHome(DefaultClusterManager.java:428) at com.atlassian.jira.cluster.DefaultClusterManager.checkIndexOnStart(DefaultClusterManager.java:221) at com.atlassian.jira.startup.ClusteringLauncher.clusterSynchronizedCheckIndex(ClusteringLauncher.java:99) at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:130) at com.atlassian.jira.startup.DefaultJiraLauncher.postDBActivated(DefaultJiraLauncher.java:166) at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$postDbLaunch$2(DefaultJiraLauncher.java:144) at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:307) at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:202) at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:135) <SNIP> 2023-10-16 18:07:47,801-0400 main INFO [c.a.jira.cluster.DefaultClusterManager] Current node: <NODE_NAME> Will trigger full foreground reindex. System property com.atlassian.jira.startup.allow.full.reindex is not defined, and it will default to true. You can disable triggering full foreground reindex by setting this property to "false". 2023-10-16 18:07:47,801-0400 main INFO [c.a.jira.cluster.DefaultClusterManager] Current node: <NODE_NAME>. Triggering full-reindex. Current list of other nodes: [<NODE_LIST>] 2023-10-16 18:07:47,857-0400 main INFO [c.a.jira.cluster.DefaultClusterManager] Current node: <NODE_NAME>. Triggered full-reindex, requestId: 10500. Waiting for reindex to finish... 2023-10-16 18:07:49,992-0400 JiraTaskExecutionThread-1 INFO anonymous [c.a.j.index.request.DefaultReindexRequestManager] Re-indexing started 2023-10-16 18:07:49,993-0400 JiraTaskExecutionThread-1 INFO anonymous [c.a.j.util.index.CompositeIndexLifecycleManager] Reindex All starting...
Workaround
Currently there is no known workaround for this behavior. A workaround will be added here when available