[JRASERVER-72944] Restoring an index snapshot after a full re-index might trigger the index fixer, delaying the node start up

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 9.0.0
Affects Version/s: 8.10.0, 8.13.2, 8.22.2, 8.20.10, 8.20.14
Component/s: Data Center - Index
Labels:

Introduced in Version: 8.1
Support reference count: 16
Symptom Severity: Severity 3 - Minor
UIS: 68
Current Status:

Atlassian Update – 21 Mar 2022

Hi everyone,

In Jira 9.0 we ensure that every entity (issue, comment, worklog, and change history) has a version. After upgrading to this version, all missing versions will be added as part of an upgrade task.

We have also changed how index snapshots are made. If a Jira instance does not have a consistent index, it will not create a snapshot, so a snapshot with a corrupted index will not be placed in the shared home directory.

More details on how to handle an inconsistent index can be found here: https://confluence.atlassian.com/jirakb/indexing-inconsistency-troubleshooting-1114800953.html

Thank you,
Michał Błajet,
Jira Developer

Issue Summary

When a node restores the index from another node, the index-fixer process checks the last 24 hours for issues that are not already in the index and reindexes them.

When the index snapshot comes from a node that has just run a full locked reindex, the index-fixer reindexes all issues updated in the last 24 hours in a single-threaded process.
While this process runs, index replication is kept on hold.

Steps to Reproduce

1. Stop all nodes.
2. Start one node.
3. Run a full locked reindex.
4. Start a second node.

Expected Results

After getting the index snapshot from the first node, the second node should be operating normally within a short time.

Actual Results

If there were many issue updates in the last 24 hours, the node becomes usable, but index replication takes a long time to start.

During this time, the index-fixer process appears stuck at 60% in atlassian-jira.log:

2021-10-22 05:58:41,181 ClusterMessageHandlerServiceThread:thread-1 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering search indexes - 60% complete... [INDEX-FIXER] Re-indexing issues modified in the last {1 days, 0 hours, 2 minutes, and 38 seconds}. (Versioning short-circuit checks are enabled.)
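
To check whether a node is sitting in this state, the log can be scanned for index-fixer progress messages. The following is a minimal sketch that assumes only the message text shown above; the function names are hypothetical and nothing here is part of Jira itself.

```python
import re
from typing import Iterable, Optional

# Matches the progress message quoted above. Only the message text is assumed,
# not the layout of the rest of the log line.
INDEX_FIXER_PROGRESS = re.compile(
    r"\[INDEX-FIXER\] Recovering search indexes - (\d+)% complete"
)


def index_fixer_progress(line: str) -> Optional[int]:
    """Return the reported percentage if the line is an index-fixer progress message."""
    match = INDEX_FIXER_PROGRESS.search(line)
    return int(match.group(1)) if match else None


def last_reported_progress(lines: Iterable[str]) -> Optional[int]:
    """Return the most recent index-fixer percentage found in the given log lines."""
    latest = None
    for line in lines:
        progress = index_fixer_progress(line)
        if progress is not None:
            latest = progress
    return latest
```

Passing an open file handle for atlassian-jira.log to `last_reported_progress` returns the latest percentage, or `None` if no index-fixer message was logged. A value that stays at 60 across repeated checks is consistent with the behaviour described here.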

Workaround 1

Before restoring the index, make sure the update time of every entity version is older than the index snapshot creation time minus 24 hours.
Example SQL:

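-- 'epoch' is a PostgreSQL special timestamp value (1970-01-01 00:00:00 UTC); on other databases, substitute an explicit timestamp literal.
update issue_version set update_time = 'epoch';
update comment_version set update_time = 'epoch';
update worklog_version set update_time = 'epoch';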
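
The condition behind this workaround is simple date arithmetic, sketched below. The 24-hour window is taken from the description above; the function names and the snapshot timestamp are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# The 24-hour window comes from the issue description; it is not read from Jira.
INDEX_FIXER_WINDOW = timedelta(hours=24)


def safe_cutoff(snapshot_created: datetime) -> datetime:
    """Return the instant that every version update_time must precede."""
    return snapshot_created - INDEX_FIXER_WINDOW


def is_outside_window(update_time: datetime, snapshot_created: datetime) -> bool:
    """True if a version is older than the snapshot creation time minus 24 hours."""
    return update_time < safe_cutoff(snapshot_created)


# Hypothetical snapshot creation time, for illustration only.
snapshot = datetime(2021, 10, 22, 5, 0, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recent = datetime(2021, 10, 21, 12, 0, tzinfo=timezone.utc)

print(is_outside_window(epoch, snapshot))   # an 'epoch' update_time satisfies the condition
print(is_outside_window(recent, snapshot))  # a version updated 17 hours earlier does not
```

Setting `update_time` to the epoch, as in the SQL above, satisfies the condition for any realistic snapshot time, which is presumably why the example uses it rather than computing an exact cutoff.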

relates to: DELTA-1140

There are no comments yet on this issue.

Assignee: Maciej Swinarski (Inactive)

Reporter: Allan Gandelman

Affected customers: 6

Watchers: 28

Created: 25/Oct/2021 3:41 PM

Updated: 01/Sep/2023 8:06 PM

Resolved: 21/Jun/2022 9:39 AM
