  Jira Data Center / JRASERVER-72944

Restoring an index snapshot after a full re-index might trigger the index fixer, delaying the node startup

    • Affects Version/s: 8.1
    • Severity 3 - Minor
      Atlassian Update – 21 Mar 2022

      Hi everyone,

      In Jira 9.0 we assign versions to each entity (issue, comment, worklog, and change history). After upgrading to this version, all missing versions will be added as part of an upgrade task.

      We have also changed the behaviour of creating index snapshots. If a Jira instance does not have a consistent index, it will not create a snapshot, so a snapshot with a corrupted index will not be placed in the shared home directory.

      More details on how to handle an inconsistent index can be found here: https://confluence.atlassian.com/jirakb/indexing-inconsistency-troubleshooting-1114800953.html

      Thank you,
      Michał Błajet,
      Jira Developer


      Issue Summary

      When restoring the index from another node, the index-fixer process checks the last 24 hours for issues that aren't in the index yet and re-indexes them.

      When the index snapshot comes from a node that has just run a full locked re-index, the index-fixer will re-index all issues updated in the last 24 hours in a single-threaded process.
      While this process runs, index replication is kept on hold.
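
      To get a rough idea of how much work the index-fixer will have to do, you can count how many entity versions were updated in the last 24 hours. The sketch below is an assumption-based example: it uses the issue_version, comment_version and worklog_version tables referenced in Workaround 1 below, and PostgreSQL date arithmetic (adjust the interval syntax for your database):

      -- Rough estimate of the entity versions the index-fixer would re-process (PostgreSQL syntax assumed)
      select count(*) from issue_version   where update_time > now() - interval '24 hours';
      select count(*) from comment_version where update_time > now() - interval '24 hours';
      select count(*) from worklog_version where update_time > now() - interval '24 hours';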
       

      Steps to Reproduce

      1. Stop all nodes
      2. Start 1 node
      3. Run a full locked reindex
      4. Start a second node

      Expected Results

      After getting the index snapshot from the first node, the second node should start operating normally within a short time.

      Actual Results

      If there were many issue updates in the last 24 hours, the node becomes usable, but index replication takes a long time to start.

      During this time, the index-fixer process appears stuck at 60% in the atlassian-jira.log:

      2021-10-22 05:58:41,181 ClusterMessageHandlerServiceThread:thread-1 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering search indexes - 60% complete... [INDEX-FIXER] Re-indexing issues modified in the last {1 days, 0 hours, 2 minutes, and 38 seconds}. (Versioning short-circuit checks are enabled.)
      

      Workaround 1

      Before restoring the index, make sure the update time of every entity version is older than the index snapshot creation time minus 24 hours.
      Example SQL:

      update issue_version set update_time = 'epoch';
      update comment_version set update_time = 'epoch';
      update worklog_version set update_time = 'epoch';
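
      To verify this precondition before restoring the index, you can check that no entity version is newer than the snapshot creation time minus 24 hours. This is a minimal sketch, not part of the original workaround: '<snapshot creation time>' is a placeholder you must replace, and the timestamp/interval arithmetic assumes PostgreSQL:

      -- Each query should return 0; a non-zero count means the index-fixer would still re-process those entities
      select count(*) from issue_version   where update_time > timestamp '<snapshot creation time>' - interval '24 hours';
      select count(*) from comment_version where update_time > timestamp '<snapshot creation time>' - interval '24 hours';
      select count(*) from worklog_version where update_time > timestamp '<snapshot creation time>' - interval '24 hours';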

            Change History

            Rodrigo Martinez made changes -
            Remote Link Original: This issue links to "DELTA-1140 (Bulldog)" [ 593702 ] New: This issue links to "DELTA-1140 (JIRA Server (Bulldog))" [ 593702 ]
            Antoni Kowalski made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 794643 ]
            Alp made changes -
            Description edited: removed the "Workaround 2" section, leaving only Workaround 1.
            Devisree Gedda made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 756142 ]
            Filipi Lima made changes -
            Affects Version/s New: 8.20.10 [ 100697 ]
            Affects Version/s New: 8.20.14 [ 103090 ]
            Kunal Kishore made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 731060 ]
            Alp made changes -
            Description edited: renamed the existing "Workaround" section to "Workaround 1" and added a "Workaround 2" section with the steps below. They help you start a new Jira node faster, but you will still need to run a locked re-index before you put the node behind the load balancer:
             * Disable Index Recovery by navigating to *System > Indexing (in the Advanced section)* and setting *Enable Index Recovery* OFF.
             * Move the index snapshots that you currently have from *JIRA_INSTALL/temp/* (on Node1) and *SHARED_FOLDER/export/indexsnapshots/* to another folder, so that the new node does not find and restore them on startup.
             * Copy the Jira install and home directories to the new node.
             * Edit the *cluster.properties* file in the local home directory and change *jira.node.id* to a new, unique identifier.
             * Go to the *Jira_Home/caches* folder +on the new node+ and delete *indexesV1*.
             * Start the new node.
             * Log in to Jira on the new node and run a locked re-index.
             * Once the indexing is done, put the node behind the load balancer.
             * Enable Index Recovery again.
            Maciej Swinarski (Inactive) made changes -
            Assignee New: Maciej Swinarski [ mswinarski ]
            Mila made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 658307 ]
            Maciej Swinarski (Inactive) made changes -
            Description edited: replaced "Currently there is no known workaround for this behavior" with the Workaround section and its example SQL, as shown in the description above.

              Assignee: Maciej Swinarski (Inactive) [mswinarski]
              Reporter: Allan Gandelman
              Affected customers: 6
              Watchers: 28
