[JRASERVER-62669] Automatic restore of indexes will fail if the node that registered the latest index operation is unavailable

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 8.19.1, 8.20.0
Affects Version/s: 6.4.14, 7.2.2, 7.2.9, 7.3.7, 7.12.1
Component/s: Data Center - Other, Indexing
Labels:

Introduced in Version:
6.04
Support reference count:
53
Symptom Severity:
Severity 2 - Major
UIS:
38
Bug Fix Policy:
View Atlassian Server bug fix policy

Summary

When a node starts with outdated indexes (that is, the latest index operation in the cluster has not been replicated to it), it is supposed to restore its indexes automatically from the node that registered the latest index operation. However, if that node is unavailable, or if the node answering the request never wrote a row to replicatedindexoperation, the restore fails and the starting node is left without the latest index updates. See the related JRASERVER-66550.

Expected Behavior

The node should copy indexes from another node that has healthy indexes (that is, a node whose indexes include the latest operation).

Actual Behavior

The index restore fails and the message below is written to the logs of the node that claimed the index restore request.

2016-09-30 16:25:33,487 ClusterMessageHandlerServiceThread:thread-1 INFO      [jira.index.ha.DefaultIndexCopyService] Index backup started. Requesting node: node3
2016-09-30 16:25:33,488 ClusterMessageHandlerServiceThread:thread-1 WARN      [jira.index.ha.DefaultIndexCopyService] Index backup failed - latest index operation not found. Requesting node: node3
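
To check whether a node has hit this problem, its log can be searched for the WARN message above. The following Python sketch is only an illustration: the regular expression is built from the two log lines quoted in this report, and the function name find_failed_index_backups is hypothetical, not part of Jira.

```python
import re

# Pattern derived from the WARN line quoted in this report.
# The timestamp layout and message text are copied from that excerpt.
FAILED_BACKUP = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Index backup failed - latest index operation not found\. "
    r"Requesting node: (?P<node>\S+)"
)


def find_failed_index_backups(lines):
    """Return (timestamp, requesting_node) for every failed index backup line."""
    hits = []
    for line in lines:
        match = FAILED_BACKUP.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("node")))
    return hits
```

Passing the lines of a node's log file to find_failed_index_backups yields one tuple per failed snapshot attempt, which shows which requesting nodes were left without an index.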

Steps to Reproduce

Set up a new JIRA instance, which will be node1.
Create a new project and some issues.
Set up another JIRA instance, which will be node2. This node will restore the index from node1. Do not create or modify any issues on this node.
Set up yet another JIRA instance, which will be node3. This node might attempt to restore the index from node2, depending on which node claims the request first.
node2 will fail to create an index snapshot because there are no rows in replicatedindexoperation with node_id = node2.
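
The condition in the last step can be checked directly against the database. The sketch below is a hedged illustration in Python using an in-memory SQLite table: only the table name replicatedindexoperation and the node_id column come from this report. The other columns, the sample row, and the helper name nodes_without_index_operations are assumptions made for the example, and a real Jira database will differ.

```python
import sqlite3


def nodes_without_index_operations(conn, node_ids):
    """Return the node ids that have no row in replicatedindexoperation."""
    missing = []
    for node_id in node_ids:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM replicatedindexoperation WHERE node_id = ?",
            (node_id,),
        ).fetchone()
        if count == 0:
            missing.append(node_id)
    return missing


# Toy table for illustration. Only node_id is taken from the report;
# id and index_time are assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE replicatedindexoperation ("
    "id INTEGER PRIMARY KEY, node_id TEXT, index_time TEXT)"
)
# node1 created issues, so it registered an index operation; node2 did not.
conn.execute(
    "INSERT INTO replicatedindexoperation (node_id, index_time) "
    "VALUES ('node1', '2016-09-30 16:20:00')"
)

print(nodes_without_index_operations(conn, ["node1", "node2"]))
```

In this toy setup node2 is reported as having no rows, which matches the state that makes its snapshot attempt fail in the steps above.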

Note

The problem is that node2 should not reply to the index request, because it does not satisfy the sanity check; see JRASERVER-66550.
With that check in place, only node1 remains as a potential index provider, which fixes the problem.
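
The sanity check described in this note amounts to filtering the candidate providers before any of them claims the request. A minimal Python sketch of that idea follows. The Node class, its fields, and eligible_index_providers are hypothetical names chosen for illustration and do not reflect Jira's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    available: bool
    # None models a node with no row in replicatedindexoperation (assumption).
    latest_operation_id: Optional[int]


def eligible_index_providers(nodes: List[Node], requesting_node_id: str) -> List[str]:
    """Return ids of nodes that could serve an index snapshot to the requester."""
    return [
        node.node_id
        for node in nodes
        if node.node_id != requesting_node_id
        and node.available
        and node.latest_operation_id is not None
    ]


cluster = [
    Node("node1", available=True, latest_operation_id=42),
    Node("node2", available=True, latest_operation_id=None),
    Node("node3", available=True, latest_operation_id=None),
]
print(eligible_index_providers(cluster, "node3"))
```

With the cluster from the steps to reproduce, only node1 passes the filter, so node3 can no longer be handed to a provider that is unable to create a snapshot.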

Workaround

option1

On the new node in the cluster (node 3, in this case), manually copy the indexes from node 2:

Go to Administration > System > Indexing.
Under Copy the Search Index from another node, select node 2 and copy.

Make sure the indexes on the chosen node are healthy by looking at the Indexing health check under the Administration > System > Support Tools > Health Checks section of that node.

option2

Touch (for example, edit) any issue on the newly created node, so that it passes the sanity check and is able to share its index.

option3

Perform a re-index operation on every new node brought up.

depends on

JRASERVER-66550 JIRA Datacenter - Add additional Lucene index checks before propagating index to other nodes (Gathering Interest)

is related to

JRASERVER-72125 Index replication service is paused indefinitely after failing to obtain an index snapshot from another node (Closed)

is resolved by: ASCI-8 You do not have permission to view this issue

mentioned in: Jira Health Check shows the message Index replication for cluster node "node" is behind by "number" seconds

relates to: DCKUBE-628

(18 mentioned in, 1 relates to)

Kevin Terminella added a comment - 13/Oct/2016 3:45 PM

Getting this fixed would help us out greatly as we are autoscaling the number of JDC nodes based on usage patterns and we have to do a lot of workarounds in order to ensure that instances are healthy before they get released to our load balancer.

Per my earlier comment, preventing a node from picking up the index request if it doesn't have a latest index operation would be a big help.

Kevin Terminella added a comment - 13/Oct/2016 3:41 PM

During our investigation, we found that this problem is bigger than we originally thought.

Any time more than one server is brought into service at the same time, this issue can be encountered. See the scenario below:

1) Server 1 is online, has a full index, and has a latest index operation
2) Servers 2 and 3 are brought online at roughly the same time
3) Server 2 comes up before Server 3 and gets its index from Server 1
4) Server 3 comes up and, when it requests an index, there is a 50% chance that Server 2 will try to fulfill the request, which won't work as it doesn't have a latest operation on its index

At a minimum, nodes that don't have a latest index operation should be prevented from trying to send their index to a requesting node. Even better would be to update the latest index operation when a node gets its index built from another node.
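
To put a number on the scenario above: assuming every node that can respond is equally likely to claim the request (an assumption implied by the 50% figure, not confirmed behavior), the chance of a failed restore is the share of responders that have no latest index operation. The Python function below is a hypothetical helper for that arithmetic only.

```python
from fractions import Fraction


def chance_of_failed_restore(with_operation: int, without_operation: int) -> Fraction:
    """Probability that a node without a latest index operation claims the request.

    Assumes each responding node is equally likely to claim it.
    """
    total = with_operation + without_operation
    if total == 0:
        raise ValueError("no responding nodes")
    return Fraction(without_operation, total)


# Scenario from the comment: Server 1 has an operation, Server 2 does not.
print(chance_of_failed_restore(1, 1))
# Autoscaling case: one seeded node and three freshly started nodes.
print(chance_of_failed_restore(1, 3))
```

Under this assumption the risk grows as more fresh nodes join before any of them registers an index operation, which is why the problem is more visible when autoscaling.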

Assignee: Unassigned
Reporter: Joao Palharini (Inactive)
Affected customers: 34
Watchers: 44
Created: 30/Sep/2016 7:43 PM
Updated: 17/Jan/2025 12:36 PM
Resolved: 05/Nov/2021 12:49 PM
