Jira Data Center / JRASERVER-62669

Automatic restore of indexes will fail if the node that registered the latest index operation is unavailable

      Summary

      When a node starts with outdated indexes (that is, the latest index operation in the cluster has not been replicated to it), it is supposed to restore its indexes automatically from the node that registered the latest index operation. However, if that node is unavailable, or if it never wrote a row to replicatedindexoperation, the restore fails and the starting node is left without the latest index updates. See the related issue JRASERVER-66550.

      Expected Behavior

      The indexes are copied from another node that has healthy indexes (one that has applied the latest index operation).

      Actual Behavior

      The index restore fails, and the following messages are written to the log of the node that claimed the index restore request:

      2016-09-30 16:25:33,487 ClusterMessageHandlerServiceThread:thread-1 INFO      [jira.index.ha.DefaultIndexCopyService] Index backup started. Requesting node: node3
      2016-09-30 16:25:33,488 ClusterMessageHandlerServiceThread:thread-1 WARN      [jira.index.ha.DefaultIndexCopyService] Index backup failed - latest index operation not found. Requesting node: node3
      

      Steps to Reproduce

      1. Set up a new JIRA instance, which will be node1.
      2. Create a new project and some issues.
      3. Set up another JIRA instance, which will be node2. This node restores its index from node1. Do not create or modify any issues on this node.
      4. Set up a third JIRA instance, which will be node3. This node might attempt to restore its index from node2, depending on which node claims the request first.
      5. If node2 claims the request, it fails to create an index snapshot because there are no rows in replicatedindexoperation with node_id = node2.
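
      The condition in step 5 can be checked with a simple row count per node. The sketch below is only an illustration: it uses an in-memory SQLite database with a simplified, assumed schema (only the table name replicatedindexoperation and the node_id column come from this issue), and the helper names are hypothetical. A real Jira database will have more columns and a different SQL dialect.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the Jira database (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE replicatedindexoperation ("
        "id INTEGER PRIMARY KEY, node_id TEXT NOT NULL)"
    )
    # node1 created issues, so it has recorded index operations.
    # node2 only copied its index from node1, so it has no rows of its own.
    conn.executemany(
        "INSERT INTO replicatedindexoperation (node_id) VALUES (?)",
        [("node1",), ("node1",)],
    )
    return conn


def has_index_operations(conn, node_id):
    """Return True if the node has written at least one row (hypothetical helper)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM replicatedindexoperation WHERE node_id = ?",
        (node_id,),
    ).fetchone()
    return row[0] > 0


def operations_per_node(conn):
    """Return {node_id: row_count} for every node that appears in the table."""
    cur = conn.execute(
        "SELECT node_id, COUNT(*) FROM replicatedindexoperation GROUP BY node_id"
    )
    return dict(cur.fetchall())


conn = make_demo_db()
print(has_index_operations(conn, "node1"))  # node1 has rows
print(has_index_operations(conn, "node2"))  # node2 has none, so its snapshot would fail
```

      In this model, a node for which has_index_operations returns False is one that would log "latest index operation not found" if it claimed a restore request.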

      Note

      The underlying problem is that node2 should not reply to the index request, because it does not satisfy the sanity check; see JRASERVER-66550.
      Fixing that would leave node1 as the only potential index provider, which resolves the problem.
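
      The effect of such a sanity check can be modelled in a few lines. This is a minimal sketch of the idea, not Jira's actual implementation: the function names and the op_counts mapping (node id to number of rows in replicatedindexoperation) are assumptions made for illustration.

```python
def backup_succeeds(provider, op_counts):
    """Model of the reported behaviour: a backup fails when the
    provider has no recorded index operation of its own."""
    return op_counts.get(provider, 0) > 0


def eligible_providers(online_nodes, requester, op_counts):
    """Model of the proposed sanity check: only nodes with at least one
    recorded index operation may answer an index request."""
    return [
        node
        for node in online_nodes
        if node != requester and op_counts.get(node, 0) > 0
    ]


# Scenario from the steps to reproduce: only node1 has recorded operations.
op_counts = {"node1": 3}
online_nodes = ["node1", "node2", "node3"]

# Without the check, node2 may claim node3's request and fail.
print(backup_succeeds("node2", op_counts))  # False

# With the check, node2 is filtered out and only node1 can respond.
print(eligible_providers(online_nodes, "node3", op_counts))  # ['node1']
```

      If no node passes the filter, the list is empty, which corresponds to the case where the only node with recorded operations is unavailable.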

      Workaround

      Option 1

      On the new node in the cluster (node3 in this case), manually copy the indexes from node2:

      1. Go to Administration > System > Indexing.
      2. Under "Copy the Search Index from another node", select node2 and copy.

      Make sure the indexes on the chosen node are healthy by checking the Indexing health check under Administration > System > Support Tools > Health Checks on that node.

      Option 2

      1. Touch (create or modify) an issue on the newly created node so that it passes the sanity check and can share its index.

      Option 3

      • Perform a re-index on every new node that is brought up.


            Kevin Terminella added a comment:

            Getting this fixed would help us out greatly, as we are autoscaling the number of JDC nodes based on usage patterns and we have to do a lot of workarounds to ensure that instances are healthy before they get released to our load balancer.

            Per my earlier comment, preventing a node from picking up the index request if it doesn't have a latest index operation would be a big help.

            Kevin Terminella added a comment:

            During our investigation, we found that this problem is bigger than we originally thought.

            Any time more than one server is brought into service at the same time, this issue could be encountered. See the scenario below:

            1) Server 1 is online, has a full index, and has a latest index operation.
            2) Servers 2 and 3 are brought online at roughly the same time.
            3) Server 2 comes up before Server 3 and gets its index from Server 1.
            4) Server 3 comes up, and when it requests an index there is a 50% chance that Server 2 will try to fulfill the request, which won't work because Server 2 doesn't have a latest operation on its index.

            At a minimum, nodes that don't have a latest index operation should be prevented from trying to send their index to a requesting node. Even better would be to update the latest index operation when a node gets its index built from another node.

              Assignee: Unassigned
              Reporter: Joao Palharini (jpalharini, Inactive)
              Affected customers: 34
              Watchers: 44