Jira Data Center / JRASERVER-47045

When a re-indexing node is abruptly stopped indexing still shows as in progress on other nodes, preventing future reindexing


Details

    • 6.03
    • 143
    • Severity 2 - Major
    • 688
      Atlassian Update - 6 June 2019

      Dear Jira users,

      We're glad to announce that this issue will be addressed in our upcoming 8.3.0 release!

      Jira DC will now recognize when a re-indexing node goes down and leaves a stale reindexing job in progress. Admins will be notified on the Re-Indexing page and prompted to remove the job, clearing the way to run future re-indexes.

      We plan to backport this fix to version 7.13 (our latest Enterprise Release version). Please watch the fix versions on this issue for further updates. This fix will not be backported to the 7.6 series; however, a couple of workarounds are available. Please see the Problem mitigation section under the Description of this issue.

      Thank you,
      Grazyna Kaszkur

      Product Manager,
      Jira Server and Data Center


    Description

      Summary

      When a re-indexing node goes down (terminated abruptly or crashed), the other nodes cannot tell that re-indexing is no longer in progress. Normal JIRA functionality is not affected; this is a UI problem, but it prevents you from running a new re-index.

      Please note that there is another similar problem related to project reindex, see JRASERVER-72055.

      Environment

      • JIRA Data Center
      • Apache Load Balancer

      Steps to Reproduce

      1. Set up a Jira Data Center cluster with at least 2 nodes
      2. Start re-indexing from one node
      3. Access the re-indexing administration page from the other nodes to see the Re-indexing progress summary
      4. Abruptly terminate the JIRA process on the re-indexing node
      5. Access re-indexing administration again on the node that is still online

      Expected Results

      This should show that the re-indexing process has been terminated, and should allow you to restart re-indexing if needed.

      Actual Results

      The online node(s) cannot detect that the re-indexing started on the other node has been terminated.
      You will see the following error in the UI: 'Lock JIRA and rebuild index' option is unavailable while other indexing operations are in progress.

      Notes
      • When a user starts a full re-index, JIRA schedules the job on a specific node and adds a non-cancellable task to a global cache. This cache is synced to every node.
      • When that node dies, the value remains in the cache, so all nodes see it and believe the re-index is still running.
      • A whole-cluster shutdown is needed to flush the cache.
      • Looking into the database from our test we can see that there is an entry in the replicatedindexoperation table such as the following for each re-index triggered:
        mysql> SELECT * FROM replicatedindexoperation;
        +-------+---------------------+---------+----------------+-------------+--------------+--------------------------+-------------------------+
        | ID    | INDEX_TIME          | NODE_ID | AFFECTED_INDEX | ENTITY_TYPE | AFFECTED_IDS | OPERATION                | FILENAME                |
        +-------+---------------------+---------+----------------+-------------+--------------+--------------------------+-------------------------+
        | 10500 | 2015-11-19 20:00:11 | tnode1  | ALL            | NONE        |              | FULL_REINDEX_START       |                         |
        | 10501 | 2015-11-19 20:32:02 | tnode1  | ALL            | NONE        |              | FULL_REINDEX_END         | IndexSnapshot_10500.zip |
        +-------+---------------------+---------+----------------+-------------+--------------+--------------------------+-------------------------+
        2 rows in set (0.00 sec)
        
        • There is another entry when the re-index ends. For the killed re-indexing node there is a FULL_REINDEX_START entry with no matching FULL_REINDEX_END entry.
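      The "start with no matching end" pattern described above can be checked with a query. Below is a minimal sketch; it uses Python's sqlite3 as a stand-in for the real Jira database, and pares the table down to the relevant columns, so treat it as illustration rather than a supported diagnostic:

```python
import sqlite3

# In-memory stand-in for Jira's replicatedindexoperation table
# (columns reduced to those relevant here).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE replicatedindexoperation (
    id INTEGER, index_time TEXT, node_id TEXT, operation TEXT)""")
rows = [
    (10500, "2015-11-19 20:00:11", "tnode1", "FULL_REINDEX_START"),
    (10501, "2015-11-19 20:32:02", "tnode1", "FULL_REINDEX_END"),
    # Hypothetical killed node: START with no later END.
    (10502, "2015-11-19 21:05:40", "tnode2", "FULL_REINDEX_START"),
]
conn.executemany("INSERT INTO replicatedindexoperation VALUES (?,?,?,?)", rows)

# A node left a stale re-index if it has a FULL_REINDEX_START
# with no later FULL_REINDEX_END for the same node.
stale = conn.execute("""
    SELECT s.node_id
    FROM replicatedindexoperation s
    WHERE s.operation = 'FULL_REINDEX_START'
      AND NOT EXISTS (
        SELECT 1 FROM replicatedindexoperation e
        WHERE e.node_id = s.node_id
          AND e.operation = 'FULL_REINDEX_END'
          AND e.id > s.id)
""").fetchall()
print(stale)  # [('tnode2',)]
```

      In this sketch tnode1's START is paired with a later END, so only tnode2, the node that was killed mid-reindex, is reported.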

      Problem mitigation

      To partially address the problem and mitigate some of the possible scenarios, we implemented two new features:

      • JRASERVER-68885 - Remove the stale indexing Job associated with the current node on startup
      • JRASERVER-68616 - As an JIRA Datacenter Administrator I want to delete the reindexing task from offline node

      Please check them for more details.
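      For illustration, the idea behind JRASERVER-68885 (remove the stale indexing job associated with the current node on startup) can be sketched as follows. This is a hypothetical model, not Jira's actual implementation: the names shared_tasks and clear_stale_tasks_on_startup are invented for the example.

```python
# Hypothetical model of the shared task cache that every node sees.
shared_tasks = {
    "task-1": {"node": "tnode1", "type": "FULL_REINDEX", "cancellable": False},
    "task-2": {"node": "tnode2", "type": "FULL_REINDEX", "cancellable": False},
}

CURRENT_NODE = "tnode2"  # this node just restarted after a crash

def clear_stale_tasks_on_startup(tasks, node_id):
    """Drop tasks left behind by this node's previous (crashed) run.

    Any task still attributed to this node at startup must be stale,
    because the node cannot have a re-index running before it starts.
    """
    stale = [tid for tid, task in tasks.items() if task["node"] == node_id]
    for tid in stale:
        del tasks[tid]
    return stale

removed = clear_stale_tasks_on_startup(shared_tasks, CURRENT_NODE)
print(removed)             # ['task-2']
print(list(shared_tasks))  # ['task-1']
```

      Note this only helps when the crashed node itself comes back; JRASERVER-68616 covers the case where an administrator must remove the task of a node that stays offline.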

      Workarounds

      1. Shut down all nodes in the cluster.
        • This is needed because JIRA keeps the cluster re-index status in memory (see the Notes section). Leaving one node up while restarting the others preserves the stale in-memory cache.
      2. Start the nodes again.

      Attachments

        1. image-2018-02-11-02-22-08-771.png
          30 kB
          Randy Zhu
        2. indexing_not_available.JPG
          44 kB
          Andriy Yakovlev [Atlassian]
        3. taskid.png
          251 kB
          Vinicius Fontes

    People

      sutecht Seth Utecht (Inactive)
      takindele Taiwo Akindele (Inactive)
      Votes: 71
      Watchers: 100
