Jira Data Center / JRASERVER-42916

Stale node ids should automatically be removed in Jira Data Center

Details

      Atlassian Update – 16 June 2020

      Hi everyone,

      Thank you for your votes and comments on this issue. We would like to inform you that this suggestion will be addressed in the upcoming Jira Data Center version 8.10.0 release.

We’ve decided to provide a more automated way of handling stale (No heartbeat) nodes in Jira Data Center. Before this change, if a node lost connection to the cluster for 5 minutes, its state changed from “Active” to “No heartbeat”. If such a node was not then moved to the “Offline” state, it could cause performance degradation.

      We’ve automated this process and the solution is as follows:

      • If a node is in the “No heartbeat” state for longer than 2 days, it will be automatically moved to the “Offline” state. Admins will be informed about this via a warning in the atlassian-jira.log file and will see the state on the Clustering page. During this period you can still check the node or restart it (a query for spotting such nodes is sketched below this list).
      • If a node is in the “Offline” state for longer than 2 days, it will be automatically removed from the cluster. You will also be informed about this action through info-level entries in your atlassian-jira.log file.
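
      For admins who prefer to check from the database side, here is a minimal query sketch, assuming PostgreSQL syntax, Jira 8.x column names, and heartbeat_time stored as epoch milliseconds (verify all three against your own instance):

        -- list nodes whose last heartbeat is older than 5 minutes
        select n.node_id, n.node_state, h.heartbeat_time
        from clusternode n
        left join clusternodeheartbeat h on h.node_id = n.node_id
        where h.heartbeat_time is null
           or h.heartbeat_time < (extract(epoch from now()) - 300) * 1000;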

      Additionally, based on the feedback we received in the comments below, in Jira Data Center version 8.11.0 we will be adding the possibility of adjusting the 2-day stale node retention period. You can find more details about this suggestion under this thread.
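
      As a rough sketch of what such configuration could look like once 8.11.0 ships (the property name below is hypothetical and for illustration only; check the 8.11.0 release notes for the actual setting):

        # bin/setenv.sh -- hypothetical property name, illustration only
        JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -Djira.cluster.stale.node.retention.hours=48"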

      Moreover, since Jira Data Center 8.6 we have been bringing more visibility to the nodes in your cluster by introducing the Clustering page in the admin panel. In the newly released Jira Data Center version 8.9 we extended this page with additional information about node statuses (Active, No heartbeat, Offline) and the Jira DC application status (maintenance, error, running, starting), so that stale nodes can be identified more easily.

      Lastly, the changes described above are integrated with the Advanced audit log functionality available in Jira Data Center since version 8.8. Any automatic actions will be logged to give admins more visibility into what is happening on their instances. For more details please go here.

      Thank you for voting and commenting on this suggestion,
      Grażyna Kaszkur
      Product manager, Jira Server and Data Center

    • We collect Jira feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

    Description

      NOTE: This suggestion is for JIRA Server. Using JIRA Cloud? See the corresponding suggestion.

      Problem Definition

      After changing the node id in the cluster.properties file, both the old and the new id will appear in the Cluster Nodes section of System information. The problem is worse on AWS, since it creates many new nodes and never reuses them.
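
      For reference, the node id is set per node in cluster.properties; a minimal example, with illustrative values:

        # <jira-local-home>/cluster.properties, one copy per node
        jira.node.id = node1
        jira.shared.home = /data/jira/shared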

      Suggested Solution

      We should find a way to clear out any old ids without removing entries that might belong to a temporarily offline node. One possible age-based approach is sketched below.
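
      A minimal sketch of such a cleanup, assuming PostgreSQL and heartbeat_time stored as epoch milliseconds (illustrative only, not a supported procedure; back up the database first):

        -- remove heartbeats older than 2 days, then any node left without one;
        -- recently offline nodes keep a fresh heartbeat row and are preserved
        delete from clusternodeheartbeat
        where heartbeat_time < (extract(epoch from now()) - 2 * 24 * 3600) * 1000;
        delete from clusternode
        where node_id not in (select node_id from clusternodeheartbeat);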

      Note

      Having old nodes in the system (the clusternode table) may cause other problems; see the related issues under Issue Links.

      Workaround

      • In a recent version of Jira we introduced a new REST API for managing the cluster state, which mitigates the problem. See JRASERVER-69033 (a curl sketch follows after the SQL steps below).
      • Clean up old data manually:
      1. Check the tables and find all rows related to old nodes:
        select * from clusternode;
        select * from clusternodeheartbeat;
        
      2. Delete the related records:
        delete from clusternode where node_id = '<node_id>';
        delete from clusternodeheartbeat where node_id = '<node_id>';
        
      3. Clean old replication records:
        -- check whether cleanup is necessary
        select count(id) from replicatedindexoperation where node_id = '<node_id>';
        -- delete the related records
        delete from replicatedindexoperation where node_id = '<node_id>';
        
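      For completeness, a curl sketch of the REST approach mentioned above (base URL, credentials, and node id are illustrative; the endpoint paths follow the cluster REST API introduced by JRASERVER-69033, so verify them against the REST documentation for your Jira version):

        # list all nodes and their states
        curl -u admin:password -X GET "https://jira.example.com/rest/api/2/cluster/nodes"
        # move a stale node offline, then delete it
        curl -u admin:password -X PUT "https://jira.example.com/rest/api/2/cluster/node/node2/offline"
        curl -u admin:password -X DELETE "https://jira.example.com/rest/api/2/cluster/node/node2"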

      Attachments

      Issue Links

      Activity

      People

      Assignee: Stasiu (ddudziak)
      Reporter: Andriy Yakovlev [Atlassian] (ayakovlev@atlassian.com)
      Votes: 168
      Watchers: 158

      Dates

      Created:
      Updated:
      Resolved: