  1. Jira Server and Data Center
  2. JRASERVER-72206

Improve Jira logging for NodeAutoShutdownIfOfflineService


    Description

      Problem Definition

      The fix for JRASERVER-42916 introduced a new service, NodeAutoShutdownIfOfflineService, which is responsible for shutting down a Jira node.
      As part of the online check, a node can be automatically flagged as OFFLINE, and Jira then needs to prevent cluster cache corruption. Once a node is in the OFFLINE state, the other nodes in the cluster stop cache replication. For this reason, each node keeps checking its own state.

      If a node detects that it was marked as OFFLINE, it exits in panic mode. Example logging:

      2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR      [c.a.j.cluster.service.NodeAutoShutdownIfOfflineService] [CLUSTER-STATE] This node NODE2 was moved to OFFLINE by another node. This node needs to be shut down as soon as possible.
      2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR      [c.a.jira.startup.JiraShutdown] This Jira instance was requested to exit in panic mode
      java.lang.Exception
      	at com.atlassian.jira.startup.JiraShutdown.panic(JiraShutdown.java:19)
      	at com.atlassian.jira.cluster.service.NodeAutoShutdownIfOfflineService.checkCurrentNodeState(NodeAutoShutdownIfOfflineService.java:78)
      	at com.atlassian.jira.cluster.lock.ClusterHeartbeatJob.run(ClusterHeartbeatJob.java:36)
      	at 
      

      It is not always clear why the node was marked OFFLINE, or what the current status of the Cluster Heartbeat is for the current node.

      Suggested Solution

      Add debug logging to the NodeAutoShutdownIfOfflineService class that shows details about the stale node state.
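
A minimal sketch of the kind of detail such debug logging could expose. This is not Jira's implementation: the `HeartbeatDiagnostics` class, its method names, the message layout, and the idea that heartbeat times are epoch milliseconds compared against a threshold are all assumptions made for illustration. It uses `java.util.logging` only to stay self-contained; the real service would use whatever logger Jira already uses.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Jira: formats the heartbeat details that
// would help explain why a node is treated as stale.
final class HeartbeatDiagnostics {
    private static final Logger LOG =
            Logger.getLogger(HeartbeatDiagnostics.class.getName());

    private HeartbeatDiagnostics() {}

    // Assumption: a node is stale when its last heartbeat is older than the threshold.
    static boolean isStale(long lastHeartbeatMillis, long nowMillis, Duration threshold) {
        return nowMillis - lastHeartbeatMillis > threshold.toMillis();
    }

    // Builds a single-line message with the values needed to diagnose the state.
    static String describe(String nodeId, long lastHeartbeatMillis,
                           long nowMillis, Duration threshold) {
        long ageMillis = nowMillis - lastHeartbeatMillis;
        return String.format(
                "[CLUSTER-STATE] node=%s lastHeartbeat=%s ageMs=%d thresholdMs=%d stale=%b",
                nodeId,
                Instant.ofEpochMilli(lastHeartbeatMillis),
                ageMillis,
                threshold.toMillis(),
                isStale(lastHeartbeatMillis, nowMillis, threshold));
    }

    // Logs at FINE (the java.util.logging counterpart of a debug level),
    // and skips the formatting work when that level is disabled.
    static void logState(String nodeId, long lastHeartbeatMillis,
                         long nowMillis, Duration threshold) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(nodeId, lastHeartbeatMillis, nowMillis, threshold));
        }
    }
}
```

Logging the last heartbeat, its age, and the threshold together would let an administrator see from the log alone whether the node had really stopped sending heartbeats before it was moved to OFFLINE.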

      Workaround

      Run the following queries manually to inspect the heartbeat table:

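      -- List all recorded heartbeats
      SELECT * FROM clusternodeheartbeat;
      -- List nodes with a heartbeat newer than the cutoff (replace the placeholder with a concrete value)
      SELECT NODE_ID FROM clusternodeheartbeat WHERE HEARTBEAT_TIME > '<NOW - 2 DAYS>';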
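
The `<NOW - 2 DAYS>` placeholder in the second query has to be replaced with a concrete value before running it. Here is a minimal sketch of computing that value, assuming HEARTBEAT_TIME stores epoch milliseconds. That is an assumption, so check the column type in your own database first. The `HeartbeatCutoff` class and its method are hypothetical names used only for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: computes the value to substitute for <NOW - 2 DAYS>.
final class HeartbeatCutoff {
    private HeartbeatCutoff() {}

    // Assumption: HEARTBEAT_TIME holds epoch milliseconds.
    static long cutoffMillis(Instant now, Duration window) {
        return now.minus(window).toEpochMilli();
    }

    public static void main(String[] args) {
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(2));
        // Prints the workaround query with the numeric cutoff filled in.
        System.out.println(
                "SELECT NODE_ID FROM clusternodeheartbeat WHERE HEARTBEAT_TIME > "
                        + cutoff + ";");
    }
}
```

If the column turns out to hold a timestamp type instead, the same `Instant` can be formatted in the database's date syntax rather than converted to milliseconds.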
      

            People

              Assignee: Unassigned
              Reporter: Andriy Yakovlev [Atlassian] (ayakovlev@atlassian.com)
              Votes: 0
              Watchers: 3
