Improve Jira logging for NodeAutoShutdownIfOfflineService


    • Type: Suggestion
    • Resolution: Unresolved
    • Component/s: Data Center - Other

      Problem Definition

      The fix for JRASERVER-42916 introduced a new service, NodeAutoShutdownIfOfflineService, which is responsible for shutting down a Jira node. As part of the online check, a node can be automatically flagged as OFFLINE, and Jira then needs to prevent corruption of the cluster caches. When a node is in the OFFLINE state, the other nodes in the cluster stop cache replication. For this reason, each node scans its own state.

      If a node detects that it was marked as OFFLINE, it exits in panic mode. Example logging:

      2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR      [c.a.j.cluster.service.NodeAutoShutdownIfOfflineService] [CLUSTER-STATE] This node NODE2 was moved to OFFLINE by another node. This node needs to be shut down as soon as possible.
      2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR      [c.a.jira.startup.JiraShutdown] This Jira instance was requested to exit in panic mode
      java.lang.Exception
      	at com.atlassian.jira.startup.JiraShutdown.panic(JiraShutdown.java:19)
      	at com.atlassian.jira.cluster.service.NodeAutoShutdownIfOfflineService.checkCurrentNodeState(NodeAutoShutdownIfOfflineService.java:78)
      	at com.atlassian.jira.cluster.lock.ClusterHeartbeatJob.run(ClusterHeartbeatJob.java:36)
      	at 
      

      Sometimes it is not clear why the node was marked OFFLINE or what the current status of the cluster heartbeat is for that node.

      Suggested Solution

      Add debug logging to NodeAutoShutdownIfOfflineService that shows details about the stale node state.
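
      To illustrate the kind of detail such a debug line could carry, here is a minimal sketch. It is not Jira code: the class StaleNodeDebug, the method describe, the field names in the message, and the staleness threshold are all hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; none of these names come from the Jira code base.
public class StaleNodeDebug {

    /**
     * Builds a debug line describing how old a node's last heartbeat is
     * and whether it exceeds an assumed staleness threshold.
     */
    static String describe(String nodeId, String state, Instant lastHeartbeat,
                           Instant now, Duration staleAfter) {
        Duration age = Duration.between(lastHeartbeat, now);
        boolean stale = age.compareTo(staleAfter) > 0;
        return String.format(
            "[CLUSTER-STATE] node=%s state=%s lastHeartbeat=%s ageMs=%d thresholdMs=%d stale=%b",
            nodeId, state, lastHeartbeat, age.toMillis(), staleAfter.toMillis(), stale);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-02-10T11:28:38Z");
        // Assumed values: last heartbeat 10 minutes ago, 5 minute threshold.
        Instant lastHeartbeat = now.minus(Duration.ofMinutes(10));
        System.out.println(
            describe("NODE2", "OFFLINE", lastHeartbeat, now, Duration.ofMinutes(5)));
    }
}
```

      Logging the heartbeat age next to the threshold would let an administrator see from the log alone whether the node was considered stale, without querying the database.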

      Workaround

      Run the following queries manually:

      SELECT * FROM clusternodeheartbeat;
      SELECT NODE_ID FROM clusternodeheartbeat WHERE HEARTBEAT_TIME > '<NOW - 2 DAYS>';
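
      The '<NOW - 2 DAYS>' placeholder has to be replaced with a concrete value. The sketch below computes one, assuming that HEARTBEAT_TIME is stored as epoch milliseconds; this is an assumption that should be checked against the actual column type first. The class HeartbeatCutoff and its methods are hypothetical names used only for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for filling in the <NOW - 2 DAYS> placeholder.
public class HeartbeatCutoff {

    /** Returns the cutoff as epoch milliseconds (assumed storage format). */
    static long cutoffMillis(Instant now, Duration window) {
        return now.minus(window).toEpochMilli();
    }

    /** Renders the second workaround query with a numeric cutoff. */
    static String query(long cutoffMillis) {
        return "SELECT NODE_ID FROM clusternodeheartbeat WHERE HEARTBEAT_TIME > "
            + cutoffMillis + ";";
    }

    public static void main(String[] args) {
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(2));
        System.out.println(query(cutoff));
    }
}
```

      If the column turns out to use a different representation, only cutoffMillis needs to change.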
      

            Assignee: Unassigned
            Reporter: Andriy Yakovlev [Atlassian]
            Votes: 1
            Watchers: 3
