- Suggestion
- Resolution: Unresolved
- None
- None
- 4
Problem Definition
As part of the fix for JRASERVER-42916, Jira has a new service, NodeAutoShutdownIfOfflineService.
It is responsible for shutting down a Jira node that has been marked OFFLINE. As part of the online check, a node can be automatically flagged as OFFLINE. Once a node is in the OFFLINE state, the other nodes in the cluster stop cache replication, so Jira needs to prevent cluster cache corruption. For this reason, each node checks its own state.
If a node detects that it was marked as OFFLINE, it exits in panic mode. Example logging:
2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR [c.a.j.cluster.service.NodeAutoShutdownIfOfflineService] [CLUSTER-STATE] This node NODE2 was moved to OFFLINE by another node. This node needs to be shut down as soon as possible.
2021-02-10 11:28:38,632+0000 heartbeat-scheduler-0 ERROR [c.a.jira.startup.JiraShutdown] This Jira instance was requested to exit in panic mode
java.lang.Exception
    at com.atlassian.jira.startup.JiraShutdown.panic(JiraShutdown.java:19)
    at com.atlassian.jira.cluster.service.NodeAutoShutdownIfOfflineService.checkCurrentNodeState(NodeAutoShutdownIfOfflineService.java:78)
    at com.atlassian.jira.cluster.lock.ClusterHeartbeatJob.run(ClusterHeartbeatJob.java:36)
    at
Sometimes it is not clear why the node was marked OFFLINE or what the current status of the cluster heartbeat is for that node.
Suggested Solution
Add debug logging to NodeAutoShutdownIfOfflineService that shows details about the stale node state.
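To illustrate the kind of output this suggestion asks for, here is a minimal sketch. It is not the actual NodeAutoShutdownIfOfflineService code. The class NodeStateDebugSketch, the NodeSnapshot holder, and the describe method are hypothetical names, and java.util.logging is used only to keep the example self-contained (Jira itself logs through its own logging setup). The idea is to log the node id, its recorded state, and the age of its last heartbeat just before the OFFLINE decision is made.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.logging.Level;
import java.util.logging.Logger;

public class NodeStateDebugSketch {
    private static final Logger LOG = Logger.getLogger(NodeStateDebugSketch.class.getName());

    /** Hypothetical snapshot of what the check knows about the current node. */
    static final class NodeSnapshot {
        final String nodeId;
        final String state;          // e.g. "ACTIVE" or "OFFLINE"
        final Instant lastHeartbeat; // last heartbeat recorded for this node

        NodeSnapshot(String nodeId, String state, Instant lastHeartbeat) {
            this.nodeId = nodeId;
            this.state = state;
            this.lastHeartbeat = lastHeartbeat;
        }
    }

    /** Builds the debug line: node id, state, last heartbeat and its age in seconds. */
    static String describe(NodeSnapshot node, Instant now) {
        Duration age = Duration.between(node.lastHeartbeat, now);
        return String.format(
                "[CLUSTER-STATE] node=%s state=%s lastHeartbeat=%s heartbeatAgeSeconds=%d",
                node.nodeId, node.state, node.lastHeartbeat, age.getSeconds());
    }

    /** Logs the details at debug (FINE) level, then reports whether the node is OFFLINE. */
    static boolean checkCurrentNodeState(NodeSnapshot node, Instant now) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(node, now));
        }
        return "OFFLINE".equals(node.state);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-02-10T11:28:38Z");
        NodeSnapshot node = new NodeSnapshot("NODE2", "OFFLINE", Instant.parse("2021-02-10T11:26:38Z"));
        System.out.println(describe(node, now));
        System.out.println("offline=" + checkCurrentNodeState(node, now));
    }
}
```

With a line like this in the log, the heartbeat age at the moment of the panic exit would be visible without querying the database.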
Workaround
Run queries manually
SELECT * FROM clusternodeheartbeat;
SELECT NODE_ID FROM clusternodeheartbeat WHERE (HEARTBEAT_TIME > '<NOW - 2 DAYS>');
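The '<NOW - 2 DAYS>' placeholder has to be filled in by hand. The sketch below shows one way to compute it, under the assumption that HEARTBEAT_TIME holds a Unix epoch value in milliseconds. That assumption is not confirmed by this ticket, so check the column type in your database first. HeartbeatCutoff, cutoffMillis, and staleCheckQuery are hypothetical helper names.

```java
import java.time.Duration;
import java.time.Instant;

public class HeartbeatCutoff {

    /** Returns the epoch-millisecond timestamp that lies `window` before `now`. */
    static long cutoffMillis(Instant now, Duration window) {
        return now.minus(window).toEpochMilli();
    }

    /** Fills the cutoff into the second workaround query (assumes a numeric column). */
    static String staleCheckQuery(long cutoff) {
        return "SELECT NODE_ID FROM clusternodeheartbeat WHERE (HEARTBEAT_TIME > " + cutoff + ");";
    }

    public static void main(String[] args) {
        // NOW - 2 DAYS, as in the workaround above
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(2));
        System.out.println(staleCheckQuery(cutoff));
    }
}
```

Nodes returned by the generated query have a heartbeat newer than the cutoff. A node missing from the result has not sent a heartbeat within the window.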
- is related to
- JRASERVER-42916 Stale node ids should automatically be removed in Jira Data Center
- Closed