Cluster Cache replication health check fails with java.net.SocketException: Broken pipe


    • 7.13
    • Severity 3 - Minor

      Issue Summary

      Cluster Cache replication health check fails and the nodes cannot communicate with each other to replicate cache.

      Name: Cluster Cache Replication
      NodeId: null
      Is healthy: false
      Failure reason: ["The node node3 is not replicating","The node node2 is not replicating"]
      Severity: CRITICAL
      

      However, the exception in the atlassian-jira.log is generic and gives few details about the cause:

      LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is: 
          	java.net.SocketException: Broken pipe (Write failed)
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.MarshalException: error marshalling arguments; nested exception is: 
      	java.net.SocketException: Broken pipe (Write failed)
      	at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)

      Steps to Reproduce

      1. Set up a Jira Data Center cluster with 2 or more nodes
      2. Add two entries to the /etc/hosts file mapping the same hostname to both an external and an internal IP, for example:
        172.20.40.245 node01
        127.0.0.1 node01
        

        The reason for this behavior is a duplicate entry in the /etc/hosts file that maps the same hostname to both an external and an internal IP, causing a communication loop.
        The problem can be explained as follows:

      • Each node uses its own hostname to advertise itself to the other nodes
      • When the hosts file contains duplicated entries, the hostname is resolved to the loopback IP 127.0.1.1
      • As a result, the master node sees 127.0.1.1 trying to communicate with it and recognizes that IP as itself rather than as the secondary node
      • The same happens on the secondary node
      • The nodes are therefore unable to communicate with each other and end up in a loop
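The faulty resolution above can be demonstrated with a small standalone check. This is an illustrative sketch, not part of Jira; the class name is invented, and the default hostname `node01` comes from the /etc/hosts example in the steps above:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative diagnostic sketch (not an Atlassian tool): checks whether a
// given node hostname resolves to a loopback address, which is the condition
// that triggers the replication loop described above.
public class NodeHostnameCheck {

    // Returns true when the hostname resolves to a loopback IP (127.x.x.x).
    static boolean resolvesToLoopback(String hostname) throws UnknownHostException {
        return InetAddress.getByName(hostname).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // "node01" matches the example hosts-file mapping from the steps above.
        String hostname = args.length > 0 ? args[0] : "node01";
        InetAddress resolved = InetAddress.getByName(hostname);
        System.out.println(hostname + " -> " + resolved.getHostAddress());
        if (resolved.isLoopbackAddress()) {
            System.out.println("WARNING: hostname resolves to a loopback address;"
                + " peer nodes will treat replication traffic from it as their own.");
        }
    }
}
```

Run on an affected node, this prints the loopback warning because the duplicated /etc/hosts entry wins the lookup.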

      Expected Results

      The error message should provide more detail about the problem, i.e. some indication of the host communication failure or resolution loop.

      Actual Results

      The error messages are generic; the cache replication errors alone give no indication that the hostname resolves to both a localhost address and an external address.

      Workaround

      Check /etc/hosts file for duplicated entries, for example:

      127.0.1.1 ip-node-1
      127.0.0.1 ip-node-1
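The duplicate-entry check can also be scripted. The following standalone sketch (class name and behaviour are illustrative assumptions, not an Atlassian utility) lists hostnames that the hosts file maps to more than one IP:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

// Illustrative sketch of the workaround check: reports hostnames that appear
// in /etc/hosts under more than one IP address (e.g. both a loopback and an
// external IP), which is the duplication described in this issue.
public class HostsFileCheck {

    // Parses hosts-file lines and returns only hostnames mapped to 2+ IPs.
    static Map<String, Set<String>> duplicateHostnames(List<String> lines) {
        Map<String, Set<String>> ipsByHost = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] fields = trimmed.split("\\s+");
            // fields[0] is the IP; the remaining fields are hostnames/aliases.
            for (int i = 1; i < fields.length; i++) {
                ipsByHost.computeIfAbsent(fields[i], h -> new TreeSet<>()).add(fields[0]);
            }
        }
        ipsByHost.values().removeIf(ips -> ips.size() < 2);
        return ipsByHost;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/etc/hosts";
        duplicateHostnames(Files.readAllLines(Paths.get(path))).forEach((host, ips) ->
            System.out.println(host + " is mapped to multiple IPs: " + ips));
    }
}
```

On a node with the duplicated entries shown above, this would flag ip-node-1 as mapped to both 127.0.1.1 and 127.0.0.1.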
      

      You may also check:
      JIRA - Only One Node Will Start in Cluster

            Assignee:
            Benjamin Suess
            Reporter:
            Victoria M
            Votes:
            2
            Watchers:
            9
