Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 7.2.12, 7.6.1
Affects Version/s: 7.2.11, 7.5.3
Component/s: Data Center - Other
Labels: pse-request

Fixed in Long Term Support Release/s:

Download 7.6
Introduced in Version: 7.02
Symptom Severity: Severity 1 - Critical
Bug Fix Policy: View Atlassian Server bug fix policy

Summary

In Jira Data Center, cache replication should use the value of ehcache.listener.socketTimeoutMillis from clustered.properties (or the default value) as the read timeout for remote RMI calls to other nodes in the cluster. Instead, an infinite timeout is used, so communication problems with a single node can bring the entire cluster down.
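As an illustration of the intended lookup, here is a minimal sketch of resolving the timeout from a set of properties and falling back to a default. The class and method names are hypothetical, and the 5000 ms fallback is an assumption taken from the "Note on fix" section; this is not Jira's actual configuration code.

```java
import java.util.Properties;

// Hypothetical helper, not part of Jira or Ehcache.
class ReplicationTimeoutConfig {
    // Property name as given in this issue.
    static final String KEY = "ehcache.listener.socketTimeoutMillis";
    // Assumed default (5s), based on the "Note on fix" section of this issue.
    static final int ASSUMED_DEFAULT_MILLIS = 5_000;

    /** Returns the configured timeout, or the assumed default if the value is missing or invalid. */
    static int resolveTimeoutMillis(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return ASSUMED_DEFAULT_MILLIS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // A socket timeout of 0 means "wait forever", which is the bug described here,
            // so non-positive values fall back to the default in this sketch.
            return value > 0 ? value : ASSUMED_DEFAULT_MILLIS;
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT_MILLIS;
        }
    }
}
```

Rejecting non-positive values is a design choice of this sketch: it keeps a misconfigured entry from silently restoring the infinite wait.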

Environment

Jira Data Center with multiple nodes.
Node A is unresponsive because of extremely high load, high memory pressure, or any other condition that stops it from responding. In this state the node is not technically down: it is still registered as an 'Active' member of the cluster, but it is not processing requests.
Node B still considers Node A 'Active', so it keeps sending cache synchronisation requests to Node A. Node A does not respond to them, which leaves Node B stalled.
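The scenario above can be reproduced in miniature with plain sockets. The sketch below is only an analogy, not Jira or Ehcache code: a hypothetical "Node A" listens so that the TCP connection succeeds, but it never sends a byte, and "Node B" reads from it with a finite read timeout. With a timeout of 0 (infinite) the same read would block indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; names are illustrative only.
class UnresponsivePeerDemo {

    /** Returns true if the read was aborted by the read timeout. */
    static boolean readTimesOut(int readTimeoutMillis) throws IOException {
        // "Node A": the OS completes the TCP handshake through the listen backlog,
        // but the application never accepts the connection or replies.
        try (ServerSocket nodeA = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket nodeB = new Socket()) {
            nodeB.connect(new InetSocketAddress(nodeA.getInetAddress(), nodeA.getLocalPort()), 1000);
            // Finite read timeout; a value of 0 would mean "wait forever".
            nodeB.setSoTimeout(readTimeoutMillis);
            InputStream in = nodeB.getInputStream();
            try {
                in.read(); // blocks, because Node A never sends anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // the caller regains control instead of hanging
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```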

Symptoms

Expected behaviour

TCP and RMI handshakes throw an exception once the specified timeout has passed.

Workarounds

Restart or gracefully shut down the unresponsive node.

Note on fix

The fix makes EhCache replication use finite timeouts (5s by default) for TCP and RMI handshakes.
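For readers who want to see what finite timeouts on RMI connections can look like in general, here is a minimal sketch using the standard java.rmi.server.RMISocketFactory extension point. The class name is hypothetical, and this is one generic way to bound connect and read times on RMI client sockets; it is not claimed to be the change Atlassian shipped.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Hypothetical factory: applies one finite timeout to both connect and read.
class TimeoutRmiSocketFactory extends RMISocketFactory {
    private final int timeoutMillis;

    TimeoutRmiSocketFactory(int timeoutMillis) {
        if (timeoutMillis <= 0) {
            // 0 would mean "infinite" for both connect and read.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            socket.setSoTimeout(timeoutMillis);                               // bounds every blocking read
            socket.connect(new InetSocketAddress(host, port), timeoutMillis); // bounds the TCP connect
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
}
```

Such a factory would be installed with `RMISocketFactory.setSocketFactory(new TimeoutRmiSocketFactory(5000));`, which the JVM allows only once per process, so it affects all RMI traffic that does not use its own socket factory.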

is related to

JRASERVER-66237 ehcache.listener.socketTimeoutMillis is not used during Naming.lookup of CachePeer

Closed

relates to

JRASERVER-63137 JVM instability at one node affects whole JIRA datacenter cluster

Closed

DELTA-127

DELTA-148


Assignee: Unassigned

Reporter: Karol Lopacinski

Votes: 0

Watchers: 3

Created: 21/Nov/2017 9:31 AM

Updated: 22/May/2020 8:23 AM

Resolved: 21/Nov/2017 1:14 PM
