- Bug
- Resolution: Unresolved
- Low
- None
- 6.6.3, 6.13.8, 7.19.17
- 16
- Severity 2 - Major
- 5
Issue Summary
In a Data Center environment, an administrator may find that Confluence Admin > Clustering shows nodes as unable to reach one another, with errors like:
The node [xxxxxx] is temporarily not reachable. Please check the server logs.
The actual cluster is up and running, despite what the UI is suggesting. However, the message in the UI is alarming to administrators and should be corrected.
Environment
This issue was first observed in a 2-node Confluence Data Center 6.6.3 cluster using the AWS cluster join method. Both nodes were running Java 8 update 162:
<java.runtime.version>1.8.0_162-b12</java.runtime.version>
Steps to Reproduce
Unknown. The issue may be intermittent, as the serialVersionUID appears to be auto-generated by the JVM at run time.
Expected Results
Cluster Monitoring UI shows that nodes are able to communicate with each other.
Actual Results
Cluster Monitoring UI shows that nodes cannot reach one another. However, the actual cluster itself is up and running, despite what this UI is saying.
The logs contain the following corresponding warnings:
2019-02-13 09:22:37,800 WARN [ajp-nio-127.0.0.1-8009-exec-194] [cluster.hazelcast.monitoring.HazelcastClusterMonitoring] getData Exception happened when receiving response from node 438b4c58
 -- referer: https://example.confluence.com:9443/plugins/servlet/cluster-monitoring | url: /rest/atlassian-cluster-monitoring/cluster/suppliers/data/com.atlassian.cluster.monitoring.cluster-monitoring-plugin/runtime-information/438b4c58 | traceId: 9c5a10920fd04be5 | userName: admin
java.util.concurrent.ExecutionException: com.hazelcast.nio.serialization.HazelcastSerializationException: java.io.InvalidClassException: com.atlassian.confluence.cluster.hazelcast.monitoring.RemoteModuleCallable; local class incompatible: stream classdesc serialVersionUID = 597473803974431210, local class serialVersionUID = 2184010817253012516
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:357)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
    at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:71)
    at com.atlassian.confluence.cluster.hazelcast.monitoring.HazelcastClusterMonitoring.getData(HazelcastClusterMonitoring.java:79)
    ...
Caused by: com.hazelcast.nio.serialization.HazelcastSerializationException: java.io.InvalidClassException: com.atlassian.confluence.cluster.hazelcast.monitoring.RemoteModuleCallable; local class incompatible: stream classdesc serialVersionUID = 597473803974431210, local class serialVersionUID = 2184010817253012516
    ...
Caused by: java.io.InvalidClassException: com.atlassian.confluence.cluster.hazelcast.monitoring.RemoteModuleCallable; local class incompatible: stream classdesc serialVersionUID = 597473803974431210, local class serialVersionUID = 2184010817253012516
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1876)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2033)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
    at com.hazelcast.nio.serialization.DefaultSerializers$ObjectSerializer.read(DefaultSerializers.java:201)
    at com.hazelcast.nio.serialization.StreamSerializerAdapter.read(StreamSerializerAdapter.java:41)
    at com.hazelcast.nio.serialization.SerializationServiceImpl.toObject(SerializationServiceImpl.java:276)
    ...
2019-02-13 09:22:37,802 WARN [ajp-nio-127.0.0.1-8009-exec-194] [cluster.monitoring.rest.ClusterMonitoringResource] getDataProviderInformationForNode Error received when querying remote node [438b4c58]: -- referer: https://example.confluence.com:9443/plugins/servlet/cluster-monitoring | url: /rest/atlassian-cluster-monitoring/cluster/suppliers/data/com.atlassian.cluster.monitoring.cluster-monitoring-plugin/runtime-information/438b4c58 | traceId: 9c5a10920fd04be5 | userName: admin
Notes
Some notes from Development review:
We should set the `serialVersionUID` explicitly in the class `RemoteModuleCallable` instead of letting it be auto-generated. The generated ID is usually the same on both nodes, but because the JVM computes it internally, minute differences in the environment (or sheer bad luck) can cause different nodes to generate different IDs.
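As a rough illustration of the suggested fix, the sketch below declares an explicit `serialVersionUID` on a serializable `Callable` and reads it back with `ObjectStreamClass`. The class names `SerialVersionUidSketch`, `PinnedCallable` and `UnpinnedCallable` are hypothetical stand-ins, and the value `1L` is an arbitrary choice; none of this is the actual `RemoteModuleCallable` source.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.concurrent.Callable;

class SerialVersionUidSketch {

    // Hypothetical stand-in for RemoteModuleCallable (not the real Confluence class).
    static class PinnedCallable implements Callable<String>, Serializable {
        // Explicit ID: every node uses this constant instead of a computed hash.
        private static final long serialVersionUID = 1L;

        @Override
        public String call() {
            return "ok";
        }
    }

    // Same shape without an explicit ID: the JVM derives one from the class structure.
    static class UnpinnedCallable implements Callable<String>, Serializable {
        @Override
        public String call() {
            return "ok";
        }
    }

    // Returns the serialVersionUID the serialization runtime will use for the given class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // The pinned value is the declared constant; the unpinned value is computed
        // and is the one that can differ between nodes, as in the logs above.
        System.out.println("pinned   = " + uidOf(PinnedCallable.class));
        System.out.println("unpinned = " + uidOf(UnpinnedCallable.class));
    }
}
```

Running a helper like `uidOf` against the same class on each node could also serve as a quick check of whether the nodes disagree on the computed ID.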
Workaround
Restarting Confluence may help, but due to the random nature of this problem, it is not 100% guaranteed to resolve the issue.
I confirm that this was our problem. I switched the old node from the AppDynamics agent to the Datadog APM agent and restarted Confluence on that node, and the error in the GUI and the logs is gone.