- Type: Bug
- Resolution: Fixed
- Priority: Medium
- Affects Version/s: 6.3.4, 7.4.1, 7.4.7, 7.19.1, 7.19.17, 8.5.14
- Component/s: Core - Workbox
- Severity 2 - Major
Issue Summary
When Confluence is linked to another application, such as Jira, through an Application Link and that application becomes unresponsive, Confluence can gradually become unresponsive as well (it returns HTTP 503 errors when accessed through the load balancer).
The issue continues to occur even after a Confluence restart.
In addition, the following stuck-thread detection messages are logged in catalina.out:
WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread "http-nio-8090-exec-147" (id=5169) has been active for 63,774 milliseconds (since 11/6/18 7:42 PM) to serve the same request for https://example.atlassian.com/rest/mywork/latest/status/notification/count?_=1541562126288 and may be stuck (configured threshold for this StuckThreadDetectionValve is 60 seconds). There is/are 185 thread(s) in total that are monitored by this Valve and may be stuck.
 java.lang.Throwable
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
Thread dumps taken during the incident window show a large number of long-running HTTP threads, most of them stuck waiting on an SSL connection.
Specifically, these RUNNABLE threads are serving Confluence's Workbox notifications:
$ grep -A1 '-exec-' conf_threads.* | grep State | sort | uniq -c
    200 conf_threads.1541570464.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570464.txt- java.lang.Thread.State: WAITING (parking)
    200 conf_threads.1541570479.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570479.txt- java.lang.Thread.State: WAITING (parking)
    200 conf_threads.1541570490.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570490.txt- java.lang.Thread.State: WAITING (parking)
    200 conf_threads.1541570501.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570501.txt- java.lang.Thread.State: WAITING (parking)
    200 conf_threads.1541570512.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570512.txt- java.lang.Thread.State: WAITING (parking)
    200 conf_threads.1541570523.txt- java.lang.Thread.State: RUNNABLE
     10 conf_threads.1541570523.txt- java.lang.Thread.State: WAITING (parking)
http-nio-8090-exec-200 - priority:5 - threadId:0x00007f6f9e2e6800 - nativeId:0x355e - state:RUNNABLE
stackTrace:
java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
    - locked <0x00000004f564e008> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
    - locked <0x00000004f564e018> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
    ...
    at com.atlassian.sal.core.net.HttpClientRequest.executeAndReturn(HttpClientRequest.java:103)
    at com.atlassian.plugins.rest.module.jersey.JerseyRequest.executeAndReturn(JerseyRequest.java:131)
    at com.atlassian.applinks.core.auth.ApplicationLinkRequestAdaptor.execute(ApplicationLinkRequestAdaptor.java:58)
    at com.atlassian.applinks.oauth.auth.OAuthRequest.execute(OAuthRequest.java:58)
    at com.atlassian.mywork.host.service.AppLinkHelper.execute(AppLinkHelper.java:64)
    at com.atlassian.mywork.host.service.AppLinkHelper.execute(AppLinkHelper.java:42)
    at com.atlassian.mywork.host.service.ClientServiceImpl.verifyAuth(ClientServiceImpl.java:171)
    at com.atlassian.mywork.host.service.ClientServiceImpl.verifyAuth(ClientServiceImpl.java:136)
    ...
    at com.atlassian.mywork.host.service.LocalNotificationServiceImpl.loadCount(LocalNotificationServiceImpl.java:385)
    at com.atlassian.mywork.host.service.LocalNotificationServiceImpl.lambda$_getCount$2(LocalNotificationServiceImpl.java:375)
    at com.atlassian.mywork.host.service.LocalNotificationServiceImpl$$Lambda$1096/27140524.get(Unknown Source)
    ...
    - locked <0x0000000438ab9e90> (a org.apache.tomcat.util.net.NioChannel)
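The per-dump tally above can also be produced with a small script, which makes it easy to additionally count how many request threads are sitting in the TLS handshake. The following is a minimal sketch, assuming raw jstack-style dumps in which thread entries are separated by blank lines and the `java.lang.Thread.State:` line follows the thread header; the function name `summarize_exec_threads` is hypothetical and not part of any Atlassian tooling.

```python
import re
from collections import Counter

# Matches the state line that jstack prints under each thread header.
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s*(.+)")


def summarize_exec_threads(dump_text):
    """Count states of '-exec-' threads and how many are in a TLS handshake.

    Assumes jstack-style text where thread entries are separated by blank lines.
    Returns (Counter of state -> count, number of threads in startHandshake).
    """
    states = Counter()
    in_handshake = 0
    for entry in re.split(r"\n\s*\n", dump_text):
        lines = entry.strip().splitlines()
        # Only consider HTTP request threads such as http-nio-8090-exec-200.
        if not lines or "-exec-" not in lines[0]:
            continue
        match = STATE_RE.search(entry)
        if match:
            states[match.group(1).strip()] += 1
        # The frame seen in the stack traces of the stuck threads.
        if "SSLSocketImpl.startHandshake" in entry:
            in_handshake += 1
    return states, in_handshake
```

A possible use is `summarize_exec_threads(open("conf_threads.1541570464.txt").read())`; a handshake count close to the RUNNABLE count would match the pattern described in this report.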
The issue occurs because each Workbox notification request waits to connect to the linked application (e.g. Jira). While those requests are blocked they hold HTTP request threads, so Confluence runs out of free threads to serve users, which leads to performance degradation.
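The starvation mechanism can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Confluence's code and not the fix that was shipped: a small thread pool stands in for the HTTP request threads, an event that is never set stands in for the unresponsive linked application, and every name here (`user_request_served`, `notification_count`, `outbound_timeout`) is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def user_request_served(pool_size, stuck_calls, outbound_timeout, deadline=0.5):
    """Return True if a user request gets a worker within `deadline` seconds.

    outbound_timeout=None models an outbound call that waits indefinitely.
    """
    remote_responded = threading.Event()  # stays unset while the linked app is silent

    def notification_count():
        # Stands in for a notification-count request blocked on the linked app.
        remote_responded.wait(timeout=outbound_timeout)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(stuck_calls):
            pool.submit(notification_count)
        # An ordinary user request queued behind the blocked calls.
        user = pool.submit(lambda: "ok")
        try:
            return user.result(timeout=deadline) == "ok"
        except FutureTimeout:
            return False
        finally:
            # Release blocked workers so the pool can shut down cleanly.
            remote_responded.set()
```

With `outbound_timeout=None` and at least `pool_size` stuck calls, the user request is not served before the deadline; with a short outbound timeout the workers free up and the same request succeeds. Real request pools and HTTP clients behave in more complex ways, so this only shows the shape of the problem.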
Workaround
Address the linked application's unavailability first, since that is what triggers this bug.
The following diagnostic steps can help:
- Make sure that the linked application is configured correctly on the Application Links page.
- Ensure that Confluence can reach all linked instances (via Application Link) over HTTPS. Check this by using httpclienttest.
- If the other end (e.g. Jira) is down or no longer in service, remove the application link.
After resolving the issue above, you'll need to restart Confluence for the threads to be released and available to users again.
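As a quick first check from the Confluence host, the HTTPS reachability step can be approximated with a short script. This is a minimal sketch and not a substitute for httpclienttest: it uses Python's default trust store rather than the Confluence JVM's truststore, so a certificate result here may differ from what Confluence sees. The function name `check_https_reachable` and the 5-second default timeout are assumptions chosen for illustration.

```python
import socket
import ssl


def check_https_reachable(host, port=443, timeout=5.0, context=None):
    """Attempt a TCP connect and TLS handshake within `timeout` seconds.

    Returns (True, negotiated TLS version) on success,
    or (False, error description) on failure or timeout.
    """
    context = context or ssl.create_default_context()
    try:
        # The socket timeout also bounds the TLS handshake that follows.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # covers timeouts, refused connections, and ssl.SSLError
        return False, f"{type(exc).__name__}: {exc}"
```

For example, `check_https_reachable("jira.example.com")` (a placeholder hostname) returning `(False, ...)` with a timeout message would suggest the same stalled handshake seen in the thread dumps.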