Summary of Issue
When Crowd connects to Azure AD and the connection hangs, the connection never times out. The LDAP Sync thread and various other threads then remain blocked indefinitely.
Symptoms
- Users and groups stop syncing from Azure AD into Crowd
- The synchronisation status for the User Directory is frozen
Expected result
Any connection failures from Crowd (or the adal4j library) to Azure AD should time out and retry the connection later.
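The expected behaviour can be sketched as a bounded call wrapped in a retry loop. This is a minimal illustration only: `RetryingCall`, `callWithRetry` and its parameters are hypothetical names and are not part of Crowd or adal4j, and in Crowd the retry could equally be left to the next scheduled synchronisation.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Sketch only: hypothetical helper, not Crowd or adal4j code. */
final class RetryingCall {

    /**
     * Runs the call up to maxAttempts times, sleeping delayMillis between attempts.
     * Only IOException (which includes SocketTimeoutException) triggers a retry;
     * any other failure is propagated immediately.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again later
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

The loop only helps if the wrapped call can actually fail: with an infinite socket timeout the first attempt never returns, so a finite timeout on the underlying connection is a precondition for any retry.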
Actual result
The LDAP sync thread initiates the sync and then blocks on the SSL connection to Azure AD.
Based on the thread dumps, all HTTP threads also appear to be blocked on the SSL connection to Azure AD.
Diagnosis
Capture thread dumps of the Crowd process.
The Caesium thread running the LDAP sync will be blocked, waiting on a FutureTask:
"Caesium-2-2" daemon prio=5 tid=0x000000000000004b nid=0 waiting on condition
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000003461f0d7> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.atlassian.crowd.directory.cache.DeltaQueryCacheRefresher.synchroniseChanges(DeltaQueryCacheRefresher.java:145)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.synchroniseCache(DbCachingRemoteDirectory.java:961)
at com.atlassian.crowd.manager.directory.DirectorySynchroniserImpl.synchronise(DirectorySynchroniserImpl.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
at com.sun.proxy.$Proxy66.synchronise(Unknown Source)
at com.atlassian.crowd.directory.DbCachingDirectoryPoller.pollChanges(DbCachingDirectoryPoller.java:45)
at com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerJobRunner.runJob(DirectoryPollerJobRunner.java:85)
at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:153)
at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:118)
at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:97)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:443)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:438)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:462)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:390)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:285)
at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:282)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:65)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59)
at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34)
at java.lang.Thread.run(Thread.java:745)
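The trace above parks in FutureTask.get via LockSupport.park, which suggests an untimed wait on the delta query task. As an illustration of the alternative, the sketch below waits with a deadline using the standard `Future.get(timeout, unit)`. `BoundedWait` and `getOrFallback` are hypothetical names; this is not how Crowd is implemented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch only: hypothetical helper, not Crowd code. */
final class BoundedWait {

    /** Waits at most timeoutMillis for the task; returns fallback if it does not finish in time. */
    static <T> T getOrFallback(Future<T> future, long timeoutMillis, T fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests interruption of the worker. A thread blocked in a classic
            // socket read may not react to the interrupt, so the waiting thread
            // is released but the worker itself can stay stuck.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A timed wait of this kind would release the scheduler thread, but it does not by itself free a worker that is blocked reading from a socket; that still needs a socket-level timeout.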
Various other threads will be blocked in SSL socket reads, waiting for data from Azure AD:
"DeltaQueryCacheRefresher-9928705:thread-2" prio=5 tid=0x00000000000001c1 nid=0 runnable
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:532)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
- locked <0x0000000001858f35> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:918)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x000000004d888194> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x000000002fcca5bb> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
- locked <0x0000000074ae888a> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
- locked <0x0000000074ae888a> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
at com.atlassian.crowd.directory.authentication.AzureAdTokenRefresher.handle(AzureAdTokenRefresher.java:39)
at com.atlassian.crowd.directory.authentication.AzureAdRefreshTokenFilter.handle(AzureAdRefreshTokenFilter.java:22)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.lambda$getNextPage$7(AzureAdRestClient.java:123)
at com.atlassian.crowd.directory.rest.AzureAdRestClient$$Lambda$435/1961017936.get(Unknown Source)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.handleRequest(AzureAdRestClient.java:133)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.getNextPage(AzureAdRestClient.java:123)
at com.atlassian.crowd.directory.rest.AzureAdPagingWrapper.fetchAllDeltaQueryResults(AzureAdPagingWrapper.java:104)
at com.atlassian.crowd.directory.AzureAdDirectory.performGroupsDeltaQuery(AzureAdDirectory.java:515)
at com.atlassian.crowd.directory.cache.DeltaQueryCacheRefresher$$Lambda$434/936732505.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"http-nio-8095-exec-28" daemon prio=5 tid=0x000000000000007d nid=0 runnable
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:529)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
- locked <0x000000002f0cb643> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
- locked <0x000000002f9dc610> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1512)
- locked <0x000000002d31c8ee> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
- locked <0x000000002d31c8ee> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
at com.atlassian.crowd.directory.authentication.AzureAdTokenRefresher.handle(AzureAdTokenRefresher.java:39)
at com.atlassian.crowd.directory.authentication.AzureAdRefreshTokenFilter.handle(AzureAdRefreshTokenFilter.java:22)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.lambda$searchUsers$0(AzureAdRestClient.java:58)
at com.atlassian.crowd.directory.rest.AzureAdRestClient$$Lambda$349/1569454165.get(Unknown Source)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.handleRequest(AzureAdRestClient.java:133)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.searchUsers(AzureAdRestClient.java:54)
at com.atlassian.crowd.directory.AzureAdDirectory.findUserByName(AzureAdDirectory.java:181)
at com.atlassian.crowd.directory.AzureAdDirectory.authenticate(AzureAdDirectory.java:207)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.authenticateAndUpdateInternalUser(DbCachingRemoteDirectory.java:228)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.performAuthenticationAndUpdateAttributes(DbCachingRemoteDirectory.java:168)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.authenticate(DbCachingRemoteDirectory.java:148)
at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.authenticateUser(DirectoryManagerGeneric.java:283)
...
..
"http-nio-8095-exec-15" daemon prio=5 tid=0x000000000000006b nid=0 runnable
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
at sun.security.ssl.InputRecord.read(InputRecord.java:529)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
- locked <0x000000006f8ff5a0> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
- locked <0x0000000037834c51> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1282)
- locked <0x00000000503f1531> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1257)
- locked <0x00000000503f1531> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
- locked <0x00000000214b135b> (a sun.net.www.protocol.https.HttpsURLConnectionImpl)
at com.microsoft.aad.adal4j.AdalOAuthRequest.configureHeaderAndExecuteOAuthCall(AdalOAuthRequest.java:143)
at com.microsoft.aad.adal4j.AdalOAuthRequest.send(AdalOAuthRequest.java:82)
at com.microsoft.aad.adal4j.AdalTokenRequest.executeOAuthRequestAndProcessResponse(AdalTokenRequest.java:79)
at com.microsoft.aad.adal4j.AuthenticationContext.acquireTokenCommon(AuthenticationContext.java:816)
at com.microsoft.aad.adal4j.AuthenticationContext.access$100(AuthenticationContext.java:64)
at com.microsoft.aad.adal4j.AuthenticationContext$1.call(AuthenticationContext.java:172)
at com.microsoft.aad.adal4j.AuthenticationContext$1.call(AuthenticationContext.java:161)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:58)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
at com.microsoft.aad.adal4j.AuthenticationContext.acquireToken(AuthenticationContext.java:161)
at com.microsoft.aad.adal4j.AuthenticationContext.acquireToken(AuthenticationContext.java:357)
at com.atlassian.crowd.directory.authentication.impl.SameThreadAdalAuthenticator.getAdalAuthenticationToken(SameThreadAdalAuthenticator.java:40)
at com.atlassian.crowd.directory.rest.DefaultAzureAdRestClientFactory$1.load(DefaultAzureAdRestClientFactory.java:59)
at com.atlassian.crowd.directory.rest.DefaultAzureAdRestClientFactory$1.load(DefaultAzureAdRestClientFactory.java:55)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
- locked <0x0000000031f9d027> (a com.google.common.cache.LocalCache$StrongEntry)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.atlassian.crowd.directory.authentication.AzureAdTokenRefresher.setTokenInRequest(AzureAdTokenRefresher.java:53)
at com.atlassian.crowd.directory.authentication.AzureAdTokenRefresher.handle(AzureAdTokenRefresher.java:38)
at com.atlassian.crowd.directory.authentication.AzureAdRefreshTokenFilter.handle(AzureAdRefreshTokenFilter.java:22)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.lambda$searchUsers$0(AzureAdRestClient.java:58)
at com.atlassian.crowd.directory.rest.AzureAdRestClient$$Lambda$349/1569454165.get(Unknown Source)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.handleRequest(AzureAdRestClient.java:133)
at com.atlassian.crowd.directory.rest.AzureAdRestClient.searchUsers(AzureAdRestClient.java:54)
at com.atlassian.crowd.directory.AzureAdDirectory.findUserByName(AzureAdDirectory.java:181)
at com.atlassian.crowd.directory.AzureAdDirectory.authenticate(AzureAdDirectory.java:207)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.authenticateAndUpdateInternalUser(DbCachingRemoteDirectory.java:228)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.performAuthenticationAndUpdateAttributes(DbCachingRemoteDirectory.java:168)
at com.atlassian.crowd.directory.DbCachingRemoteDirectory.authenticate(DbCachingRemoteDirectory.java:148)
at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.authenticateUser(DirectoryManagerGeneric.java:283)
...
..
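When a dump contains hundreds of threads, the affected ones can be picked out mechanically. The sketch below scans the text of a thread dump and returns the names of threads whose stack contains every given marker, for example `socketRead0` together with `AzureAd`. `StuckThreadScanner` and `threadsMatching` are hypothetical names, and the parsing assumes the dump layout shown above (a quoted thread name on the header line, followed by that thread's frames).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical helper for reading thread dump text. */
final class StuckThreadScanner {

    /** Returns the names of threads whose stack text contains every marker. */
    static List<String> threadsMatching(String dump, String... markers) {
        List<String> names = new ArrayList<>();
        String current = null;                      // thread whose frames are being collected
        StringBuilder frames = new StringBuilder();
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // A new thread header: evaluate the previous thread, then start over.
                addIfMatching(names, current, frames, markers);
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                frames.setLength(0);
            } else if (current != null) {
                frames.append(line).append('\n');
            }
        }
        addIfMatching(names, current, frames, markers); // last thread in the dump
        return names;
    }

    private static void addIfMatching(List<String> names, String current,
                                      CharSequence frames, String[] markers) {
        if (current == null) {
            return;
        }
        String text = frames.toString();
        for (String marker : markers) {
            if (!text.contains(marker)) {
                return;
            }
        }
        names.add(current);
    }
}
```

Calling `threadsMatching(dump, "socketRead0", "AzureAd")` on several dumps taken a few minutes apart, and finding the same thread names each time, would be consistent with the permanent blocking described here.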
Workaround
The default timeout when connecting to Azure AD is currently infinite, so a thread blocks permanently while waiting for data on the SSL connection to Azure AD.
- Restarting Crowd will allow Crowd to re-initiate connections to Azure AD
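The thread dumps show the requests going through the JDK's HttpURLConnection, whose connect and read timeouts default to 0, meaning "wait forever". The sketch below illustrates that JDK behaviour in isolation: a local socket that never replies stands in for a hung endpoint, and a finite read timeout turns the hang into a SocketTimeoutException. `TimeoutDemo` and its methods are hypothetical names; this is not Crowd code, nor a setting an administrator can apply to Crowd.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

/** Sketch only: demonstrates JDK timeout behaviour, not Crowd code. */
final class TimeoutDemo {

    /** Read timeout of a freshly opened, unconfigured connection (0 means infinite). */
    static int defaultReadTimeout() throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:1/").toURL().openConnection(Proxy.NO_PROXY);
        return conn.getReadTimeout(); // nothing is sent; the connection is never opened
    }

    /** Returns true if a request to a peer that never answers fails with a read timeout. */
    static boolean timesOutAgainstSilentPeer(int readTimeoutMillis) throws IOException {
        // A listening socket that never accepts or replies. The OS still completes
        // the TCP handshake, so the client connects and then waits for a response.
        try (ServerSocket silent = new ServerSocket(0)) {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/")
                       .toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2_000);           // bound the connect phase
            conn.setReadTimeout(readTimeoutMillis);  // bound each blocking read
            try {
                conn.getResponseCode();
                return false; // unexpected: the silent peer answered
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up instead of blocking forever
            } finally {
                conn.disconnect();
            }
        }
    }
}
```

Without the `setReadTimeout` call, the same request would block in the socket read indefinitely, which matches the RUNNABLE threads seen in the dumps.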