Crowd does not time out Azure AD Authentication if connection from Crowd to Azure AD locks up


    • 1
    • Severity 2 - Major
    • 2

      Issue Summary

      This is reproducible on Data Center: (yes)

      Crowd connects to Microsoft Entra ID (formerly Azure AD) via MsalAuthenticator.getApiToken(), which does not take a timeout parameter.
      When an authentication request is sent to Azure AD and no response is received (e.g., due to network-related issues), the thread handling the request stays in the RUNNABLE state indefinitely.

      Steps to Reproduce

      1. Configure an Azure AD directory in Crowd DC
      2. Ensure the Crowd DC server cannot reach login.microsoftonline.com (e.g., via firewall rule, DNS failure, network partition, or Azure AD outage)
      3. Trigger a user authentication request against the Azure AD directory (e.g., via REST API call from a connected application like Confluence or Jira)

      Expected Results

      If the authentication task is not completed within a specific time, it should be timed out with an appropriate error message.

      Actual Results

      A worker thread like the one below stays RUNNABLE indefinitely (no timeout applies), stuck on a socket read:

      "ForkJoinPool.commonPool-worker-6248" #432217 [434203] daemon prio=5 os_prio=0 cpu=29636.94ms elapsed=1823164.50s tid=0x00007f39584bf200 nid=434203 runnable  [0x00007f38c7dfc000]
         java.lang.Thread.State: RUNNABLE
      	at sun.nio.ch.SocketDispatcher.read0(java.base@21.0.10/Native Method)
      ...
      	at java.net.HttpURLConnection.getResponseCode(java.base@21.0.10/HttpURLConnection.java:531)
      	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(java.base@21.0.10/HttpsURLConnectionImpl.java:307)
      	at com.microsoft.aad.msal4j.DefaultHttpClient.readResponseFromConnection(DefaultHttpClient.java:121)
      	at com.microsoft.aad.msal4j.DefaultHttpClient.executeHttpPost(DefaultHttpClient.java:72)
      	at com.microsoft.aad.msal4j.DefaultHttpClient.send(DefaultHttpClient.java:46)
      ...
      	at com.microsoft.aad.msal4j.AcquireTokenByAuthorizationGrantSupplier.execute(AcquireTokenByAuthorizationGrantSupplier.java:63)
      	at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.acquireTokenByClientCredential(AcquireTokenByClientCredentialSupplier.java:87)
      	at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.execute(AcquireTokenByClientCredentialSupplier.java:50)
      ...
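
The trace above ends in a blocking read inside HttpURLConnection.getResponseCode(). As a general illustration of why such a read can hang forever, the following is a minimal standalone sketch using only the JDK: HttpURLConnection's connect and read timeouts both default to 0, which means "wait indefinitely", and setting them turns a silent peer into a SocketTimeoutException. The class name, method name, and the local "silent server" are hypothetical and exist only for this example; this is not Crowd or MSAL4J code, and it is not a workaround for this issue.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class ReadTimeoutSketch {

    /**
     * Issues a GET and returns the HTTP status code, or -1 if the peer does
     * not respond within the given timeouts (hypothetical helper).
     */
    static int statusOrTimeout(String url, int connectMillis, int readMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        // Both values default to 0, meaning an unbounded wait.
        conn.setConnectTimeout(connectMillis);
        conn.setReadTimeout(readMillis);
        try {
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            // With a read timeout set, the blocked read fails instead of hanging.
            return -1;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // A listening socket that never accepts or answers: the TCP handshake
        // completes through the OS backlog, but no HTTP response is ever sent.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            System.out.println(statusOrTimeout(url, 1000, 500));
        }
    }
}
```

Without the two setter calls, the same request against the silent socket would block in the read until the process is stopped, which matches the state of the worker thread shown above.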
      

      A Tomcat thread like the one below stays in WAITING indefinitely (no timeout applies), waiting for the worker thread to complete:

      "http-nio-8095-exec-1" #158 [187] daemon prio=5 os_prio=0 cpu=1883630.56ms elapsed=5262433.90s tid=0x00007f4241a0b3c0 nid=187 waiting on condition  [0x00007f38afff8000]
         java.lang.Thread.State: WAITING (parking)
      	at jdk.internal.misc.Unsafe.park(java.base@21.0.10/Native Method)
      	- parking to wait for  <0x00007f3d5b000088> (a java.util.concurrent.CompletableFuture$Signaller)
      	at java.util.concurrent.locks.LockSupport.park(java.base@21.0.10/LockSupport.java:221)
      	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21.0.10/CompletableFuture.java:1864)
      	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.10/ForkJoinPool.java:3780)
      	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.10/ForkJoinPool.java:3725)
      	at java.util.concurrent.CompletableFuture.waitingGet(java.base@21.0.10/CompletableFuture.java:1898)
      	at java.util.concurrent.CompletableFuture.get(java.base@21.0.10/CompletableFuture.java:2072)
      	at com.atlassian.crowd.directory.authentication.impl.MsalAuthenticator.getApiToken(MsalAuthenticator.java:32)
      	...
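
The trace above shows the request thread parked in the untimed CompletableFuture.get(). The expected behavior described earlier (fail with an error after a bounded wait) can be sketched with the timed overload, CompletableFuture.get(long, TimeUnit). This is a minimal sketch under assumptions: the class, the helper method, and the TokenTimeoutException type are hypothetical and are not taken from Crowd's source.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    /** Hypothetical exception type, used only for this illustration. */
    static class TokenTimeoutException extends RuntimeException {
        TokenTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Waits for the future, but gives up with an error after the timeout. */
    static <T> T awaitWithTimeout(CompletableFuture<T> future, long timeout, TimeUnit unit) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Marks the future as cancelled. Note that this does NOT interrupt
            // a worker thread that is already blocked in a socket read.
            future.cancel(true);
            throw new TokenTimeoutException(
                    "Token request did not complete within " + timeout + " " + unit, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for token", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Token request failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        try {
            awaitWithTimeout(neverCompletes, 200, TimeUnit.MILLISECONDS);
        } catch (TokenTimeoutException e) {
            System.out.println("Timed out: " + e.getMessage());
        }
        System.out.println(
                awaitWithTimeout(CompletableFuture.completedFuture("token"), 200, TimeUnit.MILLISECONDS));
    }
}
```

A bounded wait of this kind would release the request thread, but the worker thread would still be stuck on its socket read unless the underlying HTTP call also has a read timeout, so both sides likely need a limit.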
      

      The stuck threads also hold connections from the database connection pool, so Crowd can eventually become unresponsive once all Tomcat threads and database connections are exhausted.
      This issue is similar to CWD-5213 but occurs on a different code path; here the impact is on authentication.

      Workaround

      There is currently no known way to configure a read or connect timeout for this code path.
      The affected Crowd node has to be restarted to free the stuck threads.

              Assignee:
              Unassigned
              Reporter:
              Damien Tan