Jira Data Center / JRASERVER-76406

LDAP connection issues may cause continuous user authentication failure and lead to an outage on Jira when using an internal user directory with LDAP authentication

    Description

      Issue Summary

      You can connect your Jira application to an LDAP directory for delegated authentication. In this setup, Jira keeps users in an internal directory and uses LDAP only to authenticate them.
      For more details, see "Connecting to an internal directory with LDAP authentication".

      If a brief problem, such as a network connectivity issue, interrupts the connection between the Jira backend and the LDAP directory while a user is trying to authenticate, subsequent authentication requests from that user may hang.

      This happens because the thread connecting to the LDAP server may hang indefinitely while waiting for a response.
      While that thread is active, it holds a lock on Jira's authentication cache, which prevents the affected user from completing any new authentication request even after the LDAP server is reachable again.
      This can bring down a Jira node or an entire instance, because the blocked authentication requests may consume all Tomcat HTTP threads.
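
The locking behaviour described above can be illustrated with a toy model. This is a minimal Python sketch, not Jira's code: `LoadOnceCache`, `hanging_bind` and `healthy_bind` are hypothetical names, and the class only models the idea that concurrent callers for the same key wait for the first in-flight load.

```python
import threading
from concurrent.futures import Future, TimeoutError as FutureTimeout


class LoadOnceCache:
    """Toy model of a per-key loading cache: the first caller for a key runs
    the loader; later callers for the same key wait for that in-flight load.
    It keeps no values after a load finishes; only in-flight loads are modelled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future of the load currently in progress

    def get(self, key, loader, wait_timeout=None):
        with self._lock:
            future = self._inflight.get(key)
            owner = future is None
            if owner:
                future = Future()
                self._inflight[key] = future
        if owner:
            try:
                future.set_result(loader(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
        return future.result(timeout=wait_timeout)


def demo():
    cache = LoadOnceCache()
    started, release = threading.Event(), threading.Event()

    def hanging_bind(user):
        started.set()
        release.wait()  # stands in for an LDAP bind that never gets a reply
        return f"{user}: authenticated"

    def healthy_bind(user):
        return f"{user}: authenticated"

    first = threading.Thread(target=cache.get, args=("testuser", hanging_bind))
    first.start()
    started.wait()  # the first load for testuser is now in flight

    other_user = cache.get("testadmin", healthy_bind)
    try:
        cache.get("testuser", healthy_bind, wait_timeout=0.2)
        same_user = "authenticated"
    except FutureTimeout:
        same_user = "still waiting on the first load"

    release.set()  # let the hanging thread finish so the demo exits cleanly
    first.join()
    return other_user, same_user


if __name__ == "__main__":
    print(demo())
```

In this model a different user is served immediately, while a second request for the same user waits on the first load even though its own loader would succeed.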

      Jira doesn't set default timeout values for delegated directories and provides no way to adjust them from the UI, so the workaround requires changing data directly in the database.

      Steps to Reproduce

      1. Install a vanilla instance of Jira Data Center.
        • This was validated on Jira Software Data Center version 9.11.2.
      2. Configure an Internal directory with LDAP authentication.
        • This was validated with OpenLDAP as the external directory, but it can happen with any other LDAP server.
        • Below is the configuration of the user directory used to recreate the problem.
          Directory 1:
          	Directory ID: 10000
          	Name: Delegated LDAP Authentication
          	Active: true
          	Type: DELEGATING
          	Created date: Thu Oct 19 15:40:38 UTC 2023
          	Updated date: Thu Oct 19 16:12:01 UTC 2023
          	Allowed operations: [UPDATE_USER_ATTRIBUTE, UPDATE_ROLE_ATTRIBUTE, DELETE_USER, DELETE_ROLE, UPDATE_GROUP_ATTRIBUTE, UPDATE_GROUP, CREATE_ROLE, CREATE_USER, CREATE_GROUP, DELETE_GROUP, UPDATE_ROLE]
          	Implementation class: com.atlassian.crowd.directory.DelegatedAuthenticationDirectory
          	Encryption type: null
          	Attributes:
          		autoAddGroups: jira-software-users
          		configuration.change.timestamp: 1697731921182
          		crowd.delegated.directory.auto.create.user: true
          		crowd.delegated.directory.auto.update.user: true
          		crowd.delegated.directory.importGroups: true
          		crowd.delegated.directory.type: com.atlassian.crowd.directory.OpenLDAP
          		ldap.basedn: dc=example,dc=org
          		ldap.connection.timeout: 120000
          		ldap.external.id: entryUUID
          		ldap.group.description: description
          		ldap.group.dn: 
          		ldap.group.filter: (objectclass=groupOfUniqueNames)
          		ldap.group.name: cn
          		ldap.group.objectclass: groupOfUniqueNames
          		ldap.group.usernames: uniqueMember
          		ldap.nestedgroups.disabled: true
          		ldap.pagedresults: false
          		ldap.pagedresults.size: 1000
          		ldap.password: ********
          		ldap.referral: false
          		ldap.secure: false
          		ldap.url: ldap://openldap:389
          		ldap.user.displayname: displayName
          		ldap.user.dn: 
          		ldap.user.email: mail
          		ldap.user.filter: (objectclass=inetorgperson)
          		ldap.user.firstname: givenName
          		ldap.user.group: memberOf
          		ldap.user.lastname: sn
          		ldap.user.objectclass: inetorgperson
          		ldap.user.username: cn
          		ldap.user.username.rdn: cn
          		ldap.userdn: cn=admin,dc=example,dc=org
          		ldap.usermembership.use: false
          		ldap.usermembership.use.for.groups: false
          
      3. Create at least 2 users within the external LDAP.
        • In this example we have testuser and testadmin.
      4. Authenticate on Jira with one of the users from the external LDAP to ensure it's working as expected.
      5. Restart Jira to ensure any in-memory caches are cleared.
      6. Simulate a network issue between Jira and the external LDAP.
        1. On the Jira backend server, force the LDAP server's hostname to resolve to localhost.
          • In this example, add the following entry to the /etc/hosts file.
            127.0.0.1 openldap
            
        2. On the Jira backend server, start a listener on the LDAP port.
          • In this example, netcat is used to listen on port 389.
            nc -l 389
            
      7. Try to authenticate as testuser and note it will hang.
        • This is expected as the external LDAP is "unreachable".
      8. Fix the "simulated network issue".
        • In this example, remove the entry from the /etc/hosts file.
      9. Try to authenticate to Jira as testadmin and note it will work.
      10. Try to authenticate to Jira as testuser.
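
The simulated outage in step 6 works because a listener that accepts connections but never replies leaves the client waiting for a response. The sketch below reproduces that idea with plain sockets on an ephemeral localhost port. It is an illustration only: `silent_listener` and `bind_with_read_timeout` are hypothetical helpers, and the bytes sent are a placeholder, not a real LDAP bind request.

```python
import socket


def silent_listener():
    """Listen on an ephemeral localhost port and never answer, like `nc -l`."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    return server


def bind_with_read_timeout(port, read_timeout_s):
    """Connect, send a placeholder request, and wait a bounded time for a reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=read_timeout_s) as client:
        client.sendall(b"bind-request")  # placeholder bytes, not a real LDAP bind
        try:
            client.recv(1024)
            return "reply received"
        except socket.timeout:
            return "read timed out"


def demo():
    server = silent_listener()
    try:
        port = server.getsockname()[1]
        return bind_with_read_timeout(port, read_timeout_s=0.3)
    finally:
        server.close()


if __name__ == "__main__":
    print(demo())
```

Without a read timeout on the client socket, the `recv` call in this sketch would block for as long as the listener stays silent, which mirrors the hang seen in step 7.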

      Expected Results

      The new authentication request succeeds because the external LDAP is reachable again.

      Actual Results

      The authentication request hangs indefinitely, and testuser is unable to access Jira until the application is restarted.

      A thread dump taken at this point shows the following:

      • The thread handling the first authentication request, made while the LDAP was unreachable, is in a waiting state and holds a lock on the authentication cache.
        "http-nio-8080-exec-4 url: /jira/rest/gadget/1.0/login" daemon prio=5 tid=0x0000000000000022 nid=0 waiting on condition 
           java.lang.Thread.State: TIMED_WAITING (parking)
        	at java.base@11.0.20.1/jdk.internal.misc.Unsafe.park(Native Method)
        	- parking to wait for <0x00000000242e913e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        	at java.base@11.0.20.1/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        	at java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
        	at java.base@11.0.20.1/java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:458)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:120)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.Connection.readReply(Connection.java:443)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:365)
        	- locked <0x0000000037fb2992> (a com.sun.jndi.ldap.LdapClient)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:214)
        	- locked <0x0000000037fb2992> (a com.sun.jndi.ldap.LdapClient)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2895)
        	- locked <0x00000000145b5857> (a java.lang.Object)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:348)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:266)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:226)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:284)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:185)
        	at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:115)
        	at java.naming@11.0.20.1/javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:730)
        	at java.naming@11.0.20.1/javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:305)
        	at java.naming@11.0.20.1/javax.naming.InitialContext.init(InitialContext.java:236)
        	at java.naming@11.0.20.1/javax.naming.ldap.InitialLdapContext.<init>(InitialLdapContext.java:154)
        	at org.springframework.ldap.core.support.LdapContextSource.getDirContextInstance(LdapContextSource.java:42)
        	at org.springframework.ldap.core.support.AbstractContextSource.createContext(AbstractContextSource.java:351)
        	at org.springframework.ldap.core.support.AbstractContextSource.doGetContext(AbstractContextSource.java:147)
        	at org.springframework.ldap.core.support.AbstractContextSource.getReadWriteContext(AbstractContextSource.java:182)
        	at org.springframework.ldap.transaction.compensating.manager.TransactionAwareContextSourceProxy.getReadWriteContext(TransactionAwareContextSourceProxy.java:88)
        	at org.springframework.ldap.transaction.compensating.manager.TransactionAwareContextSourceProxy.getReadOnlyContext(TransactionAwareContextSourceProxy.java:61)
        	at org.springframework.ldap.core.LdapTemplate.search(LdapTemplate.java:361)
        	at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper$10.timedGet(SpringLdapTemplateWrapper.java:221)
        	at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper$10.timedGet(SpringLdapTemplateWrapper.java:218)
        	at com.atlassian.crowd.directory.ldap.monitoring.TimedSupplier.get(TimedSupplier.java:37)
        	at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper.invokeWithContextClassLoader(SpringLdapTemplateWrapper.java:85)
        	at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper.search(SpringLdapTemplateWrapper.java:218)
        	at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper.searchWithLimitedResults(SpringLdapTemplateWrapper.java:255)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.searchEntitiesWithRequestControls(SpringLDAPConnector.java:416)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.searchEntities(SpringLDAPConnector.java:383)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.searchUserObjects(SpringLDAPConnector.java:586)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.findUserWithAttributesByName(SpringLDAPConnector.java:542)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.findUserByName(SpringLDAPConnector.java:529)
        	at com.atlassian.crowd.directory.SpringLDAPConnector.authenticate(SpringLDAPConnector.java:952)
        	at com.atlassian.crowd.directory.DelegatedAuthenticationDirectory.authenticateAndUpdateOrCreate(DelegatedAuthenticationDirectory.java:195)
        	at com.atlassian.crowd.directory.DelegatedAuthenticationDirectory.authenticate(DelegatedAuthenticationDirectory.java:157)
        	at com.atlassian.crowd.manager.directory.DirectoryManagerGeneric.authenticateUser(DirectoryManagerGeneric.java:306)
        	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.authenticateUser(ApplicationServiceGeneric.java:191)
        	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.authenticate(CrowdServiceImpl.java:70)
        	at com.atlassian.jira.user.JiraDelegatingCrowdService.authenticate(JiraDelegatingCrowdService.java:34)
        	at com.atlassian.jira.user.JiraCrowdService.doAuthenticate(JiraCrowdService.java:119)
        	at com.atlassian.jira.user.JiraCrowdService.lambda$getFromCacheOrLoad$0(JiraCrowdService.java:155)
        	at com.atlassian.jira.user.JiraCrowdService$$Lambda$4414/0x00000008434a2840.call(Unknown Source)
        	at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868)
        	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
        	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
        	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
        	- locked <0x0000000078fd1d5b> (a com.google.common.cache.LocalCache$StrongAccessWriteEntry)
        	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
        	at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
        	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
        	at com.atlassian.jira.user.JiraCrowdService.getFromCacheOrLoad(JiraCrowdService.java:148)
        	at com.atlassian.jira.user.JiraCrowdService.authenticate(JiraCrowdService.java:204)
        	at com.atlassian.jira.security.login.JiraSeraphAuthenticator.crowdServiceAuthenticate(JiraSeraphAuthenticator.java:80)
        	at com.atlassian.jira.security.login.JiraSeraphAuthenticator.authenticate(JiraSeraphAuthenticator.java:53)
        	at com.atlassian.seraph.auth.DefaultAuthenticator.login(DefaultAuthenticator.java:98)
        	at com.atlassian.seraph.filter.PasswordBasedLoginFilter.runAuthentication(PasswordBasedLoginFilter.java:132)
        	at com.atlassian.seraph.filter.PasswordBasedLoginFilter.login(PasswordBasedLoginFilter.java:77)
        	at com.atlassian.seraph.filter.BaseLoginFilter.doFilter(BaseLoginFilter.java:110)
        	at com.atlassian.jira.web.filters.JiraLoginFilter.doFilter(JiraLoginFilter.java:77)
        	at com.atlassian.core.filters.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:32)
        
      • The thread handling the second authentication request, made after the LDAP became reachable again, is waiting for the first thread to release the lock on the cache.
        "http-nio-8080-exec-20 url: /jira/login.jsp" daemon prio=5 tid=0x0000000000000268 nid=0 waiting on condition 
           java.lang.Thread.State: WAITING (parking)
        	at java.base@11.0.20.1/jdk.internal.misc.Unsafe.park(Native Method)
        	- parking to wait for <0x0000000010eb7055> (a com.google.common.util.concurrent.SettableFuture)
        	at java.base@11.0.20.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:557)
        	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
        	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:244)
        	at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3586)
        	at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2179)
        	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2043)
        	at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
        	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
        	at com.atlassian.jira.user.JiraCrowdService.getFromCacheOrLoad(JiraCrowdService.java:148)
        	at com.atlassian.jira.user.JiraCrowdService.authenticate(JiraCrowdService.java:204)
        	at com.atlassian.jira.security.login.JiraSeraphAuthenticator.crowdServiceAuthenticate(JiraSeraphAuthenticator.java:80)
        	at com.atlassian.jira.security.login.JiraSeraphAuthenticator.authenticate(JiraSeraphAuthenticator.java:53)
        	at com.atlassian.seraph.auth.DefaultAuthenticator.login(DefaultAuthenticator.java:98)
        	at com.atlassian.seraph.filter.PasswordBasedLoginFilter.runAuthentication(PasswordBasedLoginFilter.java:132)
        	at com.atlassian.seraph.filter.PasswordBasedLoginFilter.login(PasswordBasedLoginFilter.java:77)
        	at com.atlassian.seraph.filter.BaseLoginFilter.doFilter(BaseLoginFilter.java:110)
        	at com.atlassian.jira.web.filters.JiraLoginFilter.doFilter(JiraLoginFilter.java:77)
        	at com.atlassian.core.filters.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:32)
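
To check a thread dump for this pattern, one option is to group threads by the two frames that stand out in the dumps above. The following is a rough sketch under that assumption: `classify_threads` is a hypothetical helper, the frame names are copied from the traces above, and `SAMPLE` is an abbreviated excerpt of those same traces.

```python
import re

# Frames taken from the thread dumps above.
LDAP_FRAME = "com.sun.jndi.ldap.LdapRequest.getReplyBer"
CACHE_WAIT_FRAME = "LocalCache$Segment.waitForLoadingValue"

# A thread entry starts with its quoted name.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')

SAMPLE = '''
"http-nio-8080-exec-4 url: /jira/rest/gadget/1.0/login" daemon prio=5 tid=0x0000000000000022 nid=0 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
    at java.naming@11.0.20.1/com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:120)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
"http-nio-8080-exec-20 url: /jira/login.jsp" daemon prio=5 tid=0x0000000000000268 nid=0 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3586)
    at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2179)
'''


def classify_threads(dump_text):
    """Return (threads waiting on an LDAP reply, threads waiting on a cache load)."""
    frames_by_thread = {}
    current = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            frames_by_thread[current] = []
        elif current is not None and line:
            frames_by_thread[current].append(line)

    waiting_on_ldap = [
        name for name, frames in frames_by_thread.items()
        if any(LDAP_FRAME in frame for frame in frames)
    ]
    waiting_on_cache_load = [
        name for name, frames in frames_by_thread.items()
        if any(CACHE_WAIT_FRAME in frame for frame in frames)
    ]
    return waiting_on_ldap, waiting_on_cache_load


if __name__ == "__main__":
    print(classify_threads(SAMPLE))
```

A dump with one thread in the first group and a growing number in the second group would be consistent with the situation described here, although other causes are possible.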
        

      Because the delegated directory has no timeout configuration (connection, read, and so on), the first thread hangs indefinitely.
      In this condition, new authentication requests from the same user cannot proceed because the cache is locked for that user.
      Authentication of other users works normally.

      If the affected user sends as many authentication requests as the maximum number of Tomcat HTTP threads, the application becomes unreachable: every thread for that connector is in use, so no new connection can be served.
      This bug can therefore bring down an instance (or a node of a clustered instance).
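
The exhaustion scenario can be sketched with a small fixed-size thread pool standing in for the Tomcat HTTP connector. This is an analogy under stated assumptions, not Tomcat's implementation: `stuck_login`, `health_check` and the pool size of 4 are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def demo(max_threads=4):
    release = threading.Event()
    started = threading.Semaphore(0)

    def stuck_login():
        started.release()   # signal that this worker is now occupied
        release.wait()      # stands in for a login blocked on the cache lock
        return "login finished"

    def health_check():
        return "ok"

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        stuck = [pool.submit(stuck_login) for _ in range(max_threads)]
        for _ in range(max_threads):
            started.acquire()  # wait until every worker is busy

        probe = pool.submit(health_check)  # queued: no free worker to run it
        try:
            while_stuck = probe.result(timeout=0.2)
        except FutureTimeout:
            while_stuck = "not served"

        release.set()  # unblock the stuck logins
        after_release = probe.result(timeout=5)
        for future in stuck:
            future.result(timeout=5)
    return while_stuck, after_release


if __name__ == "__main__":
    print(demo())
```

Once every worker is occupied by a blocked login, even a trivial request has to wait in the queue until one of them is released.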

      Workaround

      The timeout parameters of a delegated directory cannot be adjusted from the UI the way they can for connector directories.

      Configuring JNDI pool properties has no effect on this either.

      The workaround is to set the timeout properties directly in the database as attributes of the directory.
      That way, a thread holding the lock on the authentication cache fails with a timeout exception and releases the lock instead of hanging indefinitely.

      Here are the steps to add this attribute to the target directory in the database:

      1. Identify the ID of the target user directory from the cwd_directory table.
        SELECT *
        FROM cwd_directory
        WHERE directory_type = 'DELEGATING';
        
      2. Add the timeout attributes for the target directory to the cwd_directory_attribute table.
        INSERT INTO cwd_directory_attribute (directory_id, attribute_name, attribute_value)
        VALUES (<ID from previous query>, 'ldap.connection.timeout', '10000');
        INSERT INTO cwd_directory_attribute (directory_id, attribute_name, attribute_value)
        VALUES (<ID from previous query>, 'ldap.search.timelimit', '60000');
        INSERT INTO cwd_directory_attribute (directory_id, attribute_name, attribute_value)
        VALUES (<ID from previous query>, 'ldap.read.timeout', '120000');
        
        • These values are based on the default timeouts of the Connector directory type; choose values appropriate for your instance.
      3. Restart Jira so that the change is applied.
        • If running Data Center on a multi-node configuration, a rolling restart is enough.
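
The inserts from step 2 can also be written as a small script. The sketch below runs them against an in-memory SQLite table that stands in for cwd_directory_attribute. The three-column schema is an assumption based only on the INSERT statements above, `add_timeouts` is a hypothetical helper, and skipping attributes that already exist is a design choice of this sketch rather than part of the documented workaround.

```python
import sqlite3

# Attribute names and values taken from the workaround above.
TIMEOUT_ATTRIBUTES = {
    "ldap.connection.timeout": "10000",
    "ldap.search.timelimit": "60000",
    "ldap.read.timeout": "120000",
}


def add_timeouts(conn, directory_id):
    """Insert the timeout attributes that the directory does not have yet."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT attribute_name FROM cwd_directory_attribute WHERE directory_id = ?",
            (directory_id,),
        )
    }
    added = []
    for name, value in TIMEOUT_ATTRIBUTES.items():
        if name not in existing:
            conn.execute(
                "INSERT INTO cwd_directory_attribute "
                "(directory_id, attribute_name, attribute_value) VALUES (?, ?, ?)",
                (directory_id, name, value),
            )
            added.append(name)
    conn.commit()
    return added


def demo():
    # In-memory stand-in table; the real Jira schema may differ.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cwd_directory_attribute "
        "(directory_id INTEGER, attribute_name TEXT, attribute_value TEXT)"
    )
    first_run = add_timeouts(conn, 10000)
    second_run = add_timeouts(conn, 10000)  # nothing left to add
    rows = conn.execute(
        "SELECT attribute_name, attribute_value FROM cwd_directory_attribute "
        "WHERE directory_id = ? ORDER BY attribute_name",
        (10000,),
    ).fetchall()
    conn.close()
    return first_run, second_run, rows


if __name__ == "__main__":
    print(demo())
```

Checking for existing rows first makes the script safe to re-run, which may matter if some of the attributes are already present for the directory.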

      With this workaround, a thread hanging on a connection to the LDAP is terminated after waiting 2 minutes for a response, and the following errors are logged in atlassian-jira.log:

      2023-10-20 19:35:37,330+0000 http-nio-8080-exec-4 url: /jira/rest/gadget/1.0/login INFO anonymous 1173x55x1 1sm9xlr 172.29.228.103,172.50.0.2 /rest/gadget/1.0/login [c.a.c.d.ldap.monitoring.TimedSupplier] Timed call for search using searchexecutor baseDN: dc=example,dc=org, filter: (&(objectclass=inetorgperson)(cn=testuser)) took 120005ms
      2023-10-20 19:35:37,331+0000 http-nio-8080-exec-4 url: /jira/rest/gadget/1.0/login ERROR anonymous 1173x55x1 1sm9xlr 172.29.228.103,172.50.0.2 /rest/gadget/1.0/login [c.a.c.manager.application.ApplicationServiceGeneric] Directory 'Delegated LDAP Authentication (10000)' is not functional during authentication of 'testuser'. Skipped.
      2023-10-20 19:35:37,334+0000 http-nio-8080-exec-4 ERROR      [o.a.c.c.C.[.[localhost].[/jira].[default]] Servlet.service() for servlet [default] in context with path [/jira] threw exception
      com.google.common.util.concurrent.UncheckedExecutionException: com.atlassian.crowd.exception.runtime.OperationFailedException
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2055)
      	at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
      	at com.atlassian.jira.user.JiraCrowdService.getFromCacheOrLoad(JiraCrowdService.java:148)
      	at com.atlassian.jira.user.JiraCrowdService.authenticate(JiraCrowdService.java:204)
      ...
      	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
      	at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: com.atlassian.crowd.exception.runtime.OperationFailedException
      	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.convertOperationFailedException(CrowdServiceImpl.java:676)
      ...
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
      	... 133 more
      Caused by: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used: 120000 ms.
      	at org.springframework.ldap.support.LdapUtils.convertLdapException(LdapUtils.java:228)
      	at org.springframework.ldap.core.support.AbstractContextSource.createContext(AbstractContextSource.java:363)
      	at org.springframework.ldap.core.support.AbstractContextSource.doGetContext(AbstractContextSource.java:147)
      ...
      	at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.authenticateUser(ApplicationServiceGeneric.java:191)
      	at com.atlassian.crowd.embedded.core.CrowdServiceImpl.authenticate(CrowdServiceImpl.java:70)
      	... 141 more
      Caused by: javax.naming.NamingException: LDAP response read timed out, timeout used: 120000 ms.
      	at java.naming/com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:129)
      	at java.naming/com.sun.jndi.ldap.Connection.readReply(Connection.java:443)
      	at java.naming/com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:365)
      

      This ensures no authentication thread will hold a lock on the cache for more than 2 minutes when waiting for a response from the external LDAP.
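
The effect of a read timeout on the lock can be shown with a simplified model in which a per-user lock stands in for the cache entry lock. This is a sketch under that assumption: `authenticate`, `USER_LOCKS` and the 0.2 second timeout are hypothetical, and the error text only mirrors the wording of the log above.

```python
import threading

# One lock per user stands in for the lock on the user's cache entry.
USER_LOCKS = {"testuser": threading.Lock()}


def authenticate(user, ldap_reply, read_timeout_s, entered=None):
    """Hold the user's lock while waiting a bounded time for the LDAP reply."""
    with USER_LOCKS[user]:
        if entered is not None:
            entered.set()  # tell the caller that the lock is now held
        if not ldap_reply.wait(timeout=read_timeout_s):
            raise TimeoutError(
                f"LDAP response read timed out, timeout used: "
                f"{int(read_timeout_s * 1000)} ms."
            )
        return f"{user}: authenticated"


def demo(read_timeout_s=0.2):
    outcomes = {}
    unreachable = threading.Event()  # never set: the reply never arrives
    reachable = threading.Event()
    reachable.set()                  # the reply is available immediately
    entered = threading.Event()

    def first_attempt():
        try:
            outcomes["first"] = authenticate(
                "testuser", unreachable, read_timeout_s, entered
            )
        except TimeoutError:
            outcomes["first"] = "timed out"

    worker = threading.Thread(target=first_attempt)
    worker.start()
    entered.wait()  # the first attempt now holds the lock

    # Blocks only until the first attempt times out and releases the lock.
    outcomes["second"] = authenticate("testuser", reachable, read_timeout_s)
    worker.join()
    return outcomes


if __name__ == "__main__":
    print(demo())
```

In this model the second attempt is delayed by at most the read timeout, after which it acquires the lock and completes.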

            People

              Assignee: Unassigned
              Reporter: Thiago Masutti (tmasutti)
              Votes: 2
              Watchers: 12
