Details
Description
Issue Summary
Change the default value of `com.sun.jndi.ldap.connect.pool.timeout` from 0 (unlimited) to a lower value, specifically 30 seconds (30000 milliseconds, the unit this property uses). This adjustment aims to fix the LDAP directory sync issues and authentication failures that Bitbucket reports when stale LDAP pool connections are reused.
The default value of 0 (zero) means that the idle time is unlimited, so connections will never be timed out.
This adjustment aligns with the settings for Jira and Confluence, ensuring that all connections are evicted and preventing the persistence of stale connections in the LDAP pool.
Crowd also recommends changing this value from 0.
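A minimal sketch of how the property value described above can be read and interpreted. The class `LdapPoolTimeout` and its `parse` method are hypothetical helpers for illustration, not part of Bitbucket, Crowd, or the JDK. Treating negative values the same as 0 is an assumption of this sketch.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Bitbucket, Crowd, or the JDK.
final class LdapPoolTimeout {
    static final String PROPERTY = "com.sun.jndi.ldap.connect.pool.timeout";

    /**
     * Interprets the raw property value (milliseconds).
     * Returns empty when idle connections would never be timed out.
     */
    static Optional<Duration> parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty(); // property not set: unlimited idle time
        }
        long millis = Long.parseLong(raw.trim());
        // Assumption of this sketch: any non-positive value is treated as unlimited.
        return millis > 0 ? Optional.of(Duration.ofMillis(millis)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Reads the value seen by the JVM running this class.
        Optional<Duration> timeout = parse(System.getProperty(PROPERTY));
        System.out.println(timeout
                .map(d -> "idle timeout: " + d.getSeconds() + " s")
                .orElse("idle timeout: unlimited"));
    }
}
```

Running the class with `-Dcom.sun.jndi.ldap.connect.pool.timeout=30000` on the command line should report a 30 second idle timeout; running it without the flag should report an unlimited one.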
This issue is reproducible on Data Center (yes).
Steps to Reproduce
- Occasionally, the LDAP server or AD closes a pooled LDAP connection, but the resulting FIN, ACK packet is dropped due to network issues, so Bitbucket continues to believe the connection is still open.
- Consequently, the same stale pooled connection is subsequently reused for authentication and LDAP syncs.
- This, in turn, causes Authentication or Directory Sync requests to fail due to timeouts set in Bitbucket.
Expected Results
Bitbucket evicts idle LDAP pool connections after a short interval (seconds to a few minutes), so that stale connections are removed from the pool instead of being reused.
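The effect of an idle timeout can be illustrated with a toy model. This is a simplified sketch, not the JDK's LDAP pool implementation: `ToyLdapPool` and `Conn` are hypothetical names, time is passed in explicitly, and eviction happens lazily on borrow, which is an assumption made to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only; names are hypothetical and this is not the JDK pool.
final class ToyLdapPool {
    static final class Conn {
        final int id;
        long lastUsedMillis;
        Conn(int id, long now) { this.id = id; this.lastUsedMillis = now; }
    }

    private final long idleTimeoutMillis; // 0 means idle connections are never evicted
    private final Deque<Conn> idle = new ArrayDeque<>();
    private int nextId = 1;

    ToyLdapPool(long idleTimeoutMillis) { this.idleTimeoutMillis = idleTimeoutMillis; }

    /** Returns an idle connection if one survives eviction, otherwise a new one. */
    Conn borrow(long now) {
        if (idleTimeoutMillis > 0) {
            // Drop every connection that has been idle longer than the timeout.
            idle.removeIf(c -> now - c.lastUsedMillis > idleTimeoutMillis);
        }
        Conn c = idle.pollFirst();
        return c != null ? c : new Conn(nextId++, now);
    }

    /** Returns a connection to the pool and records when it was last used. */
    void release(Conn c, long now) {
        c.lastUsedMillis = now;
        idle.addFirst(c);
    }

    public static void main(String[] args) {
        for (long timeout : new long[] {0L, 30_000L}) {
            ToyLdapPool pool = new ToyLdapPool(timeout);
            Conn first = pool.borrow(0L);
            pool.release(first, 1_000L);
            // One hour later the server may have closed the connection long ago.
            Conn second = pool.borrow(3_600_000L);
            System.out.println("timeout=" + timeout + " reused=" + (first == second));
        }
    }
}
```

With a timeout of 0 the hour-old connection is handed out again, which is the situation where a silently closed connection leads to a read timeout. With 30000 it is discarded and a fresh connection is created.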
Actual Results
Bitbucket currently reuses the stale LDAP pool connection, leading to timeouts when the read timeout or LDAP timeout value is reached.
ldap.read.timeout: 360000
2023-08-24 11:26:14,847 ERROR [Caesium-1-4] c.a.c.d.DbCachingRemoteDirectory Exception occured when performing full synchronization
com.atlassian.crowd.exception.OperationFailedException: java.util.concurrent.ExecutionException: com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used: 360000 ms.; remaining name 'DC=instenv-180103-ldne,DC=local'
    at com.atlassian.crowd.directory.synchronisation.cache.UsnChangedCacheRefresher.synchroniseAllUsers(UsnChangedCacheRefresher.java:209)
    at com.atlassian.crowd.directory.synchronisation.cache.AbstractCacheRefresher.synchroniseAll(AbstractCacheRefresher.java:45)
    at com.atlassian.crowd.directory.synchronisation.cache.UsnChangedCacheRefresher.synchroniseAll(UsnChangedCacheRefresher.java:174)
    at com.atlassian.crowd.directory.DbCachingRemoteDirectory.synchroniseCache(DbCachingRemoteDirectory.java:1098)
    at com.atlassian.crowd.manager.directory.DirectorySynchroniserImpl.lambda$synchronise$0(DirectorySynchroniserImpl.java:85)
    at com.atlassian.crowd.audit.NoOpAuditLogContext.withAuditLogSource(NoOpAuditLogContext.java:17)
    at com.atlassian.crowd.manager.directory.DirectorySynchroniserImpl.synchronise(DirectorySynchroniserImpl.java:83)
    at jdk.internal.reflect.GeneratedMethodAccessor1439.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at com.atlassian.crowd.directory.DbCachingDirectoryPoller.pollChanges(DbCachingDirectoryPoller.java:48)
    at com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerJobRunner.runJob(DirectoryPollerJobRunner.java:92)
    at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
    at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
    at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
    at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    at java.base/java.lang.Thread.run(Thread.java:829)
    ... 10 frames trimmed
Caused by: java.util.concurrent.ExecutionException: com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used: 360000 ms.; remaining name 'DC=instenv-180103-ldne,DC=local'
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.atlassian.crowd.directory.synchronisation.cache.UsnChangedCacheRefresher.synchroniseAllUsers(UsnChangedCacheRefresher.java:197)
    ... 22 common frames omitted
Caused by: com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used: 360000 ms.; remaining name 'DC=instenv-180103-ldne,DC=local'
    at com.atlassian.crowd.directory.SpringLDAPConnector.pageSearchResults(SpringLDAPConnector.java:366)
    at com.atlassian.crowd.directory.SpringLDAPConnector.searchEntitiesWithRequestControls(SpringLDAPConnector.java:399)
    at com.atlassian.crowd.directory.SpringLDAPConnector.searchEntities(SpringLDAPConnector.java:383)
    at com.atlassian.crowd.directory.SpringLDAPConnector.searchUserObjects(SpringLDAPConnector.java:586)
    at com.atlassian.crowd.directory.SpringLDAPConnector.searchUsers(SpringLDAPConnector.java:931)
    at com.atlassian.crowd.directory.synchronisation.cache.UsnChangedCacheRefresher.lambda$synchroniseAll$0(UsnChangedCacheRefresher.java:148)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    ... 2 common frames omitted
Caused by: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used: 360000 ms.; remaining name 'DC=instenv-180103-ldne,DC=local'
    at org.springframework.ldap.support.LdapUtils.convertLdapException(LdapUtils.java:228)
    at org.springframework.ldap.core.LdapTemplate.search(LdapTemplate.java:397)
    at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper$3.timedGet(SpringLdapTemplateWrapper.java:143)
    at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper$3.timedGet(SpringLdapTemplateWrapper.java:139)
    at com.atlassian.crowd.directory.ldap.monitoring.TimedSupplier.get(TimedSupplier.java:37)
    at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper.invokeWithContextClassLoader(SpringLdapTemplateWrapper.java:85)
    at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper.search(SpringLdapTemplateWrapper.java:139)
    at com.atlassian.crowd.directory.SpringLDAPConnector.pageSearchResults(SpringLDAPConnector.java:340)
    ... 9 common frames omitted
Caused by: javax.naming.NamingException: LDAP response read timed out, timeout used: 360000 ms.
    at java.naming/com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:129)
    at java.naming/com.sun.jndi.ldap.Connection.readReply(Connection.java:443)
    at java.naming/com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:639)
    at java.naming/com.sun.jndi.ldap.LdapClient.search(LdapClient.java:562)
    at java.naming/com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:2014)
    at java.naming/com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java:1873)
    at java.naming/com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1798)
    at java.naming/com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:392)
    at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:358)
    at java.naming/javax.naming.directory.InitialDirContext.search(InitialDirContext.java:276)
    at java.base/jdk.internal.reflect.GeneratedMethodAccessor1158.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.springframework.ldap.transaction.compensating.manager.TransactionAwareDirContextInvocationHandler.invoke(TransactionAwareDirContextInvocationHandler.java:90)
    at com.atlassian.crowd.directory.ldap.SpringLdapTemplateWrapper$3.lambda$timedGet$0(SpringLdapTemplateWrapper.java:141)
    at org.springframework.ldap.core.LdapTemplate.search(LdapTemplate.java:363)
    ... 15 common frames omitted
Workaround
To address this issue, set `com.sun.jndi.ldap.connect.pool.timeout` explicitly to a lower value, such as 30 seconds (30000 milliseconds).
- Add the following value to JVM_SUPPORT_RECOMMENDED_ARGS in <BITBUCKET_INSTALL>/bin/_start-webapp.sh on each Bitbucket node, appending it if the variable already contains other arguments:
JVM_SUPPORT_RECOMMENDED_ARGS="-Dcom.sun.jndi.ldap.connect.pool.timeout=30000"
- Restart the Bitbucket application to apply the updated JVM arguments.