Jira Data Center / JRASERVER-34820

LDAP Synchronisation can fail unexpectedly due to mistiming in the "LDAP response read time out"


      Summary

      In this bug, a read timeout exception is thrown before the configured read timeout has elapsed, as the logs below show:

      2014-01-29 07:36:47,601 QuartzScheduler_Worker-2 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] synchronisation for directory [ 10000 ] starting
      2014-01-29 07:36:57,975 QuartzScheduler_Worker-2 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] failed synchronisation complete for directory [ 10000 ] in [ 10374ms ]
      2014-01-29 07:36:58,084 QuartzScheduler_Worker-2 ERROR ServiceRunner     [atlassian.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ 10000 ].
      com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.CommunicationException: avengers-the-initiative.net:389; nested exception is javax.naming.CommunicationException: avengers-the-initiative.net:389 [Root exception is java.net.SocketTimeoutException: connect timed out]
      	at com.atlassian.crowd.directory.SpringLDAPConnector.pageSearchResults(SpringLDAPConnector.java:400)
      	at com.atlassian.crowd.directory.SpringLDAPConnector.searchEntitiesWithRequestControls(SpringLDAPConnector.java:435)
      	at com.atlassian.crowd.directory.MicrosoftActiveDirectory.findTombstonesSince(MicrosoftActiveDirectory.java:882)
      	at com.atlassian.crowd.directory.MicrosoftActiveDirectory.findUserTombstonesSince(MicrosoftActiveDirectory.java:824)
      

      Expected Behaviour

      The LDAP connection does not time out before the Read Timeout (seconds) value configured for the user directory has elapsed (see Connecting to an LDAP Directory).

      Actual Behaviour

      The connection times out before the Read Timeout (seconds) interval has elapsed.

      Verification

      Check the logs for the time the synchronisation started. In the snippet below, it is 07:36:47,601:

      2014-01-29 07:36:47,601 QuartzScheduler_Worker-2 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] synchronisation for directory [ 10000 ] starting
      

      And compare that to when the timeout occurs:

      2014-01-29 07:36:58,084 QuartzScheduler_Worker-2 ERROR ServiceRunner     [atlassian.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ 10000 ].
      com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.CommunicationException: avengers-the-initiative.net:389; nested exception is javax.naming.CommunicationException: avengers-the-initiative.net:389 [Root exception is java.net.SocketTimeoutException: connect timed out]
      

      Here the timeout occurred at 07:36:58,084, about 10.5 seconds after synchronisation started, even though Read Timeout (seconds) was set to 60. This is very likely a result of the Java bug described under Cause. A timeout that occurs after the configured interval is normal, expected behaviour: for example, a timeout at 07:37:47,684 (just over 60 seconds after the start) would be expected.
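      The comparison above is simple arithmetic on log timestamps. The following is a minimal Java sketch of that check, not an Atlassian tool; the class and method names are hypothetical, and it assumes each log line starts with a timestamp in the yyyy-MM-dd HH:mm:ss,SSS format shown above and that both lines use the same local clock (no DST change in between).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PrematureTimeoutCheck {
    // Matches the timestamp prefix of the log lines above, e.g. "2014-01-29 07:36:47,601".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the leading 23-character timestamp of a log line. */
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), LOG_TIME);
    }

    /**
     * Returns true when the error was logged before the configured Read Timeout
     * could have elapsed since the synchronisation started.
     * A Read Timeout of 0 (disabled) never counts as premature.
     */
    static boolean isPremature(String startLine, String errorLine, int readTimeoutSeconds) {
        Duration elapsed = Duration.between(timestampOf(startLine), timestampOf(errorLine));
        return readTimeoutSeconds > 0
                && elapsed.compareTo(Duration.ofSeconds(readTimeoutSeconds)) < 0;
    }

    public static void main(String[] args) {
        String start = "2014-01-29 07:36:47,601 QuartzScheduler_Worker-2 INFO ServiceRunner";
        String error = "2014-01-29 07:36:58,084 QuartzScheduler_Worker-2 ERROR ServiceRunner";
        long ms = Duration.between(timestampOf(start), timestampOf(error)).toMillis();
        System.out.println("elapsed ms: " + ms);                           // elapsed ms: 10483
        System.out.println("premature: " + isPremature(start, error, 60)); // premature: true
    }
}
```

      With the timestamps from this issue, the error arrives 10483 ms after the start, well inside a 60-second Read Timeout, so it is flagged as premature; an error logged at 07:37:47,684 would not be flagged.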

      Cause

      This bug is triggered intermittently in some environments when the Read Timeout field in the LDAP directory properties is set to a non-zero value.

      It is caused by a known bug in Java SE (#6968459).
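      For context, the JDK's LDAP JNDI provider reads its read timeout, in milliseconds, from the com.sun.jndi.ldap.read.timeout environment property; when the property is not set, the provider waits until a response is received. The sketch below assumes, without confirmation from this issue, that the directory's Read Timeout (seconds) is passed to that property. The class name is hypothetical, and the code only builds the environment without opening a connection.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapReadTimeoutEnv {
    static final String READ_TIMEOUT = "com.sun.jndi.ldap.read.timeout";

    /**
     * Builds a JNDI environment for the JDK LDAP provider.
     * ASSUMPTION: the directory's "Read Timeout (seconds)" maps onto READ_TIMEOUT in milliseconds.
     */
    static Hashtable<String, String> buildEnv(String providerUrl, int readTimeoutSeconds) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        if (readTimeoutSeconds > 0) {
            // Non-zero read timeout: the configuration in which the premature timeout can occur.
            env.put(READ_TIMEOUT, String.valueOf(readTimeoutSeconds * 1000L));
        }
        // Read Timeout = 0: property left unset, so the provider waits for the response indefinitely.
        return env;
    }

    public static void main(String[] args) {
        String url = "ldap://avengers-the-initiative.net:389";
        System.out.println(buildEnv(url, 60).get(READ_TIMEOUT));        // 60000
        System.out.println(buildEnv(url, 0).containsKey(READ_TIMEOUT)); // false
    }
}
```

      The resulting Hashtable could be passed to new javax.naming.directory.InitialDirContext(env), which is where the provider would apply the timeout.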

      Workaround

      These exceptions can normally be ignored safely, because synchronisation is self-correcting: problems encountered in one synchronisation round are fixed in the next.

      If synchronisation repeatedly fails to complete, a known workaround is to disable the read timeout by setting the Read Timeout field in the LDAP directory properties to 0. As a side effect, Crowd can no longer recover automatically from LDAP requests that take too long to complete, which may cause it to stop communicating with LDAP directories until the application is restarted.
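      To tell whether synchronisation is failing repeatedly rather than occasionally, one option is to count consecutive failed rounds for a directory in the log. This is a minimal Java sketch; the class name and the threshold of 3 are hypothetical, and it assumes that a successful round logs "synchronisation complete for directory [ ID ]" without the "failed" prefix seen in the logs above.

```java
import java.util.Arrays;
import java.util.List;

public class SyncFailureStreak {
    // Fragment of the message logged after a failed round (see the logs above).
    private static final String FAILED = "failed synchronisation complete for directory";
    // ASSUMPTION: a successful round logs the same message without the "failed" prefix.
    private static final String COMPLETE = "synchronisation complete for directory";

    /** Counts consecutive failed rounds for one directory at the end of the log. */
    static int trailingFailures(List<String> logLines, long directoryId) {
        String dirTag = "directory [ " + directoryId + " ]";
        int streak = 0;
        for (String line : logLines) {
            if (!line.contains(dirTag)) {
                continue; // ignore other directories and unrelated lines
            }
            // Check FAILED first: a failed line also contains the COMPLETE fragment.
            if (line.contains(FAILED)) {
                streak++;
            } else if (line.contains(COMPLETE)) {
                streak = 0; // a successful round resets the streak
            }
        }
        return streak;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INFO synchronisation for directory [ 10000 ] starting",
                "INFO failed synchronisation complete for directory [ 10000 ] in [ 10374ms ]",
                "INFO synchronisation for directory [ 10000 ] starting",
                "INFO failed synchronisation complete for directory [ 10000 ] in [ 10374ms ]");
        int failures = trailingFailures(log, 10000);
        // Hypothetical threshold: treat 3 or more consecutive failures as "repeated".
        System.out.println(failures + " consecutive failures; repeated = " + (failures >= 3));
    }
}
```

      For the sample log this prints "2 consecutive failures; repeated = false". The threshold is a judgment call; a single failure is usually corrected by the next round, as noted above.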

      For JIRA 6.3.x customers

      If you have confirmed that you are experiencing the aforementioned Java bug, we have developed a temporary fix until Oracle fixes the issue in the JDK itself.

      The fix consists of two JARs: a new version of Crowd for JIRA, and a patched and shaded version of the JDK's LDAP JNDI provider.

      The source for the patched and shaded version of the JDK's LDAP JNDI provider is available at https://bitbucket.org/atlassian/atlassian-ldap-jndi-provider

      To install these files and apply them as a workaround:

      IMPORTANT: This only works if you are running Java 8 and a version of JIRA that uses crowd-ldap-2.8.0-OD-6 or a compatible version (JIRA 6.3.x).

              Assignee: Unassigned
              Reporter: Mark Lassau (mlassau) (Inactive)
              Votes: 62
              Watchers: 62
