
LDAP Synchronisation can fail unexpectedly due to mistiming in the "LDAP response read time out"

    • Type: Bug
    • Resolution: Tracked Elsewhere
    • Priority: Medium

      In the attached log, a synchronisation starts at 2011-06-01 15:42:59,076 and fails with a read time-out at 2011-06-01 15:43:15,739, about 16.7 seconds later, even though the configured timeout is 120 seconds.

      Bug description

      In this bug, a read timeout exception is thrown before the timeout has elapsed. In the log snippet below, the read timeout exception was thrown only around 300 milliseconds after the last successful LDAP operation, although the timeout in use was 120000 ms:

      2011-06-01 15:43:15,400 QuartzWorker-1 INFO ServiceRunner     [directory.ldap.cache.AbstractCacheRefresher] found [ 210 ] remote user-group memberships in [ 317ms ]
      2011-06-01 15:43:15,715 QuartzWorker-1 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] synchronisation complete in [ 16639ms ]
      2011-06-01 15:43:15,739 QuartzWorker-1 ERROR ServiceRunner     [atlassian.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ 10000 ].
      com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used:120000ms.; remaining name 'cn=c-user-1593,ou=childou-c-2000users,ou=loadtesting10k,o=sgi,c=us'
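
The gap between the log entries above can be checked with a few lines of Java. This is only an illustrative sketch: the class name `LogGap` and the method `millisBetween` are made up for this example, and the timestamps are copied from the log excerpt and the description.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LogGap {
    // Timestamp layout used in the log excerpt, e.g. "2011-06-01 15:43:15,400".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds between two log timestamps. */
    static long millisBetween(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long configuredTimeoutMs = 120_000L; // "timeout used:120000ms" in the exception

        // Last successful LDAP operation -> error.
        long sinceLastLdapOp =
                millisBetween("2011-06-01 15:43:15,400", "2011-06-01 15:43:15,739");
        // Synchronisation start -> error.
        long sinceSyncStart =
                millisBetween("2011-06-01 15:42:59,076", "2011-06-01 15:43:15,739");

        System.out.println("Since last LDAP operation: " + sinceLastLdapOp + " ms");
        System.out.println("Since synchronisation start: " + sinceSyncStart + " ms");
        System.out.println("Below configured timeout: "
                + (sinceSyncStart < configuredTimeoutMs));
    }
}
```

Both intervals are far shorter than the 120000 ms timeout reported in the exception, which is what makes the time-out spurious.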
      

      Cause

      This bug is triggered randomly in some environments when the Read Timeout field in the LDAP directory properties has been set to a non-zero value.

      This bug is caused by a known bug (#6968459) affecting Java SE.

      Workaround

      Normally these exceptions can be safely ignored as the synchronisation is self-correcting. That is, problems encountered in one synchronisation round will get fixed in the following synchronisation round.

      If the synchronisation repeatedly fails to complete, a known workaround is to disable the read timeout by setting the Read Timeout field in the LDAP directory properties to 0. A side-effect of this change is that Crowd will not be able to recover automatically from LDAP requests that take too long to run, which might cause Crowd to stop communicating with LDAP directories until it is restarted.
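
For readers who want to see what a read timeout of 0 means at the JNDI level, here is a minimal sketch. It assumes (this is not confirmed in this issue) that the Read Timeout field ends up in the JDK LDAP provider's `com.sun.jndi.ldap.read.timeout` environment property, for which a value of zero or less means "wait indefinitely". The class name, method name and server URL are hypothetical, and no connection is opened.

```java
import java.util.Hashtable;
import javax.naming.Context;

class LdapReadTimeoutEnv {
    // Environment property of the JDK's built-in LDAP provider (value in milliseconds).
    static final String READ_TIMEOUT = "com.sun.jndi.ldap.read.timeout";

    /**
     * Builds a JNDI environment for the JDK LDAP provider.
     * A readTimeoutMs of zero or less is stored as "0", meaning no read timeout:
     * the provider waits for a response indefinitely.
     */
    static Hashtable<String, String> buildEnv(String ldapUrl, long readTimeoutMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        env.put(READ_TIMEOUT, Long.toString(Math.max(0L, readTimeoutMs)));
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical server URL, for illustration only.
        String url = "ldap://ldap.example.com:389";

        // Timeout as in the log excerpt: subject to the spurious time-out.
        System.out.println(buildEnv(url, 120_000L).get(READ_TIMEOUT));

        // Workaround: read timeout disabled, at the cost of possibly waiting forever.
        System.out.println(buildEnv(url, 0L).get(READ_TIMEOUT));
    }
}
```

The environment would normally be passed to `new InitialDirContext(env)`; that step is left out so the sketch runs without an LDAP server.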

            [CWD-2494] LDAP Synchronisation can fail unexpectedly due to mistiming in the "LDAP response read time out"

            joe added a comment -

            The fix for this issue (JDK-7011441) is present in:

            • JDK 8u60 (1.8.0_60-b27)

            If you're running into this, please upgrade your JVM to pick up the fix.

            (It's also fixed in 7u91 and 6u101, which are not publicly available, but which may be available under a support contract.)
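
A rough way to check whether a given JVM is at or above the releases listed in this comment is sketched below. The class and method names are hypothetical, the thresholds (8u60, 7u91, 6u101) are taken from the comment, and treating JDK 9 and later as fixed is an assumption of this sketch rather than something stated in this issue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdkFixCheck {
    // Legacy scheme "1.<major>.<minor>_<update>", e.g. "1.8.0_60" or "1.8.0_60-b27".
    private static final Pattern LEGACY =
            Pattern.compile("^1\\.(\\d+)\\.\\d+(?:_(\\d+))?.*$");
    // Newer scheme "<major>[.<minor>...]", e.g. "11.0.2" or "17".
    private static final Pattern MODERN =
            Pattern.compile("^(\\d+)(?:[.+-].*)?$");

    /** Returns true if the version string is at or above a release listed as fixed. */
    static boolean hasFix(String version) {
        Matcher legacy = LEGACY.matcher(version);
        if (legacy.matches()) {
            int major = Integer.parseInt(legacy.group(1));
            int update = legacy.group(2) == null ? 0 : Integer.parseInt(legacy.group(2));
            switch (major) {
                case 8: return update >= 60;   // JDK 8u60
                case 7: return update >= 91;   // 7u91 (not publicly available)
                case 6: return update >= 101;  // 6u101 (not publicly available)
                default: return false;
            }
        }
        Matcher modern = MODERN.matcher(version);
        // Assumption: JDK 9 and later already include the fix.
        return modern.matches() && Integer.parseInt(modern.group(1)) >= 9;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println(running + " has fix: " + hasFix(running));
    }
}
```

Vendor builds may number their updates differently, so the result is a hint rather than a guarantee.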


            joe added a comment -

            JDK-7011441 has now been resolved in the JDK project, and the fix backported to a number of stable branches. This will likely arrive in an upcoming public Oracle release of the JDK.


            Sorin Sbarnea (Citrix) added a comment -

            I am sure that a company like Atlassian, which has most if not all products based on Java, does have a support contract with Oracle, which allows you to get help from Oracle on fixing a bug like this.

            Another alternative is to switch to another LDAP client library: http://stackoverflow.com/questions/389746/ldap-java-library

            joe added a comment -

            Unfortunately, https://bugs.openjdk.java.net/browse/JDK isn't an open JIRA, so I can't add any more information there.

            For the specific JDK bug, this isn't something we can work around in Crowd, aside from using a separate LDAP library. I'm going to close this as 'Tracked Elsewhere' to make clear where the fix needs to be.


            Oswaldo Hernandez (Inactive) added a comment -

            Hi intersol,

            The JDK bug is at https://bugs.openjdk.java.net/browse/JDK-6968459

            Regards,

            Oswaldo Hernández.
            JIRA Bugmaster.
            [Atlassian].

            Sorin Sbarnea added a comment -

            Can someone put a link to a bug against the current version of Oracle Java, 1.7? Having a bug in an unsupported version of Java seems not very useful, as I am sure Oracle is not going to fix it.

            Mark Lassau (Inactive) added a comment -

            Can this affect Delegated LDAP directories?

            As you say, the delegated LDAP directories don't expose the Read Timeout property.
            I have to assume (but I haven't confirmed) that this is equivalent to the 0 setting that makes the timeout unlimited.
            Which suggests that they would suffer from the mentioned side-effect of "not being able to recover automatically from LDAP requests that take too long to run".

            This behaviour seems much worse than an occasional spurious timeout, so perhaps we want to raise an issue to add "read timeout" to the Delegated Advanced settings.

            Dave C added a comment -

            Can this affect Delegated LDAP directories? They do not seem to have a timeout parameter as they do not synchronise; however, they can time out on authentication.


            joe added a comment -

            I've written a potential fix: Commit 8132068ce172. Unfortunately, the JDK is distributed under the GPL so we can't include the modified file with Crowd.

            The modified version could be downloaded, compiled, placed in its own jar file and then Crowd's startup script changed to use

            -Xbootclasspath/p:jndi-patch.jar
            

            I've submitted a patch to the OpenJDK project so hopefully it'll make it into a release at some point. In the meantime, if you're still seeing this problem, please include logs and the specific OS you're using so we can reproduce the issue and confirm any fixes.


            bain added a comment -

            Sorry. I was on Mac 10.6.x and an Apple JDK 1.6.x. Don't have the exact versions.


              Assignee: Unassigned
              Reporter: Mark Lassau (Inactive)
              Affected customers: 10
              Watchers: 17
