    • Type: Bug
    • Resolution: Fixed
    • Priority: High
    • Fix Version/s: 2.1.0-beta4, 2.1
    • Affects Version/s: 2.0
    • Component/s: None
    • Labels: None
    • Environment: Confluence 3.2, Crowd 2.0

      The group membership lists for some users sometimes lose groups when some of those groups are nested. For example, a user customertest2 is a member of the group customer-test, which is a member of the group customer. When the issue occurs, the groups show up as:

      customertest2 (String) [jira-users, customer-test] (ArrayList)
      

      but they should be:

      customertest2 (String) [jira-users, customer-test, sdk-customer, customer] (ArrayList)
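      The expected list is the user's direct groups plus every group reachable through nesting. The sketch below resolves that closure breadth-first, keeping all working state in method-scoped collections. NestedMembership and allGroups are hypothetical names, not part of the Crowd client API, and because the issue does not say where sdk-customer sits in the hierarchy, the example assumes customer-test is a direct member of both sdk-customer and customer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Crowd integration client.
final class NestedMembership {
    /**
     * Returns the user's direct groups followed by every ancestor group
     * reachable through nesting, in breadth-first order, without duplicates.
     *
     * @param directGroups groups the user belongs to directly
     * @param parents      maps each group to the groups it is a direct member of
     */
    static List<String> allGroups(List<String> directGroups,
                                  Map<String, List<String>> parents) {
        // Method-scoped state: nothing here is shared with other callers.
        Set<String> seen = new LinkedHashSet<>(directGroups);
        Deque<String> queue = new ArrayDeque<>(directGroups);
        while (!queue.isEmpty()) {
            String group = queue.poll();
            for (String parent : parents.getOrDefault(group, List.of())) {
                // add() returns false for groups already seen, so cycles terminate.
                if (seen.add(parent)) {
                    queue.add(parent);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Assumed nesting: customer-test is a direct member of sdk-customer and customer.
        Map<String, List<String>> parents =
                Map.of("customer-test", List.of("sdk-customer", "customer"));
        // Prints [jira-users, customer-test, sdk-customer, customer]
        System.out.println(allGroups(List.of("jira-users", "customer-test"), parents));
    }
}
```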
      

      Diagnostic step: check whether flushing the cache on the client (e.g. Confluence Admin Console -> Cache Statistics -> Flush All) temporarily fixes the problem.

      The incorrect membership list is cached in the com.atlassian.crowd.integration-all-memberships cache.

      Workaround:
      We have still not been able to find the source of the problem within the Confluence code; however, we do know that somehow the Confluence caches are affecting the Crowd caches.

      We have attached crowd-integration-client-2.0.7-CWD-1996.jar, which shades net.sf.ehcache to com.atlassian.crowd.shaded.ehcache. This means Confluence cannot affect the Crowd integration client's caches: the shaded classes are effectively different classes that Confluence knows nothing about.

      To apply the patch, upgrade to Crowd 2.0.7, then in your Confluence instance remove any other Crowd integration client JARs from CONFLUENCE_INSTALL/confluence/WEB-INF/lib, place the attached crowd-integration-client-2.0.7-CWD-1996.jar in that directory, and restart Confluence.
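      The JAR swap can also be scripted. A minimal sketch using java.nio.file, assuming Confluence is stopped while it runs and that the replacement JAR sits outside WEB-INF/lib; CrowdClientJarSwap and swap are hypothetical names, and the code neither stops nor restarts Confluence.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: replaces every crowd-integration-client-*.jar in a
// WEB-INF/lib directory with a single new client JAR.
final class CrowdClientJarSwap {
    static List<Path> swap(Path webInfLib, Path newJar) throws IOException {
        List<Path> oldJars = new ArrayList<>();
        // Collect first, then delete, so the directory is not modified mid-iteration.
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(webInfLib, "crowd-integration-client-*.jar")) {
            for (Path jar : stream) {
                oldJars.add(jar);
            }
        }
        for (Path jar : oldJars) {
            Files.delete(jar);
        }
        // Copy the replacement JAR into the directory under its own file name.
        Files.copy(newJar, webInfLib.resolve(newJar.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
        return oldJars;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CrowdClientJarSwap.java <WEB-INF/lib dir> <new client jar>");
            return;
        }
        List<Path> removed = swap(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Removed " + removed + "; restart Confluence to load the new client.");
    }
}
```

Collecting the matches before deleting keeps the logic simple; the same swap applies to any later client JAR.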

            [CWD-1996] Crowd integration cache loses some nested groups

            shihab added a comment -

            We are marking this issue as fixed for 2.1 as we have incorporated the fix from the customer-provided patch.

            Because we were not able to reproduce the exact problem locally, we will re-open this issue for further investigation if the problem persists in Crowd 2.1.

            Olli Nevalainen added a comment -

            As we have not been able to reproduce this issue locally, we would appreciate feedback on Crowd 2.1.0-beta4 from customers affected by CWD-1996.

            Steps to upgrade your test instances:

            1. Upgrade your Crowd server to Crowd 2.1.0-beta4.
            2. Upgrade your Crowd client libraries in Confluence by:
              1. deleting the old crowd-integration-client-*.jar in Confluence's WEB-INF/lib
              2. copying the new crowd-integration-client-2.1.0-beta4.jar into Confluence's WEB-INF/lib
              3. restarting Confluence

            Let us know if you still experience issues with nested groups in Confluence after upgrading to the beta.

            Olli Nevalainen added a comment -

            We just released Crowd 2.1 Beta 4 with multiple fixes to nested group caching.

            This is beta software, so the usual warnings about not running it on production systems apply.

            Jeff Kirby added a comment -

            The patch that I last uploaded to CSP-51344 has been working for us for nearly a month without a single failure.

            Jeff Kirby added a comment -

            This problem was common on our Confluence 3.0.2 and rampant after upgrading to Confluence 3.3.1. In September we tried a patch that Atlassian provided. It reduced the occurrence on 3.3.1 from several times an hour to a few times a week, back to the 3.0.2 level. In the meantime we investigated the source ourselves, as we were able to reproduce the problem on a test server.

            I believe that yesterday we finally came up with a version that eliminates the nested group failure bug! Yay! I've posted the patch to CSP-51344 so that Atlassian engineers can look at it.

            There are two problems:

            1. NestingHelper uses the Crowd integration caches as data structures to pass data between its private members. This is error-prone because the cache is highly mutable; use method-scoped maps instead.
            2. When one item in a Crowd integration cache expires, the inter-relationships of nested groups mean that all the Crowd integration caches need to be cleared and rebuilt, and all reads and writes to the caches must be blocked until the rebuild finishes.
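            The second point amounts to treating the related caches as one unit: on any expiry, rebuild everything while other readers and writers wait. A minimal sketch with java.util.concurrent.locks.ReentrantReadWriteLock; the class name, the single TTL, and the loader function are assumptions for illustration, not the code in the CSP-51344 patch or in the Crowd client.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical all-or-nothing membership cache (not Crowd's NestingHelper).
// The whole user -> groups map is rebuilt from the loader once the TTL has
// passed, so readers never see a half-expired nesting graph.
final class AllOrNothingMembershipCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Supplier<Map<String, List<String>>> loader;
    private final long ttlMillis;
    private Map<String, List<String>> snapshot; // null until the first load
    private long loadedAtMillis;

    AllOrNothingMembershipCache(Supplier<Map<String, List<String>>> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    List<String> groupsFor(String user, long nowMillis) {
        lock.readLock().lock();
        try {
            if (!isExpired(nowMillis)) {
                return snapshot.getOrDefault(user, List.of());
            }
        } finally {
            lock.readLock().unlock();
        }
        // A read lock cannot be upgraded, so it is released above before the
        // write lock is taken. While the write lock is held, all other reads
        // and writes block until the rebuild finishes.
        lock.writeLock().lock();
        try {
            if (isExpired(nowMillis)) { // another thread may have rebuilt already
                snapshot = Map.copyOf(loader.get()); // replace everything at once
                loadedAtMillis = nowMillis;
            }
            return snapshot.getOrDefault(user, List.of());
        } finally {
            lock.writeLock().unlock();
        }
    }

    private boolean isExpired(long nowMillis) {
        return snapshot == null || nowMillis - loadedAtMillis >= ttlMillis;
    }

    public static void main(String[] args) {
        AllOrNothingMembershipCache cache = new AllOrNothingMembershipCache(
                () -> Map.of("customertest2", List.of("jira-users", "customer-test")),
                60 * 60_000L);
        System.out.println(cache.groupsFor("customertest2", 0));
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic; a real cache would read the system clock.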

            David O'Flynn [Atlassian] added a comment -

            Hi Dan,

            We're not absolutely certain yet, but it looks like the problem started with Confluence 3.2. The crowd-integration-client-xx.jar has always used ehCache for storing users and groups.

            In Confluence 3.2, we moved from a Tangosol Coherence-based cache to ehCache for general Confluence caching. At the moment, it appears that Confluence and Crowd conflict in the way they use ehCache. Diagnosing this is complicated by the fact that we can't reproduce the issue locally.

            We are working on it, though.

            Cheers,
            Dave.

            Stefan Saasen (Inactive) added a comment -

            Hi,

            What's the rationale for explicitly using the singleton cache manager, which then gets shared between the Crowd integration client and Confluence?
            If this code runs before the host application (i.e. Confluence) initializes its own caches, the singleton is created with the Crowd configuration and the Confluence configuration is never applied.

            Could this be changed to:

            public CacheImpl(URL configLocation)
            {
                System.out.println("config location: " + configLocation);
                // Create a dedicated CacheManager instead of sharing the JVM-wide
                // singleton, so Crowd's configuration cannot clash with Confluence's.
                this.ehCacheManager = new net.sf.ehcache.CacheManager(configLocation);
            }
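            The difference between a shared singleton and a dedicated manager can be shown without Ehcache itself. The sketch below uses a hypothetical SimpleCacheManager that only records its configuration name; it is not Ehcache's API and only models the pattern described in this comment: getInstance() keeps whichever configuration created it first, while the constructor gives each component its own instance.

```java
// Hypothetical stand-in for a cache manager; it only records which
// configuration it was created with.
final class SimpleCacheManager {
    private static SimpleCacheManager singleton;
    private final String configName;

    SimpleCacheManager(String configName) {
        this.configName = configName;
    }

    // JVM-wide singleton: the first caller's configuration wins, and later
    // callers silently receive the same instance.
    static synchronized SimpleCacheManager getInstance(String configName) {
        if (singleton == null) {
            singleton = new SimpleCacheManager(configName);
        }
        return singleton;
    }

    String configName() {
        return configName;
    }

    public static void main(String[] args) {
        // Shared singleton: whichever component initializes first decides the config.
        SimpleCacheManager crowd = getInstance("crowd-ehcache.xml");
        SimpleCacheManager confluence = getInstance("ehcache.xml");
        System.out.println(confluence.configName()); // crowd-ehcache.xml
        System.out.println(crowd == confluence);     // true

        // Dedicated instances, as in the proposed CacheImpl change: no sharing.
        SimpleCacheManager crowdOwn = new SimpleCacheManager("crowd-ehcache.xml");
        SimpleCacheManager confluenceOwn = new SimpleCacheManager("ehcache.xml");
        System.out.println(confluenceOwn.configName()); // ehcache.xml
        System.out.println(crowdOwn == confluenceOwn);   // false
    }
}
```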

            Dan Radigan added a comment -

            Do you have any insight into when this bug started? We have upgraded from Crowd 2.0.3 to Crowd 2.0.7 and from Confluence 3.1 to 3.3.1, and now see this issue.

            Dan

            David O'Flynn [Atlassian] added a comment -

            Hi folks,

            This integration is being completely rewritten for Crowd 2.1 / Confluence 3.5, which will fix this bug.

            Apologies for not being able to fix it sooner; we cannot reproduce the problem here.

            Cheers,
            Dave.

            Renan Battaglin added a comment -

            • Crowd has a single default timeout of 1 hour for all its caches, but Confluence uses different timeouts for different caches: 1 hour, 30 min, 10 min and 5 min (confluence-home-dir/config/ehcache.xml).
            • Integrations that have the problem are using the crowd-ehcache.xml file provided at <Crowd>/client/config.
            • The problem affects Confluence only. We have not received reports for JIRA or other applications.
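            One way mismatched timeouts can make related caches disagree is shown below with two expiring caches on a simulated clock. This is a minimal sketch, not Ehcache's implementation: TtlCache is a hypothetical time-to-live cache, configured once for 60 minutes (Crowd's default) and once for 5 minutes (the shortest Confluence value), and the cached values are illustrative. Ten minutes after both are filled, only the 60-minute cache still holds its entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical time-to-live cache driven by an explicit clock value, so the
// example is deterministic. Not thread-safe; illustration only.
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long storedAtMillis;

        Entry(T value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> entries = new HashMap<>();

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis));
    }

    /** Returns the value, or null if it is absent or at least ttlMillis old. */
    V get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.storedAtMillis >= ttlMillis) {
            entries.remove(key); // lazily evict the expired entry
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        TtlCache<String, String> oneHour = new TtlCache<>(60 * minute);
        TtlCache<String, String> fiveMinutes = new TtlCache<>(5 * minute);
        oneHour.put("customertest2", "[jira-users, customer-test]", 0);
        fiveMinutes.put("customer-test", "[sdk-customer, customer]", 0);

        long now = 10 * minute;
        System.out.println(oneHour.get("customertest2", now));     // [jira-users, customer-test]
        System.out.println(fiveMinutes.get("customer-test", now)); // null
    }
}
```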

              Assignee: Olli Nevalainen
              Reporter: Richard Atkins
              Affected customers: 11
              Watchers: 17