A customer with a custom ConfluenceAuthenticator (based on the Soulwing CAS Authenticatory is reporting that their method calls for getUser in the userAccessor class are returning null.

      The result is odd, because the behaviour of the authenticator is to:

      1. Check whether or not they exist in Confluence
      2. Create the user account via userAcessor.addUser
      3. Check the user account just created.

      The user has been added, and saved in Confluence's database yet the subsequent call to check if the user exists returns a null. However, if the server is bounced, the getUser query is successful

      What looked to be caused by not using the userAccessor.saveUser, but implementing this method did not resolve the issue.
      After reviewing the customer's source code with Paul, we suspect this problem may be caused by a particular hibernate query failing to execute:

          <query name="atluser.user_findAllUserNames"><![CDATA[
              select user.name
              from DefaultHibernateUser user
              order by user.name
       ]]></query>
      

      Hibernate logs show null returned immediately after a user is added.

        1. atlassian-bucket-1.2-SNAPSHOT.jar
          137 kB
          Anatoli
        2. atlassian-confluence_with_hibernate_logging2.log
          7.27 MB
          Justin Sabelko
        3. confluence_confg_and_added_files.zip
          436 kB
          Justin Sabelko
        4. ConfluenceCasAuthenticator.java
          6 kB
          Justin Sabelko
        5. custom_code.zip
          54 kB
          Justin Sabelko
        6. tomcat_and_other_confluence_config_files.zip
          9 kB
          Justin Sabelko
        7. UscmConfluenceCasAuthenticator.java
          2 kB
          Justin Sabelko

            [CONFSERVER-10915] calling getUser via the userAccessor returns null

            Hi There

            Thanks for taking the time to raise this issue. As you are no doubt aware this has been on our backlog now for quite some time with no resolution forthcoming. Due to the age and inactivity I'm going to close this issue as won't fix. I believe this better reflects the status of this issue.

            Regards
            Steve Haffenden
            Confluence Bugmaster

            Steve Haffenden (Inactive) added a comment - Hi There Thanks for taking the time to raise this issue. As you are no doubt aware this has been on our backlog now for quite some time with no resolution forthcoming. Due to the age and inactivity I'm going to close this issue as won't fix. I believe this better reflects the status of this issue. Regards Steve Haffenden Confluence Bugmaster

            Anatoli added a comment -

            Arne,

            I have created a separate bug report for your problem. I have explained there why it happens and what can be a workaround.

            Justin,
            Thanks for testing. We will produce a proper fix sometimes in the near future.

            Anatoli.

            Anatoli added a comment - Arne, I have created a separate bug report for your problem. I have explained there why it happens and what can be a workaround. Justin, Thanks for testing. We will produce a proper fix sometimes in the near future. Anatoli.

            That worked! Thanks for describing the problem too, as I was able to modify our code (by simply adding .toLowerCase()) and then get it working on our production server without having to use the beta jar file.

            Thanks again, as this is huge for us!
            Justin

            Justin Sabelko added a comment - That worked! Thanks for describing the problem too, as I was able to modify our code (by simply adding .toLowerCase()) and then get it working on our production server without having to use the beta jar file. Thanks again, as this is huge for us! Justin

            I have a similar problem using getUser, but this is with Confluence 2.5.8 Cluster, Tomcat 6.0, Java 1.5, Oracle 9

            We are using our own single-sign-on plugin which works perfectly well in a non-clustered environment but has difficulties in the clustered environment. The problem occurs only sporadically and we were able to track the problem down to some Atlassian code.

            Here is a scenario that triggers the bug every time:

            • Apache Webserver and both cluster nodes are restarted.
            • User logs in. The Apache Webserver is configured using mod_jk so that every request from this user goes to the same (the first) cluster node.
            • We shut down the first node and restart the Apache Webserver. The user's web browser is not closed.
            • Because the Apache Webserver has been restarted, it sees only the second node. All requests are therefore directed to the second node.
            • The second cluster node doesn't have an active session, because the user logged in on the first cluster node and the second cluster node was unused until now (there was only one user at all accessing the system).
            • Therefore the second cluster node will also go through the single-sign-on process, attempting to log in the user according to the credentials available in the http headers.
            • Now here is the problem: our plugin calls getUser (see the code fragment) which returns an incorrectly initialized user object. The "groups" field is empty which causes other problems further down.
            • We have turned on all log4j debug traces and what we can see is that getUser never goes to the database for this particular user. The corresponding SQL log entries are missing. However when we log in as a different user (using our manual single sign on routine which prevents different credentials in the http request header) then everything is fine again.

            This particular scenario may look far-fetched but we have the same results if someone leaves his browser window open over night. The Apache Webserver will then purge or time-out its session information in the mod_jk handler so when the user continues using his confluence session next morning the Apache will direct him to the other cluster node with a probability of 50%, triggering the very same problem as presented above.

            My theory what happens is:

            • Cluster node 1 tells cluster node 2 about the logged in users (the data accessed by getUser) but the data is somehow incomplete
            • Cluster node 2 finds the incomplete data and uses it without refreshing it from the database
            • If someone else logs in to cluster node 2 his data is not yet in the getUser data cache so it fetches it from the database.

            I was not able to reproduce this behavior using the same Confluence release on a Debian system with MySQL (the production system is a RedHat system with Oracle).

            Here is the code fragment:

            ...
            // we need to find one particular confluence user from his user.id property
            // this is much faster than iterating through a list of all users
            
            connection = session.connection();
            
            preparedStatement = connection.prepareStatement("select users.NAME, OS_PROPERTYENTRY.STRING_VAL " +
            		"from users, OS_PROPERTYENTRY " +
            		"where OS_PROPERTYENTRY.ENTITY_KEY='user.id' and users.ID=OS_PROPERTYENTRY.ENTITY_ID and OS_PROPERTYENTRY.STRING_VAL=?");
            preparedStatement.setString(1,id);
            rs = preparedStatement.executeQuery();
            rs.next();
            String username = rs.getString(1);
            
            // we have the user name at this point. If not, we would have an exception and won't reach this point
            
            LOG.debug("query result: " + username);
            rs.close();
            rs = null;
            preparedStatement.close();
            preparedStatement = null;
            
            // now get the confluence user from the user name. The problem with this code fragment is that it returns usually
            // a properly constructed user object, but sometimes we get an object with user.id = 0 and user.groups = null.
            // We can see from the debug traces that userAccessor.getUser will not retrieve the user data from the database in the latter case.
            
            UserAccessor userAccessor = (UserAccessor) com.atlassian.spring.container.ContainerManager.getComponent("userAccessor");
            user = userAccessor.getUser(username.toLowerCase());
            
            // the purpose of this cast is to get access to user.id and user.groups (which are not accessible from the User class)
            // and to zap the user object if it appears to be broken
            
            DefaultHibernateUser hUser = (DefaultHibernateUser)user;
            LOG.debug("user.name: " + hUser.getName());
            LOG.debug("user.id: " + hUser.getId());
            if (hUser.getGroups() == null) {
            	LOG.debug("user.groups: NULL");
            	user = null;
            } else {
            	LOG.debug("user.groups: NOT NULL");
            	tmpMap.put(id, user);
            }
            ...
            

            And here is the corresponding log4j output from the first cluster node. You can see that the user.groups field is filled (NOT NULL). During execution of this code there are also SELECTs on the Oracle database.

            [2008-04-26 16:04:29,504] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: > UserCache.lookUpById
            [2008-04-26 16:04:29,551] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: query result: arne.schirmacher@.....
            [2008-04-26 16:04:29,934] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.name: arne.schirmacher@.....
            [2008-04-26 16:04:29,935] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.id: 25630817
            [2008-04-26 16:04:29,936] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.groups: NOT NULL
            [2008-04-26 16:04:29,937] [catalina-exec-1] INFO de.company.cf.login.UserCache: add 1 users to 0 existing users in map
            [2008-04-26 16:04:29,938] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: < UserCache.lookUpById
            

            And here is the corresponding log4j output from the second cluster node, after the first one has been shut down. There is no activity on the database. userAccessor.getUser returns a user object without retrieving it from the database, which is probably the reason why not all fields are filled. The only explanation is that this node thinks its user object is already complete. If I log in as another user it does retrieve it from the DB. So it must be caused somehow by the fact that the user was accessed by the first cluster node just a moment ago. Maybe the second cluster node has received a bad data object sent by the first cluster node.

            [2008-04-26 16:05:55,468] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: > UserCache.lookUpById
            [2008-04-26 16:05:55,519] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: query result: arne.schirmacher@.....
            [2008-04-26 16:05:55,529] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.name: arne.schirmacher@.....
            [2008-04-26 16:05:55,530] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.id: 0
            [2008-04-26 16:05:55,531] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.groups: NULL
            [2008-04-26 16:05:55,533] [catalina-exec-1] INFO de.company.cf.login.UserCache: add 0 users to 0 existing users in map
            [2008-04-26 16:05:55,534] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: < UserCache.lookUpById
            

            Please advise how we can force the second cluster node to always visit the database when accessing getUser, or suggest some other workaround.

            schirmacher added a comment - I have a similar problem using getUser, but this is with Confluence 2.5.8 Cluster, Tomcat 6.0, Java 1.5, Oracle 9 We are using our own single-sign-on plugin which works perfectly well in a non-clustered environment but has difficulties in the clustered environment. The problem occurs only sporadically and we were able to track the problem down to some Atlassian code. Here is a scenario that triggers the bug every time: Apache Webserver and both cluster nodes are restarted. User logs in. The Apache Webserver is configured using mod_jk so that every request from this user goes to the same (the first) cluster node. We shut down the first node and restart the Apache Webserver. The user's web browser is not closed. Because the Apache Webserver has been restarted, it sees only the second node. All requests are therefore directed to the second node. The second cluster node doesn't have an active session, because the user logged in on the first cluster node and the second cluster node was unused until now (there was only one user at all accessing the system). Therefore the second cluster node will also go through the single-sign-on process, attempting to log in the user according to the credentials available in the http headers. Now here is the problem: our plugin calls getUser (see the code fragment) which returns an incorrectly initialized user object. The "groups" field is empty which causes other problems further down. We have turned on all log4j debug traces and what we can see is that getUser never goes to the database for this particular user. The corresponding SQL log entries are missing. However when we log in as a different user (using our manual single sign on routine which prevents different credentials in the http request header) then everything is fine again. This particular scenario may look far-fetched but we have the same results if someone leaves his browser window open over night. The Apache Webserver will then purge or time-out its session information in the mod_jk handler so when the user continues using his confluence session next morning the Apache will direct him to the other cluster node with a probability of 50%, triggering the very same problem as presented above. My theory what happens is: Cluster node 1 tells cluster node 2 about the logged in users (the data accessed by getUser) but the data is somehow incomplete Cluster node 2 finds the incomplete data and uses it without refreshing it from the database If someone else logs in to cluster node 2 his data is not yet in the getUser data cache so it fetches it from the database. I was not able to reproduce this behavior using the same Confluence release on a Debian system with MySQL (the production system is a RedHat system with Oracle). Here is the code fragment: ... // we need to find one particular confluence user from his user.id property // this is much faster than iterating through a list of all users connection = session.connection(); preparedStatement = connection.prepareStatement("select users.NAME, OS_PROPERTYENTRY.STRING_VAL " + "from users, OS_PROPERTYENTRY " + "where OS_PROPERTYENTRY.ENTITY_KEY='user.id' and users.ID=OS_PROPERTYENTRY.ENTITY_ID and OS_PROPERTYENTRY.STRING_VAL=?"); preparedStatement.setString(1,id); rs = preparedStatement.executeQuery(); rs.next(); String username = rs.getString(1); // we have the user name at this point. If not, we would have an exception and won't reach this point LOG.debug("query result: " + username); rs.close(); rs = null; preparedStatement.close(); preparedStatement = null; // now get the confluence user from the user name. The problem with this code fragment is that it returns usually // a properly constructed user object, but sometimes we get an object with user.id = 0 and user.groups = null. // We can see from the debug traces that userAccessor.getUser will not retrieve the user data from the database in the latter case. UserAccessor userAccessor = (UserAccessor) com.atlassian.spring.container.ContainerManager.getComponent("userAccessor"); user = userAccessor.getUser(username.toLowerCase()); // the purpose of this cast is to get access to user.id and user.groups (which are not accessible from the User class) // and to zap the user object if it appears to be broken DefaultHibernateUser hUser = (DefaultHibernateUser)user; LOG.debug("user.name: " + hUser.getName()); LOG.debug("user.id: " + hUser.getId()); if (hUser.getGroups() == null) { LOG.debug("user.groups: NULL"); user = null; } else { LOG.debug("user.groups: NOT NULL"); tmpMap.put(id, user); } ... And here is the corresponding log4j output from the first cluster node. You can see that the user.groups field is filled (NOT NULL). During execution of this code there are also SELECTs on the Oracle database. [2008-04-26 16:04:29,504] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: > UserCache.lookUpById [2008-04-26 16:04:29,551] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: query result: arne.schirmacher@..... [2008-04-26 16:04:29,934] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.name: arne.schirmacher@..... [2008-04-26 16:04:29,935] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.id: 25630817 [2008-04-26 16:04:29,936] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.groups: NOT NULL [2008-04-26 16:04:29,937] [catalina-exec-1] INFO de.company.cf.login.UserCache: add 1 users to 0 existing users in map [2008-04-26 16:04:29,938] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: < UserCache.lookUpById And here is the corresponding log4j output from the second cluster node, after the first one has been shut down. There is no activity on the database. userAccessor.getUser returns a user object without retrieving it from the database, which is probably the reason why not all fields are filled. The only explanation is that this node thinks its user object is already complete. If I log in as another user it does retrieve it from the DB. So it must be caused somehow by the fact that the user was accessed by the first cluster node just a moment ago. Maybe the second cluster node has received a bad data object sent by the first cluster node. [2008-04-26 16:05:55,468] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: > UserCache.lookUpById [2008-04-26 16:05:55,519] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: query result: arne.schirmacher@..... [2008-04-26 16:05:55,529] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.name: arne.schirmacher@..... [2008-04-26 16:05:55,530] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.id: 0 [2008-04-26 16:05:55,531] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: user.groups: NULL [2008-04-26 16:05:55,533] [catalina-exec-1] INFO de.company.cf.login.UserCache: add 0 users to 0 existing users in map [2008-04-26 16:05:55,534] [catalina-exec-1] DEBUG de.company.cf.login.UserCache: < UserCache.lookUpById Please advise how we can force the second cluster node to always visit the database when accessing getUser, or suggest some other workaround.

            Anatoli added a comment -

            patched version: only use it for testing as it was not tested for production. Compiled with jdk1.5.

            Anatoli added a comment - patched version: only use it for testing as it was not tested for production. Compiled with jdk1.5.

            Anatoli added a comment - - edited

            Justin,

            After quite a bit of testing I suspect that the problem happens when the ids that are being generated for users (usernames) contain upper characters:
            String ssoGuid = attributes.get("ssoGuid");
            Confluence changes usernames to lowercase before storing them to db it also uses lower case usernames as cache keys. However when we try to retrieve a user from the cache we do not change the username to lowercase and given that the misses are cached too we get null back.

            I will attach the patched version of atlassian-bucket , can you please try to use it and see if it fixes your problem.

            Anatoli added a comment - - edited Justin, After quite a bit of testing I suspect that the problem happens when the ids that are being generated for users (usernames) contain upper characters: String ssoGuid = attributes.get("ssoGuid"); Confluence changes usernames to lowercase before storing them to db it also uses lower case usernames as cache keys. However when we try to retrieve a user from the cache we do not change the username to lowercase and given that the misses are cached too we get null back. I will attach the patched version of atlassian-bucket , can you please try to use it and see if it fixes your problem.

            Thanks Anatoli,
            Unfortunately, the new version of atlassian-user didn't fix the problem we're having. I logged in as a new user and received the same error message as before. Once I flushed the hibernateRepository.users cache, the new user was able to log in.

            I'm attaching all of our custom source code (custom_code.zip), as well as all our added jars and config files (confluence_confg_and_added_files.zip), and finally our tomcat config files (tomcat_and_other_confluence_config_files.zip).

            Thanks for your help!
            Justin

            Justin Sabelko added a comment - Thanks Anatoli, Unfortunately, the new version of atlassian-user didn't fix the problem we're having. I logged in as a new user and received the same error message as before. Once I flushed the hibernateRepository.users cache, the new user was able to log in. I'm attaching all of our custom source code (custom_code.zip), as well as all our added jars and config files (confluence_confg_and_added_files.zip), and finally our tomcat config files (tomcat_and_other_confluence_config_files.zip). Thanks for your help! Justin

            Anatoli added a comment -

            Justin,

            I did not have any luck reproducing the problem with Confluence 2.8. What version of Soulwing CAS do you use? I had to modify your code slightly because I could not find the right dependency. (can you please point to the right one).
            Can you also try using the new version of atlassian-user. We have fixed some caching problems in this version.

            Anatoli added a comment - Justin, I did not have any luck reproducing the problem with Confluence 2.8. What version of Soulwing CAS do you use? I had to modify your code slightly because I could not find the right dependency. (can you please point to the right one). Can you also try using the new version of atlassian-user . We have fixed some caching problems in this version.

            Justin Sabelko added a comment - - edited

            If I disable hibernate repository caching in the atlassian-user.xml file by changing this line...

            <hibernate name="Hibernate Repository" key="hibernateRepository" description="Hibernate Repository" cache="true"/>

            to

            <hibernate name="Hibernate Repository" key="hibernateRepository" description="Hibernate Repository"/>

            ...then it works. However that comes with a significant performance penalty, so I don't really want to disable the caching.

            Justin Sabelko added a comment - - edited If I disable hibernate repository caching in the atlassian-user.xml file by changing this line... <hibernate name="Hibernate Repository" key="hibernateRepository" description="Hibernate Repository" cache="true"/> to <hibernate name="Hibernate Repository" key="hibernateRepository" description="Hibernate Repository"/> ...then it works. However that comes with a significant performance penalty, so I don't really want to disable the caching.

            This also exists in 2.8.0. We really need this to be working as it significantly affects our first-time users...

            Justin Sabelko added a comment - This also exists in 2.8.0. We really need this to be working as it significantly affects our first-time users...

              shaffenden Steve Haffenden (Inactive)
              ktran Kevin Tran [Atlassian]
              Affected customers:
              7 This affects my team
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: