I have a similar problem with getUser, in our case on Confluence 2.5.8 Cluster, Tomcat 6.0, Java 1.5, and Oracle 9.
We are using our own single-sign-on plugin which works perfectly well in a non-clustered environment but has difficulties in the clustered environment. The problem occurs only sporadically and we were able to track the problem down to some Atlassian code.
Here is a scenario that triggers the bug every time:
- Apache Webserver and both cluster nodes are restarted.
- User logs in. The Apache Webserver is configured using mod_jk so that every request from this user goes to the same (the first) cluster node.
- We shut down the first node and restart the Apache Webserver. The user's web browser is not closed.
- Because the Apache Webserver has been restarted, it sees only the second node. All requests are therefore directed to the second node.
- The second cluster node doesn't have an active session, because the user logged in on the first cluster node and the second cluster node was unused until now (only one user was accessing the system at all).
- Therefore the second cluster node also goes through the single-sign-on process, attempting to log in the user according to the credentials available in the HTTP headers.
- Now here is the problem: our plugin calls getUser (see the code fragment) which returns an incorrectly initialized user object. The "groups" field is empty which causes other problems further down.
- We have turned on all log4j debug traces, and what we can see is that getUser never goes to the database for this particular user. The corresponding SQL log entries are missing. However, when we log in as a different user (using our manual single-sign-on routine, which presents different credentials in the HTTP request header), everything is fine again.
This particular scenario may look far-fetched, but we get the same results if someone leaves their browser window open overnight. The Apache Webserver will then purge or time out its session information in the mod_jk handler, so when the user continues using the Confluence session the next morning, Apache will direct them to the other cluster node with a probability of 50%, triggering the very same problem as presented above.
My theory of what happens is:
- Cluster node 1 tells cluster node 2 about the logged-in users (the data accessed by getUser), but the data is somehow incomplete.
- Cluster node 2 finds the incomplete data and uses it without refreshing it from the database.
- If someone else logs in to cluster node 2, that user's data is not yet in the getUser data cache, so it is fetched from the database.
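To make the theory concrete, here is a minimal, self-contained sketch of the suspected behavior. It is not Atlassian code: SketchUser, CacheFirstLookup, StaleCacheDemo, the user names, and "example-group" are all hypothetical, and the cache-first lookup is only our assumption about what getUser does internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the user object; not the Atlassian class.
class SketchUser {
    final String name;
    final Set<String> groups;

    SketchUser(String name, Set<String> groups) {
        this.name = name;
        this.groups = groups;
    }
}

// Assumed cache-first lookup: a cached entry is trusted without validation.
class CacheFirstLookup {
    final Map<String, SketchUser> cache = new HashMap<String, SketchUser>();
    final Map<String, SketchUser> database = new HashMap<String, SketchUser>();
    int databaseReads = 0;

    SketchUser getUser(String name) {
        SketchUser cached = cache.get(name);
        if (cached != null) {
            return cached; // returned as-is, even if its groups are empty
        }
        databaseReads++;
        SketchUser loaded = database.get(name);
        if (loaded != null) {
            cache.put(name, loaded);
        }
        return loaded;
    }
}

class StaleCacheDemo {
    // Builds a "second node" whose cache already holds an incomplete entry
    // for alice, as if it had been replicated from the first node.
    static CacheFirstLookup secondNode() {
        CacheFirstLookup node = new CacheFirstLookup();
        Set<String> groups = Collections.singleton("example-group");
        node.database.put("alice", new SketchUser("alice", groups));
        node.database.put("bob", new SketchUser("bob", groups));
        node.cache.put("alice",
                new SketchUser("alice", Collections.<String>emptySet()));
        return node;
    }

    public static void main(String[] args) {
        CacheFirstLookup node = secondNode();
        System.out.println("alice groups: " + node.getUser("alice").groups
                + ", database reads: " + node.databaseReads);
        System.out.println("bob groups: " + node.getUser("bob").groups
                + ", database reads: " + node.databaseReads);
    }
}
```

In this sketch the lookup for alice never touches the database and returns empty groups, while the lookup for bob misses the cache and is loaded completely. That matches what we observe, but it does not prove that the real cache works this way.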
I was not able to reproduce this behavior using the same Confluence release on a Debian system with MySQL (the production system is a RedHat system with Oracle).
Here is the code fragment:
And here is the corresponding log4j output from the first cluster node. You can see that the user.groups field is filled (NOT NULL). During execution of this code there are also SELECTs on the Oracle database.
And here is the corresponding log4j output from the second cluster node, after the first one has been shut down. There is no activity on the database. userAccessor.getUser returns a user object without retrieving it from the database, which is probably the reason why not all fields are filled. The only explanation is that this node thinks its user object is already complete. If I log in as another user it does retrieve it from the DB. So it must be caused somehow by the fact that the user was accessed by the first cluster node just a moment ago. Maybe the second cluster node has received a bad data object sent by the first cluster node.
Please advise how we can force the second cluster node to always visit the database when accessing getUser, or suggest some other workaround.
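To illustrate the kind of workaround we have in mind, here is a minimal sketch under stated assumptions. It does not use any real Confluence API: SsoUser, UserStore, CountingUserStore, ValidatingUserLookup, and WorkaroundDemo are hypothetical names, and we do not know whether a plugin can actually evict entries from the cache behind getUser. The idea is to treat a cached user with empty groups as suspect, drop it, and reload from the database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the user object; not the Atlassian class.
class SsoUser {
    final String name;
    final Set<String> groups;

    SsoUser(String name, Set<String> groups) {
        this.name = name;
        this.groups = groups;
    }
}

// Hypothetical abstraction of "go to the database for this user".
interface UserStore {
    SsoUser load(String name);
}

// In-memory store that counts reads, standing in for the Oracle database.
class CountingUserStore implements UserStore {
    final Map<String, SsoUser> rows = new HashMap<String, SsoUser>();
    int reads = 0;

    public SsoUser load(String name) {
        reads++;
        return rows.get(name);
    }
}

// Lookup that refuses to trust a cached user whose groups are missing.
class ValidatingUserLookup {
    private final Map<String, SsoUser> cache;
    private final UserStore store;

    ValidatingUserLookup(Map<String, SsoUser> cache, UserStore store) {
        this.cache = cache;
        this.store = store;
    }

    SsoUser getUser(String name) {
        SsoUser cached = cache.get(name);
        if (cached != null && cached.groups != null && !cached.groups.isEmpty()) {
            return cached;
        }
        // Entry is absent or looks incomplete: evict it and reload.
        cache.remove(name);
        SsoUser fresh = store.load(name);
        if (fresh != null) {
            cache.put(name, fresh);
        }
        return fresh;
    }
}

class WorkaroundDemo {
    static CountingUserStore store() {
        CountingUserStore store = new CountingUserStore();
        store.rows.put("alice",
                new SsoUser("alice", Collections.singleton("example-group")));
        return store;
    }

    // Cache pre-filled with an incomplete alice, as on our second node.
    static ValidatingUserLookup lookupWithStaleEntry(UserStore store) {
        Map<String, SsoUser> cache = new HashMap<String, SsoUser>();
        cache.put("alice", new SsoUser("alice", Collections.<String>emptySet()));
        return new ValidatingUserLookup(cache, store);
    }

    public static void main(String[] args) {
        CountingUserStore store = store();
        ValidatingUserLookup lookup = lookupWithStaleEntry(store);
        System.out.println("groups: " + lookup.getUser("alice").groups
                + ", database reads: " + store.reads);
    }
}
```

One known weakness of this approach: a user who legitimately belongs to no groups would be reloaded from the database on every call, so the check would need a better completeness criterion in practice.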
Hi There
Thanks for taking the time to raise this issue. As you are no doubt aware, this has been on our backlog for quite some time with no resolution forthcoming. Due to its age and inactivity, I'm going to close this issue as "Won't Fix". I believe this better reflects the status of this issue.
Regards
Steve Haffenden
Confluence Bugmaster