[BSERV-4631] Preserve user data upon loss of connection to external user directory

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 3.2.0
Affects Version/s: None
Component/s: User Management - LDAP
Labels:

Given this scenario:
LDAP > JIRA User Server > Stash

If the connection to LDAP is lost, the following user data must be preserved so that it is intact when the directory comes back online:

global permissions
project permissions
repository permissions
branch permissions
avatars
recently accessed repositories
SSH keys

In JIRA, for example, if a user from an external directory has been added to a group and the connection to that external directory is lost, when the directory is activated again, the original user is restored (with their prior group permissions).

has a regression in:
BSERV-6774 Cleaning up users fail if there is a backlog of more than 100 users (Closed)

is related to:
BSERV-7119 Preserve group data upon loss of connection to external user directory (Closed)
BSERV-5345 Permissions for deleted users have broken links (Closed)
BSERV-8148 Preserve local group membership data upon loss of connection to external user directory (Gathering Impact)

Systems Team added a comment - 23/Feb/2015 6:26 PM

I am running Stash 3.5 configured to use Crowd 2.7.2 (with LDAP caching disabled). We had a period of Crowd downtime, and Stash remembered Crowd user memberships but deleted all group memberships. Is there no delay before Stash starts deleting group membership data?

Robin Stocker (Inactive) added a comment - 24/Jul/2014 5:58 AM

Does this fix make an update to the audit log to show that a group or a user has been revoked with an explanation?

This fix did not change anything in the audit log. See STASH-5037 for that.

Also, is there any way to be alerted, or a log file pattern to watch for, to know when users or groups are being deleted?

Currently, only users that are deleted from the internal directory appear in the audit log, not those that are deleted because of synchronization.

How frequently will that cleanup job run?

It's currently configured to run every 6 hours. And only users that have been deleted for at least 7 days will be cleaned up.
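To make the timing concrete, here is a small sketch of the arithmetic implied by those two numbers. It assumes the job runs on a fixed 6-hour grid and always on time; the function and constant names are hypothetical and not taken from Stash.

```python
from datetime import datetime, timedelta

# Values quoted in the comment above; the names themselves are made up.
JOB_INTERVAL = timedelta(hours=6)
MIN_DELETED_AGE = timedelta(days=7)


def first_cleanup_run(deleted_at: datetime, first_job_run: datetime) -> datetime:
    """Return the earliest scheduled run at which the user is old enough to be cleaned up."""
    eligible_from = deleted_at + MIN_DELETED_AGE
    if eligible_from <= first_job_run:
        return first_job_run
    elapsed = eligible_from - first_job_run
    # Ceiling division: number of whole intervals needed to reach eligible_from.
    runs = -(-elapsed // JOB_INTERVAL)
    return first_job_run + runs * JOB_INTERVAL


# A user deleted one hour after a job run is cleaned up 7 days and 5 hours later.
print(first_cleanup_run(datetime(2014, 7, 24, 1, 0), datetime(2014, 7, 24, 0, 0)))
# prints 2014-07-31 06:00:00
```

Under these assumptions, a deleted user's data survives for between 7 days and 7 days plus 6 hours.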

mike brosnan added a comment - 24/Jul/2014 4:56 AM

Does this fix make an update to the audit log to show that a group or a user has been revoked with an explanation?

Also, is there any way to be alerted, or a log file pattern to watch for, to know when users or groups are being deleted?

How frequently will that cleanup job run?

Robin Stocker (Inactive) added a comment - 23/Jul/2014 2:21 AM

The fix for this is going to be in the next release, Stash 3.2.

What the fix does:

Instead of cleaning up the user data immediately when a user is deleted, it is marked for cleanup with a time stamp. A job that is run periodically will then pick up the users that have been deleted for a certain amount of time and remove the user data.

If a user deletion is undone within that time, its user data will be available as before the deletion.

The delay between a user being deleted and its data being cleaned up is long enough so that a temporary connection problem won't cause the loss of data anymore.
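The mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Stash implementation: `UserStore`, `UserRecord` and the retention value passed in are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class UserRecord:
    name: str
    data: Dict[str, str] = field(default_factory=dict)  # e.g. SSH keys, permissions
    deleted_at: Optional[datetime] = None  # None means the user is active


class UserStore:
    """Hypothetical in-memory store; not Stash's real data model."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self.users: Dict[str, UserRecord] = {}

    def add(self, name: str, data: Dict[str, str]) -> None:
        self.users[name] = UserRecord(name, dict(data))

    def mark_deleted(self, name: str, now: datetime) -> None:
        # Record the deletion time instead of removing the data immediately.
        self.users[name].deleted_at = now

    def restore(self, name: str) -> None:
        # Undo the deletion; the user data was never touched.
        self.users[name].deleted_at = None

    def cleanup(self, now: datetime) -> List[str]:
        # Periodic job: purge users that have been deleted for at least `retention`.
        expired = [
            name
            for name, user in self.users.items()
            if user.deleted_at is not None and now - user.deleted_at >= self.retention
        ]
        for name in expired:
            del self.users[name]
        return expired
```

In this sketch, a user who is marked deleted during an outage and restored before the retention period elapses keeps all of their data, because `cleanup` never selected them.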

mike brosnan added a comment - 18/Jul/2014 9:46 AM

As well as fixing the user data loss, there has to be a coherent way to recover.

The audit log does not match the current state. Surely you can create a patch or API script to reinstate the permissions from the audit log.

Mike Brosnan

Pete added a comment - 19/Jun/2014 2:16 PM

I found this issue when I had Crowd configured NOT to cache results from our LDAP server.

This may well be a Crowd problem more than a Stash issue. My suggestion is to carefully review your Crowd config (and make sure caching is switched on).

Setting the timeouts to 0 (rather than 120) also helped me with LDAP directory lookups.

Andrew Teixeira added a comment - 19/Jun/2014 2:10 PM

To add to this, I believe we experienced this a couple of times on our Stash instance as well. All of our Stash projects/repos use permissions for users that are obtained via our Crowd instance. Twice we have come in with several tickets in our queue reporting that no one could connect to Stash any longer, via the web interface or via git-over-SSH using keys.

In both instances, we have found that we cannot log in with Crowd credentials anymore and need to use the backup "admin" local account. When we do, we find that all the users/groups have been removed from all repositories and projects, along with all the SSH keys from individual users. Our only recourse at that point is to restore a copy of the database from the previous day so that we don't need to tell all of our users to re-upload all their SSH keys again. The database restore also corrects all the missing user/group permissions.

We have also looked through the logs in depth and have not been able to find a silver bullet as to why this happened. Our only guess, as others above have stated, seems to be that Stash loses connection to Crowd, and for Stash that means "all users have been deleted" instead of "connection error". This then triggers the bolded stanza above which does a broad wipe of permissions and keys from the system.

As a side note, fixing the connection to Crowd is not an issue, as the Crowd connection fixes itself in this process. The reason Crowd authentication seems to "not work" in these situations is that the links (and just the links, not the Crowd groups themselves) to groups that grant permission in Stash have been removed from Stash properties. The failure mode seems to be:

1. Crowd connection loss.
2. Sweeping permissions/keys wipe.
3. Crowd connection restored automatically on subsequent sync attempt.

This problem really needs to be investigated and fixed or we will be forced to move our internal Git service to something else that is more reliable since we cannot keep suffering these IT "black eyes".
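The distinction guessed at above ("connection error" versus "all users have been deleted") can be illustrated with a short sketch. It reflects the commenter's hypothesis rather than a confirmed root cause, and `DirectoryUnavailable`, `users_to_remove` and `fetch_remote` are hypothetical names, not Stash or Crowd APIs.

```python
from typing import Callable, Set


class DirectoryUnavailable(Exception):
    """Raised by the (hypothetical) fetch function when the directory cannot be reached."""


def users_to_remove(local_users: Set[str], fetch_remote: Callable[[], Set[str]]) -> Set[str]:
    """Return the local users that are really gone from the remote directory."""
    try:
        remote_users = fetch_remote()
    except DirectoryUnavailable:
        # A failed lookup says nothing about which users exist, so remove nobody.
        return set()
    return local_users - remote_users
```

The point of the sketch is that an unreachable directory must surface as an error, and never as an empty user list, because the second case looks identical to every user having been deleted.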

Pete added a comment - 30/Apr/2014 10:42 AM

Hi Stefan,

We use Crowd, and on a semi-regular basis (it can be once a week) the connection to Crowd is lost (for some reason) for all our services: JIRA, Confluence, Stash and Bamboo. For most of them, all that happens is that the avatars become question marks until those people log back into the server.

HOWEVER, on Stash the SSH key is lost, which is very annoying (to be clear, we DON'T delete or touch the user). It just seems there is a bug somewhere, and as much as I like backtracing 500 lines of Java errors, I don't have the time to figure out what the problem is.

We use the latest version of Crowd and Stash.

If STASH-3412 was fixed this bug wouldn't be a problem.

If this is a Crowd issue, please place the bug in the Crowd database for me and place the link in here so others can follow its progress (or lack thereof).

Cheers
Pete

Stefan Saasen (Inactive) added a comment - 28/Apr/2014 4:13 AM

Hi Olivier,

Sorry to hear about your trouble. As outlined on https://confluence.atlassian.com/display/STASH/Users+and+groups:

You can delete a user or group from Stash's internal user directory, or the external directory from which Stash sources users, such as an LDAP, Crowd or JIRA server.

When a user or group is deleted from such a directory, Stash checks to see if that user still exists in another directory:

If the user or group does not exist in another directory, Stash assumes the intent was to permanently delete them, and we delete the user's permissions, SSH keys and 'rememberme' tokens.

So this should only happen if the user was deleted in the parent directory. The summary mentions that only the connection between JIRA and the LDAP server was lost. Is the connection loss the only thing that led to the users disappearing?

On a related note, user-generated content such as pull requests and comments won't be deleted when users are deleted from the system.
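The documented rule quoted above can be sketched as follows. This is a simplified illustration, not Stash code: the function name, the directory representation (plain sets of usernames) and the data keys are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set

# Kinds of data the quoted documentation says are deleted (key names are assumed).
PURGED_ON_DELETE = ("permissions", "ssh_keys", "rememberme_tokens")


def handle_directory_deletion(
    username: str,
    other_directories: Iterable[Set[str]],
    user_data: Dict[str, Dict[str, List[str]]],
) -> bool:
    """Purge a user's data unless the user still exists in another directory.

    Returns True if data was purged, False if the user was kept.
    """
    if any(username in directory for directory in other_directories):
        return False
    record = user_data.get(username, {})
    for kind in PURGED_ON_DELETE:
        record.pop(kind, None)
    # Anything else (e.g. pull requests, comments) is left untouched.
    return True
```

Note that the rule only behaves as intended if "deleted from the directory" is reported accurately; it has no way of telling a real deletion from a user who merely stopped being visible.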

Olivier Ozoux added a comment - 22/Apr/2014 6:44 PM

To add to that, when the user is no longer available, it's "Disabled" in JIRA and Confluence. In Stash the user is "Deleted". This means that any custom permissions and settings, including the user's public keys, are completely deleted from the system. There is also no notification to that effect, so it's as if that user had never been set up.

We haven't tested what happens to a user's pull requests, comments and other data that could be associated with that account. This behavior is a serious problem for us, because it creates a large amount of forensic work to find out what permissions the newly restored user had and manually re-create them.

This became a show-stopper last weekend when a problem with our LDAP server caused 500+ users to disappear for a few hours. When they regained access, Confluence and JIRA were fine, but Stash permissions for close to 100 users had to be rebuilt on more than 50 projects. And that's not including the SSH problem, where we've had to ask each user to upload their public key again, since we can't do it for them.

I would definitely rate this higher than a "minor" priority, since Stash is losing data without a known workaround.

Assignee: Robin Stocker (Inactive)

Reporter: Daniel R

Affected customers: 7

Watchers: 22

Created: 22/Apr/2014 4:51 PM

Updated: 06/Mar/2025 5:18 PM

Resolved: 22/Dec/2016 10:35 PM
