Bitbucket Data Center / BSERV-4631

Preserve user data upon loss of connection to external user directory

      Given this scenario:
      LDAP > JIRA User Server > Stash

      If the connection to LDAP is lost, the following user data must be preserved when the directory comes back online:

      • global permission
      • project permission
      • repository permissions
      • branch permissions
      • avatars
      • recently accessed repositories
      • SSH keys

      In JIRA, for example, if a user from an external directory has been added to a group and the connection to that external directory is lost, when the directory is activated again, the original user is restored (with their prior group permissions).


            Systems Team added a comment:

            I am running Stash 3.5 configured to use Crowd 2.7.2 (with LDAP caching disabled). We had a period of Crowd downtime, and Stash remembered Crowd user memberships but deleted all group memberships. Is there no delay for group membership deletion before Stash starts deleting data?

            Robin Stocker (Inactive) added a comment:

            "Does this fix make an update to the audit log to show that a group or a user has been revoked with an explanation?"

            This fix did not change anything in the audit log. See STASH-5037 for that.

            "Also, is there any way to be alerted, or a log file pattern, to know when users or groups are being deleted?"

            Currently, only users that are deleted from the internal directory appear in the audit log, not those that are deleted because of synchronization.

            "How frequently will that cleanup job run?"

            It's currently configured to run every 6 hours, and only users that have been deleted for at least 7 days will be cleaned up.
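            To make that schedule concrete: with a job every 6 hours and a 7-day threshold, a deleted user's data survives at least 7 days and less than 7 days plus one job interval. The sketch below computes the cleanup time for a given deletion; the helper name `cleanup_time` and the assumption that the job fires on a fixed 6-hour grid starting at a known `first_run` are ours, not Stash's.

```python
from datetime import datetime, timedelta

# Values quoted in the comment above; assumed fixed for this sketch.
JOB_INTERVAL = timedelta(hours=6)
GRACE_PERIOD = timedelta(days=7)


def cleanup_time(deleted_at: datetime, first_run: datetime) -> datetime:
    """Return the first job run at which a user deleted at `deleted_at`
    would be cleaned up (hypothetical helper).

    Assumes the job fires at first_run, first_run + 6h, first_run + 12h, ...
    and removes users whose deletion is at least GRACE_PERIOD old.
    """
    eligible_at = deleted_at + GRACE_PERIOD
    if eligible_at <= first_run:
        return first_run
    elapsed = eligible_at - first_run
    # Ceiling division: number of whole intervals needed to reach eligible_at.
    runs = -(-elapsed // JOB_INTERVAL)
    return first_run + runs * JOB_INTERVAL


first = datetime(2014, 1, 1)
print(cleanup_time(datetime(2014, 1, 1, 1), first))  # 2014-01-08 06:00:00
```

            In this example the data is kept for 7 days and 5 hours, so a directory outage that is fixed within 7 days never reaches the cleanup.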

            mike brosnan added a comment:

            Does this fix update the audit log to show that a group or user has been revoked, with an explanation?

            Also, is there any way to be alerted, or a log file pattern to watch for, when users or groups are being deleted?

            How frequently will that cleanup job run?

            Robin Stocker (Inactive) added a comment:

            The fix for this will be in the next release, Stash 3.2.

            What the fix does:

            Instead of cleaning up the user data immediately when a user is deleted, the user is marked for cleanup with a timestamp. A periodic job then picks up users that have been deleted for a certain amount of time and removes their user data.

            If a user deletion is undone within that time, the user's data will be available as it was before the deletion.

            The delay between a user being deleted and their data being cleaned up is long enough that a temporary connection problem will no longer cause data loss.
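            The mechanism described above (mark the user with a timestamp, let a periodic job purge only old marks, clear the mark if the user comes back) is a soft delete. A minimal in-memory sketch, assuming a dict-based store; `UserData`, `UserDataStore`, `mark_deleted`, `restore` and `run_cleanup` are hypothetical names, not Stash's API, and the 7-day threshold is the value given in the cleanup-job answer in this thread.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

GRACE_PERIOD = timedelta(days=7)  # assumed threshold, per this thread


@dataclass
class UserData:
    # Hypothetical subset of the per-user data listed in this issue.
    permissions: set = field(default_factory=set)
    ssh_keys: list = field(default_factory=list)
    deleted_at: Optional[datetime] = None  # None means the user is active


class UserDataStore:
    """Soft-delete store: deleting a user only records a timestamp."""

    def __init__(self) -> None:
        self._users: Dict[str, UserData] = {}

    def add(self, name: str, data: UserData) -> None:
        self._users[name] = data

    def mark_deleted(self, name: str, now: datetime) -> None:
        # Directory reports the user as gone: keep the data, note when.
        user = self._users.get(name)
        if user is not None and user.deleted_at is None:
            user.deleted_at = now

    def restore(self, name: str) -> Optional[UserData]:
        # User reappears (e.g. directory back online): clear the mark.
        user = self._users.get(name)
        if user is not None:
            user.deleted_at = None
        return user

    def run_cleanup(self, now: datetime) -> List[str]:
        # Periodic job: purge only users deleted for at least GRACE_PERIOD.
        expired = [name for name, user in self._users.items()
                   if user.deleted_at is not None
                   and now - user.deleted_at >= GRACE_PERIOD]
        for name in expired:
            del self._users[name]
        return expired


store = UserDataStore()
t0 = datetime(2014, 1, 1)
store.add("alice", UserData(permissions={"project-read"}, ssh_keys=["ssh-rsa AAAA"]))
store.mark_deleted("alice", t0)              # connection to the directory drops
store.run_cleanup(t0 + timedelta(hours=6))   # too early: nothing is purged
print(store.restore("alice").ssh_keys)       # ['ssh-rsa AAAA']
```

            Keeping the earliest timestamp in `mark_deleted` means repeated "user missing" reports during one outage do not keep pushing the purge further out.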

            mike brosnan added a comment:

            As well as preventing user data loss, there has to be a coherent way to recover.

            The audit log does not match the current state; surely you can create a patch or API script to reinstate the permissions from the audit log.

            Mike Brosnan

            Pete added a comment:

            I found this issue when I had Crowd configured NOT to cache results from our LDAP server.

            This may well be more of a Crowd problem than a Stash issue. My suggestion is to carefully review your Crowd configuration (and make sure caching is switched on).

            Also, setting timeouts to 0 (rather than 120) helped me with LDAP directory lookups.

            Andrew Teixeira added a comment:

            To add to this, I believe we have experienced this a couple of times on our Stash instance as well. All of our Stash projects/repos grant permissions to users obtained via our Crowd instance. Twice we have come in to find several tickets in our queue reporting that no one could connect to Stash any longer, either via the web interface or via git-over-SSH using keys.

            In both instances, we found that we could no longer log in with Crowd credentials and had to use the backup local "admin" account. When we did, we found that all the users/groups had been removed from all repositories and projects, along with all the SSH keys of individual users. Our only recourse at that point was to restore a copy of the database from the previous day so that we didn't have to tell all of our users to re-upload their SSH keys. The database restore also fixed all the missing user/group permissions.

            We have also looked through the logs in depth and have not been able to find a silver bullet explaining why this happened. Our only guess, as others have stated, is that Stash loses its connection to Crowd, and for Stash that means "all users have been deleted" instead of "connection error". This then triggers the documented cleanup (deleting permissions, SSH keys and 'rememberme' tokens), which does a broad wipe of permissions and keys from the system.

            As a side note, fixing the connection to Crowd is not the problem, as the Crowd connection fixes itself in this process. Crowd authentication seems to "not work" in these situations because the links (and only the links, not the Crowd groups themselves) to the groups that grant permissions in Stash have been removed from Stash. The failure mode seems to be:

            1. The Crowd connection is lost.
            2. Permissions and SSH keys are swept away.
            3. The Crowd connection is restored automatically on a subsequent sync attempt.

            This problem really needs to be investigated and fixed, or we will be forced to move our internal Git service to something more reliable, since we cannot keep suffering these IT "black eyes".

            Pete added a comment:

            Hi Stefan,

            We use Crowd, and on a semi-regular basis (as often as once a week) the connection to Crowd is lost (for some reason) for all our services. For JIRA, Confluence, Stash and Bamboo, mostly all that happens is that avatars become question marks until those people log back in.

            HOWEVER, on Stash the SSH keys are lost as well, which is very annoying (to be clear, we DON'T delete or touch the users). There seems to be a bug somewhere, and as much as I like back-tracing 500 lines of Java errors, I don't have the time to figure out what the problem is.

            We use the latest versions of Crowd and Stash.

            If STASH-3412 were fixed, this bug wouldn't be a problem.

            If this is a Crowd issue, please raise the bug against Crowd for me and post the link here so others can follow its progress (or lack thereof).

            Cheers
            Pete

            Stefan Saasen (Inactive) added a comment:

            Hi Olivier,

            sorry to hear about your trouble. As outlined on https://confluence.atlassian.com/display/STASH/Users+and+groups:

            You can delete a user or group from Stash's internal user directory, or the external directory from which Stash sources users, such as an LDAP, Crowd or JIRA server.

            When a user or group is deleted from such a directory, Stash checks to see if that user still exists in another directory:

            • If the user or group does not exist in another directory, Stash assumes the intent was to permanently delete them, and we delete the user's permissions, SSH keys and 'rememberme' tokens.

            So this should only happen if the user was deleted in the parent directory. The summary mentions that only the connection between JIRA and the LDAP server was lost; is the connection loss the only thing that led to the users disappearing?

            Relatedly, user-generated content such as pull requests and comments won't be deleted when users are deleted from the system.
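            The documented rule quoted above can be written as a small decision function. This is a hypothetical sketch (the name `users_to_purge` and the data shapes are ours): each directory is modelled as a set of user names. It also illustrates the concern raised elsewhere in this thread: if an unreachable directory were reported as having no users, every user sourced only from that directory would fall into the delete branch.

```python
from typing import Dict, List, Set


def users_to_purge(deleted_from: str,
                   removed_users: Set[str],
                   directories: Dict[str, Set[str]]) -> List[str]:
    """Return users whose permissions, SSH keys and 'rememberme' tokens
    would be deleted under the documented rule (hypothetical helper).

    A user removed from `deleted_from` is purged only if no other
    directory still contains that user.
    """
    purge = []
    for user in sorted(removed_users):
        in_other = any(user in members
                       for name, members in directories.items()
                       if name != deleted_from)
        if not in_other:
            purge.append(user)
    return purge


# Outage misread as deletion: Crowd suddenly reports no users at all.
before = {"alice", "bob", "carol"}
dirs = {"internal": {"admin", "bob"}, "crowd": set()}
print(users_to_purge("crowd", before - dirs["crowd"], dirs))  # ['alice', 'carol']
```

            Only "bob" survives because he also exists in the internal directory; the rule itself has no way to tell a real deletion from a lost connection.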

            Olivier Ozoux added a comment:

            To add to that, when a user is no longer available, they are "Disabled" in JIRA and Confluence, but in Stash the user is "Deleted". This means that any custom permissions and settings, including the user's public keys, are completely deleted from the system. There is also no notification to that effect, so it's as if that user had never been set up.

            We haven't tested what happens to a user's pull requests, comments and other data that could be associated with that account. This behavior is a serious problem for us, because it creates a large amount of forensic work to find out what permissions the newly restored user had and to re-create them manually.

            This became a show-stopper last weekend when a problem with our LDAP server caused 500+ users to disappear for a few hours. When they regained access, Confluence and JIRA were fine, but Stash permissions for close to 100 users had to be rebuilt on more than 50 projects. And that's not including the SSH problem: we've had to ask each user to upload their public key again, since we can't do it for them.

            I would definitely rate this higher than "minor" priority, since Stash is losing data without a known workaround.

              Robin Stocker (Inactive)
              Daniel R
              Affected customers: 7
              Watchers: 22