Uploaded image for project: 'Crowd Data Center'
  1. Crowd Data Center
  2. CWD-3769

Adding a token to database causes transaction to lock indefinitely

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Medium Medium
    • 2.7.2
    • 2.7.1
    • Database
    • None

      Steps to reproduce

      1. Bring up Crowd using HSQLDB 2.3.0 (this may be reproducible with Postgres 8.4, to be confirmed).
      2. Set up Crowd to use an Embedded database.
      3. Continue through the setup process.

      Expected result

      Crowd setup process completes and user is presented with a login screen.

      Observed result

      Crowd hangs after the last screen of the setup process, and never presents the login screen. Crowd becomes unresponsive.

      Workaround 1

      At this point the workaround is to switch the session storage to in-memory from database storage:
      Session Configuration

      The in-memory storage is explained in more detail on the page linked above.

      If you cannot keep crowd up long enough to access the Administration console, or crowd will simply not finish starting up, follow these steps to modify the value directly in the database:

      1. Shutdown crowd - ensure the pid is stopped
      2. Perform a database backup
      3. Connect to the database
      4. Execute this sql:
        update cwd_property 
        set property_value='false'
        where property_name='database.token.storage.enabled';
      5. Restart Crowd
      6. Validate the session storage is now in-memory by Navigating to Administration > Session Config

      Workaround 2

      If workaround 1 does not help, you may try this. However, please note that performing this workaround will force users to login again.

      1. Shutdown crowd - ensure the pid is stopped
      2. Perform a database backup
      3. Connect to the database
      4. Execute this sql:
        delete from cwd_token;
      5. Restart Crowd
      6. Validate the session storage is now in-memory by Navigating to Administration > Session Config

            [CWD-3769] Adding a token to database causes transaction to lock indefinitely

            rgundersen those logs should not be anything to worry about; the error should be from an inner transaction failing, and Crowd should recover and continue to use the outer transaction. Unfortunately, we don't have an easy way of silencing those scary messages without also silencing other, possibly meaningful, errors.

            If you have a problem with Crowd being unstable, please raise a ticket with support, as we still believe this issue is resolved.

            Caspar Krieger (Inactive) added a comment - rgundersen those logs should not be anything to worry about; the error should be from an inner transaction failing, and Crowd should recover and continue to use the outer transaction. Unfortunately, we don't have an easy way of silencing those scary messages without also silencing other, possibly meaningful, errors. If you have a problem with Crowd being unstable, please raise a ticket with support , as we still believe this issue is resolved.

            I think we are getting this same error, and we are using 2.7.2.

            I get this error in our logs:

            2015-03-09 13:55:21,050 http-bio-8445-exec-217 WARN [engine.jdbc.spi.SqlExceptionHelper] SQL Error: 1062, SQLState: 23000
            2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [engine.jdbc.spi.SqlExceptionHelper] Duplicate entry 'lG70v4BxxOuh0EBuvwfhzA00' for key 'uk_token_id_hash'
            2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [jdbc.batch.internal.BatchingBatch] HHH000315: Exception executing batch [could not perform addBatch]
            

            We're using MySQL and we're storing the tokens in the database (not in-memory). Since it's working (intermittently) I have not tried any workarounds but it might be worth investigating a bit more.

            Zuber Khursiwala added a comment - I think we are getting this same error, and we are using 2.7.2. I get this error in our logs: 2015-03-09 13:55:21,050 http-bio-8445-exec-217 WARN [engine.jdbc.spi.SqlExceptionHelper] SQL Error: 1062, SQLState: 23000 2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [engine.jdbc.spi.SqlExceptionHelper] Duplicate entry 'lG70v4BxxOuh0EBuvwfhzA00' for key 'uk_token_id_hash' 2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [jdbc.batch.internal.BatchingBatch] HHH000315: Exception executing batch [could not perform addBatch] We're using MySQL and we're storing the tokens in the database (not in-memory). Since it's working (intermittently) I have not tried any workarounds but it might be worth investigating a bit more.

            We upgraded to version 2.7.1 last night and ran into this issue this morning. Crowd was up last night but would not authenticate users this morning. We fixed the issue by trying workaround #1. There has been so confusion on this issue right from version 2.7.0. Can someone confirm for sure that this issue is in fact resolved on 2.7.2? Also, when is 2.7.2 coming out?
            Thanks

            Adhip Pokharel added a comment - We upgraded to version 2.7.1 last night and ran into this issue this morning. Crowd was up last night but would not authenticate users this morning. We fixed the issue by trying workaround #1. There has been so confusion on this issue right from version 2.7.0. Can someone confirm for sure that this issue is in fact resolved on 2.7.2? Also, when is 2.7.2 coming out? Thanks

            Thank you for your patience with this issue.

            We traced the cause of the problem to how Crowd 2.7.1 uses concurrent database transactions and application-level locking. As you would expect in a problem of this kind, we discover that crashes could occur due to a combination of factors. In Crowd 2.7.2 we have changed the configuration of transactions to ensure that a single request does not create more than one independent transaction at any time. We also fixed and simplified the locking mechanism around the token management. Our testing indicates that these changes have fixed the stability issues.

            If you used the 1st Workaround described above and switched to in-memory token storage, you may want to switch back to database token storage after you upgrade to the upcoming Crowd 2.7.2 release.

            Please open a support ticket if you still experience problems after the upgrade to Crowd 2.7.2. Thank you.

            Diego Berrueta added a comment - Thank you for your patience with this issue. We traced the cause of the problem to how Crowd 2.7.1 uses concurrent database transactions and application-level locking. As you would expect in a problem of this kind, we discover that crashes could occur due to a combination of factors. In Crowd 2.7.2 we have changed the configuration of transactions to ensure that a single request does not create more than one independent transaction at any time. We also fixed and simplified the locking mechanism around the token management. Our testing indicates that these changes have fixed the stability issues. If you used the 1st Workaround described above and switched to in-memory token storage, you may want to switch back to database token storage after you upgrade to the upcoming Crowd 2.7.2 release. Please open a support ticket if you still experience problems after the upgrade to Crowd 2.7.2. Thank you.

            Just as a note, this halted our whole group of services for ~1 day. Everything worked fine for weeks, then suddenly everything started to freeze. Using PostgreSQL 9.3. Workaround seems to have fixed the issue.

            Risto Yrjänä added a comment - Just as a note, this halted our whole group of services for ~1 day. Everything worked fine for weeks, then suddenly everything started to freeze. Using PostgreSQL 9.3. Workaround seems to have fixed the issue.

            Hi Jim,

            Sorry for the delay in replying here! We definitely don't recommend removing FKs from the database. They exist to ensure your data is consistent and robust, and we find we run into many problems when they don't exist (eg when someone is on MySQL with MyISAM and therefore has no foreign keys).

            Is this the foreign key you removed?

            ALTER TABLE cwd_user
              ADD CONSTRAINT fk_user_dir_id FOREIGN KEY (directory_id)
                  REFERENCES cwd_directory (id) MATCH SIMPLE
                  ON UPDATE NO ACTION ON DELETE NO ACTION;
            

            This foreign key exists to ensure that all entries in cwd_user are from a valid directory - this will stop you having orphaned user records, or unintentional duplicates.

            Moving to in memory storage is a much more reliable and less risky workaround and should completely solve the deadlock problem, so I'd recommend doing that only.

            -Denise
            Atlassian Support

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment - Hi Jim, Sorry for the delay in replying here! We definitely don't recommend removing FKs from the database. They exist to ensure your data is consistent and robust, and we find we run into many problems when they don't exist (eg when someone is on MySQL with MyISAM and therefore has no foreign keys). Is this the foreign key you removed? ALTER TABLE cwd_user ADD CONSTRAINT fk_user_dir_id FOREIGN KEY (directory_id) REFERENCES cwd_directory (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION; This foreign key exists to ensure that all entries in cwd_user are from a valid directory - this will stop you having orphaned user records, or unintentional duplicates. Moving to in memory storage is a much more reliable and less risky workaround and should completely solve the deadlock problem, so I'd recommend doing that only. -Denise Atlassian Support

            FWIW, we removed the foreign key from table dbo.cwd_user and all of the deadlocks have ceased, and performance has been very good. This resolved the issue for us.

            Jim Pickering added a comment - FWIW, we removed the foreign key from table dbo.cwd_user and all of the deadlocks have ceased, and performance has been very good. This resolved the issue for us.

            @Jim Pickering - Thanks for the heads up!

            Ben Cameron added a comment - @Jim Pickering - Thanks for the heads up!

            @Ben Cameron - Steve Ruiz's variation of Workaround 1 worked for us too; SQL Server 2008 R2 and Crowd 2.7.1. It worked, in that we could use the software, login, etc., Crowd wasn't freezing or crashing. However, we still were getting deadlocks. If you still see deadlocks, try removing the foreign key constraint from table dbo.cwd_user. So far this has resolved all of the deadlocks for us, although we haven't tried switching back to database storage of authentication tokens yet; still using in-memory storage.

            Curious if anyone from Atlassian recommends against removing the foreign key constraint from table dbo.cwd_user.

            Jim Pickering added a comment - @Ben Cameron - Steve Ruiz's variation of Workaround 1 worked for us too; SQL Server 2008 R2 and Crowd 2.7.1. It worked, in that we could use the software, login, etc., Crowd wasn't freezing or crashing. However, we still were getting deadlocks. If you still see deadlocks, try removing the foreign key constraint from table dbo.cwd_user. So far this has resolved all of the deadlocks for us, although we haven't tried switching back to database storage of authentication tokens yet; still using in-memory storage. Curious if anyone from Atlassian recommends against removing the foreign key constraint from table dbo.cwd_user.

            I had this issue on MS SQL Server 2012 Express and Crowd 2.7.1. Steve Ruiz's variation of Workaround 1 worked for me.

            Ben Cameron added a comment - I had this issue on MS SQL Server 2012 Express and Crowd 2.7.1. Steve Ruiz's variation of Workaround 1 worked for me.

              Unassigned Unassigned
              dberrueta Diego Berrueta
              Affected customers:
              38 This affects my team
              Watchers:
              71 Start watching this issue

                Created:
                Updated:
                Resolved: