[CWD-3769] Adding a token to database causes transaction to lock indefinitely

Type: Bug
Resolution: Fixed
Priority: Medium
Fix Version/s: 2.7.2
Affects Version/s: 2.7.1
Component/s: Database
Labels: None

Steps to reproduce

Bring up Crowd using HSQLDB 2.3.0 (this may be reproducible with Postgres 8.4, to be confirmed).
Set up Crowd to use an Embedded database.
Continue through the setup process.

Expected result

Crowd setup process completes and user is presented with a login screen.

Observed result

Crowd hangs after the last screen of the setup process, and never presents the login screen. Crowd becomes unresponsive.

Workaround 1

The current workaround is to switch session storage from database storage to in-memory storage, as described on the Session Configuration documentation page. That page explains in-memory storage in more detail.

If you cannot keep Crowd up long enough to access the Administration console, or Crowd will not finish starting up, follow these steps to change the value directly in the database:

Shut down Crowd and ensure its process (PID) has stopped
Perform a database backup
Connect to the database

Execute this SQL:

update cwd_property 
set property_value='false'
where property_name='database.token.storage.enabled';

Restart Crowd
Verify that session storage is now in-memory by navigating to Administration > Session Config
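
Before and after running the update, it can help to confirm what the row actually contains. The query below is a minimal sketch that assumes only the table and column names used in the statement above (cwd_property, property_name, property_value); identifier quoting and case sensitivity may differ between database engines.

```sql
-- Read-only check of the current token storage setting.
-- Reading 'true' as "database storage" and 'false' as "in-memory storage"
-- is inferred from the update statement above, not from Crowd documentation.
select property_name, property_value
from cwd_property
where property_name = 'database.token.storage.enabled';
```

If the query returns no rows, the update statement has nothing to change, and the setting probably has to be changed through the Administration console instead.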

Workaround 2

If Workaround 1 does not help, try the following. Note that this workaround forces all users to log in again.

Shut down Crowd and ensure its process (PID) has stopped
Perform a database backup
Connect to the database
Execute this SQL:
```
delete from cwd_token;
```
Restart Crowd
Verify that session storage is in-memory by navigating to Administration > Session Config
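
Because the delete cannot be undone without the backup, it may be worth checking how many stored tokens are affected first. This is a minimal sketch using only the cwd_token table named above.

```sql
-- Read-only: how many stored tokens will the delete remove?
select count(*) as stored_tokens
from cwd_token;
```

Running the same query after the delete, and before Crowd is restarted, should return 0.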

Attachment: td2.txt (63 kB, 20/Jan/2014 5:56 AM)

causes: CWD-3768 A failure in a single DB connection causes deadlock in Crowd (Closed)
relates to: CWD-3482 Upgrade to H2 for evaluation and tests (Closed)
was split from: CWD-3692 Crowd freezes under heavy load (Closed)

Caspar Krieger (Inactive) added a comment - 22/Apr/2015 5:31 AM

rgundersen those logs should not be anything to worry about; the error should be from an inner transaction failing, and Crowd should recover and continue to use the outer transaction. Unfortunately, we don't have an easy way of silencing those scary messages without also silencing other, possibly meaningful, errors.

If you have a problem with Crowd being unstable, please raise a ticket with support, as we still believe this issue is resolved.
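
To illustrate the behaviour described in the comment above, here is a minimal sketch of an inner unit of work failing on a unique constraint while the outer transaction carries on. It uses a hypothetical table, demo_token, and SQL savepoints in PostgreSQL/MySQL syntax. The ticket does not say how Crowd implements its inner transactions, so this shows the general pattern only, not Crowd's actual mechanism.

```sql
-- Hypothetical table standing in for a token table with a unique hash column.
create table demo_token (
    id_hash varchar(32) not null,
    constraint uk_demo_token_id_hash unique (id_hash)
);

begin;                                             -- outer transaction
insert into demo_token (id_hash) values ('abc');

savepoint inner_work;                              -- start of the inner unit of work
insert into demo_token (id_hash) values ('abc');   -- fails: duplicate entry
rollback to savepoint inner_work;                  -- discard only the failed inner work

insert into demo_token (id_hash) values ('def');   -- outer transaction continues
commit;

-- demo_token now contains 'abc' and 'def'.
```

The duplicate-key error is still reported by the database, which is consistent with error messages appearing in the logs even when the request as a whole succeeds.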

Zuber Khursiwala added a comment - 09/Mar/2015 2:16 PM

I think we are getting this same error, and we are using 2.7.2.

I get this error in our logs:

2015-03-09 13:55:21,050 http-bio-8445-exec-217 WARN [engine.jdbc.spi.SqlExceptionHelper] SQL Error: 1062, SQLState: 23000
2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [engine.jdbc.spi.SqlExceptionHelper] Duplicate entry 'lG70v4BxxOuh0EBuvwfhzA00' for key 'uk_token_id_hash'
2015-03-09 13:55:21,050 http-bio-8445-exec-217 ERROR [jdbc.batch.internal.BatchingBatch] HHH000315: Exception executing batch [could not perform addBatch]

We're using MySQL and we're storing the tokens in the database (not in-memory). Since it's working (intermittently) I have not tried any workarounds but it might be worth investigating a bit more.

Adhip Pokharel added a comment - 25/Apr/2014 3:25 PM

We upgraded to version 2.7.1 last night and ran into this issue this morning. Crowd was up last night but would not authenticate users this morning. We fixed the issue by trying Workaround 1. There has been some confusion about this issue ever since version 2.7.0. Can someone confirm for sure that this issue is in fact resolved in 2.7.2? Also, when is 2.7.2 coming out?
Thanks

Diego Berrueta added a comment - 23/Apr/2014 6:58 AM

Thank you for your patience with this issue.

We traced the cause of the problem to how Crowd 2.7.1 uses concurrent database transactions and application-level locking. As you would expect in a problem of this kind, we discovered that crashes could occur due to a combination of factors. In Crowd 2.7.2 we have changed the configuration of transactions to ensure that a single request does not create more than one independent transaction at any time. We also fixed and simplified the locking mechanism around token management. Our testing indicates that these changes have fixed the stability issues.

If you used Workaround 1 described above and switched to in-memory token storage, you may want to switch back to database token storage after you upgrade to the upcoming Crowd 2.7.2 release.

Please open a support ticket if you still experience problems after the upgrade to Crowd 2.7.2. Thank you.

Risto Yrjänä added a comment - 10/Apr/2014 3:08 PM

Just as a note, this halted our whole group of services for ~1 day. Everything worked fine for weeks, then suddenly everything started to freeze. Using PostgreSQL 9.3. Workaround seems to have fixed the issue.

Denise Unterwurzacher [Atlassian] (Inactive) added a comment - 17/Mar/2014 3:04 AM

Hi Jim,

Sorry for the delay in replying here! We definitely don't recommend removing FKs from the database. They exist to ensure your data is consistent and robust, and we find we run into many problems when they don't exist (e.g. when someone is on MySQL with MyISAM and therefore has no foreign keys).

Is this the foreign key you removed?

ALTER TABLE cwd_user
  ADD CONSTRAINT fk_user_dir_id FOREIGN KEY (directory_id)
      REFERENCES cwd_directory (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION;

This foreign key exists to ensure that all entries in cwd_user are from a valid directory - this will stop you having orphaned user records, or unintentional duplicates.

Moving to in-memory storage is a much more reliable and less risky workaround and should completely solve the deadlock problem, so I'd recommend doing only that.

-Denise
Atlassian Support
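
If the foreign key was removed at some point, a query along these lines may help check whether orphaned user rows have appeared in the meantime. It is a sketch that assumes only the table and column names from the constraint quoted above (cwd_user.directory_id referencing cwd_directory.id); on SQL Server the tables may need the dbo. schema prefix.

```sql
-- Count users whose directory_id points at no existing directory.
-- Rows with a NULL directory_id, if any, are listed as well.
-- An empty result means no orphaned rows were found.
select u.directory_id, count(*) as orphaned_users
from cwd_user u
left join cwd_directory d on d.id = u.directory_id
where d.id is null
group by u.directory_id;
```

The query is read-only, so it can be run before deciding whether to restore the constraint.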

Jim Pickering added a comment - 14/Mar/2014 2:07 PM

FWIW, we removed the foreign key from table dbo.cwd_user and all of the deadlocks have ceased, and performance has been very good. This resolved the issue for us.

Ben Cameron added a comment - 11/Mar/2014 8:32 PM

@Jim Pickering - Thanks for the heads up!

Jim Pickering added a comment - 11/Mar/2014 8:23 PM

@Ben Cameron - Steve Ruiz's variation of Workaround 1 worked for us too; SQL Server 2008 R2 and Crowd 2.7.1. It worked in that we could use the software, log in, etc., and Crowd wasn't freezing or crashing. However, we were still getting deadlocks. If you still see deadlocks, try removing the foreign key constraint from table dbo.cwd_user. So far this has resolved all of the deadlocks for us, although we haven't tried switching back to database storage of authentication tokens yet; we are still using in-memory storage.

Curious if anyone from Atlassian recommends against removing the foreign key constraint from table dbo.cwd_user.

Ben Cameron added a comment - 11/Mar/2014 8:13 PM

I had this issue on MS SQL Server 2012 Express and Crowd 2.7.1. Steve Ruiz's variation of Workaround 1 worked for me.

Assignee: Unassigned
Reporter: Diego Berrueta
Affected customers: 38
Watchers: 71
Created: 20/Jan/2014 5:47 AM
Updated: 22/Jan/2025 5:29 PM
Resolved: 23/Apr/2014 6:22 AM
