- Bug
- Resolution: Fixed
- Medium
- 2.7.1
- None
- None
Symptoms
Crowd becomes unresponsive. A thread dump shows that all threads are in WAITING state, except one which is RUNNABLE and reading from the JDBC socket (SocketInputStream.read) while at the same time holding the WRITE lock in SwitchableTokenManagerImpl.
Postgres logs contain "LOG: could not send data to client: Broken pipe".
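The failure mode above can be sketched in a few lines. This is a minimal, hypothetical illustration (not Crowd's actual code): one thread blocks indefinitely on a socket read with no timeout while holding the write lock, so every other thread parks waiting for the read lock. The class and variable names below are invented for the demo; the "silent" server stands in for a dead database connection.

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FrozenTokenManagerSketch {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // A server that accepts the TCP connection but never sends a byte,
        // standing in for a broken JDBC connection with no socket timeout.
        ServerSocket silentDb = new ServerSocket(0);
        CountDownLatch writeLockHeld = new CountDownLatch(1);

        Thread jdbcThread = new Thread(() -> {
            lock.writeLock().lock();          // the WRITE lock held during the DB call
            writeLockHeld.countDown();
            try (Socket s = new Socket("localhost", silentDb.getLocalPort());
                 InputStream in = s.getInputStream()) {
                in.read();                    // RUNNABLE forever: no data, no timeout
            } catch (Exception ignored) {
            } finally {
                lock.writeLock().unlock();    // never reached while the read blocks
            }
        });
        jdbcThread.setDaemon(true);           // let this demo JVM exit anyway
        jdbcThread.start();
        writeLockHeld.await();

        // Every other request thread ends up here, WAITING for the read lock.
        boolean gotReadLock = lock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
        System.out.println("other threads can authenticate: " + gotReadLock);
    }
}
```

In a thread dump this looks exactly like the symptoms: one RUNNABLE thread inside SocketInputStream.read holding the write lock, and all others WAITING.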
Steps to reproduce
This issue is affecting some customers; I haven't been able to reproduce it locally. The key to reproducing it seems to be killing a connection between Crowd and Postgres in such a way that Postgres believes it's closed ("broken pipe") while Crowd keeps waiting to read from the socket.
This issue seems to happen only when using database token storage.
[CWD-3768] A failure in a single DB connection causes deadlock in Crowd
2.7.2 is as unstable as 2.7.1 for me, even with tokens in memory.
And it seems it's affecting JIRA and other Atlassian apps heavily.
Thank you for your patience.
As described at https://jira.atlassian.com/browse/CWD-3769?focusedCommentId=591918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-591918, we have simplified the locking around the token storage, making it impossible for an unresponsive database connection to hold essential resources and cause the whole server to freeze. We have also changed the transaction model to eliminate the deadlocks. We believe these changes will fix the problem described in this issue. They will be part of the upcoming Crowd 2.7.2 release.
Nevertheless, as a best practice to improve resilience against unexpected failures, we still recommend setting socket timeouts in your JDBC driver, and transaction timeouts in your database server. Please check the documentation of your database to configure timeouts.
If you still experience deadlocks and stability problems after the upgrade to the upcoming Crowd 2.7.2 release, please open a support ticket. Thank you.
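The locking simplification described above could hypothetically look like the following sketch (this is illustrative only, not the actual Crowd 2.7.2 patch; all names are invented): do the slow database work outside any lock and publish the result atomically, so a stalled JDBC call delays only its own request thread instead of freezing the server.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: instead of holding a write lock across a JDBC call,
// compute the new token-store choice with no lock held, then publish it
// atomically. Readers never block on a thread stuck in a database read.
public class SwitchableStoreSketch {
    private final AtomicReference<String> activeStore = new AtomicReference<>("memory");

    String loadConfiguredStoreFromDatabase() {
        return "database"; // stand-in for a (possibly slow or hung) JDBC query
    }

    void switchStore() {
        String next = loadConfiguredStoreFromDatabase(); // no lock held here
        activeStore.set(next);                           // atomic publish
    }

    public static void main(String[] args) {
        SwitchableStoreSketch s = new SwitchableStoreSketch();
        s.switchStore();
        System.out.println("active store: " + s.activeStore.get());
    }
}
```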
FWIW, we removed the foreign key from table dbo.cwd_user and all of the deadlocks have ceased, and performance has been very good. This resolved the issue for us.
Using SQL Server 2008 R2 64-bit, when using the sqljdbc4.jar driver, Crowd didn't crash whether or not we used database storage of authentication tokens.
When we switched to the jtds-1.2.7.jar driver, since all of our other Atlassian tools are using it, Crowd crashed. Crowd and all of our Atlassian applications became unresponsive too. I had to use a SQL script to turn off database storage of authentication tokens, restore the sqljdbc4.jar driver in the crowd config, and restart the server to get everything back online.
Would appreciate an update comment from Atlassian on this issue, as the last comment was nearly one month ago. How close is a resolution to this issue?
This issue seems Critical to me rather than Major, unless Major is the highest priority. Crowd is the nucleus of all Atlassian tools; this needs to be resolved ASAP.
We just upgraded Crowd from 2.4.2 to 2.7.1 on Windows Server 2008 R2 using SQL Server 2008 R2 both 64-bit and can confirm the comments made previous to this one: Moving Authentication Token Storage to "Memory Cache" did not help.
The only reason I upgraded was to get all of the Atlassian tools using Java 7, otherwise version 2.4.2 wasn't having any issues. I sort of regret the upgrade, due to this bug. I will check Atlassian's Jira issues prior to upgrading in the future.
We are getting deadlocks, and they are filling up our SQL logs; however, Crowd is not crashing. Sessions are timing out frequently, though, across the Atlassian applications (JIRA, Confluence, Fisheye, etc.), requiring our users to re-login frequently, and saving documents in Confluence errors out due to the lost session even when the user has just logged in.
Thanks.
We get the following on the database side (Postgres 9.3):
16387 | crowd | 62073 | 16396 | crowd | | 172.17.2.178 | | 40219 | 2014-02-10 12:00:00.057242+02 | 2014-02-10 12:01:00.607062+02 | 2014-02-10 12:01:00.699423+02 | 2014-02-10 12:01:00.699961+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 34633 | 16396 | crowd | | 172.17.2.178 | | 38868 | 2014-02-10 11:15:47.146698+02 | 2014-02-10 11:16:01.402503+02 | 2014-02-10 11:16:01.405518+02 | 2014-02-10 11:16:01.405521+02 | t | active | insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
16387 | crowd | 25933 | 16396 | crowd | | 172.17.2.178 | | 41748 | 2014-02-10 12:51:59.465915+02 | | 2014-02-10 12:52:34.667549+02 | 2014-02-10 12:52:34.667726+02 | f | idle | DISCARD ALL
16387 | crowd | 34635 | 16396 | crowd | | 172.17.2.178 | | 38870 | 2014-02-10 11:15:48.357651+02 | 2014-02-10 11:16:01.02871+02 | 2014-02-10 11:16:01.356168+02 | 2014-02-10 11:16:01.35656+02 | f | idle in transaction | delete from cwd_token where id=$1
16387 | crowd | 25934 | 16396 | crowd | | 172.17.2.178 | | 41749 | 2014-02-10 12:51:59.47671+02 | | 2014-02-10 12:52:34.668734+02 | 2014-02-10 12:52:34.668846+02 | f | idle | DISCARD ALL
16387 | crowd | 25935 | 16396 | crowd | | 172.17.2.178 | | 41750 | 2014-02-10 12:51:59.477418+02 | 2014-02-10 12:52:11.390548+02 | 2014-02-10 12:52:11.466483+02 | 2014-02-10 12:52:11.466827+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 62074 | 16396 | crowd | | 172.17.2.178 | | 40220 | 2014-02-10 12:00:00.058206+02 | | 2014-02-10 12:00:00.063045+02 | 2014-02-10 12:00:00.063565+02 | f | idle | SHOW TRANSACTION ISOLATION LEVEL
16387 | crowd | 35507 | 16396 | crowd | | 172.17.2.178 | | 38975 | 2014-02-10 11:19:49.546673+02 | 2014-02-10 11:20:20.105722+02 | 2014-02-10 11:20:20.195675+02 | 2014-02-10 11:20:20.196117+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 35508 | 16396 | crowd | | 172.17.2.178 | | 38976 | 2014-02-10 11:19:49.58113+02 | 2014-02-10 11:19:55.930264+02 | 2014-02-10 11:19:56.006547+02 | 2014-02-10 11:19:56.006703+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 26809 | 16396 | crowd | | 172.17.2.178 | | 41855 | 2014-02-10 12:56:00.286072+02 | 2014-02-10 12:56:00.313195+02 | 2014-02-10 12:56:00.395825+02 | 2014-02-10 12:56:00.396254+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 45089 | 16396 | crowd | | 172.17.2.178 | | 39215 | 2014-02-10 11:24:56.00944+02 | 2014-02-10 11:24:56.032012+02 | 2014-02-10 11:24:56.126233+02 | 2014-02-10 11:24:56.126891+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 26983 | 16396 | crowd | | 172.17.2.178 | | 41887 | 2014-02-10 12:57:11.439214+02 | | 2014-02-10 12:59:59.807766+02 | 2014-02-10 12:59:59.80853+02 | f | idle | SHOW TRANSACTION ISOLATION LEVEL
16387 | crowd | 26985 | 16396 | crowd | | 172.17.2.178 | | 41888 | 2014-02-10 12:57:11.45502+02 | 2014-02-10 12:57:11.465899+02 | 2014-02-10 12:57:11.553381+02 | 2014-02-10 12:57:11.553848+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 61210 | 16396 | crowd | | 172.17.2.178 | | 40118 | 2014-02-10 11:56:00.500355+02 | 2014-02-10 11:56:00.525963+02 | 2014-02-10 11:56:00.608745+02 | 2014-02-10 11:56:00.60894+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 26986 | 16396 | crowd | | 172.17.2.178 | | 41889 | 2014-02-10 12:57:11.456641+02 | 2014-02-10 12:57:21.703485+02 | 2014-02-10 12:57:21.777868+02 | 2014-02-10 12:57:21.77803+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
16387 | crowd | 27361 | 16396 | crowd | | 172.17.2.178 | | 41940 | 2014-02-10 12:59:08.022304+02 | 2014-02-10 13:01:00.357251+02 | 2014-02-10 13:01:00.43968+02 | 2014-02-10 13:01:00.44007+02 | f | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
The deadlock is caused by pid 34633.
---Jaco
I tried adding ?socketTimeout=30 to my JDBC URL, but it still doesn't work. This is a brand-new install of Crowd. I went through the installation wizard, logged in afterwards, and was able to use it. After I restarted (I had tried to add a plugin jar), I have not been able to log in since; this is a server with zero activity on it, other than me trying to log in as an admin. I see Postgres showing one backend in "INSERT waiting" and another "idle in transaction".
The backend with status "idle in transaction" is on: delete from cwd_token where id=$1
The active query is: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
Just to add to the comments already here. Our fresh evaluation setup consists of:
- Centos 6 + OpenJDK 1.7.0
- PostgreSQL 9.2
- Crowd 2.7.1
After a clean new installation and setup, with a completely unloaded server and no data, we are unable to log back in. Crowd hangs while trying to log in, with PostgreSQL showing: postgres: crowd crowd_db_01 127.0.0.1(33930) INSERT waiting
And the following in logs:
ERROR: duplicate key value violates unique constraint "uk_token_id_hash"
DETAIL: Key (identifier_hash)=(ggFyZinu0tfC85Ccyz4fRA00) already exists.
STATEMENT: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
LOG: could not send data to client: Broken pipe
FATAL: connection to client lost
Adding socket timeout to the connection URL did not help.
Further, we are on Crowd v2.7.0. We've seen the PostgreSQL log error referenced in the first message alongside a couple of instances of Crowd logging the "Directory 'xxxx' is not functional during authentication of 'uuuuu'. Skipped." message; however, we have many more instances of Crowd logging the "Directory...not functional" errors without it.
We're seeing this on a lightly loaded server.
Config: Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-56-generic x86_64), 8GB RAM, 4 CPU (Intel Xeon X5675 @ 3.07GHz), PostgreSQL 9.1.11-0ubuntu, VMware vSphere 5.5, hosting JIRA, Crowd & Confluence on the same machine. Thus we do have three applications all accessing the same database engine on the machine they run on. Considering our load, that should not be a problem.
I will try to use the socketTimeout during off-peak hours.
Maybe it would be good to ship Crowd with a bundled JRE and an installer, as with the other Atlassian products (JIRA, Confluence).
Hi,
we experienced this issue after migrating to a new server.
The old one was Ubuntu Hardy (8.04) with PostgreSQL 8.4 and Java 6. The new OS is Debian Wheezy (amd64) with the latest Java 1.7 from Oracle and PostgreSQL 9.1. We got the deadlock right after the first login attempt.
We're investigating this issue. If anyone is experiencing server crashes and is using Postgres, we suggest you modify the JDBC connection URL in crowd.cfg.xml to add the parameter ?socketTimeout=30. For instance, in my case it looks like:
<property name="hibernate.connection.url">jdbc:postgresql://localhost:5432/crowd?socketTimeout=30</property>
Please let us know if that improves the stability of the server. Thank you.
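For anyone configuring a data source programmatically rather than through crowd.cfg.xml, the same setting can be passed as a driver property. This is a sketch under the assumption that the PostgreSQL JDBC driver is in use (its socketTimeout is in seconds; other drivers use different property names), with placeholder credentials:

```java
import java.util.Properties;

// Sketch: building PostgreSQL JDBC connection properties with a socket
// timeout, equivalent to appending ?socketTimeout=30 to the URL.
public class JdbcTimeoutSketch {
    static Properties withSocketTimeout(int seconds) {
        Properties props = new Properties();
        props.setProperty("user", "crowd");      // placeholder credentials
        props.setProperty("password", "secret");
        props.setProperty("socketTimeout", String.valueOf(seconds));
        return props;
    }

    public static void main(String[] args) {
        Properties p = withSocketTimeout(30);
        // DriverManager.getConnection("jdbc:postgresql://localhost:5432/crowd", p)
        // would then abort any socket read that stalls longer than 30 seconds.
        System.out.println("socketTimeout=" + p.getProperty("socketTimeout"));
    }
}
```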
This issue has similar effects to CWD-3692 (Crowd freezes), but different causes. In particular, this issue does not require the Crowd server to be under high load. Once the situation described in the "Symptoms"/"Steps to reproduce" sections above happens, Crowd will eventually crash.
This issue also bears some resemblance to CWD-3568. In particular, we have observed the line "ERROR: duplicate key value violates unique constraint "cwd_token_identifier_hash_key" STATEMENT: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)" in the Postgres logs just before the "LOG: could not send data to client: Broken pipe". The effects are quite different, though: CWD-3568 never caused a server crash, just some requests to fail.
The "LOG: could not send data to client: Broken pipe" line was also seen in CWD-3495.
Hi danilo.tuler, we're sorry to hear that you're having problems with 2.7.2. I've noticed that you've opened CWD-3915. The cause of your problems with 2.7.2 seems to be unrelated to this issue (CWD-3768).