A failure in a single DB connection causes deadlock in Crowd

    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Fix Version/s: 2.7.2
    • Affects Version/s: 2.7.1

      Symptoms

      Crowd becomes unresponsive. A thread dump shows that all threads are in WAITING state, except one which is RUNNABLE and reading from the JDBC socket (SocketInputStream.read) while at the same time holding the WRITE lock in SwitchableTokenManagerImpl.

      Postgres logs contain "LOG: could not send data to client: Broken pipe".
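
The thread state described above (one thread blocked on I/O while holding a write lock, all others waiting) can be illustrated with a small, self-contained sketch. This is not Crowd's code: `LockHeldDuringIoDemo`, `startStuckWriter` and `tryRead` are hypothetical names, and a `CountDownLatch` stands in for the socket read that never returns. It only shows why a bounded `tryLock` lets other threads fail fast instead of parking indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockHeldDuringIoDemo {
    // Hypothetical stand-in for the token manager's lock; not Crowd's code.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Starts a writer that takes the write lock and then blocks, standing in
    // for a thread stuck in SocketInputStream.read on a dead connection.
    Thread startStuckWriter(CountDownLatch lockTaken, CountDownLatch unblock) {
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                lockTaken.countDown();
                unblock.await(); // simulated socket read that does not return
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        writer.setDaemon(true);
        writer.start();
        return writer;
    }

    // Bounded wait for the read lock: returns false instead of parking forever.
    boolean tryRead(long timeoutMillis) throws InterruptedException {
        if (!lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        lock.readLock().unlock();
        return true;
    }

    // Returns {read succeeded while writer was stuck, read succeeded after release}.
    static boolean[] runDemo() {
        try {
            LockHeldDuringIoDemo demo = new LockHeldDuringIoDemo();
            CountDownLatch lockTaken = new CountDownLatch(1);
            CountDownLatch unblock = new CountDownLatch(1);
            Thread writer = demo.startStuckWriter(lockTaken, unblock);
            lockTaken.await();                       // writer now holds the write lock
            boolean whileStuck = demo.tryRead(200);  // times out: lock is held
            unblock.countDown();                     // the "socket read" finally returns
            writer.join();                           // write lock released in finally
            boolean afterRelease = demo.tryRead(200);
            return new boolean[] {whileStuck, afterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("read while writer stuck: " + result[0]);
        System.out.println("read after writer released: " + result[1]);
    }
}
```

With an unbounded `lock()` call in place of `tryLock`, the reader would stay in WAITING state for as long as the writer is blocked, which matches the thread dump described in the symptoms.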

      Steps to reproduce

      This issue is affecting some customers. I haven't been able to reproduce it locally. The key to reproducing it seems to be killing a connection between Crowd and Postgres in such a way that Postgres believes it is closed ("broken pipe") while Crowd keeps waiting to read from the socket.

      This issue seems to happen only when using database token storage.

            [CWD-3768] A failure in a single DB connection causes deadlock in Crowd

            Hi danilo.tuler, we're sorry to hear that you're having problems with 2.7.2. I've noticed that you've opened CWD-3915. The cause of your problems with 2.7.2 seems to be unrelated to this issue (CWD-3768).

            Diego Berrueta added the comment above.

            2.7.2 is as unstable as 2.7.1 for me, even with tokens in memory.
            And it seems to be heavily affecting JIRA and other Atlassian apps.

            Danilo Tuler added the comment above.

            Thank you for your patience,

            As described at https://jira.atlassian.com/browse/CWD-3769?focusedCommentId=591918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-591918, we have simplified the locking around the token storage, making it impossible for an unresponsive database connection to hold essential resources and cause the whole server to freeze. We have also changed the transaction model to eliminate the deadlocks. We believe these changes will fix the problem described in this issue. They will be part of the upcoming Crowd 2.7.2 release.

            Nevertheless, as a best practice to improve resilience against unexpected failures, we still recommend setting socket timeouts in your JDBC driver, and transaction timeouts in your database server. Please check the documentation of your database to configure timeouts.

            If you still experience deadlocks and stability problems after the upgrade to the upcoming Crowd 2.7.2 release, please open a support ticket. Thank you.

            Diego Berrueta added the comment above.
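
As a rough illustration of the socket-timeout recommendation in the comment above, the sketch below appends a timeout parameter to a JDBC URL, producing the same `?socketTimeout=30` form suggested elsewhere in this thread for the PostgreSQL driver. `JdbcUrlParams` and `withParam` are hypothetical helper names, the `ssl=true` parameter is only there to show the `&` case, and the parameter name and its unit differ between JDBC drivers, so check your driver's documentation. Several commenters on 2.7.1 reported that this setting alone did not help.

```java
final class JdbcUrlParams {
    // Appends name=value, using '?' for the first parameter and '&' afterwards.
    // No URL-encoding is applied; this sketch assumes simple alphanumeric values.
    static String withParam(String jdbcUrl, String name, String value) {
        String separator = jdbcUrl.indexOf('?') >= 0 ? "&" : "?";
        return jdbcUrl + separator + name + "=" + value;
    }

    public static void main(String[] args) {
        String base = "jdbc:postgresql://localhost:5432/crowd";
        // URL without existing parameters
        System.out.println(withParam(base, "socketTimeout", "30"));
        // URL that already carries a parameter
        System.out.println(withParam(base + "?ssl=true", "socketTimeout", "30"));
    }
}
```

The resulting string is what would go into the `hibernate.connection.url` property in crowd.cfg.xml.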

            FWIW, we removed the foreign key from table dbo.cwd_user and all of the deadlocks have ceased, and performance has been very good. This resolved the issue for us.

            Jim Pickering added the comment above.

            Using SQL Server 2008 R2 64-bit, when using the sqljdbc4.jar driver, Crowd didn't crash whether or not we used database storage of authentication tokens.

            When we switched to the jtds-1.2.7.jar driver, since all of our other Atlassian tools are using it, Crowd crashed. Crowd and all of our Atlassian applications became unresponsive too. I had to use a SQL script to turn off database storage of authentication tokens, restore the sqljdbc4.jar driver in the crowd config, and restart the server to get everything back online.

            Jim Pickering added the comment above.

            Would appreciate an Update comment from Atlassian on this issue, as the last comment was nearly one month ago. How close is a resolution to this issue?

            This issue seems Critical to me, rather than Major unless Major is the highest priority. Crowd is the nucleus of all Atlassian Tools, this needs to be resolved ASAP.

            We just upgraded Crowd from 2.4.2 to 2.7.1 on Windows Server 2008 R2 using SQL Server 2008 R2 both 64-bit and can confirm the comments made previous to this one: Moving Authentication Token Storage to "Memory Cache" did not help.

            The only reason I upgraded was to get all of the Atlassian tools using Java 7, otherwise version 2.4.2 wasn't having any issues. I sort of regret the upgrade, due to this bug. I will check Atlassian's Jira issues prior to upgrading in the future.

            We are getting deadlocks, and they are filling up our SQL logs; however, they are not crashing Crowd. Sessions are timing out frequently, though, among the Atlassian applications (Jira, Confluence, Fisheye, etc.), requiring our users to log in again frequently, and saving documents in Confluence errors out due to a lost session, even when users have just logged in.

            Thanks.

            Jim Pickering added the comment above.

            We get the following on the database side (Postgres 9.3)

             16387 | crowd      | 62073 |    16396 | crowd      |                  | 172.17.2.178 |                 |       40219 | 2014-02-10 12:00:00.057242+02 | 2014-02-10 12:01:00.607062+02 | 2014-02-10 12:01:00.699423+02 | 2014-02-10 12:01:00.699961+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 34633 |    16396 | crowd      |                  | 172.17.2.178 |                 |       38868 | 2014-02-10 11:15:47.146698+02 | 2014-02-10 11:16:01.402503+02 | 2014-02-10 11:16:01.405518+02 | 2014-02-10 11:16:01.405521+02 | t       | active              | insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
             16387 | crowd      | 25933 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41748 | 2014-02-10 12:51:59.465915+02 |                               | 2014-02-10 12:52:34.667549+02 | 2014-02-10 12:52:34.667726+02 | f       | idle                |  DISCARD ALL
             16387 | crowd      | 34635 |    16396 | crowd      |                  | 172.17.2.178 |                 |       38870 | 2014-02-10 11:15:48.357651+02 | 2014-02-10 11:16:01.02871+02  | 2014-02-10 11:16:01.356168+02 | 2014-02-10 11:16:01.35656+02  | f       | idle in transaction | delete from cwd_token where id=$1
             16387 | crowd      | 25934 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41749 | 2014-02-10 12:51:59.47671+02  |                               | 2014-02-10 12:52:34.668734+02 | 2014-02-10 12:52:34.668846+02 | f       | idle                |  DISCARD ALL
             16387 | crowd      | 25935 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41750 | 2014-02-10 12:51:59.477418+02 | 2014-02-10 12:52:11.390548+02 | 2014-02-10 12:52:11.466483+02 | 2014-02-10 12:52:11.466827+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 62074 |    16396 | crowd      |                  | 172.17.2.178 |                 |       40220 | 2014-02-10 12:00:00.058206+02 |                               | 2014-02-10 12:00:00.063045+02 | 2014-02-10 12:00:00.063565+02 | f       | idle                | SHOW TRANSACTION ISOLATION LEVEL
             16387 | crowd      | 35507 |    16396 | crowd      |                  | 172.17.2.178 |                 |       38975 | 2014-02-10 11:19:49.546673+02 | 2014-02-10 11:20:20.105722+02 | 2014-02-10 11:20:20.195675+02 | 2014-02-10 11:20:20.196117+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 35508 |    16396 | crowd      |                  | 172.17.2.178 |                 |       38976 | 2014-02-10 11:19:49.58113+02  | 2014-02-10 11:19:55.930264+02 | 2014-02-10 11:19:56.006547+02 | 2014-02-10 11:19:56.006703+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 26809 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41855 | 2014-02-10 12:56:00.286072+02 | 2014-02-10 12:56:00.313195+02 | 2014-02-10 12:56:00.395825+02 | 2014-02-10 12:56:00.396254+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 45089 |    16396 | crowd      |                  | 172.17.2.178 |                 |       39215 | 2014-02-10 11:24:56.00944+02  | 2014-02-10 11:24:56.032012+02 | 2014-02-10 11:24:56.126233+02 | 2014-02-10 11:24:56.126891+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 26983 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41887 | 2014-02-10 12:57:11.439214+02 |                               | 2014-02-10 12:59:59.807766+02 | 2014-02-10 12:59:59.80853+02  | f       | idle                | SHOW TRANSACTION ISOLATION LEVEL
             16387 | crowd      | 26985 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41888 | 2014-02-10 12:57:11.45502+02  | 2014-02-10 12:57:11.465899+02 | 2014-02-10 12:57:11.553381+02 | 2014-02-10 12:57:11.553848+02 | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 61210 |    16396 | crowd      |                  | 172.17.2.178 |                 |       40118 | 2014-02-10 11:56:00.500355+02 | 2014-02-10 11:56:00.525963+02 | 2014-02-10 11:56:00.608745+02 | 2014-02-10 11:56:00.60894+02  | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 26986 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41889 | 2014-02-10 12:57:11.456641+02 | 2014-02-10 12:57:21.703485+02 | 2014-02-10 12:57:21.777868+02 | 2014-02-10 12:57:21.77803+02  | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
             16387 | crowd      | 27361 |    16396 | crowd      |                  | 172.17.2.178 |                 |       41940 | 2014-02-10 12:59:08.022304+02 | 2014-02-10 13:01:00.357251+02 | 2014-02-10 13:01:00.43968+02  | 2014-02-10 13:01:00.44007+02  | f       | idle in transaction | select property0_.property_key as property1_13_0_, property0_.property_name as property2_13_0_, property0_.property_value as property3_13_0_ from cwd_property property0_ where property0_.property_key=$1 and property0_.property_name=$2
            

            The deadlock is caused by pid 34633.

            ---Jaco

            Jaco van Tonder added the comment above.
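
The rows in the comment above have 16 pipe-separated columns, which matches the layout of PostgreSQL's `pg_stat_activity` view in 9.2/9.3 (datid, datname, pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start, xact_start, query_start, state_change, waiting, state, query). The comment does not say which query produced them, so that mapping is an assumption. Under it, a minimal sketch for pulling out the pid, waiting flag, state and query of each row might look like this; `PgStatActivityRow` is a hypothetical helper, not part of any library.

```java
final class PgStatActivityRow {
    final int pid;
    final boolean waiting;
    final String state;
    final String query;

    private PgStatActivityRow(int pid, boolean waiting, String state, String query) {
        this.pid = pid;
        this.waiting = waiting;
        this.state = state;
        this.query = query;
    }

    // Parses one psql-style row, assuming the 16-column pg_stat_activity layout.
    static PgStatActivityRow parse(String line) {
        // Limit 16 keeps any '|' inside the query text in the last column.
        String[] cols = line.split("\\|", 16);
        if (cols.length != 16) {
            throw new IllegalArgumentException("expected 16 columns, got " + cols.length);
        }
        return new PgStatActivityRow(
                Integer.parseInt(cols[2].trim()),   // pid
                "t".equals(cols[13].trim()),        // waiting
                cols[14].trim(),                    // state
                cols[15].trim());                   // query
    }

    public static void main(String[] args) {
        // One row copied from the comment above (whitespace padding shortened).
        String row = "16387 | crowd | 34635 | 16396 | crowd | | 172.17.2.178 | | 38870 | "
                + "2014-02-10 11:15:48.357651+02 | 2014-02-10 11:16:01.02871+02 | "
                + "2014-02-10 11:16:01.356168+02 | 2014-02-10 11:16:01.35656+02 | f | "
                + "idle in transaction | delete from cwd_token where id=$1";
        PgStatActivityRow r = parse(row);
        System.out.println(r.pid + " waiting=" + r.waiting
                + " state=" + r.state + " query=" + r.query);
    }
}
```

Read this way, pid 34633 is the only session with the waiting flag set (the insert into cwd_token), while pid 34635 is idle in transaction after a delete from cwd_token.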

            Steve Ruiz added a comment -

            I tried adding ?socketTimeout=30 to my JDBC URL, but it still doesn't work. This is a brand new install of Crowd. I went through the installation wizard, logged in after that, and was able to use it. After I restarted (I tried to add a plugin jar), I have not been able to log in since. This is a server with zero activity on it, other than me trying to log in as an admin. I see Postgres showing one session in "INSERT waiting" and another "idle in transaction".

            status "idle in transaction" is: delete from cwd_token where id=$1
            active query is: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)


            Moving Authentication Token Storage to "Memory Cache" did not help.

            Theo Barker added the comment above.

            Just to add to the comments already here. Our fresh evaluation setup consists of:

            • Centos 6 + OpenJDK 1.7.0
            • PostgreSQL 9.2
            • Crowd 2.7.1

            After a clean new installation and setup, with a completely unloaded server and no data, we are unable to log back in. Crowd hangs while trying to log in, with PostgreSQL showing: postgres: crowd crowd_db_01 127.0.0.1(33930) INSERT waiting

            And the following in logs:

            ERROR: duplicate key value violates unique constraint "uk_token_id_hash"
            DETAIL: Key (identifier_hash)=(ggFyZinu0tfC85Ccyz4fRA00) already exists.
            STATEMENT: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
            LOG: could not send data to client: Broken pipe
            FATAL: connection to client lost

            Adding socket timeout to the connection URL did not help.

            Polar IS Europe added the comment above.

            Further, we are on Crowd v2.7.0. We've seen the PostgreSQL log error referenced in the first message for a couple of the instances where Crowd logged the "Directory 'xxxx' is not functional during authentication of 'uuuuu'. Skipped." message. However, we have many more instances of Crowd logging the "Directory...not functional" errors.

            Theo Barker added the comment above.

            We're seeing this on a lightly loaded server.
            Config: Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-56-generic x86_64), 8GB RAM, 4 CPU (Intel Xeon X5675 @ 3.07GHz), PostgreSQL 9.1.11-0ubuntu, VMware vSphere 5.5, hosting JIRA, Crowd & Confluence on the same machine. Thus we do have three applications all accessing the same database engine on the same machine on which they are running. Considering our load, this should not be a problem.

            Theo Barker added the comment above.

            I will try to use the socketTimeout during off-peak hours.
            Maybe it would be good to ship Crowd with a bundled JRE and an installer, like the other Atlassian products (JIRA, Confluence).

            Peter Hudec added the comment above.

            Hi,

            We experienced this issue after migrating to a new server.
            The old one was Ubuntu Hardy (8.04) with PostgreSQL 8.4 and Java 6. The new OS is Debian Wheezy (amd64) with the latest Java 1.7 from Oracle and PostgreSQL 9.1. We got the deadlock right after the first attempt to log in.

            Peter Hudec added the comment above.

            Diego Berrueta added a comment - - edited

            We're investigating this issue. If anyone is experiencing server crashes and is using Postgres, we suggest you modify the JDBC connection URL in crowd.cfg.xml to add the parameter ?socketTimeout=30. For instance, in my case it looks like:

            <property name="hibernate.connection.url">jdbc:postgresql://localhost:5432/crowd?socketTimeout=30</property>
            

            Please let us know if that improves the stability of the server. Thank you.


            This issue has similar effects to CWD-3692 (Crowd freezes), but different causes. In particular, it is not essential for this issue to have a high load in the Crowd server. Once the situation described in the "Symptoms"/"Steps to reproduce" section above happens, Crowd will eventually crash after some time.

            This issue also resembles CWD-3568. In particular, we have observed the line "ERROR: duplicate key value violates unique constraint "cwd_token_identifier_hash_key" STATEMENT: insert into cwd_token (directory_id, entity_name, random_number, identifier_hash, random_hash, created_date, last_accessed_date, last_accessed_time, duration, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)" in the Postgres logs just before the "LOG: could not send data to client: Broken pipe". The effects are quite different: CWD-3568 never caused a server crash, it just caused some requests to fail.

            The "LOG: could not send data to client: Broken pipe" line was also seen in CWD-3495.

            Diego Berrueta added the comment above.

              dberrueta Diego Berrueta
              Affected customers: 29
              Watchers: 49