Crowd Data Center / CWD-6276

Running Crowd with Oracle Database Native Network Encryption degrades performance or causes outages

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Low
    • Fix Version: None
    • Affects Version: 5.1.2
    • Component: Database

      Issue Summary

      Oracle provides a feature called Native Network Encryption (see ORACLE-BASE - Native Network Encryption for Database Connections). This feature was previously part of the Advanced Security Option license, and provides connection encryption without requiring client-side configuration.

      When this feature is enabled, it adds 350ms or more to the time required to establish a database connection. This alone causes noticeable performance degradation, but when combined with c3p0, the default database connection pool manager in Crowd, it can cause intermittent outages and extreme performance degradation.

      Oracle have stated that this latency is working as intended: Slow Connection Using 12c Client When Network Encryption Is Enabled

      This is reproducible on Data Center: yes

      Steps to Reproduce

      1. Install any version of Crowd
      2. Install Oracle DB 11g or later with Native Network Encryption enabled
      3. Introduce load to the system. The problem is usually exacerbated with a larger value of hibernate.c3p0.max_size in crowd.cfg.xml, due to the nature of the c3p0 pool scaling bug.
      4. Monitor Crowd for delayed or timeout responses
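
      For reference, the encryption setting in step 2 is typically enabled server-side in sqlnet.ora. A sketch, assuming AES256/SHA256 as the algorithm choices (your site's list may differ):

```
# $ORACLE_HOME/network/admin/sqlnet.ora (server side)
SQLNET.ENCRYPTION_SERVER = REQUIRED
SQLNET.ENCRYPTION_TYPES_SERVER = (AES256)
SQLNET.CRYPTO_CHECKSUM_SERVER = REQUIRED
SQLNET.CRYPTO_CHECKSUM_TYPES_SERVER = (SHA256)
```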

      Expected Results

      Crowd should continue operating normally, scaling the size of the c3p0 pool appropriately.

      Actual Results

      There is a prolonged delay in establishing database connections, which causes c3p0 to get stuck in a loop attempting to obtain additional database connections. Because each new connection is slow to establish, growing the pool takes far longer than normal.

      Crowd will remain unresponsive until it reaches the c3p0 maximum pool size for the node.

      This issue is not visible in the logs by default, but the following KB provides additional details on how to diagnose it: Confluence Unresponsive Due to High Database Connection Latency (written for Confluence, which used c3p0 until Confluence 7.14, not inclusive; a KB article will be drafted for Crowd shortly).

      Thread dumps will show unresponsive threads waiting on com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable for long periods, and this is indicative of the problem.
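
      To confirm the pattern quickly, grep a captured thread dump for that wait site. Shown here against an inline sample; in practice, point it at a dump taken with `jstack <pid>`:

```shell
# Count threads parked in c3p0's pool wait; a persistently high count
# across successive dumps indicates connection-pool starvation.
grep -c 'BasicResourcePool.awaitAvailable' <<'EOF'
"http-nio-8095-exec-103" waiting on condition
        at java.base@11.0.21/java.lang.Object.wait(Native Method)
        at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1503)
EOF
# prints 1 for this sample
```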

      Additionally, if the StuckThreadDetectionValve is enabled in the <Host> block within server.xml with an appropriate threshold value, the Tomcat (catalina) logs will show the same stuck-thread stack trace. For example:

      <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="60"/>
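
      For context, the valve sits directly inside the <Host> element of Tomcat's conf/server.xml. A sketch using Tomcat's default Host attributes, which may differ in your install:

```xml
<Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true">
    <!-- Logs a stack trace for any request thread active longer than the threshold (seconds) -->
    <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="60"/>
</Host>
```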
      

      The Tomcat (catalina) log will then show:

      15-Jun-2024 11:16:50.109 WARNING [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8095-exec-103 url: /crowd/rest/usermanagement/1/authentication] (id=[306]) has been active for [67,206] milliseconds (since [6/15/24, 11:15 AM]) to serve the same request for [https://testsite.atlassian.com/crowd/rest/usermanagement/1/authentication?username=user55] and may be stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). There is/are [138] thread(s) in total that are monitored by this Valve and may be stuck.
              java.lang.Throwable
                      at java.base@11.0.21/java.lang.Object.wait(Native Method)
                      at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1503)
                      at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
                      at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554)
                      at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758)
                      at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685)
                      ....
      

      Workaround

      A workaround is detailed in this KB: Confluence Unresponsive Due to High Database Connection Latency. In essence, the quickest option is workaround #2, which can be implemented by:

      1. Editing the crowd.cfg.xml
      2. Changing the hibernate.c3p0.timeout value from the default of 30 to a larger value (e.g. 900 or 1800). For example, change:
        <property name="hibernate.c3p0.timeout">30</property>
        

        to

        <property name="hibernate.c3p0.timeout">900</property>
        

        If in doubt, set 1800 and validate that the problem is resolved, then try a lower value, choosing one that is still high enough to prevent the problem.

      3. Restart Crowd for the setting to take effect. This must be completed on all nodes if Crowd is clustered.
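
      The edit in step 2 can also be scripted. A minimal sketch, assuming the default single-line property format (demonstrated on a throwaway copy; point CFG at the real crowd.cfg.xml in the Crowd home directory, and back it up first):

```shell
# Create a throwaway stand-in for crowd.cfg.xml.
CFG=$(mktemp)
echo '<property name="hibernate.c3p0.timeout">30</property>' > "$CFG"

# Raise the c3p0 timeout from the default 30 to 900 seconds.
sed -i 's|\(name="hibernate.c3p0.timeout">\)30<|\1900<|' "$CFG"

grep 'hibernate.c3p0.timeout' "$CFG"   # now shows the 900 value
```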

      However, it may be preferable to implement SSL/TLS to the database with a proper certificate exchange, or to disable Native Network Encryption entirely.
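
      If the TLS route is taken, the JDBC URL in crowd.cfg.xml would switch to Oracle's TCPS listener. A sketch with placeholder host, port, and service name (2484 is the conventional TCPS port):

```
jdbc:oracle:thin:@(DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcps)(HOST=db.example.com)(PORT=2484))
  (CONNECT_DATA=(SERVICE_NAME=crowd)))
```

      The Oracle thin driver also needs to trust the database certificate, e.g. via the javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword system properties.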


              Assignee: Unassigned
              Reporter: Malcolm Ninnes (mninnes@atlassian.com)
              Affected customers: 0
              Watchers: 4