
NativePRNG Blocking Issue can lead to Performance Problems & Outages

      Problem

      In Confluence Data Center 9.1.1, some instances have experienced production crashes, Attachments macros that are slow to load, and a Team Calendars integration that fails to load.

      Environment

      Confluence Data Center version 9.1.1 running on Linux-based Operating Systems.

      Symptoms and Steps to Reproduce

      The problem relates to the entropy available (or not available) from /dev/random on specific Linux OS configurations and in containerized environments. In some instances, a lack of available entropy appears to block HTTP threads, leaving parts of the Confluence integration unresponsive. The problem can then cascade into a complete blockage or outage of the instance from the end user's perspective.
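
As a rough diagnostic for the entropy condition described above, the kernel's entropy estimate can be read from /proc/sys/kernel/random/entropy_avail. The sketch below is illustrative only: the function name `check_entropy` and the default threshold of 256 bits are assumptions, not values taken from this bug report or from Atlassian guidance.

```shell
# Report the kernel's entropy estimate and flag values below a threshold.
# check_entropy and the default threshold of 256 are hypothetical choices.
check_entropy() {
    entropy_file="${1:-/proc/sys/kernel/random/entropy_avail}"
    threshold="${2:-256}"

    # Some containers or non-Linux hosts may not expose this file.
    if [ ! -r "$entropy_file" ]; then
        echo "unknown: cannot read $entropy_file"
        return 2
    fi

    avail=$(cat "$entropy_file")
    if [ "$avail" -lt "$threshold" ]; then
        echo "low: $avail bits available (threshold $threshold)"
        return 1
    fi
    echo "ok: $avail bits available (threshold $threshold)"
}
```

Calling `check_entropy` with no arguments reads the default path. A "low" result is only a hint that blocking reads from /dev/random may stall; it does not by itself confirm this issue.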

      Examples of observed stack traces

      • Attachment macro slow to load
        19-Nov-2024 07:46:02.541 WARNING [Catalina-utility-1] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [https-jsse-nio-8443-exec-29] (id=[275]) has been active for [63,611] milliseconds (since [11/19/24, 7:44 AM]) to serve the same request for [sanitizedURL] and may be stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). There is/are [2] thread(s) in total that are monitored by this Valve and may be stuck.
        	java.lang.Throwable
        		at java.base/sun.security.provider.NativePRNG$RandomIO.implNextBytes(Unknown Source)
        		at java.base/sun.security.provider.NativePRNG$Blocking.engineNextBytes(Unknown Source)
        		at java.base/java.security.SecureRandom.nextBytes(Unknown Source)
        		at org.apache.commons.lang3.CachedRandomBits.<init>(CachedRandomBits.java:67)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:304)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:263)
        		at org.apache.commons.lang3.RandomStringUtils.next(RandomStringUtils.java:668)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:165)
        		at org.apache.commons.lang3.RandomStringUtils.next(RandomStringUtils.java:628)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:129)
        		at org.apache.commons.lang3.RandomStringUtils.nextAlphanumeric(RandomStringUtils.java:759)
        		at org.apache.commons.lang3.RandomStringUtils.randomAlphanumeric(RandomStringUtils.java:412)
        		at com.atlassian.confluence.extra.attachments.AttachmentsMacro.buildTemplateModel(AttachmentsMacro.java:216)
        		at com.atlassian.confluence.extra.attachments.AttachmentsMacro.execute(AttachmentsMacro.java:182)
        		at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
        		at java.base/java.lang.reflect.Method.invoke(Unknown Source)
      • Team Calendar failing to load
        18-Nov-2024 07:41:01.643 WARNING [Catalina-utility-3] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8090-exec-73] (id=[46354]) has been active for [67,498] milliseconds (since [11/18/24, 7:39 AM]) to serve the same request for [<confluenceBaseURL>/calendar/mycalendar.action] and may be stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). There is/are [5] thread(s) in total that are monitored by this Valve and may be stuck.
        	java.lang.Throwable
        		at java.base/sun.security.provider.NativePRNG$RandomIO.implNextBytes(Unknown Source)
        		at java.base/sun.security.provider.NativePRNG$Blocking.engineNextBytes(Unknown Source)
        		at java.base/java.security.SecureRandom.nextBytes(Unknown Source)
        		at org.apache.commons.lang3.CachedRandomBits.<init>(CachedRandomBits.java:67)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:304)
        		at org.apache.commons.lang3.RandomStringUtils.next(RandomStringUtils.java:668)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:165)
        		at org.apache.commons.lang3.RandomStringUtils.next(RandomStringUtils.java:628)
        		at org.apache.commons.lang3.RandomStringUtils.random(RandomStringUtils.java:129)
        		at org.apache.commons.lang3.RandomStringUtils.nextAlphabetic(RandomStringUtils.java:728)
        		at org.apache.commons.lang3.RandomStringUtils.randomAlphabetic(RandomStringUtils.java:381)
        		at com.atlassian.confluence.extra.calendar3.CalendarRenderer$RenderParamsBuilder.build(CalendarRenderer.java:427)
        		at com.atlassian.confluence.extra.calendar3.xwork.CalendarAction.getParams(CalendarAction.java:130)
        		at com.atlassian.confluence.extra.calendar3.xwork.CalendarAction.getCalendarHtml(CalendarAction.java:121)
        		at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
        		at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        		at org.apache.velocity.runtime.parser.node.PropertyExecutor.execute(PropertyExecutor.java:166)
        		at org.apache.velocity.util.introspection.UberspectImpl$VelGetterImpl.invoke(UberspectImpl.java:562)
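
To check whether a given log shows the same pattern as the traces above, one option is to count the stuck-thread warnings and the blocking PRNG frames. This is a minimal sketch: the function name `count_prng_stalls` is hypothetical, and the log path passed to it (for example a Tomcat catalina.out) depends on the installation.

```shell
# Count stuck-thread warnings and NativePRNG blocking frames in a log file.
# count_prng_stalls is a hypothetical helper; pass the path to your own log.
count_prng_stalls() {
    log_file="$1"

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi

    # grep -c exits non-zero when the count is 0, so tolerate that case.
    # -F treats the pattern as a fixed string, so '$' and '.' are literal.
    stuck=$(grep -cF 'StuckThreadDetectionValve.notifyStuckThreadDetected' "$log_file" || true)
    blocked=$(grep -cF 'NativePRNG$Blocking.engineNextBytes' "$log_file" || true)

    echo "stuck_warnings=$stuck blocking_prng_frames=$blocked"
}
```

A non-zero `blocking_prng_frames` count alongside stuck-thread warnings is consistent with the symptoms reported here, though other causes of stuck threads would need to be ruled out separately.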

      We will add more details on the steps to reproduce as the investigation progresses.

      Expected Results

      Confluence should run normally regardless of the entropy available or the underlying method used to generate secure random numbers.

      Actual Results

      Confluence can fail to run properly on some specific OS configurations when secure random generation uses the blocking method highlighted in the stack traces above.

      Workaround

      • For Red Hat Enterprise Linux 8: installing rng-tools seems to provide additional entropy and works around the symptoms described here.
      • For other Linux distributions: replacing /dev/random with a symbolic link to /dev/urandom has successfully worked around the issue, e.g.
        mv /dev/random /dev/random.bak
        ln -s /dev/urandom /dev/random
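
After applying the symbolic link workaround, it may help to confirm where /dev/random actually points. The following is a small sketch under stated assumptions: the function name `random_device_status` is hypothetical, and it relies on the `readlink` utility being available, as it is on typical Linux systems.

```shell
# Print whether a device path is a symlink and, if so, its target.
# random_device_status is a hypothetical helper; the default path is /dev/random.
random_device_status() {
    dev="${1:-/dev/random}"

    if [ -L "$dev" ]; then
        echo "symlink -> $(readlink "$dev")"
    elif [ -e "$dev" ]; then
        echo "not a symlink"
    else
        echo "missing"
    fi
}
```

If the workaround is in place, `random_device_status` should print `symlink -> /dev/urandom`. On systems where /dev is recreated at boot, the link may not survive a restart, so re-running the check after a reboot is worthwhile.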
        

      As the workarounds involve changes outside the application scope, we suggest validating them in a lower (staging) environment first. More information will be added here as the investigation of this bug report progresses.

            [CONFSERVER-98637] NativePRNG Blocking Issue can lead to Performance Problems & Outages

            Quan Pham added a comment - A fix for this issue is available in Confluence Server and Data Center 9.2.0. Upgrade now or check out the Release Notes to see what other issues are resolved.

            IT-Services added a comment - Can confirm that the workaround with the symlink has worked for us and resolved the instability of Confluence, as well as the Team Calendar failing to load.

              kmacleod Kenny MacLeod
              2e857505f334 Pascal Oberle