  1. Jira Data Center
  2. JRASERVER-74478

Atlassian diagnostics creates several threads when monitoring database operations made by Jira and may crash the instance during high DB activity


Details

    Description

      Issue Summary

      It seems the Atlassian Diagnostics plugin isn't reusing threads from its thread pool and instead keeps creating new threads, potentially hundreds or thousands of them, while monitoring operations against the database.

      The problem becomes more evident when running a full reindex as this is one of the operations on Jira with high database activity.

      During high DB activity, such as a full reindex, the Jira instance might crash because of the high number of threads.

      This seems to be related to the upgrade of the Atlassian Diagnostics plugin to version 2.0.4 in Jira 9.3.0.
      The bug couldn't be recreated on Jira 9.2.0.

      Steps to Reproduce

      1. Install a vanilla instance of Jira Software Data Center 9.3.0.
        • This was validated with 9.3.0, 9.3.1 and 9.4.0-eap.
        • The bug couldn't be recreated on 9.2.0.
      2. Make sure Java Flight Recorder (JFR) is enabled and taking thread dumps every 3 seconds.
      3. Create ~100 projects so the instance has something to reindex.
        # Connection and naming settings for the demo projects
        JIRA_BASE_URL=http://localhost:8080
        JIRA_ADMIN_USERNAME=admin
        JIRA_ADMIN_PASSWORD=admin
        JIRA_PROJECT_NAME=kan
        JIRA_PROJECT_KEY=KAN
        
        # Create 100 Kanban demo projects (kan1/KAN1 through kan100/KAN100)
        for i in $(seq 1 100); do
          curl -v -u "${JIRA_ADMIN_USERNAME}:${JIRA_ADMIN_PASSWORD}" -X POST -o /dev/null \
            -H 'X-Atlassian-Token: no-check' \
            "${JIRA_BASE_URL}/rest/jira-importers-plugin/1.0/demo/create" \
            --data-raw "name=${JIRA_PROJECT_NAME}${i}&key=${JIRA_PROJECT_KEY}${i}&keyEdited=false&projectTemplateWebItemKey=software-demo-project-kanban&projectTemplateModuleKey=undefined"
        done
        
      4. Run a full reindex.
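
Step 2 relies on JFR for the periodic thread dumps. Where JFR is not an option, a rough equivalent could be a loop around `jcmd <pid> Thread.print`. The sketch below assumes `jcmd` from the JDK running Jira is on the PATH and that the PID of the Jira JVM is known; the function name, arguments, and file naming are illustrative and not part of the original report.

```shell
# Hypothetical helper: write COUNT thread dumps of the JVM with the given PID
# into OUTDIR, one every INTERVAL seconds. Assumes `jcmd` is on the PATH.
capture_thread_dumps() {
  pid=$1; count=$2; interval=$3; outdir=$4
  mkdir -p "$outdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # Thread.print is the jcmd diagnostic command that prints all thread stacks.
    jcmd "$pid" Thread.print > "$outdir/threads-$(printf '%04d' "$i").txt"
    i=$((i + 1))
    sleep "$interval"
  done
}
```

For example, `capture_thread_dumps "$JIRA_PID" 200 3 /tmp/jira-dumps` would cover roughly ten minutes of a reindex, where `JIRA_PID` is a placeholder for the actual process ID.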

      Expected Results

      Full reindex runs with no major problem and no side effects.

      Actual Results

      Full reindex completes without any error.
      Hundreds or thousands of threads named pool-XX-thread-XXXX are created.
      Inspecting JFR data and thread dumps taken while the reindex was running shows threads similar to the one below.

      "pool-18-thread-695" prio=5 tid=0x00000000000007d5 nid=0 waiting on condition 
         java.lang.Thread.State: TIMED_WAITING (parking)
      	at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
      	- parking to wait for <0x000000003697a89f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      	at java.base@11.0.16/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
      	at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
      	at java.base@11.0.16/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
      	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1053)
      	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
      	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at java.base@11.0.16/java.lang.Thread.run(Thread.java:829)
      
         Locked ownable synchronizers:
      	- None
      

      In a few instances we were able to capture these threads in the RUNNABLE state. They look similar to the one below, running code from com.atlassian.diagnostics.

      "pool-17-thread-27202" #36575 prio=5 os_prio=0 cpu=0.13ms elapsed=1.50s tid=0x00007f41f89d8000 nid=0x905b runnable  [0x00007f3aab6fe000]
         java.lang.Thread.State: RUNNABLE
      	at io.micrometer.core.instrument.LongTaskTimer$Builder.register(LongTaskTimer.java:408)
      	at io.micrometer.core.instrument.MeterRegistry$More.longTaskTimer(MeterRegistry.java:872)
      	at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:104)
      	at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:93)
      	at com.atlassian.util.profiling.Metrics$DefaultLongRunningMetricTimer.start(Metrics.java:721)
      	at com.atlassian.util.profiling.Metrics$Builder.startLongRunningTimer(Metrics.java:635)
      	at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.lambda$startTimingDatabaseOperationAsync$1(DefaultDatabaseDiagnosticsCollector.java:153)
      	at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector$$Lambda$2047/0x00000008427bd840.call(Unknown Source)
      	at java.util.concurrent.FutureTask.run(java.base@11.0.16.1/FutureTask.java:264)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.16.1/ThreadPoolExecutor.java:1128)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.16.1/ThreadPoolExecutor.java:628)
      	at java.lang.Thread.run(java.base@11.0.16.1/Thread.java:829)
      

      The JFR data shows thousands of threads with the characteristics described above.
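
To put a number on this from a plain-text thread dump, the threads matching the pool-XX-thread-XXXX naming pattern can be counted with standard tools. This is a minimal sketch; the function names are illustrative, and it assumes a dump file in which each thread header line starts with the quoted thread name, as in the excerpts above.

```shell
# Count threads whose name matches pool-N-thread-M in a thread dump file.
count_pool_threads() {
  # grep -c exits non-zero when nothing matches but still prints 0.
  grep -cE '^[[:space:]]*"pool-[0-9]+-thread-[0-9]+"' "$1" || true
}

# Print "<pool> <thread count>" for each pool, largest pool first.
pool_thread_breakdown() {
  grep -oE '"pool-[0-9]+-thread-[0-9]+"' "$1" \
    | sed -E 's/^"(pool-[0-9]+)-thread-[0-9]+"$/\1/' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```

Running `pool_thread_breakdown` against dumps taken before and during the reindex should make it visible which pool is growing.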


      Depending on how large the environment is, the application may crash during the reindex because of the number of threads created.

      When too many threads exist, Jira may crash with the following error.

      java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
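
This error generally means the JVM could not obtain another native thread from the operating system, rather than that the Java heap is full. On Linux (an assumption, since the report does not name the OS), the live thread count of the Jira JVM can be compared against the relevant limits. The helper names below are illustrative.

```shell
# Linux-only sketch: each thread of a process is a directory under /proc/<pid>/task.
jvm_thread_count() {
  ls "/proc/$1/task" | wc -l | tr -d ' '
}

# Print the limits that commonly cap thread creation for the given PID.
show_thread_limits() {
  # Per-process limit (RLIMIT_NPROC) as seen by the running process.
  grep 'Max processes' "/proc/$1/limits"
  # System-wide ceilings.
  echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
  echo "kernel pid_max: $(cat /proc/sys/kernel/pid_max)"
}
```

If `jvm_thread_count` for the Jira PID approaches the "Max processes" value during a reindex, the crash is consistent with the thread growth described above.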
      

      Workaround

      Disable the affected system apps while running Jira 9.3.0 or 9.3.1:

      1. Go to Cog icon > Manage Apps > Manage Apps.
      2. Choose All Apps and filter for diagnostic.
      3. Disable the following 2 system apps.
        • Atlassian Diagnostics - Plugin
        • Atlassian Jira - Plugins - Diagnostics Plugin
      4. Restart Jira so that no lingering threads from before the change remain.
      5. Go to Manage Apps and confirm the two diagnostic Apps are still disabled.

      Re-enable these apps after upgrading Jira to a version where this bug is fixed.

            People

              Assignee: Unassigned
              Reporter: Thiago Masutti (tmasutti)
              Votes: 2
              Watchers: 16
