Atlassian diagnostics creates several threads when monitoring database operations made by Jira and may crash the instance during high DB activity
- Bug
- Resolution: Fixed
- High
- 9.3.0, 9.3.1
- 9.03
- 12
- Severity 2 - Major
- 46
Issue Summary
The Atlassian Diagnostics plugin does not appear to be reusing threads from its thread pool and instead creates many new threads while monitoring operations against the database.
The problem becomes more evident when running a full reindex, as this is one of the Jira operations with the highest database activity.
During high DB activity, such as a full reindex, the Jira instance might crash because of the high number of threads.
This seems to be related to the upgrade of the Atlassian Diagnostics plugin to version 2.0.4 in Jira 9.3.0.
The bug couldn't be recreated on Jira 9.2.0.
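A quick way to check whether a running instance shows this behaviour is to count the generic pool-* worker threads on the Jira JVM while database activity is high. This is a minimal sketch, not part of the official reproduction steps; it assumes the JDK's jcmd tool is on the PATH and that the Jira JVM is the only process matched by pgrep -f jira.

# Locate the Jira JVM (assumes a single matching process on the host)
JIRA_PID=$(pgrep -f jira | head -n 1)

# Dump all threads, then compare the total thread count with the number of
# generic "pool-*" worker threads; on an affected instance the second number
# keeps climbing during a full reindex
jcmd "${JIRA_PID}" Thread.print > /tmp/jira-threads.txt
grep -c '^"' /tmp/jira-threads.txt
grep -c '^"pool-' /tmp/jira-threads.txt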
Steps to Reproduce
- Install a vanilla instance of Jira Software Data Center 9.3.0.
- This was validated on 9.3.0, 9.3.1, and 9.4.0-eap.
- The bug couldn't be recreated on 9.2.0.
- Make sure Java Flight Recorder is enabled and thread dumps are taken every 3 seconds (see the example after this list).
- Create ~100 projects so the instance has something to reindex, for example with the script below.
JIRA_BASE_URL=http://localhost:8080
JIRA_ADMIN_USERNAME=admin
JIRA_ADMIN_PASSWORD=admin
JIRA_PROJECT_NAME=kan
JIRA_PROJECT_KEY=KAN

for i in $(seq 1 100); do
  curl -v -u ${JIRA_ADMIN_USERNAME}:${JIRA_ADMIN_PASSWORD} -X POST -o /dev/null \
    -H 'X-Atlassian-Token: no-check' \
    ${JIRA_BASE_URL}'/rest/jira-importers-plugin/1.0/demo/create' \
    --data-raw 'name='${JIRA_PROJECT_NAME}${i}'&key='${JIRA_PROJECT_KEY}${i}'&keyEdited=false&projectTemplateWebItemKey=software-demo-project-kanban&projectTemplateModuleKey=undefined'
done
- Run a Full reindex.
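One way to satisfy the JFR and thread dump prerequisite above is sketched below. It assumes Jira runs on JDK 11+ and that JVM arguments are added through the usual setenv.sh mechanism; the recording parameters and file paths are only examples, not a required configuration.

# 1. Start a continuous flight recording when the JVM starts. Example flag to
#    append to JVM_SUPPORT_RECOMMENDED_ARGS in <jira-install>/bin/setenv.sh:
#      -XX:StartFlightRecording=disk=true,maxage=2h,filename=/tmp/jira-reindex.jfr

# 2. Capture a thread dump every 3 seconds while the reindex runs (Ctrl+C to stop)
JIRA_PID=$(pgrep -f jira | head -n 1)
while true; do
  jcmd "${JIRA_PID}" Thread.print > "/tmp/threads-$(date +%H%M%S).txt"
  sleep 3
done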
Expected Results
The full reindex runs with no major problems and no side effects.
Actual Results
The full reindex completes without any error.
Hundreds or thousands of threads named pool-XX-thread-XXXX are created.
Inspecting JFR data and thread dumps taken while the reindex was running shows threads similar to the one below.
"pool-18-thread-695" prio=5 tid=0x00000000000007d5 nid=0 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
        at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000003697a89f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.base@11.0.16/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
        at java.base@11.0.16/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
        at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1053)
        at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
        at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base@11.0.16/java.lang.Thread.run(Thread.java:829)

   Locked ownable synchronizers:
        - None
In a few cases we were able to capture these threads in the RUNNABLE state; they look similar to the one below, running code associated with com.atlassian.diagnostics.
"pool-17-thread-27202" #36575 prio=5 os_prio=0 cpu=0.13ms elapsed=1.50s tid=0x00007f41f89d8000 nid=0x905b runnable [0x00007f3aab6fe000]
   java.lang.Thread.State: RUNNABLE
        at io.micrometer.core.instrument.LongTaskTimer$Builder.register(LongTaskTimer.java:408)
        at io.micrometer.core.instrument.MeterRegistry$More.longTaskTimer(MeterRegistry.java:872)
        at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:104)
        at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:93)
        at com.atlassian.util.profiling.Metrics$DefaultLongRunningMetricTimer.start(Metrics.java:721)
        at com.atlassian.util.profiling.Metrics$Builder.startLongRunningTimer(Metrics.java:635)
        at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.lambda$startTimingDatabaseOperationAsync$1(DefaultDatabaseDiagnosticsCollector.java:153)
        at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector$$Lambda$2047/0x00000008427bd840.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@11.0.16.1/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.16.1/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.16.1/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(java.base@11.0.16.1/Thread.java:829)
Looking at the JFR data, there will be thousands of threads with the characteristics described above.
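The same growth is visible inside the recording itself. As an illustration, the jfr tool shipped with JDK 11+ can print the thread-start events captured during the reindex; the file name below matches the example recording flag shown earlier, and the exact output layout of jfr print is an assumption, so treat the grep patterns as approximations.

# Count thread-start events in the recording, and how many of them
# mention the generic "pool-*" worker threads
jfr print --events jdk.ThreadStart /tmp/jira-reindex.jfr | grep -c 'jdk.ThreadStart'
jfr print --events jdk.ThreadStart /tmp/jira-reindex.jfr | grep -c 'pool-'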
Depending on how large the environment is, the application may crash during the reindex because of the number of threads created.
Jira may sometimes crash with the following error because of the large number of existing threads.
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
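This particular OutOfMemoryError typically means the operating system's process/thread limits were hit rather than the Java heap being exhausted. Below is a short sketch for checking those limits on Linux; it assumes the Jira JVM can be located with pgrep and that the commands are run as the user that starts Jira, and exact limits and paths vary by distribution.

JIRA_PID=$(pgrep -f jira | head -n 1)

# Threads currently used by the Jira JVM
ps -o nlwp= -p "${JIRA_PID}"

# Per-user process/thread limit for the current user
ulimit -u

# System-wide thread limit
cat /proc/sys/kernel/threads-max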
Workaround
Disable the system apps causing the problem while running Jira 9.3.0 or 9.3.1:
- Go to Cog icon > Manage Apps > Manage Apps.
- Choose All Apps and filter for diagnostic.
- Disable the following two system apps:
- Atlassian Diagnostics - Plugin
- Atlassian Jira - Plugins - Diagnostics Plugin
- Restart Jira so that any lingering threads created before the change are cleared.
- Go to Manage Apps and confirm the two diagnostics apps are still disabled.
Re-enable these apps once Jira has been upgraded to a version where this bug is fixed.
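To confirm the workaround from the command line rather than the UI, the Universal Plugin Manager REST resource can be queried for the apps' enabled state. This is a hedged sketch: the base URL and admin credentials are placeholders, and the exact JSON field names returned by UPM may differ between versions.

# List installed apps whose name mentions "diagnostic" and show whether they are enabled
curl -s -u admin:admin 'http://localhost:8080/rest/plugins/1.0/' \
  | jq '.plugins[] | select(.name | test("[Dd]iagnostic")) | {name, enabled}'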