Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 9.3.2, 9.4.0
Affects Version/s: 9.3.0, 9.3.1
Component/s: App Diagnostics
Labels:
- lts940
- sefcon_cat2_performance

Introduced in Version:
9.03
Support reference count:
12
Symptom Severity:
Severity 2 - Major
UIS:
46
Bug Fix Policy:
View Atlassian Server bug fix policy

Issue Summary

The Atlassian Diagnostics plugin does not appear to reuse threads from its thread pool. Instead, it creates many new threads while monitoring operations against the database.

The problem is most evident during a full reindex, as this is one of the Jira operations with the highest database activity.

During high DB activity, such as a full reindex, the Jira instance might crash because of the high number of threads.

This seems to be related to the upgrade of the Atlassian Diagnostics plugin to version 2.0.4 in Jira 9.3.0.
The bug could not be reproduced on Jira 9.2.0.

Steps to Reproduce

Install a vanilla instance of Jira Software Data Center 9.3.0.
- This was validated with 9.3.0, 9.3.1 and 9.4.0-eap.
- The bug could not be reproduced on 9.2.0.
Make sure Java Flight Recorder is enabled and taking thread dumps every 3 seconds.
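
If JFR is not available, periodic thread dumps can also be collected from the shell. The following is a minimal sketch, not part of the original report: it assumes the JDK's `jcmd` tool is on the PATH and that you know the Jira process ID. The function name `capture_thread_dumps` and the output file naming are illustrative only.

```shell
# Capture COUNT thread dumps of a JVM, INTERVAL seconds apart, into OUTDIR.
# Usage: capture_thread_dumps PID INTERVAL COUNT OUTDIR
# Assumption: jcmd (shipped with the JDK) is on the PATH.
capture_thread_dumps() {
  pid="$1"; interval="$2"; count="$3"; outdir="$4"
  mkdir -p "$outdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # "jcmd <pid> Thread.print" writes a full thread dump to stdout.
    jcmd "$pid" Thread.print > "$outdir/threaddump-$i.txt" || return 1
    i=$((i + 1))
    # Sleep between dumps, but not after the last one.
    if [ "$i" -le "$count" ]; then
      sleep "$interval"
    fi
  done
  return 0
}
```

For example, `capture_thread_dumps <jira-pid> 3 100 /tmp/jira-dumps` would take 100 dumps, 3 seconds apart, matching the interval used in this step.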

Create ~100 projects so the instance has something to reindex.

# Connection and naming settings; adjust for your instance.
JIRA_BASE_URL=http://localhost:8080
JIRA_ADMIN_USERNAME=admin
JIRA_ADMIN_PASSWORD=admin
JIRA_PROJECT_NAME=kan
JIRA_PROJECT_KEY=KAN

# Create 100 demo Kanban projects (kan1/KAN1 through kan100/KAN100).
for i in $(seq 1 100); do
  curl -v -u "${JIRA_ADMIN_USERNAME}:${JIRA_ADMIN_PASSWORD}" -X POST -o /dev/null \
    -H 'X-Atlassian-Token: no-check' \
    "${JIRA_BASE_URL}/rest/jira-importers-plugin/1.0/demo/create" \
    --data-raw "name=${JIRA_PROJECT_NAME}${i}&key=${JIRA_PROJECT_KEY}${i}&keyEdited=false&projectTemplateWebItemKey=software-demo-project-kanban&projectTemplateModuleKey=undefined"
done

Run a full reindex.

Expected Results

Full reindex runs with no major problem and no side effects.

Actual Results

Full reindex completes without any error.
Hundreds or thousands of threads named pool-XX-thread-XXXX are created.
Inspecting JFR data and thread dumps taken while the reindex is running shows threads similar to the following.

"pool-18-thread-695" prio=5 tid=0x00000000000007d5 nid=0 waiting on condition 
   java.lang.Thread.State: TIMED_WAITING (parking)
	at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
	- parking to wait for <0x000000003697a89f> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.base@11.0.16/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
	at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
	at java.base@11.0.16/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1053)
	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
	at java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base@11.0.16/java.lang.Thread.run(Thread.java:829)

   Locked ownable synchronizers:
	- None

In a few instances we were able to capture these threads in the runnable state. They look similar to the following and are running code associated with com.atlassian.diagnostics.

"pool-17-thread-27202" #36575 prio=5 os_prio=0 cpu=0.13ms elapsed=1.50s tid=0x00007f41f89d8000 nid=0x905b runnable  [0x00007f3aab6fe000]
   java.lang.Thread.State: RUNNABLE
	at io.micrometer.core.instrument.LongTaskTimer$Builder.register(LongTaskTimer.java:408)
	at io.micrometer.core.instrument.MeterRegistry$More.longTaskTimer(MeterRegistry.java:872)
	at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:104)
	at com.atlassian.util.profiling.micrometer.MicrometerStrategy.startLongRunningTimer(MicrometerStrategy.java:93)
	at com.atlassian.util.profiling.Metrics$DefaultLongRunningMetricTimer.start(Metrics.java:721)
	at com.atlassian.util.profiling.Metrics$Builder.startLongRunningTimer(Metrics.java:635)
	at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector.lambda$startTimingDatabaseOperationAsync$1(DefaultDatabaseDiagnosticsCollector.java:153)
	at com.atlassian.diagnostics.internal.platform.monitor.db.DefaultDatabaseDiagnosticsCollector$$Lambda$2047/0x00000008427bd840.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(java.base@11.0.16.1/FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.16.1/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.16.1/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.16.1/Thread.java:829)

The JFR data shows thousands of threads with the characteristics described above.
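
To quantify this from a plain-text thread dump, the pool threads can be counted per pool. The following is a minimal sketch, assuming the dump uses the format shown above, where each thread header line starts with the quoted thread name. The function name `count_pool_threads` is illustrative only.

```shell
# Count threads named pool-<N>-thread-<M> in a thread dump, grouped by pool.
# Usage: count_pool_threads /path/to/threaddump.txt
# Output: one line per pool, "<pool-name> <thread-count>", largest pool first.
count_pool_threads() {
  # Thread header lines start with the quoted thread name.
  grep -oE '^"pool-[0-9]+-thread-[0-9]+"' "$1" \
    | sed -E 's/^"(pool-[0-9]+)-thread-[0-9]+"$/\1/' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $2, $1}'
}
```

A single pool with a count in the hundreds or thousands is consistent with the behaviour described in this ticket.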

Depending on how large the environment is, the application may crash during the reindex because of the number of threads created.

In that case, Jira may fail with the following error because too many threads exist.

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
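
To see how close a running instance is to this failure, the thread count of the Jira process can be compared with the operating system limits. The following is a minimal sketch, assuming a Linux host with `/proc` mounted. The function names are illustrative only, and which limit is actually hit depends on the environment.

```shell
# Print the number of OS threads owned by a process.
# Assumption: Linux, where /proc/<pid>/status has a "Threads:" field.
process_thread_count() {
  awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Print limits that commonly cap thread creation on Linux.
show_thread_limits() {
  # Per-user process/thread limit as applied to the given process.
  grep '^Max processes' "/proc/$1/limits"
  # System-wide maximum number of threads.
  echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
}
```

For example, run `process_thread_count <jira-pid>` and `show_thread_limits <jira-pid>` while the reindex is in progress and compare the values.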

Workaround

Disable the affected system apps while running Jira 9.3.0 or 9.3.1:

Go to Cog icon > Manage Apps > Manage Apps.
Choose All Apps and filter for diagnostic.
Disable the following 2 system apps.
- Atlassian Diagnostics - Plugin
- Atlassian Jira - Plugins - Diagnostics Plugin
Restart Jira to clear any lingering threads created before the apps were disabled.
Go to Manage Apps and confirm the two diagnostic apps are still disabled.

Re-enable these apps after upgrading Jira to a version in which this bug is fixed.

screenshot-1.png
231 kB
01/Nov/2022 6:14 PM