  Server Deployments and Scale / SCALE-141

Jira deployed on Kubernetes using Helm throws an error on stop when JMX monitoring is enabled

    • Severity: 2 - Major
    • Development Effort: S

      Issue Summary

      When Jira is deployed on Kubernetes with JMX monitoring enabled and a stop command is executed, the JMX agent throws a "port already in use" error and the stop command fails. As a result, the JVM is terminated by an OS signal.

      Steps to Reproduce

      1. Deploy Jira on Kubernetes using the Helm chart with JMX monitoring enabled (a minimal example follows these steps)
      2. Execute the following command to stop Jira:
      kubectl exec -it jira-0 -c jira -- /shutdown-wait.sh 
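
      For reference, a minimal deployment command might look like the following; the repo alias and release name here are illustrative, and monitoring.exposeJmxMetrics is the chart value referenced in the comments below:

      helm repo add atlassian-data-center https://atlassian.github.io/data-center-helm-charts
      helm install jira atlassian-data-center/jira --set monitoring.exposeJmxMetrics=true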

      Expected Results

      Jira stops without any error

      Actual Results

      The following exception is thrown:

      $ kubectl exec -it jira-0 -c jira --  /shutdown-wait.sh
      Shutting down Jira...
      executing as current user
      
                        `sMMMMMMMMMMMMMM+
                           MMMMMMMMMMMMMM
                           :sdMMMMMMMMMMM
                                   MMMMMM
                `sMMMMMMMMMMMMMM+  MMMMMM
                   MMMMMMMMMMMMMM  +MMMMM
                    :sMMMMMMMMMMM   MMMMM
                           MMMMMM    `UOJ
         `sMMMMMMMMMMMMM+  MMMMMM
           MMMMMMMMMMMMMM  +MMMMM
            :sdMMMMMMMMMM   MMMMM
                   MMMMMM    `UOJ
                   MMMMMM
                   +MMMMM
                    MMMMM
                     `UOJ
      
            Atlassian Jira
            Version : 9.12.13
                        
      
      If you encounter issues starting or stopping Jira, please see the Troubleshooting guide at https://docs.atlassian.com/jira/jadm-docs-0912/Troubleshooting+installation
      
      Using JIRA_HOME:       /var/atlassian/application-data/jira
      
      Server startup logs are located in /opt/atlassian/jira/logs/catalina.out
      Using CATALINA_BASE:   /opt/atlassian/jira
      Using CATALINA_HOME:   /opt/atlassian/jira
      Using CATALINA_TMPDIR: /opt/atlassian/jira/temp
      Using JRE_HOME:        /opt/java/openjdk
      Using CLASSPATH:       /opt/atlassian/jira/bin/bootstrap.jar:/opt/atlassian/jira/bin/tomcat-juli.jar
      Using CATALINA_OPTS:   
      Using CATALINA_PID:    /opt/atlassian/jira/work/catalina.pid
      NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
      Exception in thread "main" java.lang.reflect.InvocationTargetException
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:569)
              at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:491)
              at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:503)
      Caused by: java.net.BindException: Address already in use
              at java.base/sun.nio.ch.Net.bind0(Native Method)
              at java.base/sun.nio.ch.Net.bind(Net.java:555)
              at java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:337)
              at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:294)
              at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:89)
              at jdk.httpserver/sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:142)
              at jdk.httpserver/sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
              at jdk.httpserver/sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
              at jdk.httpserver/com.sun.net.httpserver.HttpServer.create(HttpServer.java:150)
              at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:396)
              at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)
              ... 6 more
      *** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message Outstanding error when calling method in invokeJavaAgentMainMethod at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 619
      *** java.lang.instrument ASSERTION FAILED ***: "success" with message invokeJavaAgentMainMethod failed at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 459
      *** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
      FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      V  [libjvm.so+0x8b385e]  jni_FatalError+0xbe
      V  [libjvm.so+0xa327f2]  JvmtiExport::post_vm_initialized()+0x132
      V  [libjvm.so+0xe6df56]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x946
      V  [libjvm.so+0x8c1e22]  JNI_CreateJavaVM+0x52
      C  [libjli.so+0x9c7f]  JavaMain+0x8f
      C  [libjli.so+0xdf79]  ThreadJavaMain+0x9
      C  [libc.so.6+0x9ca94]
      
      /opt/atlassian/jira/bin/catalina.sh: line 575:   743 Aborted                 (core dumped) "/opt/java/openjdk/bin/java" -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio.file=ALL-UNNAMED --add-opens=java.base/javax.crypto=ALL-UNNAMED --add-opens=java.management/javax.management=ALL-UNNAMED --add-opens=java.desktop/sun.font=ALL-UNNAMED --add-opens=java.base/sun.reflect.generics.parser=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=jdk -Xms2g -Xmx2g -XX:InitialCodeCacheSize=32m -XX:ReservedCodeCacheSize=${JVM_RESERVED_CODE_CACHE_SIZE:=512m} -Dlog4j2.contextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector -Dlog4j2.disableJmx=true -Dlog4j2.garbagefree.threadContextMap=true -Dlog4j2.isWebapp=false -Djava.awt.headless=true -Datlassian.standalone=JIRA -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dmail.mime.decodeparameters=true -Dorg.dom4j.factory=com.atlassian.core.xml.InterningDocumentFactory -Dorg.apache.catalina.connector.RECYCLE_FACADES=false -Datlassian.logging.cloud.enabled=false -XX:ActiveProcessorCount=2 -javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=30000:/opt/atlassian/jmx/jmx-config.yaml -XX:-OmitStackTraceInFastThrow -Djava.locale.providers=COMPAT -Djira.home=/var/atlassian/application-data/jira -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs="" -classpath "/opt/atlassian/jira/bin/bootstrap.jar:/opt/atlassian/jira/bin/tomcat-juli.jar" -Dcatalina.base="/opt/atlassian/jira" -Dcatalina.home="/opt/atlassian/jira" -Djava.io.tmpdir="/opt/atlassian/jira/temp" org.apache.catalina.startup.Bootstrap stop
      The stop command failed. Attempting to signal the process to stop through OS signal.
      command terminated with exit code 137

      Workaround

      Currently there is no known workaround for this behavior. A workaround will be added here when available. (A community-provided mitigation is described in the comments below.)

            Activity

            Yevhen made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: In Review [ 10051 ] New: Closed [ 6 ]

            Yevhen added a comment -

            Available in 1.22.0

            Yevhen made changes -
            Status Original: In Progress [ 3 ] New: In Review [ 10051 ]
            Yevhen made changes -
            Status Original: Short Term Backlog [ 12074 ] New: In Progress [ 3 ]
            Yevhen made changes -
            Labels Original: dcxp-triaged product New: dcxp-triaged
            Yevhen made changes -
            Labels Original: dcxp-triaged New: dcxp-triaged product
            Yevhen made changes -
            Labels New: dcxp-triaged

            To expand on this issue:

            When monitoring.exposeJmxMetrics is set to true, Jira starts an additional Java agent, which by default listens on port 9999 (see the env var JVM_SUPPORT_RECOMMENDED_ARGS for details).
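
            In the log above, the agent flag injected through JVM_SUPPORT_RECOMMENDED_ARGS looks like this (port 30000 rather than the default 9999 in that deployment):

            -javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=30000:/opt/atlassian/jmx/jmx-config.yaml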

            When the cluster then decides to terminate the pod, it runs the command specified in the Helm value jira.shutdown.command: "/shutdown-wait.sh"

            Through a complex chain of events, this script triggers catalina.sh which, before finally starting a new Java process, picks up the values in JVM_SUPPORT_RECOMMENDED_ARGS by running setenv.sh.
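
            Simplified from the command line visible in the log above (other flags abbreviated to "..."), the same -javaagent flag ends up on both JVMs:

            # Running Jira server: the agent's premain has already bound the metrics port
            java ... -javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=30000:/opt/atlassian/jmx/jmx-config.yaml ... org.apache.catalina.startup.Bootstrap start

            # Stop JVM launched by catalina.sh: same flag, same port, so its premain fails with BindException
            java ... -javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=30000:/opt/atlassian/jmx/jmx-config.yaml ... org.apache.catalina.startup.Bootstrap stop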

            This new Java process also tries to spin up a javaagent listening on (binding to) the same port as the running Jira's javaagent. This fails, and "java.net.BindException: Address already in use" is thrown.

            Jira then appears to terminate immediately, potentially corrupting any ongoing SQL queries, processes, etc. On top of the potential corruption problems, we also see "core.xxx" files created in the Jira local home; they seem to be the size of JVM_MAXIMUM_MEMORY, which in our prod environment is ~35 GB, and with 4 pods each holding several of these core files, disk space fills up quickly.
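
            To gauge the impact, something like the following lists the dumps (the path matches the JIRA_HOME shown in the log above; adjust it for your deployment):

            kubectl exec jira-0 -c jira -- find /var/atlassian/application-data/jira -maxdepth 1 -name 'core.*' -exec ls -lh {} +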

            The workaround we implemented was to update the shutdown command:

            --set jira.shutdown.command='sed -i "JVM_SUPPORT_RECOMMENDED_ARGS=$(echo "$JVM_SUPPORT_RECOMMENDED_ARGS" | sed "s|-javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=[0-9]*:/opt/atlassian/jmx/jmx-config.yaml||") && /shutdown-wait.sh'
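
            For reference, a self-contained variant of the same idea, assuming /shutdown-wait.sh picks JVM_SUPPORT_RECOMMENDED_ARGS up from the environment (if the agent flag is instead baked into bin/setenv.sh inside the container, an equivalent sed -i edit of that file is needed):

            --set jira.shutdown.command='export JVM_SUPPORT_RECOMMENDED_ARGS="$(echo "$JVM_SUPPORT_RECOMMENDED_ARGS" | sed "s|-javaagent:/var/atlassian/application-data/shared-home/jmx_prometheus_javaagent.jar=[0-9]*:/opt/atlassian/jmx/jmx-config.yaml||")" && /shutdown-wait.sh'

            This strips only the -javaagent flag, so the short-lived stop JVM never tries to bind the metrics port.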

            If you are still having similar shutdown problems, have a look at: https://jira.atlassian.com/browse/JRASERVER-70785

            Yevhen made changes -
            Assignee New: Yevhen [ c64f33b2bce3 ]
            Yevhen made changes -
            Development Effort New: S [ 13031 ]
            Status Original: Needs Triage [ 10030 ] New: Short Term Backlog [ 12074 ]

              Assignee: Yevhen
              Reporter: Kutay Peksen
              Affected customers: 2
              Watchers: 6
