-
Bug
-
Resolution: Fixed
-
Low
-
7.4.0, 7.4.9
-
6
-
Severity 1 - Critical
-
11
-
Issue Summary
Confluence 7.4.0 intermittently and unexpectedly quits shortly after host restart
Steps to Reproduce
- With Confluence stopped, restart the host Confluence is running on, allow Confluence to automatically start and run for 10 minutes
- Repeat step 1 repeatedly (employ a script to automate if desired)
Expected Results
Confluence will remain running when started
Actual Results
Confluence will intermittently quit after ~3 minutes
Java 8 on Ubuntu Data
Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
---------------------------------------------------------------------------
Using Java: /opt/atlassian/jdk1.8.0_251/bin/java
2020-07-01 08:38:13,185 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
---------------------------------------------------------------------------
Tomcat started.
- Wait 130 seconds to allow Confluence to fully start
- Confluence unexpectedly quits after ~2.5 minutes
Confluence is running uptime is 2x5 seconds :: Wed Jul 1 08:40:23 UTC 2020 :::::::::: 1562 = Confluence PID number uptime is 3x5 seconds :: Wed Jul 1 08:40:28 UTC 2020 :::::::::: = Confluence PID number Confluence is NOT running
- Starting Confluence after unexpected stop
Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
---------------------------------------------------------------------------
Using Java: /opt/atlassian/jdk1.8.0_251/bin/java
2020-07-01 08:40:33,672 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
---------------------------------------------------------------------------
Existing PID file found during start.
Removing/clearing stale PID file.
- Wait 130 seconds to all Confluence to fully start again
- the next several restarts all run for 10+ minutes
Confluence is running uptime is 5x5 seconds :: Wed Jul 1 08:42:43 UTC 2020 :::::::::: 1974 = Confluence PID number Confluence is running uptime is 6x5 seconds :: Wed Jul 1 08:42:48 UTC 2020 :::::::::: 1974 = Confluence PID number Confluence is running
2020-07-01 04:38:20,781 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438 ....truncated..... 2020-07-01 04:39:57,617 INFO [lifecycle:thread-20] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] startup Starting Synchrony and enabling Collaborative Editing 2020-07-01 04:40:00,993 WARN [SiteSummaryCalculator:thread-1] [hibernate.boot.internal.SessionFactoryOptionsBuilder] doLog org.hibernate.tool.schema.Action cannot be cast to java.lang.String Ignoring *unexpected quit* 2020-07-01 04:40:38,577 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
01-Jul-2020 04:38:14.111 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0] 01-Jul-2020 04:38:14.217 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property. 01-Jul-2020 04:38:14.248 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0] 01-Jul-2020 04:38:14.262 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0] 01-Jul-2020 04:38:14.330 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 01-Jul-2020 04:38:14.370 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 01-Jul-2020 04:38:14.803 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"] 01-Jul-2020 04:38:14.947 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [1,020] milliseconds 01-Jul-2020 04:38:14.960 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone] 01-Jul-2020 04:38:14.961 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33] 01-Jul-2020 04:38:17.978 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 01-Jul-2020 04:38:18.052 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath 01-Jul-2020 04:38:18.331 WARNING [Catalina-utility-2] org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [170] milliseconds. 01-Jul-2020 04:38:18.582 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher' 2020-07-01 04:38:20,781 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438 2020-07-01 04:38:25,553 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode... 2020-07-01 04:38:25,916 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.91.103, network interfaces [fe80:0:0:0:10ff:5ff:fed8:8c37%ens3, 10.125.91.103] and local port 5801 2020-07-01 04:38:25,916 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster. 2020-07-01 04:38:27,489 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452e2be3] *unexpected quit* 01-Jul-2020 04:40:34.220 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0] 01-Jul-2020 04:40:34.298 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property. 01-Jul-2020 04:40:34.317 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0] 01-Jul-2020 04:40:34.325 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0] 01-Jul-2020 04:40:34.368 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 01-Jul-2020 04:40:34.392 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 01-Jul-2020 04:40:34.706 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"] 01-Jul-2020 04:40:34.739 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [632] milliseconds 01-Jul-2020 04:40:34.748 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone] 01-Jul-2020 04:40:34.748 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33] 01-Jul-2020 04:40:37.175 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 01-Jul-2020 04:40:37.238 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath 01-Jul-2020 04:40:37.574 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher' 2020-07-01 04:40:38,577 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
- OS logs do not show any evidence the process was terminated
Jul 01 08:37:40 ip-10-125-91-103 systemd-timesyncd[666]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com). Jul 01 08:38:12 ip-10-125-91-103 sudo[1516]: root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c Jul 01 08:38:12 ip-10-125-91-103 sudo[1516]: pam_unix(sudo:session): session opened for user root by (uid=0) Jul 01 08:38:12 ip-10-125-91-103 runuser[1522]: pam_unix(runuser:session): session opened for user confluence by (uid=0) Jul 01 08:38:13 ip-10-125-91-103 runuser[1522]: pam_unix(runuser:session): session closed for user confluence Jul 01 08:38:13 ip-10-125-91-103 sudo[1516]: pam_unix(sudo:session): session closed for user root Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]: root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]: pam_unix(sudo:session): session opened for user root by (uid=0) Jul 01 08:40:33 ip-10-125-91-103 runuser[1933]: pam_unix(runuser:session): session opened for user confluence by (uid=0) Jul 01 08:40:33 ip-10-125-91-103 runuser[1933]: pam_unix(runuser:session): session closed for user confluence Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]: pam_unix(sudo:session): session closed for user root Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]: 2020-07-01 08:42:14 INFO Backing off health check to every 600 seconds for 1800 seconds. Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]: 2020-07-01 08:42:14 ERROR Health ping failed with error - AccessDeniedException: User: arn:aws:sts:: Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]: status code: 400, request id: 76cc4ad1-04dd-4fa6-b415-27716fbf6cd0 Jul 01 08:45:01 ip-10-125-91-103 CRON[2767]: pam_unix(cron:session): session opened for user root by (uid=0)
Jul 1 08:37:11 ip-10-125-91-103 kernel: [ 12.463296] Key type id_legacy registered Jul 1 08:37:11 ip-10-125-91-103 kernel: [ 12.526266] new mount options do not match the existing superblock, will be ignored Jul 1 08:52:42 ip-10-125-91-103 kernel: [ 943.535197] nfsd: last server has exited, flushing export cache Jul 1 08:53:25 ip-10-125-91-103 kernel: [ 0.000000] Linux version 5.3.0-1028-aws (buildd@lcy01-amd64-015) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #30~18.04.1-Ubuntu SMP Mon Jun 22 15:48:21 UTC 2020 (Ubuntu 5.3.0-1028.30~18.04.1-aws 5.3.18) Jul 1 08:53:25 ip-10-125-91-103 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1028-aws root=UUID=8abb4f2d-c7e0-4daf-80b5-24e1c814cf55 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
Java 11 on Amazon Linux 2
Starting Confluence Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out --------------------------------------------------------------------------- Using Java: /opt/atlassian/confluence/jre//bin/java 2020-06-30 18:30:48,659 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required --------------------------------------------------------------------------- Tomcat started.
- Wait 130 seconds to all Confluence to completely start
- Confluence unexpectedly quits after ~2.5 minutes
Confluence is NOT running Confluence is NOT running Starting Confluence Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out --------------------------------------------------------------------------- Using Java: /opt/atlassian/confluence/jre//bin/java 2020-06-30 18:32:59,525 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required --------------------------------------------------------------------------- Existing PID file found during start. Removing/clearing stale PID file. Tomcat started.
- Wait 130 seconds to all Confluence to completely start again
- Confluence will continue to run for 10+ minutes uninterrupted
Confluence is running Confluence is running uptime is 3x5 seconds :: Tue Jun 30 18:35:09 UTC 2020 :::::::::: 3877 = Confluence PID number Confluence is running uptime is 4x5 seconds :: Tue Jun 30 18:35:14 UTC 2020 :::::::::: 3877 = Confluence PID number Confluence is running uptime is 5x5 seconds :: Tue Jun 30 18:35:19 UTC 2020 :::::::::: 3877 = Confluence PID number
2020-06-30 14:30:55,729 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438 2020-06-30 14:31:00,145 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode... 2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.88.150, network interfaces [fe80:0:0:0:85e:3bff:fe69:ab2d%eth0, 10.125.88.150] and local port 5801 2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster. 2020-06-30 14:31:00,508 WARN [Catalina-utility-1] [com.hazelcast.instance.HazelcastInstanceFactory] log Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used: --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED 2020-06-30 14:31:02,729 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452dd494] 2020-06-30 14:31:02,819 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext: initialization started 2020-06-30 14:31:07,903 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] <init> Loading EhCache cache manager 2020-06-30 14:31:18,196 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster ClusterService{address=[10.125.88.150]:5801} 2020-06-30 14:31:18,197 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster contains Member [10.125.88.150]:5801 - 60471142-00a2-4646-9dfc-a3a9c196745d this 2020-06-30 14:31:23,223 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext initialized in 20404 ms 2020-06-30 14:31:24,172 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] lambda$earlyStartup$3 Plugin system earlyStartup begun 2020-06-30 14:31:28,393 WARN [FelixStartLevel] [extender.internal.support.ExtenderConfiguration] addDefaultDependencyFactories Gemini Blueprint extensions bundle not present, annotation processing disabled. 2020-06-30 14:32:16,656 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] logTime Plugin system earlyStartup ended 2020-06-30 14:32:16,663 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAttributeChanged memberAttributeChanged: member Member [10.125.88.150]:5801 - 60471142-00a2-4646-9dfc-a3a9c196745d this, key confluence.node.tenant.arrived, value true 2020-06-30 14:32:16,718 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] lambda$lateStartup$4 Plugin system lateStartup begun 2020-06-30 14:32:16,739 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [EVENTS] encountered an error. Turn debug logging on for stacktrace. 2020-06-30 14:32:16,741 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [SECRETS] encountered an error. Turn debug logging on for stacktrace. 2020-06-30 14:32:16,743 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [SNAPSHOTS] encountered an error. Turn debug logging on for stacktrace. 2020-06-30 14:32:21,126 INFO [ThreadPoolAsyncTaskExecutor::Thread 20] [plugins.synchrony.bootstrap.DefaultSynchronyProxyMonitor] <init> synchrony-proxy healthcheck url: http://127.0.0.1:8090/synchrony-proxy/healthcheck 2020-06-30 14:32:24,254 INFO [ThreadPoolAsyncTaskExecutor::Thread 19] [atlassian.ratelimiting.internal.RateLimitingConfiguration] userRequestHandler Rate Limited Request Handler ENABLED 2020-06-30 14:32:24,846 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [service.stepexecutor.attachment.AttachmentMigrationExecutor] info Initialized attachment migration executor with concurrency = 5 and batch size = 20. 2020-06-30 14:32:24,862 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [migration.agent.queue.QueueBroker] info Started queue broker. Registered consumers: [com.atlassian.migration.agent.service.stepexecutor.user.UsersMigrationExecutor@688c9983, com.atlassian.migration.agent.service.stepexecutor.space.SpaceExportExecutor@3af44477, com.atlassian.migration.agent.service.stepexecutor.space.SpaceUploadExecutor@513c2d3d, com.atlassian.migration.agent.service.stepexecutor.attachment.AttachmentMigrationExecutor@6768780f, com.atlassian.migration.agent.service.stepexecutor.space.SpaceImportExecutor@6b24b5e0] 2020-06-30 14:32:24,915 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.analytics.AnalyticsEventConsumer] info Successfully registered AnalyticsEventConsumer job migration-plugin:analytics-events-consumer-runner-key. 2020-06-30 14:32:24,937 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.analytics.AnalyticsEventConsumer] info Successfully started AnalyticsEventConsumer poller. 2020-06-30 14:32:25,073 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.impl.DetectedUserEmailAnalyticsScheduledRunner] info Scheduled detected email analytics event collection job to be run each 120 hours. 2020-06-30 14:32:27,843 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] logTime Plugin system lateStartup ended 2020-06-30 14:32:28,098 WARN [lifecycle:thread-14] [expose.jmx.schedule.JmxInstrumentSchedulerImpl] onStart atlassian-instrumentation-jmx expose scheduler started. 2020-06-30 14:32:28,128 INFO [lifecycle:thread-18] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] startup Starting Synchrony and enabling Collaborative Editing 2020-06-30 14:32:28,430 WARN [hz.ShutdownThread] [com.hazelcast.instance.Node] log [10.125.88.150]:5801 [dc_cluser] [3.11.6] Terminating forcefully... *unexpected quit* 2020-06-30 14:33:05,293 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
2020-06-30 14:30:55,729 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438 2020-06-30 14:31:00,145 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode... 2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.88.150, network interfaces [fe80:0:0:0:85e:3bff:fe69:ab2d%eth0, 10.125.88.150] and local port 5801 2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster. 2020-06-30 14:31:00,508 WARN [Catalina-utility-1] [com.hazelcast.instance.HazelcastInstanceFactory] log Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used: --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (file:/opt/atlassian/confluence/confluence/WEB-INF/lib/hazelcast-3.11.6.jar) to field sun.nio.ch.SelectorImpl.selectedKeys WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 2020-06-30 14:31:02,729 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452dd494] NOTE: Picked up JDK_JAVA_OPTIONS: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED [0.001s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/opt/atlassian/confluence/logs/gc-2020-06-30_18-32-59.log instead. *unexpected quit* 30-Jun-2020 14:33:00.208 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0] 30-Jun-2020 14:33:00.343 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property. 30-Jun-2020 14:33:00.380 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0] 30-Jun-2020 14:33:00.386 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0] 30-Jun-2020 14:33:00.437 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 30-Jun-2020 14:33:00.460 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0] 30-Jun-2020 14:33:00.965 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"] 30-Jun-2020 14:33:01.021 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [1,016] milliseconds 30-Jun-2020 14:33:01.036 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone] 30-Jun-2020 14:33:01.036 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33] 30-Jun-2020 14:33:03.584 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 30-Jun-2020 14:33:03.704 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath 30-Jun-2020 14:33:04.332 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher' 2020-06-30 14:33:05,293 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
- OS logs do not show any evidence the process was terminated
Jun 30 18:30:35 ip-10-125-91-103 systemd[1]: Reached target Cloud-init target. Jun 30 18:30:35 ip-10-125-91-103 systemd[1]: Startup finished in 5.098s (kernel) + 10.366s (userspace) = 15.465s. Jun 30 18:31:00 ip-10-125-91-103 systemd-timesyncd[643]: Synchronized to time server 91.189.94.4:123 (ntp.ubuntu.com). Jun 30 18:31:32 ip-10-125-91-103 sudo[1488]: root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c Jun 30 18:31:32 ip-10-125-91-103 sudo[1488]: pam_unix(sudo:session): session opened for user root by (uid=0) Jun 30 18:31:32 ip-10-125-91-103 runuser[1494]: pam_unix(runuser:session): session opened for user confluence by (uid=0) Jun 30 18:31:33 ip-10-125-91-103 runuser[1494]: pam_unix(runuser:session): session closed for user confluence Jun 30 18:31:33 ip-10-125-91-103 sudo[1488]: pam_unix(sudo:session): session closed for user root Jun 30 18:35:01 ip-10-125-91-103 CRON[2184]: pam_unix(cron:session): session opened for user root by (uid=0) Jun 30 18:35:01 ip-10-125-91-103 CRON[2191]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 30 18:30:31 ip-10-125-91-103 kernel: [ 12.272625] Key type id_resolver registered Jun 30 18:30:31 ip-10-125-91-103 kernel: [ 12.272626] Key type id_legacy registered Jun 30 18:30:32 ip-10-125-91-103 kernel: [ 12.548340] new mount options do not match the existing superblock, will be ignored Jun 30 18:44:43 ip-10-125-91-103 kernel: [ 0.000000] Linux version 5.3.0-1028-aws (buildd@lcy01-amd64-015) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #30~18.04.1-Ubuntu SMP Mon Jun 22 15:48:21 UTC 2020 (Ubuntu 5.3.0-1028.30~18.04.1-aws 5.3.18) Jun 30 18:44:43 ip-10-125-91-103 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1028-aws root=UUID=8abb4f2d-c7e0-4daf-80b5-24e1c814cf55 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 Jun 30 18:44:43 ip-10-125-91-103 kernel: [ 0.000000] KERNEL supported cpus:
Notes
- Issue has been consistently reproducible in AWS environments running either Ubuntu or Amazon Linux 2, have also tested on bare-metal servers as well.
- Issue is most recreate-able after a host restart, although restarting the app from the command line can produce this as well.
- Issue was originally thought to be isolated to Java 11, but later testing revealed issue can be recreated using Java 8 as well. The customer in this case (https://getsupport.atlassian.com/browse/PSSRV-3617) switched to Java 8 and claimed he could no longer reproduce the issue.
- It has been observed that if Confluence will stay running past the ~3 to ~10 minute mark then it will likely not stop unexpectedly past that point (as expected)
- Have tried JVM heap sizes of 1G and 5G
- No 3rd-party plugins in Confluence. These are vanilla instances installed using the 7.4.0 bin installer, with the Ubuntu example tested on Java 11 at first, then manually switched over to use Oracle's Java 8u251 rather than Adopt OpenJDK 11.0.5
- Confluence 6 has been tested and does this as well.
- Customer tested with Adopt OpenJDK 11.0.7 and was able to reproduce the problem consistently
- Customer claims a 50% failure rate after host restarts or in his case, initial AWS scaling group deployment from AMI image. Our reproductions of the issue are more like 20% of restarts will produce this issue.
Files (application logs are 4 hours behind server's time)
- ubuntu_Java_8_example.zip - contains the OS systemd journal log + restart automation script output
- AmazonLinux2_Java_11_example.zip - contains the OS systemd journal log + restart automation script output
For the "script output" logs, manually scroll and look for instances where "Starting" occur but has no corresponding "Stopping", as these indicate the times where the application quit on its own unexpectedly.
- For these there will be a 2nd "Starting" message printed and then proceeds with uptime checks. Script allows the application to run for 10 minutes before printing a "Stopping" message and shutting down the application gracefully, then restarts the host after which the application is automatically started by the OS.
- Grep'ing for these 2 strings will show that there are more starts than stops, even counting the final entry in the logs which will have a start/no stop because Confluence was running when log was captured.
grep -c "Starting" Ubuntu_Java8_automation_script.log 52 grep -c "Stopping" Ubuntu_Java8_automation_script.log 40 grep -c "Starting" AmazonLinux2_Java11_automation_script.log 18 grep -c "Stopping" AmazonLinux2_Java11_automation_script.log 14
Scripts used to automate the restarts (see Confluence intermittently and unexpectedly quits shortly after host restart for latest versions of these)
- script.restart_app - script used for restart application-only after 10 minutes experimentation
- script.restart_host - script used for stop application after 10 minutes then restart host experiments
Systemd calls the Confluence sysv init startup script (/etc/init.d/confluence) for OS-controlled automatic service starts. This sysv init script was modified to call the custom scripts above for automatic service start.
# Confluence Linux service controller script cd "/opt/atlassian/confluence/bin"case "$1" in start) /home/ec2-user/script >> /home/ec2-user/log 2>&1 & ;; stop) ./stop-confluence.sh ;; restart) ./stop-confluence.sh ./start-confluence.sh ;; *) echo "Usage: $0 {start|stop|restart}" exit 1 ;; esac
- The systemd log will show the "Confluence service" to be in error when the OS attempts to stop the application during restarts, this is because the automation script has already stopped Confluence previously, so the exit code from the system-executed shutdown script will be "1" due to the app already being off.
- Any Confluence service errors in the systemd journal log which occur when the host is restarting can be disregarded for that reason.
- The automation script can be improved to not shutdown Confluence prior to restart if you wish to collect systemd journal logs that would not contain this disregard-able error.
Issue a "ps aux | grep script" command to get the PID of the automation script which is launched at startup, "kill -9" to end it and prevent reboots from occurring while you are logged in and investigating. Restart the host to resume the script - recommend renaming the "log" file when doing so to always be starting a new log when you intentionally wish to start the whole process over.
Note: The script waits 60 seconds after OS start before starting Confluence to ensure everything has started before starting the application, there will be a delay if you log in to check the app's status directly after a reboot.
Workaround
Delete synchrony.pid file before starting Confluence. This file should be located in JVM temporary directory (java.io.tmpdir). This directory location is specified by $CATALINA_TMPDIR (or in cases when $CATALINA_TMPDIR is not specified, default value $CATALINA_BASE/temp is used).
- is duplicated by
-
CONFSERVER-55501 Confluence startup fails being terminated by Synchrony
- Closed
- relates to
-
CONFSERVER-59893 Synchrony is unable to start after a non-graceful Confluence shutdown
- Closed
-
PSR-437 Loading...
- causes
-
PS-72037 Loading...
-
PSSRV-3617 Loading...
- links to
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...