Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-60014

Confluence 7.4.0 intermittently and unexpectedly quits shortly after host restart

    XMLWordPrintable

Details

    Description

      Issue Summary

      Confluence 7.4.0 intermittently and unexpectedly quits shortly after host restart

      Steps to Reproduce

      1. With Confluence stopped, restart the host Confluence is running on, allow Confluence to automatically start and run for 10 minutes
      2. Repeat step 1 repeatedly (employ a script to automate if desired)

      Expected Results

      Confluence will remain running when started

      Actual Results

      Confluence will intermittently quit after ~3 minutes

       

      Java 8 on Ubuntu Data

      Starting Confluence
      Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
      ---------------------------------------------------------------------------
      Using Java: /opt/atlassian/jdk1.8.0_251/bin/java
      2020-07-01 08:38:13,185 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
      ---------------------------------------------------------------------------
      Tomcat started.
      • Wait 130 seconds to allow Confluence to fully start
      • Confluence unexpectedly quits after ~2.5 minutes
      Confluence is running
      uptime is 2x5 seconds :: Wed Jul  1 08:40:23 UTC 2020 :::::::::: 1562 = Confluence PID number
      uptime is 3x5 seconds :: Wed Jul  1 08:40:28 UTC 2020 ::::::::::  = Confluence PID number
      Confluence is NOT running
      
      • Starting Confluence after unexpected stop
      Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
      ---------------------------------------------------------------------------
      Using Java: /opt/atlassian/jdk1.8.0_251/bin/java
      2020-07-01 08:40:33,672 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
      ---------------------------------------------------------------------------
      Existing PID file found during start.
      Removing/clearing stale PID file.
      
      • Wait 130 seconds to all Confluence to fully start again
      • the next several restarts all run for 10+ minutes
      Confluence is running
      uptime is 5x5 seconds :: Wed Jul  1 08:42:43 UTC 2020 :::::::::: 1974 = Confluence PID number
      Confluence is running
      uptime is 6x5 seconds :: Wed Jul  1 08:42:48 UTC 2020 :::::::::: 1974 = Confluence PID number
      Confluence is running
      
      atlassian-confluence.log - logs are 4 hours behind server's time
      2020-07-01 04:38:20,781 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      ....truncated.....
      2020-07-01 04:39:57,617 INFO [lifecycle:thread-20] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] startup Starting Synchrony and enabling Collaborative Editing
      2020-07-01 04:40:00,993 WARN [SiteSummaryCalculator:thread-1] [hibernate.boot.internal.SessionFactoryOptionsBuilder] doLog org.hibernate.tool.schema.Action cannot be cast to java.lang.String  Ignoring
      
        *unexpected quit*
      
      2020-07-01 04:40:38,577 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      
      catalina.out - logs are 4 hours behind server's time
      01-Jul-2020 04:38:14.111 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0]
      01-Jul-2020 04:38:14.217 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property.
      01-Jul-2020 04:38:14.248 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0]
      01-Jul-2020 04:38:14.262 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0]
      01-Jul-2020 04:38:14.330 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      01-Jul-2020 04:38:14.370 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      01-Jul-2020 04:38:14.803 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"]
      01-Jul-2020 04:38:14.947 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [1,020] milliseconds
      01-Jul-2020 04:38:14.960 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone]
      01-Jul-2020 04:38:14.961 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33]
      01-Jul-2020 04:38:17.978 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      01-Jul-2020 04:38:18.052 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath
      01-Jul-2020 04:38:18.331 WARNING [Catalina-utility-2] org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [170] milliseconds.
      01-Jul-2020 04:38:18.582 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher'
      2020-07-01 04:38:20,781 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      2020-07-01 04:38:25,553 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
      2020-07-01 04:38:25,916 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.91.103, network interfaces [fe80:0:0:0:10ff:5ff:fed8:8c37%ens3, 10.125.91.103] and local port 5801
      2020-07-01 04:38:25,916 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
      2020-07-01 04:38:27,489 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452e2be3]
      
        *unexpected quit*
      
      01-Jul-2020 04:40:34.220 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0]
      01-Jul-2020 04:40:34.298 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property.
      01-Jul-2020 04:40:34.317 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0]
      01-Jul-2020 04:40:34.325 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0]
      01-Jul-2020 04:40:34.368 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      01-Jul-2020 04:40:34.392 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      01-Jul-2020 04:40:34.706 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"]
      01-Jul-2020 04:40:34.739 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [632] milliseconds
      01-Jul-2020 04:40:34.748 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone]
      01-Jul-2020 04:40:34.748 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33]
      01-Jul-2020 04:40:37.175 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      01-Jul-2020 04:40:37.238 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath
      01-Jul-2020 04:40:37.574 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher'
      2020-07-01 04:40:38,577 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      • OS logs do not show any evidence the process was terminated
      systemd journal
      Jul 01 08:37:40 ip-10-125-91-103 systemd-timesyncd[666]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).
      Jul 01 08:38:12 ip-10-125-91-103 sudo[1516]:     root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c
      Jul 01 08:38:12 ip-10-125-91-103 sudo[1516]: pam_unix(sudo:session): session opened for user root by (uid=0)
      Jul 01 08:38:12 ip-10-125-91-103 runuser[1522]: pam_unix(runuser:session): session opened for user confluence by (uid=0)
      Jul 01 08:38:13 ip-10-125-91-103 runuser[1522]: pam_unix(runuser:session): session closed for user confluence
      Jul 01 08:38:13 ip-10-125-91-103 sudo[1516]: pam_unix(sudo:session): session closed for user root
      Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]:     root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c
      Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]: pam_unix(sudo:session): session opened for user root by (uid=0)
      Jul 01 08:40:33 ip-10-125-91-103 runuser[1933]: pam_unix(runuser:session): session opened for user confluence by (uid=0)
      Jul 01 08:40:33 ip-10-125-91-103 runuser[1933]: pam_unix(runuser:session): session closed for user confluence
      Jul 01 08:40:33 ip-10-125-91-103 sudo[1928]: pam_unix(sudo:session): session closed for user root
      Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]: 2020-07-01 08:42:14 INFO Backing off health check to every 600 seconds for 1800 seconds.
      Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]: 2020-07-01 08:42:14 ERROR Health ping failed with error - AccessDeniedException: User: arn:aws:sts::
      Jul 01 08:42:14 ip-10-125-91-103 amazon-ssm-agent.amazon-ssm-agent[999]:         status code: 400, request id: 76cc4ad1-04dd-4fa6-b415-27716fbf6cd0
      Jul 01 08:45:01 ip-10-125-91-103 CRON[2767]: pam_unix(cron:session): session opened for user root by (uid=0)
      
      kern.log
      Jul  1 08:37:11 ip-10-125-91-103 kernel: [   12.463296] Key type id_legacy registered
      Jul  1 08:37:11 ip-10-125-91-103 kernel: [   12.526266] new mount options do not match the existing superblock, will be ignored
      Jul  1 08:52:42 ip-10-125-91-103 kernel: [  943.535197] nfsd: last server has exited, flushing export cache
      Jul  1 08:53:25 ip-10-125-91-103 kernel: [    0.000000] Linux version 5.3.0-1028-aws (buildd@lcy01-amd64-015) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #30~18.04.1-Ubuntu SMP Mon Jun 22 15:48:21 UTC 2020 (Ubuntu 5.3.0-1028.30~18.04.1-aws 5.3.18)
      Jul  1 08:53:25 ip-10-125-91-103 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1028-aws root=UUID=8abb4f2d-c7e0-4daf-80b5-24e1c814cf55 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
      

       

      Java 11 on Amazon Linux 2

      Starting Confluence
       Starting Confluence 
      
      Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
      ---------------------------------------------------------------------------
      Using Java: /opt/atlassian/confluence/jre//bin/java
      2020-06-30 18:30:48,659 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
      ---------------------------------------------------------------------------
      Tomcat started. 
      
      • Wait 130 seconds to all Confluence to completely start
      • Confluence unexpectedly quits after ~2.5 minutes
      Confluence is NOT running
      Confluence is NOT running
      
       Starting Confluence 
      
      Server startup logs are located in /opt/atlassian/confluence/logs/catalina.out
      ---------------------------------------------------------------------------
      Using Java: /opt/atlassian/confluence/jre//bin/java
      2020-06-30 18:32:59,525 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /opt/atlassian/confluence/conf/server.xml. No further action is required
      ---------------------------------------------------------------------------
      Existing PID file found during start.
      Removing/clearing stale PID file.
      Tomcat started.
      
      • Wait 130 seconds to all Confluence to completely start again
      • Confluence will continue to run for 10+ minutes uninterrupted 
      Confluence is running
      Confluence is running
      uptime is 3x5 seconds :: Tue Jun 30 18:35:09 UTC 2020 :::::::::: 3877 = Confluence PID number
      Confluence is running
      uptime is 4x5 seconds :: Tue Jun 30 18:35:14 UTC 2020 :::::::::: 3877 = Confluence PID number
      Confluence is running
      uptime is 5x5 seconds :: Tue Jun 30 18:35:19 UTC 2020 :::::::::: 3877 = Confluence PID number
      
      atlassian-confluence.log - logs are 4 hours behind the server's time
      2020-06-30 14:30:55,729 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      2020-06-30 14:31:00,145 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
      2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.88.150, network interfaces [fe80:0:0:0:85e:3bff:fe69:ab2d%eth0, 10.125.88.150] and local port 5801
      2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
      2020-06-30 14:31:00,508 WARN [Catalina-utility-1] [com.hazelcast.instance.HazelcastInstanceFactory] log Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
       --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
      2020-06-30 14:31:02,729 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452dd494]
      2020-06-30 14:31:02,819 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext: initialization started
      2020-06-30 14:31:07,903 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] <init> Loading EhCache cache manager
      2020-06-30 14:31:18,196 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster ClusterService{address=[10.125.88.150]:5801}
      2020-06-30 14:31:18,197 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster contains Member [10.125.88.150]:5801 - 60471142-00a2-4646-9dfc-a3a9c196745d this
      2020-06-30 14:31:23,223 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext initialized in 20404 ms
      2020-06-30 14:31:24,172 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] lambda$earlyStartup$3 Plugin system earlyStartup begun
      2020-06-30 14:31:28,393 WARN [FelixStartLevel] [extender.internal.support.ExtenderConfiguration] addDefaultDependencyFactories Gemini Blueprint extensions bundle not present, annotation processing disabled.
      2020-06-30 14:32:16,656 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] logTime Plugin system earlyStartup ended
      2020-06-30 14:32:16,663 INFO [hz.confluence.event-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] memberAttributeChanged memberAttributeChanged: member Member [10.125.88.150]:5801 - 60471142-00a2-4646-9dfc-a3a9c196745d this, key confluence.node.tenant.arrived, value true
      2020-06-30 14:32:16,718 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] lambda$lateStartup$4 Plugin system lateStartup begun
      2020-06-30 14:32:16,739 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [EVENTS] encountered an error. Turn debug logging on for stacktrace.
      2020-06-30 14:32:16,741 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [SECRETS] encountered an error. Turn debug logging on for stacktrace.
      2020-06-30 14:32:16,743 WARN [AtlassianEvent::CustomizableThreadFactory-1] [persistence.schema.hibernate.HibernateSchemaComparator] lambda$compareTables$1 Schema check for table [SNAPSHOTS] encountered an error. Turn debug logging on for stacktrace.
      2020-06-30 14:32:21,126 INFO [ThreadPoolAsyncTaskExecutor::Thread 20] [plugins.synchrony.bootstrap.DefaultSynchronyProxyMonitor] <init> synchrony-proxy healthcheck url: http://127.0.0.1:8090/synchrony-proxy/healthcheck
      2020-06-30 14:32:24,254 INFO [ThreadPoolAsyncTaskExecutor::Thread 19] [atlassian.ratelimiting.internal.RateLimitingConfiguration] userRequestHandler Rate Limited Request Handler ENABLED
      2020-06-30 14:32:24,846 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [service.stepexecutor.attachment.AttachmentMigrationExecutor] info Initialized attachment migration executor with concurrency = 5 and batch size = 20.
      2020-06-30 14:32:24,862 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [migration.agent.queue.QueueBroker] info Started queue broker. Registered consumers: [com.atlassian.migration.agent.service.stepexecutor.user.UsersMigrationExecutor@688c9983, com.atlassian.migration.agent.service.stepexecutor.space.SpaceExportExecutor@3af44477, com.atlassian.migration.agent.service.stepexecutor.space.SpaceUploadExecutor@513c2d3d, com.atlassian.migration.agent.service.stepexecutor.attachment.AttachmentMigrationExecutor@6768780f, com.atlassian.migration.agent.service.stepexecutor.space.SpaceImportExecutor@6b24b5e0]
      2020-06-30 14:32:24,915 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.analytics.AnalyticsEventConsumer] info Successfully registered AnalyticsEventConsumer job migration-plugin:analytics-events-consumer-runner-key.
      2020-06-30 14:32:24,937 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.analytics.AnalyticsEventConsumer] info Successfully started AnalyticsEventConsumer poller.
      2020-06-30 14:32:25,073 INFO [ThreadPoolAsyncTaskExecutor::Thread 16] [agent.service.impl.DetectedUserEmailAnalyticsScheduledRunner] info Scheduled detected email analytics event collection job to be run each 120 hours.
      2020-06-30 14:32:27,843 INFO [Catalina-utility-1] [atlassian.plugin.manager.DefaultPluginManager] logTime Plugin system lateStartup ended
      2020-06-30 14:32:28,098 WARN [lifecycle:thread-14] [expose.jmx.schedule.JmxInstrumentSchedulerImpl] onStart atlassian-instrumentation-jmx expose scheduler started.
      2020-06-30 14:32:28,128 INFO [lifecycle:thread-18] [plugins.synchrony.bootstrap.DefaultSynchronyProcessManager] startup Starting Synchrony and enabling Collaborative Editing
      2020-06-30 14:32:28,430 WARN [hz.ShutdownThread] [com.hazelcast.instance.Node] log [10.125.88.150]:5801 [dc_cluser] [3.11.6] Terminating forcefully...
      
        *unexpected quit*
      
      2020-06-30 14:33:05,293 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      
      catalina.out - application logs are 4 hours behind the server's time
      2020-06-30 14:30:55,729 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      2020-06-30 14:31:00,145 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
      2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: 10.125.88.150, network interfaces [fe80:0:0:0:85e:3bff:fe69:ab2d%eth0, 10.125.88.150] and local port 5801
      2020-06-30 14:31:00,502 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
      2020-06-30 14:31:00,508 WARN [Catalina-utility-1] [com.hazelcast.instance.HazelcastInstanceFactory] log Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
       --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
      WARNING: An illegal reflective access operation has occurred
      WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (file:/opt/atlassian/confluence/confluence/WEB-INF/lib/hazelcast-3.11.6.jar) to field sun.nio.ch.SelectorImpl.selectedKeys
      WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer
      WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
      WARNING: All illegal access operations will be denied in a future release
      2020-06-30 14:31:02,729 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [452dd494]
      NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
      [0.001s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/opt/atlassian/confluence/logs/gc-2020-06-30_18-32-59.log instead.
      
        *unexpected quit*
      
      30-Jun-2020 14:33:00.208 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server] failed to set property [debug] to [0]
      30-Jun-2020 14:33:00.343 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'debug' to '0' did not find a matching property.
      30-Jun-2020 14:33:00.380 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine] failed to set property [debug] to [0]
      30-Jun-2020 14:33:00.386 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host] failed to set property [debug] to [0]
      30-Jun-2020 14:33:00.437 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      30-Jun-2020 14:33:00.460 WARNING [main] org.apache.tomcat.util.digester.SetPropertiesRule.begin Match [Server/Service/Engine/Host/Context] failed to set property [debug] to [0]
      30-Jun-2020 14:33:00.965 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8090"]
      30-Jun-2020 14:33:01.021 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [1,016] milliseconds
      30-Jun-2020 14:33:01.036 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Tomcat-Standalone]
      30-Jun-2020 14:33:01.036 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.33]
      30-Jun-2020 14:33:03.584 INFO [Catalina-utility-2] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      30-Jun-2020 14:33:03.704 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 1 Spring WebApplicationInitializers detected on classpath
      30-Jun-2020 14:33:04.332 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log Initializing Spring DispatcherServlet 'dispatcher'
      2020-06-30 14:33:05,293 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence 7.4.0 [build 8401 based on commit hash 0f22f74a0c673511e48ab1bc7ae6af4388aa8dbf] - synchrony version 3.1.0-master-022ca438
      
      • OS logs do not show any evidence the process was terminated
      systemd journal
      Jun 30 18:30:35 ip-10-125-91-103 systemd[1]: Reached target Cloud-init target.
      Jun 30 18:30:35 ip-10-125-91-103 systemd[1]: Startup finished in 5.098s (kernel) + 10.366s (userspace) = 15.465s.
      Jun 30 18:31:00 ip-10-125-91-103 systemd-timesyncd[643]: Synchronized to time server 91.189.94.4:123 (ntp.ubuntu.com).
      Jun 30 18:31:32 ip-10-125-91-103 sudo[1488]:     root : TTY=unknown ; PWD=/opt/atlassian/confluence/bin ; USER=root ; COMMAND=/bin/bash /opt/atlassian/confluence/bin/start-c
      Jun 30 18:31:32 ip-10-125-91-103 sudo[1488]: pam_unix(sudo:session): session opened for user root by (uid=0)
      Jun 30 18:31:32 ip-10-125-91-103 runuser[1494]: pam_unix(runuser:session): session opened for user confluence by (uid=0)
      Jun 30 18:31:33 ip-10-125-91-103 runuser[1494]: pam_unix(runuser:session): session closed for user confluence
      Jun 30 18:31:33 ip-10-125-91-103 sudo[1488]: pam_unix(sudo:session): session closed for user root
      Jun 30 18:35:01 ip-10-125-91-103 CRON[2184]: pam_unix(cron:session): session opened for user root by (uid=0)
      Jun 30 18:35:01 ip-10-125-91-103 CRON[2191]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
      
      kern.log
      Jun 30 18:30:31 ip-10-125-91-103 kernel: [   12.272625] Key type id_resolver registered
      Jun 30 18:30:31 ip-10-125-91-103 kernel: [   12.272626] Key type id_legacy registered
      Jun 30 18:30:32 ip-10-125-91-103 kernel: [   12.548340] new mount options do not match the existing superblock, will be ignored
      Jun 30 18:44:43 ip-10-125-91-103 kernel: [    0.000000] Linux version 5.3.0-1028-aws (buildd@lcy01-amd64-015) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #30~18.04.1-Ubuntu SMP Mon Jun 22 15:48:21 UTC 2020 (Ubuntu 5.3.0-1028.30~18.04.1-aws 5.3.18)
      Jun 30 18:44:43 ip-10-125-91-103 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1028-aws root=UUID=8abb4f2d-c7e0-4daf-80b5-24e1c814cf55 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
      Jun 30 18:44:43 ip-10-125-91-103 kernel: [    0.000000] KERNEL supported cpus:
      

       

      Notes

      • Issue has been consistently reproducible in AWS environments running either Ubuntu or Amazon Linux 2, have also tested on bare-metal servers as well.
      •  Issue is most recreate-able after a host restart, although restarting the app from the command line can produce this as well. 
      •  Issue was originally thought to be isolated to Java 11, but later testing revealed issue can be recreated using Java 8 as well. The customer in this case (https://getsupport.atlassian.com/browse/PSSRV-3617) switched to Java 8 and claimed he could no longer reproduce the issue. 
      • It has been observed that if Confluence will stay running past the ~3 to ~10 minute mark then it will likely not stop unexpectedly past that point (as expected)
      • Have tried JVM heap sizes of 1G and 5G
      • No 3rd-party plugins in Confluence. These are vanilla instances installed using the 7.4.0 bin installer, with the Ubuntu example tested on Java 11 at first, then manually switched over to use Oracle's Java 8u251 rather than Adopt OpenJDK 11.0.5
        • Confluence 6 has been tested and does this as well.
      • Customer tested with Adopt OpenJDK 11.0.7 and was able to reproduce the problem consistently 
      • Customer claims a 50% failure rate after host restarts or in his case, initial AWS scaling group deployment from AMI image. Our reproductions of the issue are more like 20% of restarts will produce this issue. 

       

      Files (application logs are 4 hours behind server's time)

      For the "script output" logs, manually scroll and look for instances where "Starting" occur but has no corresponding "Stopping", as these indicate the times where the application quit on its own unexpectedly.

      • For these there will be a 2nd "Starting" message printed and then proceeds with uptime checks. Script allows the application to run for 10 minutes before printing a "Stopping" message and shutting down the application gracefully, then restarts the host after which the application is automatically started by the OS.
      • Grep'ing for these 2 strings will show that there are more starts than stops, even counting the final entry in the logs which will have a start/no stop because Confluence was running when log was captured. 
      grep -c "Starting" Ubuntu_Java8_automation_script.log
      52
      grep -c "Stopping" Ubuntu_Java8_automation_script.log
      40
       
      grep -c "Starting" AmazonLinux2_Java11_automation_script.log
      18
      grep -c "Stopping" AmazonLinux2_Java11_automation_script.log
      14 
      

      Scripts used to automate the restarts (see Confluence intermittently and unexpectedly quits shortly after host restart for latest versions of these)

      • script.restart_app - script used for restart application-only after 10 minutes experimentation
      • script.restart_host - script used for stop application after 10 minutes then restart host experiments

       Systemd calls the Confluence sysv init startup script (/etc/init.d/confluence) for OS-controlled automatic service starts. This sysv init script was modified to call the custom scripts above for automatic service start.

      modified sysv init startup script
      # Confluence Linux service controller script
      cd "/opt/atlassian/confluence/bin"case "$1" in
          start)
              /home/ec2-user/script >> /home/ec2-user/log 2>&1 &
              ;;
          stop)
              ./stop-confluence.sh
              ;;
          restart)
              ./stop-confluence.sh
              ./start-confluence.sh
              ;;
          *)
              echo "Usage: $0 {start|stop|restart}"
              exit 1
              ;;
      esac 
      • The systemd log will show the "Confluence service" to be in error when the OS attempts to stop the application during restarts, this is because the automation script has already stopped Confluence previously, so the exit code from the system-executed shutdown script will be "1" due to the app already being off.
      • Any Confluence service errors in the systemd journal log which occur when the host is restarting can be disregarded for that reason.
        • The automation script can be improved to not shutdown Confluence prior to restart if you wish to collect systemd journal logs that would not contain this disregard-able error. 

      Issue a "ps aux | grep script" command to get the PID of the automation script which is launched at startup, "kill -9" to end it and prevent reboots from occurring while you are logged in and investigating. Restart the host to resume the script - recommend renaming the "log" file when doing so to always be starting a new log when you intentionally wish to start the whole process over.  

      Note: The script waits 60 seconds after OS start before starting Confluence to ensure everything has started before starting the application, there will be a delay if you log in to check the app's status directly after a reboot. 

      Workaround

      Delete synchrony.pid file before starting Confluence. This file should be located in JVM temporary directory (java.io.tmpdir). This directory location is specified by $CATALINA_TMPDIR (or in cases when $CATALINA_TMPDIR is not specified, default value $CATALINA_BASE/temp is used).
       
       

      Attachments

        1. AmazonLinux2_Java_11_example.zip
          7.97 MB
        2. Java11_with_Synchrony_doc_conv_DISABLED.tar.gz
          16.85 MB
        3. Java8_with_Synchrony_doc_conv_DISABLED.tar.gz
          18.80 MB
        4. script.restart_app
          3 kB
        5. script.restart_host
          3 kB
        6. ubuntu_Java_8_example.zip
          9.64 MB

        Issue Links

          Activity

            People

              03cb0c04aa4f Irina Tiapchenko
              asmith4@atlassian.com Andrew S
              Votes:
              0 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: