Investigate application hangs after May 2011 release

XMLWordPrintable

    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Component/s: None
    • None

      After the May release we noticed some issues where applications would start up fine, but still be inaccessible via the web. This could have been caused by a number of things (e.g. AOD-4748) but there have been a number of cases where restarting Postgres along with the apps has been what 'fixed' it.

      Symptoms

      • Unable to access the main applications (or they come up only after a long time) but can access Crowd OK.
      • There are lots of httpd processes hanging around (e.g. 100+)
      • One of either JIRA or Confluence will have hit Tomcat's max thread limit, e.g.
        May 10, 2011 11:36:19 PM org.apache.tomcat.util.net.AprEndpoint createWorkerThread
        INFO: Maximum number of threads (200) created for connector with address j2ee.instance.jirastudio.com-confluence/127.0.0.101 and port 8009
        

        It's easy to confirm this by taking a thread dump. A typical thread dump might be 50-100K whilst one that hit its thread ceiling will be 800-900K.

      Upon trying to shut the affected application down, log entries such as the following may be generated:

      2011-05-11 04:37:03,548 ERROR [ContainerBackgroundProcessor[StandardEngine[Catalina]]] [sf.hibernate.util.JDBCExceptionReporter] logExceptions Cannot get a connection, general error
      2011-05-11 04:37:03,821 ERROR [ContainerBackgroundProcessor[StandardEngine[Catalina]]] [ContainerBase.[Catalina].[localhost].[/wiki]] log Session event listener threw exception
      org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.GenericJDBCException: Cannot open connection
      Caused by: net.sf.hibernate.exception.GenericJDBCException: Cannot open connection
            at net.sf.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:80)
            at net.sf.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:69)
            at net.sf.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:29)
            at net.sf.hibernate.impl.BatcherImpl.convert(BatcherImpl.java:328)
            at net.sf.hibernate.impl.BatcherImpl.openConnection(BatcherImpl.java:295)
            at net.sf.hibernate.impl.SessionImpl.connect(SessionImpl.java:3371)
            at net.sf.hibernate.impl.SessionImpl.connection(SessionImpl.java:3331)
            at org.springframework.orm.hibernate.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:422)
            at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:350)
            at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:262)
            at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:101)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
            at $Proxy11.publishEvent(Unknown Source)
            at com.atlassian.confluence.user.listeners.UserSessionExpiryListener.sessionDestroyed(UserSessionExpiryListener.java:38)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:702)
            at org.apache.catalina.session.StandardSession.isValid(StandardSession.java:592)
            at org.apache.catalina.session.PersistentManagerBase.processExpires(PersistentManagerBase.java:546)
            at org.apache.catalina.session.ManagerBase.backgroundProcess(ManagerBase.java:665)
            at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1316)
            at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
            at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
            at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
            at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
            at java.lang.Thread.run(Thread.java:619)
      Caused by: org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot get a connection, general error
            at org.apache.tomcat.dbcp.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:108)
            at org.apache.tomcat.dbcp.dbcp.BasicDataSource.getConnection(BasicDataSource.java:880)
            at net.sf.hibernate.connection.DatasourceConnectionProvider.getConnection(DatasourceConnectionProvider.java:59)
            at net.sf.hibernate.impl.BatcherImpl.openConnection(BatcherImpl.java:292)
            ... 20 more
      Caused by: java.lang.InterruptedException
            at java.lang.Object.wait(Native Method)
            at java.lang.Object.wait(Object.java:485)
            at org.apache.tomcat.dbcp.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:942)
            at org.apache.tomcat.dbcp.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:96)
            ... 23 more
      

      However, the application may still not actually shut down without anything but a kill -9.

      This leads me to believe that there could be problems with the database pool configuration in the applications, or in PostgreSQL's configuration. I've attached a thread dump while Confluence was in this deadlocked state.

      Related? https://sdog.jira.com/browse/JSTDEV-177 https://sdog.jira.com/browse/JSTDEV-256 https://sdog.jira.com/browse/JSTDEV-151

        1. conf.thread.dump.gz
          210 kB
        2. crowd.thread.dump
          32 kB
        3. entityengine-1.0.5-dbfix.jar
          254 kB
        4. jira.thread.dump.gz
          58 kB

            Assignee:
            Felipe Cuozzo (Inactive)
            Reporter:
            Michael Knight
            Votes:
            3 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: