JRASERVER-67613

Race Condition with Jira Startup Consistency Checker Can Result in File Lock Errors

      Problem

      During startup, Jira runs a consistency check on various items, including whether the index locations already hold an existing lock. Nothing explicitly waits on this check, so it can run after other services or jobs have already started. When that happens, the consistency check fails and Jira startup fails with it.

      • This has been found to occur in a Data Center environment where the NodeReindexServiceThread starts indexing before the consistency checker runs, which results in index write.lock check errors.
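
      The ordering problem described above can be illustrated with a toy simulation. This is only a sketch and not Jira's code: consistency_check, simulate_startup and the indexer thread are hypothetical stand-ins, and the "index directories" are temporary folders. It shows that the same check passes or fails depending purely on whether the background indexer gets to create write.lock first.

```python
import tempfile
import threading
from pathlib import Path


def consistency_check(index_dirs):
    """Return the write.lock files that already exist (toy startup check)."""
    return [d / "write.lock" for d in index_dirs if (d / "write.lock").exists()]


def simulate_startup(indexer_starts_first):
    """Run a toy startup and return the index names the checker finds locked."""
    with tempfile.TemporaryDirectory() as home:
        index_dirs = [Path(home) / name for name in ("issues", "comments")]
        for d in index_dirs:
            d.mkdir()

        indexer_has_lock = threading.Event()
        checker_done = threading.Event()

        def indexer():
            # Hypothetical stand-in for a background indexing thread.
            if not indexer_starts_first:
                checker_done.wait()  # well-ordered case: checker runs first
            for d in index_dirs:
                (d / "write.lock").touch()
            indexer_has_lock.set()

        worker = threading.Thread(target=indexer)
        worker.start()
        if indexer_starts_first:
            indexer_has_lock.wait()  # checker is delayed past the indexer
        found = consistency_check(index_dirs)
        checker_done.set()
        worker.join()
        return [p.parent.name for p in found]


if __name__ == "__main__":
    print("checker first:", simulate_startup(indexer_starts_first=False))
    print("indexer first:", simulate_startup(indexer_starts_first=True))
```

      With the checker first, no locks are found. With the indexer first, both toy indexes are reported as locked, which corresponds to the failure mode in this issue.
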
      Steps to Reproduce

      One situation found to delay the consistency checker is JMX monitoring (see JRASERVER-67614 for details), so we use it as the example here.

      1. Start up a Jira Data Center node with JMX enabled
        • Notice the long delay during startup, in this case about 28s between the two log entries below:
          2018-07-13 17:55:09,844 localhost-startStop-1 INFO [c.a.j.instrumentation.external.DatabaseExternalGauges] Installing DBCP monitoring instruments: DatabaseExternalGauges.JiraDbcpInstruments[instruments=[DBCP_MAX, DBCP_ACTIVE, DBCP_IDLE],objectName=com.atlassian.jira:name=BasicDataSource]
          
          2018-07-13 17:55:38,300 localhost-startStop-1 WARN      [c.a.jira.health.HealthChecks] Your database is using an unsupported collation
          
      2. After the delay, the startup log shows the following error:
            ********************************************************************************************************************************************************************************************************
            Index lock file(s) found. This occurs either because JIRA was not cleanly shutdown
            or because there is another instance of this JIRA installation currently running.
            Please ensure that no other instance of this JIRA installation is running
            and then remove the following lock file(s) and restart JIRA:
            
            UtilConcurrentLock{lock=true, fullLockName=/opt/atlassian/home/caches/indexes/comments/write.lock} UtilConcurrentLock{lock=true, fullLockName=/opt/atlassian/home/caches/indexes/issues/write.lock} UtilConcurrentLock{lock=true, fullLockName=/opt/atlassian/home/caches/indexes/changes/write.lock} UtilConcurrentLock{lock=true, fullLockName=/opt/atlassian/home/caches/indexes/worklogs/write.lock}
            
            Once restarted you will need to reindex your data to ensure that indexes are up to date.
            
            Do NOT delete the lock file(s) if there is another JIRA running with the same index directory
            instead cleanly shutdown the other instance.
            ********************************************************************************************************************************************************************************************************
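
      The two observations in the steps above (a long gap between startup log entries, followed by the lock-file banner) can be pulled out of a log with a short script. This is a minimal sketch that assumes the timestamp layout and the UtilConcurrentLock{...} text shown in the excerpts. The helper names startup_gaps and locked_index_paths and the 10-second threshold are illustrative choices, not part of Jira.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log line, e.g. "2018-07-13 17:55:09,844".
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Matches the path inside "fullLockName=...}" in the lock-file banner.
LOCK_NAME = re.compile(r"fullLockName=([^}\s]+)")


def startup_gaps(lines, threshold_seconds=10.0):
    """Yield (seconds, earlier_line, later_line) for gaps above the threshold."""
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip blank lines, banners and stack traces
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap > threshold_seconds:
                yield gap, previous[1].strip(), line.strip()
        previous = (stamp, line)


def locked_index_paths(log_text):
    """Return every lock file path named in the lock-file banner."""
    return LOCK_NAME.findall(log_text)
```

      For example, calling list(startup_gaps(open(path))) on a startup log (path being wherever the log lives on the node) lists each pause longer than the threshold together with the log lines on either side of it.
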
        
      Diagnosing

      If you receive the index write.lock errors, diagnose as follows:

      • Enable DEBUG logging for the following classes via log4j.properties, then restart Jira:
        com.atlassian.jira.index.LuceneCorruptionChecker
        com.atlassian.jira.upgrade.ConsistencyCheckerImpl
        com.atlassian.jira.startup.JiraStartupLogger
        com.atlassian.jira.util.LuceneDirectoryUtilsImpl.UtilConcurrentLock
        
        • This shows whether index operations are occurring before the consistency checker runs
      • Capture thread dumps at short intervals (about 2 seconds apart) to examine what is happening during the startup delay.
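
      Capturing thread dumps at a fixed interval is easy to script. The sketch below assumes a dump command that writes to standard output, such as the JDK's jstack followed by the Jira process ID. The function name capture_thread_dumps, the output file naming and the default of 10 dumps are illustrative choices.

```python
import subprocess
import time
from pathlib import Path


def capture_thread_dumps(command, out_dir, count=10, interval_seconds=2.0):
    """Run `command` `count` times, saving each run's stdout to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        started = time.monotonic()
        # check=False: keep going even if one dump attempt fails.
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        path = out_dir / f"threaddump-{i:03d}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if i < count - 1:
            # Sleep only for what is left of the interval after the dump itself.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_seconds - elapsed))
    return written
```

      A call such as capture_thread_dumps(["jstack", "12345"], "dumps") would then collect ten dumps two seconds apart, where 12345 is a placeholder for the real Jira PID and jstack is assumed to be on the PATH. Start it as soon as the node begins starting up so the dumps cover the delay window.
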
      Workaround

      In the case where JMX monitoring is causing startup delays in Jira Data Center, disable JMX monitoring before starting up the new node. Once startup is complete, JMX monitoring can be re-enabled.

              Unassigned Unassigned
              dchan David Chan