-
Bug
-
Resolution: Fixed
-
Low
-
7.2.8, 7.8.0, 7.6.9
-
7.02
-
53
-
Severity 2 - Major
-
149
-
-
Summary
JIRA DC might lose cluster lock while trying to unlock the lock if there is database connectivity problems .
That makes whole cluster stuck for some operations/actions that needs that lock.
Specifically we saw problem with:
- com.atlassian.jira.config.DefaultReindexMessageManager.messageLock lock
- com.atlassian.upm.impl.PluginManagerPluginAsynchronousTaskStatusStoreImpl lock
- NotificationBatchScheduler.sd_batch_sending_lock lock
- sla_issue_update_xx lock
- the Service Desk Mail Puller job com.atlassian.jira.internal.mail.services.MailPullerJobRunner
So there is a possible chain of events:
- Node1 obtains the lock in thread1
- Node1 registers it in DB and internal JVM structure.
- All other nodes see lock and wait
- Thread1 unlocks the lock. While updating db it fails and release only local ref lock
- All other nodes see lock and wait
- Lock check at Node1 see that local ref is NULL and DB lock is present, so does nothing
- All stuck
Environment
- JIRA Datacenter
Steps to Reproduce
- Set up a Datacenter JIRA instance with 2 nodes
- Initiate action that would obtain the lock
- Break connection to the database while this node has a clusterlock and tries to unlock it
Expected Results
JIRA will recover and do the unlock query once connected back to the database
Actual Results
JIRA DC lost the cluster lock. The below exception is thrown in the atlassian-jira.log file:
2018-01-19 09:09:31,377 http-nio-8080-exec-1 ERROR bob 300x234323428x11 18tz5ra 10.0.0.1 /secure/admin/user/UserBrowser.jspa [o.a.c.c.C.[Catalina].[localhost].[/jira]] Unhandled exception occurred whilst decorating page java.lang.RuntimeException: org.apache.jasper.JasperException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.opensymphony.sitemesh.webapp.decorator.BaseWebAppDecorator.render(BaseWebAppDecorator.java:37) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:92) at com.atlassian.jira.web.filters.SitemeshPageFilter.doFilter(SitemeshPageFilter.java:112) ... 10 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 3 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 3 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 18 filtered at com.atlassian.jira.security.JiraSecurityFilter.lambda$doFilter$0(JiraSecurityFilter.java:76) ... 1 filtered at com.atlassian.jira.security.JiraSecurityFilter.doFilter(JiraSecurityFilter.java:74) ... 40 filtered at com.atlassian.jira.servermetrics.CorrelationIdPopulatorFilter.doFilter(CorrelationIdPopulatorFilter.java:30) ... 5 filtered at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.lambda$invokeFilterChain$0(CustomerContextSettingFilter.java:181) at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.rteInvoke(ReentrantThreadLocalBasedCodeContext.java:142) at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.runOutOfContext(ReentrantThreadLocalBasedCodeContext.java:94) at com.atlassian.servicedesk.internal.utils.context.CustomerContextServiceImpl.runOutOfCustomerContext(CustomerContextServiceImpl.java:64) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.outOfCustomerContext(CustomerContextSettingFilter.java:174) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilterImpl(CustomerContextSettingFilter.java:130) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilter(CustomerContextSettingFilter.java:121) ... 9 filtered at com.atlassian.jwt.internal.servlet.JwtAuthFilter.doFilter(JwtAuthFilter.java:32) ... 3 filtered at com.atlassian.web.servlet.plugin.request.RedirectInterceptingFilter.doFilter(RedirectInterceptingFilter.java:21) ... 4 filtered at com.atlassian.web.servlet.plugin.LocationCleanerFilter.doFilter(LocationCleanerFilter.java:36) ... 29 filtered at com.atlassian.jira.servermetrics.MetricsCollectorFilter.doFilter(MetricsCollectorFilter.java:25) ... 27 filtered at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745) ... Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure: The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570) at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122) at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82) at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372) at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190) at org.apache.jsp.decorators.admin_jsp._jspService(admin_jsp.java:485) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:438) ... 208 more Caused by: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:683) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:565) ... 219 more The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago. ... 2 filtered at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3559) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3459) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3900) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2490) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079) at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013) at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104) at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98) at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:673) ... 220 more Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3469) ... 233 more
Key part:
Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure: The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570) at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122) at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82) at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372) at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190) ...
Workaround
Restart the problematic node. See the following KB article for full instructions on this workaround:
https://confluence.atlassian.com/jirakb/jira-data-center-functionalities-loss-due-to-cluster-wide-lock-942860754.html
- causes
-
JRASERVER-69899 Updating an issue causes TransactionRuntimeException or "current transaction is aborted" errors
- Closed
-
DELTA-911 Loading...
- is duplicated by
-
JRASERVER-66658 JIRA does not retry cluster unlock actions after database connectivity problems
- Closed
- is related to
-
JRASERVER-68477 Workflow scheme actions leave left-over rows in 'clusterlockstatus' table.
- Gathering Impact
-
JRASERVER-69113 JiraDashboardStateStoreManager leave left-over rows in 'clusterlockstatus' table.
- Gathering Impact
-
JRASERVER-65890 Create logging event for clusterlockstatus during startup
- Gathering Interest
-
JRASERVER-66596 JIRA Datacenter - Add Cluster lock status page which doesn't use locks
- Gathering Interest
- relates to
-
JRASERVER-74298 Jira node fails to start due to cluster lock at the active objects
- Closed
-
JSDSERVER-6621 Unable to restore from XML backup
- Closed
-
JRASERVER-70147 XML Restore stucks at 90%
- Closed
-
CLIP-537 Loading...
-
DELTA-767 Loading...
- Mentioned in
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...