-
Bug
-
Resolution: Fixed
-
Low (View bug fix roadmap)
-
7.2.8, 7.8.0, 7.6.9
-
7.02
-
53
-
Severity 2 - Major
-
149
-
-
Summary
JIRA DC might lose cluster lock while trying to unlock the lock if there is database connectivity problems .
That makes whole cluster stuck for some operations/actions that needs that lock.
Specifically we saw problem with:
- com.atlassian.jira.config.DefaultReindexMessageManager.messageLock lock
- com.atlassian.upm.impl.PluginManagerPluginAsynchronousTaskStatusStoreImpl lock
- NotificationBatchScheduler.sd_batch_sending_lock lock
- sla_issue_update_xx lock
- the Service Desk Mail Puller job com.atlassian.jira.internal.mail.services.MailPullerJobRunner
So there is a possible chain of events:
- Node1 obtains the lock in thread1
- Node1 registers it in DB and internal JVM structure.
- All other nodes see lock and wait
- Thread1 unlocks the lock. While updating db it fails and release only local ref lock
- All other nodes see lock and wait
- Lock check at Node1 see that local ref is NULL and DB lock is present, so does nothing
- All stuck
Environment
- JIRA Datacenter
Steps to Reproduce
- Set up a Datacenter JIRA instance with 2 nodes
- Initiate action that would obtain the lock
- Break connection to the database while this node has a clusterlock and tries to unlock it
Expected Results
JIRA will recover and do the unlock query once connected back to the database
Actual Results
JIRA DC lost the cluster lock. The below exception is thrown in the atlassian-jira.log file:
2018-01-19 09:09:31,377 http-nio-8080-exec-1 ERROR bob 300x234323428x11 18tz5ra 10.0.0.1 /secure/admin/user/UserBrowser.jspa [o.a.c.c.C.[Catalina].[localhost].[/jira]] Unhandled exception occurred whilst decorating page java.lang.RuntimeException: org.apache.jasper.JasperException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.opensymphony.sitemesh.webapp.decorator.BaseWebAppDecorator.render(BaseWebAppDecorator.java:37) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:92) at com.atlassian.jira.web.filters.SitemeshPageFilter.doFilter(SitemeshPageFilter.java:112) ... 10 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 3 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 3 filtered at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46) at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34) ... 18 filtered at com.atlassian.jira.security.JiraSecurityFilter.lambda$doFilter$0(JiraSecurityFilter.java:76) ... 1 filtered at com.atlassian.jira.security.JiraSecurityFilter.doFilter(JiraSecurityFilter.java:74) ... 40 filtered at com.atlassian.jira.servermetrics.CorrelationIdPopulatorFilter.doFilter(CorrelationIdPopulatorFilter.java:30) ... 5 filtered at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.lambda$invokeFilterChain$0(CustomerContextSettingFilter.java:181) at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.rteInvoke(ReentrantThreadLocalBasedCodeContext.java:142) at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.runOutOfContext(ReentrantThreadLocalBasedCodeContext.java:94) at com.atlassian.servicedesk.internal.utils.context.CustomerContextServiceImpl.runOutOfCustomerContext(CustomerContextServiceImpl.java:64) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.outOfCustomerContext(CustomerContextSettingFilter.java:174) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilterImpl(CustomerContextSettingFilter.java:130) at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilter(CustomerContextSettingFilter.java:121) ... 9 filtered at com.atlassian.jwt.internal.servlet.JwtAuthFilter.doFilter(JwtAuthFilter.java:32) ... 3 filtered at com.atlassian.web.servlet.plugin.request.RedirectInterceptingFilter.doFilter(RedirectInterceptingFilter.java:21) ... 4 filtered at com.atlassian.web.servlet.plugin.LocationCleanerFilter.doFilter(LocationCleanerFilter.java:36) ... 29 filtered at com.atlassian.jira.servermetrics.MetricsCollectorFilter.doFilter(MetricsCollectorFilter.java:25) ... 27 filtered at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745) ... Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure: The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570) at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122) at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82) at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372) at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190) at org.apache.jsp.decorators.admin_jsp._jspService(admin_jsp.java:485) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:438) ... 208 more Caused by: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:683) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:565) ... 219 more The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago. ... 2 filtered at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3559) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3459) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3900) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2490) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079) at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013) at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104) at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98) at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98) at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:673) ... 220 more Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3469) ... 233 more
Key part:
Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET UPDATE_TIME = ? , LOCKED_BY_NODE = ? WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure: The last packet successfully received from the server was 1,339,594 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570) at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126) at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122) at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82) at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372) at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190) ...
Workaround
Restart the problematic node. See the following KB article for full instructions on this workaround:
https://confluence.atlassian.com/jirakb/jira-data-center-functionalities-loss-due-to-cluster-wide-lock-942860754.html
- causes
-
JRASERVER-69899 Updating an issue causes TransactionRuntimeException or "current transaction is aborted" errors
-
- Closed
-
-
DELTA-911 You do not have permission to view this issue
- is duplicated by
-
JRASERVER-66658 JIRA does not retry cluster unlock actions after database connectivity problems
-
- Closed
-
- is related to
-
JRASERVER-68477 Workflow scheme actions leave left-over rows in 'clusterlockstatus' table.
-
- Gathering Impact
-
-
JRASERVER-69113 JiraDashboardStateStoreManager leave left-over rows in 'clusterlockstatus' table.
-
- Gathering Impact
-
-
JRASERVER-65890 Create logging event for clusterlockstatus during startup
- Gathering Interest
-
JRASERVER-66596 JIRA Datacenter - Add Cluster lock status page which doesn't use locks
- Gathering Interest
- relates to
-
JRASERVER-74298 Jira node fails to start due to cluster lock at the active objects
-
- Closed
-
-
JSDSERVER-6621 Unable to restore from XML backup
-
- Closed
-
-
JRASERVER-70147 XML Restore stucks at 90%
-
- Closed
-
-
CLIP-537 Loading...
-
DELTA-767 Loading...
- Mentioned in
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
Form Name |
---|
Hi,
Thank you for the recent comments.
We want to inform you that we have decided not to backport this fix to the 7.13 Enterprise Release version. While we’ve been working to backport this fix in recent months, after careful analysis, we ultimately feel that the change would bring too much risk to the stability of the release.
We have made this tough call due to the fact that the changes introduced in 8.3 were complex, scattered over many libraries and codebases. Additionally, backporting this particular change over the platform release involved many moving parts. All this made us realise that the risk we might bring with this change is far too severe and we might cause even more problems if we decide to go forward with this backport.
We are aware that you were waiting for this update and our decision may be disappointing to you. We would like to reassure you that we will keep backporting crucial bugs to our long term support versions as we have stated here: https://confluence.atlassian.com/enterprise/atlassian-enterprise-releases-948227420.html, simply in some cases this brings more risks than benefits that we hadn’t originally anticipated.
If you are impacted by this bug, we recommend upgrading to our newest Enterprise release, version 8.5.x. Please follow our upgrade guide for Jira 8.5: https://confluence.atlassian.com/jiracore/jira-8-5-enterprise-release-upgrade-guide-976161392.html.
Best,
Jira Server and Data Center product management