Loading...

XML

Word

Printable

Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 8.3.0
Affects Version/s: 7.2.8, 7.8.0, 7.6.9
Component/s: Data Center - Other
Labels:

Introduced in Version:
7.02
Support reference count:
53
Symptom Severity:
Severity 2 - Major
UIS:
149
Bug Fix Policy:
View Atlassian Server bug fix policy
Current Status:

Hide

Atlassian Update - 17th June 2019

Dear Jira users,

we're glad to announce that this issue will be addressed in our upcoming 8.3.0 release!

Cluster locks have been turned into leases that automatically expire when no longer used, making the situation described in this issue automatically recoverable.

Thank you,
Grazyna Kaszkur 
Product Manager,
Jira Server and Data Center

Show
Atlassian Update - 17th June 2019 Dear Jira users, we're glad to announce that this issue will be addressed in our upcoming 8.3.0 release! Cluster locks have been turned into leases that automatically expire when no longer used, making the situation described in this issue automatically recoverable. Thank you, Grazyna Kaszkur  Product Manager, Jira Server and Data Center

Summary

JIRA DC might lose cluster lock while trying to unlock the lock if there is database connectivity problems .

That makes whole cluster stuck for some operations/actions that needs that lock.
Specifically we saw problem with:

com.atlassian.jira.config.DefaultReindexMessageManager.messageLock lock
com.atlassian.upm.impl.PluginManagerPluginAsynchronousTaskStatusStoreImpl lock
NotificationBatchScheduler.sd_batch_sending_lock lock
sla_issue_update_xx lock
the Service Desk Mail Puller job com.atlassian.jira.internal.mail.services.MailPullerJobRunner

So there is a possible chain of events:

Node1 obtains the lock in thread1
- Node1 registers it in DB and internal JVM structure.
All other nodes see lock and wait
- Thread1 unlocks the lock. While updating db it fails and release only local ref lock
All other nodes see lock and wait
Lock check at Node1 see that local ref is NULL and DB lock is present, so does nothing
All stuck

Environment

JIRA Datacenter

Steps to Reproduce

Set up a Datacenter JIRA instance with 2 nodes
Initiate action that would obtain the lock
Break connection to the database while this node has a clusterlock and tries to unlock it

Expected Results

JIRA will recover and do the unlock query once connected back to the database

Actual Results

JIRA DC lost the cluster lock. The below exception is thrown in the atlassian-jira.log file:

2018-01-19 09:09:31,377 http-nio-8080-exec-1 ERROR bob 300x234323428x11 18tz5ra 10.0.0.1 /secure/admin/user/UserBrowser.jspa [o.a.c.c.C.[Catalina].[localhost].[/jira]] Unhandled exception occurred whilst decorating page
java.lang.RuntimeException: org.apache.jasper.JasperException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET  UPDATE_TIME = ? ,  LOCKED_BY_NODE = ?  WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure

The last packet successfully received from the server was 1,339,594 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.)
	at com.opensymphony.sitemesh.webapp.decorator.BaseWebAppDecorator.render(BaseWebAppDecorator.java:37)
	at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:92)
	at com.atlassian.jira.web.filters.SitemeshPageFilter.doFilter(SitemeshPageFilter.java:112)
	... 10 filtered
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46)
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34)
	... 3 filtered
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46)
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34)
	... 3 filtered
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doHttpFilter(AbstractStructureServletFilter.java:46)
	at com.almworks.jira.structure.util.servlet.AbstractStructureServletFilter.doFilter(AbstractStructureServletFilter.java:34)
	... 18 filtered
	at com.atlassian.jira.security.JiraSecurityFilter.lambda$doFilter$0(JiraSecurityFilter.java:76)
	... 1 filtered
	at com.atlassian.jira.security.JiraSecurityFilter.doFilter(JiraSecurityFilter.java:74)
	... 40 filtered
	at com.atlassian.jira.servermetrics.CorrelationIdPopulatorFilter.doFilter(CorrelationIdPopulatorFilter.java:30)
	... 5 filtered
	at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.lambda$invokeFilterChain$0(CustomerContextSettingFilter.java:181)
	at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.rteInvoke(ReentrantThreadLocalBasedCodeContext.java:142)
	at com.atlassian.servicedesk.internal.utils.context.ReentrantThreadLocalBasedCodeContext.runOutOfContext(ReentrantThreadLocalBasedCodeContext.java:94)
	at com.atlassian.servicedesk.internal.utils.context.CustomerContextServiceImpl.runOutOfCustomerContext(CustomerContextServiceImpl.java:64)
	at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.outOfCustomerContext(CustomerContextSettingFilter.java:174)
	at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilterImpl(CustomerContextSettingFilter.java:130)
	at com.atlassian.servicedesk.internal.web.CustomerContextSettingFilter.doFilter(CustomerContextSettingFilter.java:121)
	... 9 filtered
	at com.atlassian.jwt.internal.servlet.JwtAuthFilter.doFilter(JwtAuthFilter.java:32)
	... 3 filtered
	at com.atlassian.web.servlet.plugin.request.RedirectInterceptingFilter.doFilter(RedirectInterceptingFilter.java:21)
	... 4 filtered
	at com.atlassian.web.servlet.plugin.LocationCleanerFilter.doFilter(LocationCleanerFilter.java:36)
	... 29 filtered
	at com.atlassian.jira.servermetrics.MetricsCollectorFilter.doFilter(MetricsCollectorFilter.java:25)
	... 27 filtered
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
...
Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET  UPDATE_TIME = ? ,  LOCKED_BY_NODE = ?  WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? 
(Communications link failure: 
The last packet successfully received from the server was 1,339,594 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570)
	at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290)
	at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126)
	at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122)
	at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82)
	at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372)
	at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190)
	at org.apache.jsp.decorators.admin_jsp._jspService(admin_jsp.java:485)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:438)
	... 208 more
Caused by: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET  UPDATE_TIME = ? ,  LOCKED_BY_NODE = ?  WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? (Communications link failure

The last packet successfully received from the server was 1,339,594 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:683)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:565)
	... 219 more
The last packet successfully received from the server was 1,339,594 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
	... 2 filtered
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
	at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3559)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3459)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3900)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2490)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
	at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
	at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
	at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:98)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:673)
	... 220 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3469)
	... 233 more

Key part:

Caused by: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:UPDATE clusterlockstatus SET  UPDATE_TIME = ? ,  LOCKED_BY_NODE = ?  WHERE LOCK_NAME=? AND LOCKED_BY_NODE=? 
(Communications link failure: 
The last packet successfully received from the server was 1,339,594 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.bulkUpdateByAnd(DefaultOfBizDelegator.java:570)
	at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.bulkUpdateByAnd(WrappingOfBizDelegator.java:290)
	at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:126)
	at com.atlassian.jira.entity.Update$WhereContext.execute(Update.java:122)
	at com.atlassian.jira.cluster.lock.JiraClusterLockDao.unlock(JiraClusterLockDao.java:82)
	at com.atlassian.beehive.db.DatabaseClusterLock.unlock(DatabaseClusterLock.java:372)
	at com.atlassian.jira.config.DefaultReindexMessageManager.getMessage(DefaultReindexMessageManager.java:190)
...

Workaround

Restart the problematic node. See the following KB article for full instructions on this workaround:
https://confluence.atlassian.com/jirakb/jira-data-center-functionalities-loss-due-to-cluster-wide-lock-942860754.html

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List

Screen Shot 2019-03-26 at 5.25.37 pm.png
93 kB
26/Mar/2019 11:15 PM

causes

JRASERVER-69899 Updating an issue causes TransactionRuntimeException or "current transaction is aborted" errors

Closed

DELTA-911 Loading...

is duplicated by

JRASERVER-66658 JIRA does not retry cluster unlock actions after database connectivity problems

Closed

is related to

JRASERVER-68477 Workflow scheme actions leave left-over rows in 'clusterlockstatus' table.

Gathering Impact

JRASERVER-69113 JiraDashboardStateStoreManager leave left-over rows in 'clusterlockstatus' table.

Gathering Impact

JRASERVER-65890 Create logging event for clusterlockstatus during startup

Gathering Interest

JRASERVER-66596 JIRA Datacenter - Add Cluster lock status page which doesn't use locks

Gathering Interest

relates to

JRASERVER-74298 Jira node fails to start due to cluster lock at the active objects

Closed

JSDSERVER-6621 Unable to restore from XML backup

Closed

JRASERVER-70147 XML Restore stucks at 90%

Closed

CLIP-537 Loading...

DELTA-767 Loading...

Mentioned in

FY19 Jira DC L1L2 Bugs - Recommend to fix in 7.13 --> 7.6 where technically possible: (12) DC Lock [JRA-66597]

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

(2 is related to, 5 relates to, 1 Mentioned in, 34 mentioned in)

Assignee:: Karol Lopacinski

Reporter:: Andriy Yakovlev [Atlassian]

Votes:: 27 Vote for this issue

Watchers:: 55 Start watching this issue

Created:: 05/Jan/2018 11:18 AM

Updated:: 05/Sep/2023 7:57 PM

Resolved:: 22/Jul/2019 10:01 AM

Details

Description

Summary

Environment

Steps to Reproduce

Expected Results

Actual Results

Workaround

Attachments

Attachments

Issue Links

Forms

Activity

People

Dates