-
Bug
-
Resolution: Cannot Reproduce
-
High
-
None
-
8.22.6
-
8.22
-
12
-
Severity 2 - Major
-
41
-
-
A Jira node might fail to release a cluster lock if there is a database connectivity problem.
When another node then starts up, its localhost-startStop thread sees the stale lock and waits, and the node startup hangs.
Environment
Jira Data Center
Steps to Reproduce
1. Set up a Jira Data Center instance with 2 nodes
2. Initiate an action that obtains a cluster lock (a sketch of the typical lock pattern follows these steps)
3. Break the connection to the database while this node holds the cluster lock and tries to release it
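For context, cluster locks in Jira Data Center are normally taken through the Beehive ClusterLockService, the same component that appears in the thread dumps below. The following is a minimal sketch of that acquire/release pattern under the assumption that the service is injected into plugin code; the class name and lock name are hypothetical, not taken from any specific plugin:

```java
import com.atlassian.beehive.ClusterLock;
import com.atlassian.beehive.ClusterLockService;

public class ExampleClusterLockedTask {
    private final ClusterLockService clusterLockService;

    public ExampleClusterLockedTask(ClusterLockService clusterLockService) {
        this.clusterLockService = clusterLockService;
    }

    public void runExclusively(Runnable task) {
        // The lock name is what ends up in the LOCK_NAME column of clusterlockstatus.
        ClusterLock lock = clusterLockService.getLockForName("example.plugin.lock");
        lock.lock();
        try {
            task.run();
        } finally {
            // unlock() updates the clusterlockstatus row in the database; if the
            // database connection drops at this point, the row can remain marked
            // as locked by this node, which is the scenario described in this bug.
            lock.unlock();
        }
    }
}
```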
Expected Results
Jira warns that the Active Objects lock is held by the other node.
OR
The other node releases the lock once it reconnects to the database.
Actual Results
Node does not start.
Thread dumps collected during node startup show the localhost-startStop thread waiting for the Active Objects lock:
10:21:15 - localhost-startStop-1
State: WAITING
CPU usage: 0.0%
Running for: 0:56.95
Waiting for: This thread is waiting for notification on lock [0x73ba00030] without an owner
Locks held: This thread holds [0x73ba002b0, 0x500a991f8, 0x500008af8, 0x500008af8]
Stack trace:
jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/Unknown Source)
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/Unknown Source)
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/Unknown Source)
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.13/Unknown Source)
java.util.concurrent.CompletableFuture.get(java.base@11.0.13/Unknown Source)
io.atlassian.util.concurrent.Promises$OfStage.claim(Promises.java:280)
com.atlassian.activeobjects.osgi.TenantAwareActiveObjects.flushAll(TenantAwareActiveObjects.java:247)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.13/Native Method)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source)
org.joor.Reflect.on(Reflect.java:673)
org.joor.Reflect.call(Reflect.java:379)
org.joor.Reflect.call(Reflect.java:332)
com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.invokeAo(DatabaseSchemaCreationImpl.java:86)
com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$2360/0x0000000803c4e440.apply(Unknown Source)
io.atlassian.fugue.Effect.accept(Effect.java:43)
io.atlassian.fugue.Option$Some.forEach(Option.java:468)
io.atlassian.fugue.Option$Some.foreach(Option.java:464)
com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.lambda$primeImpl$0(DatabaseSchemaCreationImpl.java:66)
com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$1532/0x0000000802169040.apply(Unknown Source)
com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.lambda$get$0(MemoizingResettingReference.java:59)
com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$$Lambda$2359/0x0000000803c4e040.get(Unknown Source)
com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$SmarterMemoizingSupplier.get(MemoizingResettingReference.java:150)
com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.safelyGetT(MemoizingResettingReference.java:71)
com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.get(MemoizingResettingReference.java:63)
com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.prime(DatabaseSchemaCreationImpl.java:60)
com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.execute(DatabaseAccessorImpl.java:62)
com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:43)
com.atlassian.ratelimiting.db.internal.dao.QDSLSystemRateLimitingSettingsDao.initializeDbIfNeeded(QDSLSystemRateLimitingSettingsDao.java:39)
com.atlassian.ratelimiting.internal.configuration.DefaultSystemPropertiesService.initializeData(DefaultSystemPropertiesService.java:64)
com.atlassian.ratelimiting.internal.settings.RateLimitModificationSettingsService.onPluginEnabled(RateLimitModificationSettingsService.java:96)
jdk.internal.reflect.GeneratedMethodAccessor331.invoke(Unknown Source)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source)
java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source)
com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:42)
com.atlassian.event.internal.ComparableListenerInvoker.invoke(ComparableListenerInvoker.java:48)
com.atlassian.event.internal.AsynchronousAbleEventDispatcher.lambda$null$0(AsynchronousAbleEventDispatcher.java:37)
com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$707/0x0000000800a60c40.run(Unknown Source)
com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$180/0x0000000800376440.execute(Unknown Source)
com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:85)
com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:114)
com.atlassian.event.internal.LockFreeEventPublisher.publish(LockFreeEventPublisher.java:40)
com.atlassian.plugin.event.impl.DefaultPluginEventManager.broadcast(DefaultPluginEventManager.java:90)
com.atlassian.plugin.manager.DefaultPluginManager.broadcastIgnoreError(DefaultPluginManager.java:1972)
com.atlassian.plugin.manager.DefaultPluginManager.lambda$broadcastPluginEnabled$41(DefaultPluginManager.java:1782)
com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$2192/0x0000000803982040.run(Unknown Source)
com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
com.atlassian.plugin.manager.DefaultPluginManager.broadcastPluginEnabled(DefaultPluginManager.java:1781)
com.atlassian.plugin.manager.DefaultPluginManager.lambda$enableDependentPlugins$24(DefaultPluginManager.java:1260)
com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1018/0x0000000800dd2440.run(Unknown Source)
com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
com.atlassian.plugin.manager.DefaultPluginManager.enableDependentPlugins(DefaultPluginManager.java:1229)
com.atlassian.plugin.manager.DefaultPluginManager.lambda$addPlugins$22(DefaultPluginManager.java:1214)
com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1014/0x0000000800dd3440.run(Unknown Source)
com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
com.atlassian.plugin.manager.DefaultPluginManager.addPlugins(DefaultPluginManager.java:1114)
com.atlassian.jira.plugin.JiraPluginManager.addPlugins(JiraPluginManager.java:157)
com.atlassian.plugin.manager.DefaultPluginManager.lambda$earlyStartup$5(DefaultPluginManager.java:593)
com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$756/0x0000000800a83840.run(Unknown Source)
com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63)
com.atlassian.plugin.manager.DefaultPluginManager.earlyStartup(DefaultPluginManager.java:528)
The Active Objects init thread is still trying to acquire the lock:
10:21:04 - active-objects-init-JiraTenantImpl{id='system'}-0
State: TIMED-WAITING
CPU usage: 6.7%
Running for: 0:03.75
Waiting for: This thread is not waiting for notification on any lock
Locks held: This thread does not hold any locks
Stack trace:
java.lang.Thread.sleep(java.base@11.0.13/Native Method)
com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:621)
com.atlassian.beehive.db.DatabaseClusterLock.tryLockWaitWithTimeout(DatabaseClusterLock.java:472)
com.atlassian.beehive.db.DatabaseClusterLock.tryLock(DatabaseClusterLock.java:453)
com.atlassian.activeobjects.internal.AbstractActiveObjectsFactory.create(AbstractActiveObjectsFactory.java:57)
com.atlassian.activeobjects.internal.DelegatingActiveObjectsFactory.create(DelegatingActiveObjectsFactory.java:32)
com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:91)
com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:86)
com.atlassian.sal.core.executor.ThreadLocalDelegateCallable.call(ThreadLocalDelegateCallable.java:38)
java.util.concurrent.FutureTask.run(java.base@11.0.13/Unknown Source)
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/Unknown Source)
java.lang.Thread.run(java.base@11.0.13/Unknown Source)
However, the clusterlockstatus table shows a lock held by another node for more than a week, since a database connectivity problem occurred:
ID | LOCK_NAME | LOCKED_BY_NODE | UPDATE_TIME |
---|---|---|---|
1747827 | ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin | node4 | 1660313500762 |
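UPDATE_TIME appears to be an epoch timestamp in milliseconds, so a quick conversion confirms how long the lock has been held. A minimal sketch in plain Java, using the value from the row above:

```java
import java.time.Duration;
import java.time.Instant;

public class LockAge {
    public static void main(String[] args) {
        long updateTime = 1660313500762L;                  // UPDATE_TIME from clusterlockstatus
        Instant lockedSince = Instant.ofEpochMilli(updateTime);
        Duration age = Duration.between(lockedSince, Instant.now());
        System.out.println("Lock held since " + lockedSince + " (~" + age.toDays() + " days)");
    }
}
```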
Workaround
Workaround 1
Restart the problematic node. See the following KB article for full instructions on this workaround:
https://confluence.atlassian.com/jirakb/jira-data-center-functionalities-loss-due-to-cluster-wide-lock-942860754.html
Workaround 2
- Run the following query to find the lock (a JDBC sketch that scripts this check appears after these steps):
select * from clusterlockstatus where locked_by_node is not NULL;
- The query will return something like:
ID | LOCK_NAME | LOCKED_BY_NODE | UPDATE_TIME |
---|---|---|---|
1747827 | ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin | node4 | 1660313500762 |
- Remove the lock:
delete from clusterlockstatus where id = 1747827;
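If you prefer to script the check before deleting anything, here is a minimal JDBC sketch that runs the same SELECT and prints the age of each held lock. It assumes direct read access to the Jira database and a suitable JDBC driver on the classpath; the JIRA_DB_* environment variables are hypothetical placeholders for your own connection details:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Duration;
import java.time.Instant;

public class ClusterLockReport {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; point these at your Jira database.
        String url = System.getenv("JIRA_DB_URL");
        String user = System.getenv("JIRA_DB_USER");
        String password = System.getenv("JIRA_DB_PASSWORD");

        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select id, lock_name, locked_by_node, update_time "
                     + "from clusterlockstatus where locked_by_node is not NULL")) {
            while (rs.next()) {
                // UPDATE_TIME is epoch milliseconds; report how long the lock has been held.
                Instant since = Instant.ofEpochMilli(rs.getLong("update_time"));
                long days = Duration.between(since, Instant.now()).toDays();
                System.out.printf("id=%d lock=%s node=%s held since %s (%d days)%n",
                        rs.getLong("id"), rs.getString("lock_name"),
                        rs.getString("locked_by_node"), since, days);
            }
        }
    }
}
```

Review the rows it prints before running the delete statement above.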