- Bug
- Resolution: Fixed
- Medium
- 9.4.6, 9.4.14, 9.12.11
- 9.04
- 73
- Severity 3 - Minor
- 46
Issue Summary
This is reproducible on Data Center: yes
Steps to Reproduce
Reproduced in a customer environment, but not replicable locally.
An index copy is requested:
2022-10-25 10:51:55,405-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.j.index.ha.DefaultIndexCopyService] Received message: "Index Backed Up" - notification that node: node2 created an index snapshot which can be restored on current node: node1
2022-10-25 10:51:55,406-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.j.index.ha.DefaultIndexCopyService] Index restore started. Total 6882913 issues on instance before loading Snapshot file: IndexSnapshot_118402.tar.sz
The index restoration starts on node node1. The lock file is created and its path is added to the LOCK_HELD set.
2022-10-25 10:53:43,583-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring index with 6887494 issues
2022-10-25 10:53:43,586-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring search indexes - 1% complete... [INDEX-FIXER] Replacing indexes
2022-10-25 10:53:43,586-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Starting the index restore process by replacing existing indexes. For this, we're holding the 'stop-the-world' index lock.
If something happens to the write.lock file (e.g. the file goes missing), a NoSuchFileException is visible in the logs:
2022-10-25 10:53:52,109-0400 ClusterMessageHandlerServiceThread:thread-1 INFO anonymous [c.a.jira.index.WriterWrapper] Closing index writer with non-existing file. Ignoring: /jirahome/caches/indexesV1/entities/searchrequest/write.lock java.nio.file.NoSuchFileException: /jirahome/caches/indexesV1/entities/searchrequest/write.lock
Because of this error, the index restore process does not complete successfully: Jira never releases the index lock, so subsequent attempts to obtain it fail. Next, a LockObtainFailedException ("Lock held by this virtual machine") is visible in the logs:
2022-10-25 10:53:52,144-0400 ClusterMessageHandlerServiceThread:thread-1 ERROR anonymous [c.a.jira.cluster.OfBizMessageHandlerService] There was a problem handling a cluster message
com.atlassian.jira.util.RuntimeIOException: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /jirahome/caches/indexesV1/entities/searchrequest/write.lock
	at com.atlassian.jira.index.DefaultIndexEngine.clean(DefaultIndexEngine.java:143)
	at com.atlassian.jira.index.DefaultManager.deleteIndexDirectory(DefaultManager.java:40)
	at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexer.recreate(DefaultSharedEntityIndexer.java:243)
	at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:92)
	at com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:83)
	at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:194)
	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.restoreIndex(DefaultIndexCopyService.java:165)
	at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.receive(DefaultIndexCopyService.java:191)
	at com.atlassian.jira.cluster.OfBizMessageHandlerService.sendLocalFromNode(OfBizMessageHandlerService.java:214)
	at com.atlassian.jira.cluster.OfBizMessageHandlerService.tryHandleMessage(OfBizMessageHandlerService.java:140)
	at com.atlassian.jira.cluster.OfBizMessageHandlerService.handleReceivedMessages(OfBizMessageHandlerService.java:124)
	at com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /jirahome/caches/indexesV1/entities/searchrequest/write.lock
	at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
	at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
	at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:948)
	at com.atlassian.jira.index.MonitoringIndexWriter.<init>(MonitoringIndexWriter.java:43)
	at com.atlassian.jira.index.DefaultIndexEngine.clean(DefaultIndexEngine.java:141)
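The "Lock held by this virtual machine" message comes from Lucene's per-JVM lock accounting: NativeFSLockFactory keeps a JVM-wide set of lock-file paths currently held, and refuses to obtain a lock whose path is already in that set. The sketch below is a simplified model of that bookkeeping for illustration only (an assumption, not actual Lucene or Jira source) — it shows how a stale entry makes every later obtain fail until the JVM restarts:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified model (not Lucene source) of a per-JVM lock registry
// like NativeFSLockFactory's LOCK_HELD set.
class JvmLockRegistry {
    // JVM-wide set of lock-file paths currently held in this process.
    static final Set<String> LOCK_HELD =
            Collections.synchronizedSet(new HashSet<>());

    static void obtain(String path) throws IOException {
        // If the path is already registered, a second obtain fails with
        // the same message seen in the Jira logs above.
        if (!LOCK_HELD.add(path)) {
            throw new IOException("Lock held by this virtual machine: " + path);
        }
    }

    static void release(String path) {
        LOCK_HELD.remove(path);
    }
}
```

In this model, if the release step is skipped (for example because the write.lock file vanished mid-close), the path stays registered and every subsequent obtain throws until the process restarts — matching the observed behavior.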
Expected Results
Jira should gracefully handle this situation by releasing the lock, without requiring a restart.
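One way such graceful handling could look (a hypothetical sketch, not Jira's actual implementation): release the in-JVM lock entry in a finally block, so that a missing write.lock file on disk cannot leave a stale entry behind:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Hypothetical sketch: close an index lock so the per-JVM "lock held"
// entry is always released, even when the write.lock file has already
// vanished from disk. Not Jira's actual code.
class SafeLockRelease {
    static void closeLock(Set<String> lockHeld, Path lockFile) {
        try {
            // Throws NoSuchFileException (an IOException) when the file
            // is already gone -- the situation seen in the logs above.
            Files.delete(lockFile);
        } catch (IOException alreadyMissing) {
            // Nothing to delete on disk; treat as already unlocked.
        } finally {
            // Always drop the in-memory entry so the next restore can
            // re-obtain the lock without restarting the node.
            lockHeld.remove(lockFile.toString());
        }
    }
}
```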
Actual Results
Index replication falls behind on the affected node:
Name: Cluster Index Replication
NodeId: null
Is healthy: false
Failure reason: Index replication for cluster node 'node1' is behind by 4,300 seconds.
Severity: CRITICAL
Workaround
Restart the affected node to clear the lock.
- relates to: JRASERVER-74524 Running an index snapshot backup job whilst the index is being restored, breaks the node's index (Closed)