Expected Behavior
When an IOException is encountered while reindexing or restoring an index, Jira should retry the operation a few times before failing.
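The expected retry behavior could look roughly like the sketch below. This is only an illustration, not Jira's actual code: the class IoRetry, the interface IoAction, the method runWithRetry, and the attempt count and delay parameters are all hypothetical names chosen for this example.

```java
import java.io.IOException;

// Hypothetical helper, not part of Jira: retries an I/O action a bounded number of times.
public class IoRetry {

    /** An action that may fail with an IOException, e.g. deleting an index file. */
    @FunctionalInterface
    public interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action, retrying when it throws IOException.
     * Returns the number of attempts used; rethrows the last IOException
     * if all maxAttempts attempts fail.
     */
    public static int runWithRetry(IoAction action, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                // Wait before the next attempt so a short-lived lock can be released.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

With a helper like this, the directory cleanup seen in the stack trace could be wrapped as IoRetry.runWithRetry(() -> FileUtils.deleteDirectory(dir), 3, 500), assuming a short-lived lock (such as an anti-virus scan of a single file) is released between attempts.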
Actual Behavior
If an IOException is thrown, e.g. because a file is locked by another process, the reindex or index restore stops, which usually leaves the index corrupted.
This is usually caused by a running anti-virus scanner or by the JDK-4724038 bug on Windows.
The problem can manifest itself with an exception like this:
2018-07-10 22:58:51,518 NodeReindexServiceThread:thread-1 INFO [c.a.j.index.ha.DefaultIndexCopyService] Index restore started. Snapshot file: IndexSnapshot_16400.zip
2018-07-10 22:59:03,221 NodeReindexServiceThread:thread-1 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] Restoring search indexes - 1% complete... Replacing indexes
2018-07-10 22:59:03,486 NodeReindexServiceThread:thread-1 ERROR [c.a.j.index.ha.DefaultNodeReindexService] Error re-indexing node changes
java.lang.RuntimeException: java.io.IOException: Unable to delete file: D:\Atlassian\JIRA\caches\indexes\issues\_9j.cfs
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:344)
at com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:377)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:22)
at com.sun.proxy.$Proxy19.withReindexLock(Unknown Source)
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:121)
at com.atlassian.jira.index.ha.DefaultIndexCopyService$MessageConsumer.restoreIndex(DefaultIndexCopyService.java:154)
at com.atlassian.jira.index.ha.DefaultIndexCopyService.restoreIndex(DefaultIndexCopyService.java:71)
at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateAffectedIndexes(DefaultNodeReindexService.java:293)
at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:252)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to delete file: D:\Atlassian\JIRA\caches\indexes\issues\_9j.cfs
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1919)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.replaceIndexes(DefaultIndexRecoveryManager.java:407)
at com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:342)
Steps to Reproduce
Lock one of the Lucene index files inside the Jira home directory and start a reindex.
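One possible way to hold a file open for this step is a small Java helper like the one below. It is a sketch under assumptions: the class name IndexFileHolder and the method hold are hypothetical, and it relies on the usual Windows behavior that a file opened through java.io cannot be deleted while the handle is open. On Linux the deletion normally succeeds, so this is unlikely to reproduce the problem there.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical reproduction helper, not part of Jira.
public class IndexFileHolder {

    /**
     * Opens the given existing file read-only and returns the open handle.
     * The file stays open (and, on Windows, typically undeletable)
     * until the caller closes the returned handle.
     */
    public static RandomAccessFile hold(Path file) throws IOException {
        return new RandomAccessFile(file.toFile(), "r");
    }
}
```

To use it, call hold(...) from a separate process on a file under caches\indexes\issues, keep that process running while the reindex is started, and close the handle afterwards.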
Workaround
Exclude the Jira home directory from anti-virus scans.
If the problem is caused by the JVM bug, there is no known workaround at the moment.