Bug
Resolution: Unresolved
Medium
None
9.12.10, 9.12, 9
Severity 3 - Minor
56
Issue Summary
When Jira starts up without an index or after a full reindex, it restores the issue index snapshot from the shared home.
The snapshot doesn't contain the shared entity index (which holds the filters), so Jira performs a full shared entity reindex. That reindex is single-threaded and runs on NodeReindexServiceThread:thread-0.
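Reading the summary together with the stack trace in the Actual Results, the slow part appears to be a single-threaded loop that issues one share-permission query per filter. The sketch below only illustrates that pattern; the class and helper names (SavedFilter, loadAllFilters, and so on) are hypothetical and this is not Jira's actual implementation:

// Illustrative sketch only - not the actual Jira source. It paraphrases what the stack
// trace suggests: a single thread walks every saved filter and performs a separate
// database lookup for its share permissions before indexing it.
public class SharedEntityReindexSketch {

    public void reIndexAllSharedEntities() {
        // Runs entirely on NodeReindexServiceThread:thread-0 - no worker pool is involved.
        for (SavedFilter filter : loadAllFilters()) {                 // tens of thousands of rows
            SharePermissions permissions =
                    loadSharePermissionsFromDatabase(filter.getId()); // one SQL query per filter
            indexSharedEntity(filter, permissions);                   // add one document to the shared entity index
        }
    }

    // Placeholders standing in for the real store and index calls.
    private Iterable<SavedFilter> loadAllFilters() { return java.util.List.of(); }
    private SharePermissions loadSharePermissionsFromDatabase(long filterId) { return new SharePermissions(); }
    private void indexSharedEntity(SavedFilter filter, SharePermissions permissions) { }

    static class SavedFilter { long getId() { return 0L; } }
    static class SharePermissions { }
}

With tens of thousands of filters, the per-filter database round trip dominates, which matches the multi-minute stall shown below.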
Steps to Reproduce
- Create tens of thousands of filters (a scripting sketch follows this list)
- Either start Jira without an index or run a full reindex
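For the first step, the saved-filter REST resource can be scripted. A rough sketch, assuming a Jira Data Center test instance at https://jira.example.com, basic auth credentials, and the standard create-filter endpoint POST /rest/api/2/filter; the filter count, names and JQL are arbitrary test values:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch for reproducing the precondition: create a large number of saved filters.
// Base URL, credentials and the filter count are assumptions - adjust for your test instance.
public class CreateManyFilters {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://jira.example.com";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        HttpClient client = HttpClient.newHttpClient();

        for (int i = 0; i < 50_000; i++) {
            String body = "{\"name\":\"load-test-filter-" + i + "\",\"jql\":\"order by created DESC\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/rest/api/2/filter"))
                    .header("Authorization", "Basic " + auth)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                System.err.println("Filter " + i + " failed with HTTP " + response.statusCode());
            }
        }
    }
}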
Expected Results
The index snapshot restore happens quickly.
Actual Results
For customers with many filters, the INDEX-FIXER stage gets stuck for several minutes. In the example below it takes about 12 minutes to move from 99% to 100%:
2024-07-13 09:39:42,696-0500 NodeReindexServiceThread:thread-0 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 99% complete... Done catching up
2024-07-13 09:51:36,083-0500 NodeReindexServiceThread:thread-0 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 100% complete... Recovered all indexes
During this time, NodeReindexServiceThread:thread-0 remains in this stack:
(trimmed DB stack)
com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeQuery(DiagnosticPreparedStatement.java:59)
org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:543)
org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:882)
org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:862)
org.ofbiz.core.entity.GenericDAO.selectByAnd(GenericDAO.java:734)
org.ofbiz.core.entity.GenericHelperDAO.findByAnd(GenericHelperDAO.java:166)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:913)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:891)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:860)
com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:83)
com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:62)
com.atlassian.jira.sharing.OfBizSharePermissionStore.getSharePermissions(OfBizSharePermissionStore.java:46)
com.atlassian.jira.sharing.CachingSharePermissionStore.loadSharePermissions(CachingSharePermissionStore.java:166)
com.atlassian.jira.sharing.CachingSharePermissionStore$$Lambda$838/0x00007f3bf30e24b0.load(Unknown Source)
com.atlassian.cache.ehcache.wrapper.ValueProcessorAtlassianCacheLoaderDecorator.load(ValueProcessorAtlassianCacheLoaderDecorator.java:26)
com.atlassian.cache.ehcache.LoadingCache.getFromLoader(LoadingCache.java:174)
com.atlassian.cache.ehcache.LoadingCache$$Lambda$567/0x00007f3c686ee058.apply(Unknown Source)
com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.synchronizedLoad(SynchronizedLoadingCacheDecorator.java:29)
com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:142)
com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:121)
com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:107)
com.atlassian.cache.impl.metrics.InstrumentedCache.get(InstrumentedCache.java:72)
com.atlassian.jira.cache.stats.CacheWithStats.get(CacheWithStats.java:46)
com.atlassian.jira.sharing.CachingSharePermissionStore.getSharePermissions(CachingSharePermissionStore.java:57)
com.atlassian.jira.sharing.DefaultShareManager.getSharePermissions(DefaultShareManager.java:46)
com.atlassian.jira.issue.search.DefaultSearchRequestManager.setSharePermissions(DefaultSearchRequestManager.java:324)
com.atlassian.jira.issue.search.DefaultSearchRequestManager$$Lambda$843/0x00007f3bf2e728d0.apply(Unknown Source)
com.atlassian.jira.util.collect.TransformingEnclosedIterable.lambda$foreach$0(TransformingEnclosedIterable.java:26)
com.atlassian.jira.util.collect.TransformingEnclosedIterable$$Lambda$8219/0x00007f2db77bd0b0.consume(Unknown Source)
com.atlassian.jira.util.collect.CollectionUtil.foreach(CollectionUtil.java:39)
com.atlassian.jira.util.collect.CloseableIterator$Functions.foreach(CloseableIterator.java:50)
com.atlassian.jira.ofbiz.DatabaseIterable.foreach(DatabaseIterable.java:39)
com.atlassian.jira.util.collect.TransformingEnclosedIterable.foreach(TransformingEnclosedIterable.java:26)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndex(DefaultSharedEntityIndexManager.java:124)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:93)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:83)
com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:503)
com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
com.atlassian.jira.util.thread.JiraThreadLocalUtils$$Lambda$928/0x00007f3bf22e8840.run(Unknown Source)
com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:432)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.23/Native Method)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.23/NativeMethodAccessorImpl.java:62)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.23/DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(java.base@11.0.23/Method.java:566)
com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
com.sun.proxy.$Proxy71.withReindexLock(Unknown Source)
com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:171)
...
Also, because the reindex runs under com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock, several request threads time out while waiting for the index lock:
2024-07-13 09:39:56,017-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
    at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:966)
2024-07-13 09:39:56,098-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.p.r.c.error.jersey.ThrowableExceptionMapper] Uncaught exception thrown by REST service: null
com.atlassian.jira.issue.index.SearchUnavailableException
    at com.atlassian.jira.issue.index.DefaultIndexManager.getEntitySearcher(DefaultIndexManager.java:1021)
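For context, the pattern behind those errors is a single reindex lock: the recovery thread holds it for the whole catch-up, while request threads wait a fixed 30 seconds and then give up, surfacing as SearchUnavailableException. The sketch below is only a simplified illustration of that behaviour, not DefaultIndexManager's actual implementation:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified illustration of the lock behaviour seen in the log above - not Jira's code.
// While the recovery thread holds the reindex (write) lock for the full recovery,
// request threads wait up to 30 seconds for a read lock and then fail.
public class ReindexLockSketch {
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

    Object getEntitySearcher() throws InterruptedException {
        if (!indexLock.readLock().tryLock(30_000, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Wait attempt timed out - waited 30000 milliseconds");
        }
        try {
            return openSearcher();
        } finally {
            indexLock.readLock().unlock();
        }
    }

    void withReindexLock(Runnable reindex) {
        indexLock.writeLock().lock();   // held for the entire recovery, e.g. the 12 minutes above
        try {
            reindex.run();
        } finally {
            indexLock.writeLock().unlock();
        }
    }

    private Object openSearcher() { return new Object(); }
}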
On other nodes, we can see index replication failing as a result:
2024-07-13 09:43:23,552-0500 localq-reader-111 WARN [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Runtime exception: UnrecoverableFailure occurred when processing: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1, error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
2024-07-13 09:43:23,552-0500 localq-reader-111 ERROR [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
    at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)
    at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:96)
    ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.base/java.net.SocketInputStream.socketRead0(Native Method)
    at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
Workaround
Currently, there is no known workaround for this behavior. A workaround will be added here when available.
As mitigation, you can:
- Make sure the database indexes on the sharepermissions table are optimized (a sizing sketch follows this list)
- Avoid running a full reindex during business hours
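For the first mitigation point, it helps to first gauge how much shared-entity data the instance holds, since the recovery time grows with the number of saved filters and their share permissions. A hedged JDBC sketch, assuming a PostgreSQL-backed instance and the standard Jira table names searchrequest and sharepermissions; the JDBC URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Rough sizing check: counts saved filters and share permission rows.
// Table names assume a standard Jira schema (searchrequest, sharepermissions);
// the connection details are placeholders, and the PostgreSQL JDBC driver must be on the classpath.
public class SharedEntityVolumeCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host:5432/jiradb", "jira", "secret");
             Statement stmt = conn.createStatement()) {

            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM searchrequest")) {
                rs.next();
                System.out.println("Saved filters:     " + rs.getLong(1));
            }
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sharepermissions")) {
                rs.next();
                System.out.println("Share permissions: " + rs.getLong(1));
            }
        }
    }
}

If the filter count is in the tens of thousands, plan full reindexes for low-traffic windows as per the second mitigation point.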
relates to
- JRASERVER-76237 Index catch-up of Comments and Worklogs isn't multi-threaded during Jira startup (Closed)