Bug
Resolution: Unresolved
Medium
None
9.12.10, 9.12, 9
Severity 3 - Minor
56
Issue Summary
When Jira starts up without an index or after a full reindex, it restores the issue index snapshot from the shared home.
The snapshot doesn't contain the shared entity index (which holds the filters), so Jira performs a full shared entity reindex. That reindex is single-threaded and runs on NodeReindexServiceThread:thread-0.
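Reading the summary together with the stack trace in the Actual Results, the slow part appears to be a single-threaded loop that issues one share-permission query per filter. The sketch below only illustrates that pattern; the class and helper names (SavedFilter, loadAllFilters, and so on) are hypothetical and this is not Jira's actual implementation:

// Illustrative sketch only - not the actual Jira source. It paraphrases what the stack
// trace suggests: a single thread walks every saved filter and performs a separate
// database lookup for its share permissions before indexing it.
public class SharedEntityReindexSketch {

    public void reIndexAllSharedEntities() {
        // Runs entirely on NodeReindexServiceThread:thread-0 - no worker pool is involved.
        for (SavedFilter filter : loadAllFilters()) {                 // tens of thousands of rows
            SharePermissions permissions =
                    loadSharePermissionsFromDatabase(filter.getId()); // one SQL query per filter
            indexSharedEntity(filter, permissions);                   // add one document to the shared entity index
        }
    }

    // Placeholders standing in for the real store and index calls.
    private Iterable<SavedFilter> loadAllFilters() { return java.util.List.of(); }
    private SharePermissions loadSharePermissionsFromDatabase(long filterId) { return new SharePermissions(); }
    private void indexSharedEntity(SavedFilter filter, SharePermissions permissions) { }

    static class SavedFilter { long getId() { return 0L; } }
    static class SharePermissions { }
}

With tens of thousands of filters, the per-filter database round trip dominates, which matches the multi-minute stall shown below.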
Steps to Reproduce
- Create tens of thousands of filters (a scripting sketch follows this list)
- Either start Jira without an index or run a full reindex
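For the first step, the saved-filter REST resource can be scripted. A rough sketch, assuming a Jira Data Center test instance at https://jira.example.com, basic auth credentials, and the standard create-filter endpoint POST /rest/api/2/filter; the filter count, names and JQL are arbitrary test values:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch for reproducing the precondition: create a large number of saved filters.
// Base URL, credentials and the filter count are assumptions - adjust for your test instance.
public class CreateManyFilters {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://jira.example.com";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        HttpClient client = HttpClient.newHttpClient();

        for (int i = 0; i < 50_000; i++) {
            String body = "{\"name\":\"load-test-filter-" + i + "\",\"jql\":\"order by created DESC\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/rest/api/2/filter"))
                    .header("Authorization", "Basic " + auth)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                System.err.println("Filter " + i + " failed with HTTP " + response.statusCode());
            }
        }
    }
}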
Expected Results
The index snapshot restore happens quickly.
Actual Results
For customers with many filters, the INDEX-FIXER stage gets stuck for several minutes. In the example below it takes about 12 minutes to move from 99% to 100%:
2024-07-13 09:39:42,696-0500 NodeReindexServiceThread:thread-0 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 99% complete... Done catching up
2024-07-13 09:51:36,083-0500 NodeReindexServiceThread:thread-0 INFO [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 100% complete... Recovered all indexes
During this time, NodeReindexServiceThread:thread-0 remains in this stack:
(trimmed DB stack)
com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeQuery(DiagnosticPreparedStatement.java:59)
org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:543)
org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:882)
org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:862)
org.ofbiz.core.entity.GenericDAO.selectByAnd(GenericDAO.java:734)
org.ofbiz.core.entity.GenericHelperDAO.findByAnd(GenericHelperDAO.java:166)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:913)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:891)
org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:860)
com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:83)
com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:62)
com.atlassian.jira.sharing.OfBizSharePermissionStore.getSharePermissions(OfBizSharePermissionStore.java:46)
com.atlassian.jira.sharing.CachingSharePermissionStore.loadSharePermissions(CachingSharePermissionStore.java:166)
com.atlassian.jira.sharing.CachingSharePermissionStore$$Lambda$838/0x00007f3bf30e24b0.load(Unknown Source)
com.atlassian.cache.ehcache.wrapper.ValueProcessorAtlassianCacheLoaderDecorator.load(ValueProcessorAtlassianCacheLoaderDecorator.java:26)
com.atlassian.cache.ehcache.LoadingCache.getFromLoader(LoadingCache.java:174)
com.atlassian.cache.ehcache.LoadingCache$$Lambda$567/0x00007f3c686ee058.apply(Unknown Source)
com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.synchronizedLoad(SynchronizedLoadingCacheDecorator.java:29)
com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:142)
com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:121)
com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:107)
com.atlassian.cache.impl.metrics.InstrumentedCache.get(InstrumentedCache.java:72)
com.atlassian.jira.cache.stats.CacheWithStats.get(CacheWithStats.java:46)
com.atlassian.jira.sharing.CachingSharePermissionStore.getSharePermissions(CachingSharePermissionStore.java:57)
com.atlassian.jira.sharing.DefaultShareManager.getSharePermissions(DefaultShareManager.java:46)
com.atlassian.jira.issue.search.DefaultSearchRequestManager.setSharePermissions(DefaultSearchRequestManager.java:324)
com.atlassian.jira.issue.search.DefaultSearchRequestManager$$Lambda$843/0x00007f3bf2e728d0.apply(Unknown Source)
com.atlassian.jira.util.collect.TransformingEnclosedIterable.lambda$foreach$0(TransformingEnclosedIterable.java:26)
com.atlassian.jira.util.collect.TransformingEnclosedIterable$$Lambda$8219/0x00007f2db77bd0b0.consume(Unknown Source)
com.atlassian.jira.util.collect.CollectionUtil.foreach(CollectionUtil.java:39)
com.atlassian.jira.util.collect.CloseableIterator$Functions.foreach(CloseableIterator.java:50)
com.atlassian.jira.ofbiz.DatabaseIterable.foreach(DatabaseIterable.java:39)
com.atlassian.jira.util.collect.TransformingEnclosedIterable.foreach(TransformingEnclosedIterable.java:26)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndex(DefaultSharedEntityIndexManager.java:124)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:93)
com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:83)
com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:503)
com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
com.atlassian.jira.util.thread.JiraThreadLocalUtils$$Lambda$928/0x00007f3bf22e8840.run(Unknown Source)
com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:432)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.23/Native Method)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.23/NativeMethodAccessorImpl.java:62)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.23/DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(java.base@11.0.23/Method.java:566)
com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
com.sun.proxy.$Proxy71.withReindexLock(Unknown Source)
com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:171)
...
Also, because the reindex runs under com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock, several request threads time out while waiting for the index lock:
2024-07-13 09:39:56,017-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
    at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:966)
2024-07-13 09:39:56,098-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.p.r.c.error.jersey.ThrowableExceptionMapper] Uncaught exception thrown by REST service: null
com.atlassian.jira.issue.index.SearchUnavailableException
    at com.atlassian.jira.issue.index.DefaultIndexManager.getEntitySearcher(DefaultIndexManager.java:1021)
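For context, the pattern behind those errors is a single reindex lock: the recovery thread holds it for the whole catch-up, while request threads wait a fixed 30 seconds and then give up, surfacing as SearchUnavailableException. The sketch below is only a simplified illustration of that behaviour, not DefaultIndexManager's actual implementation:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified illustration of the lock behaviour seen in the log above - not Jira's code.
// While the recovery thread holds the reindex (write) lock for the full recovery,
// request threads wait up to 30 seconds for a read lock and then fail.
public class ReindexLockSketch {
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

    Object getEntitySearcher() throws InterruptedException {
        if (!indexLock.readLock().tryLock(30_000, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Wait attempt timed out - waited 30000 milliseconds");
        }
        try {
            return openSearcher();
        } finally {
            indexLock.readLock().unlock();
        }
    }

    void withReindexLock(Runnable reindex) {
        indexLock.writeLock().lock();   // held for the entire recovery, e.g. the 12 minutes above
        try {
            reindex.run();
        } finally {
            indexLock.writeLock().unlock();
        }
    }

    private Object openSearcher() { return new Object(); }
}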
On other nodes, we can see index replication failing as a result:
2024-07-13 09:43:23,552-0500 localq-reader-111 WARN [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Runtime exception: UnrecoverableFailure occurred when processing: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1, error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
2024-07-13 09:43:23,552-0500 localq-reader-111 ERROR [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out
    at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)
    at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:96)
    ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.base/java.net.SocketInputStream.socketRead0(Native Method)
    at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
Workaround
Currently, there is no known workaround for this behavior. A workaround will be added here when available.
As mitigation, you can:
- Make sure the database indexes on the sharepermissions table are optimized (a sizing sketch follows this list)
- Avoid running a full reindex during business hours
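For the first mitigation point, it helps to first gauge how much shared-entity data the instance holds, since the recovery time grows with the number of saved filters and their share permissions. A hedged JDBC sketch, assuming a PostgreSQL-backed instance and the standard Jira table names searchrequest and sharepermissions; the JDBC URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Rough sizing check: counts saved filters and share permission rows.
// Table names assume a standard Jira schema (searchrequest, sharepermissions);
// the connection details are placeholders, and the PostgreSQL JDBC driver must be on the classpath.
public class SharedEntityVolumeCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host:5432/jiradb", "jira", "secret");
             Statement stmt = conn.createStatement()) {

            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM searchrequest")) {
                rs.next();
                System.out.println("Saved filters:     " + rs.getLong(1));
            }
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sharepermissions")) {
                rs.next();
                System.out.println("Share permissions: " + rs.getLong(1));
            }
        }
    }
}

If the filter count is in the tens of thousands, plan full reindexes for low-traffic windows as per the second mitigation point.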
relates to
- JRASERVER-76237 Index catch-up of Comments and Worklogs isn't multi-threaded during Jira startup (Closed)