Jira Data Center · JRASERVER-77905

Search Request (filter) catch up isn't multi-threaded when restoring snapshot


      Issue Summary

      When Jira starts up without an index or after a full reindex, it restores the issue index snapshot from the shared home.

      The snapshot doesn't contain the shared entity index (which holds the filters), so Jira performs a full shared entity reindex. That catch-up is single-threaded, running entirely on NodeReindexServiceThread:thread-0.
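
      A minimal sketch of what a parallelized catch-up could look like, spreading the per-filter work across a worker pool. The reindexFilter(long) operation and the pool sizing are assumptions for illustration, not Jira's internal API:

      import java.util.List;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.TimeUnit;

      public class ParallelCatchUpSketch {
          // Hypothetical single-filter reindex step; Jira's real code walks all
          // shared entities sequentially on NodeReindexServiceThread:thread-0.
          static void reindexFilter(long filterId) { /* load + index one filter */ }

          public static void main(String[] args) throws InterruptedException {
              List<Long> filterIds = List.of(1L, 2L, 3L); // tens of thousands in practice
              ExecutorService pool = Executors.newFixedThreadPool(
                      Runtime.getRuntime().availableProcessors());
              for (long id : filterIds) {
                  pool.submit(() -> reindexFilter(id)); // each filter is independent
              }
              pool.shutdown();
              pool.awaitTermination(1, TimeUnit.HOURS);
          }
      }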

      Steps to Reproduce

      1. Create tens of thousands of filters (this can be scripted; see the sketch after this list)
      2. Either start Jira without an index or run a full reindex
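
      For step 1, one way to create filters in bulk is Jira's REST endpoint POST /rest/api/2/filter, sketched below with the Java 11 HttpClient. The base URL, credentials, filter count, and JQL are assumptions to adjust for your test instance:

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Base64;

      public class BulkFilterCreator {
          public static void main(String[] args) throws Exception {
              String baseUrl = "https://jira.example.com"; // assumption
              String credentials = Base64.getEncoder()
                      .encodeToString("admin:admin".getBytes()); // assumption
              HttpClient client = HttpClient.newHttpClient();

              for (int i = 0; i < 20_000; i++) {
                  // Each filter needs a unique name; the JQL itself is arbitrary.
                  String body = "{\"name\":\"perf-filter-" + i
                          + "\",\"jql\":\"order by created DESC\"}";
                  HttpRequest request = HttpRequest.newBuilder()
                          .uri(URI.create(baseUrl + "/rest/api/2/filter"))
                          .header("Authorization", "Basic " + credentials)
                          .header("Content-Type", "application/json")
                          .POST(HttpRequest.BodyPublishers.ofString(body))
                          .build();
                  client.send(request, HttpResponse.BodyHandlers.discarding());
              }
          }
      }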

      Expected Results

      The index snapshot restore happens quickly.

      Actual Results

      For customers with many filters, we see that the INDEX-FIXER stalls at 99% for several minutes. In this example, it takes almost 12 minutes (09:39:42 to 09:51:36):

      2024-07-13 09:39:42,696-0500 NodeReindexServiceThread:thread-0 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 99% complete... Done catching up
      2024-07-13 09:51:36,083-0500 NodeReindexServiceThread:thread-0 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 100% complete... Recovered all indexes
      

      During this time, NodeReindexServiceThread:thread-0 remains in this stack, loading the share permissions of one filter at a time from the database:

      (trimmed DB stack)
      com.atlassian.jira.diagnostic.connection.DiagnosticPreparedStatement.executeQuery(DiagnosticPreparedStatement.java:59)
      org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:543)
      org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:882)
      org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:862)
      org.ofbiz.core.entity.GenericDAO.selectByAnd(GenericDAO.java:734)
      org.ofbiz.core.entity.GenericHelperDAO.findByAnd(GenericHelperDAO.java:166)
      org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:913)
      org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:891)
      org.ofbiz.core.entity.GenericDelegator.findByAnd(GenericDelegator.java:860)
      com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:83)
      com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:62)
      com.atlassian.jira.sharing.OfBizSharePermissionStore.getSharePermissions(OfBizSharePermissionStore.java:46)
      com.atlassian.jira.sharing.CachingSharePermissionStore.loadSharePermissions(CachingSharePermissionStore.java:166)
      com.atlassian.jira.sharing.CachingSharePermissionStore$$Lambda$838/0x00007f3bf30e24b0.load(Unknown Source)
      com.atlassian.cache.ehcache.wrapper.ValueProcessorAtlassianCacheLoaderDecorator.load(ValueProcessorAtlassianCacheLoaderDecorator.java:26)
      com.atlassian.cache.ehcache.LoadingCache.getFromLoader(LoadingCache.java:174)
      com.atlassian.cache.ehcache.LoadingCache$$Lambda$567/0x00007f3c686ee058.apply(Unknown Source)
      com.atlassian.cache.ehcache.SynchronizedLoadingCacheDecorator.synchronizedLoad(SynchronizedLoadingCacheDecorator.java:29)
      com.atlassian.cache.ehcache.LoadingCache.loadValueAndReleaseLock(LoadingCache.java:142)
      com.atlassian.cache.ehcache.LoadingCache.get(LoadingCache.java:121)
      com.atlassian.cache.ehcache.DelegatingCache.get(DelegatingCache.java:107)
      com.atlassian.cache.impl.metrics.InstrumentedCache.get(InstrumentedCache.java:72)
      com.atlassian.jira.cache.stats.CacheWithStats.get(CacheWithStats.java:46)
      com.atlassian.jira.sharing.CachingSharePermissionStore.getSharePermissions(CachingSharePermissionStore.java:57)
      com.atlassian.jira.sharing.DefaultShareManager.getSharePermissions(DefaultShareManager.java:46)
      com.atlassian.jira.issue.search.DefaultSearchRequestManager.setSharePermissions(DefaultSearchRequestManager.java:324)
      com.atlassian.jira.issue.search.DefaultSearchRequestManager$$Lambda$843/0x00007f3bf2e728d0.apply(Unknown Source)
      com.atlassian.jira.util.collect.TransformingEnclosedIterable.lambda$foreach$0(TransformingEnclosedIterable.java:26)
      com.atlassian.jira.util.collect.TransformingEnclosedIterable$$Lambda$8219/0x00007f2db77bd0b0.consume(Unknown Source)
      com.atlassian.jira.util.collect.CollectionUtil.foreach(CollectionUtil.java:39)
      com.atlassian.jira.util.collect.CloseableIterator$Functions.foreach(CloseableIterator.java:50)
      com.atlassian.jira.ofbiz.DatabaseIterable.foreach(DatabaseIterable.java:39)
      com.atlassian.jira.util.collect.TransformingEnclosedIterable.foreach(TransformingEnclosedIterable.java:26)
      com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndex(DefaultSharedEntityIndexManager.java:124)
      com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:93)
      com.atlassian.jira.sharing.index.DefaultSharedEntityIndexManager.reIndexAll(DefaultSharedEntityIndexManager.java:83)
      com.atlassian.jira.index.ha.DefaultIndexRecoveryManager$ReplaceIndexRunner.run(DefaultIndexRecoveryManager.java:503)
      com.atlassian.jira.util.thread.JiraThreadLocalUtils.lambda$wrap$1(JiraThreadLocalUtils.java:156)
      com.atlassian.jira.util.thread.JiraThreadLocalUtils$$Lambda$928/0x00007f3bf22e8840.run(Unknown Source)
      com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock(DefaultIndexManager.java:432)
      jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.23/Native Method)
      jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.23/NativeMethodAccessorImpl.java:62)
      jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.23/DelegatingMethodAccessorImpl.java:43)
      java.lang.reflect.Method.invoke(java.base@11.0.23/Method.java:566)
      com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
      com.sun.proxy.$Proxy71.withReindexLock(Unknown Source)
      com.atlassian.jira.index.ha.DefaultIndexRecoveryManager.recoverIndexFromBackup(DefaultIndexRecoveryManager.java:171)
      ...
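
      The stack shows the cost structure: DefaultSharedEntityIndexManager.reIndex iterates the filters one at a time, and each filter's share permissions miss the cache and trigger a separate findByAnd query against the sharepermissions table. With tens of thousands of cold-cached filters, that is one DB round trip per filter on a single thread. A minimal sketch of this per-entity load pattern, with hypothetical names:

      import java.util.List;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      public class PerFilterLoadSketch {
          // Hypothetical cache keyed by filter id; every miss costs one DB round trip.
          static final Map<Long, List<String>> PERMISSION_CACHE = new ConcurrentHashMap<>();

          // Stands in for OfBizSharePermissionStore.getSharePermissions: one
          // SELECT against the sharepermissions table per filter.
          static List<String> loadFromDatabase(long filterId) {
              return List.of();
          }

          static void index(long filterId, List<String> permissions) {
              /* write one document to the shared entity index */
          }

          static void reindexAll(List<Long> filterIds) {
              // Single-threaded loop, like NodeReindexServiceThread:thread-0 above.
              for (long id : filterIds) {
                  List<String> permissions = PERMISSION_CACHE.computeIfAbsent(
                          id, PerFilterLoadSketch::loadFromDatabase);
                  index(id, permissions);
              }
          }
      }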
      

      Also, because the reindex runs under com.atlassian.jira.issue.index.DefaultIndexManager.withReindexLock, we see several threads time out while waiting for the index lock:

      2024-07-13 09:39:56,017-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
      com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
      	at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:966)
      2024-07-13 09:39:56,098-0500 https-jsse-nio-8443-exec-209 url: /rest/greenhopper/1.0/sprintquery/58919; user: myuser ERROR myuser 579x13391640x2 2664fv 10.0.0.10 /rest/greenhopper/1.0/sprintquery/1234 [c.a.p.r.c.error.jersey.ThrowableExceptionMapper] Uncaught exception thrown by REST service: null
      com.atlassian.jira.issue.index.SearchUnavailableException
      	at com.atlassian.jira.issue.index.DefaultIndexManager.getEntitySearcher(DefaultIndexManager.java:1021)
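
      The 30000 millisecond figure matches a lock acquisition with a fixed timeout. A minimal sketch of the pattern, assuming a plain java.util.concurrent lock rather than Jira's internal implementation:

      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.ReentrantReadWriteLock;

      public class IndexLockSketch {
          static final ReentrantReadWriteLock INDEX_LOCK = new ReentrantReadWriteLock();

          // Reader threads (e.g. REST requests) give up after 30 seconds, mirroring
          // the "Wait attempt timed out - waited 30000 milliseconds" error above.
          static void search() throws InterruptedException {
              if (!INDEX_LOCK.readLock().tryLock(30_000, TimeUnit.MILLISECONDS)) {
                  throw new IllegalStateException("Wait attempt timed out");
              }
              try {
                  // run the search against the index
              } finally {
                  INDEX_LOCK.readLock().unlock();
              }
          }

          // The recovery thread holds the write lock for the whole catch-up,
          // analogous to DefaultIndexManager.withReindexLock.
          static void recoverIndex(Runnable catchUp) {
              INDEX_LOCK.writeLock().lock();
              try {
                  catchUp.run();
              } finally {
                  INDEX_LOCK.writeLock().unlock();
              }
          }
      }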
      

      On other nodes, we can see that index replication fails as a result:

      2024-07-13 09:43:23,552-0500 localq-reader-111 WARN      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Runtime exception: UnrecoverableFailure occurred when processing: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1, error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: 
          	java.net.SocketTimeoutException: Read timed out
      2024-07-13 09:43:23,552-0500 localq-reader-111 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.cluster.dbr.DBRMessage', action=PUT, key=DBR, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1720881798527} from cache replication queue: [queueId=queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/opt/atlassian/application-data/jira/localq/queue_node2_0_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: 
          	java.net.SocketTimeoutException: Read timed out
      com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: 
      	java.net.SocketTimeoutException: Read timed out
      	at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)
      	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:96)
      ...
      Caused by: java.net.SocketTimeoutException: Read timed out
      	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
      	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
      

      Workaround

      Currently, there is no known workaround for this behavior. A workaround will be added here when available.

      As mitigation, you can:

      • Make sure the database indexes on the sharepermissions table are healthy and optimized (see the sketch after this list);
      • Avoid running a full reindex during business hours.
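
      For the first mitigation, the relevant table is sharepermissions, which the stack above queries through OfBizSharePermissionStore.getSharePermissions. A minimal JDBC sketch for listing its indexes on PostgreSQL (the pg_indexes view is PostgreSQL-specific, and the connection details are assumptions; adapt for other databases):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;
      import java.sql.ResultSet;

      public class SharePermissionsIndexCheck {
          public static void main(String[] args) throws Exception {
              // Adjust URL and credentials for your Jira database (assumptions).
              try (Connection conn = DriverManager.getConnection(
                           "jdbc:postgresql://dbhost:5432/jiradb", "jira", "secret");
                   PreparedStatement ps = conn.prepareStatement(
                           "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = ?")) {
                  ps.setString(1, "sharepermissions");
                  try (ResultSet rs = ps.executeQuery()) {
                      while (rs.next()) {
                          System.out.println(rs.getString("indexname")
                                  + " -> " + rs.getString("indexdef"));
                      }
                  }
              }
          }
      }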
