- Bug
- Resolution: Fixed
- Low
- 6.4.14, 7.2.8, 7.5.2, 7.3.9, 7.13.0, 7.12.2, 7.6.10
- 6.04
- 16
- Severity 2 - Major
- 14
Summary
When an exception occurs during reindexing, JIRA Data Center skips the remaining replication operations in the current batch and still marks them as applied.
Environment
- JIRA Data Center
Steps to Reproduce
- Set up a JIRA Data Center cluster (node1, node2)
- Make a modification on node1 (create, delete, or modify an issue) that causes an exception on node2 (the exact trigger has not been clearly identified)
- Check the Lucene index status on node2
Expected Results
Node1 and Node2 have the same number of elements in Lucene and are in the same state.
Actual Results
Node1 and Node2 have a different number of elements in Lucene.
Example from a customer's case:
Node | Issue count |
---|---|
Node1 | 1506680 |
Node2 | 1506567 |
Node3 | 1506666 |
The error may appear in the logs. If it does, the log contains a line with "Error re-indexing node changes":
2017-01-08 00:34:24,690 NodeReindexServiceThread:thread-1 ERROR [jira.index.ha.DefaultNodeReindexService] Error re-indexing node changes
com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, NODE_ID, SENDING_NODE_ID, INDEX_OPERATION_ID FROM nodeindexcounter WHERE (NODE_ID = ? ) AND (SENDING_NODE_ID = ? ) (Query execution was interrupted)
    at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:127)
    at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:106)
    at com.atlassian.jira.index.ha.OfBizNodeIndexCounterStore.getIndexOperationCounterForNodeId(OfBizNodeIndexCounterStore.java:109)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.getCurrentIndexCount(DefaultNodeReindexService.java:324)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:280)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.access$000(DefaultNodeReindexService.java:58)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService$1.run(DefaultNodeReindexService.java:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Notes
If the log contains errors with the "Error re-indexing node changes" line, the node has very likely skipped some operations from the replication table.
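One way to check a node for this symptom is to scan its log for the marker text. The following is a minimal sketch only: the class and method names are hypothetical and not part of JIRA, and it assumes the log file can be read line by line as text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of JIRA): finds log lines that hint at skipped replication operations. */
class ReindexErrorScanner {
    // Marker text taken from the log excerpt in this issue.
    static final String MARKER = "Error re-indexing node changes";

    /** Returns every line that contains the marker, in the original order. */
    static List<String> findReindexErrors(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path to the node's log file as the only argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findReindexErrors(lines).forEach(System.out::println);
    }
}
```

Any hit only indicates that operations were probably skipped; it does not say which ones, so the node still needs one of the workarounds.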
Note on fix
We introduced a retry mechanism that simply stores a list of the failed operations and retries them, without any modifications, at a later time. See Comment for more details on the fix.
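The store-and-retry idea can be illustrated with a small sketch. This is not JIRA's actual implementation: the class `ReplicationRetrySketch`, its methods, and the assumption that failures are detected per operation are all hypothetical and exist only to show the pattern.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a store-and-retry loop; not JIRA's real code. */
class ReplicationRetrySketch<T> {
    // Applies one replicated operation to the local index; may throw on failure.
    private final Consumer<T> applier;
    // Failed operations, kept unmodified until a later retry succeeds.
    private final Deque<T> failed = new ArrayDeque<>();

    ReplicationRetrySketch(Consumer<T> applier) {
        this.applier = applier;
    }

    /** Applies each operation; a failure is recorded instead of silently skipping the operation. */
    void applyBatch(List<T> batch) {
        for (T op : batch) {
            try {
                applier.accept(op);
            } catch (RuntimeException e) {
                failed.addLast(op);
            }
        }
    }

    /** Retries each stored operation once; operations that fail again stay queued. */
    void retryFailed() {
        int pending = failed.size();
        for (int i = 0; i < pending; i++) {
            T op = failed.pollFirst();
            try {
                applier.accept(op);
            } catch (RuntimeException e) {
                failed.addLast(op);
            }
        }
    }

    /** Snapshot of the operations still waiting for a successful retry. */
    List<T> pendingRetries() {
        return new ArrayList<>(failed);
    }
}
```

In this sketch `retryFailed()` would be called by a scheduler at some later time. The key design point is that a failed operation is never counted as applied until it has actually succeeded.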
Workaround
- Run a full reindex
- Or replicate the index from another node:
  - Run the health check (or an empty JQL query) and get the status for each node
  - Identify the node(s) with the maximum difference; these nodes need the Lucene index copied from another node:
    - Remove the node from the load balancer
    - Copy the index from another node to the current node
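The comparison step of the workaround can be sketched as follows. This is a hedged illustration: the class `IndexCountComparison` is hypothetical, the counts are entered by hand from the health check or empty JQL search, and treating the largest count as the reference is an assumption (skipped delete operations could leave a node with too many documents instead).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of JIRA): compares per-node issue counts. */
class IndexCountComparison {
    /**
     * Returns, for each node, how many issues it lacks compared with the
     * largest count seen. Assumption: the largest count is the reference.
     */
    static Map<String, Long> deficits(Map<String, Long> issueCountByNode) {
        long max = issueCountByNode.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
        Map<String, Long> result = new LinkedHashMap<>();
        issueCountByNode.forEach((node, count) -> result.put(node, max - count));
        return result;
    }

    public static void main(String[] args) {
        // Counts from the example table in this issue.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Node1", 1506680L);
        counts.put("Node2", 1506567L);
        counts.put("Node3", 1506666L);
        // Prints {Node1=0, Node2=113, Node3=14}
        System.out.println(deficits(counts));
    }
}
```

With the example counts, Node2 shows the largest gap and would be the first candidate for an index copy.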
- is related to
  - JRASERVER-63584 JIRA encounters problems when database transactions are rolled back (Closed)
  - JRASERVER-62181 Improve logging for NodeReindexServiceThread for Datacenter (Closed)
  - JRASERVER-67173 Improve logging for DefaultNodeReindexService and add progress status (Closed)
- relates to
  - JRASERVER-68400 Jira Data Center local indexes get inconsistent over time (Gathering Impact)
  - GRD-907
  - PSR-122
- was cloned as
  - JRASERVER-68528 Make Jira DC replication event retry mechanism more resilient (Gathering Interest)