JRASERVER-64922

JIRA Data Center will skip replication operations in case of index exception


Details

    • 6.04
    • 16
    • Severity 2 - Major
    • 14
      Atlassian Update – 21 December 2018

      Dear Jira users,

      We’re glad to announce that this issue will be addressed in our upcoming 8.0 release.

      You can find more details about our 8.0 beta release here — https://community.developer.atlassian.com/t/beta-for-jira-8-0-is-up-for-grabs/25588

      Looking forward to your feedback!

      Kind regards,
      Syed Masood
      Product Manager, Jira Server and Data Center


    Description

      Summary

      If an exception occurs during reindexing, JIRA Data Center skips the replication operations in the current batch and marks them as applied.

      Environment

      • JIRA Data Center

      Steps to Reproduce

      1. Set up a JIRA Data Center cluster (node1, node2).
      2. Make a modification (create, delete, or modify an issue) on node1 that causes an exception on node2 (the exact trigger is not clearly identified).
      3. Check the Lucene index status on node2.

      Expected Results

      Node1 and Node2 have the same number of elements in Lucene and the same state.

      Actual Results

      Node1 and Node2 have a different number of elements in Lucene.
      Example from a client's case:

      Node    Issue count
      Node1   1506680
      Node2   1506567
      Node3   1506666

      The error may appear in the logs. Look for a line containing "Error re-indexing node changes":

      2017-01-08 00:34:24,690 NodeReindexServiceThread:thread-1 ERROR      [jira.index.ha.DefaultNodeReindexService] Error re-indexing node changes
      com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: SQL Exception while executing the following:SELECT ID, NODE_ID, SENDING_NODE_ID, INDEX_OPERATION_ID FROM nodeindexcounter WHERE (NODE_ID =  ? ) AND (SENDING_NODE_ID =  ? ) (Query execution was interrupted)
      	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findByAnd(DefaultOfBizDelegator.java:127)
      	at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findByAnd(WrappingOfBizDelegator.java:106)
      	at com.atlassian.jira.index.ha.OfBizNodeIndexCounterStore.getIndexOperationCounterForNodeId(OfBizNodeIndexCounterStore.java:109)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.getCurrentIndexCount(DefaultNodeReindexService.java:324)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:280)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService.access$000(DefaultNodeReindexService.java:58)
      	at com.atlassian.jira.index.ha.DefaultNodeReindexService$1.run(DefaultNodeReindexService.java:82)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      Notes

      If the log contains errors with the line "Error re-indexing node changes", the node has very likely skipped some operations from the replication table.
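
      To check a node for this symptom, the log can be scanned for the marker line. The following is a minimal sketch, not an Atlassian tool: the class name ReindexErrorScan and the default log path are assumptions made for illustration, and only the marker text comes from the log excerpt above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jira.
final class ReindexErrorScan {
    // Marker text taken from the log excerpt in this issue.
    static final String MARKER = "Error re-indexing node changes";

    /** Returns the lines that contain the marker, in their original order. */
    static List<String> findErrorLines(Stream<String> lines) {
        return lines.filter(line -> line.contains(MARKER)).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass the real log location as the first argument.
        Path log = Path.of(args.length > 0 ? args[0] : "atlassian-jira.log");
        try (Stream<String> lines = Files.lines(log)) {
            List<String> hits = findErrorLines(lines);
            System.out.println(hits.size() + " matching line(s)");
            hits.forEach(System.out::println);
        }
    }
}
```

      A non-zero count on a node suggests, but does not prove, that its index has drifted. Compare issue counts across nodes to confirm.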

      Note on fix

      We introduced a retry mechanism that stores a list of the failed operations and retries them, without any modifications, at a later time. See Comment for more details on the fix.
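
      The idea of the retry mechanism can be illustrated as follows. This is a sketch under stated assumptions and not the actual Jira code: the class RetryingReplicator, its method names, and the use of a generic Consumer as the "apply one operation" step are all hypothetical. It only shows the behaviour described above, where a failed operation is kept and retried unchanged instead of being marked as applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; not the class used in the Jira fix.
final class RetryingReplicator<T> {
    private final Consumer<T> applier;               // applies one operation; may throw
    private final Deque<T> failed = new ArrayDeque<>();

    RetryingReplicator(Consumer<T> applier) {
        this.applier = applier;
    }

    /** Applies each operation; a failure is recorded instead of aborting the batch. */
    void applyBatch(List<T> batch) {
        for (T op : batch) {
            tryApply(op);
        }
    }

    /** Retries the previously failed operations unchanged; returns how many still fail. */
    int retryFailed() {
        List<T> pending = new ArrayList<>(failed);
        failed.clear();
        for (T op : pending) {
            tryApply(op);
        }
        return failed.size();
    }

    int failedCount() {
        return failed.size();
    }

    private void tryApply(T op) {
        try {
            applier.accept(op);
        } catch (RuntimeException e) {
            failed.add(op);                          // keep the operation for a later retry
        }
    }
}
```

      In this sketch a retried operation is applied after the rest of its batch, so ordering is not preserved. How the real fix handles ordering is not stated here.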

      Workaround

      • Run a full reindex.
      • Or replicate the index from another node:
        1. Run the health check (or an empty JQL search) and get the status of each node.
        2. Identify the node(s) with the maximum difference. These nodes need the Lucene index copied from another node:
          1. Remove the node from the load balancer (LB).
          2. Copy the index from another node to the current node.
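
      The comparison step can be sketched as below, using the issue counts from the client example. The class IndexDrift and the choice of the highest count as the reference are assumptions for illustration. The node with the highest count is not guaranteed to be correct (for example, if it skipped deletions), so treat the result as a hint only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jira.
final class IndexDrift {
    /** Returns, per node, how far its count is below the highest count seen. */
    static Map<String, Long> deficits(Map<String, Long> countsByNode) {
        long max = countsByNode.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
        Map<String, Long> result = new LinkedHashMap<>();
        countsByNode.forEach((node, count) -> result.put(node, max - count));
        return result;
    }

    public static void main(String[] args) {
        // Counts copied from the example in this issue.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Node1", 1506680L);
        counts.put("Node2", 1506567L);
        counts.put("Node3", 1506666L);
        deficits(counts).forEach((node, d) ->
                System.out.println(node + " is behind by " + d));
    }
}
```

      With the example counts, Node2 is behind by 113 and Node3 by 14, so Node2 would be the first candidate for an index copy.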


            People

              drauf Daniel Rauf
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]