  Jira Data Center / JRASERVER-72944

Restoring an index snapshot after a full re-index might trigger the index fixer, delaying the node startup

    • Affects Version/s: 8.1
    • Severity 3 - Minor
      Atlassian Update – 21 Mar 2022

      Hi everyone,

      In Jira 9.0 we assign versions to each entity (issue, comment, worklog, and change history). After upgrading to this version, all missing versions will be added as part of an upgrade task.

      We have also changed the behaviour of creating index snapshots. If a Jira instance does not have a consistent index, it will not create a snapshot, so a snapshot with a corrupted index will not be placed in the shared home directory.

      More details on how to handle an inconsistent index can be found here: https://confluence.atlassian.com/jirakb/indexing-inconsistency-troubleshooting-1114800953.html

      Thank you,
      Michał Błajet,
      Jira Developer


      Issue Summary

      When restoring the index from another node, the index-fixer process checks the last 24 hours for issues that aren't in the index yet and re-indexes them.

      When the index snapshot comes from a node that has just run a full locked re-index, the index-fixer will re-index all issues updated in the last 24 hours in a single-threaded process.
      While this process runs, index replication is kept on hold.
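
      To get a rough idea of how much work the index-fixer will have to do, you can count how many entity versions were updated in the last 24 hours. The sketch below is an assumption-based example: it uses the issue_version, comment_version and worklog_version tables referenced in Workaround 1 below, and PostgreSQL date arithmetic (adjust the interval syntax for your database):

      -- Rough estimate of the entity versions the index-fixer would re-process (PostgreSQL syntax assumed)
      select count(*) from issue_version   where update_time > now() - interval '24 hours';
      select count(*) from comment_version where update_time > now() - interval '24 hours';
      select count(*) from worklog_version where update_time > now() - interval '24 hours';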
       

      Steps to Reproduce

      1. Stop all nodes
      2. Start 1 node
      3. Run a full locked reindex
      4. Start a second node

      Expected Results

      After getting the index snapshot from the first node, the second node should start operating normally within a short time.

      Actual Results

      If there were many issue updates in the last 24 hours, the node becomes usable, but index replication takes a long time to start.

      During this time, the index-fixer process appears stuck at 60% in the atlassian-jira.log:

      2021-10-22 05:58:41,181 ClusterMessageHandlerServiceThread:thread-1 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering search indexes - 60% complete... [INDEX-FIXER] Re-indexing issues modified in the last {1 days, 0 hours, 2 minutes, and 38 seconds}. (Versioning short-circuit checks are enabled.)
      

      Workaround 1

      Before restoring the index, make sure the update time of every entity version is older than the index snapshot creation time minus 24 hours.
      Example SQL:

      update issue_version set update_time = 'epoch';
      update comment_version set update_time = 'epoch';
      update worklog_version set update_time = 'epoch';
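
      To verify this precondition before restoring the index, you can check that no entity version is newer than the snapshot creation time minus 24 hours. This is a minimal sketch, not part of the original workaround: '<snapshot creation time>' is a placeholder you must replace, and the timestamp/interval arithmetic assumes PostgreSQL:

      -- Each query should return 0; a non-zero count means the index-fixer would still re-process those entities
      select count(*) from issue_version   where update_time > timestamp '<snapshot creation time>' - interval '24 hours';
      select count(*) from comment_version where update_time > timestamp '<snapshot creation time>' - interval '24 hours';
      select count(*) from worklog_version where update_time > timestamp '<snapshot creation time>' - interval '24 hours';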

            Change History

            Rodrigo Martinez made changes -
            Remote Link Original: This issue links to "DELTA-1140 (Bulldog)" [ 593702 ] New: This issue links to "DELTA-1140 (JIRA Server (Bulldog))" [ 593702 ]
            Antoni Kowalski made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 794643 ]
            Alp made changes -
            Description edited: removed the "Workaround 2" section, leaving only Workaround 1.
            Devisree Gedda made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 756142 ]
            Filipi Lima made changes -
            Affects Version/s New: 8.20.10 [ 100697 ]
            Affects Version/s New: 8.20.14 [ 103090 ]
            Kunal Kishore made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 731060 ]
            Alp made changes -
            Description edited: renamed the existing "Workaround" section to "Workaround 1" and added a "Workaround 2" section with the steps below. They help you start a new Jira node faster, but you will still need to run a locked re-index before you put the node behind the load balancer:
             * Disable Index Recovery by navigating to *System > Indexing (in the Advanced section)* and setting *Enable Index Recovery* OFF.
             * Move the index snapshots that you currently have from *JIRA_INSTALL/temp/* (on Node1) and *SHARED_FOLDER/export/indexsnapshots/* to another folder, so that the new node does not find and restore them on startup.
             * Copy the Jira install and home directories to the new node.
             * Edit the *cluster.properties* file in the local home directory and change *jira.node.id* to a new, unique identifier.
             * Go to the *Jira_Home/caches* folder +on the new node+ and delete *indexesV1*.
             * Start the new node.
             * Log in to Jira on the new node and run a locked re-index.
             * Once the indexing is done, put the node behind the load balancer.
             * Enable Index Recovery again.
            Maciej Swinarski (Inactive) made changes -
            Assignee New: Maciej Swinarski [ mswinarski ]
            Mila made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 658307 ]
            Maciej Swinarski (Inactive) made changes -
            Description edited: replaced "Currently there is no known workaround for this behavior" with the Workaround section and its example SQL, as shown in the description above.

              Assignee: Maciej Swinarski (Inactive) [mswinarski]
              Reporter: Allan Gandelman
              Affected customers: 6
              Watchers: 28
