Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-76237

Index catch-up of Comments and Worklogs isn't multi-threaded during Jira startup

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Low
    • None
    • 9.3.0, 9.4.6, 9.4.9
    • None

    Description

      Issue Summary

      With the publishing of JRASERVER-74232, Jira 9.3+ now catches-up the Index changes of Issues through multiple Threads during startup.
      It makes use of the same Thread pool and batch configurations as in a Full Reindex, considerably improving startup time.

      Comments and Worklogs Index catch-up, though, are still processed in a single-thread during startup. For some instances, this still may account for a lot of startup time.

      This particularly affects customers who recycle nodes through Docker or Kubernetes, as the local indexes are usually wiped clean and the nodes rely on shared home Index Snapshots during startup.

      This is reproducible on Data Center: yes

      Steps to Reproduce

      1. Have the cluster generate an Index Snapshot to the shared home
      2. Add hundreds of thousands of comments to thousands of Issues
      3. Startup (or restart) a Jira node with no local Index folders
      4. Monitor how long it takes to catch-up with the changes since the last available Index Snapshot during startup

      Expected Results

      Startup will be quick, as since JRASERVER-74232 (9.3.0) the catch-up is multi-threaded.

      Actual Results

      We observe the Issue changes catch-up since the last Snapshot are multi-threaded, but the Comments and Worklog catch-up are single-threaded, happening on the main Thread.

      During startup, we can see a pattern similar to this in the atlassian-jira.log:

      grep -e "exclusive use" -e "Warmed cache" application-logs/atlassian-jira.log
      
      2023-08-25 20:43:27,223 main INFO      [c.a.jira.startup.JiraHomeStartupCheck] The jira.home directory '/var/atlassian/application-data/jira' is validated and locked for exclusive use by this instance.
      2023-08-25 21:48:55,366 Caesium-1-3 INFO Anonymous user     [c.a.jira.startup.CacheWarmerLauncher] Warmed cache(s) in 16907 ms.
      

      Start command was issued at 20:43:27 and startup was completed at 21:48:55, in this example.

      If we search for INDEX-FIXER, we'll be able to see the progress and spot a long time on either Comments or Worklogs:

      grep "INDEX-FIXER" application-logs/atlassian-jira.log 
      
      2023-08-25 20:52:28,001 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] We will restore index snapshot with 6284212 issues
      2023-08-25 20:52:28,014 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring search indexes from snapshot and catching up with changes - 1% complete... Starting
      2023-08-25 20:52:28,014 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Starting the index recovery process by replacing existing indexes. For the whole time of recovering the index (restoring and catch up) we will be holding the index write lock
      2023-08-25 20:52:28,152 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Done replacing existing indexes. We will continue with the index catch up.
      2023-08-25 20:52:28,153 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Restoring search indexes - 30% complete... Restored index backup
      2023-08-25 20:52:28,525 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Latest index date: {2023-08-25 01:00:00}, Latest DB issue-version date: {2023-08-25 16:52:28}
      2023-08-25 20:52:28,526 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Re-indexing issues updated in the last {1 days, 19 hours, 52 minutes, and 28 seconds}. (Note: it's an intentionally wider range)
      2023-08-25 20:52:42,209 IssueIndexer:thread-41 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 31% complete... Processing issue changes is 1% complete
      
      *** lines removed for readability ***
      
      2023-08-25 20:53:57,331 IssueIndexer:thread-14 INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 80% complete... Processing issue changes is 99% complete
      2023-08-25 21:45:04,959 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 85% complete... Reindex of comments completed.
      2023-08-25 21:47:25,051 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Updating index with recent changes - 90% complete... Reindex of worklogs completed.
      2023-08-25 21:47:25,052 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Catch up - 95% complete... Done catching up. Potentially outdated issues were updated in the index. Any deleted issues (plus comments & worklogs) in this range were removed from the index. 143614 issues, 358054 comments and 13267 worklogs scanned. 81757 issues, 195206 comments and 8059 worklogs reindexed. 4 issues, 475 comments and 10 worklogs deindexed. It took 4.343 s to read from DB, 54.87 min to reindex, 55.85 ms to deindex, 54.94 min in total
      2023-08-25 21:47:25,053 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 99% complete... Done catching up
      2023-08-25 21:47:32,597 main INFO      [c.a.j.index.ha.DefaultIndexRecoveryManager] [INDEX-FIXER] Recovering - 100% complete... Recovered all indexes
      

      In this example:

      • The total number of Issues in the instance is 6284212
      • Issue Index catch-up was done in ~2 minutes (from 20:52:42 to 20:53:57)
      • Comments catch up was done in ~ 52 minutes (from 20:53:57 to 21:45:04)
      • Worklog catch-up was done in ~2 minutes (from 21:45:04 to 21:47:25)
      • 143614 issues, 358054 comments and 13267 worklogs scanned
      • 81757 issues, 195206 comments and 8059 worklogs reindexed

      Workaround

      1. Having a very recent Index Snapshot

      Having an Index Snapshot taken as close as possible to the startup would mitigate the startup delay, as Jira would have less changes to catch-up to.

      Since 9.7 Jira has the One-Click Index Snapshot feature (JRASERVER-66510) that allows an Admin to create an Index Snapshot at any moment — without the need to update the scheduler and wait for the next interval.
      As for the Long-Term-Support (LTS) releases, it'll be available on 9.12.

      2. Optimizing DB speed

      Making sure the DB tables for Comments and Worklogs are optimized is also a mitigation for this:

      jiraaction
      comment_version
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              rmartinez3@atlassian.com Rodrigo Martinez
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: