CONFSERVER-81283 (Confluence Data Center)

Confluence reindexing progress appears to stall at less than 100% when rebuilding index from scratch


Details

    Description

      Issue Summary

      When Confluence indexes are rebuilt from scratch, and reindexing progress is tracked either in the application logs (atlassian-confluence.log or atlassian-confluence-index.log) or via <Confluence-URL>/json/reindextaskprogress.action, progress can appear to stall at a number less than 100%, with the indexed content count just short of the total indexable content count.
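
      For reference, the progress action named above can be polled over HTTP. The sketch below only builds the URL; the base URL, credentials, and the progress_url helper name are illustrative, not part of Confluence:

```shell
# Hypothetical helper: append the reindex progress action path (named in this
# issue) to a Confluence base URL, tolerating a trailing slash.
progress_url() {
  printf '%s/json/reindextaskprogress.action\n' "${1%/}"
}

# On a real instance, poll it with admin credentials, e.g.:
#   curl -s -u "$ADMIN_USER:$ADMIN_PASS" "$(progress_url "$BASE_URL")"
progress_url "https://confluence.example.com/"
```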

      Steps to Reproduce

      Manually corrupt attachments or other indexable data on a Confluence instance, then rebuild the index from scratch.
      Content that fails reindexing produces a measurable delta between the indexed content count and the total indexable content count.

      Expected Results

      If the reindexing of specific content or attachments encounters errors, the count of objects whose indexing has completed may fall short of the count of total indexable objects.
      The app should handle such situations: tally the objects that failed to be indexed and accurately report the reindexing attempt as 100% complete (when there is nothing else left to reindex).
      When there are no errors and every searchable object has been indexed, the app accurately reports completion like this:

      YYYY-MM-DD HH:MM:SS,mmm DEBUG [Indexer: 1] [internal.index.lucene.LuceneBatchIndexer] lambda$index$0 Re-index progress: 100% complete. 11332 items have been reindexed
      
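      The tallying described above can be sketched with made-up counts (only the 11332 total echoes the sample log line): once failed objects are counted as processed, the reported percentage reaches 100% when nothing remains to attempt.

```shell
# Illustrative arithmetic, not Confluence code: count objects that failed to
# index as processed so the reported completion can reach 100%.
total=11332      # total indexable objects (from the sample log line above)
indexed=11032    # objects indexed successfully (made-up count)
failed=300       # objects that errored during reindexing (made-up count)

processed=$((indexed + failed))
pct=$((processed * 100 / total))
echo "Re-index progress: ${pct}% complete (${indexed} indexed, ${failed} failed)"
```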

      Actual Results

      When some objects fail to be indexed (for example, because of data corruption or errors/exceptions encountered during their reindexing), Confluence currently fails to account for them when tallying the numbers and reports that less than 100% of the reindexing has completed:

      • Example 1 (99% complete):
        YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 3] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 6281468 of 6281768. 99% complete. Memory usage: 18 GB free, 31 GB total
        ...
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [confluence.internal.index.ConcurrentBatchIndexer] index Shutting down indexing thread pool
        
      • Example 2 (81% complete):
        YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 22] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 4968955 of 6132274. 81% complete. Memory usage: 3 GB free, 12 GB total
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [confluence.internal.index.ConcurrentBatchIndexer] index Shutting down indexing thread pool
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex group 1/1 completed for USER_ONLY, 81% complete
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex completed for USER_ONLY, 81% complete, start cleaning up files
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex cleanup completed for USER_ONLY
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] reIndex Indexing completed for stage USER_ONLY
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [Caesium-1-1] [index.lucene.snapshot.DefaultLuceneIndexSnapshotManager] create Creating a new snapshot for index main_index
        

      The same stats are seen when <Confluence-URL>/json/reindextaskprogress.action is accessed.
      This gives the inaccurate impression that reindexing is still in progress, or has stalled, and that a small percentage of indexable content is still pending processing.
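
      The shortfall shown in the examples above can be measured directly from a LuceneBatchIndexer progress line. A small awk sketch (the sample line below is abridged from Example 1):

```shell
# Parse "Re-index progress: <indexed> of <total>." from a progress log line
# and print how many objects never finished indexing.
line='YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 3] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 6281468 of 6281768. 99% complete.'

echo "$line" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i == "progress:") { indexed = $(i + 1); total = $(i + 3) }
  sub(/\.$/, "", total)   # drop the period trailing the total count
  printf "indexed=%d total=%d missing=%d\n", indexed, total, total - indexed
}'
```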

      Workaround

      To determine whether reindexing is still in progress (regardless of whether it finished successfully or was terminated abnormally, for example by an out-of-memory error), check for the following evidence:

      • indirect evidence (optional check)
        Is the outcome of the regularly scheduled incremental/background indexing job, run by one of the Caesium threads, still being reported as ABORTED with the message Flushing or indexing is already in progress? For example, atlassian-confluence.log showing the scheduled job LuceneIndexFlusher ABORTED because "Flushing or indexing is already in progress":
        YYYY-MM-DD HH:MM:05,572 WARN [Caesium-1-2] [impl.schedule.caesium.JobRunnerWrapper] runJob Scheduled job LuceneIndexFlusher#LuceneIndexFlusher completed unsuccessfully with response JobRunnerResponse[runOutcome=ABORTED,message='Flushing or indexing is already in progress']
        


        and

      • direct evidence (mandatory check)
        Do the latest thread dumps show the presence of threads named Indexer: N?
        1. extract the thread dumps,
        2. navigate to the directory containing the extracted thread dumps and search for the pattern Indexer: <one or more digits>:
          linux sample
          # listing the extracted thread dump files
          ➜ ls
          thread_dump_1.txt
          thread_dump_2.txt
          thread_dump_3.txt
          thread_dump_4.txt
          thread_dump_5.txt
          thread_dump_6.txt
          
          # grepping the above thread dumps for the presence of Indexer threads
          # -E use extended regular expressions (egrep is deprecated)
          # -o print only the matched part
          # -h suppress printing of filenames
          # -a process binary files as text
          # -i ignore case
          # the thread_dump_*.txt glob limits the search to the dump files
          ➜ grep -Eohai 'Indexer: [[:digit:]]+' thread_dump_*.txt | sort -nk2 | uniq
          Indexer: 1
          Indexer: 2
          Indexer: 3
          Indexer: 4
          


      If the data collected matches the samples listed above, then reindexing is still in progress: Caesium-based incremental/background indexing jobs abort when a reindex is already in progress, and Indexer: N threads only appear during reindexing attempts.
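
      The two checks above can be combined into one small script. A minimal sketch: the reindex_in_progress function name and the synthetic sample files are illustrative, not Confluence tooling.

```shell
# Minimal sketch combining both workaround checks; the function name and the
# demonstration files below are illustrative, not part of Confluence.
reindex_in_progress() {
  log="$1"; dumpdir="$2"
  # Direct (mandatory) evidence: "Indexer: N" threads present in the dumps.
  if grep -Eqa 'Indexer: [0-9]+' "$dumpdir"/* 2>/dev/null; then
    echo "reindex likely in progress"
  else
    echo "no Indexer threads found"
  fi
  # Indirect (optional) evidence: how many Caesium flush jobs were aborted.
  grep -c 'Flushing or indexing is already in progress' "$log" 2>/dev/null || true
}

# Demonstration on synthetic files mimicking the samples above.
tmp=$(mktemp -d)
printf '"Indexer: 3" #87 prio=5 RUNNABLE\n' > "$tmp/thread_dump_1.txt"
printf "ABORTED,message='Flushing or indexing is already in progress'\n" > "$tmp/app.log"
reindex_in_progress "$tmp/app.log" "$tmp"
rm -rf "$tmp"
```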

      People

        rlau@atlassian.com Richard Lau
        5c3a8aca27ce Mohit Sharma