CONFSERVER-81283 (Confluence Data Center)

Confluence reindexing progress appears to stall at less than 100% when rebuilding index from scratch


Details

    Description

      Issue Summary

      When Confluence indexes are rebuilt from scratch, and reindexing progress is tracked either in the application logs (atlassian-confluence.log or atlassian-confluence-index.log) or via <Confluence-URL>/json/reindextaskprogress.action, progress can appear to stall at a number less than 100%, with the indexed content count just short of the total indexable content count.
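
      For reference, the progress action named above can be polled over HTTP. The sketch below only builds the URL; the base URL, credentials, and the progress_url helper name are illustrative, not part of Confluence:

```shell
# Hypothetical helper: append the reindex progress action path (named in this
# issue) to a Confluence base URL, tolerating a trailing slash.
progress_url() {
  printf '%s/json/reindextaskprogress.action\n' "${1%/}"
}

# On a real instance, poll it with admin credentials, e.g.:
#   curl -s -u "$ADMIN_USER:$ADMIN_PASS" "$(progress_url "$BASE_URL")"
progress_url "https://confluence.example.com/"
```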

      Steps to Reproduce

      Manually corrupt attachments or other indexable data on a Confluence instance, then rebuild the index from scratch.
      Content that fails reindexing produces a measurable delta between the indexed content count and the total indexable content count.

      Expected Results

      If the reindexing of specific content or attachments encounters errors, the count of objects whose indexing has completed may fall short of the count of total indexable objects.
      The app should handle such situations: tally the objects that failed to be indexed and accurately report the reindexing attempt as 100% complete (when there is nothing else left to reindex).
      When there are no errors and every searchable object has been indexed, the app accurately reports completion like this:

      YYYY-MM-DD HH:MM:SS,mmm DEBUG [Indexer: 1] [internal.index.lucene.LuceneBatchIndexer] lambda$index$0 Re-index progress: 100% complete. 11332 items have been reindexed
      
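      The tallying described above can be sketched with made-up counts (only the 11332 total echoes the sample log line): once failed objects are counted as processed, the reported percentage reaches 100% when nothing remains to attempt.

```shell
# Illustrative arithmetic, not Confluence code: count objects that failed to
# index as processed so the reported completion can reach 100%.
total=11332      # total indexable objects (from the sample log line above)
indexed=11032    # objects indexed successfully (made-up count)
failed=300       # objects that errored during reindexing (made-up count)

processed=$((indexed + failed))
pct=$((processed * 100 / total))
echo "Re-index progress: ${pct}% complete (${indexed} indexed, ${failed} failed)"
```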

      Actual Results

      When some objects fail to be indexed (for example, because of data corruption or errors/exceptions encountered during their reindexing), Confluence currently fails to account for them when tallying the numbers and reports that less than 100% of the reindexing has completed:

      • Example 1 (99% complete):
        YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 3] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 6281468 of 6281768. 99% complete. Memory usage: 18 GB free, 31 GB total
        ...
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [confluence.internal.index.ConcurrentBatchIndexer] index Shutting down indexing thread pool
        
      • Example 2 (81% complete):
        YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 22] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 4968955 of 6132274. 81% complete. Memory usage: 3 GB free, 12 GB total
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [confluence.internal.index.ConcurrentBatchIndexer] index Shutting down indexing thread pool
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex group 1/1 completed for USER_ONLY, 81% complete
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex completed for USER_ONLY, 81% complete, start cleaning up files
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$4 full reindex cleanup completed for USER_ONLY
        YYYY-MM-DD HH:MM:SS,mmm INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] reIndex Indexing completed for stage USER_ONLY
        
        YYYY-MM-DD HH:MM:SS,mmm INFO [Caesium-1-1] [index.lucene.snapshot.DefaultLuceneIndexSnapshotManager] create Creating a new snapshot for index main_index
        

      The same stats are seen when <Confluence-URL>/json/reindextaskprogress.action is accessed.
      This gives the inaccurate impression that reindexing is still in progress, or has stalled, and that a small percentage of indexable content is still pending processing.
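
      The shortfall shown in the examples above can be measured directly from a LuceneBatchIndexer progress line. A small awk sketch (the sample line below is abridged from Example 1):

```shell
# Parse "Re-index progress: <indexed> of <total>." from a progress log line
# and print how many objects never finished indexing.
line='YYYY-MM-DD HH:MM:SS,mmm INFO [Indexer: 3] [internal.index.lucene.LuceneBatchIndexer] logProgress Re-index progress: 6281468 of 6281768. 99% complete.'

echo "$line" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i == "progress:") { indexed = $(i + 1); total = $(i + 3) }
  sub(/\.$/, "", total)   # drop the period trailing the total count
  printf "indexed=%d total=%d missing=%d\n", indexed, total, total - indexed
}'
```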

      Workaround

      To determine whether reindexing is still in progress (regardless of whether it finished successfully or was terminated abnormally, for example by an out-of-memory error), check for the following evidence:

      • indirect evidence (optional check)
        Is the outcome of the regularly scheduled incremental/background indexing job, run by one of the Caesium threads, still being reported as ABORTED with the message Flushing or indexing is already in progress? For example, atlassian-confluence.log showing the scheduled job LuceneIndexFlusher ABORTED because "Flushing or indexing is already in progress":
        YYYY-MM-DD HH:MM:05,572 WARN [Caesium-1-2] [impl.schedule.caesium.JobRunnerWrapper] runJob Scheduled job LuceneIndexFlusher#LuceneIndexFlusher completed unsuccessfully with response JobRunnerResponse[runOutcome=ABORTED,message='Flushing or indexing is already in progress']
        


        and

      • direct evidence (mandatory check)
        Do the latest thread dumps show the presence of threads named Indexer: N?
        1. extract the thread dumps,
        2. navigate to the directory containing the extracted thread dumps and search for the pattern Indexer: <one or more digits>:
          linux sample
          # listing the extracted thread dump files
          ➜ ls
          thread_dump_1.txt
          thread_dump_2.txt
          thread_dump_3.txt
          thread_dump_4.txt
          thread_dump_5.txt
          thread_dump_6.txt
          
          # grepping the above thread dumps for the presence of Indexer threads
          # -E use extended regular expressions (egrep is deprecated)
          # -o print only the matched part
          # -h suppress printing of filenames
          # -a process binary files as text
          # -i ignore case
          # the thread_dump_*.txt glob limits the search to the dump files
          ➜ grep -Eohai 'Indexer: [[:digit:]]+' thread_dump_*.txt | sort -nk2 | uniq
          Indexer: 1
          Indexer: 2
          Indexer: 3
          Indexer: 4
          


      If the data collected matches the samples listed above, then reindexing is still in progress: Caesium-based incremental/background indexing jobs abort when a reindex is already in progress, and Indexer: N threads only appear during reindexing attempts.
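
      The two checks above can be combined into one small script. A minimal sketch: the reindex_in_progress function name and the synthetic sample files are illustrative, not Confluence tooling.

```shell
# Minimal sketch combining both workaround checks; the function name and the
# demonstration files below are illustrative, not part of Confluence.
reindex_in_progress() {
  log="$1"; dumpdir="$2"
  # Direct (mandatory) evidence: "Indexer: N" threads present in the dumps.
  if grep -Eqa 'Indexer: [0-9]+' "$dumpdir"/* 2>/dev/null; then
    echo "reindex likely in progress"
  else
    echo "no Indexer threads found"
  fi
  # Indirect (optional) evidence: how many Caesium flush jobs were aborted.
  grep -c 'Flushing or indexing is already in progress' "$log" 2>/dev/null || true
}

# Demonstration on synthetic files mimicking the samples above.
tmp=$(mktemp -d)
printf '"Indexer: 3" #87 prio=5 RUNNABLE\n' > "$tmp/thread_dump_1.txt"
printf "ABORTED,message='Flushing or indexing is already in progress'\n" > "$tmp/app.log"
reindex_in_progress "$tmp/app.log" "$tmp"
rm -rf "$tmp"
```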

      People

        rlau@atlassian.com Richard Lau
        5c3a8aca27ce Mohit Sharma