CONFSERVER-69488: OutOfMemory when upgrading or running reindex



    Description

      The fix for this bug has been released to our Long Term Support releases. It is available in the latest releases of Confluence 7.13 and 7.19.

      Problem

      Triggering a reindex from the UI after upgrading Confluence results in an OutOfMemoryError and application unresponsiveness on large instances. This can also happen during the upgrade to Confluence 7.9 or later itself.

      Steps to Reproduce (reindex)

      1. Install a Confluence instance below 7.9 (tested with 7.4.9)
      2. Populate it with a large data set (lots of deleted content; one way to generate this is sketched after the upgrade steps below)
      3. Upgrade to 7.9+ (tested with 7.13)
      4. Trigger an index rebuild from the UI

      Steps to Reproduce (upgrade)

      1. Install a Confluence instance below 7.9 (tested with 7.4.9)
      2. Populate it with a large data set (lots of deleted content; see the sketch below)
      3. Upgrade to 7.9+ (tested with 7.13)
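
      For step 2 of either procedure, one way to seed a large amount of deleted content is to create pages and immediately delete them through the Confluence REST API (POST and DELETE on /rest/api/content). A minimal sketch, assuming a local test instance, an existing space key DS, and admin:admin credentials (all placeholders); tune the loop count to the data set size you need:

      SeedDeletedContent.java
      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Base64;

      /** Seeds a test instance with deleted content by creating and trashing pages. */
      public class SeedDeletedContent {
          // Placeholder base URL, space key, and credentials; adjust for your test instance.
          static final String BASE = "http://localhost:8090";
          static final String SPACE = "DS";
          static final String AUTH =
                  "Basic " + Base64.getEncoder().encodeToString("admin:admin".getBytes());

          public static void main(String[] args) throws Exception {
              HttpClient client = HttpClient.newHttpClient();
              for (int i = 0; i < 10_000; i++) {
                  String page = "{\"type\":\"page\",\"title\":\"junk-" + i + "\","
                          + "\"space\":{\"key\":\"" + SPACE + "\"},"
                          + "\"body\":{\"storage\":{\"value\":\"<p>filler</p>\","
                          + "\"representation\":\"storage\"}}}";
                  HttpRequest create = HttpRequest.newBuilder(URI.create(BASE + "/rest/api/content"))
                          .header("Authorization", AUTH)
                          .header("Content-Type", "application/json")
                          .POST(HttpRequest.BodyPublishers.ofString(page))
                          .build();
                  String json = client.send(create, HttpResponse.BodyHandlers.ofString()).body();
                  // Crude id extraction to stay dependency-free; use a JSON parser in practice.
                  String id = json.replaceFirst("(?s).*?\"id\"\\s*:\\s*\"(\\d+)\".*", "$1");
                  HttpRequest delete = HttpRequest.newBuilder(URI.create(BASE + "/rest/api/content/" + id))
                          .header("Authorization", AUTH)
                          .DELETE()
                          .build();
                  client.send(delete, HttpResponse.BodyHandlers.ofString()); // moves the page to the trash
              }
          }
      }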

      Expected Results

      Reindexing/upgrade finishes without errors.

      Actual Results

      The node crashes with an out-of-memory error:

      atlassian-confluence.log
      2021-11-26 23:33:55,980 ERROR [Catalina-utility-1] [atlassian.confluence.plugin.PluginFrameworkContextListener] launchUpgrades Upgrade failed, application will not start: Upgrade task com.atlassian.confluence.upgrade.upgradetask.SplitIndexUpgradeTask@22b566d7 failed during the UPGRADE phase due to: Java heap space
      com.atlassian.confluence.upgrade.UpgradeException: Upgrade task com.atlassian.confluence.upgrade.upgradetask.SplitIndexUpgradeTask@22b566d7 failed during the UPGRADE phase due to: Java heap space
      
      Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMergeDeletes
      Exception in thread "Lucene Merge Thread #24" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
      Caused by: java.lang.OutOfMemoryError: Java heap space
         java.lang.OutOfMemoryError: Java heap space
         java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
         Caused by: java.lang.OutOfMemoryError: Java heap space
      

      You may also see errors like this in the atlassian-confluence-index.log:

      atlassian-confluence-index.log
      INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$5 full reindex group 17/17 completed for CONTENT_ONLY, 26% complete
      INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$5 full reindex completed for CONTENT_ONLY, 26% complete, start cleaning up files
      ERROR [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] cleanUpIndex unable to force writer to clean-up
       -- referer: http://1.1.1.1:8090/plugins/servlet/rebuildindex | url: /rest/prototype/latest/index/reindex | traceId: 2784176c19e14e59 | userName: admin
      java.io.IOException: background merge hit exception: _nuh(4.4):C3668664/9192 into _p44
          at org.apache.lucene.index.IndexWriter.forceMergeDeletes(IndexWriter.java:1817)
          at com.atlassian.bonnie.InstrumentedIndexWriter.forceMergeDeletes(InstrumentedIndexWriter.java:99)
          at com.atlassian.confluence.internal.index.lucene.LuceneReIndexer.cleanUpIndex(LuceneReIndexer.java:256)
      ...
      Caused by: java.lang.OutOfMemoryError: Java heap space
          at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
          at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
          at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
          at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
          at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
          at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
          at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
          at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
          at com.atlassian.bonnie.InstrumentedIndexWriter.merge(InstrumentedIndexWriter.java:113)
          at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
          at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
      

      Workaround

      Rebuilding the index from scratch manually does not trigger this problem.

      Subsequent reindexes from the UI should complete as expected after the rebuild from scratch.

      Notes

      Inspecting the heap dump shows thousands of small org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$3 objects. Digging into those, all of them reference .nvd files from the change index folder. Checking those files reveals they are huge, reaching several GB in size depending on the data set.
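
      To confirm this symptom on a running instance, the size of those .nvd files can be checked directly on disk. A minimal sketch, assuming the default Confluence home path and that the change index lives under index/change (both placeholders; adjust for your installation):

      NvdSizeCheck.java
      import java.io.IOException;
      import java.nio.file.DirectoryStream;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;

      /** Lists the norms (.nvd) files in the change index folder with their sizes. */
      public class NvdSizeCheck {
          public static void main(String[] args) throws IOException {
              // Placeholder path; substitute your own <confluence-home>.
              Path changeIndex = Paths.get("/var/atlassian/application-data/confluence/index/change");
              try (DirectoryStream<Path> files = Files.newDirectoryStream(changeIndex, "*.nvd")) {
                  for (Path f : files) {
                      System.out.printf("%s  %,d bytes%n", f.getFileName(), Files.size(f));
                  }
              }
          }
      }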

      There appears to be an issue with the logic of forceMergeDeletes that leads to those big files being created and loaded into memory after the split index upgrade task. The overall index folder size also increases significantly after the upgrade, and even more after the failed reindex from the UI. Rebuilding from scratch fixes this, and the index files return to their expected size.
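
      For context, forceMergeDeletes is a standard Lucene IndexWriter operation that rewrites segments to expunge deleted documents; the stack trace above shows LuceneReIndexer.cleanUpIndex invoking it. A minimal Lucene 4.4-style sketch of that call, matching the API version in the trace (the index path is a placeholder):

      ForceMergeDeletesSketch.java
      import java.io.File;
      import java.io.IOException;
      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.store.FSDirectory;
      import org.apache.lucene.util.Version;

      /** Shows the clean-up call that fails: merging away deleted documents loads
       *  per-segment norms, which is where the merge threads in the stack trace
       *  run out of heap when the .nvd files have grown to several GB. */
      public class ForceMergeDeletesSketch {
          public static void main(String[] args) throws IOException {
              IndexWriterConfig cfg =
                      new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
              try (FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
                   IndexWriter writer = new IndexWriter(dir, cfg)) {
                  writer.forceMergeDeletes(); // blocks until the expunge-deletes merges finish
              }
          }
      }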

      Since a rebuild from scratch is required to avoid this issue, a good approach is to remove the index files before upgrading so they are rebuilt in the new format.
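
      A minimal sketch of that pre-upgrade clean-up, assuming the default home path (a placeholder) and that Confluence is stopped before the files are removed; on the next startup the index is rebuilt in the new format:

      RemoveIndexBeforeUpgrade.java
      import java.io.IOException;
      import java.io.UncheckedIOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.Comparator;
      import java.util.stream.Stream;

      /** Deletes the index directory so it is rebuilt from scratch on startup.
       *  Run ONLY while Confluence is stopped, and only with a backup of the home directory. */
      public class RemoveIndexBeforeUpgrade {
          public static void main(String[] args) throws IOException {
              // Placeholder path; substitute your own <confluence-home>.
              Path index = Paths.get("/var/atlassian/application-data/confluence/index");
              if (!Files.exists(index)) {
                  return; // nothing to do
              }
              try (Stream<Path> walk = Files.walk(index)) {
                  walk.sorted(Comparator.reverseOrder()) // delete children before parents
                      .forEach(p -> {
                          try {
                              Files.delete(p);
                          } catch (IOException e) {
                              throw new UncheckedIOException(e);
                          }
                      });
              }
          }
      }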
