CONFSERVER-69488: OutOfMemory when upgrading or running reindex



    Description

      The fix for this bug has been released to our Long Term Support releases. It is available in the latest releases of Confluence 7.13 and 7.19.

      Problem

      Triggering a reindex from the UI after upgrading Confluence results in an OutOfMemoryError and application unresponsiveness on large instances. This can also happen during the upgrade to Confluence 7.9 or later itself.

      Steps to Reproduce (reindex)

      1. Install a Confluence instance below 7.9 (tested with 7.4.9)
      2. Populate it with a large data set (lots of deleted content; one way to generate this is sketched after the upgrade steps below)
      3. Upgrade to 7.9+ (tested with 7.13)
      4. Trigger an index rebuild from the UI

      Steps to Reproduce (upgrade)

      1. Install a Confluence instance below 7.9 (tested with 7.4.9)
      2. Populate it with a large data set (lots of deleted content; see the sketch below)
      3. Upgrade to 7.9+ (tested with 7.13)
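
      For step 2 of either procedure, one way to seed a large amount of deleted content is to create pages and immediately delete them through the Confluence REST API (POST and DELETE on /rest/api/content). A minimal sketch, assuming a local test instance, an existing space key DS, and admin:admin credentials (all placeholders); tune the loop count to the data set size you need:

      SeedDeletedContent.java
      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.util.Base64;

      /** Seeds a test instance with deleted content by creating and trashing pages. */
      public class SeedDeletedContent {
          // Placeholder base URL, space key, and credentials; adjust for your test instance.
          static final String BASE = "http://localhost:8090";
          static final String SPACE = "DS";
          static final String AUTH =
                  "Basic " + Base64.getEncoder().encodeToString("admin:admin".getBytes());

          public static void main(String[] args) throws Exception {
              HttpClient client = HttpClient.newHttpClient();
              for (int i = 0; i < 10_000; i++) {
                  String page = "{\"type\":\"page\",\"title\":\"junk-" + i + "\","
                          + "\"space\":{\"key\":\"" + SPACE + "\"},"
                          + "\"body\":{\"storage\":{\"value\":\"<p>filler</p>\","
                          + "\"representation\":\"storage\"}}}";
                  HttpRequest create = HttpRequest.newBuilder(URI.create(BASE + "/rest/api/content"))
                          .header("Authorization", AUTH)
                          .header("Content-Type", "application/json")
                          .POST(HttpRequest.BodyPublishers.ofString(page))
                          .build();
                  String json = client.send(create, HttpResponse.BodyHandlers.ofString()).body();
                  // Crude id extraction to stay dependency-free; use a JSON parser in practice.
                  String id = json.replaceFirst("(?s).*?\"id\"\\s*:\\s*\"(\\d+)\".*", "$1");
                  HttpRequest delete = HttpRequest.newBuilder(URI.create(BASE + "/rest/api/content/" + id))
                          .header("Authorization", AUTH)
                          .DELETE()
                          .build();
                  client.send(delete, HttpResponse.BodyHandlers.ofString()); // moves the page to the trash
              }
          }
      }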

      Expected Results

      Reindexing/upgrade finishes without errors.

      Actual Results

      The node crashes with an out-of-memory error:

      atlassian-confluence.log
      2021-11-26 23:33:55,980 ERROR [Catalina-utility-1] [atlassian.confluence.plugin.PluginFrameworkContextListener] launchUpgrades Upgrade failed, application will not start: Upgrade task com.atlassian.confluence.upgrade.upgradetask.SplitIndexUpgradeTask@22b566d7 failed during the UPGRADE phase due to: Java heap space
      com.atlassian.confluence.upgrade.UpgradeException: Upgrade task com.atlassian.confluence.upgrade.upgradetask.SplitIndexUpgradeTask@22b566d7 failed during the UPGRADE phase due to: Java heap space
      
      Caused by: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMergeDeletes
      Exception in thread "Lucene Merge Thread #24" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
      Caused by: java.lang.OutOfMemoryError: Java heap space
         java.lang.OutOfMemoryError: Java heap space
         java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
         Caused by: java.lang.OutOfMemoryError: Java heap space
      

      You may also see errors like this in the atlassian-confluence-index.log:

      atlassian-confluence-index.log
      INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$5 full reindex group 17/17 completed for CONTENT_ONLY, 26% complete
      INFO [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] lambda$null$5 full reindex completed for CONTENT_ONLY, 26% complete, start cleaning up files
      ERROR [lucene-interactive-reindexing-thread] [internal.index.lucene.LuceneReIndexer] cleanUpIndex unable to force writer to clean-up
       -- referer: http://1.1.1.1:8090/plugins/servlet/rebuildindex | url: /rest/prototype/latest/index/reindex | traceId: 2784176c19e14e59 | userName: admin
      java.io.IOException: background merge hit exception: _nuh(4.4):C3668664/9192 into _p44
          at org.apache.lucene.index.IndexWriter.forceMergeDeletes(IndexWriter.java:1817)
          at com.atlassian.bonnie.InstrumentedIndexWriter.forceMergeDeletes(InstrumentedIndexWriter.java:99)
          at com.atlassian.confluence.internal.index.lucene.LuceneReIndexer.cleanUpIndex(LuceneReIndexer.java:256)
      ...
      Caused by: java.lang.OutOfMemoryError: Java heap space
          at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
          at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
          at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
          at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
          at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
          at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
          at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
          at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
          at com.atlassian.bonnie.InstrumentedIndexWriter.merge(InstrumentedIndexWriter.java:113)
          at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
          at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
      

      Workaround

      Rebuilding the index from scratch manually does not trigger this problem.

      Subsequent reindexes from the UI should complete as expected after the rebuild from scratch.

      Notes

      Inspecting the heap dump shows thousands of small org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$3 objects. Digging into those, all of them reference .nvd files from the change index folder. Checking those files reveals they are huge, reaching several GB in size depending on the data set.
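
      To confirm this symptom on a running instance, the size of those .nvd files can be checked directly on disk. A minimal sketch, assuming the default Confluence home path and that the change index lives under index/change (both placeholders; adjust for your installation):

      NvdSizeCheck.java
      import java.io.IOException;
      import java.nio.file.DirectoryStream;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;

      /** Lists the norms (.nvd) files in the change index folder with their sizes. */
      public class NvdSizeCheck {
          public static void main(String[] args) throws IOException {
              // Placeholder path; substitute your own <confluence-home>.
              Path changeIndex = Paths.get("/var/atlassian/application-data/confluence/index/change");
              try (DirectoryStream<Path> files = Files.newDirectoryStream(changeIndex, "*.nvd")) {
                  for (Path f : files) {
                      System.out.printf("%s  %,d bytes%n", f.getFileName(), Files.size(f));
                  }
              }
          }
      }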

      There appears to be an issue with the logic of forceMergeDeletes that leads to those big files being created and loaded into memory after the split index upgrade task. The overall index folder size also increases significantly after the upgrade, and even more after the failed reindex from the UI. Rebuilding from scratch fixes this, and the index files return to their expected size.
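
      For context, forceMergeDeletes is a standard Lucene IndexWriter operation that rewrites segments to expunge deleted documents; the stack trace above shows LuceneReIndexer.cleanUpIndex invoking it. A minimal Lucene 4.4-style sketch of that call, matching the API version in the trace (the index path is a placeholder):

      ForceMergeDeletesSketch.java
      import java.io.File;
      import java.io.IOException;
      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.store.FSDirectory;
      import org.apache.lucene.util.Version;

      /** Shows the clean-up call that fails: merging away deleted documents loads
       *  per-segment norms, which is where the merge threads in the stack trace
       *  run out of heap when the .nvd files have grown to several GB. */
      public class ForceMergeDeletesSketch {
          public static void main(String[] args) throws IOException {
              IndexWriterConfig cfg =
                      new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
              try (FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
                   IndexWriter writer = new IndexWriter(dir, cfg)) {
                  writer.forceMergeDeletes(); // blocks until the expunge-deletes merges finish
              }
          }
      }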

      Since a rebuild from scratch is required to avoid this issue, a good approach is to remove the index files before upgrading so they are rebuilt in the new format.
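
      A minimal sketch of that pre-upgrade clean-up, assuming the default home path (a placeholder) and that Confluence is stopped before the files are removed; on the next startup the index is rebuilt in the new format:

      RemoveIndexBeforeUpgrade.java
      import java.io.IOException;
      import java.io.UncheckedIOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.Comparator;
      import java.util.stream.Stream;

      /** Deletes the index directory so it is rebuilt from scratch on startup.
       *  Run ONLY while Confluence is stopped, and only with a backup of the home directory. */
      public class RemoveIndexBeforeUpgrade {
          public static void main(String[] args) throws IOException {
              // Placeholder path; substitute your own <confluence-home>.
              Path index = Paths.get("/var/atlassian/application-data/confluence/index");
              if (!Files.exists(index)) {
                  return; // nothing to do
              }
              try (Stream<Path> walk = Files.walk(index)) {
                  walk.sorted(Comparator.reverseOrder()) // delete children before parents
                      .forEach(p -> {
                          try {
                              Files.delete(p);
                          } catch (IOException e) {
                              throw new UncheckedIOException(e);
                          }
                      });
              }
          }
      }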
