  1. Confluence Data Center
  2. CONFSERVER-51951

Indexer does not handle attachments efficiently for a page update

    Description

      Summary

      There are two different symptoms for this problem, but both go back to the same root cause.

      Scenario 1

      When a page is updated and needs to be indexed, the indexer does not add the page's attachments, comments and permissions to the journal table as separate entries. Instead, when it picks up the page's entry from the journal table, it also has to process all the attachments, comments and permissions of that page. This is more work than the number of entries suggests, and it overloads the Hibernate session. From the user's perspective, the indexer appears to stall or to be slow while processing only a small number of tasks.
       

      Scenario 2

      When a page is updated multiple times and those changes are included in a single indexer batch, the indexer does not handle this efficiently. It reindexes the page once for every change instead of once for the whole batch.
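
      To illustrate the behavior Scenario 2 is asking for, here is a minimal sketch of collapsing repeated changes to the same page within one batch. It is not Confluence's implementation: JournalEntry, coalesce_batch and the sample IDs are hypothetical names chosen for this example.

```python
from collections import OrderedDict
from typing import Iterable, List, NamedTuple


class JournalEntry(NamedTuple):
    """Hypothetical stand-in for one row picked up from the journal table."""
    entry_id: int
    page_id: int


def coalesce_batch(entries: Iterable[JournalEntry]) -> List[JournalEntry]:
    """Keep only the latest entry per page, in first-seen page order."""
    latest = OrderedDict()
    for entry in entries:
        # A later change to the same page supersedes the earlier one.
        latest[entry.page_id] = entry
    return list(latest.values())


# Three quick edits of page 42 plus one edit of page 7, all in one batch.
batch = [
    JournalEntry(1, page_id=42),
    JournalEntry(2, page_id=42),
    JournalEntry(3, page_id=7),
    JournalEntry(4, page_id=42),
]
coalesced = coalesce_batch(batch)
print(coalesced)
```

      With this approach each page, together with its attachments, comments and permissions, would be indexed once per batch rather than once per change.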

      Steps to Reproduce

      Reproducing Scenario 2 requires several updates to the same page in quick succession (within 5 seconds).
      Hint: this can be done through the REST API.

      1. Create a new page
      2. Add multiple attachments to the page (preferably large attachments)
      3. Edit the page and save
      4. Edit the page again and save
      5. Edit the page a third time and save
      6. Enable DEBUG logging for the package
        com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager
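
      The quick successive edits in steps 3 to 5 can be scripted. The sketch below assumes the Confluence REST content endpoint (PUT /rest/api/content/{id}) with basic authentication; the base URL, credentials, page ID, title and starting version are placeholders you would need to supply, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_update_payload(page_id, title, version, body_html):
    """Build the JSON body for PUT /rest/api/content/{id}."""
    return {
        "id": str(page_id),
        "type": "page",
        "title": title,
        # Each update must carry the next version number of the page.
        "version": {"number": version},
        "body": {"storage": {"value": body_html, "representation": "storage"}},
    }


def update_page(base_url, auth_header, payload):
    """Send one page update and return the HTTP status code."""
    request = urllib.request.Request(
        url=f"{base_url}/rest/api/content/{payload['id']}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def rapid_updates(base_url, user, password, page_id, title, current_version, count=3):
    """Save the page `count` times back to back, bumping the version each time."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    for offset in range(1, count + 1):
        payload = build_update_payload(
            page_id, title, current_version + offset, f"<p>edit {offset}</p>"
        )
        update_page(base_url, f"Basic {token}", payload)
```

      Nothing is sent until rapid_updates is called with real values. Because the calls run back to back, the three saves should normally fall within the 5 second window mentioned above.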
        

      Expected Results

      The indexer will show that it is processing the page along with all of its attachments, comments and permissions, and will do so efficiently.
      The page should be processed only once in a single batch.

      Actual Results

      The indexer will show that it is processing a single page only, while it is actually processing the page, its attachments, its comments and its permissions. The indexer will also process each change separately, redoing all of that work every time.
      This consumes a lot of time and resources and can cause the indexer to stall or lead to an outage.

      Large attachments can increase the processing time as well. Below is a log snippet showing the behavior for Scenario 1 (newest entries first):

      2017-03-21 11:15:03,887 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 291 items in 43881 milliseconds
      2017-03-21 11:14:20,006 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      2017-03-21 11:14:16,240 DEBUG [scheduler_Worker-1] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 2338 items in 276234 milliseconds
      2017-03-21 11:09:40,005 DEBUG [scheduler_Worker-1] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      2017-03-21 11:09:36,402 DEBUG [scheduler_Worker-2] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 1527 items in 466396 milliseconds
      2017-03-21 11:01:50,006 DEBUG [scheduler_Worker-2] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      2017-03-21 11:01:47,935 DEBUG [scheduler_Worker-1] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 1778 items in 667844 milliseconds
      2017-03-21 10:50:40,091 DEBUG [scheduler_Worker-1] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      2017-03-21 10:50:36,342 DEBUG [scheduler_Worker-6] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 869 items in 26337 milliseconds
      2017-03-21 10:50:10,004 DEBUG [scheduler_Worker-6] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      ....
      2017-03-21 10:48:02,290 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 6 items in 2285 milliseconds
      2017-03-21 10:48:00,005 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      ....
      2017-03-21 10:36:50,333 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 4 items in 10325 milliseconds
      2017-03-21 10:36:40,007 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
      ....
      2017-03-21 10:34:58,947 DEBUG [scheduler_Worker-9] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 157 items in 3942 milliseconds
      2017-03-21 10:34:55,005 DEBUG [scheduler_Worker-9] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue flush requested
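
      To make the slowdown in the flushQueue lines above easier to see, the following sketch extracts the item count and duration from each "Flushed" line and derives items per second. The function name and the three-line sample (copied from the snippet above) are only for illustration.

```python
import re

# Matches the tail of a DefaultConfluenceIndexManager "Flushed" debug line.
FLUSH_RE = re.compile(r"Flushed (\d+) items in (\d+) milliseconds")


def flush_throughput(log_text):
    """Return (items, milliseconds, items_per_second) for each flush line."""
    results = []
    for match in FLUSH_RE.finditer(log_text):
        items, millis = int(match.group(1)), int(match.group(2))
        rate = items / (millis / 1000.0) if millis else float("inf")
        results.append((items, millis, rate))
    return results


sample = """\
2017-03-21 11:01:47,935 DEBUG [scheduler_Worker-1] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 1778 items in 667844 milliseconds
2017-03-21 10:36:50,333 DEBUG [scheduler_Worker-4] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 4 items in 10325 milliseconds
2017-03-21 10:34:58,947 DEBUG [scheduler_Worker-9] [confluence.search.lucene.DefaultConfluenceIndexManager] flushQueue Flushed 157 items in 3942 milliseconds
"""

for items, millis, rate in flush_throughput(sample):
    print(f"{items:>5} items in {millis / 1000:>7.1f} s  ->  {rate:.2f} items/s")
```

      On these lines the rate ranges from roughly 40 items per second (157 items in under 4 seconds) down to well under 1 item per second (4 items in more than 10 seconds), which matches the symptom that a small number of tasks can take disproportionately long.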
      

      Workaround

      The current workaround is to reduce the index batch size so that multiple changes to the same page are less likely to land in a single batch.
      Add the following to your JVM arguments:

      -Dindex.queue.batch.size=100
      

            People

              huyle Huy Le (Inactive)
              rslaiby Rudy Slaiby