Confluence Data Center / CONFSERVER-43493

Increase G1GC reserve percent to 20% to stop long to-space allocation failure collections



    Description

      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      Long to-space allocation failure collections under G1GC have caused outages for our own internal instances and for some customers as well. To mitigate this risk, I would like to add this parameter to the setenv scripts for all Server and DC customers:

      -XX:G1ReservePercent=20

      To-space allocation failures are very slow because of JDK bugs that have been fixed in Java 9. We need to avoid these collections where possible.

      To briefly explain what this parameter does: G1GC splits the heap into regions and assigns those regions to Eden, Survivor and Tenured. When an object needs to be promoted (from Eden to Survivor, for example), there must be enough room in a region already assigned to Survivor, or the heap must assign a new region. If every region is already assigned, you get a to-space allocation failure, and you will then see pauses of 30 seconds to 1 minute (or worse).

      The heap keeps a certain percentage of regions free to mitigate this scenario: 10% by default. This parameter raises that to 20%. On our internal instances this was enough to eliminate to-space allocation failures entirely, and it stopped our outages.
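
      To give a rough sense of what the change means in regions, here is a minimal Python sketch of the arithmetic. The 8 GB heap and 4 MB region size are illustrative assumptions, not values from our instances, and rounding the reserve up to a whole region is also an assumption about how the JVM rounds:

      ```python
      def reserve_regions(heap_mb: int, region_mb: int, reserve_percent: int) -> int:
          """Approximate number of G1 regions held back as free reserve.

          heap_mb and region_mb are hypothetical inputs; the real region size
          depends on the JVM's own sizing. Rounds up (assumed behaviour).
          """
          total_regions = heap_mb // region_mb
          # Integer ceiling division: ceil(total_regions * percent / 100)
          return -(-total_regions * reserve_percent // 100)


      if __name__ == "__main__":
          heap_mb, region_mb = 8192, 4  # assumed example: 8 GB heap, 4 MB regions
          for percent in (10, 20):
              regions = reserve_regions(heap_mb, region_mb, percent)
              print(f"G1ReservePercent={percent}: ~{regions} regions "
                    f"(~{regions * region_mb} MB) kept free")
      ```

      Under these assumptions the reserve grows from about 205 regions to about 410, so roughly twice as much headroom is available before a promotion runs out of space.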

      FYI, this is the bug I raised with Oracle, which they have fixed for Java 9: https://bugs.openjdk.java.net/browse/JDK-8155256

      Diagnosis

      This parameter is helpful if you are experiencing long to-space allocation pauses in your garbage collection. To determine if this is the case:

      1. Enable GC Logging
      2. After a few days, or if you experience any outages, review the GC logs:
        • Download GC Viewer
        • Open your GC logs in GC Viewer
        • Click on the Event Details tab
        • Check the pause types you are experiencing. If you have long 'to-space exhausted' pauses, this parameter can help.
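
      If you prefer to check the logs from a script rather than GC Viewer, the same check might be sketched in Python as below. It assumes Java 8 style G1 log lines in which the pause header ends with "(to-space exhausted), N secs]"; the function name and the sample lines are hypothetical, and other JVM versions or logging flags may format these events differently:

      ```python
      import re
      from typing import Iterable, List

      # Assumed Java 8 G1 pause header, e.g.
      # "12.3: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 31.5 secs]"
      TO_SPACE_RE = re.compile(
          r"\(to-space exhausted\),\s*([0-9]+(?:\.[0-9]+)?)\s*secs\]"
      )


      def to_space_exhausted_pauses(lines: Iterable[str]) -> List[float]:
          """Return the duration in seconds of every to-space exhausted pause."""
          pauses = []
          for line in lines:
              match = TO_SPACE_RE.search(line)
              if match:
                  pauses.append(float(match.group(1)))
          return pauses


      if __name__ == "__main__":
          # Hypothetical sample lines; replace with open("gc.log") for a real log.
          sample = [
              "10.1: [GC pause (G1 Evacuation Pause) (young), 0.0500000 secs]",
              "12.3: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 31.5000000 secs]",
          ]
          pauses = to_space_exhausted_pauses(sample)
          print(f"{len(pauses)} to-space exhausted pause(s), longest {max(pauses, default=0.0)}s")
      ```

      Any long pauses reported here correspond to the 'to-space exhausted' entries you would see in the Event Details tab.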


            People

              dunterwurzacher Denise Unterwurzacher [Atlassian] (Inactive)