Jira Data Center / JRASERVER-63315

Inefficient caching for OneDimensionalObjectHitCollector may cause out of memory errors


      Summary

      Some gadgets and reports may cause a heavy increase in JIRA's memory footprint due to inefficient OneDimensionalObjectHitCollector caching.
      This may cause the application to crash after entering a Full GC loop, or to throw OutOfMemoryErrors.

      Affected gadgets / reports:

      Created vs Resolved:

      The Created vs Resolved gadget loads all created and resolved fields into an array.
      For big instances with many issues, this requires creating arrays that can hold almost all the issues in the system.
      G1GC treats those arrays as humongous objects, which leads to GC inefficiency and memory fragmentation, and makes this gadget very slow.
      The memory footprint does not depend on the query or on the reporting period; it is only affected by the size of the biggest segment in the Lucene index.
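      As a rough illustration of why such arrays trip G1's humongous threshold (a sketch under assumptions, not JIRA's actual code: it assumes a 64-bit JVM with an approximately 16-byte array header and G1's rule that any single allocation larger than half a region is humongous):

```java
public class HumongousCheck {
    // G1 classifies any single allocation larger than half a heap region as "humongous".
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Approximate size of a long[] on a 64-bit JVM: ~16-byte header + 8 bytes per element.
    static long longArrayBytes(int length) {
        return 16L + 8L * length;
    }

    public static void main(String[] args) {
        long regionBytes = 1L << 20; // 1 MB region, as in the repro's -XX:G1HeapRegionSize=1m
        int issues = 130_000;        // issues in the repro data set
        long bytes = longArrayBytes(issues);
        // An array sized to the whole index segment is roughly 1 MB here,
        // far above the 512 KB humongous threshold for 1 MB regions.
        System.out.println(bytes + " bytes, humongous=" + isHumongous(bytes, regionBytes));
    }
}
```

      With default 32 MB regions on large heaps the threshold is higher, but arrays sized to millions of issues can still cross it.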

      Steps to reproduce

      1. Generate an instance with 130k issues and 1.5 GB of memory.
      2. Configure G1GC to use 1MB regions with the following JVM flags: -XX:+UseG1GC -XX:G1HeapRegionSize=1m. This allows triggering humongous allocations with a limited data set. Also set the following flags to enable GC logging: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy
      3. Perform a lock & re-index operation.
      4. Create a dashboard with a Created vs Resolved gadget on it.
      5. Capture the URL used for generating the content of the gadget (it contains createdVsResolved); copy it as cURL from Chrome dev tools.
      6. Create a loop that hits this URL constantly.
      7. Observe in GC log:
        2016-11-24T12:57:31.463+0000: 963.481: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 963.481: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 653, predicted base time: 21.10 ms, remaining time: 178.90 ms, target pause time: 200.00 ms]

      Single Level Group By Report

      Steps to reproduce

      1. Generate an instance with 100k issues and memory of 1GB.
      2. When running this report, select a filter containing all issues.
      3. Inspecting the Garbage Collection logs, several Full GCs should be visible, even when using G1GC.
      4. Taking a thread dump during this period, while the report is loading, should reveal the stuck thread below:
        "http-bio-8349-exec-24" #569 daemon prio=5 os_prio=0 tid=0x00007f2fb0002000 nid=0x38f6 runnable [0x00007f2f473f9000]
           java.lang.Thread.State: RUNNABLE
                at org.apache.lucene.store.DataInput.readString(DataInput.java:182)
                at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:431)
                at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:261)
                at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:471)
                at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:564)
                at org.apache.lucene.index.IndexReader.document(IndexReader.java:844)
                at com.atlassian.jira.issue.statistics.util.OneDimensionalDocIssueHitCollector.getDocument(OneDimensionalDocIssueHitCollector.java:61)
                at com.atlassian.jira.issue.statistics.util.OneDimensionalDocIssueHitCollector.collectWithTerms(OneDimensionalDocIssueHitCollector.java:50)
                at com.atlassian.jira.issue.statistics.util.AbstractOneDimensionalHitCollector.collect(AbstractOneDimensionalHitCollector.java:190)
                at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.searchAndSort(LuceneSearchProvider.java:482)
                at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.searchAndSort(LuceneSearchProvider.java:184)
                at com.atlassian.jira.plugin.report.impl.SingleLevelGroupByReport.searchMapIssueKeys(SingleLevelGroupByReport.java:96)
                at com.atlassian.jira.plugin.report.impl.SingleLevelGroupByReport.getOptions(SingleLevelGroupByReport.java:77)
                at com.atlassian.jira.plugin.report.impl.SingleLevelGroupByReport.generateReportHtml(SingleLevelGroupByReport.java:125)
                at com.atlassian.jira.web.action.browser.ConfigureReport.doExecute(ConfigureReport.java:156)
                at webwork.action.ActionSupport.execute(ActionSupport.java:165)
                at com.atlassian.jira.action.JiraActionSupport.execute(JiraActionSupport.java:88)        
                (...)
        
      5. Capturing a heap dump should reveal a very large object referenced by that thread.
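      The stack trace shows the collector calling IndexReader.document for every hit, so a filter matching all issues loads every stored document. A simplified sketch of that pattern (hypothetical names, not the real Lucene/JIRA API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the per-hit document-loading pattern seen in the stack trace:
// the collector materialises one entry per matching issue, so memory and
// stored-field reads both scale with the number of issues in the filter.
public class PerHitLoadSketch {
    interface DocStore {            // stand-in for the Lucene stored-fields reader
        String load(int docId);
    }

    static List<String> collectAll(DocStore store, int numHits) {
        List<String> collected = new ArrayList<>(numHits); // one entry per issue
        for (int docId = 0; docId < numHits; docId++) {
            collected.add(store.load(docId)); // full document fetched for every hit
        }
        return collected;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        DocStore store = id -> { loads[0]++; return "issue-" + id; };
        collectAll(store, 100_000);
        System.out.println("documents loaded: " + loads[0]); // one load per issue
    }
}
```

      With 100k issues in the filter, every report render repeats 100k stored-field reads and builds a list that large, which matches the RUNNABLE thread stuck in FieldsReader and the very large object in the heap dump.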

      Expected results

      No excessive number of arrays containing all issues on the instance is created, and GC operations are not impacted.
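      The expected behaviour can be sketched as aggregating per-group counts during collection, so memory grows with the number of distinct groups rather than the number of issues (a hypothetical illustration, not JIRA's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class GroupCountSketch {
    // Memory here is bounded by the number of distinct groups (e.g. issue types),
    // not by the number of issues streamed through the collector.
    static Map<String, Integer> countByGroup(Iterable<String> groupPerIssue) {
        Map<String, Integer> counts = new HashMap<>();
        for (String group : groupPerIssue) {
            counts.merge(group, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        java.util.List<String> issues = java.util.List.of("Bug", "Task", "Bug", "Bug");
        System.out.println(countByGroup(issues)); // e.g. {Bug=3, Task=1}
    }
}
```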

      Actual results

      JIRA allocates several arrays that might contain all issues on the instance. The increased memory pressure can trigger Full GC operations, decreasing instance responsiveness.

      Workaround

      Single Level Group By Report: Reduce the number of issues in the filter configured in the report.

              ajakubowski Adam Jakubowski (Inactive)