Uploaded image for project: 'Jira Server and Data Center'
  1. Jira Server and Data Center
  2. JRASERVER-67807

Increased CPU usage due to contention in docValuesCache

    XMLWordPrintable

    Details

      Description

      Summary

      As part of JRASERVER-65913 new docValuesCache was added to optimize memory usage.

      It appeared that the change introduced series of bugs which rendered the cache unusable:

      • use of custom finalizer causes unnecessary cache clearing,
      • incorrect registering of reader finished listener also adds up to the extensive cache clearing,
      • having a composite key in the cache requires iterating over it's elements in order to clear proper entries

      Under high load cleaning up that cache concurrently from multiple threads might lead to increased CPU usage due to contention.

      Environment

      • Jira with large number issues + CustomFields
      • High number of actions which creates JiraDocValues:
        • report generations
        • issue create/edit

      Expected Results

      Moderate CPU usage

      Actual Results

      • Higher CPU CPU.
      • Thread dump show contentions around this part, those threads are all CPU active:
         98.61%   [71/72] X at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
         29.17%   [21/72] \ at com.atlassian.jira.web.filters.steps.requestcleanup.RequestCleanupStep.finallyAfterDoFilter(RequestCleanupStep.java:45)
         29.17%   [21/72]  \ at com.atlassian.jira.util.searchers.ThreadLocalSearcherCache.resetSearchers(ThreadLocalSearcherCache.java:31)
         29.17%   [21/72]   \ at com.atlassian.jira.issue.index.DefaultIndexManager.flushThreadLocalSearchers(DefaultIndexManager.java:1020)
         29.17%   [21/72]    \ at com.atlassian.jira.issue.index.SearcherCache.closeSearchers(SearcherCache.java:60)
         29.17%   [21/72]     \ at com.atlassian.jira.issue.index.SearcherCache.closeSearcher(SearcherCache.java:79)
         29.17%   [21/72]      \ at com.atlassian.jira.index.DelayCloseSearcher.close(DelayCloseSearcher.java:57)
         29.17%   [21/72]       \ at com.atlassian.jira.index.DelayCloseable$Helper.close(DelayCloseable.java:76)
         29.17%   [21/72]        \ at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.decrement(DelayCloseable.java:116)
         29.17%   [21/72]         \ at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.executeUnderLockClosingWhenDone(DelayCloseable.java:143)
         29.17%   [21/72]          \ at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.close(DelayCloseSearcher.java:74)
         29.17%   [21/72]           X at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.decReaderRef(DelayCloseSearcher.java:88)
        
      • Example of the thread stack trace:
        "http-nio2-8080-exec-715" #335632 daemon prio=5 os_prio=0 tid=0x00007fd2dc731000 nid=0x1457a runnable [0x00007fd1abd13000]
           java.lang.Thread.State: RUNNABLE
                at com.google.common.cache.LocalCache$HashIterator.advance(LocalCache.java:4274)
                at com.google.common.cache.LocalCache$HashIterator.nextEntry(LocalCache.java:4349)
                at com.google.common.cache.LocalCache$KeyIterator.next(LocalCache.java:4365)
                at java.util.Iterator.forEachRemaining(Iterator.java:116)
                at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
                at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
                at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
                at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
                at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
                at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
                at com.atlassian.jira.issue.search.DefaultReaderCache.invalidateCacheForCoreCacheKey(DefaultReaderCache.java:60)
                at com.atlassian.jira.issue.search.DefaultReaderCache.access$200(DefaultReaderCache.java:16)
                at com.atlassian.jira.issue.search.DefaultReaderCache$InvalidateReaderFinishedListener.finished(DefaultReaderCache.java:151)
                at org.apache.lucene.index.IndexReader.notifyReaderFinishedListeners(IndexReader.java:127)
                at org.apache.lucene.index.IndexReader.readerFinished(IndexReader.java:133)
                at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:250)
                at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.decReaderRef(DelayCloseSearcher.java:88)
                at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.close(DelayCloseSearcher.java:74)
                at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.executeUnderLockClosingWhenDone(DelayCloseable.java:143)
                at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.decrement(DelayCloseable.java:116)
                at com.atlassian.jira.index.DelayCloseable$Helper.close(DelayCloseable.java:76)
                at com.atlassian.jira.index.DelayCloseSearcher.close(DelayCloseSearcher.java:57)
                at com.atlassian.jira.issue.index.SearcherCache.closeSearcher(SearcherCache.java:79)
                at com.atlassian.jira.issue.index.SearcherCache.closeSearchers(SearcherCache.java:60)
                at com.atlassian.jira.issue.index.DefaultIndexManager.flushThreadLocalSearchers(DefaultIndexManager.java:1020)
                at com.atlassian.jira.util.searchers.ThreadLocalSearcherCache.resetSearchers(ThreadLocalSearcherCache.java:31)
                at com.atlassian.jira.web.filters.steps.requestcleanup.RequestCleanupStep.finallyAfterDoFilter(RequestCleanupStep.java:45)
                at com.atlassian.jira.web.filters.steps.ChainedFilterStepRunner.doFilter(ChainedFilterStepRunner.java:78)
        ..
        

      Notes

      • While cleaning the cache, thread iterates over all cache entries.
      • Cache eviction policy is expire after access - that means that it needs to track every read to it's entries in order to keep records for further eviction.
      • It does that by counting a number of reads with CAS operation (thread safe, but very hot in case of contention) in readCount.incrementAndGet()

      The contention is caused by two factors

      • Iterating over cache entries way to often from multiple threads,
      • CAS overhead during Guave cache maintenance (counting number of reads with AtomicInteger)

      Workaround

      None

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned
              Reporter:
              ayakovlev@atlassian.com Andriy Yakovlev
              Votes:
              11 Vote for this issue
              Watchers:
              26 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: