Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-67807

Increased CPU usage due to contention in docValuesCache

XMLWordPrintable

      Summary

      As part of JRASERVER-65913 new docValuesCache was added to optimize memory usage.

      It appeared that the change introduced series of bugs which rendered the cache unusable:

      • use of custom finalizer causes unnecessary cache clearing,
      • incorrect registering of reader finished listener also adds up to the extensive cache clearing,
      • having a composite key in the cache requires iterating over it's elements in order to clear proper entries

      Under high load cleaning up that cache concurrently from multiple threads might lead to increased CPU usage due to contention.

      Environment

      • Jira with large number issues + CustomFields
      • High number of actions which creates JiraDocValues:
        • report generations
        • issue create/edit

      Expected Results

      Moderate CPU usage

      Actual Results

      • Higher CPU CPU.
      • Thread dump show contentions around this part, those threads are all CPU active:
         98.61%   [71/72] X at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
         29.17%   [21/72] \ at com.atlassian.jira.web.filters.steps.requestcleanup.RequestCleanupStep.finallyAfterDoFilter(RequestCleanupStep.java:45)
         29.17%   [21/72]  \ at com.atlassian.jira.util.searchers.ThreadLocalSearcherCache.resetSearchers(ThreadLocalSearcherCache.java:31)
         29.17%   [21/72]   \ at com.atlassian.jira.issue.index.DefaultIndexManager.flushThreadLocalSearchers(DefaultIndexManager.java:1020)
         29.17%   [21/72]    \ at com.atlassian.jira.issue.index.SearcherCache.closeSearchers(SearcherCache.java:60)
         29.17%   [21/72]     \ at com.atlassian.jira.issue.index.SearcherCache.closeSearcher(SearcherCache.java:79)
         29.17%   [21/72]      \ at com.atlassian.jira.index.DelayCloseSearcher.close(DelayCloseSearcher.java:57)
         29.17%   [21/72]       \ at com.atlassian.jira.index.DelayCloseable$Helper.close(DelayCloseable.java:76)
         29.17%   [21/72]        \ at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.decrement(DelayCloseable.java:116)
         29.17%   [21/72]         \ at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.executeUnderLockClosingWhenDone(DelayCloseable.java:143)
         29.17%   [21/72]          \ at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.close(DelayCloseSearcher.java:74)
         29.17%   [21/72]           X at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.decReaderRef(DelayCloseSearcher.java:88)
        
      • Example of the thread stack trace:
        "http-nio2-8080-exec-715" #335632 daemon prio=5 os_prio=0 tid=0x00007fd2dc731000 nid=0x1457a runnable [0x00007fd1abd13000]
           java.lang.Thread.State: RUNNABLE
                at com.google.common.cache.LocalCache$HashIterator.advance(LocalCache.java:4274)
                at com.google.common.cache.LocalCache$HashIterator.nextEntry(LocalCache.java:4349)
                at com.google.common.cache.LocalCache$KeyIterator.next(LocalCache.java:4365)
                at java.util.Iterator.forEachRemaining(Iterator.java:116)
                at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
                at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
                at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
                at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
                at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
                at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
                at com.atlassian.jira.issue.search.DefaultReaderCache.invalidateCacheForCoreCacheKey(DefaultReaderCache.java:60)
                at com.atlassian.jira.issue.search.DefaultReaderCache.access$200(DefaultReaderCache.java:16)
                at com.atlassian.jira.issue.search.DefaultReaderCache$InvalidateReaderFinishedListener.finished(DefaultReaderCache.java:151)
                at org.apache.lucene.index.IndexReader.notifyReaderFinishedListeners(IndexReader.java:127)
                at org.apache.lucene.index.IndexReader.readerFinished(IndexReader.java:133)
                at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:250)
                at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.decReaderRef(DelayCloseSearcher.java:88)
                at com.atlassian.jira.index.DelayCloseSearcher$SearcherCloser.close(DelayCloseSearcher.java:74)
                at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.executeUnderLockClosingWhenDone(DelayCloseable.java:143)
                at com.atlassian.jira.index.DelayCloseable$Helper$UsageTracker.decrement(DelayCloseable.java:116)
                at com.atlassian.jira.index.DelayCloseable$Helper.close(DelayCloseable.java:76)
                at com.atlassian.jira.index.DelayCloseSearcher.close(DelayCloseSearcher.java:57)
                at com.atlassian.jira.issue.index.SearcherCache.closeSearcher(SearcherCache.java:79)
                at com.atlassian.jira.issue.index.SearcherCache.closeSearchers(SearcherCache.java:60)
                at com.atlassian.jira.issue.index.DefaultIndexManager.flushThreadLocalSearchers(DefaultIndexManager.java:1020)
                at com.atlassian.jira.util.searchers.ThreadLocalSearcherCache.resetSearchers(ThreadLocalSearcherCache.java:31)
                at com.atlassian.jira.web.filters.steps.requestcleanup.RequestCleanupStep.finallyAfterDoFilter(RequestCleanupStep.java:45)
                at com.atlassian.jira.web.filters.steps.ChainedFilterStepRunner.doFilter(ChainedFilterStepRunner.java:78)
        ..
        

      Notes

      • While cleaning the cache, thread iterates over all cache entries.
      • Cache eviction policy is expire after access - that means that it needs to track every read to it's entries in order to keep records for further eviction.
      • It does that by counting a number of reads with CAS operation (thread safe, but very hot in case of contention) in readCount.incrementAndGet()

      The contention is caused by two factors

      • Iterating over cache entries way to often from multiple threads,
      • CAS overhead during Guave cache maintenance (counting number of reads with AtomicInteger)

      Workaround

      None

              Unassigned Unassigned
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
              Votes:
              11 Vote for this issue
              Watchers:
              25 Start watching this issue

                Created:
                Updated:
                Resolved: