Confluence Data Center / CONFSERVER-66182

Running content reindexing in Confluence causes threads for Confluence Questions to hang


Details

    Description

      Issue Summary

      If a Confluence admin rebuilds the Confluence index, HTTP threads servicing requests for Confluence Questions can become stuck waiting for a lock. If enough of these threads pile up, the HTTP thread pool is exhausted and Confluence can hang. This is most likely when all of the following conditions are true:

      • index rebuild time is relatively long (many hours or even days)
      • the configured maximum HTTP thread pool size (maxThreads in server.xml) is small to average
      • Confluence Questions activity is moderate to busy

      Susceptibility depends mostly on these factors.
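The following back-of-envelope Java model shows how these factors interact. It is a sketch under stated assumptions, not a measurement of Confluence. It assumes that index-touching Confluence Questions requests arrive at a steady rate and that each one holds an HTTP worker until the rebuild releases the lock. Under those assumptions, the pool is fully occupied after roughly maxThreads / rate seconds. The class and method names, and all numbers in main, are hypothetical.

```java
// Back-of-envelope model; the assumptions are hypothetical, not measured Confluence behaviour:
//  * index-touching Confluence Questions requests arrive at a steady rate, and
//  * each one holds an HTTP worker until the index rebuild releases the writer lock.
final class ThreadPoolExhaustionEstimate {

    /** Seconds until all maxThreads workers are parked, for a given rate of blocking requests per second. */
    static double secondsToExhaustion(int maxThreads, double blockingRequestsPerSecond) {
        if (maxThreads <= 0 || blockingRequestsPerSecond <= 0) {
            throw new IllegalArgumentException("maxThreads and rate must be positive");
        }
        return maxThreads / blockingRequestsPerSecond;
    }

    /** True if the pool would be exhausted before a rebuild of the given length (in seconds) finishes. */
    static boolean exhaustedBeforeRebuildEnds(int maxThreads, double blockingRequestsPerSecond,
                                              double rebuildSeconds) {
        return secondsToExhaustion(maxThreads, blockingRequestsPerSecond) < rebuildSeconds;
    }

    public static void main(String[] args) {
        int maxThreads = 48;               // hypothetical maxThreads value from server.xml
        double ratePerSecond = 1.0 / 30;   // hypothetical: one blocking Questions request every 30 s
        double rebuildSeconds = 6 * 3600;  // hypothetical 6-hour rebuild

        double seconds = secondsToExhaustion(maxThreads, ratePerSecond);
        System.out.printf("All %d workers parked after ~%.0f minutes; rebuild lasts %.0f minutes; exhausted first: %b%n",
                maxThreads, seconds / 60, rebuildSeconds / 60,
                exhaustedBeforeRebuildEnds(maxThreads, ratePerSecond, rebuildSeconds));
    }
}
```

With these made-up numbers, all 48 workers would be parked after roughly 24 minutes, long before a 6-hour rebuild ends. In this model, raising maxThreads only delays exhaustion linearly. It does not prevent exhaustion while the rebuild is still running.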

      Creating questions or answers in Confluence Questions uses endpoints such as /rest/questions/1.0/topic. Calls to these endpoints appear to include a small reindexing step (likely for confluence-edge). However, if a Confluence admin is already running content reindexing, the thread handling the request (for example, /rest/questions/1.0/topic) hangs whilst waiting for a lock.
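The stack traces in this issue show the waiting thread inside LuceneConnection.withWriter, which calls ReentrantLock.lock(). That call parks with no timeout. The hypothetical sketch below reproduces the pattern in isolation. One ReentrantLock guards a stand-in index writer, a long simulated "rebuild" holds it, and a tiny write has to wait for the whole rebuild to finish. The class and method names are invented for illustration and are not Confluence's API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the pattern in the stack trace: a single ReentrantLock guards
// the index writer, so a long rebuild and a tiny incremental update are serialized.
final class WriterLockContention {

    private final ReentrantLock writerLock = new ReentrantLock();

    /** Runs task while holding the writer lock; lock() parks with no timeout, as in the trace. */
    void withWriter(Runnable task) {
        writerLock.lock();
        try {
            task.run();
        } finally {
            writerLock.unlock();
        }
    }

    /** Milliseconds a tiny write waits while a simulated rebuild holds the lock for rebuildMillis. */
    static long measureWaitMillis(long rebuildMillis) {
        WriterLockContention index = new WriterLockContention();
        CountDownLatch rebuildHoldsLock = new CountDownLatch(1);

        Thread rebuild = new Thread(() -> index.withWriter(() -> {
            rebuildHoldsLock.countDown();          // the simulated rebuild now owns the writer lock
            try {
                Thread.sleep(rebuildMillis);       // stand-in for a long content reindex
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }), "simulated-reindex");
        rebuild.start();

        try {
            rebuildHoldsLock.await();              // make sure the rebuild wins the lock first
            long start = System.nanoTime();
            index.withWriter(() -> { /* stand-in for the small Questions reindex step */ });
            long waited = (System.nanoTime() - start) / 1_000_000;
            rebuild.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Tiny write waited ~" + measureWaitMillis(500) + " ms for the rebuild");
    }
}
```

The measured wait is roughly the remaining duration of the simulated rebuild, no matter how small the write itself is.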

      For example, a thread dump taken during a rebuild shows the request thread parked waiting for the lock:

      "http-nio-26134-exec-1 url:/rest/questions/1.0/topic username:user2" #261 daemon prio=5 os_prio=31 tid=0x00007fc86b783000 nid=0x21903 waiting on condition [0x000070001b9ab000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000006da6428b0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at com.atlassian.bonnie.LoggingReentrantLock.lock(LoggingReentrantLock.java:32)
      	at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:485)
      	at sun.reflect.GeneratedMethodAccessor1271.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
               <...snip...>
      
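When reading a thread dump like the one above, the key details are the WAITING (parking) state and the ReentrantLock$NonfairSync the thread is parked on. The diagnostic sketch below uses the standard java.lang.management API and is not an Atlassian tool. It lists the threads in the current JVM that are parked on a ReentrantLock and names the owning thread. Applied to Confluence, the owner would presumably be the thread running the rebuild, but that is an assumption. The sketch only inspects the JVM it runs in.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Diagnostic sketch: report threads in the current JVM that are parked on a
// ReentrantLock (state WAITING/TIMED_WAITING, blocker class ReentrantLock$...Sync).
final class ParkedOnLockReport {

    static List<String> threadsParkedOnReentrantLock() {
        List<String> report = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            LockInfo lock = info.getLockInfo();
            Thread.State state = info.getThreadState();
            boolean parked = state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING;
            if (parked && lock != null
                    && lock.getClassName().startsWith("java.util.concurrent.locks.ReentrantLock")) {
                // getLockOwnerName() returns null if no owner is reported
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();                                   // main plays the role of the rebuild
        Thread waiter = new Thread(lock::lock, "demo-request");
        waiter.setDaemon(true);
        waiter.start();
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.yield();                            // wait until the demo thread parks
        }
        threadsParkedOnReentrantLock().forEach(System.out::println);
        lock.unlock();
    }
}
```

For a running Confluence instance, the same information is visible in a thread dump taken with standard JDK tooling such as jstack.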

      This is similar to CONFSERVER-46588.

      Reproduced so far with the latest version of Confluence Questions (v2.7.30) and various versions of Confluence (6.13.4, 7.4.4, 7.12.2).

      Steps to Reproduce

      1. Generate a document set large enough that the index rebuild runs for at least a few minutes during the test
      2. Click the 'rebuild' button on the 'Search Indexes' page (/admin/search-indexes.action)
      3. Whilst the index is rebuilding, attempt to save questions or answers in Confluence Questions

      Expected Results

      Rebuilding the index should not cause Confluence Questions threads to hang.

      Actual Results

      If index rebuilding is occurring, any requests to /rest/questions/1.0/* will hang whilst waiting for an indexing lock.

      The UI shows a spinner, and the 'Save' button is often greyed out until the threads become unstuck.

      If server.xml contains the StuckThreadDetectionValve, a message like the following is printed in catalina.out:

      09-Jun-2021 12:37:45.671 WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-26134-exec-1 url:/rest/questions/1.0/topic username:user2] (id=[261]) has been active for [64,804] milliseconds (since [6/9/21 12:36 PM]) to serve the same request for [http://localhost:26134/rest/questions/1.0/topic?pageSize=9] and may be stuck (configured threshold for this StuckThreadDetectionValve is [60] seconds). There is/are [1] thread(s) in total that are monitored by this Valve and may be stuck.
       java.lang.Throwable
              at sun.misc.Unsafe.park(Native Method)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
              at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
              at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
              at com.atlassian.bonnie.LoggingReentrantLock.lock(LoggingReentrantLock.java:32)
              at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:485)
              at sun.reflect.GeneratedMethodAccessor1271.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      <...snip...>
      
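The StuckThreadDetectionValve is enabled in server.xml, and its threshold attribute is given in seconds (60 in the log above). The simplified model below illustrates the general idea behind such a check. It is not Tomcat's source. The model records when each request starts, and on each periodic check it warns once about any request active for longer than the threshold. Names are hypothetical, and the clock is passed in explicitly so the behaviour is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the idea behind Tomcat's StuckThreadDetectionValve (not its source):
// remember when each request started and, on each periodic check, warn once about
// any request that has been active for longer than the threshold.
final class StuckRequestDetector {

    private static final class Active {
        final String uri;
        final long startMillis;
        boolean reported;                      // set once a warning has been emitted

        Active(String uri, long startMillis) {
            this.uri = uri;
            this.startMillis = startMillis;
        }
    }

    private final long thresholdMillis;
    private final Map<String, Active> active = new ConcurrentHashMap<>();

    StuckRequestDetector(long thresholdSeconds) {
        this.thresholdMillis = thresholdSeconds * 1000;
    }

    void requestStarted(String threadName, String uri, long nowMillis) {
        active.put(threadName, new Active(uri, nowMillis));
    }

    void requestFinished(String threadName) {
        active.remove(threadName);
    }

    /** Called periodically; returns one warning per request that has newly crossed the threshold. */
    List<String> check(long nowMillis) {
        List<String> warnings = new ArrayList<>();
        active.forEach((thread, req) -> {
            long elapsed = nowMillis - req.startMillis;
            if (!req.reported && elapsed > thresholdMillis) {
                req.reported = true;
                warnings.add("Thread [" + thread + "] has been active for [" + elapsed
                        + "] ms serving [" + req.uri + "] and may be stuck");
            }
        });
        return warnings;
    }

    public static void main(String[] args) {
        StuckRequestDetector detector = new StuckRequestDetector(60);   // threshold in seconds
        detector.requestStarted("http-nio-26134-exec-1", "/rest/questions/1.0/topic", 0L);
        System.out.println(detector.check(30_000L));   // [] (not yet over the threshold)
        System.out.println(detector.check(64_804L));   // one warning
        System.out.println(detector.check(90_000L));   // [] (already reported once)
    }
}
```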

      Such threads eventually complete once the main index rebuild finishes. However, if Confluence Questions activity is high enough and the index rebuild takes a long time, the entire HTTP thread pool can be exhausted.
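The toy simulation below illustrates both halves of that statement under simplifying assumptions. A fixed-size ThreadPoolExecutor stands in for the HTTP worker pool (Tomcat's own connector is more involved). Every simulated Questions request needs the same ReentrantLock that the simulated rebuild is holding. While the lock is held, every worker parks and the extra requests queue. Once the lock is released, all requests complete. The class names and sizes are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy model (assumptions): a fixed-size ThreadPoolExecutor stands in for the HTTP
// worker pool, and every simulated Questions request needs the writer lock that the
// simulated rebuild is holding.
final class PoolExhaustionSimulation {

    static final class Result {
        final int parkedWorkers;    // workers parked on the lock during the rebuild
        final int queuedRequests;   // requests waiting for a free worker during the rebuild
        final int completed;        // requests finished after the rebuild released the lock

        Result(int parkedWorkers, int queuedRequests, int completed) {
            this.parkedWorkers = parkedWorkers;
            this.queuedRequests = queuedRequests;
            this.completed = completed;
        }
    }

    static Result run(int maxThreads, int requests) {
        ReentrantLock writerLock = new ReentrantLock();
        ThreadPoolExecutor httpPool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch done = new CountDownLatch(requests);
        int parked;
        int queued;

        writerLock.lock();                                  // the simulated rebuild takes the lock first
        try {
            for (int i = 0; i < requests; i++) {
                httpPool.execute(() -> {
                    writerLock.lock();                      // each request parks here, like the stuck HTTP threads
                    try {
                        // the small index update would happen here
                    } finally {
                        writerLock.unlock();
                        done.countDown();
                    }
                });
            }
            int expectedParked = Math.min(maxThreads, requests);
            while (writerLock.getQueueLength() < expectedParked) {
                Thread.yield();                             // wait until every worker is parked on the lock
            }
            parked = writerLock.getQueueLength();
            queued = httpPool.getQueue().size();
        } finally {
            writerLock.unlock();                            // the simulated rebuild finishes
        }

        try {
            done.await();                                   // every request now drains through the pool
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining requests", e);
        } finally {
            httpPool.shutdown();
        }
        return new Result(parked, queued, requests - (int) done.getCount());
    }

    public static void main(String[] args) {
        Result r = run(4, 10);   // hypothetical: 4 workers, 10 Questions requests during a rebuild
        System.out.printf("During rebuild: %d workers parked, %d requests queued; after rebuild: %d completed%n",
                r.parkedWorkers, r.queuedRequests, r.completed);
    }
}
```

With 4 workers and 10 requests, the run reports 4 parked workers and 6 queued requests while the lock is held. Once the lock is released, it reports 10 completed requests, matching the observation that stuck threads recover when the rebuild completes.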

      Workaround

      Currently there is no known workaround for this behavior. A workaround will be added here when available.


            People

              Assignee: Unassigned
              Reporter: Malcolm Ninnes (mninnes@atlassian.com)
              Votes: 3
              Watchers: 8
