[CONFSERVER-40977] Ship Tomcat's long running (stuck) thread logging by default


      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      Adding this valve to server.xml will log stuck threads in catalina.out:

      <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="60" />
      

      This will log any request thread (e.g. an HTTP thread) that has been processing for more than 60 seconds. It will not log background threads, e.g. scheduled jobs or long-running operations that run in the background, such as import/export or reindexing.
      If something you expect to see is missing from these logs, that does not mean it ran for less than 60 seconds. It most likely means it is not running in a request thread. Thread dumps taken over time are still the best way to diagnose long-running threads.
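
As a rough sketch of where the valve from the description could sit, the fragment below nests it inside the Host element of server.xml. The Engine name "Standalone" matches the log output quoted in the comments. The Host attributes and the choice of Host rather than Engine or Context are assumptions for illustration, so compare against your own server.xml before copying anything.

```xml
<!-- Sketch only: everything except the Valve line is assumed, not taken from a real install. -->
<Engine name="Standalone" defaultHost="localhost">
  <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="false">
    <!-- Existing Context elements stay as they are. -->

    <!-- Logs a warning for any request thread active for more than 60 seconds. -->
    <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="60" />
  </Host>
</Engine>
```

Tomcat reads server.xml at startup, so a change like this normally needs a restart to take effect.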


            Rilwan_Ahmed_NC added a comment (edited)

            @Heshan Manamperi Increase the threshold to 180.
            This is actually not an issue. Once the threads are free they will get processed.

            In Confluence 5.10.4, I see these stuck-thread warnings when we run a load test. The request eventually completes once a processing thread becomes free.
            Example:
            06-Jun-2017 12:44:40.689 WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread "http-apr-8443-exec-1233" (id=2713) has been active for 65,998 milliseconds (since 6/6/17 12:43 PM) to serve the same request for https://**************** and may be stuck (configured threshold for this StuckThreadDetectionValve is 60 seconds). There is/are 147 thread(s) in total that are monitored by this Valve and may be stuck.

            Once the thread completes, you will see the message below in catalina.out:

            06-Jun-2017 12:45:46.090 WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted Thread "http-apr-8443-exec-1233" (id=2713) was previously reported to be stuck but has completed. It was active for approximately 114,298 milliseconds. There is/are still 186 thread(s) that are monitored by this Valve and may be stuck.
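
Raising the threshold as suggested in this comment would be a one-attribute change to the valve from the issue description. The line below is a sketch that assumes the valve is already declared in server.xml. The value 180 is the one proposed in the comment, not a tested recommendation.

```xml
<!-- Same valve as in the description, with the threshold (in seconds) raised from 60 to 180. -->
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="180" />
```

As the thread name ContainerBackgroundProcessor in the log suggests, the check runs on the container's background thread rather than at the exact moment the threshold passes. That would explain why the warning above reports 65,998 milliseconds against a 60-second threshold.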


            Heshan Manamperi added a comment

            Good feature! But how can I fix Tomcat's long running (stuck) threads?

            jonah (Inactive) added a comment

            QR complete

            Denise Unterwurzacher [Atlassian] (Inactive) added a comment (edited)

            From the docs we can also add an interruptThreadThreshold:

            Minimum duration in seconds after which a stuck thread should be interrupted to attempt to "free" it.
            Note that there's no guarantee that the thread will get unstuck. This usually works well for threads stuck on I/O or locks, but is probably useless in case of infinite loops.

            This is disabled by default. We have decided against enabling the interrupt threshold because of the concerns below:

            1. The Copy Space plugin can sometimes run for hours but completes in the end. This is done on a request thread.
            2. Space exports use long-running tasks, so they wouldn't be affected.
            3. Anything that gets off the request thread with an ExecutorService wouldn't be affected by a valve. The valve covers request threads only.
            4. We run the risk of data corruption if we interrupt threads while they are actually processing.
            5. Our applications are not designed to expect interruptions, so this could have unintended consequences.

            If customers find this setting useful, they can apply it themselves on a case-by-case basis. We would recommend keeping an eye on it, however, to ensure it's not having any unintended consequences.
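
For anyone who does decide to try the interrupt threshold on their own instance, a minimal sketch of the valve with both attributes might look like the line below. The value 300 is purely illustrative and not a recommendation. As far as the Tomcat documentation describes it, interruptThreadThreshold is given in seconds, is disabled by default, and needs to be at least as large as threshold to take effect.

```xml
<!-- Sketch only: warn after 60 seconds, attempt an interrupt after 300 seconds (illustrative value). -->
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
       threshold="60"
       interruptThreadThreshold="300" />
```

Given the concerns listed in the comment, in particular long-running Copy Space requests and the risk of data corruption, it seems safer to try a setting like this in a test environment first.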
