Details
-
Suggestion
-
Resolution: Fixed
Description
If a web request takes longer than some threshold (say 30 seconds), then a thread dump should be taken.
The Support Use Case
Often in support, we are left wondering what is going on during "performance" problems.
We often ask the customer to generate a thread dump so we can look inside the state of the system.
However, it is often too late. The system may have gone back to a steady state, and the chance of a customer generating a thread dump at just the right time can be slim.
Design Considerations
The actual threshold should be configurable. Customers will want it at different levels, and some will not want it at all. It could perhaps be set via a Java system property so it can be turned on and off at runtime via code, or via a log4j switch with a custom formatter.
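A minimal sketch of the system property option, to make the idea concrete. The property name `slow.request.threaddump.threshold.ms` and the class `SlowRequestConfig` are hypothetical, not existing settings. The property is re-read on every check, so code can change or clear it at runtime.

```java
final class SlowRequestConfig {
    // Hypothetical property name, invented for this sketch.
    static final String THRESHOLD_PROPERTY = "slow.request.threaddump.threshold.ms";

    /** Returns the threshold in milliseconds, or -1 when the feature is off. */
    static long thresholdMillis() {
        String raw = System.getProperty(THRESHOLD_PROPERTY);
        if (raw == null) {
            return -1; // unset means "not at all"
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : -1; // zero or negative also disables
        } catch (NumberFormatException e) {
            return -1; // an unparseable value is treated as disabled
        }
    }

    /** True when a request that took elapsedMillis should trigger a dump. */
    static boolean shouldDump(long elapsedMillis) {
        long threshold = thresholdMillis();
        return threshold > 0 && elapsedMillis >= threshold;
    }
}
```

With this shape, `System.setProperty(SlowRequestConfig.THRESHOLD_PROPERTY, "30000")` turns the check on at 30 seconds and `System.clearProperty(...)` turns it off again.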
The number of thread dumps generated should be governed within a time frame (say no more than 5 per minute).
If a system is running slow, then chances are all requests will take longer than the threshold, and hence too many thread dumps may be generated without some governing.
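One way the governing could work is a sliding window over the timestamps of recent dumps. This is only a sketch: `DumpGovernor` is a hypothetical name, and the limit and window are constructor parameters rather than decided values. The current time is passed in so the behaviour is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DumpGovernor {
    private final int maxDumps;
    private final long windowMillis;
    // Timestamps of the dumps allowed so far, oldest first.
    private final Deque<Long> recent = new ArrayDeque<>();

    DumpGovernor(int maxDumps, long windowMillis) {
        this.maxDumps = maxDumps;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a dump may be taken at nowMillis, and records it. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Forget dumps that have fallen out of the window.
        while (!recent.isEmpty() && nowMillis - recent.peekFirst() >= windowMillis) {
            recent.pollFirst();
        }
        if (recent.size() >= maxDumps) {
            return false; // limit reached, skip this dump
        }
        recent.addLast(nowMillis);
        return true;
    }
}
```

For the "no more than 5 per minute" example, a caller would create `new DumpGovernor(5, 60_000)` and call `tryAcquire(System.currentTimeMillis())` before each dump.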
The thread dump should be output to the logs in such a way that it can easily be extracted into a thread dump tool.
Perhaps a specific thread dump file should be generated so each event can be more easily identified.
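As a rough illustration of an easily extracted dump, the sketch below builds the dump text with the standard `Thread.getAllStackTraces()` call and wraps it in begin and end marker lines. The class name `ThreadDumper`, the marker strings, and the per-thread line layout are all assumptions made for this example; they are not claimed to match the input format of any particular thread dump tool.

```java
import java.util.Map;

final class ThreadDumper {
    // Hypothetical marker lines so each dump can be cut out of a larger log.
    static final String BEGIN = "=== SLOW REQUEST THREAD DUMP BEGIN ===";
    static final String END = "=== SLOW REQUEST THREAD DUMP END ===";

    /** Builds a dump of all live threads, tagged with the slow request. */
    static String dump(String requestDescription, long elapsedMillis) {
        StringBuilder sb = new StringBuilder();
        sb.append(BEGIN).append('\n');
        sb.append("request: ").append(requestDescription)
          .append(", elapsed ms: ").append(elapsedMillis).append('\n');
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // One header line per thread, then its stack frames.
            sb.append('"').append(thread.getName()).append('"')
              .append(thread.isDaemon() ? " daemon" : "")
              .append(" prio=").append(thread.getPriority())
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        sb.append(END).append('\n');
        return sb.toString();
    }
}
```

The returned string could be sent to the normal log, or written to its own file per event if a separate thread dump file turns out to be the better option.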
On a slow system where lots of thread dumps happen, the size of the logs will expand. Perhaps this is unavoidable, since we want the valuable support information.
P.S. Kudos goes to Brenden Bain for thinking of this idea. I reckon it's a corker!