Issue Summary
Slowly typing a query into the JSD Project Workload search causes service unavailability
Steps to Reproduce
- Visit /jira/projects/project-key/reports/workload
- Begin typing your search slowly (waiting more than 0.3 seconds between key presses)
- e.g.: a..b..c..d..e..f..g..h..
- Observe the client initiating additional requests to JSD for every key press to this URL:
- /jira/rest/servicedesk/1/pages/people/agents/project-key/search?query=abcdefg
- Observe slow response times to client:
- Initial search (one character) returned in 10 seconds.
- Subsequent searches returned results to client in 20 to 100 seconds.
- If the simultaneous requests to workload search continue, observe:
- Increased response times.
- High JVM memory usage
- Continuous stop the world garbage collection
- CPU usage on the server climbing to 100%
- Service unavailability:
- 500 errors from Tomcat
- Timeouts from load balancer
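The request pattern in the steps above can be illustrated with a small simulation. This is only a sketch: the report does not show the client's actual search-box code, so the debounce behaviour and the 0.3 second interval are assumptions inferred from the "more than 0.3 seconds between key presses" step, and `queries_sent` is a hypothetical helper name.

```python
from typing import List, Tuple

# Assumed debounce interval, inferred from the reproduction steps.
DEBOUNCE_SECONDS = 0.3


def queries_sent(keypresses: List[Tuple[float, str]],
                 debounce: float = DEBOUNCE_SECONDS) -> List[str]:
    """Return the query strings a debounced search box would send.

    keypresses is a list of (timestamp_seconds, character) in typing order.
    A request fires for the current text whenever the pause after a key
    press exceeds the debounce interval; the final key press always fires.
    """
    sent = []
    text = ""
    for i, (ts, ch) in enumerate(keypresses):
        text += ch
        is_last = i == len(keypresses) - 1
        if is_last or keypresses[i + 1][0] - ts > debounce:
            sent.append(text)
    return sent


# Slow typing: 0.5 s between keys, so every prefix becomes its own request.
slow = [(i * 0.5, c) for i, c in enumerate("abcdefgh")]
# Fast typing: 0.1 s between keys, so only the final text is sent.
fast = [(i * 0.1, c) for i, c in enumerate("abcdefgh")]

print(len(queries_sent(slow)), "requests when typing slowly")
print(len(queries_sent(fast)), "request when typing quickly")
```

Under these assumptions, slow typing of an eight-character query produces eight overlapping search requests, while fast typing produces one.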
Expected Results
JSD Project Workload search should be faster and the service should remain available.
Actual Results
Stuck thread exceptions in server logs:
WARNING [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8080-exec-52] (id=[15055]) has been active for [122,276] milliseconds to serve the same request for [https://jira-hostname/jira/rest/servicedesk/1/pages/people/agents/project-key/search?query=abcdefghijklmnopqrstuvwxyz&_=1576189764646] and may be stuck (configured threshold for this StuckThreadDetectionValve is [120] seconds). There is/are [29] thread(s) in total that are monitored by this Valve and may be stuck.
java.lang.Throwable
    at com.atlassian.crowd.manager.application.AggregatorImpl.constrainResults(ResultsAggregator.java:153)
    at com.atlassian.crowd.manager.application.AggregatorImpl.constrainResults(ResultsAggregator.java:141)
    at com.atlassian.crowd.manager.application.InMemoryNonAggregatingSearchStrategy.searchNestedGroupRelationships(InMemoryNonAggregatingSearchStrategy.java:164)
    at com.atlassian.crowd.manager.application.ApplicationServiceGeneric.searchNestedGroupRelationships(ApplicationServiceGeneric.java:1751)
    at com.atlassian.crowd.embedded.core.CrowdServiceImpl.searchNestedGroupRelationships(CrowdServiceImpl.java:216)
    at com.atlassian.crowd.embedded.core.CrowdServiceImpl.search(CrowdServiceImpl.java:157)
    at com.atlassian.jira.security.groups.DefaultGroupManager.getGroupsForUser(DefaultGroupManager.java:393)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager.lambda$new$0(RequestCachingGroupManager.java:43)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager$$Lambda$174/2024690402.load(Unknown Source)
    at com.atlassian.jira.cache.request.RequestCacheImpl.get(RequestCacheImpl.java:42)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager.lambda$new$1(RequestCachingGroupManager.java:46)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager$$Lambda$175/791274807.load(Unknown Source)
    at com.atlassian.jira.cache.request.RequestCacheImpl.get(RequestCacheImpl.java:42)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager.getGroupNamesForUser(RequestCachingGroupManager.java:188)
    at com.atlassian.jira.security.groups.RequestCachingGroupManager.getGroupNamesForUser(RequestCachingGroupManager.java:193)
    at com.atlassian.jira.security.DefaultGlobalPermissionManager.loadPermissions(DefaultGlobalPermissionManager.java:332)
    at com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermissionIgnoreRecovery(DefaultGlobalPermissionManager.java:347)
    at com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermission(DefaultGlobalPermissionManager.java:289)
Multiple threads simultaneously searching:
[http-nio-8080-exec-49] ab
[http-nio-8080-exec-20] abcdef
[http-nio-8080-exec-30] abc
[http-nio-8080-exec-32] abcdefghijk
[http-nio-8080-exec-6] a
[http-nio-8080-exec-73] abcdefghi
[...]
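A thread-to-query listing like the one above could be pulled out of the stuck-thread warnings with a short script. The following is a minimal sketch, not the method used for this report: the regular expression is an assumption based on the single warning quoted in this issue, and `parse_stuck_thread` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Pattern assumed from the StuckThreadDetectionValve warning quoted above.
STUCK_THREAD = re.compile(
    r"Thread \[(?P<thread>[^\]]+)\].*?"
    r"active for \[(?P<millis>[\d,]+)\] milliseconds.*?"
    r"[?&]query=(?P<query>[^&\]]*)"
)


def parse_stuck_thread(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (thread name, active milliseconds, search query) or None."""
    match = STUCK_THREAD.search(line)
    if match is None:
        return None
    millis = int(match.group("millis").replace(",", ""))
    return match.group("thread"), millis, match.group("query")


sample = (
    "org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected "
    "Thread [http-nio-8080-exec-52] (id=[15055]) has been active for [122,276] "
    "milliseconds to serve the same request for "
    "[https://jira-hostname/jira/rest/servicedesk/1/pages/people/agents/project-key/search"
    "?query=abcdefghijklmnopqrstuvwxyz&_=1576189764646] and may be stuck"
)

print(parse_stuck_thread(sample))
# ('http-nio-8080-exec-52', 122276, 'abcdefghijklmnopqrstuvwxyz')
```

Running this over every warning in the log gives one row per stuck thread, which makes the overlapping prefix queries easy to spot.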
Thread dumps indicated the common activity was related to getting permissions from Crowd:
--- com.atlassian.jira.security.DefaultGlobalPermissionManager.hasPermission
--- com.atlassian.jira.security.DefaultGlobalPermissionManager.loadPermissions
--- com.atlassian.jira.security.groups.RequestCachingGroupManager.getGroupNamesForUser(RequestCachingGroupManager.java:193)
--- com.atlassian.jira.cache.request.RequestCacheImpl.get(RequestCacheImpl.java:42)
--- com.atlassian.jira.security.groups.RequestCachingGroupManager$$Lambda$175/791274807.load(Unknown Source)
--- com.atlassian.jira.cache.request.RequestCacheImpl.get(RequestCacheImpl.java:42)
--- com.atlassian.jira.security.groups.RequestCachingGroupManager.lambda$new$0(RequestCachingGroupManager.java:43)
--- com.atlassian.jira.security.groups.DefaultGroupManager.getGroupsForUser(DefaultGroupManager.java:393)
--- com.atlassian.crowd.embedded.core.CrowdServiceImpl.search(CrowdServiceImpl.java:157)
--- com.atlassian.crowd.embedded.core.CrowdServiceImpl.searchNestedGroupRelationships(CrowdServiceImpl.java:216)
--- com.atlassian.crowd.manager.application.ApplicationServiceGeneric.searchNestedGroupRelationships(ApplicationServiceGeneric.java:1751)
Workaround
- Type faster (less than 0.3 seconds between key presses) so that the client sends fewer search requests.
- Block the workload search API endpoint.
- Decrease search complexity by reducing the number of users, SLAs, nested Crowd groups, etc.
- Upgrade from JSD 3.x to JSD ER 4.5+ for approximately 25% better workload search performance.
- Use another JSD instance to handle only workload search requests.
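For the endpoint-blocking workaround, the rule has to match the workload search path for any project key. A minimal sketch of such a pattern follows, written in Python only so that it can be tested. The `/jira` context path and the shape of the project-key segment are assumptions taken from the URL in the reproduction steps, `is_workload_search` is a hypothetical helper, and the actual rule syntax depends on the proxy or load balancer in use.

```python
import re
from urllib.parse import urlsplit

# Assumed path layout, based on the URL in the reproduction steps:
# /jira/rest/servicedesk/1/pages/people/agents/<project-key>/search
WORKLOAD_SEARCH = re.compile(
    r"^/jira/rest/servicedesk/1/pages/people/agents/[^/]+/search$"
)


def is_workload_search(url: str) -> bool:
    """Return True if the URL path is the workload agent search endpoint."""
    path = urlsplit(url).path  # drops the query string before matching
    return WORKLOAD_SEARCH.match(path) is not None


print(is_workload_search(
    "/jira/rest/servicedesk/1/pages/people/agents/project-key/search?query=abc"
))
print(is_workload_search("/jira/projects/project-key/reports/workload"))
```

Matching only on the path leaves the workload report page itself reachable, while requests from its search box would be rejected.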
- is related to
- JSDSERVER-6047 Workload report loads very slowly (Closed)
- relates to
- JSDSERVER-6047 Workload report loads very slowly (Closed)
- JSWSERVER-20336 Searching and Mentioning users may cause performance issues and high CPU load (Closed)
- JRASERVER-70934 Requests to `/jira/rest/internal/2/user/mention/search` where the parameters do NOT include a query are very slow (Closed)
- JSDSERVER-6889 Typing query into JSD "Alert user" automation "THEN" action causes constant GC and service instability (Gathering Impact)
- causes
- PS-50534