Description
Problem
When performing a paged search from n to p using CrowdService.search(Query), Crowd internally:
- fetches all the results from 0 to p,
- reorders the results in memory,
- and skips the first n results to return a page seemingly from n to p.
The issue is that when the API is used to fetch successive pages (e.g. from 0 to p, then from p+1 to 2p, and so on), the in-memory sort runs on a larger set for each successive page: from 0 up to the requested limit (e.g. 2p for the second paged search).
As a result, the same result can occupy a different position in each successive page request. And since Crowd discards the sorted results up to the start index of the requested page, any result whose sorted position falls below that index is silently dropped.
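The behaviour described above can be reproduced with a minimal, self-contained sketch. This is not Crowd's actual code: the class PagedSearchSketch, the methods flawedPage and aggregate, and the six-element BACKEND list are all hypothetical names and data, chosen only to mimic the "fetch 0 to limit, sort in memory, skip to the start index" sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PagedSearchSketch {
    // Hypothetical backend order (unsorted), standing in for the directory results.
    static final List<String> BACKEND = List.of("d", "e", "f", "a", "b", "c");

    // Mimics the described behaviour: fetch results 0..start+max,
    // sort them in memory, then skip the first `start` entries.
    static List<String> flawedPage(int start, int max) {
        return BACKEND.stream()
                .limit(start + max)   // fetch everything from 0 to the limit
                .sorted()             // in-memory sort over a set that grows per page
                .skip(start)          // discard the sorted results before the start index
                .collect(Collectors.toList());
    }

    // Aggregates successive pages the way a paging caller would.
    static List<String> aggregate(int pageSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < BACKEND.size(); start += pageSize) {
            all.addAll(flawedPage(start, pageSize));
        }
        return all;
    }

    public static void main(String[] args) {
        // Page 1 sorts [d, e, f]; page 2 sorts all six entries and skips [a, b, c].
        System.out.println(aggregate(3)); // prints [d, e, f, d, e, f]
    }
}
```

With a page size of 3, the entries a, b and c never appear in the aggregated result, while d, e and f appear twice. Both symptoms, dropped and duplicated results, come from sorting a different set on each request.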
Example
On a customer dataset, Stash performed a group membership query using paged searches, asking for all the users in a group groupA (direct and nested membership). Because groupA has several nested groups with large numbers of direct users, the user foo was not found in the first request, from 0 to 100. In the subsequent search (from 100 to 200), the user was found at position 115 before the in-memory sort and at position 81 after it. Since position 81 is below the start index of 100, the user (along with others) was silently discarded when Crowd dropped the first 100 results. Hence the aggregated list of members of groupA returned to the caller did not include the user foo. For the same reason, other users were duplicated in the aggregated result, which caused STASH-3843.
Issue Links
- causes
  - BSERV-3947 Searching nested group memberships fails for users with a large number of groups (Closed)
  - BSERV-3843 LazyReference$InitializationException: java.lang.IllegalArgumentException: duplicate key: {username} (Closed)
- relates to
  - CWD-2807 ApplicationServiceGeneric search methods may return too few results when paging (Closed)