
Key: JRA-13884
Type: Bug
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Chris Mountford [Atlassian]
Votes: 1
Watchers: 3
JIRA

TooManyClauses or OutOfMemoryException for AJAX Issue Picker thanks to PrefixQuery

Created: 04/Nov/07 08:41 PM   Updated: 06/Nov/07 08:29 PM
Component/s: Web interface, Filtering & Indexing
Affects Version/s: 3.11
Fix Version/s: 3.13

Time Tracking:
Not Specified

Environment: verified in 3.10.2 standalone and 3.12DEV as at Mon Nov 5 13:02:04 EST 2007

Participants: Anton Mazkovoi [Atlassian], Chris Mountford [Atlassian] and Samuel Cai
Since last comment: 36 weeks, 4 days ago
Labels: scalability


Description
The AJAX Issue Picker uses a PrefixQuery which, via rewrite(), is expanded into a BooleanQuery that ORs together one clause for every matching term read from the IndexReader.

This consumes space linear in the size of the search index. For a case with about 94,000 issues, this appears to require approximately 900 MB.
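To make the expansion concrete, here is a minimal Java sketch that simulates it without Lucene. It is not Lucene's code: the class `PrefixExpansionSketch`, its methods, and the exception `TooManyClausesSketch` are hypothetical stand-ins, and the sorted set plays the role of the term dictionary. It only illustrates that a prefix produces one clause per matching indexed term, so the clause list grows with the number of issues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical simulation of prefix expansion; this is NOT Lucene code.
final class PrefixExpansionSketch {

    /** Stand-in for the idea behind BooleanQuery$TooManyClauses. */
    static final class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch(int max) {
            super("more than " + max + " clauses");
        }
    }

    /** Collects one "clause" per indexed term that starts with the prefix. */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix in sorted order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // matching terms are contiguous, so no later term can match
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesSketch(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Builds issue keys KEY-1 .. KEY-n, as a large single-project index might contain. */
    static SortedSet<String> issueKeys(String projectKey, int n) {
        SortedSet<String> keys = new TreeSet<>();
        for (int i = 1; i <= n; i++) {
            keys.add(projectKey + "-" + i);
        }
        return keys;
    }
}
```

With 100,000 keys in one project, `expand(issueKeys("TST", 100000), "TST", 1024)` trips the limit, while raising the limit to 100,000 returns a list with one entry per issue, which is the linear growth described above.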

Insane? Perhaps, but it seems the Lucene people think this is fine; see the mailing list discussion below. The TooManyClauses exception is designed to indicate a problem like this.

Support case that brought this up: JSP-16824

The customer increased their maximum clause count, and this resulted in an OutOfMemoryError.

Reproduction steps

To reproduce the problem with AJAX issue picker turned ON:

  1. Create a large number of issues in the same project (100,000 seems ample).
  2. Perform a search for all issues (press Enter in the quick search box). This is necessary to seed the search space for the AJAX issue picker, since it uses the current query plus some other stuff.
  3. Go to any issue.
  4. Click the operation to link the issue.
  5. Start typing the project key. An AJAX request is made on each keystroke, which should produce the stack trace below:
2007-11-05 13:08:07,667 http-8080-Processor2 ERROR [bc.issue.search.AbstractIssuePickerSearchProvider] Error while executing search request
org.apache.lucene.search.BooleanQuery$TooManyClauses
	at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:184)
	at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:175)
	at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:52)
	at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:381)
	at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:396)
	at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:396)
	at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:396)
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:137)
	at org.apache.lucene.search.Query.weight(Query.java:92)
	at org.apache.lucene.search.Hits.<init>(Hits.java:41)
	at org.apache.lucene.search.Searcher.search(Searcher.java:44)
	at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.runSearch(LuceneSearchProvider.java:148)
	at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getHits(LuceneSearchProvider.java:76)
	at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:208)
	at com.atlassian.jira.bc.issue.search.AbstractIssuePickerSearchProvider.getResults(AbstractIssuePickerSearchProvider.java:81)
	at com.atlassian.jira.bc.issue.search.DefaultIssuePickerSearchService.getResults(DefaultIssuePickerSearchService.java:59)
	at com.atlassian.jira.web.dwr.AjaxIssuePicker.getIssues(AjaxIssuePicker.java:89)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at uk.ltd.getahead.dwr.impl.ExecuteQuery.execute(ExecuteQuery.java:248)
	at uk.ltd.getahead.dwr.impl.DefaultExecProcessor.handle(DefaultExecProcessor.java:48)
	at uk.ltd.getahead.dwr.impl.DefaultProcessor.handle(DefaultProcessor.java:81)
	at uk.ltd.getahead.dwr.AbstractDWRServlet.doPost(AbstractDWRServlet.java:162)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.jira.web.filters.AccessLogFilter.doFilter(AccessLogFilter.java:73)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.seraph.filter.SecurityFilter.doFilter(SecurityFilter.java:182)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.seraph.filter.LoginFilter.doFilter(LoginFilter.java:181)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.jira.web.filters.ActionCleanupDelayFilter.doFilter(ActionCleanupDelayFilter.java:43)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.jira.web.filters.RequestCleanupFilter.doFilter(RequestCleanupFilter.java:49)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.johnson.filters.AbstractJohnsonFilter.doFilter(AbstractJohnsonFilter.java:72)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.jira.web.filters.gzip.GzipFilter.doFilter(GzipFilter.java:64)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.core.filters.AbstractEncodingFilter.doFilter(AbstractEncodingFilter.java:37)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at com.atlassian.jira.appconsistency.db.DatabaseCompatibilityEnforcerFilter.doFilter(DatabaseCompatibilityEnforcerFilter.java:39)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
	at java.lang.Thread.run(Thread.java:613)

The solution options are alluded to in the snippet below.

from http://mail-archives.apache.org/mod_mbox/lucene-java-user/200612.mbox/%3c359a92830612270753x3d0e3511g634b8e9c145d4034@mail.gmail.com%3e

Excerpt from the Lucene mailing list:

Also, see the thread on this list titled "I just don't get wildcards at all"
to see an extensive discussion of this issue, as well as wildcards in
general. You might also search the archive for wildcards. The short form is
that any wildcard (including prefix queries) expands under the covers to
create a clause for each possible entry in the index for that field. For
instance, say a field had the following values:

abcd
abck
abt

Searching for ab* would expand to searching for abcd, abck and abt under the
covers. When the number of possibilities gets above the default value of
1024, you see a TooManyClauses exception. Expanding the number of clauses
may fix you right up, but on any reasonably sized index, you can come up
with a query that'll exceed whatever number you set. Or you'll get to an
unacceptable performance/memory footprint. Imagine your query with things
like a*

Think seriously about how you're going to deal with this. There are several
options:
1> use filters for all your wildcard clauses and create your own
BooleanQuery. Be aware that using filters affects scoring.
2> Assume that any query that throws a TooManyClauses exception (after
you've set a suitable max as Paul suggested) is too broad to be useful and
respond to the user with some polite phrase asking them to refine the query.
3> Look over the SrndQuery classes. I don't fully understand these, but they
certainly behave much differently in this area. Note that SrndQuery limits
wildcards to having at least three non-wildcard characters.
4> Ask whether stemming is a complete or partial solution. Ditto for
Soundex. There's a good chance these won't apply, but they may.
5> <Insert the solution to your specific problem here>

This is a sticky wicket that will probably consume more time than you think
to handle. It's easy for your product manager to claim that "Of course, we
must support arbitrary wildcards", but I'd urge you to seriously ask what
value arbitrary wildcards bring to the product. When you start getting
thousands of responses to a query, is it actually valuable to return them to
the user? Or do you give her just as much value (and deliver product sooner)
by telling her up front that she's getting too many responses to be useful?
With this last strategy, you just catch the TooManyClauses exception and
respond with "refine your query".....

Best
Erick

On 12/27/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> Chris,
>
> On Wednesday 27 December 2006 15:42, Chris Salem wrote:
> > Hi All,
> >
> > I'm getting a 'TooManyClauses' Exception and I'm not sure how to fix
> this.
> Here's a sample query that I'm using:
> >
> > +(+freeform_text:exhibit* +(+freeform_text:dispaly
> +freeform_text:event*)
> +(+freeform_text:sale* +freeform_text:sells +freeform_text:develop*)
> +(+freeform_text:trade +freeform_text:show +freeform_text:trade
> +freeform_text:shows)) +degree_type:5 +position_desired:ftp
> +city:washington~0.5 +state:dc +ncountry:usa +last_modified:[2005-12-26 TO
> 2006-12-26]
> >
> > Here's the exception I'm getting:
> >
> > org.apache.lucene.search.BooleanQuery$TooManyClauses
> > at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:160)
> > at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:151)
> > at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:52)
>
> One of the prefix queries is causing this, possibly event* or sale*.
> Since they seem to be specific enough, increasing the maximum number
> of boolean clauses that can be added to a boolean query appears to be
> the good way to fix this, see BooleanQuery.setMaxClauseCount().
>
> Regards,
> Paul Elschot
>
> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> > at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java
> :137)
> > at org.apache.lucene.search.Query.weight(Query.java:93)
> > at org.apache.lucene.search.Hits.<init>(Hits.java:41)
> > at org.apache.lucene.search.Searcher.search(Searcher.java:44)
> > at org.apache.lucene.search.Searcher.search(Searcher.java:36)
> > at
> net.mainsequence.pcr.lucene.LuceneHandler.multiSearch(LuceneHandler.java
> :382)
> > at
> net.mainsequence.pcr.lucene.LuceneServlet.searchIndex(LuceneServlet.java
> :169)
> > at
> net.mainsequence.pcr.lucene.LuceneServlet.processRequest(
> LuceneServlet.java:83)
> > at net.mainsequence.pcr.lucene.LuceneServlet.doPost(LuceneServlet.java
> :72)
> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> > at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
> ApplicationFilterChain.java:252)
> > at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(
> ApplicationFilterChain.java:173)
> > at
> org.apache.catalina.core.StandardWrapperValve.invoke(
> StandardWrapperValve.java:213)
> > at
> org.apache.catalina.core.StandardContextValve.invoke(
> StandardContextValve.java:178)
> > at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java
> :126)
> > at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java
> :105)
> > at
> org.apache.catalina.core.StandardEngineValve.invoke(
> StandardEngineValve.java:107)
> > at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> :148)
> > at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
> > at
>
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
> (Http11BaseProtocol.java:664)
> > at
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
> PoolTcpEndpoint.java:527)
> > at
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
> LeaderFollowerWorkerThread.java:80)
> > at
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
> ThreadPool.java:684)
> > at java.lang.Thread.run(Unknown Source)
> >
> > Is there anyway to increase the amount of clauses lucene can take? This
> kind of large query is not uncommon so any help would be greatly
> appreciated.
> >
> >
> > Chris Salem



Anton Mazkovoi [Atlassian] - 04/Nov/07 10:02 PM
I think we need to catch org.apache.lucene.search.BooleanQuery$TooManyClauses and deal with it.

We could also skip making the issue picker search through the current search when the current search returns too many issues.
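A rough Java sketch of the "catch it and deal with it" idea follows. Everything in it is an assumption made for illustration: `PickerSearch` and `PickerResult` are hypothetical types, not JIRA or Lucene interfaces, and the nested `TooManyClauses` class is only a stand-in for org.apache.lucene.search.BooleanQuery$TooManyClauses so that the sketch compiles without Lucene on the classpath.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these types exist in JIRA or Lucene.
final class GuardedPickerSearchSketch {

    /** Stand-in for org.apache.lucene.search.BooleanQuery$TooManyClauses. */
    static final class TooManyClauses extends RuntimeException { }

    /** Hypothetical abstraction over whatever actually runs the query. */
    interface PickerSearch {
        List<String> search(String typedText);
    }

    /** Matching issue keys plus a flag telling the UI to ask for more input. */
    static final class PickerResult {
        final List<String> issueKeys;
        final boolean tooBroad;

        PickerResult(List<String> issueKeys, boolean tooBroad) {
            this.issueKeys = issueKeys;
            this.tooBroad = tooBroad;
        }
    }

    /** Runs the search and turns a clause-limit failure into an empty "too broad" result. */
    static PickerResult searchSafely(PickerSearch delegate, String typedText) {
        try {
            return new PickerResult(delegate.search(typedText), false);
        } catch (TooManyClauses e) {
            // The query expanded past the clause limit: report it instead of logging a stack trace.
            return new PickerResult(Collections.<String>emptyList(), true);
        }
    }
}
```

This only helps while the clause limit stays at a sane value. As the support case in the description shows, raising the limit far enough trades the exception for an out-of-memory failure, which a catch block like this would not handle.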


Samuel Cai - 05/Nov/07 02:03 AM
One solution is not to search until there is enough input. I can't see any limit in the source code on the input to the issue picker, which means any single character will trigger a prefix search. Many search tools and websites require a minimum number of characters before running a wildcard search, to limit the number of expanded term queries.
Another solution is a separate index for the issue picker. For the PrefixQuery on issue key, the index document only needs the key and summary, whereas the current implementation loads the whole issue index document.
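The first suggestion, a minimum input length, could look roughly like the Java sketch below. The class name and the threshold of three characters are assumptions for illustration only (three is borrowed from the SrndQuery remark in the mailing list excerpt, not from JIRA).

```java
// Hypothetical guard; the class name and threshold are illustrative assumptions.
final class MinPrefixGuardSketch {

    /** Assumed minimum number of typed characters before a prefix search runs. */
    static final int MIN_PREFIX_CHARS = 3;

    /** Returns true only when the trimmed input is long enough to search on. */
    static boolean shouldSearch(String typedText) {
        if (typedText == null) {
            return false;
        }
        return typedText.trim().length() >= MIN_PREFIX_CHARS;
    }
}
```

A length check alone may not be enough for the reproduction case in the description: when every issue lives in one project, typing the full project key still matches every issue key, however many characters were typed.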

Anton Mazkovoi [Atlassian] - 06/Nov/07 06:45 PM
Thanks for the update.

These are good ideas. As far as I know, we do not pull all the matching issues into memory, only the top x issues.

I think we need to complete the search as we need to ensure the matches are sorted correctly. Completing the search should not take up too much memory.

We have not completed investigations on this, however, I think the memory is taken up by the PrefixQuery when it explodes itself into a Boolean query, which happens before the search actually executes. As far as I can tell the size of the explosion depends on the size of the index (in terms of number of issues).

If this feature is causing you a lot of pain, you could disable it in Administration -> General Configuration -> Issue Picker Auto-complete.

Cheers,
Anton


Samuel Cai - 06/Nov/07 08:29 PM
Yup, we already disabled it.
I know JIRA didn't pull them all in, but Lucene did.