
Key: JRA-3127
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Anton Mazkovoi [Atlassian]
Reporter: Tim Jones
Votes: 1
Watchers: 3
JIRA

failure finding issues by date criteria: org.apache.lucene.search.BooleanQuery$TooManyClauses

Created: 11/Feb/04 12:43 PM   Updated: 04/Oct/06 12:46 AM
Component/s: Filtering & Indexing
Affects Version/s: 2.5.3 Professional
Fix Version/s: 2.6.1 Pro, 2.6.1 Enterprise

Time Tracking:
Original Estimate: 1 day
Remaining Estimate: 1 day
Time Spent: Not Specified

Issue Links:
Duplicate
 
Reference
 

Participants: Anton Mazkovoi [Atlassian], Jeff Turner [Atlassian], Keith Brophy, Mike Aizatsky, Tim Jones and Yuen-Chi Lian [Atlassian]
Since last comment: 1 year, 47 weeks, 4 days ago
Resolution Date: 16/Mar/05 08:22 PM
Labels:


Description
When there are a large number of issues spanning a large number of dates in Jira, searches using date criteria fail with the exception:

An error occurred searching: org.apache.lucene.search.BooleanQuery$TooManyClauses

See the stack trace below.

The problem is that the date range search query is rewritten into a BooleanQuery inside Lucene, with one clause per matching date term, and if there are more than 1024 different dates, Lucene will fail.

What needs to happen is that the "Created After" and "Created Before" search criteria are combined into a single date range. Right now they are individual RangeQueries, causing Lucene to try to match every single date on one side or the other of the chosen boundary.
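
To make the difference concrete, here is a minimal, Lucene-free sketch. It assumes, as described above, that a range expands to one clause per indexed term it covers. The class `RangeExpansionSketch`, its methods and the zero-padded term format are hypothetical and only model the counting; they are not Jira or Lucene code.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical model: counts the clauses a range would expand to, one per indexed term. */
final class RangeExpansionSketch {

    /** Clauses produced by two independent open-ended ranges. */
    static int separateRanges(NavigableSet<String> terms, String lower, String upper) {
        int after = terms.tailSet(lower, true).size();  // "Created After" expands on its own
        int before = terms.headSet(upper, true).size(); // "Created Before" expands on its own
        return after + before;
    }

    /** Clauses produced by a single bounded range, lower <= term <= upper. */
    static int combinedRange(NavigableSet<String> terms, String lower, String upper) {
        if (lower.compareTo(upper) > 0) {
            return 0; // empty range; subSet would throw for inverted bounds
        }
        return terms.subSet(lower, true, upper, true).size();
    }

    /** Builds n distinct, sortable, zero-padded terms standing in for indexed dates. */
    static NavigableSet<String> sampleTerms(int n) {
        NavigableSet<String> terms = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            terms.add(String.format("%08d", i));
        }
        return terms;
    }
}
```

With 3000 distinct terms and a window covering terms 1000 to 1099, the two separate ranges account for 3100 clauses in this model, while the combined range accounts for 100.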

If the user only chooses "Created Before" (for example), and there are more than 1000 possible dates, Jira should present some kind of error message asking for a lower limit. Or (perhaps better) go as far as it can and report the limit in the search results.

Note that the Lucene API allows you to enumerate the terms starting at a given term; see IndexReader#terms(Term). It would be possible to use this to determine that too many dates match.
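
Such a check might look roughly like the following. This is a sketch against a plain sorted set, not against the enumeration returned by IndexReader#terms(Term), whose API is not reproduced here. `TermLimitCheck` and `exceedsLimit` are hypothetical names, and the limit is whatever clause limit is in effect.

```java
import java.util.Iterator;
import java.util.NavigableSet;

/** Hypothetical pre-check: would a date range expand to more clauses than allowed? */
final class TermLimitCheck {

    /**
     * Returns true if more than maxClauses terms fall within [lower, upper].
     * A null bound means that side of the range is open-ended.
     * Iteration stops as soon as the limit is exceeded.
     */
    static boolean exceedsLimit(NavigableSet<String> terms, String lower, String upper, int maxClauses) {
        // Start at the lower bound, mirroring an enumeration that begins at a given term.
        Iterator<String> it = (lower == null ? terms : terms.tailSet(lower, true)).iterator();
        int count = 0;
        while (it.hasNext()) {
            String term = it.next();
            if (upper != null && term.compareTo(upper) > 0) {
                break; // walked past the upper bound
            }
            if (++count > maxClauses) {
                return true; // no need to count the rest
            }
        }
        return false;
    }
}
```

A caller could use the result to show an error asking for a narrower range instead of letting the search fail.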

Another solution would be to "normalize" dates in the search index to whole days only. Right now it appears that the actual timestamp is indexed and the hours, minutes and seconds are not stripped off. Stripping these off would reduce the number of indexed terms in the Lucene index. However, after a few years (1024 days, to be exact) even this will fail, so Jira will still need to do some error checking.
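
The effect of normalizing can be sketched without Lucene by counting distinct values before and after truncation. The class `DayNormalizationSketch` and its methods are hypothetical, and using UTC as the day boundary is an assumption made only to keep the example deterministic.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: how many distinct index terms remain once timestamps are cut to whole days. */
final class DayNormalizationSketch {

    /** Truncates an epoch-millisecond timestamp to its calendar day (UTC is an assumption). */
    static LocalDate toDay(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Counts the distinct whole days among the given timestamps. */
    static int distinctDays(long[] timestamps) {
        Set<LocalDate> days = new TreeSet<>();
        for (long t : timestamps) {
            days.add(toDay(t));
        }
        return days.size();
    }
}
```

Any number of issues created on the same day then collapse into a single term, so the term count grows with elapsed days rather than with issue count. At one term per day, 1024 terms are reached after a little under three years.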

stack trace:

org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:109)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:101)
at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:137)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:244)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:188)
at org.apache.lucene.search.Query.weight(Query.java:120)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:128)
at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:150)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:93)
at org.apache.lucene.search.Hits.<init>(Hits.java:80)
at org.apache.lucene.search.Searcher.search(Searcher.java:71)
at org.apache.lucene.search.Searcher.search(Searcher.java:65)
at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:75)
at com.atlassian.jira.issue.search.providers.DefaultSearchProvider.search(DefaultSearchProvider.java:43)
at com.atlassian.jira.issue.managers.DefaultIssueManager.execute(DefaultIssueManager.java:144)
at com.atlassian.jira.issue.managers.CachingIssueManager.execute(CachingIssueManager.java:82)
at com.atlassian.jira.web.action.issue.IssueNavigator.getBrowsableItems(IssueNavigator.java:370)
at com.atlassian.jira.web.action.issue.IssueNavigator.doExecute(IssueNavigator.java:258)
at webwork.action.ActionSupport.execute(ActionSupport.java:154)
at com.atlassian.jira.action.JiraActionSupport.execute(JiraActionSupport.java:46)
at webwork.dispatcher.GenericDispatcher.executeAction(GenericDispatcher.java:131)
at com.atlassian.jira.web.dispatcher.JiraServletDispatcher.service(JiraServletDispatcher.java:181)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)



Comments
Tim Jones added a comment - 25/Feb/04 09:13 AM
We added the following code to AbstractDocument.java on line 91 to strip the hours and minutes from dates indexed into Lucene:

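// Needs java.util.Calendar; timestamp is assumed to be a java.sql.Timestamp.
// Truncate the timestamp to midnight (default time zone) before it is indexed.
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(timestamp.getTime());
int day = cal.get(Calendar.DATE);
int month = cal.get(Calendar.MONTH);
int year = cal.get(Calendar.YEAR);
// clear() resets every field, so only year, month and day are set afterwards.
cal.clear();
cal.set(year, month, day);
timestamp = new Timestamp(cal.getTimeInMillis());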


Mike Aizatsky added a comment - 13/Mar/04 04:58 AM
Isn't this really a blocker? What about a fix in 2.7.1?
I can't search for issues modified between two dates at all!

Jeff Turner [Atlassian] added a comment - 17/Mar/04 06:18 PM

Fixed simply by raising the maximum clause limit in Lucene (now specifiable in jira-application.properties). Local testing with the jira.atlassian.com dataset suggests that large date queries are still pretty fast (under 4s).
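
For illustration only, reading such a limit from a properties file could look like the sketch below. The property key `example.search.maxclauses` and the class `MaxClauseSetting` are hypothetical; the actual key used in jira-application.properties is not stated in this issue. The fallback of 1024 is the limit quoted in the description.

```java
import java.util.Properties;

/** Hypothetical sketch: read a configurable clause limit, falling back to 1024. */
final class MaxClauseSetting {

    static final int DEFAULT_MAX_CLAUSES = 1024;            // limit quoted in the issue description
    static final String KEY = "example.search.maxclauses";  // hypothetical key, not Jira's real one

    /** Returns the configured limit, or the default if it is missing, non-numeric or not positive. */
    static int read(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_MAX_CLAUSES;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_MAX_CLAUSES;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_CLAUSES;
        }
    }
}
```

The resulting value would then be handed to Lucene at startup, for example through BooleanQuery.setMaxClauseCount().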

Anton Mazkovoi [Atlassian] added a comment - 16/Apr/04 05:11 AM
Need to investigate - why does Lucene need so many parameters?

Keith Brophy added a comment - 16/Mar/05 08:22 PM
Issue is fixed, but we will investigate the Lucene problem.

Yuen-Chi Lian [Atlassian] added a comment - 21/Sep/06 09:56 AM
FYI,

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831

  • Use a filter to replace the part of the query that causes the exception. For example, a RangeFilter can replace a RangeQuery on date fields and it will never throw the TooManyClauses exception. You can even use ConstantScoreRangeQuery to execute your RangeFilter as a Query. Note that filters are slower than queries when used for the first time, so you should cache them using CachingWrapperFilter. Using Filters in place of Queries generated by QueryParser can be achieved by subclassing QueryParser and overriding the appropriate function to return a ConstantScore version of your Query.
  • Increase the number of terms using BooleanQuery.setMaxClauseCount(). Note that this will increase the memory requirements for searches that expand to many terms. To deactivate any limits, use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).
  • A specific solution that can work on very precise fields is to reduce the precision of the data in order to reduce the number of terms in the index. For example, the DateField class uses a millisecond resolution, which is often not required. Instead you can save your dates in the "yyyyMMddHHmm" format, maybe even without hours and minutes if you don't need them (this was simplified in Lucene 1.9 thanks to the new DateTools class).
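
As a rough illustration of the last point, the sketch below formats a date at minute and at day precision using only the JDK. It does not use Lucene's DateTools; the class `ReducedPrecisionTerms` and its method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: reduced-precision, lexicographically sortable date terms. */
final class ReducedPrecisionTerms {

    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Term with minute precision; seconds and below are dropped. */
    static String minuteTerm(LocalDateTime t) {
        return MINUTE.format(t);
    }

    /** Term with day precision; the time of day is dropped entirely. */
    static String dayTerm(LocalDateTime t) {
        return DAY.format(t);
    }
}
```

Because the fields run from most to least significant and are zero-padded, string order matches chronological order, which is what a range over indexed terms relies on.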


Cheers,
Yuen-Chi Lian

"I do not seek. I find." - Pablo Picasso