JRASERVER-22256

Performance Issue with JQL functions

      NOTE: This suggestion is for JIRA Server. Using JIRA Cloud? See the corresponding suggestion.

      Summary
      IssueNavigator calls every instance of a JQL function's getValues() three times

      1. to validate the query
      2. to get the query's issues
      3. to see if the query fits in the simple filter form [getQueryContext()]

      If the JQL function has to inspect a lot of data or returns a lot of data, that cost is unexpectedly tripled, which leads to severe performance problems.

      Detailed Explanation
      As some Atlassians know, we wedged Mike Cannon-Brookes's prototype of advanced search into our Jira 3.13 and enhanced it to search the change history and all the comment fields.
      I've been working to adapt what we did to work in Jira 4.

      In Jira 3, this required a heavy-handed modification of the LuceneQueryCreator so that a Lucene Filter on the comment index was properly or'ed or and'ed or not'ed against the issue index. It was messy but it worked. The performance was OK as long as the user didn't have too many queries against the comment index as sub-queries.

      For Jira 4, I figured that using a JQL function to search the comment index and then return JiraDataType.ISSUE was the cleaner way to do what we had done by hacking LuceneQueryCreator in Jira 3. The JQL function searches the comment index and returns a list of issue ids. The rest would be handled by Jira 4's great new re-worked search system, and I could encapsulate everything in a plugin.

      What I did works, but it has horrible performance.
      IssueNavigator calls every instance of the function's getValues() three times

      1. to validate the query
      2. to get the query's issues
      3. to see if the query fits in the simple filter form [getQueryContext()]

      This means that my comment JQL function has to search the comment index 3 times, loop 3 times through each hit on the comment index and convert it to an issue id, and then the system re-queries the issue index using those issue ids in a new search 3 times.
      In our Jira 3 implementation, the comment index was searched once, converted to a Lucene Filter, and then filtered once against the issue index using Lucene. Here are the averages of a couple of comparisons that I did on the same machine running both Jira 3 and Jira 4:

      Issues    Jira 3        Jira 4
      500       5 seconds     5 seconds
      30000     23 seconds    190 seconds
      100000    53 seconds    10 minutes

      (I also noticed that if an issue in the results is for some reason invalid – for example if it fell victim to Bulk Edit woes – then after the user has waited minutes to get their slow running results, instead of getting all their issues with a warning note about the invalid issues, they get zero results and an error message which doesn't indicate what went wrong. See invalid-issue.png. I've filed a separate bug on this, JRA-22277.)

      I believe JQL needs several improvements:

      1. getValues() should be called only once per function, not 3 times. Perhaps wrap the call to the function in an object that holds onto the values while it gets passed from validation to query to context analysis. Another idea: add an isTooComplexForSimpleSearch() method to the JqlFunction interface, which would remove the need to run the query for most functions.
      2. Returning issue ids from JqlFunction.getValues() is almost always going to lead to poor performance, because whatever the function does to discover issue ids is going to be expensive. Perhaps it should be deprecated. Update: I also have a concern about the User functions. For example, I wrote another one that returns all the users that are no longer active so that project leads can re-assign issues that are assigned to employees who no longer work at our company. This function has to loop through all the users and find which ones no longer belong to any groups. The JQL framework also runs this function three times per function call.
      3. Perhaps the better alternative would be to have a JiraDataType.FILTER so that getValues() would return a Lucene Filter. (After analyzing the issue further myself, I realize that this isn't a valid suggestion.)
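
      The holder object proposed in item 1 could look roughly like the sketch below. It is plain Java and does not use any Jira classes: MemoizedValues and MemoizedValuesDemo are hypothetical names, and the Supplier stands in for a call to a JQL function's getValues(). It only illustrates the idea that the three phases could share one result instead of recomputing it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder: runs the expensive lookup once and reuses the result. */
final class MemoizedValues<T> implements Supplier<List<T>> {
    private final Supplier<List<T>> delegate; // stands in for getValues()
    private List<T> values;
    private boolean computed;

    MemoizedValues(Supplier<List<T>> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized List<T> get() {
        if (!computed) {
            values = delegate.get();
            computed = true;
        }
        return values;
    }
}

final class MemoizedValuesDemo {
    /** Simulates three phases sharing one holder; returns how often the lookup ran. */
    static int runThreePhases() {
        AtomicInteger lookups = new AtomicInteger();
        Supplier<List<Long>> expensive = () -> {
            lookups.incrementAndGet();
            return List.of(10001L, 10002L, 10003L); // made-up issue ids
        };
        Supplier<List<Long>> shared = new MemoizedValues<>(expensive);
        shared.get(); // phase 1: validate the query
        shared.get(); // phase 2: get the query's issues
        shared.get(); // phase 3: build the query context
        return lookups.get();
    }

    public static void main(String[] args) {
        System.out.println("expensive lookups: " + runThreePhases());
    }
}
```

      The holder would have to be created once per query and per function instance and then handed to each phase, which is exactly the plumbing that the current call framework lacks.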

      Update, September 15
      After studying this for a couple of days, I was able to speed up our system with the following changes:

      1. To stop it from looping through thousands of issues and constructing an issue for each before the query executes, I commented out the validation code in IssueIdValidator
      2. I modified IssueIdQueryFactory to construct an IssueIdFilter wrapped by a ConstantScoreQuery rather than a BooleanQuery with a sub-query for each issue id. This avoids the Too Many Clauses Error if the JQL function hits more than 32000 issues. This is similar to the solution that I posed in JRA-22453. See code below.
      3. I modified MultiClauseDecoratorContextFactory so that it only loops through the first 100 results during getQueryContext().
      4. To work around but not solve the problem of IssueNavigator calling each instance of a JQL function 3 times, I put a cache in my JQL issue-type functions that holds onto each instance's results for a few seconds. This consumes unnecessary memory but it's the best that I can think of short of re-writing the entire JQL call framework to hold onto the results and then pass them through validate(), executeQuery(), and getQueryContext() rather than having validate(), executeQuery(), and getQueryContext() each separately calling jqlIssueFunction.getValues().
      5. To avoid unnecessarily calling isCurrentQueryTooComplex(), I hacked our IssueNavigator.isAdvanced() to use a regular expression that recognizes a JQL string containing one of our nasty custom JQL issue-type functions.
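
      The short-lived cache from item 4 is not shown in this report. A minimal sketch of the general technique, assuming a string key built from the function name and its arguments, might look like this. ShortLivedResultCache, its constructor parameters, and the key format are all hypothetical, and the clock is injected only so that the expiry behaviour can be exercised without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cache that keeps each result for a few seconds. */
final class ShortLivedResultCache<V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ShortLivedResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value for key, loading it if absent or expired. */
    V get(String key, Supplier<V> loader) {
        long now = clock.getAsLong();
        // Drop expired entries so stale results do not pile up in memory.
        entries.values().removeIf(e -> now >= e.expiresAtMillis);
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            // Two threads may both load here; acceptable for a sketch.
            entry = new Entry<>(loader.get(), now + ttlMillis);
            entries.put(key, entry);
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}
```

      With a time-to-live of a few seconds, the three getValues() calls made for one search would hit the same entry, for example cache.get("commentSearch(foo)", () -> searchCommentIndex("foo")), where both the key and searchCommentIndex are placeholders. The key would also need to include anything that changes the result, such as the searching user.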

      With these improvements, our Jira 4 now runs faster than Jira 3! Yay!
      New results

      Issues    Jira 3        Jira 4
      500       5 seconds     4 seconds
      30000     23 seconds    8 seconds
      100000    53 seconds    23 seconds

      (Digression – another improvement request that I contemplate – get rid of the issue index and index everything, including change history, inside the comment index)

      IssueIdQueryFactory.java
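          private Query createPositiveEqualsQuery(final List<QueryLiteral> rawValues)
          {
              if (rawValues.size() == 1)
              {
                  // A single value still uses the ordinary single-issue query.
                  return createQuery(rawValues.get(0));
              }
              else
              {
                  // Many values: one constant-score filter over all the ids, instead of a
                  // BooleanQuery with a sub-query per issue id (avoids Too Many Clauses).
                  return new ConstantScoreQuery(new IssueIdFilter(rawValues));
              }
          }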
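
      The IssueIdFilter class itself is not included above. As a Lucene-free illustration of the idea behind it, the sketch below marks matching documents in a bit set using a single hash set of wanted ids, so the cost does not depend on a per-id clause limit. IssueIdBitSetSketch and the docIssueIds array layout are assumptions made for this example, not the actual filter.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the idea behind a set-based issue id filter. */
final class IssueIdBitSetSketch {
    /**
     * docIssueIds[doc] is the issue id stored for index document number doc
     * (an assumed layout). Returns a bit set with one bit per matching document.
     */
    static BitSet matchingDocs(long[] docIssueIds, List<Long> wantedIds) {
        Set<Long> wanted = new HashSet<>(wantedIds); // one set, any number of ids
        BitSet bits = new BitSet(docIssueIds.length);
        for (int doc = 0; doc < docIssueIds.length; doc++) {
            if (wanted.contains(docIssueIds[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

      A real filter would presumably look documents up by term rather than scan every document, but the result has the same shape: a set of matching documents that the search can intersect with the rest of the query.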
      

        1. invalid-issue.png
          84 kB
          Jeff Kirby

              jwinters tier-0 grump
              adc6ee404f6d Jeff Kirby
              Votes: 24
              Watchers: 37