Uploaded image for project: 'Jira Software Data Center'
  1. Jira Software Data Center
  2. JSWSERVER-16542

During rebalancing ranking operation fails to save new rank due to slow DB operation

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Low Low
    • None
    • 7.2.8, 7.2.13, 7.6.3, 7.6.4, 7.3.4, 8.20.12, 9.4.0, 9.12.0
    • AgileBoard

      Not a Jira bug

      This is not a Jira core bug, ticket created to track the problem and for transparency.
      At this stage, we believe the problem is related to MySQL: for some reason MySQL chooses the wrong execution plan and doesn't use an index in optimal way. That makes SQL execution slow. That being said, we will keep the ticket open for some time to monitor the problem.

      Summary

      As part of Lexorank rebalancing, each rank value needs to be updated in AO_60DB71_LEXORANK table. At the same time, individual ranking operation has 1500 ms timeout for rank lock.
      Due to slow DB operation, issue ranking operation fails to save new rank since rank lock expires. That makes Lexorank rebalancing process stall, usually you will see no progress in UI. Please note that situation is different to JSWSERVER-15703

      Environment

      • Large JIRA instance: 1M+ issue
        • more specific large AO_60DB71_LEXORANK table
      • The issue was initially reported with Jira connected to MySQL databases, but there were other database engines affected as well, such as Oracle and MS SQL Server.

      Steps to Reproduce

      1. Run Lexorank rebalancing
      2. Monitor progress

      Expected Results

      • Lexorank rebalancing runs continuously and finishes successful.
      • SQL executions is fast:
        SELECT FIELD_ID,ID,ISSUE_ID,LOCK_HASH,LOCK_TIME,RANK,TYPE FROM AO_60DB71_LEXORANK WHERE FIELD_ID = ? AND BUCKET = ? ORDER BY RANK ASC LIMIT 1; 
        SELECT FIELD_ID,ID,ISSUE_ID,LOCK_HASH,LOCK_TIME,RANK,TYPE FROM AO_60DB71_LEXORANK WHERE FIELD_ID = ? AND BUCKET = ? ORDER BY RANK DESC LIMIT 1;
        

      Actual Results

      • Lexorank rebalancing stalls
      • Issue ranking operation fails to save new rank since rank lock expires
      • SQL executions is slow

      Notes

      • To enable logging set DEBUG for com.atlassian.greenhopper.service.lexorank and com.atlassian.greenhopper.manager.lexorank
      • Example of logging:
        2018-02-27 10:21:58,046 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.s.lexorank.balance.LexoRankBalanceOperation] Acquired lock on rows
        2018-02-27 10:21:58,046 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.manager.lexorank.LexoRankDaoImpl]  LexoRank Query : SELECT FIELD_ID,ID,ISSUE_ID,LOCK_HASH,LOCK_TIME,RANK,TYPE FROM AO_60DB71_LEXORANK WHERE FIELD_ID = ? AND BUCKET = ? ORDER BY RANK ASC LIMIT 1; params[15530, 2]
        2018-02-27 10:21:59,190 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.manager.lexorank.LexoRankDaoImpl]  LexoRank Query : SELECT FIELD_ID,ID,ISSUE_ID,LOCK_HASH,LOCK_TIME,RANK,TYPE FROM AO_60DB71_LEXORANK WHERE FIELD_ID = ? AND BUCKET = ? ORDER BY RANK DESC LIMIT 1; params[15530, 0]
        2018-02-27 10:22:00,648 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.s.lexorank.balance.LexoRankBalanceOperation] Balancing rank row [type=ISSUE_RANK_ROW, oldRank=2|hx8yj3:, newRank=0|i1sicv:]
        2018-02-27 10:22:00,650 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.s.lexorank.balance.LexoRankBalanceOperation] Failed to save rank row, retry
        2018-02-27 10:22:00,650 lexorank-executor-thread-0 DEBUG ServiceRunner     [c.a.g.s.lexorank.balance.LexoRankBalanceOperation] Releasing lock
        
      • explain SELECT
        explain SELECT FIELD_ID,ID,ISSUE_ID,LOCK_HASH,LOCK_TIME,RANK,TYPE FROM AO_60DB71_LEXORANK WHERE FIELD_ID = 11500 AND BUCKET = 2 ORDER BY RANK ASC LIMIT 1;
        
        speed id select_type table type possible_keys key key_len ref
        slow 1 SIMPLE AO_60DB71_LEXORANK ref index_ao_60db71_lex1569533973,index_ao_60db71_lex1694305086 index_ao_60db71_lex1569533973 8 const
        fast 1 SIMPLE AO_60DB71_LEXORANK range index_ao_60db71_lex1694305086,index_ao_60db71_lex1569533973 index_ao_60db71_lex1569533973 775 NULL
      • Note that in case of slow select key_len is 8.
      • To help investigate problem like this in the future, please see KB based on this bug: Issue ranking operation fails - JIRA Software cannot execute the rank operation at this time

      Workaround

      • Resolving the issue involves making MySQL select the right index, to do that, please recreate index (there will be small downtime during that operation):
        DROP INDEX index_ao_60db71_lex1569533973 ON AO_60DB71_LEXORANK;
        CREATE INDEX index_ao_60db71_lex1569533973 on AO_60DB71_LEXORANK (FIELD_ID,RANK);
        
        • If you can allow longer downtime, you can run at mysql: "ANALYZE TABLE AO_60DB71_LEXORANK" (During the analysis, the table is locked with a read lock for InnoDB and MyISAM. see analyze-table)

              Unassigned Unassigned
              ayakovlev@atlassian.com Andriy Yakovlev [Atlassian]
              Votes:
              23 Vote for this issue
              Watchers:
              30 Start watching this issue

                Created:
                Updated: