Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-73874

Reindexing issues with large number of comments, worklogs, history overloads indexers

      Issue Summary

      This is reproducible on Data Center: yes

      Problem & solution description:
      https://community.developer.atlassian.com/t/safeguards-in-jira-dc-index
      https://community.atlassian.com/t5/Data-Center-articles/Introducing-Safeguards-for-Jira-index/ba-p/2037880#M433
      https://confluence.atlassian.com/jirakb/jira-indexing-limits-stats-1125876330.html

      Steps to Reproduce

      Detailed description:
      https://community.developer.atlassian.com/t/safeguards-in-jira-dc-index/57636/4

      Expected Results

      Index should not be overloaded. Actions on issues with large number of comments, worklogs & history should not affect node & cluster performance & stability.

      Actual Results

      Index is overloaded. When this happens it will affect the performance of all actions triggering re-indexing. Jira may start dropping indexing request which can affect crucial Jira DC components like index replication.

      When the index is overloaded you may see the following exception thrown on failed indexing actions:

      IndexException: Wait attempt timed out - waited  30000 milliseconds
      

      The problem can be monitored with JIRA-STATS INDEXING-QUEUE logs.

      Workaround

      Archive issues with large number of related entities (comments, worklogs, history).

            [JRASERVER-73874] Reindexing issues with large number of comments, worklogs, history overloads indexers

            Matt Doar added a comment -

            Matt Doar added a comment - That looks like building the Lexorank index https://confluence.atlassian.com/jirakb/troubleshooting-lexorank-system-issues-779159221.html

            I commented in https://jira.atlassian.com/browse/JRASERVER-72045?attachmentOrder=desc
            maybe this makes sense to be added here too:

            The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query:

            SELECT "FIELD_ID","ID","ISSUE_ID","LOCK_HASH","LOCK_TIME","RANK","TYPE" FROM jira_db."AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3
            

            with plan:

            jira_rds=> explain SELECT DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."ID",DATA."DECIMAL_NUMBER",DATA."DATE",DATA."SIMPLE_DATA",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER" FROM jira_db."AO_FC7135_DATA" DATA JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID" WHERE DATAROW."FIELD_CONFIG_ID" = 13 AND DATAROW."ISSUE_ID" = 13 ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=677.91..677.92 rows=7 width=179)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..677.81 rows=7 width=179)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 13) AND ("ISSUE_ID" = 13))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            created the index:

            *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE")

            the load dropped and the plan now is:

            QUERY PLAN
            ---------------------------------------------------------------------------------------------------------------
            Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562)
            Index Cond: (("FIELD_ID" = 1) AND ("ISSUE_ID" = 2) AND ("TYPE" = 3))
            

            And also the query:

            jira_rds=> explain SELECT DATA."DATE_TIME",DATA."ID",DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",
                           DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."DECIMAL_NUMBER",DATA."SIMPLE_DATA",
                           DATA."DATE",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER"
                      FROM jira_db."AO_FC7135_DATA" DATA
                           JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID"
                     WHERE DATAROW."FIELD_CONFIG_ID" = 1
                       AND DATAROW."ISSUE_ID" = 2
                     ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=1005.27..1005.28 rows=6 width=203)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..1005.19 rows=6 width=203)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            (7 rows)
            

            I added the index:

            *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID");

            and the plan now is:

            QUERY PLAN
            ----------------------------------------------------------------------------------------------------------------------------------------
            Nested Loop (cost=0.84..17.59 rows=6 width=203)
            -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16)
            Index Cond: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            With these two database indexes, the full indexing of JIRA completed in 2.5 hours.

             

            The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.

            Rusi Popov added a comment - I commented in https://jira.atlassian.com/browse/JRASERVER-72045?attachmentOrder=desc maybe this makes sense to be added here too: The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query: SELECT "FIELD_ID" , "ID" , "ISSUE_ID" , "LOCK_HASH" , "LOCK_TIME" , "RANK" , "TYPE" FROM jira_db. "AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3 with plan: jira_rds=> explain SELECT DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" ,DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "DATE" ,DATA. "SIMPLE_DATA" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 13 AND DATAROW. "ISSUE_ID" = 13 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=677.91..677.92 rows=7 width=179) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..677.81 rows=7 width=179) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 13) AND ( "ISSUE_ID" = 13)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) created the index: *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE") the load dropped and the plan now is: QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562) Index Cond: (( "FIELD_ID" = 1) AND ( "ISSUE_ID" = 2) AND ( "TYPE" = 3)) And also the query: jira_rds=> explain SELECT DATA. "DATE_TIME" ,DATA. "ID" ,DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" , DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "SIMPLE_DATA" , DATA. "DATE" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 1 AND DATAROW. "ISSUE_ID" = 2 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=1005.27..1005.28 rows=6 width=203) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..1005.19 rows=6 width=203) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) (7 rows) I added the index: *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID"); and the plan now is: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.84..17.59 rows=6 width=203) -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16) Index Cond: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) With these two database indexes, the full indexing of JIRA completed in 2.5 hours.   The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.

            Matt Doar added a comment -

            Good to hear, thanks

            Matt Doar added a comment - Good to hear, thanks

            Hi thony.lundin, this fix was backported to 8.20.12.

            Maciej Swinarski (Inactive) added a comment - Hi thony.lundin , this fix was backported to 8.20.12.

            Any news on backporting to 8.20 LTS? 

            Thony Lundin added a comment - Any news on backporting to 8.20 LTS? 

            e6a44563da75 ,

            We're considering backporting the safeguards to the 8.20.x LTS, but I cannot promise that yet.

            Kamil Cichy (Inactive) added a comment - e6a44563da75 , We're considering backporting the safeguards to the 8.20.x LTS, but I cannot promise that yet.

            Matt Doar added a comment -

            Matt Doar added a comment - https://community.developer.atlassian.com/t/safeguards-in-jira-dc-index/57636 is a useful link I had missed, thanks

            Matt Doar added a comment -

            Sorry. yes 8.20.x for LTS

            Matt Doar added a comment - Sorry. yes 8.20.x for LTS

            What changes were made in 8.22 to fix this, in general?

            The changes are described in detail in these three articles: https://community.developer.atlassian.com/t/safeguards-in-jira-dc-index/57636. Is there anything specific you'd like to know that's missing from these articles?

            Are there plans to port those changes to an 8.22.x release?

            I didn't get this one. These changes were introduced gradually in 8.22.2, 8.22.3 and 8.22.4.

            Did you mean 8.20.x maybe?

            Kamil Cichy (Inactive) added a comment - What changes were made in 8.22 to fix this, in general? The changes are described in detail in these three articles: https://community.developer.atlassian.com/t/safeguards-in-jira-dc-index/57636 . Is there anything specific you'd like to know that's missing from these articles? Are there plans to port those changes to an 8.22.x release? I didn't get this one. These changes were introduced gradually in 8.22.2, 8.22.3 and 8.22.4. Did you mean 8.20.x maybe?

            Matt Doar added a comment -

            1. What changes were made in 8.22 to fix this, in general?
            2. Are there plans to port those changes to an 8.22.x release?

            Matt Doar added a comment - 1. What changes were made in 8.22 to fix this, in general? 2. Are there plans to port those changes to an 8.22.x release?

              mswinarski Maciej Swinarski (Inactive)
              mswinarski Maciej Swinarski (Inactive)
              Affected customers:
              1 This affects my team
              Watchers:
              19 Start watching this issue

                Created:
                Updated:
                Resolved: