Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-72045

IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks

      Issue Summary

      It is possible that background indexing tasks (like Background indexing, SLA recalculation and others) may fill up the indexer queue and affect user create/update issue/comment operation by:

      • slowing down create/update issue/comment user request operations 
      • timing out user indexing requests (IndexException: Wait attempt timed out...) which will results in an error shown after 30sec to users and may result in changes not being visible index

      There are couple of factors which will result in increasing the possibility of having this problem when triggering background indexing:

      • fast Lucene document creation
      • poor IO performance - this will slow down conditional updates, i.e. inserting the Lucene document in the index

      With the above we will have a background indexing task(s) which may produce Lucene Documents faster than the indexer can consume them. This will fill up the indexer queue which is also used by user create/update tasks (HTTP requests).  

      Steps to Reproduce

      TODO

      Expected Results

      Background indexing tasks should not affect user indexing tasks.

      Actual Results

      Currently a background indexing task can fill up the indexer queue and user indexing requests will timeout after 30sec:

      /secure/CreateIssueDetails.jspa [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
      com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
      at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:872)
      at com.atlassian.jira.issue.index.DefaultIndexManager.await(DefaultIndexManager.java:843)
      at com.atlassian.jira.issue.index.DefaultIndexManager.executeWithIndexLock(DefaultIndexManager.java:830)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssues(DefaultIndexManager.java:618)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssueObjects(DefaultIndexManager.java:494)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndex(DefaultIndexManager.java:540)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndex(DefaultIndexManager.java:523) 
      

      Workaround

      In Jira DC:

      • Background indexing should not be used. Always favour foreground indexing on one of the nodes which will not accept any user requests during re-indexing. This will guarantee faster re-indexing and not affecting concurrent user requests.
      • SLA recalculation: the number of recalculation threads should be limited to 1 

      In Jira Server:

      • IO performance should be investigated: conditional index updates trigger index flush which performance can be negatively affected by slow disk: KB: Troubleshooting performance with JIRA Stats
        • Specifically check: Disk performance, updateDocumentsWithVersionMillis, processDBRMessageUpdateWithRelatedIndexInMillis
      • Background indexing should not be triggered on a regular basis. Since Jira 8.10 the index should be consistent with the DB and there is no need to background indexing to guarantee this.
      • If re-indexing is needed always favour foreground indexing (in Jira Server this  means Jira being not available for users). Note that Jira DC provides HA in case of running foreground indexing.

      Note that a slightly slower background indexing time is expected (since Jira 8.10) and this it not the problem described by this issue. This issue is about the problem of background indexing tasks taking over the indexer and affecting all user requests. Explained here: KB: Background reindex is slow after upgrading to Jira 8.10 and later

      Note that the indexer may be flooded by apps triggering re-indexing in non request (user) threads. This may be very similar to running a full foreground re-index (multi-thread) with parallel user requests. In such case the plugin app should not trigger re-indexing on more then one thread. The workaround should include checking if there is a way of controlling the number of re-indexing threads.

       

            [JRASERVER-72045] IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks

            Conny Postma made changes -
            Remote Link Original: This issue links to "Page (Atlassian Documentation)" [ 534381 ]
            Artur Moura (Inactive) made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 849458 ]

            The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query:

            SELECT "FIELD_ID","ID","ISSUE_ID","LOCK_HASH","LOCK_TIME","RANK","TYPE" FROM jira_db."AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3
            

            with plan:

            jira_rds=> explain SELECT DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."ID",DATA."DECIMAL_NUMBER",DATA."DATE",DATA."SIMPLE_DATA",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER" FROM jira_db."AO_FC7135_DATA" DATA JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID" WHERE DATAROW."FIELD_CONFIG_ID" = 13 AND DATAROW."ISSUE_ID" = 13 ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=677.91..677.92 rows=7 width=179)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..677.81 rows=7 width=179)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 13) AND ("ISSUE_ID" = 13))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            created the index:

            *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE")

            the load dropped and the plan now is:

            QUERY PLAN
            ---------------------------------------------------------------------------------------------------------------
            Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562)
            Index Cond: (("FIELD_ID" = 1) AND ("ISSUE_ID" = 2) AND ("TYPE" = 3))
            

            And also the query:

            jira_rds=> explain SELECT DATA."DATE_TIME",DATA."ID",DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",
                           DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."DECIMAL_NUMBER",DATA."SIMPLE_DATA",
                           DATA."DATE",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER"
                      FROM jira_db."AO_FC7135_DATA" DATA
                           JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID"
                     WHERE DATAROW."FIELD_CONFIG_ID" = 1
                       AND DATAROW."ISSUE_ID" = 2
                     ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=1005.27..1005.28 rows=6 width=203)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..1005.19 rows=6 width=203)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            (7 rows)
            

            I added the index:

            *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID");

            and the plan now is:

            QUERY PLAN
            ----------------------------------------------------------------------------------------------------------------------------------------
            Nested Loop (cost=0.84..17.59 rows=6 width=203)
            -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16)
            Index Cond: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            With these two database indexes, the full indexing of JIRA completed in 2.5 hours.

             

            The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.

            Rusi Popov added a comment - The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query: SELECT "FIELD_ID" , "ID" , "ISSUE_ID" , "LOCK_HASH" , "LOCK_TIME" , "RANK" , "TYPE" FROM jira_db. "AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3 with plan: jira_rds=> explain SELECT DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" ,DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "DATE" ,DATA. "SIMPLE_DATA" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 13 AND DATAROW. "ISSUE_ID" = 13 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=677.91..677.92 rows=7 width=179) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..677.81 rows=7 width=179) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 13) AND ( "ISSUE_ID" = 13)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) created the index: *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE") the load dropped and the plan now is: QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562) Index Cond: (( "FIELD_ID" = 1) AND ( "ISSUE_ID" = 2) AND ( "TYPE" = 3)) And also the query: jira_rds=> explain SELECT DATA. "DATE_TIME" ,DATA. "ID" ,DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" , DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "SIMPLE_DATA" , DATA. "DATE" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 1 AND DATAROW. "ISSUE_ID" = 2 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=1005.27..1005.28 rows=6 width=203) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..1005.19 rows=6 width=203) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) (7 rows) I added the index: *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID"); and the plan now is: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.84..17.59 rows=6 width=203) -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16) Index Cond: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) With these two database indexes, the full indexing of JIRA completed in 2.5 hours.   The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.
            Neel made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 771793 ]
            Diego Baeza (Inactive) made changes -
            Comment [ https://panjproducts06.yolasite.com/    
            https://www.twitch.tv/pixera1348    
            https://topsitenet.com/user/taslimniyalek/
            https://www.pexels.com/@di-m-nh-568823908/    
            https://www.mixily.com/event/4490098272973964221    
            https://www.artstation.com/fhanjaygola9    
            https://johu.hashnode.dev/rnyna-clothing-reviews-is-it-a-rnyna-online-store-or-another-online-scam-with-buyrs    
            https://johu.hashnode.dev/rnyna-clothing-reviews-is-it-a-rnyna-online-store-or-another-online-scam-with-buyrs    
            https://www.question2answer.org/qa/user/elenzy    
            https://jemi.so/clothings952    
            https://www.dermandar.com/user/wenjordy/    
            https://www.ted.com/profiles/43615693    
            https://www.40billion.com/profile/31813840    
            http://snaplant.com/question/rnyna-clothing-reviews-everything-we-know-so-far/    
            https://hashnode.com/@bhatenrkola    
            https://qiita.com/shamigor    
            https://circleten.org/a/248994    
            https://www.hackathon.io/users/385206    
            https://dribbble.com/faxamol/about    
            https://xiaoxq.net/d/234279-rnyna-clothing-reviews-quality-clothing-for-every-occasion    
            https://rnyna.hashnode.dev/rnyna-clothing-reviews-what-you-need-to-know-before-you-buy    
            https://www.agentpet.com/forum/discussion/general/rnyna-clothing-reviews-fashionable-and-comfortable-clothing    
            https://forum.ppr.pl/index.php?/topic/33069-rnyna-clothing-reviews-the-latest-collection-and-must-have-items/    
            https://forum.albiononline.com/index.php/Thread/181254-Rnyna-Clothing-Reviews-A-Closer-Look-at-Their-Fabric-Fit-and-Design/#post1298988    
            http://www.askmap.net/location/6535284/usa/wimarwimar    
            https://www.pinterest.com/pin/1010706341355228195/    
            https://www.pinterest.ca/micer51287/         ]
            Agata Kowal made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 762801 ]
            Murakami made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 745849 ]
            Maciej Swinarski (Inactive) made changes -
            Labels Original: deltaIndexReactivation fixedByDelta lts813 pse-request sefcon_cat2_performance New: deltaIssueLimits fixedByDelta lts813 pse-request sefcon_cat2_performance
            Maciej Swinarski (Inactive) made changes -
            Labels Original: fixedByDelta lts813 pse-request sefcon_cat2_performance New: deltaIndexReactivation fixedByDelta lts813 pse-request sefcon_cat2_performance
            Rinish made changes -
            Remote Link New: This issue links to "Page (Confluence)" [ 689484 ]

              mswinarski Maciej Swinarski (Inactive)
              mswinarski Maciej Swinarski (Inactive)
              Affected customers:
              49 This affects my team
              Watchers:
              101 Start watching this issue

                Created:
                Updated:
                Resolved: