Uploaded image for project: 'Jira Data Center'
  1. Jira Data Center
  2. JRASERVER-72045

IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks

      Issue Summary

      It is possible that background indexing tasks (like Background indexing, SLA recalculation and others) may fill up the indexer queue and affect user create/update issue/comment operation by:

      • slowing down create/update issue/comment user request operations 
      • timing out user indexing requests (IndexException: Wait attempt timed out...) which will results in an error shown after 30sec to users and may result in changes not being visible index

      There are couple of factors which will result in increasing the possibility of having this problem when triggering background indexing:

      • fast Lucene document creation
      • poor IO performance - this will slow down conditional updates, i.e. inserting the Lucene document in the index

      With the above we will have a background indexing task(s) which may produce Lucene Documents faster than the indexer can consume them. This will fill up the indexer queue which is also used by user create/update tasks (HTTP requests).  

      Steps to Reproduce

      TODO

      Expected Results

      Background indexing tasks should not affect user indexing tasks.

      Actual Results

      Currently a background indexing task can fill up the indexer queue and user indexing requests will timeout after 30sec:

      /secure/CreateIssueDetails.jspa [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
      com.atlassian.jira.issue.index.IndexException: Wait attempt timed out - waited 30000 milliseconds
      at com.atlassian.jira.issue.index.DefaultIndexManager.obtain(DefaultIndexManager.java:872)
      at com.atlassian.jira.issue.index.DefaultIndexManager.await(DefaultIndexManager.java:843)
      at com.atlassian.jira.issue.index.DefaultIndexManager.executeWithIndexLock(DefaultIndexManager.java:830)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssues(DefaultIndexManager.java:618)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndexIssueObjects(DefaultIndexManager.java:494)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndex(DefaultIndexManager.java:540)
      at com.atlassian.jira.issue.index.DefaultIndexManager.reIndex(DefaultIndexManager.java:523) 
      

      Workaround

      In Jira DC:

      • Background indexing should not be used. Always favour foreground indexing on one of the nodes which will not accept any user requests during re-indexing. This will guarantee faster re-indexing and not affecting concurrent user requests.
      • SLA recalculation: the number of recalculation threads should be limited to 1 

      In Jira Server:

      • IO performance should be investigated: conditional index updates trigger index flush which performance can be negatively affected by slow disk: KB: Troubleshooting performance with JIRA Stats
        • Specifically check: Disk performance, updateDocumentsWithVersionMillis, processDBRMessageUpdateWithRelatedIndexInMillis
      • Background indexing should not be triggered on a regular basis. Since Jira 8.10 the index should be consistent with the DB and there is no need to background indexing to guarantee this.
      • If re-indexing is needed always favour foreground indexing (in Jira Server this  means Jira being not available for users). Note that Jira DC provides HA in case of running foreground indexing.

      Note that a slightly slower background indexing time is expected (since Jira 8.10) and this it not the problem described by this issue. This issue is about the problem of background indexing tasks taking over the indexer and affecting all user requests. Explained here: KB: Background reindex is slow after upgrading to Jira 8.10 and later

      Note that the indexer may be flooded by apps triggering re-indexing in non request (user) threads. This may be very similar to running a full foreground re-index (multi-thread) with parallel user requests. In such case the plugin app should not trigger re-indexing on more then one thread. The workaround should include checking if there is a way of controlling the number of re-indexing threads.

       

            [JRASERVER-72045] IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks

            The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query:

            SELECT "FIELD_ID","ID","ISSUE_ID","LOCK_HASH","LOCK_TIME","RANK","TYPE" FROM jira_db."AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3
            

            with plan:

            jira_rds=> explain SELECT DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."ID",DATA."DECIMAL_NUMBER",DATA."DATE",DATA."SIMPLE_DATA",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER" FROM jira_db."AO_FC7135_DATA" DATA JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID" WHERE DATAROW."FIELD_CONFIG_ID" = 13 AND DATAROW."ISSUE_ID" = 13 ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=677.91..677.92 rows=7 width=179)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..677.81 rows=7 width=179)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 13) AND ("ISSUE_ID" = 13))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            created the index:

            *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE")

            the load dropped and the plan now is:

            QUERY PLAN
            ---------------------------------------------------------------------------------------------------------------
            Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562)
            Index Cond: (("FIELD_ID" = 1) AND ("ISSUE_ID" = 2) AND ("TYPE" = 3))
            

            And also the query:

            jira_rds=> explain SELECT DATA."DATE_TIME",DATA."ID",DATA."FIELD_ID",DATA."CHECKED",DATA."DATA_ROW_ID",
                           DATA."DECIMAL_NUMBER_STR",DATA."ISSUE_ID",DATA."DECIMAL_NUMBER",DATA."SIMPLE_DATA",
                           DATA."DATE",DATA."SELECTED_ID",DATA."USER_KEY",DATA."SIMPLE_NUMBER"
                      FROM jira_db."AO_FC7135_DATA" DATA
                           JOIN jira_db."AO_FC7135_DATA_ROW" DATAROW ON DATA."DATA_ROW_ID" = DATAROW."ID"
                     WHERE DATAROW."FIELD_CONFIG_ID" = 1
                       AND DATAROW."ISSUE_ID" = 2
                     ORDER BY DATAROW."ORDER_POSITION" ASC;
            QUERY PLAN
            -------------------------------------------------------------------------------------------------------------------------
            Sort (cost=1005.27..1005.28 rows=6 width=203)
            Sort Key: datarow."ORDER_POSITION"
            -> Nested Loop (cost=0.42..1005.19 rows=6 width=203)
            -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16)
            Filter: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            (7 rows)
            

            I added the index:

            *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID");

            and the plan now is:

            QUERY PLAN
            ----------------------------------------------------------------------------------------------------------------------------------------
            Nested Loop (cost=0.84..17.59 rows=6 width=203)
            -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16)
            Index Cond: (("FIELD_CONFIG_ID" = 1) AND ("ISSUE_ID" = 2))
            -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195)
            Index Cond: ("DATA_ROW_ID" = datarow."ID")
            

            With these two database indexes, the full indexing of JIRA completed in 2.5 hours.

             

            The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.

            Rusi Popov added a comment - The bottleneck was the database in our cases of Jira 8.20.x and 9.4 on 2 vCPU-machine with 32 G RAM and 25 indexing threads. For some reason, the indexing ran a query from Atlassian Greenhopper, that was very DB CPU-intensive and slow. For 12 hours the full indexing had completed 9% with that query: SELECT "FIELD_ID" , "ID" , "ISSUE_ID" , "LOCK_HASH" , "LOCK_TIME" , "RANK" , "TYPE" FROM jira_db. "AO_60DB71_LEXORANK" WHERE "FIELD_ID" = $1 AND "ISSUE_ID" = $2 AND "TYPE" = $3 with plan: jira_rds=> explain SELECT DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" ,DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "DATE" ,DATA. "SIMPLE_DATA" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 13 AND DATAROW. "ISSUE_ID" = 13 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=677.91..677.92 rows=7 width=179) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..677.81 rows=7 width=179) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..667.93 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 13) AND ( "ISSUE_ID" = 13)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.81 rows=7 width=171) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) created the index: *create index "AO_60DB71_LEXORANK_FLD_ISS_TYP" ON "AO_60DB71_LEXORANK" ("FIELD_ID", "ISSUE_ID", "TYPE") the load dropped and the plan now is: QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using "AO_60DB71_LEXORANK_FLD_ISS_TYP" on "AO_60DB71_LEXORANK" (cost=0.43..8.45 rows=1 width=562) Index Cond: (( "FIELD_ID" = 1) AND ( "ISSUE_ID" = 2) AND ( "TYPE" = 3)) And also the query: jira_rds=> explain SELECT DATA. "DATE_TIME" ,DATA. "ID" ,DATA. "FIELD_ID" ,DATA. "CHECKED" ,DATA. "DATA_ROW_ID" , DATA. "DECIMAL_NUMBER_STR" ,DATA. "ISSUE_ID" ,DATA. "DECIMAL_NUMBER" ,DATA. "SIMPLE_DATA" , DATA. "DATE" ,DATA. "SELECTED_ID" ,DATA. "USER_KEY" ,DATA. "SIMPLE_NUMBER" FROM jira_db. "AO_FC7135_DATA" DATA JOIN jira_db. "AO_FC7135_DATA_ROW" DATAROW ON DATA. "DATA_ROW_ID" = DATAROW. "ID" WHERE DATAROW. "FIELD_CONFIG_ID" = 1 AND DATAROW. "ISSUE_ID" = 2 ORDER BY DATAROW. "ORDER_POSITION" ASC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=1005.27..1005.28 rows=6 width=203) Sort Key: datarow. "ORDER_POSITION" -> Nested Loop (cost=0.42..1005.19 rows=6 width=203) -> Seq Scan on "AO_FC7135_DATA_ROW" datarow (cost=0.00..996.04 rows=1 width=16) Filter: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) (7 rows) I added the index: *create index "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" ("FIELD_CONFIG_ID", "ISSUE_ID", "ORDER_POSITION", "ID"); and the plan now is: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.84..17.59 rows=6 width=203) -> Index Only Scan using "AO_FC7135_DATA_ROW_issue_field_pos_id" on "AO_FC7135_DATA_ROW" datarow (cost=0.41..8.43 rows=1 width=16) Index Cond: (( "FIELD_CONFIG_ID" = 1) AND ( "ISSUE_ID" = 2)) -> Index Scan using index_ao_fc7135_dat1120928830 on "AO_FC7135_DATA" data (cost=0.42..9.08 rows=7 width=195) Index Cond: ( "DATA_ROW_ID" = datarow. "ID" ) With these two database indexes, the full indexing of JIRA completed in 2.5 hours.   The slow full index in JIRA might also be caused by inefficient DB queries. Consider adding (these) DB indexes and only then upgrade machines and indexing threads.

            e68278496e06 you may be interested in this fix: https://jira.atlassian.com/browse/JRASERVER-73874

            Should be back-ported to 8.20.x soon.

            Maciej Swinarski (Inactive) added a comment - e68278496e06 you may be interested in this fix: https://jira.atlassian.com/browse/JRASERVER-73874 Should be back-ported to 8.20.x soon.

            Still getting this error on 8.13.14

            Sylvain Leduc added a comment - Still getting this error on 8.13.14

            Sri Kanth added a comment -

            Great point @Flavio Beck

            Sri Kanth added a comment - Great point @Flavio Beck

            Still getting this error on 8.13.6.

            The error is related to ScriptRunner scripted custom fields using Search Provider class.

            due upgrade from v7 to v8 , the searchProvider class was replaced by searchService.

            even the scripts was updated to use new method, there are several errors while re-indexing those fields.

             

             

            2021-11-16 11:47:20,193-0300 IssueIndexer:thread-5 ERROR <-USER-> 706x2926x1 ndh56w 105.210.50.127 /secure/admin/IndexReIndex!reindex.jspa [c.o.scriptrunner.customfield.GroovyCustomField] Script field failed on issue: <-ISSUE->, field: <-FIELD->
            com.atlassian.jira.issue.index.SearchUnavailableException
                    at com.atlassian.jira.issue.index.DefaultIndexManager.getEntitySearcher(DefaultIndexManager.java:927)
                    at com.atlassian.jira.issue.index.DefaultIndexManager.getIssueSearcher(DefaultIndexManager.java:906)
                    at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)
                    at com.sun.proxy.$Proxy17.getIssueSearcher(Unknown Source)
                    at com.atlassian.jira.issue.search.SearchProviderFactoryImpl.getSearcher(SearchProviderFactoryImpl.java:17)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getIssueSearcher(LuceneSearchProvider.java:132)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.searchDocuments(LuceneSearchProvider.java:455)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.runSearch(LuceneSearchProvider.java:450)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getHits(LuceneSearchProvider.java:230)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:374)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:137)
                    at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:142)
                    at com.atlassian.jira.bc.issue.search.DefaultSearchService.search(DefaultSearchService.java:118)
                    at com.atlassian.jira.bc.issue.search.SearchService$search$0.call(Unknown Source)
                    at Script25.run(Script25.groovy:71)
            2021-11-16 11:47:20,237-0300 IssueIndexer:thread-21 ERROR <-USER-> 706x2926x1 ndh56w 105.210.50.127 /secure/admin/IndexReIndex!reindex.jspa [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds
            

             

            Flavio Beck added a comment - Still getting this error on 8.13.6. The error is related to ScriptRunner scripted custom fields using Search Provider class. due upgrade from v7 to v8 , the searchProvider class was replaced by searchService. even the scripts was updated to use new method, there are several errors while re-indexing those fields.     2021-11-16 11:47:20,193-0300 IssueIndexer:thread-5 ERROR <-USER-> 706x2926x1 ndh56w 105.210.50.127 /secure/admin/IndexReIndex!reindex.jspa [c.o.scriptrunner.customfield.GroovyCustomField] Script field failed on issue: <-ISSUE->, field: <-FIELD-> com.atlassian.jira.issue.index.SearchUnavailableException         at com.atlassian.jira.issue.index.DefaultIndexManager.getEntitySearcher(DefaultIndexManager.java:927)         at com.atlassian.jira.issue.index.DefaultIndexManager.getIssueSearcher(DefaultIndexManager.java:906)         at com.atlassian.jira.config.component.SwitchingInvocationHandler.invoke(SwitchingInvocationHandler.java:38)         at com.sun.proxy.$Proxy17.getIssueSearcher(Unknown Source)         at com.atlassian.jira.issue.search.SearchProviderFactoryImpl.getSearcher(SearchProviderFactoryImpl.java:17)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getIssueSearcher(LuceneSearchProvider.java:132)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.searchDocuments(LuceneSearchProvider.java:455)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.runSearch(LuceneSearchProvider.java:450)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.getHits(LuceneSearchProvider.java:230)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:374)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:137)         at com.atlassian.jira.issue.search.providers.LuceneSearchProvider.search(LuceneSearchProvider.java:142)         at com.atlassian.jira.bc.issue.search.DefaultSearchService.search(DefaultSearchService.java:118)         at com.atlassian.jira.bc.issue.search.SearchService$search$0.call(Unknown Source)         at Script25.run(Script25.groovy:71) 2021-11-16 11:47:20,237-0300 IssueIndexer:thread-21 ERROR <-USER-> 706x2926x1 ndh56w 105.210.50.127 /secure/admin/IndexReIndex!reindex.jspa [c.a.j.issue.index.DefaultIndexManager] Wait attempt timed out - waited 30000 milliseconds  

            toddler added a comment -

            Is Jira still ready for production? Not sure!?

            toddler added a comment - Is Jira still ready for production? Not sure!?

            jasonsmith1590948304 - please create a support case so we can investigate your problem. 

            Maciej Swinarski (Inactive) added a comment - jasonsmith1590948304  - please create a support case so we can investigate your problem. 

            We're on 8.13.9 and are still getting this during a background reindex.

            Jason D Smith added a comment - We're on 8.13.9 and are still getting this during a background reindex.

            Gonchik Tsymzhitov added a comment - - edited

            mswinarski Thank you!
            Now it works much better! Upgraded a few instances to 8.13.6

            Gonchik Tsymzhitov added a comment - - edited mswinarski Thank you! Now it works much better! Upgraded a few instances to 8.13.6

            Tomas Karas added a comment - - edited

            ^ What is up with that display name? :DDD

            Anyhow there's plenty of teams affected and as Atlassian has it's process on how fast they release that particular version obviously the teams responsible for their instances of the product have their processes to upgrade, obviously on both sides it takes time.

            We use LTS releases only for example. We do not upgrade each time there is release. It all depends team to team and/or impact/benefits.

            Tomas Karas added a comment - - edited ^ What is up with that display name? :DDD Anyhow there's plenty of teams affected and as Atlassian has it's process on how fast they release that particular version obviously the teams responsible for their instances of the product have their processes to upgrade, obviously on both sides it takes time. We use LTS releases only for example. We do not upgrade each time there is release. It all depends team to team and/or impact/benefits.

              mswinarski Maciej Swinarski (Inactive)
              mswinarski Maciej Swinarski (Inactive)
              Affected customers:
              49 This affects my team
              Watchers:
              101 Start watching this issue

                Created:
                Updated:
                Resolved: