Performance issue with the "Language support" configuration makes the page to never load when the project has a large number of fields and/or languages

XMLWordPrintable

    • 1
    • Severity 3 - Minor

      Issue Summary

      In case the JSM project has a lot of (thousands) logical ID entries defined ("AO_C7F17E_LINGO"."LOGICAL_ID"), attempt to open "Language Support" configuration page for this project will trigger thousands requests to DB to load translation settings done by rest/servicedesk/lingo/1/<project_key>/languages/settings endpoint.

      Even for a properly tuned DB with enough resources, the execution of all requests triggered by rest/servicedesk/lingo/1/<project_key>/languages/settings might take a lot of time to complete. This will lead to timeout issues while loading "Language Support" making it inaccessible.

      This service is supported by a cache (valid only for 30 minutes). However, due to the size of the loaded data, this request may always time out at the frontend and would never load data to the cache.

      This is reproducible on Data Center: (yes)

      Steps to Reproduce

      The exact steps and number of logical ID entries ("AO_C7F17E_LINGO"."LOGICAL_ID") required to trigger this issue are unclear. During research on the incident that spawned this defect, the number of these records was around 6K for affected project ID

      Expected Results

      The "Language support" configuration page would be able to load despite the number of records and within an acceptable time

      Actual Results

      The "Language support" configuration page never loads for the affected project.

      Reviewing thread dumps will indicate thread is running for a really long time (10+ minutes) without consuming notable CPU resources but constantly fetching data from DB:

       "http-nio-8088-exec-15 url: /aws-test2-jira/rest/servicedesk/lingo/1/<project_key>/languages/settings; user: IUAD1FXI" #901 daemon prio=5 os_prio=0 cpu=19751.17ms elapsed=163920.68s tid=0x00007f381d808000 nid=0xe9d runnable  [0x00007f35edbf1000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(java.base@11.0.13/Native Method)
      ...
      	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base@11.0.13/Unknown Source)
      	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
      ...
      	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
      	- locked <0x00000007140df448> (a org.postgresql.core.v3.QueryExecutorImpl)
      	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
      	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
      ...
      	at com.querydsl.sql.AbstractSQLQuery.fetch(AbstractSQLQuery.java:446)
      	at com.atlassian.servicedesk.plugins.lingo.internal.dao.LingoQStore.internalRetrieveLanguages(LingoQStore.java:219)
      

      The internalRetrieveLanguages() method is querying AO_C7F17E_LINGO* tables that could be observed while reviewing SQL logging

      Sample output. For a short period of time the thread queried “AO_C7F17E_LINGO” 343 times (logging was enabled for 2 minutes, so it is ~2.85 requests per sec)
      grep "http-nio-8088-exec-1 URL: <jira_base_url>/rest/servicedesk/lingo/1/<project_key>/languages/settings" atlassian-jira.log | awk -F'(from|where)' '/select/{print $2}' | sort | uniq -c
      
          343  "public"."AO_C7F17E_LINGO" "AO_C7F17E_LINGO" inner join "public"."AO_C7F17E_LINGO_REVISION" "AO_C7F17E_LINGO_REVISION" on "AO_C7F17E_LINGO"."ID" = "AO_C7F17E_LINGO_REVISION"."LINGO_ID" left join "public"."AO_C7F17E_LINGO_TRANSLATION" "AO_C7F17E_LINGO_TRANSLATION" on "AO_C7F17E_LINGO_REVISION"."ID" = "AO_C7F17E_LINGO_TRANSLATION"."LINGO_REVISION_ID" 
            1  "public"."AO_C7F17E_PROJECT_LANG_CONFIG" "AO_C7F17E_PROJECT_LANG_CONFIG" inner join "public"."AO_C7F17E_PROJECT_LANG_REV" "AO_C7F17E_PROJECT_LANG_REV" on "AO_C7F17E_PROJECT_LANG_CONFIG"."ID" = "AO_C7F17E_PROJECT_LANG_REV"."PROJECT_LANG_CONFIG_ID" 
            1  "public"."AO_C7F17E_PROJECT_LANG_CONFIG" "AO_C7F17E_PROJECT_LANG_CONFIG" inner join "public"."AO_C7F17E_PROJECT_LANG_REV" "AO_C7F17E_PROJECT_LANG_REV" on "AO_C7F17E_PROJECT_LANG_CONFIG"."ID" = "AO_C7F17E_PROJECT_LANG_REV"."PROJECT_LANG_CONFIG_ID" inner join "public"."AO_C7F17E_PROJECT_LANGUAGE" "AO_C7F17E_PROJECT_LANGUAGE" on "AO_C7F17E_PROJECT_LANG_REV"."ID" = "AO_C7F17E_PROJECT_LANGUAGE"."PROJECT_LANG_REV_ID" 
            2  public.propertyentry O_S_PROPERTY_ENTRY join public.propertytext O_S_PROPERTY_TEXT on O_S_PROPERTY_ENTRY.id = O_S_PROPERTY_TEXT.id 
      

      Workaround

      As a potential workaround, you could invoke rest/servicedesk/lingo/1/<project_key>/languages/settings using an external REST client in order to force it to load data in the cache:

      curl -s -w 'Establish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n' --connect-timeout 1800 -o out.txt -u jira_admin:jira_admin {jira_url}/rest/servicedesk/lingo/1/{project_key}/languages/settings
      
      // Description of used arguments
      -s to disable download progress bar
      -w formatted output for the details of the request
      --connect-timeout 1800 connection timeout set to 30 mins. As we can see from the previous research, sometimes requests take a lot of time to complete
      -o out.txt redirects output to out.txt
      -u username:password plain text authorization. In case basic authorization is disabled at your host, you could generate PAT https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html and  add that token at 'Authorization' HTTP header: -H 'Authorization: Bearer {token}'
      {project_key} should be replaced with affected project key
      

      Please note that in some cases it would require running this request a couple of times, one by one, in case the first attempt will take more than 30 minutes to complete. Even if the first request is timed out, the code should still partially load data into the cache and some further subsequent requests should be completed within readable time. Once the request is completed successfully and at least one time, the data will be placed in the cache and slowing down should no longer be observed until cache data is valid.

      Next, you could write a short script and schedule it (for example, using cron) to warm up the cache periodically. For example, before the start of the working day and after around lunchtime, so cache will be prepopulated with required data before end-users will start to actively use it.

              Assignee:
              Mingyi Yang
              Reporter:
              Alexander Artemenko (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: