
SLA configuration changes create indexing pressure on the instance

      Issue Summary

      When an SLA configuration changes, all issues in the project may be re-indexed.
      The re-index is expensive (a real-time, cross-cluster re-index) and can affect an unlimited number of issues.

      This can cause the whole instance to fail.

      Steps to Reproduce

      1. Create a project with >50K issues on a multi-node DC instance
      2. Change SLA configuration in the project

      Expected Results

      SLAs on the project are updated with no performance impact.

      Actual Results

      A huge number of indexing calls causes high CPU load and slows down the whole instance. Other indexing tasks may start to time out: DBR (real-time index replication), NodeReindexService (delayed node index replication and index consistency check), and any other re-indexing triggered by background (i.e. non-HTTP) threads. This may cause index inconsistency between nodes, which can eventually only be resolved with a full foreground reindex.

      Workaround

      Currently there is no workaround for the problem.


            [JSDSERVER-10886] SLA configuration changes create indexing pressure on the instance

            Bartosz Ornatowski added a comment (edited)

            Hello,

            Thank you for your patience waiting for an update for this issue. JSM and Jira teams have investigated this issue and addressed the behaviours that contributed to this bug. 

            Root Cause

            An excessive number of reindexing calls for a large number of issues in a short period of time may cause the indexing queue to overflow.
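
            The overflow mechanism can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name, the tick-based timing, and the choice to count rejected requests are all hypothetical, and the ticket does not say whether a full queue drops, blocks, or times out requests in Jira.

```python
def simulate_queue(burst_size, capacity, drained_per_tick, submitted_per_tick):
    """Count requests that find a bounded queue full during a burst.

    Toy model only: rates and capacity are hypothetical, not Jira's values.
    """
    if drained_per_tick <= 0 or submitted_per_tick <= 0:
        raise ValueError("rates must be positive")
    queued = 0
    rejected = 0
    remaining = burst_size
    while remaining > 0 or queued > 0:
        # Producers submit the next slice of the burst.
        incoming = min(submitted_per_tick, remaining)
        remaining -= incoming
        accepted = min(incoming, capacity - queued)
        rejected += incoming - accepted
        queued += accepted
        # The indexer drains the queue at a fixed rate.
        queued -= min(drained_per_tick, queued)
    return rejected


# Requests arrive faster than they are drained, so the queue fills up.
overflowed = simulate_queue(1000, capacity=100, drained_per_tick=10, submitted_per_tick=50)
# Requests arrive no faster than they are drained, so nothing overflows.
steady = simulate_queue(1000, capacity=100, drained_per_tick=50, submitted_per_tick=50)
print(overflowed, steady)
```

            In this model a larger capacity or a smaller burst both reduce the number of requests that hit a full queue, but only lowering the arrival rate to the drain rate removes the overflow entirely.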

            JSM Footprint

            When an SLA configuration is updated or a new SLA is created, JSM fetches all the issues affected by the change, calculates the new SLA value, saves it to the DB, and then reindexes all changed issues. More on how we recalculate SLAs can be found here. For JSM to have a significant footprint on the indexes, the project has to have 50k+ tickets, and a large portion of them needs to be affected by the configuration change. In most scenarios this is not the case: JSM only recalculates ongoing cycles, and the majority of historical tickets generally have their SLAs completed. Nonetheless, there are scenarios where such changes are made.
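
            As a rough illustration of why most historical tickets stay out of the reindex, the selection step can be sketched as a filter on ongoing SLA cycles. The `Issue` class, its `sla_ongoing` field, and the issue keys are hypothetical and are not JSM's data model.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    key: str
    sla_ongoing: bool  # True while the SLA cycle has not completed (hypothetical field)


def issues_to_reindex(issues):
    """Return keys of issues whose SLA cycle is still ongoing.

    Sketch only: completed cycles are assumed to need no recalculation.
    """
    return [issue.key for issue in issues if issue.sla_ongoing]


project = [Issue("HELP-1", False), Issue("HELP-2", True), Issue("HELP-3", False)]
print(issues_to_reindex(project))
```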

            A side note here: in our testing, we have found that the sheer number of issues is not enough to overload the indexing queues. Issues with significant amounts of worklog, history and/or comments need to be present in the affected project for this issue to manifest itself.

            Contributing factors

            1. Other plugins that call reindexing in their operation. One example is the Automation for Jira plugin with a rule that is triggered by an issue or SLA update event. Chained rules further exacerbate the indexing strain.
            2. Custom scripts that call reindexing
            3. Slack integration that calls reindexing
            4. Similar third-party plugin behaviour
            5. Large amounts of worklog, history and/or comments in the affected tickets
            6. A busy instance where many creation or update events happen during SLA configuration change
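
            The way these factors multiply the load can be approximated with simple arithmetic. This is a back-of-the-envelope sketch: the function name and the per-issue counts are hypothetical, and real plugins may trigger more or fewer reindex calls than assumed here.

```python
def estimated_reindex_calls(affected_issues, extra_reindexes_per_issue):
    """Estimate total reindex calls after an SLA configuration change.

    Assumes one call per affected issue from the SLA update itself, plus
    the given follow-up calls per issue (for example from automation rules
    or custom scripts). All numbers are illustrative.
    """
    return affected_issues * (1 + sum(extra_reindexes_per_issue))


# SLA change alone on 50,000 affected issues.
baseline = estimated_reindex_calls(50_000, [])
# Same change, with one rule adding 1 call and a chained rule adding 2 more.
amplified = estimated_reindex_calls(50_000, [1, 2])
print(baseline, amplified)
```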

            Mitigation options

            1. When updating an SLA configuration on a large project, assess the scope of the change, and avoid making big changes during peak usage hours for your instance.
            2. Consider your custom scripts. Do they need to call Jira's reindex API? Since Jira 8.0, the index has become much more reliable, and this step is considered unnecessary overhead in most cases.
            3. Consider the scope of reindexing in your scripts. The API allows you to reindex only the parts that your script needs to work properly. Since Jira 8.19.0, the default issue reindex call reindexes the issue only. Read more here.

            Product improvements

            1. Jira's indexing queue size has been increased fourfold to prevent queue overflow (since 4.13.8+/4.16.1+).
            2. JSM indexing calls after an SLA configuration change have been halved (since 4.13.12+/4.16.1+).
            3. Jira has fixed the default reindexing API behaviour to reindex only the issue scope (since 4.19.0).


              bornatowski Bartosz Ornatowski
              kkanojia Kunal Kanojia