[JRASERVER-66839] UPM actions have significant impact on JIRA Datacenter performance

      Summary

      During plugin (add-on) install, update, delete, or disable actions, UPM flushes a large number of caches (see JRASERVER-64908).
      In Jira Data Center this cache flush is propagated to the other nodes and affects them as well. A cache flush is also triggered during node startup, because UPM needs to register and load modules from plugins.

      Environment

      • Jira Data Center

      Steps to Reproduce

      1. Upgrade a plugin (or start a node)
      2. Check cluster performance

      Expected Results

      • Starting a new node does not have a significant impact on the cluster
      • A plugin action does not cause the cache flush to be propagated to other nodes; only plugin events are propagated.
        • A plugin action still has a performance impact due to JRASERVER-64908 and the nature of the action, but that impact is isolated

      Actual Results

      • Starting a new node has a significant impact on the cluster
      • A plugin action causes the cache flush to be propagated to other nodes, creating a cluster-wide replication storm and putting extra load on the DB.

      Notes

      None

      Workaround

      • Plan plugin update, delete, and disable actions (UPM actions) for off-peak hours or maintenance windows.
      • Plan node startup for off-peak hours or maintenance windows (start one node at a time).

            Problem

            Jira has an issue (covered by JRASERVER-64908): when loading or reloading a plugin or module, Jira flushes a number of caches.

            In Jira DC this problem is intensified, because any flush on a remote cache, which happens on every node, is replicated to all other nodes. This issue covers only that Jira DC-specific problem.

            Fix

            Any operation performed on a remote cache during a plugin/module load/reload is no longer replicated to other nodes. Note that the plugin/module reload event is still sent to all nodes, so the cache clear still happens on all nodes (JRASERVER-64908).
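
To illustrate why suppressing replication matters, here is a minimal toy model in Python. It is not Jira's implementation: the `Node` class, the `replicate_flushes` flag, and the `simulate` helper are hypothetical names invented for this sketch. It only counts how many replication messages are sent when every node reacts to a plugin event by flushing all of its caches, with and without flush replication.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy cluster node holding a local copy of each replicated cache (hypothetical model)."""
    name: str
    caches: dict = field(default_factory=dict)
    peers: list = field(default_factory=list, repr=False, compare=False)
    replicate_flushes: bool = True  # False models the behaviour described in the fix
    messages_sent: int = 0

    def flush_cache(self, cache_name, from_peer=False):
        self.caches[cache_name] = {}
        # Only a locally initiated flush may be replicated to the other nodes.
        if not from_peer and self.replicate_flushes:
            for peer in self.peers:
                self.messages_sent += 1
                peer.flush_cache(cache_name, from_peer=True)

    def on_plugin_event(self):
        """React to a plugin/module event by flushing every cache on this node."""
        for cache_name in list(self.caches):
            self.flush_cache(cache_name)


def simulate(node_count, cache_count, replicate_flushes):
    """Return the total number of replication messages for one plugin event."""
    nodes = [
        Node(
            name=f"node{i}",
            caches={f"cache{c}": {"key": "value"} for c in range(cache_count)},
            replicate_flushes=replicate_flushes,
        )
        for i in range(node_count)
    ]
    for node in nodes:
        node.peers = [peer for peer in nodes if peer is not node]
    # The plugin event itself reaches every node in both scenarios.
    for node in nodes:
        node.on_plugin_event()
    return sum(node.messages_sent for node in nodes)


if __name__ == "__main__":
    print("with replication:   ", simulate(4, 50, replicate_flushes=True))
    print("without replication:", simulate(4, 50, replicate_flushes=False))
```

In this model the message count with replication grows as nodes × (nodes − 1) × caches, while every node still clears its own caches in both cases because the plugin event is delivered everywhere.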

            Logging

            With DEBUG logging enabled for com.atlassian.jira.plugin.JiraCacheResetter, the following log lines appear when loading/reloading a plugin/module:

            [thread1] Start resetting caches triggered by: {}, jiraIsShuttingDown: {}
            ...
            [thread1] Done resetting caches triggered by: {}, timeMillis: {}, jiraIsShuttingDown: {} 

            Remote cache operations that happen between those two log lines are not replicated; with this fix they are local-only operations.
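
To locate those windows in a log file, a small script can pair each Start line with its Done line. The following is a rough Python sketch that assumes the messages look exactly like the templates above, with the thread name in brackets directly before the message; the real line layout depends on the logging pattern in use. The `reset_windows` function and the `SAMPLE` values (including the trigger name) are made up for illustration.

```python
import re

# Patterns follow the message templates quoted above; the real prefix
# (timestamp, level, logger name) depends on the configured log pattern.
START = re.compile(
    r"\[(?P<thread>[^\]]+)\] Start resetting caches triggered by: "
    r"(?P<trigger>.*), jiraIsShuttingDown: (?P<down>\S+)"
)
DONE = re.compile(
    r"\[(?P<thread>[^\]]+)\] Done resetting caches triggered by: "
    r"(?P<trigger>.*), timeMillis: (?P<millis>\d+), jiraIsShuttingDown: (?P<down>\S+)"
)


def reset_windows(lines):
    """Pair Start/Done lines per thread and return one dict per reset window."""
    open_starts = {}  # thread name -> index of the pending Start line
    windows = []
    for index, line in enumerate(lines):
        start = START.search(line)
        if start:
            open_starts[start["thread"]] = index
            continue
        done = DONE.search(line)
        if done and done["thread"] in open_starts:
            windows.append({
                "thread": done["thread"],
                "trigger": done["trigger"],
                "millis": int(done["millis"]),
                "first_line": open_starts.pop(done["thread"]),
                "last_line": index,
            })
    return windows


# Hypothetical sample: the trigger name and timing are invented values.
SAMPLE = [
    "[thread1] Start resetting caches triggered by: SomePluginEvent, jiraIsShuttingDown: false",
    "[thread1] (remote cache operations logged here)",
    "[thread1] Done resetting caches triggered by: SomePluginEvent, timeMillis: 125, jiraIsShuttingDown: false",
]

if __name__ == "__main__":
    for window in reset_windows(SAMPLE):
        print(window)
```

Any remote cache operation logged between `first_line` and `last_line` on the same thread falls inside a reset window and, per the fix, should be local only.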

            Maciej Swinarski (Inactive) added a comment

            Hey James, jhunt

            Thanks for your comment.
            You mentioned "with only one node active". That effectively makes your Jira setup non-DC for that period, so the problem is better described by another ticket, JRASERVER-64908, which focuses on the UPM action itself.

            Still, something caught my attention: "triggered the JiraCacheResetter storm in the logs".
            Are you talking about these events: com.atlassian.jira.plugin.JiraCacheResetter$Delegate.onPluginModule<Action> ?
            Can you please share a small snippet from the logs so we can verify that? (If you prefer not to do this in a public ticket, please let me know and we can raise a separate support request.)

            Thanks.
            Cheers

            Andriy Yakovlev [Atlassian] added a comment - edited

            Encountered today after upgrading from Jira Software 7.2.12 Data Center to Jira Software 7.6.4 Data Center. Updated Jira Service Desk via OBR (with only one node active and only one user logged in) and triggered the JiraCacheResetter storm in the logs.

            James E. Hunt [ASRC Federal] added a comment

            Encountered this with 7.4.3. Luckily, the updates happened in the evening. The problem was resolved by rebooting a node.

            Artem Chatlikov added a comment

            I remember a case, in 7.2.8 I think, where updating a plugin made our Jira DC instance unavailable for 30 minutes.

            Matt Doar added a comment

              Assignee: Unassigned
              Reporter: Andriy Yakovlev [Atlassian] (ayakovlev@atlassian.com)
              Affected customers: 8
              Watchers: 16
