[JRASERVER-66839] UPM actions have significant impact on JIRA Datacenter performance

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 7.7.0, 7.6.4
Affects Version/s: 6.4.14, 7.2.12, 7.6.3
Component/s: Data Center - Other
Labels:

Fixed in Long Term Support Release/s: 7.6
Introduced in Version: 6.04
Support reference count: 4
Symptom Severity: Severity 1 - Critical
UIS: 91

Summary

During plugin (add-on) install, update, delete, or disable actions, UPM flushes a large number of caches (see JRASERVER-64908).
In Jira Data Center this cache flush is propagated to the other nodes, so they are affected as well. A cache flush is also triggered during node startup, because UPM needs to register and load modules from plugins.

Environment

Jira Data Center

Steps to Reproduce

Upgrade a plugin (or start a node)
Check cluster performance

Expected Results

Starting a new node does not have a significant impact on the cluster.
A plugin action does not cause the cache flush to be propagated to other nodes; only plugin events are propagated.
- The plugin action will still have a performance impact due to JRASERVER-64908 and the nature of the action, but that impact will be isolated.

Actual Results

Starting a new node has a significant impact on the cluster.
A plugin action causes the cache flush to be propagated to other nodes, creating a cluster-wide replication storm and putting extra load on the DB.

Notes

None

Workaround

Please plan any UPM plugin actions (update/delete/disable) during off-peak hours or maintenance windows.
Please plan node start-up during off-peak hours or maintenance windows (start one node at a time).

is caused by

JRASERVER-64908 UPM actions may flush internal caches, leading to performance problems

Closed

relates to

JRASERVER-67703 JIRA Datacenter start generates large network traffic result in higher response time

Gathering Impact

Maciej Swinarski (Inactive) added a comment - 07/May/2018 10:59 AM

Problem

Jira has an issue (covered by JRASERVER-64908): when loading or reloading a plugin or module, Jira flushes a number of caches.

In Jira DC this problem is intensified, because every flush of a remote cache, on every node, is replicated to all other nodes. This issue covers only that Jira DC-specific problem.

Fix

Any operation performed on a remote cache during a plugin/module load or reload is no longer replicated to other nodes. Note that the plugin/module reload event is still sent to all nodes, so the cache clear still happens on every node (JRASERVER-64908).
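
To make the effect of the fix concrete, here is a minimal counting sketch in Python. It is not Jira's implementation: the function name, its parameters, and the assumption that every node flushes the same number of remote caches are all hypothetical. It only models the statement above, that before the fix each flush on each node was replicated to every other node, and that after the fix those flushes stay local.

```python
def replicated_flush_messages(nodes: int, caches_flushed: int, replicate: bool) -> int:
    """Count cache-flush messages sent between nodes for one plugin action.

    Hypothetical model: the plugin event reaches every node, and each node
    then flushes `caches_flushed` remote caches.
    """
    if nodes < 1 or caches_flushed < 0:
        raise ValueError("nodes must be >= 1 and caches_flushed >= 0")
    if not replicate:
        # After the fix: each node clears its own caches, nothing is replicated.
        return 0
    # Before the fix: every flush on every node is sent to all other nodes.
    return nodes * caches_flushed * (nodes - 1)


# Illustrative numbers only (a 4-node cluster, 50 caches flushed per node).
before = replicated_flush_messages(nodes=4, caches_flushed=50, replicate=True)
after = replicated_flush_messages(nodes=4, caches_flushed=50, replicate=False)
print(before, after)
```

In this simplified model the message count grows roughly with the square of the cluster size, which is consistent with the replication storm described in this issue.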

Logging

When DEBUG logging is enabled for com.atlassian.jira.plugin.JiraCacheResetter, the following log lines appear when a plugin/module is loaded or reloaded:

[thread1] Start resetting caches triggered by: {}, jiraIsShuttingDown: {}
...
[thread1] Done resetting caches triggered by: {}, timeMillis: {}, jiraIsShuttingDown: {}

All remote cache operations that happen between those two log lines are not replicated (with this fix they are local-only operations).
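
The two log lines can be paired up to see how long each cache reset took. The following Python sketch is one way to do that under stated assumptions: the message text is taken from the templates above, but the bracketed thread prefix, the sample lines, the trigger name, and the timing value are invented for illustration, and a real Jira log layout may prefix each line differently.

```python
import re

# Message templates come from the comment above; the bracketed thread prefix
# follows its "[thread1]" notation and may differ in a real log layout.
START = re.compile(
    r"\[(?P<thread>[^\]]+)\] Start resetting caches triggered by: "
    r"(?P<trigger>.*?), jiraIsShuttingDown: (?P<shutdown>\S+)"
)
DONE = re.compile(
    r"\[(?P<thread>[^\]]+)\] Done resetting caches triggered by: "
    r"(?P<trigger>.*?), timeMillis: (?P<millis>\d+), "
    r"jiraIsShuttingDown: (?P<shutdown>\S+)"
)


def reset_windows(lines):
    """Return (thread, trigger, millis) for each completed Start/Done pair."""
    open_triggers = {}  # thread name -> trigger of the reset in progress
    windows = []
    for line in lines:
        start = START.search(line)
        if start:
            open_triggers[start["thread"]] = start["trigger"]
            continue
        done = DONE.search(line)
        if done and done["thread"] in open_triggers:
            trigger = open_triggers.pop(done["thread"])
            windows.append((done["thread"], trigger, int(done["millis"])))
    return windows


# Hypothetical sample lines; the trigger name and timing are invented.
sample = [
    "[thread1] Start resetting caches triggered by: PluginModuleEnabledEvent, jiraIsShuttingDown: false",
    "[thread1] some remote cache operation",
    "[thread2] unrelated line",
    "[thread1] Done resetting caches triggered by: PluginModuleEnabledEvent, timeMillis: 42, jiraIsShuttingDown: false",
]
print(reset_windows(sample))
```

Any remote cache operation logged by the same thread between a matched pair falls inside the window that, with this fix, is expected to be local only.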

Andriy Yakovlev [Atlassian] added a comment - 06/Apr/2018 8:19 AM - edited

Hey James, jhunt

Thanks for your comment.
You mentioned: "with only one node active". That effectively makes your Jira setup non-DC during that period, so the better match for the problem is another ticket, JRASERVER-64908, which focuses on the UPM action itself.

Still, something caught my attention: "triggered the JiraCacheResetter storm in the logs".
Are you talking about these events: com.atlassian.jira.plugin.JiraCacheResetter$Delegate.onPluginModule<Action>?
Can you please share a small snippet from the logs so we can verify that? (If you prefer not to do this in a public ticket, please let me know and we can raise a separate support request.)

Thanks.
Cheers

James E. Hunt [ASRC Federal] added a comment - 06/Apr/2018 6:11 AM

Encountered today after upgrading from Jira Software 7.2.12 Data Center to Jira Software 7.6.4 Data Center. Updated Jira Service Desk via OBR (with only one node active and only one user logged in) and triggered the JiraCacheResetter storm in the logs.

Artem Chatlikov added a comment - 03/Apr/2018 3:17 PM

Encountered this with 7.4.3. Luckily the updates were happening in the evening. The problem was resolved by rebooting a node.

Matt Doar added a comment - 01/Mar/2018 11:05 PM

I remember that in 7.2.8, I think, updating a plugin made our Jira DC instance unavailable for 30 mins.
