Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 7.2.8, 7.3.4
Affects Version/s: 6.4.12, 6.4.14, 7.2.7
Component/s: Indexing, Jira Importers Plugin
Labels:

Introduced in Version:
6.04
Support reference count:
23
Symptom Severity:
Severity 2 - Major
UIS:
2,243
Bug Fix Policy:
View Atlassian Server bug fix policy

Summary

In JIRA Data Center, a LexoRank rebalance causes read and write amplification on the Lucene index, which may degrade performance if the cluster lacks spare IO/CPU capacity.

Environment

JIRA Data Center
Large number of issues: 1M+
Large number of custom fields: 1k+

Steps to Reproduce

Set up JIRA Data Center
Trigger a LexoRank rebalance, or wait for one to start

Expected Results

JIRA Data Center performance is not affected and there is no index replication lag.

Actual Results

JIRA Data Center performance may degrade and index replication falls behind. The replication lag causes data discrepancies between nodes.
The following health-check error is reported:

["Index replication for cluster node 'node3' is behind by 2,991 seconds.","Index replication for cluster node 'node1' is behind by 1,501 seconds.","Index replication for cluster node 'node2_0004' is behind by 2,123 seconds."]
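For monitoring, the health-check output above can be turned into per-node lag values. The following is a minimal sketch that assumes the messages keep exactly the wording shown above; the function name and the sample constant are hypothetical and not part of any JIRA API.

```python
import json
import re

# Matches the message wording shown in the health-check error above.
LAG_PATTERN = re.compile(
    r"Index replication for cluster node '(?P<node>[^']+)' "
    r"is behind by (?P<seconds>[\d,]+) seconds\."
)


def parse_replication_lag(raw):
    """Return {node name: lag in seconds} from a JSON array of health-check messages."""
    lags = {}
    for message in json.loads(raw):
        match = LAG_PATTERN.search(message)
        if match:
            # Lag values use thousands separators, e.g. "2,991".
            lags[match.group("node")] = int(match.group("seconds").replace(",", ""))
    return lags


# Sample copied from the health-check error above.
SAMPLE_HEALTH_CHECK = """[
  "Index replication for cluster node 'node3' is behind by 2,991 seconds.",
  "Index replication for cluster node 'node1' is behind by 1,501 seconds.",
  "Index replication for cluster node 'node2_0004' is behind by 2,123 seconds."
]"""

lags = parse_replication_lag(SAMPLE_HEALTH_CHECK)
worst_node = max(lags, key=lags.get)
print(worst_node, lags[worst_node])  # node3 2991
```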

Notes

The problem is caused by a combination of conditions:

A LexoRank rebalance rewrites every record in the Rank field.
In JIRA Data Center (JDC), that requires reindexing all issues together with all related custom fields and comments.
This causes read and write amplification, as every node needs to update its Lucene index for all issues.
JDC uses the same replicatedindexoperation mechanism for all updates.
- That means critical replication updates initiated by user actions on other nodes compete with non-urgent LexoRank updates.
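The two effects described in these notes can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions: a single first-in, first-out replication queue, a fixed hypothetical cost per operation, and one index update per issue per node. The function names and numbers are illustrative and do not reflect JIRA internals.

```python
from collections import deque


def rebalance_index_updates(issue_count, node_count):
    """Rough number of Lucene document updates when every node reindexes every issue."""
    return issue_count * node_count


def wait_before_user_update(queue, ms_per_operation):
    """Milliseconds a FIFO consumer spends before reaching the first user operation.

    Returns None if the queue holds no user-initiated operation.
    """
    for position, operation in enumerate(queue):
        if operation == "user":
            return position * ms_per_operation
    return None


# Assumed figures: 1M issues on a 4-node cluster.
print(rebalance_index_updates(1_000_000, 4))  # 4000000

# Assumed figures: 10,000 pending LexoRank updates ahead of one user edit,
# at a hypothetical 10 ms per replicated operation.
queue = deque(["lexorank"] * 10_000 + ["user"])
print(wait_before_user_update(queue, 10))  # 100000
```

Under these assumptions the user edit becomes visible on other nodes only after 100 seconds, which is the kind of delay the health check reports as replication lag.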

Workarounds

Reduce the number of nodes in the cluster. Testing reveals diminishing performance returns in clusters larger than 4 nodes.
Avoid closing a sprint with more than 200 issues in an unresolved status. Closing such a sprint requires a new Rank value for each of these issues and can trigger a LexoRank rebalance.
If you are planning to import hundreds of issues, delay the import until you have tested and resolved performance bottlenecks. A Rank must be generated for every imported issue, and this can trigger a LexoRank rebalance.
Leave only one node behind the load balancer to prevent serving stale data from other nodes. This negates the high-availability value of Data Center, so it is considered a last resort. It also requires that each node is capable of handling the full concurrent user traffic for your organization, as is the best practice for an HA cluster.

Full details on workarounds and solutions are available at JIRAKB/JIRA Software Data Center Lexorank Indexing Lag.

Note on Fix

Problem mitigation:

We have reduced how often LexoRank rebalancing is triggered (~~JSW-15710~~).
A number of improvements to LexoRank rebalancing have also been implemented to reduce its impact on a JIRA cluster.
- We have addressed replication lag in ~~JSW-15703~~.

That said, a running LexoRank rebalance will still cause some read and write amplification. More details about the resolution are available at JIRAKB/JIRA Software Data Center Lexorank Indexing Lag.

is caused by

JSWSERVER-13163 LexoRank database query performance is slow due to the way the field is constructed

Closed

JSWSERVER-15703 LexoRank Rebalance can cause index replication delays in JIRA Datacenter

Closed

JSWSERVER-15710 Ranking large number of issues triggers lexorank rebalancing

Closed

is related to

JSWSERVER-15707 As an admin, I would like the ability to enable and disable LexoRank rebalancing so that I can control and defer expensive operations until after peak business hours

Closed

JSWSERVER-15711 Changing order of neighbour issues does not narrow lexorank space

Closed

JSWSERVER-15712 Don't lock ranking on boards during rebalancing

Closed

relates to

JRASERVER-70423 DC index replication delays are affecting end-users due to single thread processing and re-computation of CF values

Closed

RUM-1573

is depended on by: JGTM-1201

was split into: RUM-1547

Assignee: Unassigned

Reporter: Andriy Yakovlev [Atlassian]

Votes: 25

Watchers: 56

Created: 23/Jan/2017 9:21 AM

Updated: 31/Jan/2025 2:33 PM

Resolved: 17/Mar/2017 3:32 PM

Estimated:

Not Specified

Remaining:

Logged:
