- Type: Bug
- Resolution: Fixed
- Priority: High
- Affects Version/s: 6.4.12, 6.4.14, 7.2.7
- 6.04
- 23
- Severity: 2 - Major
- 2,243
Summary
In JIRA Data Center, a LexoRank rebalance causes read/write amplification on the Lucene index, which may cause performance degradation if the cluster doesn't have spare I/O and CPU capacity.
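For context: LexoRank stores each issue's rank as an order-preserving string, and ranking an issue between two neighbours inserts a new "midpoint" string; once neighbouring keys grow too close together, a rebalance rewrites every rank. The sketch below illustrates that mechanic with a toy alphabet and made-up helper names; it is a simplified illustration, not Jira's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class LexoRankSketch {
    static final char MIN = 'a', MAX = 'z';

    // Return a string that sorts strictly between prev and next.
    // When neighbouring keys leave no gap, the key grows by one
    // character -- this growth is what eventually forces a rebalance.
    static String between(String prev, String next) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; ; i++) {
            char p = i < prev.length() ? prev.charAt(i) : MIN;
            char n = i < next.length() ? next.charAt(i) : (char) (MAX + 1);
            if (n - p > 1) {                   // room for a midpoint here
                return sb.append((char) ((p + n) / 2)).toString();
            }
            sb.append(p);                      // no room: copy and go deeper
        }
    }

    // A rebalance rewrites EVERY rank, spacing them evenly again. In a
    // Data Center cluster each rewritten rank is one more index update
    // that every node must apply -- the amplification described above.
    static List<String> rebalance(int issueCount) {
        List<String> ranks = new ArrayList<>(issueCount);
        long space = 26L * 26 * 26;            // 3-character key space
        long step = space / (issueCount + 1);
        for (int i = 1; i <= issueCount; i++) {
            long v = i * step;
            ranks.add("" + (char) (MIN + v / (26 * 26))
                         + (char) (MIN + (v / 26) % 26)
                         + (char) (MIN + v % 26));
        }
        return ranks;
    }

    public static void main(String[] args) {
        String lo = "a", hi = "b";
        for (int i = 0; i < 8; i++) {          // keep ranking into the same gap
            hi = between(lo, hi);
            System.out.println("new rank: " + hi);
        }
        // Keys have grown; a real system would now rebalance,
        // rewriting one rank (and reindexing one issue) per row.
        System.out.println("after rebalance: " + rebalance(5));
    }
}
```

Ranking itself stays cheap; the cost comes from the rebalance step, which in a multi-node cluster multiplies into one Lucene update per issue per node.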
Environment
- JIRA Data Center
- Large number of issues: 1M+
- Large number of custom fields: 1k+
Steps to Reproduce
- Set up JIRA Data Center
- Trigger or wait for a LexoRank rebalance
Expected Results
JIRA Data Center performance is not affected and there is no index replication lag.
Actual Results
JIRA Data Center performance may degrade and index replication lag appears, which causes data discrepancies between nodes.
You will see the following health-check error:
["Index replication for cluster node 'node3' is behind by 2,991 seconds.","Index replication for cluster node 'node1' is behind by 1,501 seconds.","Index replication for cluster node 'node2_0004' is behind by 2,123 seconds."]
Notes
The problem is caused by a combination of conditions:
- A LexoRank rebalance requires rewriting the rank of every record in the Rank field.
- In JIRA Data Center (JDC), this requires reindexing all issues together with all of their related custom fields and comments.
- This causes read and write amplification, as every node needs to update its Lucene index for all issues.
- JDC uses the same replicatedindexoperation mechanism for all index updates.
- As a result, critical replication updates initiated by user actions on other nodes compete with non-urgent LexoRank updates (see the sketch after this list).
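A minimal sketch of the contention in the last two points, using a plain FIFO queue and a single consumer; this illustrates the failure mode, not Jira's actual replication code, and all names are made up:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedQueueSketch {
    record IndexOp(String source, long issueId) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<IndexOp> queue = new LinkedBlockingQueue<>();

        // A rebalance enqueues one index operation per issue...
        for (long id = 1; id <= 100_000; id++) {
            queue.put(new IndexOp("lexorank-rebalance", id));
        }
        // ...and only then does a user's edit from another node arrive.
        queue.put(new IndexOp("user-edit", 42));

        // One consumer, strict FIFO: the urgent user update is applied
        // only after the entire rebalance backlog -- i.e. replication lag.
        long processed = 0;
        IndexOp op;
        while ((op = queue.poll()) != null) {
            processed++;
            if (op.source().equals("user-edit")) {
                System.out.printf("user edit applied after %,d queued ops%n",
                        processed - 1);
            }
        }
    }
}
```

Separating urgent user-driven operations from bulk maintenance traffic (for example via a priority queue or a second channel) is the general remedy; the fixes linked below instead reduce how much bulk traffic a rebalance generates in the first place.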
Workarounds
- Reduce the number of nodes in the cluster. Testing reveals diminishing performance returns in clusters larger than 4 nodes.
- Avoid closing a sprint with more than 200 issues in an unresolved status. Doing so requires a new Rank value for each of these issues and can trigger a LexoRank rebalance.
- If you are planning to import hundreds of issues, delay the import until you have tested for and resolved performance bottlenecks. A Rank must be generated for every imported issue, which can trigger a LexoRank rebalance.
- Leave only one node in the load balancer to prevent serving stale data from other nodes. This negates the high-availability value of Data Center, so it is considered a last resort. It also requires that the remaining node can handle your organization's full concurrent user traffic, as is best practice for an HA cluster.
Full details on workarounds and solutions are available at JIRAKB/JIRA Software Data Center Lexorank Indexing Lag.
Note on Fix
Problem mitigation:
- We have worked on reducing the need for LexoRank rebalancing to be triggered (JSW-15710).
- A number of improvements to LexoRank rebalancing have also been implemented that reduce the impact of it running on a JIRA cluster.
- We have addressed replication lag in JSW-15703.
That said, a running LexoRank rebalance will still cause some read and write amplification. More details about the resolution are available at JIRAKB/JIRA Software Data Center Lexorank Indexing Lag.
- is caused by:
  - JSWSERVER-13163 LexoRank database query performance is slow due to the way the field is constructed - Closed
  - JSWSERVER-15703 LexoRank Rebalance can cause index replication delays in JIRA Datacenter - Closed
  - JSWSERVER-15710 Ranking large number of issues triggers lexorank rebalancing - Closed
- is related to:
  - JSWSERVER-15707 As an admin, I would like the ability to enable and disable LexoRank rebalancing so that I can control and defer expensive operations until after peak business hours - Closed
  - JSWSERVER-15711 Changing order of neighbour issues does not narrow lexorank space - Closed
  - JSWSERVER-15712 Don't lock ranking on boards during rebalancing - Closed
- relates to:
  - JRASERVER-70423 DC index replication delays are affecting end-users due to single thread processing and re-computation of CF values - Closed
  - RUM-1573
- is depended on by:
  - JGTM-1201
- was split into:
  - RUM-1547