Type: Bug
Resolution: Unresolved
Priority: Low
Fix Version/s: None
Affects Version/s: 7.9.0, 7.13.0, 8.2.4, 8.5.4
Component/s: Data Center - Node replication
Labels:
- autoscaling
- delta
- l1l2
- pse-request
- whl-fy25q2

Introduced in Version:
7.09
Support reference count:
43
Symptom Severity:
Severity 2 - Major
UIS:
38
Bug Fix Policy:
View Atlassian Server bug fix policy
Current Status:


Atlassian Update – 19 May 2023

Hi everyone,

We’ve investigated the bug and decided to lower its priority. For this decision, we’ve considered the number of votes and watches, as well as the lack of recent activity on this ticket.

Reason:

While the underlying problem has not been resolved, the chances of it occurring are much lower than they used to be. The described cache inefficiency shows up when there are many stale nodes, and the workaround is to remove them. Since Jira 8.10, stale node removal has been automated by ~~JRASERVER-42916~~.

Next steps:

If this problem still occurs on your instance, don't hesitate to comment on this ticket and/or contact Atlassian Support, so that we can learn the true impact.

Best regards,

Kamil Cichy
Senior Software Engineer, Jira DC


Summary

Asynchronous cache replication can cause extra overhead when there is a large number of cache updates and many stale nodes.

Environment

Jira DC
A large number of stale nodes (see ~~JRASERVER-42916~~)
A plugin (or other code) generating a large number of cache update events, e.g. reaching 2,000 messages/min.

Steps to Reproduce

1. Open a URL that produces cache update events while computing its business logic, e.g.:
   - /rest/servicedesk/1/<PRJ>/webfragments/sections/sd-queues-nav,servicedesk.agent.queues,servicedesk.agent.queues.ungrouped
2. Measure the response time and the number of replication events.
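The response-time half of step 2 can be scripted. The sketch below is a hypothetical helper, not part of Jira: it assumes you pass the full URL (with `<PRJ>` filled in) and, optionally, an `Authorization` header value. It times repeated GET requests with the JDK 11+ `java.net.http.HttpClient` and reports nearest-rank percentiles. Counting replication events is left to whatever statistics your Jira version exposes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical latency probe for the reproduction step; not part of Jira. */
class LatencyProbe {

    /** Returns the p-th percentile (0-100) of the samples using the nearest-rank method. */
    static long percentile(List<Long> samplesMillis, double p) {
        if (samplesMillis.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<Long> sorted = new ArrayList<>(samplesMillis);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    /** Issues `count` GET requests and records each response time in milliseconds. */
    static List<Long> probe(String url, String authHeader, int count) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (authHeader != null) {
            // Assumption: the caller supplies a ready-made header value (e.g. Basic or Bearer).
            builder.header("Authorization", authHeader);
        }
        HttpRequest request = builder.build();
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            client.send(request, HttpResponse.BodyHandlers.discarding());
            samples.add((System.nanoTime() - start) / 1_000_000);
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: full URL of the endpoint under test
        // args[1] (optional): value for the Authorization header
        String auth = args.length > 1 ? args[1] : null;
        List<Long> samples = probe(args[0], auth, 50);
        System.out.printf("p50=%d ms p95=%d ms%n", percentile(samples, 50), percentile(samples, 95));
    }
}
```

Running the probe before and after removing stale node data gives a rough before/after comparison on the same endpoint.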

Expected Results

Performance does not degrade as the number of old nodes grows.

Actual Results

Performance degrades as the number of stale nodes grows.

Thread dumps show many threads busy in the following stack:

  java.lang.Thread.State: RUNNABLE
	at java.io.RandomAccessFile.writeBytes(Native Method)
	at java.io.RandomAccessFile.write(RandomAccessFile.java:512)
	at com.squareup.tape.QueueFile.writeHeader(QueueFile.java:184)
	at com.squareup.tape.QueueFile.add(QueueFile.java:321)
	- locked <0x00000003ce9b43e0> (a com.squareup.tape.QueueFile)
	at com.squareup.tape.FileObjectQueue.add(FileObjectQueue.java:46)
	at com.atlassian.jira.cluster.distribution.localq.tape.TapeLocalQCacheOpQueue.add(TapeLocalQCacheOpQueue.java:151)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpQueueWithStats.add(LocalQCacheOpQueueWithStats.java:115)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToQueue(LocalQCacheManager.java:370)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheManager.addToAllQueues(LocalQCacheManager.java:354)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicateToQueue(LocalQCacheReplicator.java:85)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheReplicator.replicatePutNotification(LocalQCacheReplicator.java:65)
	at com.atlassian.jira.cluster.cache.ehcache.AbstractJiraCacheReplicator.notifyElementUpdated(AbstractJiraCacheReplicator.java:123)
	at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementUpdated(RegisteredEventListeners.java:228)
	at net.sf.ehcache.event.RegisteredEventListeners.notifyElementUpdated(RegisteredEventListeners.java:206)
...

In one customer's case, 15-20% of all threads were busy in replicateToQueue.
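The stack trace shows `LocalQCacheManager.addToAllQueues` calling `addToQueue`, which ends in a locked file write (`QueueFile.add`). The toy model below illustrates why that cost scales with stale nodes. It assumes, as a simplification, that every cache put is appended to one local queue per node the replicating node knows about, with stale nodes included. All class and method names are hypothetical; this is not Jira's implementation, and it counts appends instead of writing to disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of per-node queue fan-out; hypothetical, not Jira code. */
class FanOutModel {

    /** Stand-in for one per-node local queue; counts appends instead of doing file I/O. */
    static final class NodeQueue {
        final String nodeId;
        long appends;

        NodeQueue(String nodeId) {
            this.nodeId = nodeId;
        }

        void add(String cacheOp) {
            // In the real trace each add is a synchronized file write, so every
            // append costs disk I/O while holding that queue's lock.
            appends++;
        }
    }

    private final Map<String, NodeQueue> queues = new LinkedHashMap<>();

    /** Registers a node; stale nodes are registered exactly like live ones. */
    void registerNode(String nodeId) {
        queues.putIfAbsent(nodeId, new NodeQueue(nodeId));
    }

    /** Replicates one cache update to every registered queue, live or stale. */
    void replicatePut(String cacheOp) {
        for (NodeQueue q : queues.values()) {
            q.add(cacheOp);
        }
    }

    long totalAppends() {
        long total = 0;
        for (NodeQueue q : queues.values()) {
            total += q.appends;
        }
        return total;
    }

    /** Builds a model with the given live and stale peer counts and replays `updates` puts. */
    static long simulate(int liveNodes, int staleNodes, int updates) {
        FanOutModel model = new FanOutModel();
        for (int i = 0; i < liveNodes; i++) {
            model.registerNode("live-" + i);
        }
        for (int i = 0; i < staleNodes; i++) {
            model.registerNode("stale-" + i);
        }
        for (int u = 0; u < updates; u++) {
            model.replicatePut("put:key-" + u);
        }
        return model.totalAppends();
    }

    public static void main(String[] args) {
        // 2,000 updates: roughly one minute at the rate cited in the Environment section.
        System.out.println(simulate(3, 0, 2000));  // prints 6000
        System.out.println(simulate(3, 40, 2000)); // prints 86000 (40 stale peers is an illustrative number)
    }
}
```

Under these assumptions, queue writes grow as updates × known nodes, so 40 stale node records turn 6,000 appends per minute into 86,000 even though only 3 peers are actually alive.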

Notes

None

Workaround

Clean up old node data manually; see ~~JRASERVER-42916~~.
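Before cleaning up, you need to decide which node records are stale. The sketch below shows one way to do that from a map of node id to last heartbeat time. The data model and the 2-day threshold are assumptions for illustration only; where Jira stores this data and how records are removed depend on the version and on ~~JRASERVER-42916~~. The current node should never be treated as stale.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that picks stale node ids by heartbeat age; not Jira code. */
class StaleNodeFinder {

    /** Returns ids of nodes whose last heartbeat is older than maxAge, sorted by id. */
    static List<String> findStale(Map<String, Instant> lastHeartbeat, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> stale = new ArrayList<>();
        // TreeMap gives a deterministic, id-sorted iteration order.
        for (Map.Entry<String, Instant> e : new TreeMap<>(lastHeartbeat).entrySet()) {
            if (e.getValue().isBefore(cutoff)) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-06-07T14:50:00Z");
        Map<String, Instant> heartbeats = Map.of(
                "node1", now.minusSeconds(30),            // live
                "node2", now.minus(Duration.ofDays(3)),   // offline for days
                "node3", now.minus(Duration.ofDays(90))); // long gone
        System.out.println(findStale(heartbeats, now, Duration.ofDays(2))); // prints [node2, node3]
    }
}
```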


2021-04-30-flamechart.png
3.35 MB
07/Jun/2021 2:50 PM

is related to

JRASERVER-65538 Active nodes query for offline node messages and index operations

Closed

JRASERVER-42916 Stale node ids should automatically be removed in Jira Data Center

Closed

JRASERVER-67019 Asynchronous cache replication in Jira Data Center

Closed

relates to

JSDSERVER-6490 Opening Service Desk Queue will send many Cache replication requests for Queue Count

Closed

Mentioned in: 22 pages

Assignee: Unassigned

Reporter: Andriy Yakovlev [Atlassian]

Votes: 6

Watchers: 19

Created: 22/Jul/2019 8:28 AM

Updated: 17 hours ago
