[JRASERVER-68653] Asynchronous cache replication queue - leaking file descriptor when queue file corrupted

Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 7.13.1, 8.0.0
Affects Version/s: 7.9.2, 7.7.4, 7.8.4, 7.10.2, 7.11.2, 7.13.0, 7.6.9, 7.12.3, 7.6.10
Component/s: Data Center - Other
Labels: pse-request
Fixed in Long Term Support Release/s: 7.13
Introduced in Version: 7.06
Support reference count: 1
Symptom Severity: Severity 2 - Major
UIS: 8
Bug Fix Policy: Atlassian Server bug fix policy

Problem

If a cache replication queue file becomes corrupted while a node is shutting down, then after the next node start Jira tries to open this queue file every time it is needed, that is, whenever a cache replication message is sent to another node on that channel (one channel = one file). If the existing file is corrupted, opening it fails with the following error:

ERROR      [c.a.j.c.distribution.localq.LocalQCacheManager] Error when creating cache replication queue for node: [node_name]. This node will be inconsistent. Error: File is corrupt; length stored in header is 0.

This results in:

- Cache replication messages are not delivered.
- A file descriptor is leaked on each failed attempt.
  - Jira eventually hits a "Too many open files" error. The lsof output shows many localq entries.
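
To confirm the descriptor leak from lsof output, a small helper can count the entries that mention localq. This is a minimal sketch: the function name and the sample lines are hypothetical, and it assumes only that the queue files have "localq" somewhere in their path, as the lsof observation above suggests.

```python
def count_localq_entries(lsof_output: str) -> int:
    """Count lsof lines that mention a localq file.

    Assumption: queue files have "localq" somewhere in their path.
    The exact path layout is not given in the issue.
    """
    return sum(1 for line in lsof_output.splitlines() if "localq" in line)


# Illustrative input only; these paths are made up for the example.
sample = """\
java 1234 jira 101u REG 8,1 4096 11 /hypothetical/home/localq/queue_a
java 1234 jira 102u REG 8,1 4096 11 /hypothetical/home/localq/queue_a
java 1234 jira 103u REG 8,1 1024 12 /hypothetical/home/log/app.log
"""
print(count_localq_entries(sample))  # prints 2
```

In practice the input would be the captured output of lsof for the Jira process. A count that keeps growing while the error above repeats in the log is consistent with this bug.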

Desired Jira behaviour

If the file is corrupted, back it up (copy it with a corrupted_ prefix) and create a new file.
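
The desired behaviour could look roughly like the sketch below, written in Python for brevity. It is not Atlassian's fix: the 4-byte length header, the looks_corrupted check, the placeholder header written to a new file, and all function and file names are assumptions made for illustration, loosely based on the wording of the error message. It shows three things: the handle used to probe the file is always closed, the corrupted file is copied with a corrupted_ prefix, and a fresh file is created in its place.

```python
import os
import shutil
import struct
import tempfile

HEADER_SIZE = 4  # assumption: a 4-byte big-endian length starts the file


def looks_corrupted(path: str) -> bool:
    """Hypothetical check: header is missing or the stored length is 0."""
    with open(path, "rb") as f:  # 'with' closes the handle even on error
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        return True
    (length,) = struct.unpack(">i", header)
    return length == 0


def create_empty_queue(path: str) -> None:
    """Write a placeholder header with a non-zero length (assumed format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(">i", HEADER_SIZE))


def open_queue_file(path: str):
    """Open a queue file, backing up and recreating it if it looks corrupted."""
    if os.path.exists(path):
        if looks_corrupted(path):
            backup = os.path.join(
                os.path.dirname(path), "corrupted_" + os.path.basename(path)
            )
            shutil.copy2(path, backup)  # keep a copy for later inspection
            create_empty_queue(path)    # overwrite with a fresh file
    else:
        create_empty_queue(path)
    return open(path, "r+b")


with tempfile.TemporaryDirectory() as tmp:
    queue_path = os.path.join(tmp, "queue_node2")  # hypothetical file name
    with open(queue_path, "wb") as f:
        f.write(b"\x00" * 16)  # simulate a zeroed header
    with open_queue_file(queue_path):
        pass
    print(sorted(os.listdir(tmp)))  # ['corrupted_queue_node2', 'queue_node2']
```

Copying rather than moving the file matches the wording above and leaves the corrupted data available for diagnosis.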

Workaround

Delete the corrupted queue file. Steps to identify the corrupted file are described in comment-1917799.
It should not be necessary to shut down the node; it should recreate the queue file automatically.

blocks: GHS-143621

Mark Ellis made changes - 02/Nov/2023 7:35 AM

Remote Link

New: This issue links to "Page (Confluence)" [ 832995 ]

kitkat (Inactive) made changes - 24/Feb/2022 6:06 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 623589 ]

kitkat (Inactive) made changes - 18/Feb/2022 4:52 PM

Remote Link

New: This issue links to "Page (Confluence)" [ 622129 ]

Sylwia Mikołajczuk made changes - 04/Feb/2022 9:03 AM

Remote Link

Original: This issue links to "Page (Confluence)" [ 405243 ]

Sylwia Mikołajczuk made changes - 04/Feb/2022 8:57 AM

Remote Link

New: This issue links to "Page (Confluence)" [ 617670 ]

set-jac-bot made changes - 22/May/2020 8:23 AM

Fixed in Enterprise Release/s

New: [Download 7.13|https://confluence.atlassian.com/enterprise/atlassian-enterprise-releases-948227420.html]

Tomasz Zwierzchowski made changes - 03/Oct/2019 7:17 AM

Remote Link

New: This issue links to "Page (Confluence)" [ 453793 ]

Zul NS [Atlassian] made changes - 01/Apr/2019 8:48 AM

Description

Original (excerpt from h3. Problem; the other sections were not changed):

{noformat}
ERROR [c.a.j.c.distribution.localq.LocalQCacheManager] Error when creating cache replication queue for node: [node_name]. This node will be inconsistent. Error: File is corrupt; length stored in header is 0.{noformat}
This results in:
* cache replication message not being delivered
* leak of file descriptor

New (same excerpt):

{code}
ERROR [c.a.j.c.distribution.localq.LocalQCacheManager] Error when creating cache replication queue for node: [node_name]. This node will be inconsistent. Error: File is corrupt; length stored in header is 0.{code}
This results in:
* cache replication message not being delivered
* leak of file descriptor
** Jira hits into "Too many open files" error. Reviewing lsof output points to many localq entries.

Zul NS [Atlassian] made changes - 01/Apr/2019 8:44 AM

Description

Original (excerpt from h3. Workaround; the other sections were not changed):

Delete the corrupted queue file. It should not be necessary to shut down this node. It should recreate this queue file automatically.

New (same excerpt):

Delete the corrupted queue file. Step to identify the corrupted file can be found following [comment-1917799|https://jira.atlassian.com/browse/JRASERVER-68653?focusedCommentId=1917799&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1917799]
It should not be necessary to shut down this node. It should recreate this queue file automatically.

Bugfix Automation Bot made changes - 28/Mar/2019 12:26 AM

Minimum Version

New: 7.06

Assignee: Maciej Swinarski (Inactive)
Reporter: Maciej Swinarski (Inactive)
Affected customers: 2
Watchers: 13
Created: 27/Dec/2018 5:26 PM
Updated: 02/Nov/2023 7:35 AM
Resolved: 07/Jan/2019 3:52 PM
