- Bug
- Resolution: Fixed
- Low
- 7.9.2, 7.7.4, 7.8.4, 7.10.2, 7.11.2, 7.13.0, 7.6.9, 7.12.3, 7.6.10
- 7.06
- 1
- Severity 2 - Major
- 8
-
Problem
If a cache replication queue file is corrupted while a node is shutting down, then after the next node start Jira tries to open this queue file every time it is needed, that is, whenever a cache replication message is sent to another node on that particular channel (one channel = one file). If the existing file is corrupted, each attempt fails with the following error:
ERROR [c.a.j.c.distribution.localq.LocalQCacheManager] Error when creating cache replication queue for node: [node_name]. This node will be inconsistent. Error: File is corrupt; length stored in header is 0.
This results in:
- cache replication messages not being delivered
- a file descriptor leak
  - Jira eventually hits the "Too many open files" error. Reviewing lsof output shows many localq entries.
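To confirm that leaked descriptors point at queue files, the lsof output can be grouped by path. The following is a minimal sketch, not part of Jira: the function name is hypothetical, and it assumes plain-text lsof output in which the file path is the last column and begins at the first "/" on the line.

```python
from collections import Counter


def count_localq_descriptors(lsof_output: str) -> Counter:
    """Count open descriptors per localq path in plain lsof text output."""
    counts = Counter()
    for line in lsof_output.splitlines():
        # Only lines that mention "localq" and carry an absolute path matter here.
        if "localq" not in line or "/" not in line:
            continue
        # Assumption: the path is the last column and starts at the first "/".
        path = line[line.index("/"):].strip()
        counts[path] += 1
    return counts
```

Fed with the output of `lsof -p <pid>` for the Jira process, a path with a large and growing count is a likely candidate for the corrupted queue file.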
Desired Jira behaviour
If the file is corrupted, back up this file (copy it with a corrupted_ prefix) and create a new file.
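The desired behaviour can be sketched roughly as follows. This is an illustration only and not Jira's implementation: `open_with_recovery` and `toy_open_queue` are hypothetical names, and the 4-byte big-endian length header used by the toy opener is an assumption made so the example can run, not the real queue file format.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, Callable


def open_with_recovery(path: Path, open_queue: Callable[[Path], BinaryIO]) -> BinaryIO:
    """Open a queue file; if it is corrupt, back it up and start a fresh one."""
    try:
        return open_queue(path)
    except OSError:
        # Keep the damaged file for later inspection under a corrupted_ prefix.
        backup = path.with_name("corrupted_" + path.name)
        shutil.copy2(path, backup)
        path.unlink()
        # The opener is expected to create a new, valid file when none exists.
        return open_queue(path)


def toy_open_queue(path: Path) -> BinaryIO:
    """Hypothetical opener: 4-byte big-endian length header, 0 means corrupt."""
    if not path.exists():
        path.write_bytes((4).to_bytes(4, "big"))
    # The with-block closes the descriptor before the corruption check raises,
    # so repeated failed attempts do not leak descriptors.
    with path.open("rb") as handle:
        stored_length = int.from_bytes(handle.read(4), "big")
    if stored_length == 0:
        raise OSError("File is corrupt; length stored in header is 0.")
    return path.open("r+b")
```

Calling `open_with_recovery(path, toy_open_queue)` on a file whose header is all zeros leaves a `corrupted_` copy next to it and returns a handle to a freshly created file. A real implementation would catch only the specific corruption error instead of every `OSError`.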
Workaround
Delete the corrupted queue file. Steps to identify the corrupted file can be found in comment-1917799.
It should not be necessary to shut down the node; it should recreate the queue file automatically.
[JRASERVER-68653] Asynchronous cache replication queue - leaking file descriptor when queue file corrupted
Remote Link | New: This issue links to "Page (Confluence)" [ 832995 ] |
Remote Link | New: This issue links to "Page (Confluence)" [ 623589 ] |
Remote Link | New: This issue links to "Page (Confluence)" [ 622129 ] |
Remote Link | Original: This issue links to "Page (Confluence)" [ 405243 ] |
Remote Link | New: This issue links to "Page (Confluence)" [ 617670 ] |
Fixed in Enterprise Release/s | New: [Download 7.13|https://confluence.atlassian.com/enterprise/atlassian-enterprise-releases-948227420.html] |
Remote Link | New: This issue links to "Page (Confluence)" [ 453793 ] |
Minimum Version | New: 7.06 |