- Bug
- Resolution: Fixed
- High
- 7.6.9, 8.4.2, 8.5.1, 8.5.4
- 7.06
- 31
- Severity 2 - Major
- 47
Problem:
NodeReindexServiceThread can enter a state in which it no longer checks messages. This can cause inconsistency between Jira Data Center nodes.
Customer situation:
In a standby environment, during the cutover, the snapshot was recovered but NodeReindexServiceThread was not checking messages, which caused the nodes to fall behind.
Environment
- JIRA Data Center
Expected Results
NodeReindexServiceThread:thread-1 is in the TIMED_WAITING state, parked in parkNanos, sleeping until it wakes up to check messages again:
"NodeReindexServiceThread:thread-1" #128 prio=5 os_prio=0 tid=0x00007f8feb238800 nid=0x8379 waiting on condition [0x00007f8fd1bf9000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006869304c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
What was observed instead:
- Thread dumps showed that NodeReindexServiceThread:thread-1 was in the WAITING state, parked in park with no timeout:
"NodeReindexServiceThread:thread-1" #110 prio=5 os_prio=0 tid=0x00007ff06f215800 nid=0x18f46 waiting on condition [0x00007ff04b429000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000686202370> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Perceived results:
- Health Check detects that the index data is behind the database and reports the delay, which keeps increasing because NodeReindexServiceThread:thread-1 is not receiving the messages.
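The difference between the expected and the observed thread dump comes down to the reported thread state: TIMED_WAITING when the thread is sleeping until its next check, WAITING when it is parked with no timeout. A minimal sketch of checking this mechanically in thread dump text follows. The class and method names (ReindexThreadStateCheck, stateOf, looksStalled) are hypothetical and not part of Jira, and treating WAITING as "stalled" is only a heuristic based on the two dumps in this report.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jira: inspects jstack-style thread dump text.
class ReindexThreadStateCheck {

    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Returns the Thread.State reported for the named thread, if the dump contains it. */
    static Optional<String> stateOf(String dump, String threadName) {
        int start = dump.indexOf("\"" + threadName + "\"");
        if (start < 0) {
            return Optional.empty();
        }
        // The next thread entry begins with a quoted name at the start of a line,
        // so restrict the search to this thread's section of the dump.
        int next = dump.indexOf("\n\"", start + 1);
        String section = next < 0 ? dump.substring(start) : dump.substring(start, next);
        Matcher m = STATE.matcher(section);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Heuristic: WAITING (rather than TIMED_WAITING) matches the dump observed in this report. */
    static boolean looksStalled(String dump) {
        return stateOf(dump, "NodeReindexServiceThread:thread-1")
                .map("WAITING"::equals)
                .orElse(false);
    }
}
```

A single dump is only a snapshot; comparing several dumps taken some seconds apart gives a more reliable picture.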
Notes
As part of the Lucene changes in Jira 8.x, the code in this area was improved, which makes the problem less likely to occur; e.g. com.atlassian.jira.index.ha.DefaultNodeReindexService#reIndex now catches Throwable.
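Why catching Throwable matters can be illustrated with the standard Java scheduler that appears in the stack traces above. The ScheduledExecutorService documentation states that if any execution of a periodic task throws, subsequent executions are suppressed; the worker thread stays alive but has nothing left to run, so it blocks in DelayedWorkQueue.take without a timeout, which is consistent with the WAITING dump. The sketch below is a standalone demonstration, not Jira's code: the class name, the simulated failure, the 20 ms delay, and the use of scheduleWithFixedDelay are all assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone demonstration (hypothetical names): an uncaught exception in a
// periodic task silently cancels all of its future runs.
class SuppressedPeriodicTaskDemo {

    /** Runs a periodic task that fails on its 3rd execution; returns the number of executions. */
    static int countRuns(boolean catchThrowable) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for "check messages"; throws once, on the third run.
        Runnable checkMessages = () -> {
            if (runs.incrementAndGet() == 3) {
                throw new IllegalStateException("simulated failure while checking messages");
            }
        };

        // Same task, but nothing escapes to the scheduler.
        Runnable guarded = () -> {
            try {
                checkMessages.run();
            } catch (Throwable t) {
                System.err.println("check failed, will retry on next run: " + t);
            }
        };

        executor.scheduleWithFixedDelay(
                catchThrowable ? guarded : checkMessages, 0, 20, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(500); // let the schedule run for a while
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false)); // stops after the failing run
        System.out.println("guarded runs:   " + countRuns(true));  // keeps running after the failure
    }
}
```

In the unguarded case the task never runs again after the failure and no error is logged by the executor, which is why the symptom shows up only indirectly, as a growing index delay.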
Workaround:
- Stop Jira on all nodes at the same time.
- Start the first node. Wait until the snapshot is restored, then restart the node.
- After the restart, and once NodeReindexServiceThread:thread-1 has updated the index, start the second node.
- If there are more nodes, start them sequentially.
- is related to
  - JRASERVER-72099 Index snapshot restore fails and Jira does not start in Disaster Recovery mode (Closed)
  - JRASERVER-66557 ClusterMessageHandlerServiceThread can stop checking messages if Throwable is encountered (Gathering Impact)
- relates to
  - JRASERVER-72125 Index replication service is paused indefinitely after failing to obtain an index snapshot from another node (Closed)
We are running Jira Data Center v8.4.3 and this issue now happens very frequently: users get different data when loading the same object (e.g. board, filter, sprint) at the same time. Please provide a fix or patch as soon as possible. This issue was submitted as ticket GHS-203523.