Jira Data Center / JRASERVER-71976

DBR replication fails on issues with a high count of worklog entries



    Description

      Summary

      An issue with a large number of worklog entries can cause a DBR (Document Based Replication) buffer overflow when changes to that issue are replicated across the cluster.

      Steps to Reproduce

      1. Create a sample Jira issue (TEST-1 in the examples below).
      2. Add many worklog entries. In my reproduction test, I added 16284 worklog entries with a simple curl loop like the following and let it run overnight.
         #!/bin/bash
         # Add roughly 16,000 worklog entries to TEST-1 via the REST API
         COUNTER=16000
         until [ $COUNTER -lt 10 ]; do
             curl -u admin:sphere -H "Content-Type: application/json" -X POST "http://jira:8080/rest/api/2/issue/TEST-1/worklog" -d '{"comment":"workin over here","started":"2018-02-21T00:00:00.000+0000","timeSpentSeconds":12000}'
             let COUNTER-=1
         done
        

        With the resulting count:

        # select issueid, count(id) from worklog group by issueid having count(id) > 1000 order by (count(id)) desc;
         issueid | count
        ---------+-------
           10000 | 16284
        
      3. Perform an operation that calls the indexer on this test issue. The example ScriptRunner script below reproduces the problem:
        import com.atlassian.jira.issue.IssueManager
        import com.atlassian.jira.component.ComponentAccessor
        import com.atlassian.jira.issue.index.IssueIndexingService
        
        def issueManager = ComponentAccessor.getIssueManager()
        def issueIndexingService = ComponentAccessor.getComponentOfType(IssueIndexingService.class)
        def myIssue = issueManager.getIssueByKeyIgnoreCase("TEST-1")
        
        // Re-index the issue; on a Data Center node this sends a DBR update to the other nodes
        issueIndexingService.reIndex(myIssue)
        
        return "OK"
        

      Expected Results

      The change is successfully replicated.

      Actual Results

      The exception below is thrown in atlassian-jira.log, and DBR fails to replicate the change.

      2021-01-11 08:43:00,151+0000 dbr-async-25 ERROR admin 488x83x1 1rn67se 172.20.0.1 /rest/api/2/issue/TEST-1/worklog [c.a.j.cluster.dbr.DefaultDBRSender] [DBR] [SENDER] Error when creating a dbr update with related message for documents: {issueId=10000, comments=[], worklogs=[10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 10016, 10017, 10018, 10019, 10020, 10021, 10022, 10023, 10024, 10025, 10026, 10027, 
      
      . . . <snipped>
      
      4, 1028375, 1028376, 1028377, 1028378, 1028379, 1028380, 1028381, 1028382], changes.size=16285}. You can disable this functionality with the feature flag: 'com.atlassian.jira.dbr'. Error: Buffer overflow. Available: 0, required: 1
          Serialization trace:
          tokenStream (org.apache.lucene.document.TextField)
          fields (org.apache.lucene.document.Document)
          worklogs (com.atlassian.jira.cluster.dbr.DBRMessageUpdateWithRelatedData)
      com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1
      Serialization trace:
      tokenStream (org.apache.lucene.document.TextField)
      fields (org.apache.lucene.document.Document)
      worklogs (com.atlassian.jira.cluster.dbr.DBRMessageUpdateWithRelatedData)
              at com.esotericsoftware.kryo.io.Output.require(Output.java:186)
              at com.esotericsoftware.kryo.io.Output.writeByte(Output.java:245)
              at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:109)
              at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:539)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:64)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:637)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:158)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:43)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:70)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:154)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:43)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:70)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:553)
              at com.atlassian.jira.cluster.dbr.KryoDBRMessageDataSerializer.serialize(KryoDBRMessageDataSerializer.java:55)
              at com.atlassian.jira.cluster.dbr.DBRMessageFactory.createDBRMessage(DBRMessageFactory.java:74)
              at com.atlassian.jira.cluster.dbr.DBRMessageFactory.createUpdateWithRelated(DBRMessageFactory.java:60)
              at com.atlassian.jira.cluster.dbr.DefaultDBRSender.sendUpdateWithRelated(DefaultDBRSender.java:85)
              at com.atlassian.jira.cluster.dbr.AsyncDBRSender.lambda$sendUpdateWithRelated$0(AsyncDBRSender.java:68)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      Workaround

      • Execute the query below to find the issues that have more than 1000 worklog entries:
        select issueid, count(id) from worklog group by issueid having count(id) > 1000 order by (count(id)) desc;
        
      • Clone these affected issues and either delete or archive the originals. For a single issue, the clone step can also be scripted against the REST API, as in the hedged sketch below.
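
      The sketch below is illustrative only: it assumes the standard REST API v2, jq on the path, and the same base URL and credentials used in the reproduction script. It copies only the summary and description (a UI clone carries over more fields) and deliberately leaves the worklog entries behind; the original issue still has to be deleted or archived afterwards.

        #!/bin/bash
        # Illustrative sketch: create a lightweight copy of TEST-1 via the REST API.
        BASE="http://jira:8080"
        AUTH="admin:sphere"
        KEY="TEST-1"
        
        # Fetch only the fields we intend to carry over.
        SRC=$(curl -s -u "$AUTH" "$BASE/rest/api/2/issue/$KEY?fields=summary,description,project,issuetype")
        
        # Build a create-issue payload from the original issue's fields.
        PAYLOAD=$(echo "$SRC" | jq '{fields: {
            project:     {key: .fields.project.key},
            issuetype:   {name: .fields.issuetype.name},
            summary:     ("CLONE - " + .fields.summary),
            description: .fields.description
        }}')
        
        # Create the copy; worklog entries are intentionally not copied.
        curl -s -u "$AUTH" -H "Content-Type: application/json" -X POST \
            -d "$PAYLOAD" "$BASE/rest/api/2/issue"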

      If there is a large number of affected issues, making the above workaround cumbersome, a JVM property is available as well. From Creating DBR message fails with: KryoException: Buffer overflow. Available: 0, required: 1:

      Since Jira DC 8.12 we use Document Based Replication to replicate the index across the cluster. When a change to an issue is triggered on one node, Jira synchronously re-indexes the issue, then asynchronously serialises the object with all Lucene document(s) and distributes it to the other nodes.

      Jira uses Kryo for the serialisation/deserialisation of Lucene documents and makes some assumptions about how big the serialised documents may be. By default, the maximum size of the serialised object containing the Lucene documents is 16 MB.
      ...
      It can be overridden with the following system property (the example below raises the maximum size to 32 MB, i.e. 33554432 bytes). Please don't set this parameter to a very high value.

      -Dcom.atlassian.jira.cluster.dbr.serialization.max.size.bytes=33554432

      Note: this property must be set on every node, which requires a rolling restart of all nodes.
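
      For reference, one common way to apply the property on each node is via the JVM arguments in the Jira installation's bin/setenv.sh. This is a sketch assuming a standard Linux installation; adjust it to however your deployment manages JVM arguments:

        # <jira-install>/bin/setenv.sh
        JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -Dcom.atlassian.jira.cluster.dbr.serialization.max.size.bytes=33554432"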
