Jira Data Center / JRASERVER-71976

DBR replication fails on issues with a high count of worklog entries



    Description

      Summary

      An issue with a large number of worklog entries can cause a DBR (Document Based Replication) buffer overflow when changes to that issue are replicated across the cluster.

      Steps to Reproduce

      1. Create a sample Jira issue (TEST-1 in the examples below).
      2. Add many worklog entries. In my reproduction test, I added 16284 worklog entries with a simple curl loop like the following and let it run overnight.
         #!/bin/bash
         # Add roughly 16,000 worklog entries to TEST-1 via the REST API
         COUNTER=16000
         until [ $COUNTER -lt 10 ]; do
             curl -u admin:sphere -H "Content-Type: application/json" -X POST "http://jira:8080/rest/api/2/issue/TEST-1/worklog" -d '{"comment":"workin over here","started":"2018-02-21T00:00:00.000+0000","timeSpentSeconds":12000}'
             let COUNTER-=1
         done
        

        With the resulting count:

        # select issueid, count(id) from worklog group by issueid having count(id) > 1000 order by (count(id)) desc;
         issueid | count
        ---------+-------
           10000 | 16284
        
      3. Perform an operation that calls the indexer on this test issue. The example ScriptRunner script below reproduces the problem:
        import com.atlassian.jira.issue.IssueManager
        import com.atlassian.jira.component.ComponentAccessor
        import com.atlassian.jira.issue.index.IssueIndexingService
        
        def issueManager = ComponentAccessor.getIssueManager()
        def issueIndexingService = ComponentAccessor.getComponentOfType(IssueIndexingService.class)
        def myIssue = issueManager.getIssueByKeyIgnoreCase("TEST-1")
        
        // Re-index the issue; on a Data Center node this sends a DBR update to the other nodes
        issueIndexingService.reIndex(myIssue)
        
        return "OK"
        

      Expected Results

      The change is successfully replicated.

      Actual Results

      The exception below is thrown in atlassian-jira.log, and DBR fails to replicate the change.

      2021-01-11 08:43:00,151+0000 dbr-async-25 ERROR admin 488x83x1 1rn67se 172.20.0.1 /rest/api/2/issue/TEST-1/worklog [c.a.j.cluster.dbr.DefaultDBRSender] [DBR] [SENDER] Error when creating a dbr update with related message for documents: {issueId=10000, comments=[], worklogs=[10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 10016, 10017, 10018, 10019, 10020, 10021, 10022, 10023, 10024, 10025, 10026, 10027, 
      
      . . . <snipped>
      
      4, 1028375, 1028376, 1028377, 1028378, 1028379, 1028380, 1028381, 1028382], changes.size=16285}. You can disable this functionality with the feature flag: 'com.atlassian.jira.dbr'. Error: Buffer overflow. Available: 0, required: 1
          Serialization trace:
          tokenStream (org.apache.lucene.document.TextField)
          fields (org.apache.lucene.document.Document)
          worklogs (com.atlassian.jira.cluster.dbr.DBRMessageUpdateWithRelatedData)
      com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1
      Serialization trace:
      tokenStream (org.apache.lucene.document.TextField)
      fields (org.apache.lucene.document.Document)
      worklogs (com.atlassian.jira.cluster.dbr.DBRMessageUpdateWithRelatedData)
              at com.esotericsoftware.kryo.io.Output.require(Output.java:186)
              at com.esotericsoftware.kryo.io.Output.writeByte(Output.java:245)
              at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:109)
              at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:539)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:64)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:637)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:158)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:43)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:70)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:154)
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:43)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:568)
              at com.esotericsoftware.kryo.serializers.ReflectField.write(ReflectField.java:70)
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:107)
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:553)
              at com.atlassian.jira.cluster.dbr.KryoDBRMessageDataSerializer.serialize(KryoDBRMessageDataSerializer.java:55)
              at com.atlassian.jira.cluster.dbr.DBRMessageFactory.createDBRMessage(DBRMessageFactory.java:74)
              at com.atlassian.jira.cluster.dbr.DBRMessageFactory.createUpdateWithRelated(DBRMessageFactory.java:60)
              at com.atlassian.jira.cluster.dbr.DefaultDBRSender.sendUpdateWithRelated(DefaultDBRSender.java:85)
              at com.atlassian.jira.cluster.dbr.AsyncDBRSender.lambda$sendUpdateWithRelated$0(AsyncDBRSender.java:68)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      Workaround

      • Execute the query below to find the issues that have more than 1000 worklog entries:
        select issueid, count(id) from worklog group by issueid having count(id) > 1000 order by (count(id)) desc;
        
      • Clone these affected issues and either delete or archive the originals. For a single issue, the clone step can also be scripted against the REST API, as in the hedged sketch below.
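
      The sketch below is illustrative only: it assumes the standard REST API v2, jq on the path, and the same base URL and credentials used in the reproduction script. It copies only the summary and description (a UI clone carries over more fields) and deliberately leaves the worklog entries behind; the original issue still has to be deleted or archived afterwards.

        #!/bin/bash
        # Illustrative sketch: create a lightweight copy of TEST-1 via the REST API.
        BASE="http://jira:8080"
        AUTH="admin:sphere"
        KEY="TEST-1"
        
        # Fetch only the fields we intend to carry over.
        SRC=$(curl -s -u "$AUTH" "$BASE/rest/api/2/issue/$KEY?fields=summary,description,project,issuetype")
        
        # Build a create-issue payload from the original issue's fields.
        PAYLOAD=$(echo "$SRC" | jq '{fields: {
            project:     {key: .fields.project.key},
            issuetype:   {name: .fields.issuetype.name},
            summary:     ("CLONE - " + .fields.summary),
            description: .fields.description
        }}')
        
        # Create the copy; worklog entries are intentionally not copied.
        curl -s -u "$AUTH" -H "Content-Type: application/json" -X POST \
            -d "$PAYLOAD" "$BASE/rest/api/2/issue"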

      If there is a large number of affected issues, making the above workaround cumbersome, a JVM property is available as well. From Creating DBR message fails with: KryoException: Buffer overflow. Available: 0, required: 1:

      Since Jira DC 8.12 we use Document Based Replication to replicate the index across the cluster. When a change to an issue is triggered on one node, Jira synchronously re-indexes the issue, then asynchronously serialises the object with all Lucene document(s) and distributes it to the other nodes.

      Jira uses Kryo for the serialisation/deserialisation of Lucene documents and makes some assumptions about how big the serialised documents may be. By default, the maximum size of the serialised object containing the Lucene documents is 16 MB.
      ...
      It can be overridden with the following system property (the example below raises the maximum size to 32 MB, i.e. 33554432 bytes). Please don't set this parameter to a very high value.

      -Dcom.atlassian.jira.cluster.dbr.serialization.max.size.bytes=33554432

      Note: this property must be set on every node, which requires a rolling restart of all nodes.
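
      For reference, one common way to apply the property on each node is via the JVM arguments in the Jira installation's bin/setenv.sh. This is a sketch assuming a standard Linux installation; adjust it to however your deployment manages JVM arguments:

        # <jira-install>/bin/setenv.sh
        JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -Dcom.atlassian.jira.cluster.dbr.serialization.max.size.bytes=33554432"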
