DvcsScheduler job is not running, stops the scheduled synchronisation

            [JRACLOUD-86260] DvcsScheduler job is not running, stops the scheduled synchronisation

            ZehuaA added a comment -

            Hi Everyone,

            I would like to let you know that DVCS Connector version 2.1.4, which includes this fix, has been deployed to all OnDemand instances.

            Please let us know if the problem described in this issue still exists.

            Regards,

            Zehua Liu | Atlassian

            ZehuaA added a comment -

            The internal issue that blocks the deployment of 2.1.4 will be fixed on 28 Apr. We will resume the rest of the DVCS Connector 2.1.4 deployment on 29 Apr.

            ZehuaA added a comment -

            There was an internal issue that stopped the OnDemand deployment of the 2.1.4 version of DVCS connector, which includes the fix for this issue.

            We are still investigating the internal issue and will update the status once we have an estimated date for resuming the deployment.

            ZehuaA added a comment -

            Will be going to OnDemand on 10 Apr.

            sladey added a comment -

            Thanks zliu, great update.

            ZehuaA added a comment -

            pslade@atlassian.com there have been several reports about pull requests not being synced in time on jdog, but we did not pay sufficient attention to them and simply fixed them by doing a manual sync. We only started to investigate the problem yesterday morning, after I confirmed that the missing PR synchronization was suspicious.

            The DVCS Connector was kind of working because webhooks would mitigate most of the problems introduced by a missing sync scheduler. That's one of the main reasons why it went unnoticed on jdog for so long, giving us a false sense of confidence.

            The work on clustering support for the DVCS connector was rushed for the JAC deployment, and we focused too much on making it cluster safe and deployable. We only spent a minimal amount of time on regression testing. pobara and scia have come up with a proper test plan for the regression, but I think they did not have a chance to execute it as the test cluster was being used for JAC troubleshooting.

            On the other hand, the automated tests in the DVCS connector are incomplete. This is why the bug did not get caught in the builds. We are working on improving that as part of Fusion in JIRA 6.3. I've created BBC-741 to have tests covering the scheduler.

            Will discuss further with QA to see how we could prevent similar regressions in the future.

            crf added a comment - - edited

            To clarify, a fixed version would read something like:

                private void scheduleJobIfReady()
                {
                    if (readyToSchedule.decrementAndGet() != 0)
                    {
                        // Not ready to schedule
                        return;
                    }
            
                    scheduler.registerJobHandler(JOB_HANDLER_KEY, dvcsSchedulerJob);   // <--- Must happen even if already scheduled!
                    if (scheduler.getJobInfo(JOB_ID) != null)
                    {
                        // Already scheduled
                        return;
                    }
            
                    final long interval = Long.getLong(PROPERTY_KEY, DEFAULT_INTERVAL);
                    final long randomStartTimeWithinInterval = new Date().getTime() + (long) (new Random().nextDouble() * interval);
                    final Date startTime = new Date(randomStartTimeWithinInterval);
                    scheduler.scheduleClusteredJob(JOB_ID, JOB_HANDLER_KEY, startTime, interval);
                    log.info("DvcsScheduler start planned at " + startTime + ", interval=" + interval);
                }
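
The start-time arithmetic in the snippet above can be isolated into a small helper, which makes it easier to check on its own. This is only a sketch: `JitteredStart` and `randomStartWithin` are hypothetical names that do not appear in the connector, and the reading that the random offset exists to spread the first run somewhere inside one interval is an assumption drawn from the variable name `randomStartTimeWithinInterval`.

```java
import java.util.Date;
import java.util.Random;

// Hypothetical helper, not part of the DVCS Connector code base.
final class JitteredStart {

    // Returns a start time in [nowMillis, nowMillis + intervalMillis),
    // using the same expression as the snippet in the comment above.
    static Date randomStartWithin(long nowMillis, long intervalMillis, Random random) {
        long offset = (long) (random.nextDouble() * intervalMillis);
        return new Date(nowMillis + offset);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long interval = 60L * 60L * 1000L; // one hour, an illustrative value only
        Date start = randomStartWithin(now, interval, new Random());
        System.out.println("first run planned at " + start + ", interval=" + interval);
    }
}
```

Passing the `Random` in as a parameter is a design choice for the sketch: a seeded instance makes the result reproducible in a test.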
            

            sladey added a comment -

            From a Freezer flow perspective I am interested in why we are only now seeing OD1 in the DVCS related test environments. There have been 67 builds of OD1 released to DEV.

            crf added a comment - - edited

            DvcsScheduler contains this code block:

                private void scheduleJobIfReady()
                {
                    if (readyToSchedule.decrementAndGet() != 0 || scheduler.getJobInfo(JOB_ID) != null)
                    {
                        // Not ready to schedule or already scheduled
                        return;
                    }
                    scheduler.registerJobHandler(JOB_HANDLER_KEY, dvcsSchedulerJob);   // <--- BUG!
                    final long interval = Long.getLong(PROPERTY_KEY, DEFAULT_INTERVAL);
                    final long randomStartTimeWithinInterval = new Date().getTime() + (long) (new Random().nextDouble() * interval);
                    final Date startTime = new Date(randomStartTimeWithinInterval);
                    scheduler.scheduleClusteredJob(JOB_ID, JOB_HANDLER_KEY, startTime, interval);
                    log.info("DvcsScheduler start planned at " + startTime + ", interval=" + interval);
                }
            

            If this is not the first time the job has been scheduled, the early return for an already scheduled job skips the call to registerJobHandler, so we fail to register the job handler. The job handler needs to be registered every time the plugin is restarted, even if the job is already scheduled. With no handler registered, there is nothing to run the job. To confirm this, run this db query:

            SELECT * FROM rundetails WHERE run_outcome = 'U';
            

            This will show the jobs that tried to run but could not, because the plugin is missing or did not register the JobRunner for them.
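
The ordering problem described above can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the real scheduler: `FakeScheduler`, `Outcome`, `HandlerRegistrationDemo` and all of their methods are hypothetical stand-ins. The model assumes that job schedules survive a plugin restart while registered handlers do not, which is what the explanation above implies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical outcome type; UNAVAILABLE plays the role of run_outcome = 'U'.
enum Outcome { RAN, UNAVAILABLE }

// Hypothetical stand-in for a clustered scheduler, not the real API.
final class FakeScheduler {
    // Assumption: schedules are persisted and survive a plugin restart.
    private final Map<String, String> persistedJobs = new HashMap<>();
    // Assumption: handlers live in memory and are lost on restart.
    private final Map<String, Runnable> handlers = new HashMap<>();

    void registerJobHandler(String handlerKey, Runnable handler) {
        handlers.put(handlerKey, handler);
    }

    boolean isScheduled(String jobId) {
        return persistedJobs.containsKey(jobId);
    }

    void scheduleJob(String jobId, String handlerKey) {
        persistedJobs.put(jobId, handlerKey);
    }

    void simulatePluginRestart() {
        handlers.clear();
    }

    // Fires the job once and reports whether a handler was there to run it.
    Outcome trigger(String jobId) {
        Runnable handler = handlers.get(persistedJobs.get(jobId));
        if (handler == null) {
            return Outcome.UNAVAILABLE;
        }
        handler.run();
        return Outcome.RAN;
    }
}

final class HandlerRegistrationDemo {
    static final String JOB_ID = "demo-job";
    static final String HANDLER_KEY = "demo-handler";

    // registerAlways = false mimics the buggy order, true the fixed order.
    static void startPlugin(FakeScheduler scheduler, boolean registerAlways) {
        if (registerAlways) {
            scheduler.registerJobHandler(HANDLER_KEY, () -> { });
        }
        if (scheduler.isScheduled(JOB_ID)) {
            return; // already scheduled; the buggy order leaves here with no handler
        }
        if (!registerAlways) {
            scheduler.registerJobHandler(HANDLER_KEY, () -> { });
        }
        scheduler.scheduleJob(JOB_ID, HANDLER_KEY);
    }

    // Starts the plugin, restarts it, then fires the job once.
    static Outcome outcomeAfterRestart(boolean registerAlways) {
        FakeScheduler scheduler = new FakeScheduler();
        startPlugin(scheduler, registerAlways);
        scheduler.simulatePluginRestart();
        startPlugin(scheduler, registerAlways);
        return scheduler.trigger(JOB_ID);
    }

    public static void main(String[] args) {
        System.out.println("buggy order: " + outcomeAfterRestart(false));
        System.out.println("fixed order: " + outcomeAfterRestart(true));
    }
}
```

In this model the buggy order reports UNAVAILABLE after a restart and the fixed order reports RAN. A test along these lines, written against the real scheduler, is the kind of check that would catch the regression in a build.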
