  1. Confluence Data Center
  2. CONFSERVER-54870

Site imports fail due to not being able to put the scheduler in standby

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • None
    • Affects Version/s: 5.10, 6.3.2
    • None

    Description

      Summary

      Site backup imports fail when Confluence is connected to LDAP (connector) with many user accounts.

      Environment

      • Confluence connected to an LDAP directory using the connector method
      • Issue observed with the following example values:
            <groups>6538</groups>
            <users>5513</users>
        

      Steps to Reproduce

      1. Connect Confluence to an LDAP directory using the connector method
      2. Try to import a site backup

      Expected Results

      Site backup is imported normally.

      Actual Results

      1. After some time, the following message is seen in the UI:
        Import failed. Check your server logs for more information. Error putting scheduler on standby just prior to backup restore. Please wait a few minutes before trying again. If this error persists, please try restarting the server.
        
      2. The following error is seen in the atlassian-confluence.log:
        2017-08-02 10:15:30,850 ERROR [Long running task: Importing data] [confluence.importexport.xmlimport.BackupImporter] preImport Timed out waiting for atlassian-scheduler currently executing jobs to complete: [RunningJobImpl[startTime=1501682374170,jobId=com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager.2424833,jobConfig=JobConfig[jobRunnerKey=com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager,runMode=RUN_LOCALLY,schedule=Schedule[type=INTERVAL,intervalScheduleInfo=IntervalScheduleInfo[firstRunTime=Tue Aug 14 10:59:34 EDT 2017,intervalInMillis=3600000]],parameters={DIRECTORY_ID=3414633}],cancelled=true]]
         -- url: /admin/restore.action | referer: http://localhost:8090/admin/backup.action | traceId: 5a1c0719fd13a976 | userName: admin | action: restore
        2017-08-14 10:15:30,853 ERROR [Long running task: Importing data] [confluence.importexport.actions.ImportLongRunningTask] runInternal Failure during import
         -- url: /admin/restore.action | referer: http://localhost:8090/admin/backup.action | traceId: 5a1c0719fd13a976 | userName: admin | action: restore
        com.atlassian.confluence.importexport.ImportExportException: Error putting scheduler on standby just prior to backup restore. Please wait a few minutes before trying again. If this error persists, please try restarting the server.
        

      The log shows that the restore cannot start because the scheduler is still running the job com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager, so it cannot be put on standby.
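
      To decide which of the workarounds below applies, it helps to pull the job ID out of the "Timed out waiting" message. The following is a minimal Python sketch, assuming the RunningJobImpl[...] layout seen in the log excerpts in this report. The function name blocking_jobs and the regular expression are illustrative only and are not part of Confluence.

```python
import re

# Fields inside a RunningJobImpl[...] entry, as they appear in the log
# excerpts in this report. Other Confluence versions may format this
# differently, so treat the pattern as an assumption.
RUNNING_JOB = re.compile(
    r"RunningJobImpl\[startTime=(\d+),jobId=([^,\]]+),"
    r"jobConfig=JobConfig\[jobRunnerKey=([^,\]]+)"
)


def blocking_jobs(log_line):
    """Return (start_time_ms, job_id, job_runner_key) for each job in the line."""
    return [(int(start), job_id, runner_key)
            for start, job_id, runner_key in RUNNING_JOB.findall(log_line)]


# Log message from this report, truncated after runMode for brevity.
sample = (
    "preImport Timed out waiting for atlassian-scheduler currently executing "
    "jobs to complete: [RunningJobImpl[startTime=1501682374170,"
    "jobId=com.atlassian.crowd.manager.directory.monitor.poller."
    "DirectoryPollerManager.2424833,jobConfig=JobConfig[jobRunnerKey="
    "com.atlassian.crowd.manager.directory.monitor.poller."
    "DirectoryPollerManager,runMode=RUN_LOCALLY"
)

for start_ms, job_id, runner_key in blocking_jobs(sample):
    print(job_id, runner_key, start_ms)
```

      The jobRunnerKey is the value to compare against the job names listed in the Workaround section.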

      Notes

      The same problem has also been observed when the scheduler is held up by the TaskQueueFlushJob, as shown below:

      2017-07-20 18:12:00,610 ERROR [Long running task: Importing data] [confluence.importexport.xmlimport.BackupImporter] preImport Timed out waiting for atlassian-scheduler currently executing jobs to complete: [RunningJobImpl[startTime=1499880376933,jobId=TaskQueueFlushJob,jobConfig=JobConfig[jobRunnerKey=TaskQueueFlushJob,runMode=RUN_LOCALLY,schedule=Schedule[type=INTERVAL,intervalScheduleInfo=IntervalScheduleInfo[firstRunTime=null,intervalInMillis=60000]],parameters={}],cancelled=true]]
       -- url: /admin/restore-local-file.action | referer: https://confluencecorp.ctsp.prod.cloud.ihf/admin/backup.action | traceId: 96e77bda873a9b76 | userName: admin | action: restore-local-file
      2017-07-20 18:12:00,612 ERROR [Long running task: Importing data] [confluence.importexport.actions.ImportLongRunningTask] runInternal Failure during import
       -- url: /admin/restore-local-file.action | referer: https://confluencecorp.ctsp.prod.cloud.ihf/admin/backup.action | traceId: 96e77bda873a9b76 | userName: admin | action: restore-local-file
      com.atlassian.confluence.importexport.ImportExportException: Error putting scheduler on standby just prior to backup restore. Please wait a few minutes before trying again. If this error persists, please try restarting the server.
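
      The startTime value in these messages appears to be a Unix timestamp in milliseconds. Converting it to a readable date shows how long the blocking job had been running when the import timed out. A small Python sketch follows; the helper name start_time_utc is hypothetical, and the result is in UTC, whereas the time zone of the log timestamps is not stated in this report.

```python
from datetime import datetime, timezone


def start_time_utc(start_time_ms):
    """Convert a startTime in epoch milliseconds to an aware UTC datetime."""
    seconds, millis = divmod(start_time_ms, 1000)
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


# startTime taken from the TaskQueueFlushJob log line above.
started = start_time_utc(1499880376933)
print(started.isoformat())
```

      For the TaskQueueFlushJob excerpt this gives a start date of 2017-07-12, about eight days before the log entry dated 2017-07-20, which suggests the job was stuck rather than simply slow.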
      

      Workaround

      Since the data in the scheduler_clustered_jobs table is transient, you could simply remove the entries from the entities.xml file inside the backup zip. These entries will look something like this:

      <object class="SchedulerClusteredJob" package="com.atlassian.confluence.impl.schedule.caesium">
      <id name="id">########</id>
      <property name="jobId"><![CDATA[com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager.########]]></property>
      <property name="nextRunTime">YYYY-MM-DD HH:MM:SS.SSS</property>
      <property name="version">#####</property>
      <property name="jobRunnerKey"><![CDATA[com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager]]></property>
      <property name="rawParameters"><![CDATA[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]]></property>
      <property name="schedType">I</property>
      <property name="cronExpression"/>
      <property name="cronTimeZone"/>
      <property name="intervalFirstRunTime">YYYY-MM-DD HH:MM:SS.SSS</property>
      <property name="intervalMillis">300000</property>
      </object>
      

      Step-by-step:

      1. Unzip the backup zip into a temporary location
      2. Open the entities.xml file and remove each entry that matches the example above
      3. Zip all files back into a new backup.zip
      4. Re-attempt the import using the new backup.zip
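
      The steps above can also be scripted. The following Python sketch copies a backup zip to a new file and drops every SchedulerClusteredJob entry from entities.xml along the way. It is an illustration under stated assumptions, not an Atlassian-provided tool: the function names are hypothetical, entities.xml is assumed to be UTF-8 and to sit at the top level of the zip, <object> elements are assumed not to be nested, and the whole file is loaded into memory, which may not suit very large backups.

```python
import re
import zipfile

# Matches one SchedulerClusteredJob entry like the example above.
# Assumption: <object> elements are not nested, so the first </object>
# after the opening tag closes the entry.
JOB_ENTRY = re.compile(
    r'<object class="SchedulerClusteredJob"[^>]*>.*?</object>\s*',
    re.DOTALL,
)


def strip_scheduler_jobs(xml_text):
    """Return (cleaned_text, number_of_entries_removed)."""
    return JOB_ENTRY.subn("", xml_text)


def rewrite_backup(src_zip, dst_zip):
    """Copy src_zip to dst_zip, removing scheduler job entries from entities.xml.

    Returns the number of entries removed.
    """
    removed = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "entities.xml":
                text, removed = strip_scheduler_jobs(data.decode("utf-8"))
                data = text.encode("utf-8")
            dst.writestr(item.filename, data, zipfile.ZIP_DEFLATED)
    return removed


if __name__ == "__main__":
    import sys
    # Usage: python strip_jobs.py backup.zip backup-cleaned.zip
    if len(sys.argv) == 3:
        print("entries removed:", rewrite_backup(sys.argv[1], sys.argv[2]))
```

      Keep the original backup zip untouched and import the newly written file. Note that this sketch removes all SchedulerClusteredJob entries, not only the one named in the error message.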

      If you prefer not to edit the backup file manually, try one of the following workarounds instead.

      If the job ID in the error message is com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerManager:

      1. Restart Confluence and retry the import
      2. If that does not help, either disable the LDAP directory while the import is in progress, or delete the matching entry directly from the scheduler_clustered_jobs table, also while the import is in progress
        Note that you must be logged in with an internal system administrator account to follow the second option

      If the job ID in the error message is TaskQueueFlushJob:

      1. Restart Confluence and retry the import
      2. If that does not help, disable the Task Queue Flush Job scheduled job under General Configuration > Scheduled Jobs for the duration of the import

      More details on the following KB:

            People

              Assignee: Unassigned
              Reporter: Bernardo Andreeti (bandreeti)
              Votes: 3
              Watchers: 6
