The Backup Client may incorrectly time out and fail on an instance with lots of Active Objects data


    • Severity 2 - Major

      Issue Summary

      The Backup Client performs various actions on a Bitbucket instance before a backup is started, such as putting the system into maintenance mode and draining database connections as well as SCM hosting operations.

      The Backup Client waits up to 1 minute for connections and operations to drain. However, before it drains them, Bitbucket exports Active Objects to disk. The Backup Client is unaware of this export, so its 1-minute wait also covers the export time. If exporting Active Objects takes longer than 1 minute, the Backup Client reports that draining the SCM operations timed out, even though draining has not yet started.
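The timing problem described above can be illustrated with a small model. This is only a sketch of the reasoning, not Bitbucket's actual code: the function name and parameters are hypothetical, and it assumes the client's wait simply spans the Active Objects export followed by the drain.

```python
def drain_wait_times_out(ao_export_seconds, drain_seconds, timeout_seconds=60):
    """Simplified model of the Backup Client's wait (hypothetical, for illustration).

    The client starts its timer when the backup is requested, but the server
    exports Active Objects first and only then drains connections and SCM
    operations. The client's wait therefore covers both phases.
    """
    total_wait = ao_export_seconds + drain_seconds
    return total_wait > timeout_seconds


# A 90-second Active Objects export exceeds the default 60-second wait,
# even though the drain itself would only take 5 seconds.
print(drain_wait_times_out(ao_export_seconds=90, drain_seconds=5))

# With the wait raised to 3600 seconds, the same backup fits comfortably.
print(drain_wait_times_out(ao_export_seconds=90, drain_seconds=5, timeout_seconds=3600))
```

In this model the reported "SCM drain" timeout is caused entirely by the export phase, which is why raising the drain timeout works around the problem.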

      Steps to Reproduce

      1. Have an instance where backing up Active Objects takes longer than one minute
      2. Perform a backup using the Backup Client

      Expected Results

      The backup succeeds and a backup file is created on disk.

      Actual Results

      The Backup Client times out while waiting for SCM requests to drain and fails with the following message:

      com.atlassian.bitbucket.internal.backup.client.ScmDrainTimedOutException: Operations from one or more SCMs did not finish within the allotted timeout. To prevent corruption due to inconsistent state, the backup has been aborted. Please try backup up again when the system is under less load.
      

      The following exception is logged in the atlassian-bitbucket.log file:

      WARN  [threadpool:thread-2] admin @1NNNN222NNNx1x0 0:0:0:0:0:0:0:1 "POST /mvc/admin/backups HTTP/1.1" c.a.s.i.m.DefaultMaintenanceTaskMonitor BACKUP maintenance has been canceled (Cause: BackupException: A backup file could not be created.)
      com.atlassian.stash.internal.backup.BackupException: A backup file could not be created.
      	at com.atlassian.stash.internal.maintenance.backup.BackupPhase.run(BackupPhase.java:75)
      	at com.atlassian.stash.internal.maintenance.CompositeMaintenanceTask$Step.run(CompositeMaintenanceTask.java:130)
      	at com.atlassian.stash.internal.maintenance.CompositeMaintenanceTask.run(CompositeMaintenanceTask.java:69)
      	at com.atlassian.stash.internal.maintenance.MaintenanceModePhase.run(MaintenanceModePhase.java:27)
      	at com.atlassian.stash.internal.maintenance.backup.AbstractBackupTask.run(AbstractBackupTask.java:85)
      	at com.atlassian.stash.internal.maintenance.DefaultMaintenanceTaskMonitor.run(DefaultMaintenanceTaskMonitor.java:212)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	...
      Caused by: com.atlassian.stash.internal.backup.BackupException: Backup of ActiveObjects table data failed.
      	at com.atlassian.stash.internal.maintenance.backup.ActiveObjectsBackupStep.run(ActiveObjectsBackupStep.java:54)
      	at com.atlassian.stash.internal.maintenance.CompositeMaintenanceTask$Step.run(CompositeMaintenanceTask.java:130)
      	at com.atlassian.stash.internal.maintenance.CompositeMaintenanceTask.run(CompositeMaintenanceTask.java:69)
      	at com.atlassian.stash.internal.maintenance.backup.BackupPhase.run(BackupPhase.java:71)
      	... 15 common frames omitted
      Caused by: com.atlassian.activeobjects.spi.ActiveObjectsImportExportException: There was an error during import/export with <unknown plugin>:com.ctc.wstx.exc.WstxIOException: null
      	at com.atlassian.activeobjects.backup.ImportExportErrorServiceImpl.newParseException(ImportExportErrorServiceImpl.java:40)
      	at com.atlassian.dbexporter.node.stax.StaxStreamWriter.close(StaxStreamWriter.java:159)
      	at com.atlassian.activeobjects.backup.ActiveObjectsBackup.closeCloseable(ActiveObjectsBackup.java:182)
      	at com.atlassian.activeobjects.backup.ActiveObjectsBackup.save(ActiveObjectsBackup.java:113)
      	at com.atlassian.stash.internal.plugin.OsgiServiceProxyFactoryImpl$DynamicServiceInvocationHandler.invoke(OsgiServiceProxyFactoryImpl.java:104)
      	at com.atlassian.stash.internal.maintenance.backup.ActiveObjectsBackupStep.run(ActiveObjectsBackupStep.java:51)
      	... 18 common frames omitted
      Caused by: com.ctc.wstx.exc.WstxIOException: null
      	at com.ctc.wstx.sw.BaseStreamWriter._finishDocument(BaseStreamWriter.java:1431)
      	at com.ctc.wstx.sw.BaseStreamWriter.close(BaseStreamWriter.java:264)
      	at com.atlassian.javanet.staxutils.helpers.StreamWriterDelegate.close(StreamWriterDelegate.java:182)
      	at com.atlassian.dbexporter.node.stax.StaxStreamWriter.close(StaxStreamWriter.java:157)
      	... 22 common frames omitted
      Caused by: java.nio.channels.ClosedChannelException: null
      	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
      	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:199)
      	at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
      	at java.nio.channels.Channels.writeFully(Channels.java:101)
      	at java.nio.channels.Channels.access$000(Channels.java:61)
      	at java.nio.channels.Channels$1.write(Channels.java:174)
      	at de.schlichtherle.truezip.io.LEDataOutputStream.write(LEDataOutputStream.java:91)
      	at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
      	at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
      	at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
      	at de.schlichtherle.truezip.io.DecoratingOutputStream.write(DecoratingOutputStream.java:54)
      	at de.schlichtherle.truezip.zip.ZipOutputStream.write(ZipOutputStream.java:288)
      	at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
      	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      	at com.ctc.wstx.sw.BufferingXmlWriter.flush(BufferingXmlWriter.java:225)
      	at com.ctc.wstx.sw.BufferingXmlWriter.close(BufferingXmlWriter.java:198)
      	at com.ctc.wstx.sw.BaseStreamWriter._finishDocument(BaseStreamWriter.java:1429)
      	... 25 common frames omitted
      

      Workaround

      Increase the time the Backup Client waits for SCM requests to drain. This gives the Active Objects export enough time to finish, after which the backup continues normally.

      1. Open the file backup-config.properties in the Backup Client directory
      2. Put the following line at the end of the file:
        backup.scmdrain.timeout=3600

      This increases the timeout to 1 hour (3600 seconds). Note that a genuine failure to drain SCM requests may now also take up to 1 hour to be detected.
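For environments where the Backup Client configuration is managed by script, the edit above could be automated along these lines. This is a minimal sketch, not an Atlassian-provided tool: the helper `set_property` is hypothetical, and it assumes the file uses plain `key=value` lines with `#` comments.

```python
from pathlib import Path


def set_property(config_path, key, value):
    """Set key=value in a simple .properties file (hypothetical helper).

    Replaces an existing entry for the key, or appends a new line at the end
    of the file if the key is not present. Comment lines are left untouched.
    """
    path = Path(config_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and anything that is not a key=value entry.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `set_property("backup-config.properties", "backup.scmdrain.timeout", 3600)` from the Backup Client directory would apply the workaround. Running it a second time leaves a single entry rather than adding a duplicate line.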

            Assignee:
            Unassigned
            Reporter:
            Wolfgang Kritzinger