    • We collect Bitbucket feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

      Atlassian status as of Mar 2021

      Hi everyone,

      As already noted, we're excited to have shipped the first iteration of work in this area earlier this year in v7.9, available for those running Bitbucket Data Center in a clustered environment.

      This allows you to upgrade to security patches and bugfix versions within a minor release with no downtime for end users, e.g. from 7.9.0 -> 7.9.1. For those of you who stay on a version of Bitbucket for a longer period, particularly an LTS release, this makes it easy to stay patched without downtime.

      You can read more about it in the release notes here and see an overview of the end-to-end upgrade experience on our Data Center Video Library page here.

      We're not stopping here though. We're continuing to investigate how we can eliminate further downtime and look forward to giving more concrete updates on this in the near future.

      As always, if you'd like to be involved in feedback sessions, we'd love to speak to you. Please reach out at rsaunders at atlassian dot com

      Rob Saunders
      Product Manager - Data Center

      Original Description:
      Currently it is not possible to upgrade Bitbucket Data Center with zero downtime, because the cluster does not allow nodes running mixed versions.

      If upgrade tasks between versions allow it, it should be possible to upgrade Bitbucket Data Center one node at a time, so that there is little or no loss of availability during the upgrade process.
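
      As a rough illustration of the node-at-a-time flow this suggestion asks for, here is a minimal sketch of an external orchestrator. It assumes the operator can take a node out of the load balancer, upgrade it, and wait for it to report healthy again via the /status endpoint Bitbucket exposes for load-balancer health checks; the node names, the Ansible playbook name, and the load-balancer helpers are placeholders for site-specific tooling, not anything shipped with Bitbucket.

          # Sketch: upgrade a Bitbucket Data Center cluster one node at a time.
          # Load-balancer and upgrade helpers are placeholders for whatever
          # tooling (Ansible, shell scripts, an LB API, ...) is already in use.
          import json
          import subprocess
          import time
          import urllib.request

          NODES = ["bb-node-1", "bb-node-2", "bb-node-3"]  # hypothetical node names

          def remove_from_load_balancer(host: str) -> None:
              print(f"draining {host} from the load balancer")  # placeholder

          def add_to_load_balancer(host: str) -> None:
              print(f"re-enabling {host} in the load balancer")  # placeholder

          def upgrade_node(host: str) -> None:
              # Placeholder: e.g. run an upgrade playbook limited to this host.
              subprocess.run(["ansible-playbook", "upgrade-bitbucket.yml", "--limit", host], check=True)

          def wait_until_running(host: str, timeout: int = 600) -> bool:
              """Poll the node's /status endpoint until it reports RUNNING, or give up."""
              deadline = time.time() + timeout
              while time.time() < deadline:
                  try:
                      with urllib.request.urlopen(f"http://{host}:7990/status", timeout=5) as resp:
                          if json.load(resp).get("state") == "RUNNING":
                              return True
                  except (OSError, ValueError):
                      pass  # node still restarting, unreachable, or not yet serving JSON
                  time.sleep(10)
              return False

          def rolling_upgrade(nodes: list[str]) -> None:
              for host in nodes:
                  remove_from_load_balancer(host)  # take the node out of rotation first
                  upgrade_node(host)
                  if not wait_until_running(host):
                      raise RuntimeError(f"{host} did not come back healthy; stopping the rollout")
                  add_to_load_balancer(host)       # only then move on to the next node

          if __name__ == "__main__":
              rolling_upgrade(NODES)

      Before 7.9 the cluster itself rejects mixed-version nodes, so a flow like this only becomes possible once the application supports it; the sketch is just the operator-side half of the picture.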

            [BSERV-7894] Upgrade Bitbucket Data Center with zero downtime

            Sake added a comment -

            Great to hear that this is coming! It would be nice if feature updates could also use rolling upgrades, because some bugs are only fixed in feature releases, some new features are needed by our customers, and waiting for a new LTS release takes too long.

            This feature is really making our customers happy!


            Rob Saunders added a comment - edited

            Hey varun.agarwal,

            The good news is that we're shipping rolling upgrades for bugfix versions in Bitbucket Data Center later this year. This lets admins running Bitbucket in a multi-node configuration upgrade between bugfix versions (e.g. 7.X.0 > 7.X.1). We just shipped similar functionality in Confluence 7.9, and you can read about it here (click through to the release notes). For those whose preference is to stay on our LTS versions (6.10 & 7.6), this will allow you to apply bugfix and security patches without downtime for end users.

            We know many customers want more than this though, so we're planning our next steps in this space to align with larger pieces of work in flight - updates to come.


            agarwva2 added a comment -

            Hi Rob,

            Following up to check whether there has been any progress on this improvement.

            Regards,

            Varun

            Rob Saunders added a comment -

            Hi varun.agarwal - please feel free to reach out to me on my email (rsaunders at atlassian dot com) and we can set up some time.


            agarwva2 added a comment -

            Looking forward to the design; we would be interested in the discussion around this topic and in performing a POC when this is released.


            Rob Saunders added a comment -

            Hi everyone,

            Thanks for voting and commenting on this suggestion. Your input in the comments helps us understand how this affects you and what you're hoping to accomplish with Bitbucket.

            We're pleased to share that work on this feature is now on our short-term roadmap, and we're looking forward to providing some more concrete updates on that very soon.

            If you'd like to be involved in our discussions as we continue to gather feedback and share initial designs in the next few weeks and months, we'd love to speak to you.

            Cheers,
            Rob Saunders
            Product Manager - Data Center
            rsaunders at atlassian dot com


            Tom Coolen added a comment -

            I would like to get an update on this "feature request". For an enterprise-level product, not having zero-downtime upgrade capabilities puts you years behind the competition.

            We can no longer afford to take our central Git repositories offline for hours just to be able to install security patches and upgrades.

            This feature (or the lack thereof) will play a major role in our decision-making when we renew our licenses.


            Dieter Verhelst added a comment -

            • How often do you do a feature version upgrade (e.g. 5.1)? How often do you do a bugfix version upgrade (e.g. 5.2.1)?
              We are following Enterprise Releases and would like to keep current on the latest minor release because of bug fixes and security fixes.
              Our main requirement is minor version upgrades; however, being able to feature-upgrade to the next ER without any downtime would be the ultimate goal.
            • What are the most trivial and most time-consuming tasks, e.g. upgrading the database, upgrading application nodes, plugin compatibility checks, etc.?
              There are no trivial tasks; the complete deployment/upgrade is automated. Our staging/test environments are automated, including a refresh procedure, meaning we have a sound procedure and a good grasp on the production upgrade.
            • How long does the current upgrade take? How much of it is planning, testing on a staging environment, and execution?
              We also use Ansible and need roughly 10 minutes for the first node plus 5 minutes per additional node/mirror to upgrade production. Planning/testing/staging takes up the longest part, given that critical processes use Bitbucket and its mirrors and cannot afford any downtime. While the upgrade is fully automated, planning one can take months because our global teams using Bitbucket Data Center work on different releases and schedules, and there just isn't a good time for downtime. Downtime on the main cluster or a mirror node can be quite disruptive (especially because pushes to a mirror are impossible while the master is unavailable for maintenance).
            • What have you tried so far to reduce the amount of downtime during an upgrade? Hypothetically, would you still need zero-downtime upgrades even if the upgrade happened with only a few minutes of downtime?
              The deployment is automated, so essentially after stopping the entire cluster, the service becomes available again a few minutes later, once the first node has completed the upgrade. Having ZDU would remove this downtime window, and we could upgrade more often and faster, completely transparently to users (see the rough arithmetic after this list).
            • Have you ever rolled back the upgrade? If yes, what was the root cause?
              No, rollbacks are not done. The upgrade has been tested multiple times, so we know what is coming, and if issues do happen in production we will address them and move forward.
            • Have you considered redirecting builds to mirror nodes during the upgrade process?
              No. That would add complexity because of the global usage of our deployment and the extra /bitbucket/ added to the URL. Given that the mirrors do not accept pushes while the master is unavailable, that would cause confusion rather than serve as a workaround.
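
            As a rough sketch of the arithmetic implied by the figures above: only the 10-minute first-node and 5-minute per-node figures come from the comment; the node and mirror counts are assumptions for illustration.

                # Illustrative arithmetic for the upgrade window described above.
                FIRST_NODE_MIN = 10   # minutes until the first upgraded node is serving again
                PER_NODE_MIN = 5      # minutes for each additional node or mirror
                extra_nodes = 3       # assumption: a 4-node cluster
                mirrors = 2           # assumption: two smart mirrors

                total_execution = FIRST_NODE_MIN + PER_NODE_MIN * (extra_nodes + mirrors)  # 35 minutes
                user_visible_outage = FIRST_NODE_MIN  # the cluster is stopped until the first node is back

                print(f"execution ~{total_execution} min, user-visible outage ~{user_visible_outage} min")
                # With rolling (zero-downtime) upgrades the execution time stays roughly the same,
                # but the user-visible outage drops towards zero because nodes are upgraded one at a time.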


            Richard Cross added a comment -

            • How often do you do a feature version upgrade (e.g. 5.1)? How often do you do a bugfix version upgrade (e.g. 5.2.1)?
              • We use the most recent Enterprise Release as soon as it becomes available.
            • What are the most trivial and most time-consuming tasks, e.g. upgrading the database, upgrading application nodes, plugin compatibility checks, etc.?
              • There are no trivial tasks; everything about upgrading this application is time-consuming and disruptive to end users.
            • How long does the current upgrade take? How much of it is planning, testing on a staging environment, and execution?
              • About 40 minutes per cluster node, as we deploy individual Bitbucket instances via Ansible.
            • What have you tried so far to reduce the amount of downtime during an upgrade?
              • Automated deployment via Ansible.
            • Hypothetically, would you still need zero-downtime upgrades even if the upgrade happened with only a few minutes of downtime?
              • Yes, in order to be able to run upgrades during sociable hours for the support team. We consider "downtime" and "planned outage" of application services to be very much a relic of pre-2005 (or thereabouts). All our internally-developed applications are designed for continuous deployment and zero downtime.
            • Have you ever rolled back the upgrade? If yes, what was the root cause?
              • No, and we would never consider this; we would fix forward. Our system is under constant use, and it would be far too complex to synchronise the corresponding database backup.
            • Have you considered redirecting builds to mirror nodes during the upgrade process?
              • No.


            agarwva2 added a comment -

            +1


              Rob Saunders (rsaunders@atlassian.com)
              Cristan Szmajda (Inactive) (cszmajda)
              Votes: 126
              Watchers: 97