Lucky release t-shirts, anyone? In our last blog post, we talked a little cheekily about “big bang” software releases and the risks they bring. This “big-batch-of-changes” approach to releasing new features can result in hours or days of downtime and offers no real rollback scenario. Instead, it comes with a large runbook of every workaround you might need if a release goes awry. In this post, we continue our series by exploring the benefits and drawbacks of blue-green deployments as a release strategy. We also start unpacking canary releasing, and specifically its policy-based variant, as an effective alternative.

Advantages of blue-green deployments over “big bang” releases

The major difference between big-bang releases and the blue-green method is that blue-green uses a second, identical but inactive production environment. The upgrade is done there, while the active production environment continues to serve all traffic. When the upgrade has been tested and deemed fit for production, all traffic is switched over at once. At any time, only one of the two environments is live and serves all production traffic.

By comparison, in a big-bang scenario the upgrade has to take place in the live production environment. These upgrades often take a long time because of all the steps that must complete for the upgrade to succeed. In a blue-green scenario, cutover time is minimized because most of the work happens on the inactive environment, where it doesn’t matter how long the upgrade takes.
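To make the switching concrete, here is a minimal sketch of a blue-green router in Python. It is illustrative only: `Environment`, `BlueGreenRouter`, `cut_over` and `roll_back` are hypothetical names, and the `healthy` flag is an assumed stand-in for “passed pre-release testing on the inactive environment”. In practice the switch usually happens in a load balancer, DNS record or proxy rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    version: str
    healthy: bool = True  # hypothetical stand-in for "passed pre-release tests"


class BlueGreenRouter:
    """Hypothetical router that sends all traffic to exactly one environment."""

    def __init__(self, blue: Environment, green: Environment) -> None:
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"  # blue starts as the active environment

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def route(self, request_id: str) -> str:
        # Every request goes to the live environment; the idle one gets none.
        env = self.envs[self.live]
        return f"{request_id} -> {env.name} ({env.version})"

    def cut_over(self) -> None:
        # Flip 100% of traffic at once, but only if the idle side passed its tests.
        target = self.envs[self.idle]
        if not target.healthy:
            raise RuntimeError(f"{target.name} has not passed pre-release tests")
        self.live = self.idle

    def roll_back(self) -> None:
        # Rollback is the same flip in reverse; the previous side is untouched.
        self.live = self.idle
```

For example, after `router = BlueGreenRouter(Environment("blue", "v1"), Environment("green", "v2"))` and `router.cut_over()`, `router.route("req-42")` returns `"req-42 -> green (v2)"`, and `router.roll_back()` sends everything back to blue. Note that `cut_over` moves all traffic in one step, which is exactly the all-or-nothing behavior discussed below.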

Advantages and drawbacks of blue-green deployments

The upsides are obvious:

1.      Downtime is much shorter. Releases need less preparation and fewer up-front meetings to devise mitigations for every plausible failure scenario, and they are less disruptive to customers.

2.      The cutover is simple and works both ways. There is no complicated plan for switching to the new release.

3.      There is a rollback scenario that doesn’t require undoing all previous work in the release plan. The rollback plan is simple: just cut back to the other environment.

4.      In case of a failed release, the state of the failed environment is preserved, so triage and post-mortems can happen separately from the live production environment.

There are plenty of downsides, too.

1.      The cutover between blue and green is all-or-nothing and immediate. All traffic is switched to the other environment at once, which increases the blast radius (risk plus impact) if the new release has a problem that wasn’t caught during testing.

2.      It requires double the resources. There’s no cheating either, as each environment must be able to handle peak load on its own.

3.      Keeping the inactive environment in sync and up to date is a pain. It doubles the operational effort of upgrading and patching, and of keeping configuration in lockstep with the active environment.

4.      Risk of data corruption. If the transactional state held in relational databases and in block, file and object storage isn’t handled carefully during a blue-green deployment, data can be corrupted.

5.      Databases need to stay backward and forward compatible so that traffic can switch between blue and green in either direction.

Canary releasing the policy-based way, or what to do instead

Fortunately, there are better release strategies out there that mitigate these downsides. Canary releasing addresses several major pain points of blue-green deployments:

1.      Limiting the blast radius to a smaller but representative user segment (the canaries) by dosing the traffic that goes to the new release

2.      Requiring extra resources only for the services that are running a canary, which are often a small part of the entire production environment

3.      Making configuration sync a non-issue, provided the services are stateless microservices
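“Dosing” traffic usually means routing a chosen percentage of users to the new release. The Python sketch below shows one common way to do that, sticky percentage bucketing, under stated assumptions: each request carries a stable user ID, and `bucket` and `choose_version` are hypothetical helpers. Real deployments typically do this split in a proxy, load balancer or service mesh rather than in application code.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_version(user_id: str, canary_percent: int) -> str:
    """Route roughly canary_percent% of users to the canary, the rest to stable.

    The same user always lands in the same bucket, so raising the percentage
    only adds new canaries; nobody flips back and forth between versions.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"
```

Hashing the user ID, rather than picking randomly per request, keeps each user’s experience consistent and keeps the canary segment a fixed, roughly representative slice of users. Across many users, `choose_version(user_id, 5)` sends approximately, not exactly, 5% of them to the canary.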

The biggest advantage of canary releasing is that you test each release in production by sending it a controlled share of real user traffic. That is a huge improvement over the synthetic testing of blue-green deployments, where only the live environment receives user traffic, so the new release can’t be tested with production traffic before the cutover.

Typically, however, teams must decide subjectively which conditions to base the release test on, and then manually monitor the release’s health, performance and state to decide whether to scale it up or down. There’s a lot of guesstimating and watching squiggles on a graph.

But what if you could define all the conditions for release decisions before the release test starts, and automate them so the release scales up or rolls back on its own? Those conditions can include technical metrics such as service performance and user response time, but also, crucially, business metrics related to conversion, revenue or basket size. You can do that by gathering input from all stakeholders (Dev, Ops, SRE, Product Owners and Business Analysts) up front and codifying it into release policies that act as the fulcrum of a robust continuous releasing process. The fine-grained control that policy-based canary releasing offers reduces the cost, risk and stress compared with either the big-bang or the blue-green approach to releasing software.
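As a rough illustration of what “codifying conditions into a release policy” can look like, here is a minimal Python sketch. Everything in it is hypothetical: the metric names, the thresholds and the `ReleasePolicy`, `evaluate` and `next_canary_percent` names are assumptions for illustration, not the policy format of any particular product. It checks a canary’s technical and business metrics against limits agreed up front, then either widens the canary or rolls it back.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"    # widen the canary to more users
    ROLLBACK = "rollback"  # send all traffic back to the stable release


@dataclass(frozen=True)
class Metrics:
    p95_latency_ms: float   # technical: user-facing response time
    error_rate: float       # technical: share of failed requests, 0.0-1.0
    conversion_rate: float  # business: share of sessions that convert, 0.0-1.0


@dataclass(frozen=True)
class ReleasePolicy:
    # Hypothetical thresholds agreed up front by Dev, Ops, SRE and the business.
    max_p95_latency_ms: float
    max_error_rate: float
    max_conversion_drop: float  # allowed relative drop vs. stable, e.g. 0.05 = 5%
    step_percent: int = 10      # extra traffic share added per promotion


def evaluate(policy: ReleasePolicy, canary: Metrics, stable: Metrics) -> tuple[Decision, list[str]]:
    """Check the canary against every rule; any violation means rollback."""
    violations: list[str] = []
    if canary.p95_latency_ms > policy.max_p95_latency_ms:
        violations.append("p95 latency above limit")
    if canary.error_rate > policy.max_error_rate:
        violations.append("error rate above limit")
    if stable.conversion_rate > 0:
        drop = (stable.conversion_rate - canary.conversion_rate) / stable.conversion_rate
        if drop > policy.max_conversion_drop:
            violations.append(f"conversion down {drop:.0%} vs. stable")
    decision = Decision.ROLLBACK if violations else Decision.PROMOTE
    return decision, violations


def next_canary_percent(policy: ReleasePolicy, current: int, decision: Decision) -> int:
    """Turn a decision into the canary's next traffic share."""
    if decision is Decision.ROLLBACK:
        return 0
    return min(100, current + policy.step_percent)
```

With a policy of 300 ms p95 latency, a 1% error rate and a 5% allowed conversion drop, a canary converting at 3.0% against a stable release at 4.0% is a 25% relative drop, so `evaluate` returns `Decision.ROLLBACK` and `next_canary_percent` sets the canary’s share to 0. Returning the list of violations alongside the decision keeps each automated rollback explainable to the stakeholders who wrote the policy.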

If you’d like to learn how automated release policies help you release simply, safely, with confidence and at scale, book a Guided Tour with one of our experts!

Or join us for more on release strategies in next week’s blog post, where we dive deeper into canary releases, both regular and policy-based, and in our on-demand webinar 5 Release Strategies You Should Know for Continuous Delivery.