In a less ruthlessly pragmatic profession than software engineering, a topic like the point of no return might be something one only refers to in whispers, like “that which shall not be named”. Alas! As the people responsible for making the “technological visions” of others stable and scalable under any pressure and in any environment, engineers and architects do not have the luxury of resorting to superstition. That’s why, in this blog post, we deal with the point of no return head-on. Here’s how to rethink your release process to avoid lengthy manual rollbacks, production outages, and the stress to people and risk to business that failing software brings.
Hitting “release” can mean bad consequences, but how did we get here?
The point-of-no-return is the point at which you push a new software change and you effectively lose control over the release process. You lose control because you’ve structured your release in one of two ways:
- As a “big bang” in which all components get updated at once; or
- As a “rolling update” (in, for instance, Kubernetes) in which you update a series of instances one at a time.
When it comes to control, these are effectively the same approach to releasing. You lose control at the moment of releasing to production because the release was never structured in a way that would give you control in the first place.
How to keep control? Rethink your release process as a test
We write a lot about making releases incremental and controlled by applying a methodology like canary releasing. That controls the impact of failing software and makes it easier to automate rollback. Here’s what it comes down to:
- Rethink your release process: releasing should not be the moment you expose all your changes to real customers in production. Think of the release process as a test instead.
- Testing a new release on small sub-segments of real user traffic in production gives you a lot of control.
- To set up a controlled test, we recommend writing the steps and conditions for the release test into a flexible release policy. A policy includes information like: the specific traffic segment, the business and health metrics your code needs to meet, the duration of the release and the response time.
- If any of those conditions isn’t met, the policy makes it easy to roll back to the previous version and to find the cause of the failure, because you are testing just a few changes, not “a big batch of changes”.
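As a rough illustration of the points above, a release policy can be thought of as a small piece of data plus a check against observed metrics. The sketch below is not Vamp's policy format: the `ReleasePolicy` class, its field names, the metric keys and the thresholds are all hypothetical, chosen only to mirror the kinds of information listed above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical release policy; every field name here is an assumption."""
    traffic_percent: float        # share of real user traffic sent to the new version
    duration_seconds: int         # how long the release test runs
    max_error_rate: float         # health metric: highest acceptable error rate
    max_response_time_ms: float   # health metric: highest acceptable response time
    min_conversion_rate: float    # business metric: lowest acceptable conversion rate


def failed_conditions(policy: ReleasePolicy, observed: Dict[str, float]) -> List[str]:
    """Return the names of the conditions that the observed metrics violate."""
    failures = []
    if observed["error_rate"] > policy.max_error_rate:
        failures.append("error_rate")
    if observed["response_time_ms"] > policy.max_response_time_ms:
        failures.append("response_time_ms")
    if observed["conversion_rate"] < policy.min_conversion_rate:
        failures.append("conversion_rate")
    return failures


# Illustrative values only: 5% of traffic for two minutes.
policy = ReleasePolicy(
    traffic_percent=5.0,
    duration_seconds=120,
    max_error_rate=0.01,
    max_response_time_ms=300.0,
    min_conversion_rate=0.02,
)
```

An empty list from `failed_conditions` means the release test passed; a non-empty list names the conditions that should trigger a rollback, which also narrows down where to look for the cause.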
What steps to take, what pitfalls to avoid and how to automate rollback
- Whether you are releasing a monolithic application or a collection of microservices, don’t release all at once.
- Instead, create a release process that enables incremental releasing to production using a methodology such as canary releasing.
- Codify the conditions for your release into a release policy: the release policy outlines the steps and conditions for the release test to be successful, as well as an on-failure procedure if something goes wrong. Testing based on a release policy also allows you to automate a rollback if something goes wrong, or a scale-up of traffic if your release is successful.
- Pitfall to avoid: if you have interdependent services, make sure you incrementally release the dependencies first and ensure that they work before releasing any customer-facing changes. Releasing customer-facing changes should also be done incrementally (one component or one service at a time), as specified in your release policy.
- You end up with a “Soft PNR”: taking an incremental approach to releasing means that the point-of-no-return (or rather point-of-no-control) is reduced to a one- or two-minute window, during which the policy exposes your new code to a small subset of your users and monitors the response.
- So, the first point at which you can roll back is the end of the “canary’s” execution window, and the rollback is automated as the next step in the policy if one of your conditions isn’t met.
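The steps above can be sketched as a small loop: shift a slice of traffic to the new version, check the policy conditions at the end of the canary window, then either scale up or roll back. This is a minimal sketch under stated assumptions, not how Vamp implements it. The names `run_canary`, `set_traffic` and `is_healthy` are hypothetical, and both callbacks stand in for whatever your router and monitoring system actually provide.

```python
from typing import Callable, Sequence


def run_canary(
    steps: Sequence[int],
    set_traffic: Callable[[int], None],
    is_healthy: Callable[[int], bool],
) -> str:
    """Walk through traffic steps; roll back on the first failed check.

    steps       -- traffic percentages for the new version, e.g. [5, 25, 100]
    set_traffic -- hypothetical callback that routes N% of traffic to the new version
    is_healthy  -- hypothetical callback that waits out the canary window at N%
                   and reports whether all policy conditions were met
    """
    for percent in steps:
        set_traffic(percent)
        if not is_healthy(percent):
            # On-failure procedure: send all traffic back to the previous version.
            set_traffic(0)
            return "rolled_back"
    return "promoted"


# Usage with stand-in callbacks: the release is healthy until it reaches 100%.
history = []
result = run_canary([5, 25, 100], history.append, lambda percent: percent < 100)
# result is "rolled_back" and history is [5, 25, 100, 0]
```

Because the health check runs after every step, the window without control is limited to one canary window at a time, and the rollback needs no manual decision.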
If you would like to read more about how release policies work
A policy-based approach to release management makes releasing repeatable, scalable and safe. That’s why we wrote our Release with Confidence: How to Perform a Policy-Based Canary Release in 5 Steps whitepaper. In it, you will learn how this approach can inject more confidence into your release process.
Vamp Intelligent Release Orchestration
We know what it’s like to work on painful, risky releases that bring stress to people and risk to business. Vamp was built to fight the fear of the point of no return and of Friday releases by providing a cloud-native, smart release orchestration solution that takes over release decisions for you and ensures your current software in production is always online. If you would like to learn how Vamp works, book a guided tour of the product with our CTO, Olaf Molenveld.