High software delivery performance unlocks competitive advantages, so it's no wonder that the business is always pushing for shorter release cycles and more frequent releases. But increasing software delivery performance is not as simple as it looks. Teams working on optimizing software delivery often come to fear actually releasing software, letting a massive queue of changes stack up in each release. Instead of fixing the root cause behind that fear, organizations often fall into the ‘enterprise release trap’: the belief that a slow throughput of new releases is a way of ensuring stability in the production environment. It is not. In this blog post, we look at the key metrics that define software delivery performance and give practical tips on how to balance throughput and stability in your release process.
The status quo in release management: the throughput vs. stability trade-off
While many organizations are updating software delivery with CI/CD tooling and agile processes, the process of releasing software to users in production is often still done in a classic “big bang” release. Organizations take a cautious approach to releasing software by releasing infrequently and trying to mitigate risk by extending testing and quality assurance cycles.
But this cautious approach to throughput doesn’t result in increased stability. When a release bundles many changes and pushes them to the entire customer base at once, the chances that something will go wrong are higher, the impact on end users is greater, and the specific cause is harder to trace. That makes the mean time to restore service after an outage longer. Faced with the prospect of long hours of their support team troubleshooting in the dark while their system suffers downtime, it’s not surprising that organizations get nervous and build even more contingencies into pre-production testing. This creates a vicious circle, with the result that the business can’t innovate fast enough.
Throughput and stability are not mutually exclusive for the release process
To show you how throughput and stability are not mutually exclusive, we have to crack open software delivery into four key measures of performance:
1. Deployment frequency: how often is code deployed?
2. Lead time for changes: how long does it take to go from code commit to code running in production?
3. Mean time to restore (MTTR): the time needed to restore service after an incident.
4. Change fail rate: the percentage of changes that result in degraded or impaired service.
You can see that software delivery performance is really about excelling in all these areas. You want to release often and quickly, roll back any issues immediately, and determine contributing factors and root causes of failures quickly.
But how do we get there? How do we increase deployment frequency and shorten lead time, while decreasing MTTR and the change fail rate?
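To make the four measures concrete, here is a minimal sketch of how they could be computed from a simple deployment log. The record fields, values, and observation window are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: commit time, deploy time, whether the
# change degraded service, and how long restoring service took (if it did).
deploys = [
    {"committed": datetime(2021, 6, 1, 9), "deployed": datetime(2021, 6, 1, 11),
     "failed": False, "restore_minutes": 0},
    {"committed": datetime(2021, 6, 2, 10), "deployed": datetime(2021, 6, 2, 13),
     "failed": True, "restore_minutes": 45},
    {"committed": datetime(2021, 6, 3, 9), "deployed": datetime(2021, 6, 3, 10),
     "failed": False, "restore_minutes": 0},
]
days_observed = 3  # assumed observation window

# 1. Deployment frequency: deploys per day over the window.
deployment_frequency = len(deploys) / days_observed

# 2. Lead time for changes: average commit-to-production time.
lead_times = [d["deployed"] - d["committed"] for d in deploys]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# 3. Mean time to restore: average restore time over failed changes.
failures = [d for d in deploys if d["failed"]]
mttr_minutes = sum(d["restore_minutes"] for d in failures) / len(failures)

# 4. Change fail rate: share of deploys that degraded service.
change_fail_rate = len(failures) / len(deploys)

print(deployment_frequency, avg_lead_time, mttr_minutes, change_fail_rate)
```

In practice these numbers would come from your CI/CD tooling and incident tracker rather than a hand-written list, but the definitions stay the same.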
Why incremental releasing matters in the release management process
The solution, on paper, is simple: don’t optimize for stability alone, but strike a balance between throughput and stability. Instead of keeping to the conventional approach of large-batch releases, build confidence in the release process by taking an iterative approach. How do you do this?
1. Make each release small
Plan releases so that each one contains fewer changes. That way, if something goes wrong, it’s easier to determine exactly what, and to troubleshoot efficiently.
2. Release to a small subset of users in production, then roll out incrementally
Even a small release can cause catastrophic issues. So, to protect the end-user experience, expose your new version as a test to a small subset of live production traffic (this is called canary releasing). Structuring the release as a test allows you to continuously check its health and scale up user traffic in a series of controlled steps.
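The controlled-steps idea can be sketched in a few lines. This is an illustration only, not Vamp’s implementation: the traffic percentages, health check, and error-rate threshold are all assumptions, and a real health check would query live service metrics:

```python
def healthy(error_rate_threshold=0.02):
    """Stand-in health check; in practice this would query real metrics
    (error rate, latency) observed for the canary version."""
    observed_error_rate = 0.01  # pretend the canary is behaving well
    return observed_error_rate < error_rate_threshold

def canary_rollout(steps=(1, 5, 25, 50, 100)):
    """Scale canary traffic up in controlled steps, checking health
    before each increase; abort and roll back on failure."""
    for percent in steps:
        if not healthy():
            return ("rolled_back", percent)
        # here you would route `percent`% of live traffic to the new version
    return ("released", 100)

print(canary_rollout())
```

The key point is structural: because each step is a test with an explicit pass/fail condition, scaling up is a decision the process can make for you.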
3. Automate toil and your rollback strategy
Make your release test conditions reusable across teams by codifying them into rules, or release policies. Add on-failure procedures to the rules in your policies. Policies lay the foundation for release automation.
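A codified policy can be as simple as a list of rules, each pairing a metric threshold with an on-failure procedure. The metric names, thresholds, and actions below are hypothetical examples, not a real policy schema:

```python
# A release policy: reusable rules, each pairing a metric threshold
# with an on-failure procedure, so every team applies the same
# test conditions to its releases.
policy = [
    {"metric": "error_rate",     "max": 0.02, "on_failure": "rollback"},
    {"metric": "p95_latency_ms", "max": 500,  "on_failure": "halt_rollout"},
]

def evaluate_policy(policy, observed):
    """Return the first triggered on-failure action, or None if all rules pass."""
    for rule in policy:
        if observed.get(rule["metric"], 0) > rule["max"]:
            return rule["on_failure"]
    return None

print(evaluate_policy(policy, {"error_rate": 0.05, "p95_latency_ms": 120}))   # rollback
print(evaluate_policy(policy, {"error_rate": 0.001, "p95_latency_ms": 120}))  # None
```

Because the rules are data rather than tribal knowledge, they can be shared across teams and wired directly into automation.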
4. Make your process smarter and safer with continuous validation
Use IT and business metrics to track the success of each release, even once it’s in production. Intelligent automation built around release policies with baked-in on-failure procedures keeps your system safe by constantly scanning your IT landscape and automatically taking action if something goes wrong, without the need for human intervention.
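Conceptually, continuous validation is a loop: keep checking observed metrics against the policy thresholds after release, and invoke the on-failure procedure automatically if anything degrades. The sketch below uses made-up metric samples and a simplified threshold check to show the shape of that loop:

```python
def within_limits(observed, thresholds):
    """True if every observed metric is within its policy threshold."""
    return all(observed[metric] <= limit for metric, limit in thresholds.items())

def validate_release(metric_stream, thresholds, on_failure):
    """Continuously validate a release against thresholds; run the
    on-failure procedure automatically if a sample breaches them."""
    for observed in metric_stream:
        if not within_limits(observed, thresholds):
            on_failure()          # e.g. trigger an automated rollback
            return "rolled_back"
    return "healthy"

# Simulated post-release metric samples; the third breaches the limit.
samples = [{"error_rate": 0.001}, {"error_rate": 0.005}, {"error_rate": 0.08}]
events = []
result = validate_release(samples, {"error_rate": 0.02},
                          on_failure=lambda: events.append("rollback"))
print(result, events)
```

In a production system the metric stream would come from monitoring tooling and the on-failure hook would execute a real rollback, but no human needs to be in the loop for either.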
Balancing throughput and stability to release with confidence – even on a Friday afternoon
You can see how, paradoxically, increasing the number of releases while decreasing the size of each one has a positive effect on both the throughput and the stability of releasing to production. Throughput and stability enable each other: as the number of releases increases, each release becomes smaller, simpler to put into production, and easier to troubleshoot if problems occur. That relieves the fear of releasing and injects more confidence into the release process. That’s our mission at Vamp. We know what it’s like to work on big, risky releases that result in hard-to-trace issues and labor-intensive manual rollback procedures. We thought there must be a better way, so we built it. Vamp is intelligent release orchestration that uses the above principles to take over release decisions for teams, across multiple teams and environments.
If you are interested in seeing how Vamp Cloud-Native Release Orchestration can help your organization innovate more frequently with near-100% reliability, even on a Friday afternoon, book a guided tour, with or without your own containers!