This is part two in a series on near-zero downtime deployments.
As we begin to approach zero-downtime database deployments, I’m going to lay out a few assumptions which will make your life much easier. Then, I will cover the idea of the release flow.
There are three major things which will greatly increase the likelihood of successful (near) zero-downtime deployments: source control, automated releases, and continuous integration. Let’s look at each in order, starting with source control.
You really want to have source control in place for a number of reasons. It’s not absolutely required, but it is quite helpful. Source control is one of your safety nets as you make changes: you have a history of what the code looked like before and have the possibility of reverting code in the event of failure. When it comes to minimizing downtime, a stable source control repository helps us put into place the other two key assumptions.
What should you use for source control? Pick whatever you want. Git is pretty much the standard these days, but if you want to use Mercurial, SVN, Team Foundation Version Control, or pretty much any other source control product, go for it. The advantage to Git is that it’s pretty easy to hook into a Git repo and automate processes, but at the end of the day, a source control system is primarily a place where you put code so that you don’t lose it.
A Release Process, Preferably Automated
The next assumption is that you have a release process in place as opposed to “developers push F5 to run scripts whenever they remember to.”
Like source control, that you have something is more important than what you have. Maybe you use Azure DevOps (nee Team Foundation Server). Maybe it’s Octopus Deploy or Jenkins or Cruise Control or something else. Maybe you wrote your own build process with Ant (or NAnt) or MSBuild or just a set of Powershell scripts. I’m not here to shame you into using a particular product; you do you. But you really need a release process. Ideally, this is an automated release process where it takes zero (or just a few) clicks to deploy code. If you have a person running scripts manually, that introduces a lot of uncertainty into the process. Did that person run all of the scripts? In the right order? At the right time? It also tends to be much slower than an automated process where you have the possibility of concurrency.
You also have choices about how your release scripts end up in the right place. There are a couple of alternatives here: automatically generate deployment scripts or hand-write deployment scripts. SQL Server Data Tools database projects, for example, can generate a diff file for schema changes. One warning with that, however, is that the auto-generated diff file will not necessarily avoid downtime. I find it really useful for relatively simple operations: creating and modifying stored procedures, adding new tables, that kind of thing. But my preference is to hand-write deployment scripts, as I’m less likely to do something like drop and recreate an index offline because the casing changed. That comes with the downside of being quite a bit more effort, though, so maybe a combination of the two is best.
Once you have a release process in place, you can get to the last key assumption.
This is the least important of the three but still plenty important in its own right. You should have a QA environment where your continuous integration process can keep deploying scripts. That way, you can make sure that as you develop during a sprint (or project or whatever you use), you can still deploy all of your code successfully. This means your deployment process has to be re-runnable.
Any of the processes above—even rolling your own Powershell scripts—can be used in a Continuous Integration scenario, but some of the tools are better-suited for it.
Now that we have these key assumptions in place, let’s talk about the flow of operations.
Going with the Flow
We have an environment with source control, an automated release process, and continuous integration. With that structure in mind, let’s talk about the timings of release as phases. There are four phases in total: pre-release, database release, code release, database post-release.
When we start pre-release, we have a stable system. We know what the code looks like and what the database objects look like. The pre-release process starts…well, whenever you want it to start. Unlike the rest of the phases, pre-release has no fixed starting point.
I like to use pre-release for three things:
- Scheduling things that will take a long time. For example, building a new index (online, to prevent blocking as much as I can) on a big table or migrating data from one table to another.
- Making changes which need to happen before the rest of the process. This might be a preparatory step like shutting off a back-end service temporarily in order to make some changes to it.
- Phase 1 of a very complex set of database changes where I need multiple steps. We’re going to see an example of this in my case study as part 8 in this series. Yeah, it’s a long series.
One of the key considerations here is that during the pre-release process, we typically do not want to degrade the customer experience. Keep those changes behind the scenes.
After pre-release is over, we can move to the database release phase.
The database release phase is the first “primary” phase. It usually starts on a schedule, maybe 2 PM on a Wednesday or maybe “every day at 9 AM, 1 PM, 6 PM, and 10 PM” for more mature shops. Depending upon how much of an effect our release process normally has on end users, we might alert them that we expect to see a degradation in services starting at this point.
This phase of the release has us push out our database changes. This can involve creating or altering database objects but will not involve dropping existing objects.
Our database changes should support the blue-green deployment model. At this point in the process, all of the application code is “blue”—that is, the current production code. Our procedure changes need to be able to support that code without breaking. If we need to drop a column from a stored procedure, for example, we would not want to do it here. If we need to add a column to a stored procedure, we might do it here as long as it doesn’t break the calling code.
Once we’ve taken care of these non-disruptive database changes, we can move to the next phase of the process: the application code release.
Application Code Release
Here is where we get to the classic stateless blue-green deployment scenario. Ideally, we have multiple application servers running or we can spin up new ones easily (like in AWS or Azure). For the case of oversimplification, let’s say that we have 4 on-prem application servers. When we start out, all of our servers have the blues.
We take two of the four servers out of our load balancer, so they become grey.
While they’re offline, we load the new code on those servers and bring them back into the load balancer. They’re running new code, which we represent as green.
At this point, we have both sets of code running concurrently: some users are hitting old code and others are hitting new code. Our database needs to support both versions of the code for long enough for us to know that this release will be successful.
Once we’re confident that the new code is working as expected, we bring the remaining blue servers down.
We then load the new code on our remaining servers and re-enlist them into the load balancer.
Once this happens, the customer experience degradation should be over: users should now continue on, blissfully unaware that a bunch of stuff changed around them.
We aren’t out of the woods yet, though, because there’s still one more phase to go, and it’s my favorite.
The database post-release phase is where we get to do all that destructive stuff: dropping obsolete tables, dropping columns, dropping everything. Burn the house down.
At this point, we don’t have any code relying on old database objects, so they’re fair game. Now, as we go through this series, I’ll add some important nuance to the slightly overboard version above. But at its core, post-release is about cleaning up the old stuff.
Database post-release begins after the code release finishes and it goes on until you are completely done. This is also non-disruptive in the sense that we’re not running anything which should impede end users or block tables for an extended time.
In today’s post, we covered some of the key assumptions and process behind (near) zero-downtime deployments. With these in mind, we’ll spend the rest of the series working through different scenarios and seeing what we can and cannot do. Because that depends upon a few factors, our next post will look at ways we can make our lives simpler.