This is part five in a series on near-zero downtime deployments.
Last time we met to talk about deployments, we covered stored procedure changes. In this post, we will look at a number of basic changes around tables.
Creating a New Table
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Deploy new table. Deploy new procedures (optional). |
| Code release | Deploy calling code (optional). |
| Database post-release | |
Creating a table is the easy case when it comes to deployments. It’s a new table, so there won’t be any code dependencies. We want to deploy the table, then deploy any procedures which access this table, and finally deploy calling code in the code release. As an example, here is a re-runnable script which creates a table called dbo.ProductLink and includes indexes and foreign key constraints.
USE [Database]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF(OBJECT_ID('dbo.ProductLink') IS NULL)
BEGIN
CREATE TABLE [dbo].[ProductLink]
(
[LinkID] BIGINT NOT NULL,
[LinkTypeID] TINYINT NOT NULL,
[CustomerID] [int] NOT NULL,
[ProductID] [bigint] NOT NULL,
[CreateDateGMT] [datetime2](3) NOT NULL,
[LastModifiedGMT] [datetime2](3) NOT NULL,
CONSTRAINT [PK_ProductLink] PRIMARY KEY CLUSTERED
(
[LinkID] ASC,
[CustomerID] ASC,
[ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE) ON [Tables]
) ON [Tables]
CREATE NONCLUSTERED INDEX [IX_ProductLink_CustomerID_ProductID] ON [dbo].[ProductLink]
(
[CustomerID] ASC,
[ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE) ON [Indexes]
CREATE NONCLUSTERED INDEX [IX_ProductLink_LinkID_LinkTypeID] ON [dbo].[ProductLink]
(
LinkID,
LinkTypeID
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE) ON [Indexes]
ALTER TABLE dbo.ProductLink ADD CONSTRAINT [FK_ProductLink_LinkType]
FOREIGN KEY (LinkTypeID)
REFERENCES dbo.LinkType (LinkTypeID);
ALTER TABLE dbo.ProductLink ADD CONSTRAINT [FK_ProductLink_Link]
FOREIGN KEY (LinkID)
REFERENCES dbo.Link (LinkID);
END
GO
This table links to a couple of other tables, Link and LinkType, and could link to a Product table as well. If those are also new tables, they would need to be created before we can execute this script.
After deploying the table, we can deploy procedures to modify or query data in the ProductLink table. Then we can release code which uses this new table.
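For instance, a first procedure against the new table might be as simple as this re-runnable sketch. The procedure name is mine, not part of the schema above, and CREATE OR ALTER requires SQL Server 2016 SP1 or later.
-- Hypothetical re-runnable procedure which reads from the new table,
-- using the IX_ProductLink_CustomerID_ProductID index defined above.
CREATE OR ALTER PROCEDURE dbo.ProductLink_GetByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        pl.LinkID,
        pl.LinkTypeID,
        pl.CustomerID,
        pl.ProductID,
        pl.CreateDateGMT,
        pl.LastModifiedGMT
    FROM dbo.ProductLink pl
    WHERE
        pl.CustomerID = @CustomerID;
END
GO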
Adding a Column to an Existing Table
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Alter table. Deploy altered procedures (optional). Deploy versioned procedures (optional). |
| Code release | Deploy calling code (optional). |
| Database post-release | Deprecate old versions of procedures (optional). |
Adding a new column to an existing table is easy if you meet one of two conditions:
- The new column is nullable.
- The new column is non-nullable and has a default constraint.
The nice thing about default constraints is that they work as metadata operations: adding a non-nullable column with a default constraint to an existing table does not require writing the default value to every row on every page. Let’s say I have a table called dbo.PredictionTest. This table has three columns:
CREATE TABLE [dbo].[PredictionTest]
(
[PredictionID] [bigint] IDENTITY(1,1) NOT NULL,
[PredictedQuantitySold] [decimal](8, 3) NOT NULL,
[ActualQuantitySold] [decimal](8, 3) NULL
) ON [Tables]
GO
I want to add a new column called IsCloseEnough, which is a bit field describing whether the predicted quantity was close enough to the actual quantity. If I simply want to make this a nullable column, the re-runnable script could look like:
IF NOT EXISTS
(
SELECT 1
FROM sys.columns sc
INNER JOIN sys.Tables st
ON sc.object_id = st.object_id
WHERE
st.name = N'PredictionTest'
AND sc.name = N'IsCloseEnough'
)
BEGIN
ALTER TABLE dbo.PredictionTest ADD IsCloseEnough BIT NULL;
END
GO
This is your quick reminder that “near-zero” still means we need locks. Here’s a quick demo. In session 99, I start a transaction to insert a new row into PredictionTest:
BEGIN TRANSACTION
INSERT INTO dbo.PredictionTest
(
PredictedQuantitySold,
ActualQuantitySold
)
VALUES
(
500,
495
);
Then, in session 86, I try running the alter statement. Finally, in session 114, I try to insert another row. Here’s what the blocking chain looks like with sp_whoisactive:

Notice that 86, our script to add the new column, wants a LCK_M_SCH_M lock. This is a schema modification lock. This lock type is incompatible with LCK_M_IX, which the insert operation needs and is why 86 is blocking session 114.
Eventually, I commit transaction 99.

We still have the blocking chain, and I picked up a friend: some automated task running during my demo also wanted to read from the table.
Now, in practice, this alter operation won’t take long assuming you commit your transaction and unless it’s on an extremely busy table, people typically won’t notice the milliseconds of locking.
I decided to roll back the table alteration so that I could continue with adding a column with a default constraint. To do this, we need to include the constraint as part of the ALTER TABLE statement. Otherwise, if you try to add a NOT NULL column without a default constraint, you will get a surprisingly helpful error:
Msg 4901, Level 16, State 1, Line 22
ALTER TABLE only allows columns to be added that can contain nulls, or have a DEFAULT definition specified, or the column being added is an identity or timestamp column, or alternatively if none of the previous conditions are satisfied the table must be empty to allow addition of this column. Column ‘IsCloseEnough’ cannot be added to non-empty table ‘PredictionTest’ because it does not satisfy these conditions.
With a constraint, the script now looks more like this:
IF NOT EXISTS
(
SELECT 1
FROM sys.columns sc
INNER JOIN sys.Tables st
ON sc.object_id = st.object_id
WHERE
st.name = N'PredictionTest'
AND sc.name = N'IsCloseEnough'
)
BEGIN
ALTER TABLE dbo.PredictionTest ADD IsCloseEnough BIT NOT NULL CONSTRAINT [DF_PredictionTest_IsCloseEnough] DEFAULT(0);
END
GO
Once you alter the table, you can push out new procedures which use the new column. These break down into procedures you need to version and procedures you do not need to version. For the long version, see the prior post in this series for the rules; here is a quick summary:
- If you do not intend to expose this new column in a procedure, you probably don’t need to make any changes at all.
- If you expose this column in a result set, you probably do not need to version the procedure unless you have a particularly brittle data access layer. Hopefully you don’t have that.
- If you use this new column in filters, joins, or the like without exposing the column in a result set, you can deploy the changed procedure without versioning it or changing code.
- If you add a new parameter to a stored procedure (a procedure for inserting or updating the data, for example), you probably will need to version the procedure; see the sketch after this list.
- If you use a table-valued parameter and the table type needs this new field as well, you will need to create a new version of the table type and then version the procedures which use the table type.
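As a rough sketch of that parameter case, here the original insert procedure stays in place for old callers and a new version accepts the new column. Both procedure names are made up for illustration; they aren't from earlier in the series.
-- Hypothetical versioned procedure which accepts the new IsCloseEnough column.
-- The original dbo.PredictionTest_Insert remains untouched until post-release cleanup.
CREATE OR ALTER PROCEDURE dbo.PredictionTest_Insert_v2
    @PredictedQuantitySold DECIMAL(8, 3),
    @ActualQuantitySold DECIMAL(8, 3),
    @IsCloseEnough BIT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.PredictionTest
    (
        PredictedQuantitySold,
        ActualQuantitySold,
        IsCloseEnough
    )
    VALUES
    (
        @PredictedQuantitySold,
        @ActualQuantitySold,
        @IsCloseEnough
    );
END
GO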
Once those changes are in place, deploy the code. Then, clean up by removing any deprecated procedures, table types, or other objects which are no longer in use.
Removing a Column from an Existing Table
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Deploy new versions of procedures. |
| Code release | Deploy calling code. |
| Database post-release | Deploy altered procedures. Drop old versions of procedures. Drop column. |
Removing a column is trickier than adding a column. There are even debates about whether you ever remove columns because removal is permanent. If you drop the column, you can’t roll that back later when you find out you really need it.
As a dedicated Deletionist (that is, the type of person who really enjoys deleting code), this does not deter me one bit. You have to make sure that the column really isn’t in use any longer but if that’s the case, I don’t see much value in keeping it around.
The flow here is a little different from adding a column. First, for procedures which you need to version (e.g., cases where the input parameters change), create the versioned procedures as part of the normal database release. Then, release code which eliminates all references to the to-be-removed column and uses the new versions of those procedures. After the code is deployed and we are in the database post-release phase, we can deploy procedures which have changed, drop the procedures which were versioned, and finally drop the column.
Before I move on, I should note that you might want to rebuild indexes (including the clustered index) on the table after you drop columns, especially if the dropped columns lived in LOB storage.
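If you do want that rebuild, something like the following works, assuming Enterprise Edition for the ONLINE option; run it during post-release or a regular maintenance window.
-- Rebuild every index on the table online to reclaim space left behind by
-- dropped columns (dropping a column does not reclaim its space on its own).
ALTER INDEX ALL ON dbo.PredictionTest
REBUILD WITH (ONLINE = ON, DATA_COMPRESSION = PAGE);
GO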
Changing Column Details
When I say “column details,” I mean one of a few things:
- Making a non-nullable column nullable or vice versa.
- Extending or shortening the maximum length of an (N)(VAR)CHAR column.
- Changing the column type on a table.
I’ll break these out into less significant and more significant changes.
Less Significant Changes
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Alter table. Deploy altered procedures (optional). Deploy versioned procedures (optional). |
| Code release | Deploy calling code (optional). |
| Database post-release | Deprecate old versions of procedures (optional). |
Less significant changes are ones where we can make the change in-place. An example of this is taking a VARCHAR(30) column and extending it to VARCHAR(40) because we find out we need the extra space. This is the type of change you can build in place because it’s effectively a metadata operation. Here are a couple examples of metadata operations:
-- Make a non-nullable column nullable.
BEGIN TRAN
ALTER TABLE dbo.PredictionTest ALTER COLUMN PredictedQuantitySold DECIMAL(8, 3) NULL;
ROLLBACK TRAN
-- Extending a VARCHAR column. SomeData started off as VARCHAR(30)
BEGIN TRAN
ALTER TABLE dbo.PredictionTest ALTER COLUMN SomeData VARCHAR(40) NULL;
ROLLBACK TRAN
Note that changing a DECIMAL(8,3) to a DECIMAL(10,3) or a TINYINT to an INT does modify the data and will cause blocking as the engine scans through all of the data. Likewise, switching a nullable column to non-nullable requires a scan of the data and causes blocking. If you want to make a change like one of those, you’re in the realm of More Significant Changes.
More Significant Changes
| Phase | Process |
| --- | --- |
| Database pre-release | Add new column. Create backfill process (and maybe a temporary index). Create trigger. |
| Database release | Deploy temporary versioned procedures (optional). |
| Code release | Deploy calling code (optional). |
| Database post-release | Deploy altered procedures (optional). Deploy final versioned procedures (optional). Drop old versions (optional). Drop temporary index. Drop trigger. Drop backfill process. Drop old column. Rename new column (optional). |
The realm of More Significant Changes is not where you often want to be. There’s a lot of scaffolding code you need to write. Basically, suppose you want to make a repair on the 5th story exterior of an 8-story building. You have a couple of options: the YOLO option is to kick everybody out of the building and have people rappel from the top of the building down to the 5th story to make their changes. They need all of the people out of the building because shut up it’s a strained analogy. This approach is kind of inconvenient: people have to stay out of your building until your people are done repairing the exterior. That’s blocking in the database world.
On the other side, you can build a bunch of scaffolding and attach it to the exterior of the building, perform your repairs, and tear down that scaffolding. While the scaffolding is up, people come and go like normal and don’t even think about it. As you tear the scaffolding down, you temporarily block the door for a moment as you’re finishing your work. This is much more convenient for end users and fits the “near-zero downtime” approach we’re taking.

If you look at that scaffolding and think “that’s a lot of waste material there and a lot of effort that will just go away as soon as the repair is done,” that’s technically correct but misses the broader point: we do extra work to make our changes happen safely and with a minimum of disturbance.
Strained analogy aside, let’s talk about the mechanics of this. Let’s say that we find out our ActualQuantitySold needs to be DECIMAL(10,3) instead of DECIMAL(8,3). I want to keep the name as ActualQuantitySold but we have a lot of rows and can’t afford to block for more than a few milliseconds.
Step 1: Add New Column
The first thing I need to do is add the new column, which I will call ActualQuantitySoldTemp. All procedures are currently pointing to ActualQuantitySold, which is a DECIMAL(8,3). If I’m okay with the final column being nullable, I can run a command like so:
ALTER TABLE dbo.PredictionTest ADD ActualQuantitySoldTemp DECIMAL(10, 3) NULL;
But I want this to be non-nullable, so I’m going to have to create a default. Here I’ll use a sentinel value (that is, a value that I as a developer know is a false, impossible value to reach and can use as a logic check):
ALTER TABLE dbo.PredictionTest ADD ActualQuantitySoldTemp DECIMAL(10, 3) NOT NULL CONSTRAINT [DF_PredictionTest_ActualQuantitySoldTemp] DEFAULT(-99);
With that column in play, I’m going to start my backfill.
Step 2: Create Backfill Process
Now that I have a temporary column, I need to create a process to update this table. For a small table, I might do the update as a simple UPDATE statement like:
UPDATE dbo.PredictionTest
SET ActualQuantitySoldTemp = ActualQuantitySold;
If I’m dealing with an exceptionally large table, though, I don’t want to update the entire thing in one go. Even if there’s no blocking because I’m using Read Committed Snapshot Isolation, I can use up a lot of transaction log space with a big enough update. For those big tables, I’m going to want to run a batching process.
There are two main approaches I tend to take when batching: one requires a bit of extra infrastructure and the other doesn’t. If your table has a monotonically increasing clustered index (e.g., an identity column or a column using a sequence), you can store the latest processed ID in a table and run batches against it. For example, let’s create a clustered primary key on dbo.PredictionTest and create a backfill table called dbo.BackfillMetadata.
ALTER TABLE dbo.PredictionTest ADD CONSTRAINT [PK_PredictionTest] PRIMARY KEY CLUSTERED (PredictionID);
CREATE TABLE dbo.BackfillMetadata
(
BackfillTable SYSNAME NOT NULL,
MaximumProcessedValue BIGINT NOT NULL,
CONSTRAINT [PK_BackfillMetadata] PRIMARY KEY CLUSTERED(BackfillTable)
);
Our batch update process is fairly simple, allowing us to control the number of operations and number of records per batch. You could certainly make this more complex or feature-rich; this is just a quick demonstration of approach.
DECLARE
@LatestBackfilledID BIGINT,
@BackfillTable SYSNAME = N'PredictionTest',
@BatchSize INT = 1000,
@NumberOfBatches INT = 5;
SET NOCOUNT ON;
SELECT
@LatestBackfilledID = MaximumProcessedValue
FROM dbo.BackfillMetadata bm
WHERE
bm.BackfillTable = @BackfillTable;
IF (@LatestBackfilledID IS NULL)
BEGIN
-- If you start your identity columns at MIN_VALUE, this would change.
SET @LatestBackfilledID = ISNULL(@LatestBackfilledID, 0);
INSERT INTO dbo.BackfillMetadata
(
BackfillTable,
MaximumProcessedValue
)
VALUES
(
@BackfillTable,
@LatestBackfilledID
);
END
DECLARE
@i INT = 0,
@LoadThroughID BIGINT,
@RowsAffected INT = 0;
WHILE (@i < @NumberOfBatches)
BEGIN;
WITH records AS
(
SELECT TOP (@BatchSize)
PredictionID
FROM dbo.PredictionTest pt
WHERE
pt.PredictionID > @LatestBackfilledID
ORDER BY
pt.PredictionID ASC
)
SELECT
@LoadThroughID = MAX(PredictionID)
FROM records;
IF (@LoadThroughID IS NULL)
-- We have finished processing.
BREAK;
UPDATE dbo.PredictionTest
SET
ActualQuantitySoldTemp = ActualQuantitySold
WHERE
PredictionID > @LatestBackfilledID
AND PredictionID <= @LoadThroughID;
SET @RowsAffected = @@ROWCOUNT;
RAISERROR ('%i rows updated.', 10, 1, @RowsAffected) WITH NOWAIT;
SET @i = @i + 1;
SET @LatestBackfilledID = @LoadThroughID;
END
-- Use @LatestBackfilledID rather than @LoadThroughID here: if the loop broke out
-- because there was nothing left to process, @LoadThroughID is NULL.
UPDATE dbo.BackfillMetadata
SET
MaximumProcessedValue = @LatestBackfilledID
WHERE
BackfillTable = @BackfillTable;
GO
If you do a lot of backfills and typically use identity columns as your clustered indexes, this approach has a lot going for it. It’s going to be efficient and as long as you put in error handling (for example, so that you don’t update the maximum processed value to NULL in the event of an error updating the table), you’ll be fine.
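As a minimal sketch of that kind of error handling, you could swap the batch UPDATE inside the WHILE loop above for a TRY/CATCH wrapper like the one below. If a batch fails, the script stops before saving progress for that batch; the metadata row keeps its last saved value, so a later run simply redoes the unsaved work.
BEGIN TRY
    UPDATE dbo.PredictionTest
    SET
        ActualQuantitySoldTemp = ActualQuantitySold
    WHERE
        PredictionID > @LatestBackfilledID
        AND PredictionID <= @LoadThroughID;
END TRY
BEGIN CATCH
    -- Leave dbo.BackfillMetadata untouched so the next run retries this range,
    -- then surface the error to whatever is running the backfill.
    THROW;
END CATCH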
Supposing you don’t have a nice clustered key to use, you can simply keep updating until you’ve handled all missing records (here I went with the IS NULL variant, but it could also check for my sentinel value of -99 if I go down the non-nullable path):
UPDATE TOP(1000) dbo.PredictionTest
SET
ActualQuantitySoldTemp = ActualQuantitySold
WHERE
ActualQuantitySoldTemp IS NULL;
Here’s the problem: that operation gets progressively worse and worse. Here’s what the execution plan looks like:

The first time you run it, it probably performs pretty well. We can see why on the execution plan details:

We read 2000 rows and update 1000 rows. We needed to read 2000 rows because I had previously updated the first thousand (as a test; I actually test my blog post code…sometimes…). The next time we run this update, we’ll read 3000 rows, and so on. Here is the result after running the update statement 13 times:

On a multi-billion row table, your dev testing probably looks just fine (because dev has 1500 rows total) and it even looks great in production for the first hour or two, depending upon how aggressive you are in updating. But once you start getting into millions of rows, the number of I/O operations will increase steadily until you’re reading past 9,999,000 rows to get to the next thousand.
This is what you might call an anti-scaling pattern. Sometimes, people see this bad performance and say, “Hey, I’ll add an index!” So let’s add an index:
CREATE NONCLUSTERED INDEX [IX_PredictionTest_ActualQuantitySoldTemp] ON dbo.PredictionTest
(
ActualQuantitySoldTemp,
ActualQuantitySold
) WITH(DATA_COMPRESSION = PAGE, ONLINE = ON) ON [Indexes];
That solved our problem:

Oh, wait, that was sarcasm. It’s been hard to tell since I broke my sarcasm detector. This index didn’t help. It made things worse. So let’s drop that stupid index and use a smarter index:
CREATE NONCLUSTERED INDEX [IX_PredictionTest_ActualQuantitySoldTempFiltered] ON dbo.PredictionTest
(
ActualQuantitySold,
ActualQuantitySoldTemp
)
WHERE
(
ActualQuantitySoldTemp IS NULL
) WITH(DATA_COMPRESSION = PAGE, ONLINE = ON) ON [Indexes];
But to use this filtered index, we’re going to need to rewrite our update statement; otherwise, it won’t use the filtered index and will just use the clustered index.
WITH records AS
(
SELECT TOP(1000)
pt.PredictionID,
pt.ActualQuantitySold,
pt.ActualQuantitySoldTemp
FROM dbo.PredictionTest pt WITH(INDEX([IX_PredictionTest_ActualQuantitySoldTempFiltered]))
WHERE
pt.ActualQuantitySoldTemp IS NULL
)
UPDATE records
SET
ActualQuantitySoldTemp = ActualQuantitySold
WHERE 1=1;
The good news from introducing all of this added complexity is that at least the query plan is stable. Here’s what it looks like after 15 executions:

Now we read 1000 rows to update 1000 rows, writing each update twice: once to the clustered index and once to the temporary filtered index. This makes it more expensive up front than the anti-scaling approach, but as you get into the millions of rows, it will perform much better. It is still not as good as the clustered index update, however, so use this approach only if you need it.
Step 3: Create a Trigger (Optional but Recommended)
During our backfill, people are still writing data to the table. Remember: we’re trying to act in a way which disrupts our customers the least. When we flip the switch on this column, we need the new column to have all of the data. The best way to ensure this is to create a trigger which sets ActualQuantitySoldTemp to ActualQuantitySold whenever those two values differ. That gives you an end cap for your backfill: you need to backfill up to the point when the trigger went out, but nothing past it. If you have a relatively “stupid” backfill like my clustered index version above, it won’t hurt for the backfill to keep running past the point when we added the trigger, though it is technically some wasted processing power. If you want to build a smarter backfill strategy which knows when to stop, go for it.
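Here is a minimal sketch of what such a trigger might look like, assuming the nullable-column path for ActualQuantitySoldTemp (the sentinel-value path would compare against -99 instead). I’m reusing the trigger name that gets dropped later in the post-release step.
-- Keep the new column in sync for any rows written after the trigger goes out;
-- the backfill takes care of everything before that point.
CREATE OR ALTER TRIGGER [dbo].[tr_PredictionTest_Backfill_IU]
ON dbo.PredictionTest
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE pt
    SET
        ActualQuantitySoldTemp = pt.ActualQuantitySold
    FROM dbo.PredictionTest pt
        INNER JOIN inserted i
            ON pt.PredictionID = i.PredictionID
    WHERE
        -- Null-safe "values differ" check.
        pt.ActualQuantitySoldTemp <> pt.ActualQuantitySold
        OR (pt.ActualQuantitySoldTemp IS NULL AND pt.ActualQuantitySold IS NOT NULL)
        OR (pt.ActualQuantitySoldTemp IS NOT NULL AND pt.ActualQuantitySold IS NULL);
END
GO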
We need to pause here until the backfill is complete. Once the backfill is complete, we can leave the pre-release phase and move into the database release phase.
Step 4: Release the Hounds
Here, I’m combining the next couple of phases together. You’ll want to deploy new versions of procedures which take in ActualQuantitySold as DECIMAL(10,3) rather than DECIMAL(8,3). These new versions are temporary, though: they need to write to both ActualQuantitySold and ActualQuantitySoldTemp (or you could write only to the former and let the trigger continue to handle the data move). Basically, this version of the procedure exists to enable the next phase: deploying code which expects the new type. Until you’ve completed the type change, your application code should still limit itself to the more restrictive of the two data types (DECIMAL(8,3) in our case).
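As a rough sketch of one of those temporary versions (the procedure name is made up for illustration), the new procedure accepts the wider type and writes to both columns:
-- Hypothetical temporary version: accepts DECIMAL(10,3) and writes both columns.
CREATE OR ALTER PROCEDURE dbo.PredictionTest_SetActualQuantitySold_v2
    @PredictionID BIGINT,
    @ActualQuantitySold DECIMAL(10, 3)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.PredictionTest
    SET
        ActualQuantitySold = @ActualQuantitySold,     -- still DECIMAL(8,3) for now
        ActualQuantitySoldTemp = @ActualQuantitySold  -- the new DECIMAL(10,3) column
    WHERE
        PredictionID = @PredictionID;
END
GO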
Step 5: The Great Switcharoo
Now what we have is a fully-finished ActualQuantitySoldTemp, but I want it to be called ActualQuantitySold. We are officially in the database post-release phase and we’ve got a lot of work yet to do. Here are the steps in order:
Step 5a: Deploy Altered Procedures
First, we need to deploy any procedure alterations which did not require versioning the procedure. An example of this is a procedure which loads values from dbo.PredictionTest into a temp table and does its processing from that table. We can simply update the temp table’s definition and move on with our lives.
Step 5b: Deploy Final Versioned Procedures
In the database deployment phase, we pushed out temporary versions of versioned procedures. Now we can see them in their final form!
What I mean by this is that we can drop all references to ActualQuantitySoldTemp and reference only ActualQuantitySold. For a few milliseconds this won’t be quite right, but we’ll survive, yes we will.
Step 5c: Drop Temporary Index and Backfill Procedure
If you created a procedure to perform this backfill, you can drop it here. You can also drop any temporary indexes used in the process.
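Assuming the index name used above and a hypothetical dbo.PredictionTest_Backfill procedure (if you wrapped the batching script in one), the cleanup is just a pair of drops:
-- Drop the temporary filtered index and the backfill procedure (if any).
DROP INDEX IF EXISTS [IX_PredictionTest_ActualQuantitySoldTempFiltered] ON dbo.PredictionTest;
DROP PROCEDURE IF EXISTS dbo.PredictionTest_Backfill;
GO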
Step 5d: Drop Trigger, Drop Old Column, Rename New Column
That temporary trigger you created during the pre-release phase won’t be needed any longer. In a single transaction, we want to drop the old trigger, drop the old column, and rename the new column (if necessary).
BEGIN TRANSACTION
DROP TRIGGER [dbo].[tr_PredictionTest_Backfill_IU];
ALTER TABLE dbo.PredictionTest DROP COLUMN ActualQuantitySold;
EXEC sp_rename
'dbo.PredictionTest.ActualQuantitySoldTemp',
'ActualQuantitySold',
'COLUMN';
COMMIT TRANSACTION
This is a blocking step! During the transaction, you will block anybody trying to read or write. The good news is that dropping triggers, dropping columns, and renaming columns are all fast operations so it shouldn’t block for more than a few milliseconds once your transaction is able to begin (that is, once the writer queue ahead of it disappears).
Step 5e: Rebuild Indexes
If you make big enough changes (like turning a VARCHAR(1500) into a VARCHAR(MAX)), you might want to rebuild indexes afterward to ensure that you re-pack any gaps in the indexes. This is something you might do during a regular maintenance period; it doesn’t necessarily have to happen right after these other steps.
Step 5f: Optionally Rename Default Constraint
Recall that if we wanted the expanded ActualQuantitySold column to be non-nullable, we needed to add a default constraint. Here I would like to change the default constraint’s name to match its new column, and doing this is just another sp_rename
call:
EXEC sys.sp_rename
@objname = N'DF_PredictionTest_ActualQuantitySoldTemp',
@newname = N'DF_PredictionTest_ActualQuantitySold',
@objtype = N'OBJECT';
Whew. Told ya’ you didn’t want to go down this path too often.
Renaming a Table
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Rename table. Create temporary view. Deploy altered procedures. |
| Code release | |
| Database post-release | Drop temporary view. |
Let’s say you have a table named dbo.tblTableTibble and it’s a really stupid name that you’d like to change to dbo.Customer because it actually stores customer data and tibbling is annoying beyond all get-out.
If you followed the handy-dandy assumptions and have all of your calling code (including application code, SQL Agent jobs, and the like) go through stored procedures, this is a really easy process. Let’s go through an example with dbo.PredictionTest and rename it to dbo.TestPrediction.
First, we call sp_rename to rename the table. In the same transaction, I’m going to create a new view which matches the schema of the old table. But here’s the trick: CREATE VIEW must be the first statement in a batch, so you cannot run a rename operation and then a CREATE VIEW statement together. Unless we change the context, that is, and execute the view creation statement as dynamic SQL; then it’s in its own batch and we are fine.
BEGIN TRANSACTION
EXEC sp_rename
@objname = N'dbo.PredictionTest',
@newname = 'TestPrediction',
@objtype = NULL;
EXEC (N'CREATE VIEW dbo.PredictionTest AS
SELECT
pt.PredictionID,
pt.PredictedQuantitySold,
pt.ActualQuantitySold,
pt.SomeData,
pt.SomeInt,
pt.ActualQuantitySoldTemp
FROM dbo.TestPrediction pt;');
COMMIT TRANSACTION
This is a blocking step! It should only block for a very short amount of time but watch out if that table is extremely busy.
Creating the view means that we don’t need to update any procedure calls. Because it’s a trivial view, updates and deletes will pass through.
After we create the view, we can update any existing stored procedures which call dbo.PredictionTest and have them point to dbo.TestPrediction. This should probably be a simple find-and-replace in your source control system. Alternatively, if you know all of the references are in a single database, you can search sys.sql_modules to find references like so:
SELECT
OBJECT_SCHEMA_NAME(sm.object_id) AS SchemaName,
OBJECT_NAME(sm.object_id) AS ObjectName,
CONCAT(OBJECT_SCHEMA_NAME(sm.object_id), '.', OBJECT_NAME(sm.object_id)) AS FullName,
CONCAT(OBJECT_SCHEMA_NAME(sm.object_id), '.', OBJECT_NAME(sm.object_id), ',') AS CommaSeparatedName,
definition
FROM sys.sql_modules sm
WHERE
sm.definition LIKE '%PredictionTest%'
--AND OBJECT_SCHEMA_NAME(sm.object_id) = 'Something'
--AND OBJECT_NAME(sm.object_id) = 'Something'
ORDER BY
SchemaName,
ObjectName;
Once you’re sure you have everything, we can skip the code phase (because code doesn’t care about table names; it’s going through stored procedure interfaces) and in the post-release phase, drop the temporary view.
DROP VIEW dbo.PredictionTest;
Dropping a Table
| Phase | Process |
| --- | --- |
| Database pre-release | |
| Database release | Rename table. |
| Code release | |
| FUTURE database post-release | Drop table. |
If you have an obsolete table and you are sure there are no code or procedure references to it, it can be safe to drop. My advice is to rename the table to something like dbo.PredictionTest_DELETEMEAFTER20190501. This tells other database developers that you understand this table should go away, but you’re giving it until May 1st of 2019 before you pull the plug on it. I typically try to give 3-4 weeks between renaming and deleting.
The biggest benefit here is that sometimes you don’t realize how a table is used until some process which only runs once a month breaks or some weird application a long-gone intern wrote a decade ago makes direct database calls and depends on that table. It’s a lot easier to rename a table back than it is to restore a table from a backup.
At some point, you’re reasonably confident that this table really doesn’t get used, and in a future database post-release, you can write the DROP TABLE script.
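Using dbo.PredictionTest as the example and the date above, the whole lifecycle is two tiny scripts; treat this as a sketch of the pattern rather than anything schema-specific.
-- Database release: rename the table, putting it on notice.
EXEC sp_rename
    @objname = N'dbo.PredictionTest',
    @newname = N'PredictionTest_DELETEMEAFTER20190501';
GO
-- FUTURE database post-release, once you are confident nothing still needs it:
DROP TABLE IF EXISTS dbo.PredictionTest_DELETEMEAFTER20190501;
GO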
Creating or Modifying a Non-Clustered Index
| Phase | Process |
| --- | --- |
| Database pre-release | Create index. |
| Database release | |
| Code release | |
| Database post-release | Drop old index (optional). |
When working with indexes, I’m assuming that you have Enterprise Edition. If you are running Standard Edition, you don’t get the benefit of online index creation, which means any index modifications you make are liable to cause blocking and hence downtime. Also, I’m assuming that this is an existing table with code references. If it’s a brand new table, you can simply include the index in the database release phase and be done with it.
Creating indexes should happen in the database pre-release phase, especially for larger tables with a lot of rows. You want to keep the database release and code release phases as short as possible to reduce the amount of exposure you have to weird interactions between different versions of code and database objects, so if you can complete all of your index operations before the database release happens, life is better.
If you are replacing an existing index, you have a couple of options: you can create the index with DROP_EXISTING = ON, or you can create an index with a new name and drop the old one during the database post-release phase. My personal preference is that if I’m changing the key columns on the index (or if the index has a bad name like [index] or [dta_##_####_###_##_##_#_###_###]), I’ll create an index with a new name.
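As a sketch of the new-name route (both index names here are made up for illustration), you create the replacement online during the pre-release and drop the old index in the post-release:
-- Database pre-release: create the replacement index online under a sensible name.
CREATE NONCLUSTERED INDEX [IX_PredictionTest_ActualQuantitySold] ON dbo.PredictionTest
(
    ActualQuantitySold
)
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON) ON [Indexes];
GO
-- Database post-release: drop the badly-named index it replaces.
DROP INDEX IF EXISTS [dta_index_PredictionTest_42] ON dbo.PredictionTest;
GO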
One warning is that if you are running SQL Server 2017 and have a clustered columnstore index, any non-clustered indexes must be created offline. It’s very important to get columnstore tables right the first time and limit changes on them.
Other than the warning about columnstore, working with nonclustered indexes is pretty easy.
Creating a New Clustered Index
| Phase | Process |
| --- | --- |
| Database pre-release | Create index. |
| Database release | |
| Code release | |
| Database post-release | |
If you have a table lacking a clustered index, you can create one online. Let’s walk through an example of this by first dropping the existing primary key and clustered index on dbo.PredictionTest:
ALTER TABLE dbo.PredictionTest DROP CONSTRAINT [PK_PredictionTest];
Now the prediction test table is a heap. If I want to create a new clustered index (not primary key; I’ll cover constraints later in the series), I can create one like so:
CREATE UNIQUE CLUSTERED INDEX [CIX_PredictionTest] ON dbo.PredictionTest
(
PredictionID
)
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
Locking gets just a little tricky here. Other sessions can select from the table while this online operation is occurring without an issue. Once the index is complete, the database engine needs to take a lock. During that phase, readers will be blocked with a schema stability lock (LCK_M_SCH_S) until the clustered index creation transaction commits.
Writers will block the start of clustered index creation with a LCK_M_S wait. Once the database engine begins creating the clustered index online, writers do not block operation, but if there are any writers still working once the clustered index is complete and ready to tag in, the session creating this clustered index will wait with a schema stability lock until all writers have finished their work. Once the clustered index has a chance to sneak in, it only needs a moment to wrap up its work.
Changing an Existing Clustered Index
| Phase | Process |
| --- | --- |
| Database pre-release | Drop index (if necessary). Create index (preferably with DROP_EXISTING). |
| Database release | |
| Code release | |
| Database post-release | |
Let’s say that you have a clustered index now but want to change it. I’d generally recommend using DROP_EXISTING = ON for clustered index modifications, for the reasons laid out in Microsoft Docs. Namely, DROP_EXISTING doesn’t do as much work as dropping the clustered index and adding it again: with DROP_EXISTING, the database engine rebuilds non-clustered indexes only once, whereas if you drop and re-create the clustered index, non-clustered indexes get rebuilt twice. Also, suppose that you decide to drop and create a clustered index for some reason. During the time period after the first clustered index gets dropped and before the second clustered index completely replaces it, performance might suffer. Degraded performance isn’t the end of the world, but it is something we can mitigate.
Here is a sample create statement with DROP_EXISTING on:
CREATE UNIQUE CLUSTERED INDEX [CIX_PredictionTest] ON dbo.PredictionTest
(
PredictionID
)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE, ONLINE = ON);
Conclusion
In this post, we spent a lot of time on handling table changes. Some of these are trivial and others require a lot of foresight and planning. The long process of changing a data type is one of the reasons I can’t get entirely behind automated database build tools (like database projects in Visual Studio): I can write a deployment which leaves my end users happy, but those tools will either fail or try to create a new table and perform a fill-and-swap which can itself be trouble if the table is in use during this process.