Testing, Part 1: The Basics

The first step in refactoring code, regardless of the language, is having repeatable, working tests.  This allows you to make changes to code safe in the knowledge that your changes do not break anything—if you have evidence to assert that your new code does what the old code did, you’ll find it much easier to improve your code base through refactoring.  Ideally, you’re working in an environment in which all code paths are covered by great unit and integration tests which run on a regular basis.  For those of us in the real world, however, that’s not generally the case, so we need to move in that direction by building out unit and integration tests.

Languages like C# and Java have full unit test support using tools like NUnit and JUnit.  There are also great methods for developing unit tests against legacy code in object-oriented languages:  extract-and-override, creating interfaces and using an isolation framework, etc.  The point of these methods is to simplify dependencies in your existing code, making it easier to design unit tests (which fake dependency actions) rather than integration tests (which require dependency actions).

In T-SQL, we’re kind of limited on what we can do to simplify dependencies.  Assuming that your code base is all in stored procedures—which isn’t always true, but I’m going to be optimistic here—think about stored procedure dependencies:  we have other stored procedures; underlying tables, views, and synonyms; table-valued and scalar functions; linked server queries; CLR methods; and much, much more.  Even if you have a very simplistic stored procedure which selects a few columns from a single table, extracting all dependencies is practically impossible.

CREATE PROCEDURE dbo.Tally_GetTopNValues
	@NumberOfRecords INT
AS
SELECT TOP (@NumberOfRecords)
	t.N
FROM dbo.Tally t
ORDER BY
	t.N;

In this scenario, how do we test the procedure Tally_GetTopNValues?  Our dependency here is on a table named Tally.  By dependency, I mean that the results we get back from this procedure depend upon something external to the procedure itself:  the contents of the table named Tally.  If you want to get crazy, you could re-write all of your stored procedures to use dynamic SQL with a sort-of-but-not-quite form of dependency injection, like the following:

CREATE PROCEDURE dbo.Tally_GetTopNValues_FakeDependencyInjection
	@NumberOfRecords INT,
	@TallySchema SYSNAME = 'dbo',
	@TallyTable SYSNAME = 'Tally'
AS

DECLARE
	@sql NVARCHAR(MAX) = CONCAT(N'
SELECT TOP(@NumberOfRecords)
	t.N
FROM ', QUOTENAME(@TallySchema), '.', QUOTENAME(@TallyTable), ' t
ORDER BY
	t.N;');

EXECUTE sys.sp_executesql
	@sql,
	N'@NumberOfRecords INT',
	@NumberOfRecords;

This is a clever idea.  Now, when I want to test, I can plug in my own table with known results.  That table might be a temp table set up with exactly the data I want, or another table which holds a known data set.
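As a minimal sketch of what such a test might look like, we could create a stand-in table with known contents and point the procedure at it.  The table name TallyTest here is hypothetical; any table with an N column would do.

```sql
/* Arrange:  build a stand-in table with known contents. */
CREATE TABLE dbo.TallyTest
(
	N INT NOT NULL
);
INSERT INTO dbo.TallyTest (N) VALUES (1), (2), (3);

/* Act:  point the procedure at the stand-in table instead of dbo.Tally. */
EXECUTE dbo.Tally_GetTopNValues_FakeDependencyInjection
	@NumberOfRecords = 2,
	@TallySchema = 'dbo',
	@TallyTable = 'TallyTest';

/* Because we control the contents of TallyTest, we know exactly
   which two rows should come back. */

DROP TABLE dbo.TallyTest;
```

Because the test controls every row in the stand-in table, the expected result set is fully known in advance, which is exactly what we lose when the procedure is hard-wired to a shared table.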

But if this is your solution, you now have a bigger problem.  Imagine maintaining this when you’re looking at dozens of tables, or imagine maintaining hundreds of these procedures.  It’s a neat idea in principle, but you lose several things with this approach.  Your code becomes harder to read due to a loss of syntax highlighting.  Your code becomes harder to maintain due to a loss of syntax checking.  You run the risk of parameter sniffing problems when generating these dynamic SQL statements, and even if you don’t hit parameter sniffing, there’s a small but non-zero cost to building the dynamic SQL string and executing it with sp_executesql.  Even if you’re willing to set aside those costs, this is more difficult and time-consuming to write, and it goes against the way most people develop SQL code.

So instead of trying to design some sort of dependency injection for SQL, let’s think some more about how we develop T-SQL and try to find ways to build testing in here.

Let’s say I’m responsible for building a query which accepts certain parameters { @x, @y, @z } and whose job is to return specific attributes { a1, a2, a3, b1, b2, b3, c1, c2, c3 } from specific entities { a, b, c }.  My first step as a T-SQL developer is to start from one of these tables. We can assume that we know that table a is the main table and that tables b and c provide lookup information which is helpful to users.

SELECT
	*
FROM a;

My results tell me that I can find relevant attributes { a1, a2, a3, keyb, keyc, x } on this table. So I start building up my next query:

SELECT
	a1,
	a2,
	a3,
	keyb,
	keyc,
	x
FROM a;

I can quickly tie in @x by declaring a variable (for test purposes) and filtering on x = @x.

DECLARE
	@x INT = 4;

SELECT
	a1,
	a2,
	a3,
	keyb,
	keyc,
	x
FROM a
WHERE
	x = ISNULL(@x, x);

Then, I want to use keyb to join together entities a and b. I know that b has attributes { b1, b2, b3, y } in addition to its primary key of { keyb }. Adding in this information gives us:

DECLARE
	@x INT = 4,
	@y INT = 6;

SELECT
	a.a1,
	a.a2,
	a.a3,
	b.b1,
	b.b2,
	b.b3,
	a.keyb,
	a.keyc,
	a.x,
	b.y
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
WHERE
	a.x = ISNULL(@x, a.x)
	AND b.y = ISNULL(@y, b.y);

After that, I want to use keyc to join together a and c and get attributes { c1, c2, c3, z }.

DECLARE
	@x INT = 4,
	@y INT = 6,
	@z INT = 8;

SELECT
	a.a1,
	a.a2,
	a.a3,
	b.b1,
	b.b2,
	b.b3,
	c.c1,
	c.c2,
	c.c3,
	a.keyb,
	a.keyc,
	a.x,
	b.y,
	c.z
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
	INNER JOIN c
		ON a.keyc = c.keyc
WHERE
	a.x = ISNULL(@x, a.x)
	AND b.y = ISNULL(@y, b.y)
	AND c.z = ISNULL(@z, c.z);

Cleaning up the results a bit, I have come to the final result set:

DECLARE
	@x INT = 4,
	@y INT = 6,
	@z INT = 8;

SELECT
	a.a1,
	a.a2,
	a.a3,
	b.b1,
	b.b2,
	b.b3,
	c.c1,
	c.c2,
	c.c3
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
	INNER JOIN c
		ON a.keyc = c.keyc
WHERE
	a.x = ISNULL(@x, a.x)
	AND b.y = ISNULL(@y, b.y)
	AND c.z = ISNULL(@z, c.z);

I can now turn this into a stored procedure by getting rid of the DECLARE and turning those into input variables. I may or may not keep the default values, but in this case, I’ll get rid of them.

CREATE PROCEDURE dbo.SomeProcedure
	@x INT,
	@y INT,
	@z INT
AS

SELECT
	a.a1,
	a.a2,
	a.a3,
	b.b1,
	b.b2,
	b.b3,
	c.c1,
	c.c2,
	c.c3
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
	INNER JOIN c
		ON a.keyc = c.keyc
WHERE
	a.x = ISNULL(@x, a.x)
	AND b.y = ISNULL(@y, b.y)
	AND c.z = ISNULL(@z, c.z);

Fantastic. So what does this have to do with testing?  The answer is:  everything!  In the process of building up this query, we iteratively tested several parts of the system:  attributes available on each entity, key structures, and filtering logic were all covered in separate tests.  The only problem is that these were ad hoc tests we wrote on the way to developing a procedure; it would be nice to save them for re-use later.

For example, we could have a test to make sure that we don’t lose records when we join table a to tables b and c.  To set up that test, we just need to formalize one of the earlier sections:  compare the number of rows we got from our earlier query against table a alone with the number of rows returned when we use the same filter for @x and pass in NULL for @y and @z.  Similarly, if we need to test that the filter for @y works correctly, we have parts of this query which we can re-use to check filtering logic. Here’s an example of the latter test:

/* Arrange */
DECLARE
	@y INT = 6;

DECLARE @results TABLE
(
	a1 INT,
	a2 INT,
	a3 INT,
	b1 VARCHAR(30),
	b2 INT,
	b3 DECIMAL(12,2),
	c1 INT,
	c2 VARCHAR(40),
	c3 VARCHAR(120)
);

DECLARE
	@ExpectedNumberOfRecords INT;

SELECT
	@ExpectedNumberOfRecords = COUNT(1)
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
WHERE
	b.y = @y;


/* Act */
INSERT INTO @results
(
	a1,
	a2,
	a3,
	b1,
	b2,
	b3,
	c1,
	c2,
	c3
)
EXEC dbo.SomeProcedure
	@x = NULL,
	@y = @y,
	@z = NULL;

/* Assert:  empty result set means the test passes. */
SELECT
	@ExpectedNumberOfRecords AS NumberOfRecords
EXCEPT
SELECT
	COUNT(1)
FROM @results;
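The former test, checking that we don’t lose records across the joins, follows the same arrange/act/assert shape.  Here is a sketch against the same hypothetical tables, comparing the row count from table a alone to the row count after joining to b and c:

```sql
/* Arrange:  count rows in table a alone under the @x filter. */
DECLARE
	@x INT = 4;

DECLARE
	@ExpectedNumberOfRecords INT;

SELECT
	@ExpectedNumberOfRecords = COUNT(1)
FROM a
WHERE
	a.x = @x;

/* Act:  count rows after joining to b and c, with no additional filters. */
DECLARE
	@ActualNumberOfRecords INT;

SELECT
	@ActualNumberOfRecords = COUNT(1)
FROM a
	INNER JOIN b
		ON a.keyb = b.keyb
	INNER JOIN c
		ON a.keyc = c.keyc
WHERE
	a.x = @x;

/* Assert:  empty result set means the joins did not lose any records. */
SELECT
	@ExpectedNumberOfRecords
EXCEPT
SELECT
	@ActualNumberOfRecords;
```

If the joins to b and c drop rows (for example, because keyb or keyc has values with no match), the two counts diverge and the EXCEPT returns a row, failing the test.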

The downside to testing like this is that we are building tests which demand a certain table infrastructure.  If I move attribute y over from table b to table a, I now need to re-write those tests such that they fit the new table design.  If you don’t do a lot of major table refactoring, this is probably an acceptable cost.  If you do a lot of table refactoring, however, you’ll want a more fluid design, and that’s something we’ll talk about in the next post, along with testing frameworks.
