Tally Tables: Generating Multiple Rows

Suppose we have the need to generate multiple rows in a result set for a single row in a source.    For example, we’re looking at a set of rows and want to report on errors*.  Let’s start with a data set:

CREATE TABLE #Data
(
	Id int NOT NULL,
	Field1 varchar(50) NOT NULL,
	Field2 varchar(50) NULL,
	SomeNum1 int NOT NULL,
	SomeNum2 int NULL
);

There are four errors that we could potentially see:

  1. Field2 is NULL
  2. Field1 has a greater value than Field2
  3. SomeNum2 is NULL
  4. SomeNum1 is larger than SomeNum2

This is a bit of a contrived example, but bear with me.  Let’s now load some data in:

INSERT INTO #Data(Id, Field1, Field2, SomeNum1, SomeNum2) VALUES
(1, 'a', 'b', 1, 2),
(2, 'a', NULL, 1, 2),
(3, 'b', 'a', 1, 2),
(4, 'b', 'a', 2, 1),
(5, 'b', NULL, 1, NULL);

Now that we have data, let’s generate some errors.  The format that we want to return is { Id, ErrorMessage }.  It’s not important that we return the errors in order, but it is vital that we report on all errors, and that we not simply give up after the first error.  We also don’t want to have any more SELECT statements than necessary because this is an expensive process (with billions and billions of stars rows).  We definitely don’t want to use a cursor or while loop, because that will cause performance to drag.  But our requirement also means that something like the following is out:

SELECT Id, 'Field2 is NULL!' as ErrorMessage FROM #Data WHERE Field2 IS NULL
UNION ALL
SELECT Id, 'SomeNum2 is NULL!' FROM #Data WHERE SomeNum2 IS NULL
UNION ALL
SELECT Id, 'Field1 is bigger than Field2!' FROM #Data WHERE Field1 > Field2
UNION ALL
SELECT Id, 'SomeNum1 is bigger than SomeNum2!' FROM #Data WHERE SomeNum1 > SomeNum2
ORDER BY Id, ErrorMessage;

Fortunately, there’s one easy way to make a single pass through our data set while still generating 0 to many results per source row:  a tally table.  I’m going to use a slightly-modified version of a tally table, giving it one extra column:  error message.

WITH tally as
(
	SELECT 1 as N, 'Field2 is NULL!' as ErrorMessage
	UNION ALL
	SELECT 2 as N, 'SomeNum2 is NULL!' as ErrorMessage
	UNION ALL
	SELECT 3 as N, 'Field1 is bigger than Field2!' as ErrorMessage
	UNION ALL
	SELECT 4 as N, 'SomeNum1 is bigger than SomeNum2!' as ErrorMessage
)
SELECT
	d.Id,
	t.ErrorMessage
FROM
	#Data d
	CROSS JOIN tally t
WHERE
	(
		(t.N = 1 AND d.Field2 IS NULL)
		OR (t.N = 2 AND d.SomeNum2 IS NULL)
		OR (t.N = 3 AND d.Field1 > d.Field2)
		OR (t.N = 4 AND d.SomeNum1 > d.SomeNum2)
	);

You can see that the result sets are the same.  Our tally table version also chews through the data appreciably faster than even a set of UNION ALL statements, and certainly better than a cursor.  Here’s a query plan comparison:

Tally Table Multiple Rows Query Plan

 

As you can see, the plan cost is rather low for our tally table version, and we also see fewer IO operations:

(6 row(s) affected)
Table '#Data'. Scan count 4, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
(6 row(s) affected)
Table '#Data'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

What this means is that our tally table version is faster with a small rowset, and will scale much better as our data set gets bigger and bigger.

* – Yeah, you could do this in SSIS and if this is part of a regular process, it probably belongs there.  There are other examples where this is useful, such as generating historical data post-hoc.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s