Image

Time To Say Goodbye

wpid-img_20141122_185630213.jpg

The last time I bought tennis shoes was eight years ago.   Actually,  these were running shoes I bought to exercise in the Black Forest.  Since then, these shoes have served me well over the course of three continents and a lot of miles.  I finally bought their replacement today.  I could deal with the holes and bits falling apart.  I could deal with the nonexistent padding remaining.  The last straw was when eyeholes from both shoes ripped apart.  Rest in peace, shoes.

Status

Thematic Updates

I decided to bite the bullet and pay for custom themes.  I’ve been using the free version of WordPress for eight years, but being able to fix some theme annoyances is worth it.

I might make further tweaks, but so far, I’ve been able to approximately triple the size of the main post box for wide-screen monitors.  I’ve also customized the colors a bit, making them a little darker and less pastel than the theme defaults.  The final change I’ve made thus far is to reduce the size of the font used for code snippets.  Instead of being 20-30% bigger than my text (and about twice the size of the old theme’s text), this font is probably about 10-15% smaller than my text, and I think that’s a good trade-off.  Now the code complements the discussion rather than drowning it out, yet it should still be readable to most.

These are the biggest issues that I’ve had so far.  As an aside, because of the amount of code I’ve been dumping on this blog the past couple of weeks, it’s pretty hard to read on a mobile device.  I’m going to start collapsing code sections by default, and that should help mobile readability.  It looks good on tablets, so I am happy about that.

Tracing Foreign Key Dependencies In SQL Server

In my latest post on using BIML, I created a table called MigrationStep and populated it with data. Now I want to show how exactly I generated that data and how you can find foreign key constraints.

Let’s use the AdventureWorksDW2012 database as an example and look at the thirteen tables we identified as being related to DIMCustomer in some way. Note that my diagram is a bit of a post hoc analysis—I already know which tables are involved—but our solution doesn’t depend upon this level of foreknowledge.

AdventureWorksDWDiagram

Looking at this diagram, we can see that DimCustomer has “parent” tables and “child” tables. By this, I mean that DimCustomer has foreign key constraints leading to other (parent) tables, and there are tables (children) which have foreign key constraints leading back to DimCustomer. In particular, DimCustomer has a GeographyKey which ties back to DimGeography, making DimGeography a “parent” table for DimCustomer. If we want to load a row into DimCustomer, we need to load that row’s GeographyKey into DimGeography first.

In addition, there are two tables which have CustomerKey attributes which tie back to DimCustomer: FactInternetSales and FactSurveyResponse. Looking at FactInternetSales, there are a number of keys: DimProduct, DimPromotion, DimCurrency, etc. We need to load all of these dimensions before we can populate FactInternetSales…but before we can load DimProduct, we need to load DimProductCategory and then DimProductSubcategory.

With this in mind, I want to use SQL Server metadata to determine all of the tables which tie directly to some base table. I’m going to start out with a big blob of code and then I’ll explain it in detail.

DECLARE
	@BaseSchemaName SYSNAME = N'dbo',
	@BaseTableName SYSNAME = N'DimCustomer',
	@BasePrimaryKeyName SYSNAME = N'CustomerKey';

WITH Predecessors(SchemaName, TableName, TableID, Ordinal, Path, JoinCriteria, WhereCriteria, LookupColumns, LookupColumnXML) AS
(
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		0 AS Ordinal,
		CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
	WHERE
		t.type = 'U'
		AND t.name = @BaseTableName
		AND OBJECT_SCHEMA_NAME(t.object_id) = @BaseSchemaName
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Predecessors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
),
Successors(SchemaName, TableName, TableID, Ordinal, Path, NumberOfMoves, CurrentDirection, JoinCriteria, WhereCriteria, LookupColumns, LookupColumnXML) AS
(
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		0 AS Ordinal,
		CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
		CAST(0 AS INT) AS NumberOfMoves,
		'' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
	WHERE
		t.type = 'U'
		AND t.name = @BaseTableName
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal + 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END AS NumberOfMoves,
		'D' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.parent_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Successors tt
			ON f.referenced_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
		--We can change direction twice:  once to go "down" the tree to get child tables, and once more to get those
		--child tables' parents.  We don't need any more moves than that; this way, we get the minimum number of tables
		--necessary to populate our base table.
		AND NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END < 2
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		NumberOfMoves + CASE WHEN CurrentDirection = 'U' THEN 0 ELSE 1 END AS NumberOfMoves,
		'U' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Successors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
		--No check here for NumberOfMoves because we want to be able to always move up the tree to find foreign key constraints.
)
SELECT
	a.SchemaName,
	a.TableName,
	MAX(Ordinal) AS Section,
	a.JoinCriteria,
	MAX(a.Path) AS Path,
	--We can hardcode DISTINCT x.* here, but there are some tables which don't need the DISTINCT
	'DISTINCT x.*' AS SelectCriteria,
	a.WhereCriteria,
	--We can hardcode 0 here because all tables have primary key constraints
	0 AS NeedToDelete,
	a.LookupColumns,
	--A side effect of FOR XML Path is that < becomes &lt; and > becomes &gt;, so we need to fix those
	REPLACE(REPLACE(a.LookupColumnXML, '&lt;', '<'), '&gt;', '>')
FROM
(
	SELECT
		p.SchemaName,
		p.TableName,
		p.TableID,
		p.Ordinal,
		p.JoinCriteria,
		p.Path,
		p.WhereCriteria,
		p.LookupColumns,
		p.LookupColumnXML
	FROM Predecessors p

	UNION ALL

	SELECT
		s.SchemaName,
		s.TableName,
		s.TableID,
		s.Ordinal,
		s.JoinCriteria,
		s.Path,
		s.WhereCriteria,
		s.LookupColumns,
		s.LookupColumnXML
	FROM Successors s
		LEFT OUTER JOIN Predecessors p
			ON s.TableID = p.TableID
	WHERE
		p.TableID IS NULL
		AND s.Path LIKE '%/' + @BaseSchemaName + '.' + @BaseTableName + '/%'
) a
GROUP BY
	a.SchemaName,
	a.TableName,
	a.JoinCriteria,
	a.WhereCriteria,
	a.LookupColumns,
	a.LookupColumnXML
ORDER BY
	Section,
	SchemaName,
	TableName;

Yeah, that’s 370 lines of code. Don’t worry—we’ll walk through it piece by piece.

My first code segment is the Predecessors CTE. This is a recursive common table expression, so we’ll break it apart into bite-sized pieces. Let’s start with the base case of our recursive CTE and hide the LookupColumns and LookupColumnsXML work:

SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	0 AS Ordinal,
	CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'TODO' AS LookupColumns,
	'TODO' AS LookupColumnXML
FROM sys.tables t
WHERE
	t.type = 'U'
	AND t.name = @BaseTableName
	AND OBJECT_SCHEMA_NAME(t.object_id) = @BaseSchemaName

This query looks pretty simple: look for a table named [@BaseSchemaName].[@BaseTableName]. In our example, that will translate to dbo.DimCustomer. Once we have that, we select the schema and table names, the table ID (for easy comparison), and start setting some values. Ordinal is a parameter which tells us how deep we are in the recursion chain. This is the base case, so we’re at level 0. I also generate a path to tell us the exact chain we followed to get to this point. In this case, it’s just /dbo.DimCustomer/. I put the slashes in to separate levels, and needed to cast as VARCHAR(8000) because all parts of my recursive CTE must use the exact same table types, including data sizes. For the WHERE clause, I’m assuming my table will be aliased as “x” and has my base primary key. If your base table has a composite key…well, good luck with the modifications…

That part was pretty easy, so let’s look at LookupColumns and LookupColumnsXML. You’ll see that they’re both really the same thing, just in a slightly different form:

(
	SELECT
		CAST(STUFF
		(
			(
				SELECT
					', x.' + c.name
				FROM sys.key_constraints kc
					INNER JOIN sys.index_columns ic
						ON kc.parent_object_id = ic.object_id
						AND kc.unique_index_id = ic.index_id
					INNER JOIN sys.columns c
						ON ic.column_id = c.column_id
						AND ic.object_id = c.object_id
				WHERE
					kc.type = 'PK'
					AND kc.parent_object_id = t.object_id
				FOR XML PATH ('')
			), 1, 2, ''
		) AS VARCHAR(8000))
) AS LookupColumns,
(
	SELECT
		CAST(STUFF
		(
			(
				SELECT
					'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
				FROM sys.key_constraints kc
					INNER JOIN sys.index_columns ic
						ON kc.parent_object_id = ic.object_id
						AND kc.unique_index_id = ic.index_id
					INNER JOIN sys.columns c
						ON ic.column_id = c.column_id
						AND ic.object_id = c.object_id
				WHERE
					kc.type = 'PK'
					AND kc.parent_object_id = t.object_id
				FOR XML PATH ('')
			), 1, 0, ''
		) AS VARCHAR(8000))
) AS LookupColumnXML

I’ll focus on LookupColumns for this description. What I’m doing here is finding all of the columns associated with this table’s primary key and turning it into a comma-delimited list with the FOR XML PATH trick. This will create a string like “, x.Column1, x.Column2, x.Column3″ and then I use the STUFF function to strip off the first two characters, leaving “x.Column1, x.Column2, x.Column3″ for us.

LookupColumnXML is basically the same thing, except that instead of a comma-delimited list, I’m generating .

Having walked through this, let’s now look at the second half of the Predecessors recursive CTE.

UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		'LOOKED AT' AS LookupColumns,
		'LOOKED AT' AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Predecessors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'

We’ve already looked at how LookupColumns and LookupColumnXML get populated, and it’s the same throughout the entire code base, so I’m going to replace those references with “LOOKED AT” to simplify the code.

In this code snippet, we have the recursive case. Once we look at our base table, we want to see any tables to which our base table links, followed by any tables to which those tables link, and so on until we run out of links. The way that we do this is in the FROM clause: we join sys.tables to sys.foreign_keys to get any tables in which the parent object ID is not the referenced object ID (to skip self joins), and then we join to the Predecessors common table expression to form the chain. In our concrete scenario, DimCustomer has a foreign key to DimGeography, so the second round of recursion would bring in that table. DimGeography has a foreign key to DimSalesTerritory, so the third round of recursion would bring that table in. We aren’t limited to bringing in just one table per round, but given this particular data model, that’s all we get.

In the SELECT clause, get the schema and table information (SchemaName, TableName, and TableID) the same as in the base case. After that, several things have changed. First of all, I decremented the Ordinal value. This lets us build a number line for precedence: I know that DimSalesTerritory precedes DimGeography, which itself precedes DimCustomer. Thus, I could assign DimCustomer a value of 0, DimGeography a value of -1, and DimSalesTerritory a value of -2. If I have two tables at the same level, it means that I can load records into both concurrently without violating foreign key constraints, assuming I have already loaded the appropriate records for any tables with earlier levels.

Next up is Path, in which I append the current schema and table name to the previous recursion level’s Path. Path is important here because relational database keys can be expressed as a directed cyclical graph, but recursion in a cycle leads to stack overflows. If I have tables A, B, and C, in which A has a key to B, which has a key to C, which has a key back to A, my recursion would build up A, B, C, A, B, C, etc. I don’t want any cycles in my report (not only to protect against stack overflows, but also because I’m interested in the shortest path to a table, and that necessarily precludes a cycle.

Once I take care of path, I set my WhereCriteria value. The “c.” is going to be the base table alias that I plan to use. Because every table in this set eventually ties back to the base table, I can build a valid set of joins to filter every table to include only the records I care about.

So, that’s the Predecessors common table expression, which takes care of any dependencies for our base table. From there, we want to branch down to the children of our base table, as well as those tables’ children, etc. To do this, I use the Successors common table expression. You’ll note that the Successors common table expression is very similar to Predecessors, but includes just a little bit more. Just like with Predecessors, let’s start with the base case.

SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	0 AS Ordinal,
	CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
	CAST(0 AS INT) AS NumberOfMoves,
	'' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
WHERE
	t.type = 'U'
	AND t.name = @BaseTableName

This base case is almost exactly the same as Predecessors. The two differences are the attributes NumberOfMoves and CurrentDirection. For our base case, we don’t have a direction (either “up” the chain to parents or “down” the chain to children), and we haven’t made any directional changes yet, so NumberOfMoves is 0.

Let’s look at the next part of this CTE:

UNION ALL
SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	tt.Ordinal + 1 AS Ordinal,
	CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
	NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END AS NumberOfMoves,
	'D' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
	INNER JOIN sys.foreign_keys f
		ON f.parent_object_id = t.object_id
		AND f.parent_object_id <> f.referenced_object_id
	INNER JOIN Successors tt
		ON f.referenced_object_id = tt.TableID
WHERE
	t.type = 'U'
	AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
	--We can change direction twice:  once to go "down" the tree to get child tables, and once more to get those
	--child tables' parents.  We don't need any more moves than that; this way, we get the minimum number of tables
	--necessary to populate our base table.
	AND NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END < 2

This part selects the “children” of our base table, as well as those tables’ children, and so on. Thus, CurrentDirection is “down” the tree. To understand NumberOfMoves, let me walk through this concrete example. Remember that DimCustomer is the base table and has an Ordinal of 0. There are two tables which have CustomerKeys which tie back to DimCustomer: FactInternetSales and FactSurveyResponse. At this point, we still have the base case’s CurrentDirection, which is ”. Because that doesn’t match ‘D’, we set NumberOfMoves equal to the base case’s number of moves (0) plus 1. We have changed direction once at this point.

Now, let’s go to the next level. FactInternetSales has one child table: FactInternetSalesReason. This table will be picked up in the third run of this CTE, so its ordinal will be 2, and we check number of moves again. Because the CurrentDirection for FactInternetSales is D and we’re still going down the chain, NumberOfMoves = parent number of moves (1) + 0, or 1. We have not changed direction; we’ve simply moved down the chain.

This is great, but leads to a problem: FactInternetSalesReason has a relationship with FactInternetSales, but also one to DimSalesReason. We don’t have DimSalesReason in our set anywhere, so trying to insert solely into the tables we’ve gathered so far will fail! That’s where the next part of the Successors CTE comes in:

UNION ALL
SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	tt.Ordinal - 1 AS Ordinal,
	CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
	NumberOfMoves + CASE WHEN CurrentDirection = 'U' THEN 0 ELSE 1 END AS NumberOfMoves,
	'U' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
	INNER JOIN sys.foreign_keys f
		ON f.referenced_object_id = t.object_id
		AND f.parent_object_id <> f.referenced_object_id
	INNER JOIN Successors tt
		ON f.parent_object_id = tt.TableID
WHERE
	t.type = 'U'
	AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
	--No check here for NumberOfMoves because we want to be able to always move up the tree to find foreign key constraints.

Now we’re moving “up” the tree to find any parents. For FactInternetSalesReason, we’d move back up to see DimSalesReason (remember that we won’t see FactInternetSales again because the Path check forbids cycles). With this moving up the tree, we change the current direction from ‘D’ to ‘U’ and increment the number of directional changes if we were headed downstream or are starting out at the base.

Note my comment at the bottom of this CTE: I do not check the number of directional changes because I always want to be able to drive up through the tree to include any tables required for loading. This leads to a final question on the topic: why check for number of moves at all? The reason for that is to prevent “filling the tree,” as I’ll call it. I might start from DimCustomer and drive down to FactInternetSales. From there, we need to go up the tree and load DimProduct. If I had an unlimited number of up-and-down moves, the recursive CTE would then move down the tree from DimProduct into FactProductInventory. But nothing in FactProductInventory really ties back to a particular customer—I don’t need that table loaded. This is why I allow one downward cut but all the upward movement the recursive CTE needs.

With all of this in mind, my final query starts with a subquery:

SELECT
	p.SchemaName,
	p.TableName,
	p.TableID,
	p.Ordinal,
	p.JoinCriteria,
	p.Path,
	p.WhereCriteria,
	p.LookupColumns,
	p.LookupColumnXML
FROM Predecessors p

UNION ALL

SELECT
	s.SchemaName,
	s.TableName,
	s.TableID,
	s.Ordinal,
	s.JoinCriteria,
	s.Path,
	s.WhereCriteria,
	s.LookupColumns,
	s.LookupColumnXML
FROM Successors s
	LEFT OUTER JOIN Predecessors p
		ON s.TableID = p.TableID
WHERE
	p.TableID IS NULL
	AND s.Path LIKE '%/' + @BaseSchemaName + '.' + @BaseTableName + '/%'

Yeah, this could be another common table expression, but for the purposes of demonstration, I wanted to separate my two recursive CTEs from the mechanics of combining results together.

There are two interesting bits in this subquery, both on the Successors side of the union. First of all, I join Successors back to Predecessors and filter out any tables which exist in Predecessors. This has one important use: the way that I derive my final Section number is to check the MAX(Ordinal). If I have a table in Predecessors, that number should be negative, but if it shows up in Successors, it will be positive. If the number is positive, then a process would try to load my base table’s parent table only after trying to load the base table! This would cause a foreign key constraint violation, so instead I have to take care of this scenario. As a note, this actually does appear in the data set: DimSalesTerritory is in the Predecessors list as well as the Successors list (where it ties to FactInternetSales).

My other interesting part is that I force Path to include my base table. As an aside, this is technically unnecessary, but just in case somebody starts monkeying with the recursive CTE and introduces invalid rows, this is a weak filter.

That all forms my base table, which I alias as “a.” From there, I have my final SELECT, GROUP BY, and ORDER BY clauses:

SELECT
	a.SchemaName,
	a.TableName,
	MAX(Ordinal) AS Section,
	a.JoinCriteria,
	MAX(a.Path) AS Path,
	--We can hardcode DISTINCT x.* here, but there are some tables which don't need the DISTINCT
	'DISTINCT x.*' AS SelectCriteria,
	a.WhereCriteria,
	--We can hardcode 0 here because all tables have primary key constraints
	0 AS NeedToDelete,
	a.LookupColumns,
	--A side effect of FOR XML Path is that < becomes &lt; and > becomes &gt;, so we need to fix those
	REPLACE(REPLACE(a.LookupColumnXML, '&lt;', '<'), '&gt;', '>')
FROM
(LOOKED AT) a
GROUP BY
	a.SchemaName,
	a.TableName,
	a.JoinCriteria,
	a.WhereCriteria,
	a.LookupColumns,
	a.LookupColumnXML
ORDER BY
	Section,
	SchemaName,
	TableName;

As noted above, my Section is the largest value of Ordinal. This way, in the event that a table shows up at two separate points, we can ensure that all predecessors are loaded. I also show the path, which lets us see how the table is connected. The rest of this is simply hard-coding values that I need or displaying the set of lookup columns.

One important note on this is that I do not have join criteria created. Given that I know the tables and keys, I could create a subquery in my CTEs which defines valid join criteria. But because I did this for a small data set, I decided to hand-craft my join criteria. If I ever need to do this for a larger table set, I’ll probably write the query to do this correctly… In any event, we now have a table which describes our foreign key relationships between tables. Of course, if you have a database without foreign key constraints, you can’t walk up and down the chain this way, so that’s yet another reason to populate those foreign keys, whether it be a normalized system or Kimball-style warehouse.

Status

New Theme

We’ve just switched over to a new WordPress theme.  The theme that we initially set up worked great for us back in 2006, because it was one of the few WordPress themes which showed you the author’s name in each post.  Over time, however, it’s gotten a bit stale and was never really designed for a tablet or mobile experience.  Our new theme rectifies these issues…while creating new ones.

I like the new theme in general, but I’m going to keep tweaking it, especially when it comes to displaying code.  I don’t like how narrow my blog post box is on the main page, and I will need to do something about that.

Generating An Empty Node

In BIML, Part 5, I talked about AdventureWorksDW2012 and AdventureWorksDW2012Node2. If you don’t know how to create an empty node for testing, it’s pretty easy. For this example, I’m going to assume that you have SQL Server 2012 or 2014 installed. If you’re still on 2008 R2 or earlier, this will still work but you’ll need to change all of the “AdventureWorksDW2012″ references to be the correct version.

The first step is to download the AdventureWorksDW2012 database file. Once you have that, you can Attach it to your database—but be sure to remove the invalid log file reference when you attach! The following screen shot shows you the Attach Databases screen:

AttachDatabase

Once you have AdventureWorksDW2012 set up, right-click on the database and go to Tasks –> Generate Scripts.

GenerateScripts

From there, you’ll walk through a wizard. In this scenario, we want to script out all objects and make the new database a replica of the old—except without the data.

ScriptAllObjects

Click Next and you’ll have the option to save this in a new query window, which is what I’m going to do.

SaveToNewQueryWindow

This gives you a fairly long file which looks a little bit like:

Scripts

There’s just one problem with this: I still have everything named AdventureWorksDW2012, but I want it named AdventureWorksDW2012Node2. So let’s hit Ctrl+H and bring up the Quick Replace window.

RenameObjects

Then, we’ll fill out the “Find what:” and “Replace with:” sections and hit the “Replace All” button. That renames all references of AdventureWorksDW2012 in our script to be AdventureWorksDW2012Node2.

RenamedObjects

After we have all of that done, hit F5 or the Execute button and you now have an empty node for your own testing purposes.

BIML, Part 5—Expanding To Multiple Tables

In my last post on the topic, we walked through the BIML necessary to generate an SSIS package. The end result is useful and allows us to treat SSIS packages as binary code rather than source code, but the real power to BIML is in automation. In this case, I want to automate every relevant table in my database.

From here on out, I’m going to use AdventureWorksDW2012 as my warehouse. My local SQL Server instance has two databases: AdventureWorksDW2012 and AdventureWorksDW2012Node2. AdventureWorksDW2012Node2 is the same as the main instance except that I only have database structure and not any data. That way, we can migrate data from the primary node to Node2.

In our scenario, we want to move data for a customer from one node to another. This drives from DimCustomer.CustomerKey. To insert a DimCustomer record, we need DimGeography populated, and to populate that, we need to load DimSalesTerritory. The three facts which tie to DimCustomer are FactInternetSales, FactInternetSalesReason, and FactSurveyResponse. To get those facts’ data in, we need to have valid information for DimProduct, DimDate, DimPromotion, DimCurrency, and DimSalesReason. Before I can load DimProduct, I need to load DimProductCategory and DimProductSubcategory. All in all, this means that I need to migrate data from 13 separate tables to migrate customer data to another node.

To maintain this information and know in which order I need to load each table, I’ve created a table called MigrationStep. In a future post, I’ll describe how to populate MigrationStep so we can keep this table in sync if the underlying fact-dimension structure changes. For now, I’ll offer this as a T-SQL script to create and populate the table:

USE [AdventureWorksDW2012]
GO
CREATE TABLE [dbo].[MigrationStep]
(
	[MigrationStepID] [int] IDENTITY(1,1) NOT NULL,
	[SchemaName] [sysname] NOT NULL,
	[TableName] [sysname] NOT NULL,
	[Section] [int] NULL,
	[JoinCriteria] [nvarchar](2000) NULL,
	[SelectCriteria] [nvarchar](2000) NULL,
	[WhereCriteria] [nvarchar](100) NULL,
	[NeedToDelete] [bit] NOT NULL,
	[LookupColumns] [nvarchar](512) NULL,
	[LookupColumnXML] [nvarchar](1024) NULL,
	CONSTRAINT [PK_MigrationStep] PRIMARY KEY CLUSTERED 
	(
		[MigrationStepID] ASC
	)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]
GO

INSERT INTO dbo.MigrationStep
(
	SchemaName,
	TableName,
	Section,
	JoinCriteria,
	SelectCriteria,
	WhereCriteria,
	NeedToDelete,
	LookupColumns,
	LookupColumnXML
)
VALUES
('dbo', 'DimProductCategory', -2, 'INNER JOIN dbo.DimProductSubcategory s ON s.ProductCategoryKey = x.ProductCategoryKey INNER JOIN dbo.DimProduct p ON p.ProductSubcategoryKey = s.ProductSubcategoryKey INNER JOIN dbo.FactInternetSales f ON p.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductCategoryKey', '<Column SourceColumn="ProductCategoryKey" TargetColumn="ProductCategoryKey" />'),
('dbo', 'DimSalesTerritory', -2, 'INNER JOIN dbo.DimGeography g ON x.SalesTerritoryKey = g.SalesTerritoryKey INNER JOIN dbo.DimCustomer c ON g.GeographyKey = c.GeographyKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesTerritoryKey', '<Column SourceColumn="SalesTerritoryKey" TargetColumn="SalesTerritoryKey" />'),
('dbo', 'DimProductSubcategory', -1, 'INNER JOIN dbo.DimProduct p ON p.ProductSubcategoryKey = x.ProductSubcategoryKey INNER JOIN dbo.FactInternetSales f ON p.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductSubcategoryKey', '<Column SourceColumn="ProductSubcategoryKey" TargetColumn="ProductSubcategoryKey" />'),
('dbo', 'DimGeography', -1, 'INNER JOIN dbo.DimCustomer c ON x.GeographyKey = c.GeographyKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.GeographyKey', '<Column SourceColumn="GeographyKey" TargetColumn="GeographyKey" />'),
('dbo', 'DimCurrency', 0, 'INNER JOIN dbo.FactInternetSales f ON x.CurrencyKey = f.CurrencyKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.CurrencyKey', '<Column SourceColumn="CurrencyKey" TargetColumn="CurrencyKey" />'),
('dbo', 'DimCustomer', 0, '', 'DISTINCT x.*', 'x.CustomerKey = ?', 0, 'x.CustomerKey', '<Column SourceColumn="CustomerKey" TargetColumn="CustomerKey" />'),
('dbo', 'DimDate', 0, 'INNER JOIN dbo.FactInternetSales f ON x.DateKey = f.OrderDateKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', '1 = 1', 0, 'x.DateKey', '<Column SourceColumn="DateKey" TargetColumn="DateKey" />'),
('dbo', 'DimProduct', 0, 'INNER JOIN dbo.FactInternetSales f ON x.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductKey', '<Column SourceColumn="ProductKey" TargetColumn="ProductKey" />'),
('dbo', 'DimPromotion', 0, 'INNER JOIN dbo.FactInternetSales f ON x.PromotionKey = f.PromotionKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.PromotionKey', '<Column SourceColumn="PromotionKey" TargetColumn="PromotionKey" />'),
('dbo', 'DimSalesReason', 1, 'INNER JOIN dbo.FactInternetSalesReason fsr ON x.SalesReasonKey = fsr.SalesReasonKey INNER JOIN dbo.FactInternetSales f ON fsr.SalesOrderNumber = f.SalesOrderNumber AND fsr.SalesOrderLineNumber = f.SalesOrderLineNumber INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesReasonKey', '<Column SourceColumn="SalesReasonKey" TargetColumn="SalesReasonKey" />'),
('dbo', 'FactInternetSales', 1, 'INNER JOIN dbo.DimCustomer c ON x.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesOrderNumber, x.SalesOrderLineNumber', '<Column SourceColumn="SalesOrderNumber" TargetColumn="SalesOrderNumber" /><Column SourceColumn="SalesOrderLineNumber" TargetColumn="SalesOrderLineNumber" />'),
('dbo', 'FactSurveyResponse', 1, 'INNER JOIN dbo.DimCustomer c ON x.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SurveyResponseKey', '<Column SourceColumn="SurveyResponseKey" TargetColumn="SurveyResponseKey" />'),
('dbo', 'FactInternetSalesReason', 2, 'INNER JOIN dbo.FactInternetSales f ON x.SalesOrderNumber = f.SalesOrderNumber AND x.SalesOrderLineNumber = f.SalesOrderLineNumber INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesOrderNumber, x.SalesOrderLineNumber, x.SalesReasonKey', '<Column SourceColumn="SalesOrderNumber" TargetColumn="SalesOrderNumber" /><Column SourceColumn="SalesOrderLineNumber" TargetColumn="SalesOrderLineNumber" /><Column SourceColumn="SalesReasonKey" TargetColumn="SalesReasonKey" />');

The one thing I want to point out is that the line which loads DIMDate has a WHERE clause of 1 = 1—that’s the easier method to populate that dimension, as otherwise I would need to UNION together all of the date keys for all of the facts and dimensions which join to DIMDate, and it’s just simpler to move everything.

Now that we have a table, it’s time to integrate that into our BIML that we developed. Just like last time, I’ll give you the full file first and then walk through it piece by piece.

<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<#
	string metadataConnectionString ="Provider=SQLNCLI10;Server=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;";
	DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT SchemaName, TableName, Section, JoinCriteria, SelectCriteria, WhereCriteria, NeedToDelete, LookupColumns, LookupColumnXML FROM dbo.MigrationStep ORDER BY Section, SchemaName, TableName");
	DataTable sections = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section");
	DataTable sectionsDesc = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section DESC"); #>
	<PackageProjects>
		<PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive">
			<Parameters>
				<Parameter Name="CustomerID" DataType="Int32">1234</Parameter>
				<Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter>
				<Parameter Name="OldWarehouseDatabase" DataType="String">AdventureWorksDW2012</Parameter>
				<Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter>
				<Parameter Name="NewWarehouseDatabase" DataType="String">AdventureWorksDW2012Node2</Parameter>
				<Parameter Name="PerformMigration" DataType="Boolean">True</Parameter>
				<Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter>
			</Parameters>
			<Packages>
				<Package PackageName="MigrateCustomer" />
			</Packages>
		</PackageProject>
	</PackageProjects>
	<Connections>
		<Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
		<Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012Node2;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
	</Connections>
	<Packages>
		<Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
			<Connections>
				<Connection ConnectionName="OldWarehouseNode">
					<Expressions>
						<Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression>
						<Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression>
					</Expressions>
				</Connection>
				<Connection ConnectionName="NewWarehouseNode">
					<Expressions>
						<Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression>
						<Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression>
					</Expressions>
				</Connection>
			</Connections>
			<Tasks>
				<Container Name="Delete From New Tables" ConstraintMode="Linear">
					<Expressions>
						<Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression>
					</Expressions>
					<Tasks>
						<# foreach (DataRow row in sectionsDesc.Rows) { #>
						<Container Name="Delete From New Table <#=row[0]#>" ConstraintMode="Linear">
							<Tasks>
								<# foreach (DataRow trow in tables.Rows) {
							//For this round, I want to make sure EVERY table has a Delete action.  We'll fix that later...
							if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1)) {#>
								<ExecuteSQL Name="Prepare <#=trow[1]#>" ConnectionName="NewWarehouseNode">
									<DirectInput>
										DELETE FROM [<#=trow[0]#>].[<#=trow[1]#>];
									</DirectInput>
								</ExecuteSQL>
								<# }
							} #>
							</Tasks>
						</Container>
						<# } #>
					</Tasks>
				</Container>
				
				<Container Name="Migrate To New Table" ConstraintMode="Linear">
					<Expressions>
						<Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression>
					</Expressions>
					<Tasks>
						<# foreach (DataRow row in sections.Rows) { #>
						<Container Name="Migrate To New Table <#=row[0]#>" ConstraintMode="Linear">
							<Tasks>
								<# foreach (DataRow trow in tables.Rows) {
							if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0])) {#>
								<Dataflow Name="Migrate <#=trow[1]#>" DefaultBufferSize="104857600">
									<Transformations>
										<OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode">
											<DirectInput>
												SELECT 
													<#=trow[4]#>
												FROM [<#=trow[0]#>].[<#=trow[1]#>] x 
													<#=trow[3]#>
												WHERE
													<#=trow[5]#>;
											</DirectInput>
											<Parameters>
												<Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" />
											</Parameters>
										</OleDbSource>
										<OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity=<# if(!Convert.ToBoolean(trow[6])) { #> "true" <# } else { #> "false" <# } #> UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000">
											<ExternalTableOutput Table="[<#=trow[0]#>].[<#=trow[1]#>]"></ExternalTableOutput>
										</OleDbDestination>
									</Transformations>
								</Dataflow>
								<# }
							} #>
							</Tasks>
						</Container>
						<# } #>
					</Tasks>
				</Container>
			</Tasks>
		</Package>
	</Packages>
</Biml>

Given 25 more lines of BIML, we now have a package which deletes from 13 tables and migrates data to 13 tables. The best part is that if I add more tables, I don’t need to modify this BIML script–I just generate a new set of packages from the BIML and I’m good to go. In contrast, adding tables the old-fashioned way involves modifying packages and can get time-consuming depending upon how many steps your workflow has.

At any rate, let’s look at the changes, starting at the top.

<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<#
	string metadataConnectionString ="Provider=SQLNCLI10;Server=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;";
	DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT SchemaName, TableName, Section, JoinCriteria, SelectCriteria, WhereCriteria, NeedToDelete, LookupColumns, LookupColumnXML FROM dbo.MigrationStep ORDER BY Section, SchemaName, TableName");
	DataTable sections = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section");
	DataTable sectionsDesc = ExternalDataAccess.GetDataTable(metadataConnectionString,
	"SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section DESC"); #>

On the very first line, I tell BIML that my templating language is C#. After that, I use the “import” declaration to bring in the .NET framework’s System.Data namespace. By the way, this templating may look familiar if you’re used to classic ASP or ASP.NET Web Forms.

My first major block of code is another templated section. In this, I create a connection string to my AdventureWorksDW2012 node, as that’s where my MigrationStep table is. I create three separate DataTables, one to get all of the columns, one to get just a distinct list of Sections, and one to get a distinct list of Sections in descending order. Each Section here is a grouping of tables; I have different sections because of foreign key chain dependencies. With these sections, I’ll use the descending-order list to delete from blocks of tables at a time and the ascending-order list to insert into blocks of tables.

<PackageProjects>
	<PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive">
		<Parameters>
			<Parameter Name="CustomerID" DataType="Int32">1234</Parameter>
			<Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter>
			<Parameter Name="OldWarehouseDatabase" DataType="String">AdventureWorksDW2012</Parameter>
			<Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter>
			<Parameter Name="NewWarehouseDatabase" DataType="String">AdventureWorksDW2012Node2</Parameter>
			<Parameter Name="PerformMigration" DataType="Boolean">True</Parameter>
			<Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter>
		</Parameters>
		<Packages>
			<Package PackageName="MigrateCustomer" />
		</Packages>
	</PackageProject>
</PackageProjects>
<Connections>
	<Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
	<Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012Node2;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>
<Packages>
	<Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
		<Connections>
			<Connection ConnectionName="OldWarehouseNode">
				<Expressions>
					<Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression>
					<Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression>
				</Expressions>
			</Connection>
			<Connection ConnectionName="NewWarehouseNode">
				<Expressions>
					<Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression>
					<Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression>
				</Expressions>
			</Connection>
		</Connections>

This next segment of code is exactly the same as in my first example. I’m setting up my project parameters, telling the project that there’s one package, and setting up a list of connections. Then, I start work on my one package, called MigrateCustomer. Once I get through this part, we get to the next crazy bit.

<Tasks>
	<Container Name="Delete From New Tables" ConstraintMode="Linear">
		<Expressions>
			<Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression>
		</Expressions>
		<Tasks>
			<# foreach (DataRow row in sectionsDesc.Rows) { #>
			<Container Name="Delete From New Table <#=row[0]#>" ConstraintMode="Linear">
				<Tasks>
					<# foreach (DataRow trow in tables.Rows) {
				//For this round, I want to make sure EVERY table has a Delete action.  We'll fix that later...
				if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1)) {#>
					<ExecuteSQL Name="Prepare <#=trow[1]#>" ConnectionName="NewWarehouseNode">
						<DirectInput>
							DELETE FROM [<#=trow[0]#>].[<#=trow[1]#>];
						</DirectInput>
					</ExecuteSQL>
					<# }
				} #>
				</Tasks>
			</Container>
			<# } #>
		</Tasks>
	</Container>

These few lines of code generate all of our Delete containers and tasks. Inside the Tasks element, there is a templated snippet of code: foreach (DataRow row in sectionsDesc.Rows) {. This snippet loops through sectionsDesc, which will be a set containing the values { 2, 1, 0, -1, -2 } based on our example. For each of these sections, we create a new container called “Delete From New Table [Section]” and which contains one task. Inside that Task element, we have a couple lines of C#. The first line loops through each of our tables. Then, for each of our tables, we hit the second line of C#: if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1)). This IF block checks to see if the Section for our table matches the current Section. If so, we check to see if the NeedsDeleted flag is set to true…or if 1 = 1… Yeah, for this scenario, I’m going to force deletion on every table. Don’t worry—I’ll show you how to do this a better way later on in the series.

So, if we have a matching Section (i.e., our table belongs in this container) and if we pass the NeedsDeleted check, then we create an ExecuteSQL task which deletes all data from that table (as represented by columns 0 and 1 in the table row array). We can’t TRUNCATE because we have foreign key constraints, and remember that you really should be batching these deletes and making sure you don’t delete any data you still want hanging around. For our demo, Node2 is empty so I don’t feel bad about deleting everything, but you probably won’t have that same nonchalant attitude about your production environment.

One set of containers down; now we move on to the next.

<Container Name="Migrate To New Table" ConstraintMode="Linear">
	<Expressions>
		<Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression>
	</Expressions>
	<Tasks>
		<# foreach (DataRow row in sections.Rows) { #>
		<Container Name="Migrate To New Table <#=row[0]#>" ConstraintMode="Linear">
			<Tasks>
				<# foreach (DataRow trow in tables.Rows) {
			if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0])) {#>
				<Dataflow Name="Migrate <#=trow[1]#>" DefaultBufferSize="104857600">
					<Transformations>
						<OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode">
							<DirectInput>
								SELECT 
									<#=trow[4]#>
								FROM [<#=trow[0]#>].[<#=trow[1]#>] x 
									<#=trow[3]#>
								WHERE
									<#=trow[5]#>;
							</DirectInput>
							<Parameters>
								<Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" />
							</Parameters>
						</OleDbSource>
						<OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity="true" UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000">
							<ExternalTableOutput Table="[<#=trow[0]#>].[<#=trow[1]#>]"></ExternalTableOutput>
						</OleDbDestination>
					</Transformations>
				</Dataflow>
				<# }
			} #>
			</Tasks>
		</Container>
		<# } #>
	</Tasks>
</Container>

In this, our final real segment of code, I create another set of containers, one for each Section in ascending order—that is, { -2, -1, 0, 1, 2 }. This lets me insert rows for “parent” tables before inserting rows into “child” tables. That way, for example, I load DimProduct and DimCustomer before trying to load FactInternetSales.

For each of my section containers, I do another loop through the tables variable. If the table belongs in this section, then I would create a data flow named “Migrate [TableName].” Of course, if you have two tables with the same name (but a different schema), you would probably need to change this to be “Migrate [SchemaName].[TableName]” but I didn’t have to worry about that with my table set.

Inside our data flow, we have an OLE DB Source named Old Node and an OLE DB Destination named New Node. The source connects back to our AdventureWorksDW2012 database and ties in with MigrationStep pretty closely: we use that table to tell our SELECT column list (trow[4]), our schema name (trow[0]), table name (trow[1]), join criteria (trow[3]), and WHERE clause (trow[5]). Looking back at the MigrationStep table, you’ll notice that all of our WHERE clauses look something like c.CustomerKey = ?;. We need to fill in the parameter, so that’s what the Parameters element does, and we tie that back to the CustomerID project parameter.

Our OLE DB destination is pretty simple. BIML knows how to hook up my source and destination components, and as soon as I tell it that I need to connect to [SchemaName].[TableName] on AdventureWorksDW2012Node2, it can also connect the metadata together and load my columns list automatically.

Once I’m done, all I have to do is right-click the BIML file and select Generate SSIS packages. That creates a package which looks a bit like:

MigrateCustomerBIML

Inside each of those control flows, I see the following data flow:

MigrateCustomerDataFlowBIML

This is the power of BIML in action.