Tracing Foreign Key Dependencies In SQL Server

In my latest post on using BIML, I created a table called MigrationStep and populated it with data. Now I want to show how exactly I generated that data and how you can find foreign key constraints.

Let’s use the AdventureWorksDW2012 database as an example and look at the thirteen tables we identified as being related to DIMCustomer in some way. Note that my diagram is a bit of a post hoc analysis—I already know which tables are involved—but our solution doesn’t depend upon this level of foreknowledge.

AdventureWorksDWDiagram

Looking at this diagram, we can see that DimCustomer has “parent” tables and “child” tables. By this, I mean that DimCustomer has foreign key constraints leading to other (parent) tables, and there are tables (children) which have foreign key constraints leading back to DimCustomer. In particular, DimCustomer has a GeographyKey which ties back to DimGeography, making DimGeography a “parent” table for DimCustomer. If we want to load a row into DimCustomer, we need to load that row’s GeographyKey into DimGeography first.

In addition, there are two tables which have CustomerKey attributes which tie back to DimCustomer: FactInternetSales and FactSurveyResponse. Looking at FactInternetSales, there are a number of keys: DimProduct, DimPromotion, DimCurrency, etc. We need to load all of these dimensions before we can populate FactInternetSales…but before we can load DimProduct, we need to load DimProductCategory and then DimProductSubcategory.

With this in mind, I want to use SQL Server metadata to determine all of the tables which tie directly to some base table. I’m going to start out with a big blob of code and then I’ll explain it in detail.

DECLARE
	@BaseSchemaName SYSNAME = N'dbo',
	@BaseTableName SYSNAME = N'DimCustomer',
	@BasePrimaryKeyName SYSNAME = N'CustomerKey';

WITH Predecessors(SchemaName, TableName, TableID, Ordinal, Path, JoinCriteria, WhereCriteria, LookupColumns, LookupColumnXML) AS
(
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		0 AS Ordinal,
		CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
	WHERE
		t.type = 'U'
		AND t.name = @BaseTableName
		AND OBJECT_SCHEMA_NAME(t.object_id) = @BaseSchemaName
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Predecessors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
),
Successors(SchemaName, TableName, TableID, Ordinal, Path, NumberOfMoves, CurrentDirection, JoinCriteria, WhereCriteria, LookupColumns, LookupColumnXML) AS
(
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		0 AS Ordinal,
		CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
		CAST(0 AS INT) AS NumberOfMoves,
		'' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
	WHERE
		t.type = 'U'
		AND t.name = @BaseTableName
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal + 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END AS NumberOfMoves,
		'D' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.parent_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Successors tt
			ON f.referenced_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
		--We can change direction twice:  once to go "down" the tree to get child tables, and once more to get those
		--child tables' parents.  We don't need any more moves than that; this way, we get the minimum number of tables
		--necessary to populate our base table.
		AND NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END < 2
	UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		NumberOfMoves + CASE WHEN CurrentDirection = 'U' THEN 0 ELSE 1 END AS NumberOfMoves,
		'U' AS CurrentDirection,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							', x.' + c.name
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 2, ''
				) AS VARCHAR(8000))
		) AS LookupColumns,
		(
			SELECT
				CAST(STUFF
				(
					(
						SELECT
							'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
						FROM sys.key_constraints kc
							INNER JOIN sys.index_columns ic
								ON kc.parent_object_id = ic.object_id
								AND kc.unique_index_id = ic.index_id
							INNER JOIN sys.columns c
								ON ic.column_id = c.column_id
								AND ic.object_id = c.object_id
						WHERE
							kc.type = 'PK'
							AND kc.parent_object_id = t.object_id
						FOR XML PATH ('')
					), 1, 0, ''
				) AS VARCHAR(8000))
		) AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Successors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
		--No check here for NumberOfMoves because we want to be able to always move up the tree to find foreign key constraints.
)
SELECT
	a.SchemaName,
	a.TableName,
	MAX(Ordinal) AS Section,
	a.JoinCriteria,
	MAX(a.Path) AS Path,
	--We can hardcode DISTINCT x.* here, but there are some tables which don't need the DISTINCT
	'DISTINCT x.*' AS SelectCriteria,
	a.WhereCriteria,
	--We can hardcode 0 here because all tables have primary key constraints
	0 AS NeedToDelete,
	a.LookupColumns,
	--A side effect of FOR XML Path is that < becomes &lt; and > becomes &gt;, so we need to fix those
	REPLACE(REPLACE(a.LookupColumnXML, '&lt;', '<'), '&gt;', '>')
FROM
(
	SELECT
		p.SchemaName,
		p.TableName,
		p.TableID,
		p.Ordinal,
		p.JoinCriteria,
		p.Path,
		p.WhereCriteria,
		p.LookupColumns,
		p.LookupColumnXML
	FROM Predecessors p

	UNION ALL

	SELECT
		s.SchemaName,
		s.TableName,
		s.TableID,
		s.Ordinal,
		s.JoinCriteria,
		s.Path,
		s.WhereCriteria,
		s.LookupColumns,
		s.LookupColumnXML
	FROM Successors s
		LEFT OUTER JOIN Predecessors p
			ON s.TableID = p.TableID
	WHERE
		p.TableID IS NULL
		AND s.Path LIKE '%/' + @BaseSchemaName + '.' + @BaseTableName + '/%'
) a
GROUP BY
	a.SchemaName,
	a.TableName,
	a.JoinCriteria,
	a.WhereCriteria,
	a.LookupColumns,
	a.LookupColumnXML
ORDER BY
	Section,
	SchemaName,
	TableName;

Yeah, that’s 370 lines of code. Don’t worry—we’ll walk through it piece by piece.

My first code segment is the Predecessors CTE. This is a recursive common table expression, so we’ll break it apart into bite-sized pieces. Let’s start with the base case of our recursive CTE and hide the LookupColumns and LookupColumnsXML work:

SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	0 AS Ordinal,
	CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'TODO' AS LookupColumns,
	'TODO' AS LookupColumnXML
FROM sys.tables t
WHERE
	t.type = 'U'
	AND t.name = @BaseTableName
	AND OBJECT_SCHEMA_NAME(t.object_id) = @BaseSchemaName

This query looks pretty simple: look for a table named [@BaseSchemaName].[@BaseTableName]. In our example, that will translate to dbo.DimCustomer. Once we have that, we select the schema and table names, the table ID (for easy comparison), and start setting some values. Ordinal is a parameter which tells us how deep we are in the recursion chain. This is the base case, so we’re at level 0. I also generate a path to tell us the exact chain we followed to get to this point. In this case, it’s just /dbo.DimCustomer/. I put the slashes in to separate levels, and needed to cast as VARCHAR(8000) because all parts of my recursive CTE must use the exact same table types, including data sizes. For the WHERE clause, I’m assuming my table will be aliased as “x” and has my base primary key. If your base table has a composite key…well, good luck with the modifications…

That part was pretty easy, so let’s look at LookupColumns and LookupColumnsXML. You’ll see that they’re both really the same thing, just in a slightly different form:

(
	SELECT
		CAST(STUFF
		(
			(
				SELECT
					', x.' + c.name
				FROM sys.key_constraints kc
					INNER JOIN sys.index_columns ic
						ON kc.parent_object_id = ic.object_id
						AND kc.unique_index_id = ic.index_id
					INNER JOIN sys.columns c
						ON ic.column_id = c.column_id
						AND ic.object_id = c.object_id
				WHERE
					kc.type = 'PK'
					AND kc.parent_object_id = t.object_id
				FOR XML PATH ('')
			), 1, 2, ''
		) AS VARCHAR(8000))
) AS LookupColumns,
(
	SELECT
		CAST(STUFF
		(
			(
				SELECT
					'<Column SourceColumn="' + c.name + '" TargetColumn="' + c.name + '" />'
				FROM sys.key_constraints kc
					INNER JOIN sys.index_columns ic
						ON kc.parent_object_id = ic.object_id
						AND kc.unique_index_id = ic.index_id
					INNER JOIN sys.columns c
						ON ic.column_id = c.column_id
						AND ic.object_id = c.object_id
				WHERE
					kc.type = 'PK'
					AND kc.parent_object_id = t.object_id
				FOR XML PATH ('')
			), 1, 0, ''
		) AS VARCHAR(8000))
) AS LookupColumnXML

I’ll focus on LookupColumns for this description. What I’m doing here is finding all of the columns associated with this table’s primary key and turning it into a comma-delimited list with the FOR XML PATH trick. This will create a string like “, x.Column1, x.Column2, x.Column3” and then I use the STUFF function to strip off the first two characters, leaving “x.Column1, x.Column2, x.Column3” for us.

LookupColumnXML is basically the same thing, except that instead of a comma-delimited list, I’m generating .

Having walked through this, let’s now look at the second half of the Predecessors recursive CTE.

UNION ALL
	SELECT
		OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
		OBJECT_NAME(t.object_id) AS TableName,
		t.object_id AS TableID,
		tt.Ordinal - 1 AS Ordinal,
		CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
		CAST('' AS VARCHAR(8000)) AS JoinCriteria,
		'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
		'LOOKED AT' AS LookupColumns,
		'LOOKED AT' AS LookupColumnXML
	FROM sys.tables t
		INNER JOIN sys.foreign_keys f
			ON f.referenced_object_id = t.object_id
			AND f.parent_object_id <> f.referenced_object_id
		INNER JOIN Predecessors tt
			ON f.parent_object_id = tt.TableID
	WHERE
		t.type = 'U'
		AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'

We’ve already looked at how LookupColumns and LookupColumnXML get populated, and it’s the same throughout the entire code base, so I’m going to replace those references with “LOOKED AT” to simplify the code.

In this code snippet, we have the recursive case. Once we look at our base table, we want to see any tables to which our base table links, followed by any tables to which those tables link, and so on until we run out of links. The way that we do this is in the FROM clause: we join sys.tables to sys.foreign_keys to get any tables in which the parent object ID is not the referenced object ID (to skip self joins), and then we join to the Predecessors common table expression to form the chain. In our concrete scenario, DimCustomer has a foreign key to DimGeography, so the second round of recursion would bring in that table. DimGeography has a foreign key to DimSalesTerritory, so the third round of recursion would bring that table in. We aren’t limited to bringing in just one table per round, but given this particular data model, that’s all we get.

In the SELECT clause, get the schema and table information (SchemaName, TableName, and TableID) the same as in the base case. After that, several things have changed. First of all, I decremented the Ordinal value. This lets us build a number line for precedence: I know that DimSalesTerritory precedes DimGeography, which itself precedes DimCustomer. Thus, I could assign DimCustomer a value of 0, DimGeography a value of -1, and DimSalesTerritory a value of -2. If I have two tables at the same level, it means that I can load records into both concurrently without violating foreign key constraints, assuming I have already loaded the appropriate records for any tables with earlier levels.

Next up is Path, in which I append the current schema and table name to the previous recursion level’s Path. Path is important here because relational database keys can be expressed as a directed cyclical graph, but recursion in a cycle leads to stack overflows. If I have tables A, B, and C, in which A has a key to B, which has a key to C, which has a key back to A, my recursion would build up A, B, C, A, B, C, etc. I don’t want any cycles in my report (not only to protect against stack overflows, but also because I’m interested in the shortest path to a table, and that necessarily precludes a cycle.

Once I take care of path, I set my WhereCriteria value. The “c.” is going to be the base table alias that I plan to use. Because every table in this set eventually ties back to the base table, I can build a valid set of joins to filter every table to include only the records I care about.

So, that’s the Predecessors common table expression, which takes care of any dependencies for our base table. From there, we want to branch down to the children of our base table, as well as those tables’ children, etc. To do this, I use the Successors common table expression. You’ll note that the Successors common table expression is very similar to Predecessors, but includes just a little bit more. Just like with Predecessors, let’s start with the base case.

SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	0 AS Ordinal,
	CAST('/' + @BaseSchemaName + '.' + @BaseTableName + '/' AS VARCHAR(8000)) AS Path,
	CAST(0 AS INT) AS NumberOfMoves,
	'' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'x.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
WHERE
	t.type = 'U'
	AND t.name = @BaseTableName

This base case is almost exactly the same as Predecessors. The two differences are the attributes NumberOfMoves and CurrentDirection. For our base case, we don’t have a direction (either “up” the chain to parents or “down” the chain to children), and we haven’t made any directional changes yet, so NumberOfMoves is 0.

Let’s look at the next part of this CTE:

UNION ALL
SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	tt.Ordinal + 1 AS Ordinal,
	CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
	NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END AS NumberOfMoves,
	'D' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
	INNER JOIN sys.foreign_keys f
		ON f.parent_object_id = t.object_id
		AND f.parent_object_id <> f.referenced_object_id
	INNER JOIN Successors tt
		ON f.referenced_object_id = tt.TableID
WHERE
	t.type = 'U'
	AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
	--We can change direction twice:  once to go "down" the tree to get child tables, and once more to get those
	--child tables' parents.  We don't need any more moves than that; this way, we get the minimum number of tables
	--necessary to populate our base table.
	AND NumberOfMoves + CASE WHEN CurrentDirection = 'D' THEN 0 ELSE 1 END < 2

This part selects the “children” of our base table, as well as those tables’ children, and so on. Thus, CurrentDirection is “down” the tree. To understand NumberOfMoves, let me walk through this concrete example. Remember that DimCustomer is the base table and has an Ordinal of 0. There are two tables which have CustomerKeys which tie back to DimCustomer: FactInternetSales and FactSurveyResponse. At this point, we still have the base case’s CurrentDirection, which is ”. Because that doesn’t match ‘D’, we set NumberOfMoves equal to the base case’s number of moves (0) plus 1. We have changed direction once at this point.

Now, let’s go to the next level. FactInternetSales has one child table: FactInternetSalesReason. This table will be picked up in the third run of this CTE, so its ordinal will be 2, and we check number of moves again. Because the CurrentDirection for FactInternetSales is D and we’re still going down the chain, NumberOfMoves = parent number of moves (1) + 0, or 1. We have not changed direction; we’ve simply moved down the chain.

This is great, but leads to a problem: FactInternetSalesReason has a relationship with FactInternetSales, but also one to DimSalesReason. We don’t have DimSalesReason in our set anywhere, so trying to insert solely into the tables we’ve gathered so far will fail! That’s where the next part of the Successors CTE comes in:

UNION ALL
SELECT
	OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
	OBJECT_NAME(t.object_id) AS TableName,
	t.object_id AS TableID,
	tt.Ordinal - 1 AS Ordinal,
	CAST(Path + '/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/' AS VARCHAR(8000)) AS Path,
	NumberOfMoves + CASE WHEN CurrentDirection = 'U' THEN 0 ELSE 1 END AS NumberOfMoves,
	'U' AS CurrentDirection,
	CAST('' AS VARCHAR(8000)) AS JoinCriteria,
	'c.' + @BasePrimaryKeyName + ' = ?' AS WhereCriteria,
	'LOOKED AT' AS LookupColumns,
	'LOOKED AT' AS LookupColumnXML
FROM sys.tables t
	INNER JOIN sys.foreign_keys f
		ON f.referenced_object_id = t.object_id
		AND f.parent_object_id <> f.referenced_object_id
	INNER JOIN Successors tt
		ON f.parent_object_id = tt.TableID
WHERE
	t.type = 'U'
	AND Path NOT LIKE '%/' + OBJECT_SCHEMA_NAME(t.object_id) + '.' + OBJECT_NAME(t.object_id) + '/%'
	--No check here for NumberOfMoves because we want to be able to always move up the tree to find foreign key constraints.

Now we’re moving “up” the tree to find any parents. For FactInternetSalesReason, we’d move back up to see DimSalesReason (remember that we won’t see FactInternetSales again because the Path check forbids cycles). With this moving up the tree, we change the current direction from ‘D’ to ‘U’ and increment the number of directional changes if we were headed downstream or are starting out at the base.

Note my comment at the bottom of this CTE: I do not check the number of directional changes because I always want to be able to drive up through the tree to include any tables required for loading. This leads to a final question on the topic: why check for number of moves at all? The reason for that is to prevent “filling the tree,” as I’ll call it. I might start from DimCustomer and drive down to FactInternetSales. From there, we need to go up the tree and load DimProduct. If I had an unlimited number of up-and-down moves, the recursive CTE would then move down the tree from DimProduct into FactProductInventory. But nothing in FactProductInventory really ties back to a particular customer—I don’t need that table loaded. This is why I allow one downward cut but all the upward movement the recursive CTE needs.

With all of this in mind, my final query starts with a subquery:

SELECT
	p.SchemaName,
	p.TableName,
	p.TableID,
	p.Ordinal,
	p.JoinCriteria,
	p.Path,
	p.WhereCriteria,
	p.LookupColumns,
	p.LookupColumnXML
FROM Predecessors p

UNION ALL

SELECT
	s.SchemaName,
	s.TableName,
	s.TableID,
	s.Ordinal,
	s.JoinCriteria,
	s.Path,
	s.WhereCriteria,
	s.LookupColumns,
	s.LookupColumnXML
FROM Successors s
	LEFT OUTER JOIN Predecessors p
		ON s.TableID = p.TableID
WHERE
	p.TableID IS NULL
	AND s.Path LIKE '%/' + @BaseSchemaName + '.' + @BaseTableName + '/%'

Yeah, this could be another common table expression, but for the purposes of demonstration, I wanted to separate my two recursive CTEs from the mechanics of combining results together.

There are two interesting bits in this subquery, both on the Successors side of the union. First of all, I join Successors back to Predecessors and filter out any tables which exist in Predecessors. This has one important use: the way that I derive my final Section number is to check the MAX(Ordinal). If I have a table in Predecessors, that number should be negative, but if it shows up in Successors, it will be positive. If the number is positive, then a process would try to load my base table’s parent table only after trying to load the base table! This would cause a foreign key constraint violation, so instead I have to take care of this scenario. As a note, this actually does appear in the data set: DimSalesTerritory is in the Predecessors list as well as the Successors list (where it ties to FactInternetSales).

My other interesting part is that I force Path to include my base table. As an aside, this is technically unnecessary, but just in case somebody starts monkeying with the recursive CTE and introduces invalid rows, this is a weak filter.

That all forms my base table, which I alias as “a.” From there, I have my final SELECT, GROUP BY, and ORDER BY clauses:

SELECT
	a.SchemaName,
	a.TableName,
	MAX(Ordinal) AS Section,
	a.JoinCriteria,
	MAX(a.Path) AS Path,
	--We can hardcode DISTINCT x.* here, but there are some tables which don't need the DISTINCT
	'DISTINCT x.*' AS SelectCriteria,
	a.WhereCriteria,
	--We can hardcode 0 here because all tables have primary key constraints
	0 AS NeedToDelete,
	a.LookupColumns,
	--A side effect of FOR XML Path is that < becomes &lt; and > becomes &gt;, so we need to fix those
	REPLACE(REPLACE(a.LookupColumnXML, '&lt;', '<'), '&gt;', '>')
FROM
(LOOKED AT) a
GROUP BY
	a.SchemaName,
	a.TableName,
	a.JoinCriteria,
	a.WhereCriteria,
	a.LookupColumns,
	a.LookupColumnXML
ORDER BY
	Section,
	SchemaName,
	TableName;

As noted above, my Section is the largest value of Ordinal. This way, in the event that a table shows up at two separate points, we can ensure that all predecessors are loaded. I also show the path, which lets us see how the table is connected. The rest of this is simply hard-coding values that I need or displaying the set of lookup columns.

One important note on this is that I do not have join criteria created. Given that I know the tables and keys, I could create a subquery in my CTEs which defines valid join criteria. But because I did this for a small data set, I decided to hand-craft my join criteria. If I ever need to do this for a larger table set, I’ll probably write the query to do this correctly… In any event, we now have a table which describes our foreign key relationships between tables. Of course, if you have a database without foreign key constraints, you can’t walk up and down the chain this way, so that’s yet another reason to populate those foreign keys, whether it be a normalized system or Kimball-style warehouse.

Advertisements

One thought on “Tracing Foreign Key Dependencies In SQL Server

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s