Suppose that you want to find all of the tables in a database which are related via foreign key constraint.  Here’s an easy example: suppose T3 has a foreign key (T2ID) to T2, and T2 has a foreign key (T1ID) to T1. Given a table name (T2, say), we want to return T2, as well as any tables to which T2 links, as well as any tables to which those tables link, etc. In our example, we would have two results: T2 and T1. If we selected T3, we would like to see T3, T2, and T1. If we selected T1, we would like to see only T1, because it does not have any foreign key relationships with other tables.

Let’s set up some tables and foreign keys to show what I mean:

create table T1(id int not null);
create table T2(id int not null, T1ID int not null);
create table T3(id int not null, T2ID int not null);

alter table T1 add constraint [PK_T1] primary key(id);
alter table T2 add constraint [PK_T2] primary key(id);
alter table T3 add constraint [PK_T3] primary key(id);

alter table T3 add constraint [FK_T3] foreign key(T2ID) references T2(id);
alter table T2 add constraint [FK_T2] foreign key(T1ID) references T1(id);

The following is a relatively simple query which does exactly what we want:

declare @SchemaName sysname = 'dbo';
declare @TableName sysname = 'T2';

with foreignkeys as
(
	select
		OBJECT_ID(@SchemaName + '.' + @TableName) as ObjectID,
		@SchemaName as SchemaName,
		@TableName as TableName,
		convert(varchar(8000), @SchemaName + '.' + @TableName) as [Path]

    UNION ALL

    select
        ref.object_id as ObjectID,
        refs.name as SchemaName,
        ref.name as TableName,
        convert(varchar(8000), fks.[Path] + '/' + refs.name + '.' + ref.name) as [Path]
    from
        sys.foreign_keys fk
        inner join sys.tables ref on fk.referenced_object_id = ref.object_id
        inner join sys.schemas refs on ref.schema_id = refs.schema_id
        inner join foreignkeys fks on fks.objectid = fk.parent_object_id
    where
        fks.objectid <> ref.object_id
        AND fks.[Path] NOT LIKE '%' + refs.name + '.' + ref.name + '%'
)
select distinct
    SchemaName + '.' + TableName
from
    foreignkeys;

If we run this script on T3, it will return three tables: T3, T2, and T1. If we run it on T2, it will return two tables: T2 and T1. If we run it on T1, it will return only T1. Now let’s explain how it works.

The foreignkeys common table expression is actually an example of a recursive CTE. Like all recursive calls, we need a base case and a recursion case. In this instance, the base case is the first part of the UNION ALL statement. It simply gets the declared table’s object ID and a base Path, which contains the value ‘dbo.T2’ for our example scenario.  Then, we combine with the recursive case.  The recursive case is a little more difficult.  It drives from sys.foreign_keys, a systemtable which lists all foreign key relationships.  We’re looking for the referenced objects:  the parent in your classic parent-child relationship.  But it’s important not to confuse “parent-child” parent with parent_object_id on sys.foreign_keys:  the parent_object_id actually relates to the “child” table!

What’s interesting in the bottom half of the CTE is the where clause.  We want to avoid self-join loops, so we add in fks.objectid <> ref.object_id.  But there’s another type of loop:  a multi-table loop.  Suppose that we have table A1 which joins to A2 which joins to A3 which joins to A4 which joins to A1.  As we traverse the path, we end up in an infinite loop:  A1 will take you to A2, which takes you to A3, which takes you to A4, which takes you to A1, which takes you to A2, and so on, ad infinitum.  That’s what the [Path] column helps us with:  if we see a joined table already on the path, we can ignore it.  So A1 will show up once, but the next time we get to A1 in our recursive CTE, that record gets filtered out, thereby killing the loop and saving us all from a never-ending query.

Finally, we take the distinct set of schema and table names, and mission accomplished.

But what if you also want to see the child tables?  Suppose that we’re back to T2 and we want to see T3 as well as T2 and T1.  How can we do that?  If your first answer is “another CTE,” you’re right!

declare @SchemaName sysname = 'dbo';
declare @TableName sysname = 'T2';

with foreignkeys as
(
	select
		OBJECT_ID(@SchemaName + '.' + @TableName) as ObjectID,
		@SchemaName as SchemaName,
		@TableName as TableName,
		convert(varchar(8000), @SchemaName + '.' + @TableName) as [Path]

    UNION ALL

    select
        ref.object_id as ObjectID,
        refs.name as SchemaName,
        ref.name as TableName,
        convert(varchar(8000), fks.[Path] + '/' + refs.name + '.' + ref.name) as [Path]
    from
        sys.foreign_keys fk
        inner join sys.tables ref on fk.referenced_object_id = ref.object_id
        inner join sys.schemas refs on ref.schema_id = refs.schema_id
        inner join foreignkeys fks on fks.objectid = fk.parent_object_id
    where
        fks.objectid &lt;&gt; ref.object_id
        AND fks.[Path] NOT LIKE '%' + refs.name + '.' + ref.name + '%'
),
parentsandchildren as
(
    select
        ObjectID,
        SchemaName,
        TableName,
        [Path]
    from
        foreignkeys

    UNION ALL

    select
        tbl.object_id as ObjectID,
        tbls.name as SchemaName,
        tbl.name as TableName,
        convert(varchar(8000), fks.[Path] + '/' + tbls.name + '.' + tbl.name) as [Path]
    from
        sys.foreign_keys fk
        inner join sys.tables tbl on fk.parent_object_id = tbl.object_id
        inner join sys.schemas tbls on tbl.schema_id = tbls.schema_id
        inner join parentsandchildren fks on fks.objectid = fk.referenced_object_id
    where
        fks.objectid &lt;&gt; tbl.object_id
        AND fks.[Path] NOT LIKE '%' + tbls.name + '.' + tbl.name + '%'

)
select distinct
    SchemaName + '.' + TableName
from
    parentsandchildren;

Most of this query looks the same as before, but now we have a second recursive CTE. The first recursive CTE still gets the set of parents. The second recursive CTE takes that set of parents and searches in sys.foreign_keys for child tables.  Once again, we use the Path to make sure we don’t get caught up in any loops. When we run this query, we get the expected results: a result set with the values T1, T2, and T3.

But let’s try a more complicated example, one with some cycles in it. This is a great example for this query, but if your relational database looks like this, you’re in trouble…

create table A1(id int not null, someint int);
create table A2(id int not null, someint int);
create table A3(id int not null, someint int);
create table A4(id int not null, someint int);
create table A5(id int not null, someint int);
create table A6(id int not null, someint int);
create table A7(id int not null, someint int);

alter table A1 add constraint [PK_A1] primary key(id);
alter table A2 add constraint [PK_A2] primary key(id);
alter table A3 add constraint [PK_A3] primary key(id);
alter table A4 add constraint [PK_A4] primary key(id);
alter table A5 add constraint [PK_A5] primary key(id);
alter table A6 add constraint [PK_A6] primary key(id);
alter table A7 add constraint [PK_A7] primary key(id);

alter table A6 add constraint [FK_A6_A1] foreign key(someint) references A1(id);
alter table A6 add constraint [FK_A6_A2] foreign key(someint) references A2(id);
alter table A6 add constraint [FK_A6_A7] foreign key(someint) references A7(id);
alter table A7 add constraint [FK_A7_A2] foreign key(someint) references A2(id);
alter table A7 add constraint [FK_A7_A3] foreign key(someint) references A3(id);
alter table A1 add constraint [FK_A1_A4] foreign key(someint) references A4(id);
alter table A1 add constraint [FK_A1_A6] foreign key(someint) references A6(id);
alter table A2 add constraint [FK_A2_A1] foreign key(someint) references A1(id);
alter table A2 add constraint [FK_A2_A3] foreign key(someint) references A3(id);
alter table A2 add constraint [FK_A2_A4] foreign key(someint) references A4(id);
alter table A3 add constraint [FK_A3_A5] foreign key(someint) references A5(id);
alter table A4 add constraint [FK_A4_A5] foreign key(someint) references A5(id);

Here is an ASCII graphical representation of the graph we created:

Key structure:
	X===>Y		X has a foreign key to Y
	X           X has a foreign key to Y
	\\              (top has a FK relation to bottom)
	 \\
	  Y

	  X			X has a foreign key to Y
	  ##			and Y has a foreign key to X
	   ##
	    Y

	   A6======>A7
	  ## \\    //\\
	 ##   \\  //  \\
	A1<=====A2====>A3
          \\  //       //
           \\//       //
            A4======>A5

In this case, we have a cycle:  A2 –> A1 –> A6 –> A2.  We also have a second cycle, from A2 –> A1 –> A6 –> A7 –> A2.  Even with these two cycles, the queries above work without getting caught up in a cycle.

Finally, here is a drop script for everything we’ve done so far:

drop table T3;
drop table T2;
drop table T1;

alter table A6 drop constraint [FK_A6_A1];
alter table A6 drop constraint [FK_A6_A2];
alter table A6 drop constraint [FK_A6_A7];
alter table A7 drop constraint [FK_A7_A2];
alter table A7 drop constraint [FK_A7_A3];
alter table A1 drop constraint [FK_A1_A4];
alter table A1 drop constraint [FK_A1_A6];
alter table A2 drop constraint [FK_A2_A1];
alter table A2 drop constraint [FK_A2_A3];
alter table A2 drop constraint [FK_A2_A4];
alter table A3 drop constraint [FK_A3_A5];
alter table A4 drop constraint [FK_A4_A5];

drop table A1;
drop table A2;
drop table A3;
drop table A4;
drop table A5;
drop table A6;
drop table A7;

2 thoughts on “Finding Foreign Key Chains

Leave a comment