In my last post on the topic, we walked through the BIML necessary to generate an SSIS package. The end result is useful and allows us to treat SSIS packages as binary code rather than source code, but the real power to BIML is in automation. In this case, I want to automate every relevant table in my database.
From here on out, I’m going to use AdventureWorksDW2012 as my warehouse. My local SQL Server instance has two databases: AdventureWorksDW2012 and AdventureWorksDW2012Node2. AdventureWorksDW2012Node2 is the same as the main instance except that I only have database structure and not any data. That way, we can migrate data from the primary node to Node2.
In our scenario, we want to move data for a customer from one node to another. This drives from DimCustomer.CustomerKey. To insert a DimCustomer record, we need DimGeography populated, and to populate that, we need to load DimSalesTerritory. The three facts which tie to DimCustomer are FactInternetSales, FactInternetSalesReason, and FactSurveyResponse. To get those facts’ data in, we need to have valid information for DimProduct, DimDate, DimPromotion, DimCurrency, and DimSalesReason. Before I can load DimProduct, I need to load DimProductCategory and DimProductSubcategory. All in all, this means that I need to migrate data from 13 separate tables to migrate customer data to another node.
To maintain this information and know in which order I need to load each table, I’ve created a table called MigrationStep. In a future post, I’ll describe how to populate MigrationStep so we can keep this table in sync if the underlying fact-dimension structure changes. For now, I’ll offer this as a T-SQL script to create and populate the table:
USE [AdventureWorksDW2012] GO CREATE TABLE [dbo].[MigrationStep] ( [MigrationStepID] [int] IDENTITY(1,1) NOT NULL, [SchemaName] [sysname] NOT NULL, [TableName] [sysname] NOT NULL, [Section] [int] NULL, [JoinCriteria] [nvarchar](2000) NULL, [SelectCriteria] [nvarchar](2000) NULL, [WhereCriteria] [nvarchar](100) NULL, [NeedToDelete] [bit] NOT NULL, [LookupColumns] [nvarchar](512) NULL, [LookupColumnXML] [nvarchar](1024) NULL, CONSTRAINT [PK_MigrationStep] PRIMARY KEY CLUSTERED ( [MigrationStepID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY] ) ON [PRIMARY] GO INSERT INTO dbo.MigrationStep ( SchemaName, TableName, Section, JoinCriteria, SelectCriteria, WhereCriteria, NeedToDelete, LookupColumns, LookupColumnXML ) VALUES ('dbo', 'DimProductCategory', -2, 'INNER JOIN dbo.DimProductSubcategory s ON s.ProductCategoryKey = x.ProductCategoryKey INNER JOIN dbo.DimProduct p ON p.ProductSubcategoryKey = s.ProductSubcategoryKey INNER JOIN dbo.FactInternetSales f ON p.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductCategoryKey', '<Column SourceColumn="ProductCategoryKey" TargetColumn="ProductCategoryKey" />'), ('dbo', 'DimSalesTerritory', -2, 'INNER JOIN dbo.DimGeography g ON x.SalesTerritoryKey = g.SalesTerritoryKey INNER JOIN dbo.DimCustomer c ON g.GeographyKey = c.GeographyKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesTerritoryKey', '<Column SourceColumn="SalesTerritoryKey" TargetColumn="SalesTerritoryKey" />'), ('dbo', 'DimProductSubcategory', -1, 'INNER JOIN dbo.DimProduct p ON p.ProductSubcategoryKey = x.ProductSubcategoryKey INNER JOIN dbo.FactInternetSales f ON p.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductSubcategoryKey', '<Column SourceColumn="ProductSubcategoryKey" TargetColumn="ProductSubcategoryKey" />'), ('dbo', 'DimGeography', -1, 'INNER JOIN dbo.DimCustomer c ON x.GeographyKey = c.GeographyKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.GeographyKey', '<Column SourceColumn="GeographyKey" TargetColumn="GeographyKey" />'), ('dbo', 'DimCurrency', 0, 'INNER JOIN dbo.FactInternetSales f ON x.CurrencyKey = f.CurrencyKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.CurrencyKey', '<Column SourceColumn="CurrencyKey" TargetColumn="CurrencyKey" />'), ('dbo', 'DimCustomer', 0, '', 'DISTINCT x.*', 'x.CustomerKey = ?', 0, 'x.CustomerKey', '<Column SourceColumn="CustomerKey" TargetColumn="CustomerKey" />'), ('dbo', 'DimDate', 0, 'INNER JOIN dbo.FactInternetSales f ON x.DateKey = f.OrderDateKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', '1 = 1', 0, 'x.DateKey', '<Column SourceColumn="DateKey" TargetColumn="DateKey" />'), ('dbo', 'DimProduct', 0, 'INNER JOIN dbo.FactInternetSales f ON x.ProductKey = f.ProductKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.ProductKey', '<Column SourceColumn="ProductKey" TargetColumn="ProductKey" />'), ('dbo', 'DimPromotion', 0, 'INNER JOIN dbo.FactInternetSales f ON x.PromotionKey = f.PromotionKey INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.PromotionKey', '<Column SourceColumn="PromotionKey" TargetColumn="PromotionKey" />'), ('dbo', 'DimSalesReason', 1, 'INNER JOIN dbo.FactInternetSalesReason fsr ON x.SalesReasonKey = fsr.SalesReasonKey INNER JOIN dbo.FactInternetSales f ON fsr.SalesOrderNumber = f.SalesOrderNumber AND fsr.SalesOrderLineNumber = f.SalesOrderLineNumber INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesReasonKey', '<Column SourceColumn="SalesReasonKey" TargetColumn="SalesReasonKey" />'), ('dbo', 'FactInternetSales', 1, 'INNER JOIN dbo.DimCustomer c ON x.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesOrderNumber, x.SalesOrderLineNumber', '<Column SourceColumn="SalesOrderNumber" TargetColumn="SalesOrderNumber" /><Column SourceColumn="SalesOrderLineNumber" TargetColumn="SalesOrderLineNumber" />'), ('dbo', 'FactSurveyResponse', 1, 'INNER JOIN dbo.DimCustomer c ON x.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SurveyResponseKey', '<Column SourceColumn="SurveyResponseKey" TargetColumn="SurveyResponseKey" />'), ('dbo', 'FactInternetSalesReason', 2, 'INNER JOIN dbo.FactInternetSales f ON x.SalesOrderNumber = f.SalesOrderNumber AND x.SalesOrderLineNumber = f.SalesOrderLineNumber INNER JOIN dbo.DimCustomer c ON f.CustomerKey = c.CustomerKey', 'DISTINCT x.*', 'c.CustomerKey = ?', 0, 'x.SalesOrderNumber, x.SalesOrderLineNumber, x.SalesReasonKey', '<Column SourceColumn="SalesOrderNumber" TargetColumn="SalesOrderNumber" /><Column SourceColumn="SalesOrderLineNumber" TargetColumn="SalesOrderLineNumber" /><Column SourceColumn="SalesReasonKey" TargetColumn="SalesReasonKey" />');
The one thing I want to point out is that the line which loads DIMDate has a WHERE clause of 1 = 1—that’s the easier method to populate that dimension, as otherwise I would need to UNION together all of the date keys for all of the facts and dimensions which join to DIMDate, and it’s just simpler to move everything.
Now that we have a table, it’s time to integrate that into our BIML that we developed. Just like last time, I’ll give you the full file first and then walk through it piece by piece.
<#@ template language="C#" hostspecific="true" #> <#@ import namespace="System.Data" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <# string metadataConnectionString ="Provider=SQLNCLI10;Server=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;"; DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT SchemaName, TableName, Section, JoinCriteria, SelectCriteria, WhereCriteria, NeedToDelete, LookupColumns, LookupColumnXML FROM dbo.MigrationStep ORDER BY Section, SchemaName, TableName"); DataTable sections = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section"); DataTable sectionsDesc = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section DESC"); #> <PackageProjects> <PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive"> <Parameters> <Parameter Name="CustomerID" DataType="Int32">1234</Parameter> <Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter> <Parameter Name="OldWarehouseDatabase" DataType="String">AdventureWorksDW2012</Parameter> <Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter> <Parameter Name="NewWarehouseDatabase" DataType="String">AdventureWorksDW2012Node2</Parameter> <Parameter Name="PerformMigration" DataType="Boolean">True</Parameter> <Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter> </Parameters> <Packages> <Package PackageName="MigrateCustomer" /> </Packages> </PackageProject> </PackageProjects> <Connections> <Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" /> <Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012Node2;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" /> </Connections> <Packages> <Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive"> <Connections> <Connection ConnectionName="OldWarehouseNode"> <Expressions> <Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression> <Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression> </Expressions> </Connection> <Connection ConnectionName="NewWarehouseNode"> <Expressions> <Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression> <Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression> </Expressions> </Connection> </Connections> <Tasks> <Container Name="Delete From New Tables" ConstraintMode="Linear"> <Expressions> <Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression> </Expressions> <Tasks> <# foreach (DataRow row in sectionsDesc.Rows) { #> <Container Name="Delete From New Table <#=row[0]#>" ConstraintMode="Linear"> <Tasks> <# foreach (DataRow trow in tables.Rows) { //For this round, I want to make sure EVERY table has a Delete action. We'll fix that later... if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1)) {#> <ExecuteSQL Name="Prepare <#=trow[1]#>" ConnectionName="NewWarehouseNode"> <DirectInput> DELETE FROM [<#=trow[0]#>].[<#=trow[1]#>]; </DirectInput> </ExecuteSQL> <# } } #> </Tasks> </Container> <# } #> </Tasks> </Container> <Container Name="Migrate To New Table" ConstraintMode="Linear"> <Expressions> <Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression> </Expressions> <Tasks> <# foreach (DataRow row in sections.Rows) { #> <Container Name="Migrate To New Table <#=row[0]#>" ConstraintMode="Linear"> <Tasks> <# foreach (DataRow trow in tables.Rows) { if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0])) {#> <Dataflow Name="Migrate <#=trow[1]#>" DefaultBufferSize="104857600"> <Transformations> <OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode"> <DirectInput> SELECT <#=trow[4]#> FROM [<#=trow[0]#>].[<#=trow[1]#>] x <#=trow[3]#> WHERE <#=trow[5]#>; </DirectInput> <Parameters> <Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" /> </Parameters> </OleDbSource> <OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity=<# if(!Convert.ToBoolean(trow[6])) { #> "true" <# } else { #> "false" <# } #> UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000"> <ExternalTableOutput Table="[<#=trow[0]#>].[<#=trow[1]#>]"></ExternalTableOutput> </OleDbDestination> </Transformations> </Dataflow> <# } } #> </Tasks> </Container> <# } #> </Tasks> </Container> </Tasks> </Package> </Packages> </Biml>
Given 25 more lines of BIML, we now have a package which deletes from 13 tables and migrates data to 13 tables. The best part is that if I add more tables, I don’t need to modify this BIML script–I just generate a new set of packages from the BIML and I’m good to go. In contrast, adding tables the old-fashioned way involves modifying packages and can get time-consuming depending upon how many steps your workflow has.
At any rate, let’s look at the changes, starting at the top.
<#@ template language="C#" hostspecific="true" #> <#@ import namespace="System.Data" #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <# string metadataConnectionString ="Provider=SQLNCLI10;Server=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;"; DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT SchemaName, TableName, Section, JoinCriteria, SelectCriteria, WhereCriteria, NeedToDelete, LookupColumns, LookupColumnXML FROM dbo.MigrationStep ORDER BY Section, SchemaName, TableName"); DataTable sections = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section"); DataTable sectionsDesc = ExternalDataAccess.GetDataTable(metadataConnectionString, "SELECT DISTINCT Section FROM dbo.MigrationStep ORDER BY Section DESC"); #>
On the very first line, I tell BIML that my templating language is C#. After that, I use the “import” declaration to bring in the .NET framework’s System.Data namespace. By the way, this templating may look familiar if you’re used to classic ASP or ASP.NET Web Forms.
My first major block of code is another templated section. In this, I create a connection string to my AdventureWorksDW2012 node, as that’s where my MigrationStep table is. I create three separate DataTables, one to get all of the columns, one to get just a distinct list of Sections, and one to get a distinct list of Sections in descending order. Each Section here is a grouping of tables; I have different sections because of foreign key chain dependencies. With these sections, I’ll use the descending-order list to delete from blocks of tables at a time and the ascending-order list to insert into blocks of tables.
<PackageProjects> <PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive"> <Parameters> <Parameter Name="CustomerID" DataType="Int32">1234</Parameter> <Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter> <Parameter Name="OldWarehouseDatabase" DataType="String">AdventureWorksDW2012</Parameter> <Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter> <Parameter Name="NewWarehouseDatabase" DataType="String">AdventureWorksDW2012Node2</Parameter> <Parameter Name="PerformMigration" DataType="Boolean">True</Parameter> <Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter> </Parameters> <Packages> <Package PackageName="MigrateCustomer" /> </Packages> </PackageProject> </PackageProjects> <Connections> <Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" /> <Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=AdventureWorksDW2012Node2;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" /> </Connections> <Packages> <Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive"> <Connections> <Connection ConnectionName="OldWarehouseNode"> <Expressions> <Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression> <Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression> </Expressions> </Connection> <Connection ConnectionName="NewWarehouseNode"> <Expressions> <Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression> <Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression> </Expressions> </Connection> </Connections>
This next segment of code is exactly the same as in my first example. I’m setting up my project parameters, telling the project that there’s one package, and setting up a list of connections. Then, I start work on my one package, called MigrateCustomer. Once I get through this part, we get to the next crazy bit.
<Tasks> <Container Name="Delete From New Tables" ConstraintMode="Linear"> <Expressions> <Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression> </Expressions> <Tasks> <# foreach (DataRow row in sectionsDesc.Rows) { #> <Container Name="Delete From New Table <#=row[0]#>" ConstraintMode="Linear"> <Tasks> <# foreach (DataRow trow in tables.Rows) { //For this round, I want to make sure EVERY table has a Delete action. We'll fix that later... if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1)) {#> <ExecuteSQL Name="Prepare <#=trow[1]#>" ConnectionName="NewWarehouseNode"> <DirectInput> DELETE FROM [<#=trow[0]#>].[<#=trow[1]#>]; </DirectInput> </ExecuteSQL> <# } } #> </Tasks> </Container> <# } #> </Tasks> </Container>
These few lines of code generate all of our Delete containers and tasks. Inside the Tasks element, there is a templated snippet of code: foreach (DataRow row in sectionsDesc.Rows) {
. This snippet loops through sectionsDesc, which will be a set containing the values { 2, 1, 0, -1, -2 } based on our example. For each of these sections, we create a new container called “Delete From New Table [Section]” and which contains one task. Inside that Task element, we have a couple lines of C#. The first line loops through each of our tables. Then, for each of our tables, we hit the second line of C#: if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0]) && (Convert.ToBoolean(trow[6]) || 1 == 1))
. This IF block checks to see if the Section for our table matches the current Section. If so, we check to see if the NeedsDeleted flag is set to true…or if 1 = 1… Yeah, for this scenario, I’m going to force deletion on every table. Don’t worry—I’ll show you how to do this a better way later on in the series.
So, if we have a matching Section (i.e., our table belongs in this container) and if we pass the NeedsDeleted check, then we create an ExecuteSQL task which deletes all data from that table (as represented by columns 0 and 1 in the table row array). We can’t TRUNCATE because we have foreign key constraints, and remember that you really should be batching these deletes and making sure you don’t delete any data you still want hanging around. For our demo, Node2 is empty so I don’t feel bad about deleting everything, but you probably won’t have that same nonchalant attitude about your production environment.
One set of containers down; now we move on to the next.
<Container Name="Migrate To New Table" ConstraintMode="Linear"> <Expressions> <Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression> </Expressions> <Tasks> <# foreach (DataRow row in sections.Rows) { #> <Container Name="Migrate To New Table <#=row[0]#>" ConstraintMode="Linear"> <Tasks> <# foreach (DataRow trow in tables.Rows) { if (Convert.ToInt32(trow[2]) == Convert.ToInt32(row[0])) {#> <Dataflow Name="Migrate <#=trow[1]#>" DefaultBufferSize="104857600"> <Transformations> <OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode"> <DirectInput> SELECT <#=trow[4]#> FROM [<#=trow[0]#>].[<#=trow[1]#>] x <#=trow[3]#> WHERE <#=trow[5]#>; </DirectInput> <Parameters> <Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" /> </Parameters> </OleDbSource> <OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity="true" UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000"> <ExternalTableOutput Table="[<#=trow[0]#>].[<#=trow[1]#>]"></ExternalTableOutput> </OleDbDestination> </Transformations> </Dataflow> <# } } #> </Tasks> </Container> <# } #> </Tasks> </Container>
In this, our final real segment of code, I create another set of containers, one for each Section in ascending order—that is, { -2, -1, 0, 1, 2 }. This lets me insert rows for “parent” tables before inserting rows into “child” tables. That way, for example, I load DimProduct and DimCustomer before trying to load FactInternetSales.
For each of my section containers, I do another loop through the tables
variable. If the table belongs in this section, then I would create a data flow named “Migrate [TableName].” Of course, if you have two tables with the same name (but a different schema), you would probably need to change this to be “Migrate [SchemaName].[TableName]” but I didn’t have to worry about that with my table set.
Inside our data flow, we have an OLE DB Source named Old Node and an OLE DB Destination named New Node. The source connects back to our AdventureWorksDW2012 database and ties in with MigrationStep pretty closely: we use that table to tell our SELECT column list (trow[4]), our schema name (trow[0]), table name (trow[1]), join criteria (trow[3]), and WHERE clause (trow[5]). Looking back at the MigrationStep table, you’ll notice that all of our WHERE clauses look something like c.CustomerKey = ?;
. We need to fill in the parameter, so that’s what the Parameters element does, and we tie that back to the CustomerID project parameter.
Our OLE DB destination is pretty simple. BIML knows how to hook up my source and destination components, and as soon as I tell it that I need to connect to [SchemaName].[TableName] on AdventureWorksDW2012Node2, it can also connect the metadata together and load my columns list automatically.
Once I’m done, all I have to do is right-click the BIML file and select Generate SSIS packages. That creates a package which looks a bit like:
Inside each of those control flows, I see the following data flow:
This is the power of BIML in action.
4 thoughts on “BIML, Part 5—Expanding To Multiple Tables”