BIML, Part 4—Implementing The Ideal Package

My last post in the BIML series introduced an SSIS package which would form the basis of our correct solution. Given this baseline, I want to write some BIML to generate our single package; that way, we can eventually expand our package to include more complex logic and solve the our problem.

For the rest of this series, I’m going to assume you have BIDS Helper installed. In Visual Studio, add a new BIML file by right-clicking your project and selecting the “Add New BIML File” option. This creates a nearly-empty BIML file named BimlScript.biml. Open that file and you’ll see the following code:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
</Biml>

To generate our SSIS package, we know that we need the following things:

  1. Seven package parameters for CustomerID, the old and new database connections, and flags.
  2. Two database connections, one called NewWarehouseNode and one called OldWarehouseNode.
  3. A sequence named “Delete From New Tables,” inside of which is an Execute SQL task named “Prepare Table” and which truncates our table.
  4. A sequence named “Migrate To New Table,” which has three sub-sequences: Migrate to New Table 1, 2, and 3. Inside 3, we have a data flow named “Migrate Table. Inside this data flow is a SQL Server source and a SQL Server destination. We would need to configure each of these components as well. On the data flow level, the max buffer count also needs to be set to 10,000 and the default buffer size to 100MB.

Armed with these requirements, we can get to work writing BIML. Before I go any further writing the BIML, here’s a script to generate two databases with a table named SomeTable, so that our BIML doesn’t fail and we can see our work in action:

IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE name = 'ReportingOld')
BEGIN
	CREATE DATABASE [ReportingOld]
END
GO
IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE name = 'ReportingNew')
BEGIN
	CREATE DATABASE [ReportingNew]
END
GO
USE [ReportingOld]
GO
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SomeTable')
BEGIN
	CREATE TABLE SomeTable
	(
		Id BIGINT IDENTITY(1,1) NOT NULL,
		CustomerID INT NOT NULL,
		Details VARCHAR(30) NOT NULL
	);

	INSERT INTO dbo.SomeTable(CustomerID, Details)
	VALUES
	(1234, 'ABC'),
	(1234, 'ABCD'),
	(1234, 'ABCDE'),
	(1234, 'ABCDEF'),
	(2345, 'ABC'),
	(2345, 'ABC'),
	(3456, 'ABC'),
	(4567, 'ABC');
END
GO
USE [ReportingNew]
GO
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SomeTable')
BEGIN
	CREATE TABLE SomeTable
	(
		Id BIGINT IDENTITY(1,1) NOT NULL,
		CustomerID INT NOT NULL,
		Details VARCHAR(30) NOT NULL
	);
END
GO

Here’s a full version of the BIML, followed by a breakdown in parts:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<PackageProjects>
		<PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive">
			<Parameters>
				<Parameter Name="CustomerID" DataType="Int32">1234</Parameter>
				<Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter>
				<Parameter Name="OldWarehouseDatabase" DataType="String">ReportingOld</Parameter>
				<Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter>
				<Parameter Name="NewWarehouseDatabase" DataType="String">ReportingNew</Parameter>
				<Parameter Name="PerformMigration" DataType="Boolean">True</Parameter>
				<Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter>
			</Parameters>
			<Packages>
				<Package PackageName="MigrateCustomer" />
			</Packages>
		</PackageProject>
	</PackageProjects>
	<Connections>
		<Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=ReportingOld;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
		<Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=ReportingNew;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
	</Connections>
	<Packages>
		<Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
			<Connections>
				<Connection ConnectionName="OldWarehouseNode">
					<Expressions>
						<Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression>
						<Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression>
					</Expressions>
				</Connection>
				<Connection ConnectionName="NewWarehouseNode">
					<Expressions>
						<Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression>
						<Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression>
					</Expressions>
				</Connection>
			</Connections>
			<Tasks>
				<Container Name="Delete From New Tables" ConstraintMode="Parallel">
					<Expressions>
						<Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression>
					</Expressions>
					<Tasks>
						<ExecuteSQL Name="Prepare Table" ConnectionName="NewWarehouseNode">
							<DirectInput>
								TRUNCATE TABLE dbo.SomeTable;
							</DirectInput>
						</ExecuteSQL>
					</Tasks>
				</Container>
				<Container Name="Migrate To New Table" ConstraintMode="Linear">
					<Expressions>
						<Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression>
					</Expressions>
					<Tasks>
						<Container Name="Migrate To New Table 1" ConstraintMode="Linear">
						</Container>
						<Container Name="Migrate To New Table 2" ConstraintMode="Linear">
						</Container>
						<Container Name="Migrate To New Table 3" ConstraintMode="Linear">
							<Tasks>
								<Dataflow Name="Migrate Table" DefaultBufferSize="104857600">
									<Transformations>
										<OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode">
											<DirectInput>
												SELECT * FROM dbo.SomeTable WHERE CustomerID = ?;
											</DirectInput>
											<Parameters>
												<Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" />
											</Parameters>
										</OleDbSource>
										<OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity="true" UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000">
											<ExternalTableOutput Table="[dbo].[SomeTable]">
											</ExternalTableOutput>
										</OleDbDestination>
									</Transformations>
								</Dataflow>
							</Tasks>
						</Container>
					</Tasks>
				</Container>
			</Tasks>
		</Package>
	</Packages>
</Biml>

Once you save this BIML file, right-click on BimlScript.biml and select “Generate SSIS Packages.” It will create MigrateCustomer.dtsx, which is our BIML-generated package. This package does what our original package did, and it takes us 92 lines of XML. That may not seem like a good trade-off right now, but the power to BIML is that with just a few lines of C# added in, we can create dozens or hundreds of data flows and packages without wearing our your button-clicking finger.

So now let’s break this BIML file down into components and explain them more thoroughly, starting with package projects.

<PackageProjects>
	<PackageProject Name="MigrateWarehouseData" ProtectionLevel="DontSaveSensitive">
		<Parameters>
			<Parameter Name="CustomerID" DataType="Int32">1234</Parameter>
			<Parameter Name="OldWarehouseServer" DataType="String">LOCALHOST</Parameter>
			<Parameter Name="OldWarehouseDatabase" DataType="String">ReportingOld</Parameter>
			<Parameter Name="NewWarehouseServer" DataType="String">LOCALHOST</Parameter>
			<Parameter Name="NewWarehouseDatabase" DataType="String">ReportingNew</Parameter>
			<Parameter Name="PerformMigration" DataType="Boolean">True</Parameter>
			<Parameter Name="PerformDeletion" DataType="Boolean">True</Parameter>
		</Parameters>
		<Packages>
			<Package PackageName="MigrateCustomer" />
		</Packages>
	</PackageProject>
</PackageProjects>

We have one package project named “MigrateWarehouseData” and without any sensitive data like usernames or passwords. In that project, we have a set of parameters, one for each project parameter we created in the previous blog post. We need to define data types and can provide default values as well. After that, we have a packages section which lists each package in our project. In this case, I only have one package: MigrateCustomer, which will create MigrateCustomer.dtsx. This way, my package can read project parameters.

Next up on the hit list is connections:

<Connections>
	<Connection Name="OldWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=ReportingOld;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
	<Connection Name="NewWarehouseNode" ConnectionString="Data Source=LOCALHOST;Initial Catalog=ReportingNew;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>

These are my common connections that I will use across all packages. There’s nothing special about this: it’s just a name and a connection string pointing ot my local SQL Server instance.

The next step is the Packages section, in which I define one package: MigrateCustomer.

<Packages>
	<Package Name="MigrateCustomer" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">

There are two properties I have set: ConstraintMode and ProtectionLevel. Just like my project parameters, I have no sensitive data stored, so I don’t need to protect the package with a password or user key. As for ConstraintMode, I have two options: Linear or Parallel. For this sample, I’m typically going to stick with Linear constraints, but there is one spot in which I choose to go Parallel. The difference is pretty simple: in Linear mode, SSIS draws constraint lines between components listed in the order that we include them in the BIML file, whereas in Parallel mode, all tasks within a container are allowed to run concurrently (whether they actually do or not is a totally different story!).

Inside each package, we have a set of connections. In this case, I want to take my connection strings as given in the Connections section, but I want to do something a little special. We’re collecting the real server and database names for our old and new warehouses, so let’s put that information to good use:

<Connections>
	<Connection ConnectionName="OldWarehouseNode">
		<Expressions>
			<Expression ExternalProperty="InitalCatalog">@[$Project::OldWarehouseDatabase]</Expression>
			<Expression ExternalProperty="ServerName">@[$Project::OldWarehouseServer]</Expression>
		</Expressions>
	</Connection>
	<Connection ConnectionName="NewWarehouseNode">
		<Expressions>
			<Expression ExternalProperty="InitialCatalog">@[$Project::NewWarehouseDatabase]</Expression>
			<Expression ExternalProperty="ServerName">@[$Project::NewWarehouseServer]</Expression>
		</Expressions>
	</Connection>
</Connections>

This snippet tells us that BIML can tie in with SSIS expressions. In this case, we are setting the InitialCatalog and ServerName properties on our connection. We can access project parameters the same way you would in the SSIS GUI: @[$Project::MyParameterName].

Now that we have the connections on our package, we can start laying out the control flow. All control flow tasks go in the Tasks section, and because our package has its constraint mode set to Linear, we will execute each top-level container one at a time. Let’s start by looking at the first container:

<Tasks>
	<Container Name="Delete From New Tables" ConstraintMode="Parallel">
		<Expressions>
			<Expression ExternalProperty="Disable">!(@[$Project::PerformDeletion])</Expression>
		</Expressions>
		<Tasks>
			<ExecuteSQL Name="Prepare Table" ConnectionName="NewWarehouseNode">
				<DirectInput>
					TRUNCATE TABLE dbo.SomeTable;
				</DirectInput>
			</ExecuteSQL>
		</Tasks>
	</Container>

This container will delete from new tables to avoid any primary key violations. Because it’s a toy example, I’m just going to truncate the table; a more realistic example would delete rows in batches, but I’m leaving that as an exercise for the reader.

One more thing to note before I move on is that we can have container-level expressions. In this case, I want to set the “Disable” flag to be equivalent to the opposite of my PerformDeletion project parameter. I personally don’t like how SSIS has properties named “Disable” rather than “Enable;” I think it confuses the matter a little bit, as you either need to have parameters with negative phrasing—such as “DoNotPerformDeletion”—or flip the boolean like I do here. Because “Enabled” makes a lot more sense to me intuitively, I’d rather let my end user have the equivalent of “Enabled” and handle the flip myself.

With that aside handled, we’ll move on to the next container. This one is a container of containers, and has a constraint mode of Linear, meaning that we run each sub-container one at a time.

<Container Name="Migrate To New Table" ConstraintMode="Linear">
	<Expressions>
		<Expression ExternalProperty="Disable">!(@[$Project::PerformMigration])</Expression>
	</Expressions>
	<Tasks>
		<Container Name="Migrate To New Table 1" ConstraintMode="Linear">
		</Container>
		<Container Name="Migrate To New Table 2" ConstraintMode="Linear">
		</Container>
		<Container Name="Migrate To New Table 3" ConstraintMode="Linear">
			<Tasks>
				<Dataflow Name="Migrate Table" DefaultBufferSize="104857600">

Just like before, I have my expression to check whether I want to perform this part of the process. Once we get past that, I have a Tasks flow for this container. Inside that, I have three sub-containers. Two of these containers are empty, but like I mentioned last time, it helps remind us that we need to think about multiple layers of operation—there are some tables we will need to migrate first and some we can hold off on until later.

Looking inside the third container, we have a data flow. This data flow is where we can set our default buffer size to 100 MB. Our default row count per buffer is already 10,000, so I don’t have to specify that property. Once I handle the properties, I move on to the Transformations, which is the data flow equivalent of control flow tasks. This is a relatively easy data flow, so as you’d expect, our Transformations section isn’t very complex.

					<Transformations>
						<OleDbSource Name="Old Node" ConnectionName="OldWarehouseNode">
							<DirectInput>
								SELECT * FROM dbo.SomeTable WHERE CustomerID = ?;
							</DirectInput>
							<Parameters>
								<Parameter Name="0" VariableName="MigrateWarehouseData.CustomerID" />
							</Parameters>
						</OleDbSource>
						<OleDbDestination Name="New Node" ConnectionName="NewWarehouseNode" BatchSize="10000" KeepIdentity="true" UseFastLoadIfAvailable="true" MaximumInsertCommitSize="2000000">
							<ExternalTableOutput Table="[dbo].[SomeTable]">
							</ExternalTableOutput>
						</OleDbDestination>
					</Transformations>
				</Dataflow>
			</Tasks>
		</Container>
	</Tasks>
</Container>

Inside this section, we start off with an OLE DB source which connects to OldWarehouseNode and gets all of the data from dbo.SomeTable where the Customer ID is set to the value that we have in the CustomerID project parameter. Note that the way we access project parameters from an OLE DB source is different than the way we specify them in an expression. We don’t need to have the @[$Project::] cruft, but at the same time, we need to specify the project name of MigrateWarehouseData.

Once data gets through the source, we push it through to the destination, which is connected to NewWarehouseNode and connects to that database’s SomeTable table. We set a few bulk insert settings, such as a 10K batch size, keeping identity (which is different from our SSIS example last time around), turning on fast load if it’s available, and setting a max commit size of 2 million rows.

One important thing to note here is that if the input columns for our OLE DB destination component have the same names as the columns on our table, we don’t need to specify any transformations. If we do, there’s a Column tag we can use in our OleDbDestination component to specify source and target column names, as well as sort order if it’s sorted.

Put all of this together and you have a way to describe an SSIS package. Even if you don’t have to do much more with this, I’d argue that you should still write the BIML because you can check this BIML file into source control and see changes. If I need to flip the KeepIdentity bit, I can make that change and check it in, letting code reviewers easily see what changed. Otherwise, SSIS packages themselves are little better than binaries: so many of the changes are difficult to read, even if you do literally nothing other than re-sizing boxes.

This is a pretty big step: right now, we have a BIML file that we can use to generate an SSIS package. From here, I want to take the next step and show how we can use the power of C# to generate packages dynamically. Stay tuned!

Advertisements

2 thoughts on “BIML, Part 4—Implementing The Ideal Package

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s