Azure ML: Reading from and Writing to Azure SQL DB

Not too long ago, I worked through an interesting issue with Azure Machine Learning. The question was, what’s the best way to read from Azure SQL Database, perform model processing, and then write results out to Azure SQL Database? Oh, by the way, I want to use a service principal rather than SQL authentication. Here’s what I’ve got.

Set up a Service Principal

I’m going to assume that you already have Azure SQL Database and the Azure Machine Learning workspace set up. In the Azure search bar, search for App Registrations and then select + New registration.

Register a new application.

Name the account whatever you’d like. I called mine AzureMLDatabaseAccess to keep its purpose clear. On the Overview page, you will see some important information:

Your details probably will not be blurred out. If they are, clean your glasses and try again.

You’ll want to copy the Application ID and the Directory ID. And note that each one of these has a name in parentheses. As we all know, two extremely common names is better than one extremely common name.

After you copy those down, navigate to the Certificates & secrets tab on the Manage menu and select + New client secret.

What’s the answer? It’s a secret.

Name your client secret. I happened to name mine DatabaseConnection. This generates a Value which you can use as a password. Copy this down as well, as you’ll need it later and this is your one chance to see it. Unless you delete this secret and create a new one, I guess.

Tie the Service Principal to Azure SQL Database

Now we need to make this app registration mean something. To do that, we need to connect to Azure SQL Database and tie the app registration to our database. We can do that by running a code snippet like the following:

CREATE USER [AzureMLDatabaseAccess] FROM EXTERNAL PROVIDER

Where naturally, AzureMLDatabaseAccess is the name of your app registration. Then, you can grant whatever SQL Server rights you need, such as adding this app registration to specific roles, all the way up to granting it db_owner because what’s the worst that could happen, amirite?

Oh, by the way, make sure you’re connected to the database using Azure Active Directory. Otherwise, you’ll get an error like this:

Well, that didn’t work out as I wanted.

If you’re like me and you don’t have an Azure Active Directory tenant set up, you can connect to the SQL server (and not the SQL database), navigate to Active Directory admin in the Settings menu, select Set admin, and choose your app registration. Be careful with this, however—that app registration can now do anything on the server, so don’t go sharing details all willy-nilly.

Usually it’s a good idea to avoid doing much of anything all willy-nilly.

Save that and you’re golden. Now let’s go to Azure Machine Learning.

Configure the Azure Machine Learning Datastore

Navigate to Azure Machine Learning Studio and select your workspace. Then, in the Manage menu, select Datastores. Select + New datastore to create a new datastore.

Create a new datastore.

I’m going to call this datastore expense-reports because that’s the name of my database. I’ll choose Azure SQL database as the Datastore type and select my subscription and database. Change the Authentication type to Service principal and you’ll need to enter a Tenant ID, Client ID, and Client secret. Those are the three things I just happened to have you write down. It’s great how we sometimes luck out that way. With all of that done, we can create the datastore and move on.

One filled-out datastore later.

The good news is that we have a datastore. The bad news is that we have no idea if it worked. So let’s figure that out. Navigate to Datasets in the Assets menu and choose + Create dataset and then From datastore.

Create a new dataset.

I’m going to call my dataset expense-reports because that’s what it is. Keep the Dataset type as Tabular and move to the next section.

In this section, I’ll select Previously created datastore and pick the expense_reports datastore. Then, choose Select datastore. As a quick note, I want to make sure that the Skip data validation box is unchecked. We absolutely want to validate the data, as we’re using this test to make sure that our app registration is set up correctly.

I can then enter my SQL query and select Next to continue.

This definitely looks like a query.

If your security setup is correct, you’ll be able to continue. Otherwise, you’ll see an error like the following:

This message is simultaneously verbose and empty.

The real error message is in the More details link. In this case, my error code was 40615. Check out the guide for troubleshooting connectivity errors in Azure SQL Database and Azure SQL Managed Instance and 40615 is a firewall issue. This one’s pretty easy to fix. Return to the SQL server and navigate to Firewalls and virtual networks in the Security menu. The easy button here is to allow Azure services and resources to access the server. The hard button is to figure out the relevant IP addresses that Azure Machine Learning will use and add those. I voted for easy.

Let those Azure services roam free.

After saving and updating the firewall rule, give it a few minutes to propagate and try again. If you paid your Azure bill and included the bribes special payments, you should get your dataset preview.

“Special payments” successful.

Now that we have this in place, let’s test it all out.

Building a Notebook

For this last part, we’ll want to have a table in Azure SQL Database to store predictions. Here’s the table creation statement:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ExpenseReportPrediction](
	[ExpenseReportPredictionID] [int] IDENTITY(1,1) NOT NULL,
	[EmployeeID] [int] NOT NULL,
    [EmployeeName] [nvarchar](50) NOT NULL,
	[ExpenseCategoryID] [tinyint] NOT NULL,
    [ExpenseCategory] [nvarchar](50) NOT NULL,
	[ExpenseDate] [date] NOT NULL,
    [ExpenseYear] [int] NOT NULL,
    [Amount] [decimal](5, 2) NOT NULL,
	[PredictedAmount] [decimal](5, 2) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ExpenseReportPrediction] ADD  CONSTRAINT [PK_ExpenseReportPrediction] PRIMARY KEY CLUSTERED 
(
	[ExpenseReportPredictionID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

And here’s a picture of the output:

We have predictions.

Conclusion

In this post, we learned a few things.

  • It is possible to read from Azure SQL Database in Azure Machine Learning. It’s pretty easy, even.
  • It is not easy at all to write to Azure SQL Database from Azure Machine Learning.
  • Service Principals are pretty cool.