Not too long ago, I worked through an interesting issue with Azure Machine Learning. The question was, what’s the best way to read from Azure SQL Database, perform model processing, and then write results out to Azure SQL Database? Oh, by the way, I want to use a service principal rather than SQL authentication. Here’s what I’ve got.
Set up a Service Principal
I’m going to assume that you already have Azure SQL Database and the Azure Machine Learning workspace set up. In the Azure search bar, search for App Registrations and then select + New registration.
Name the account whatever you’d like. I called mine
AzureMLDatabaseAccess to keep its purpose clear. On the Overview page, you will see some important information:
You’ll want to copy the Application ID and the Directory ID. And note that each one of these has a name in parentheses. As we all know, two extremely common names is better than one extremely common name.
After you copy those down, navigate to the Certificates & secrets tab on the Manage menu and select + New client secret.
Name your client secret. I happened to name mine
DatabaseConnection. This generates a Value which you can use as a password. Copy this down as well, as you’ll need it later and this is your one chance to see it. Unless you delete this secret and create a new one, I guess.
Tie the Service Principal to Azure SQL Database
Now we need to make this app registration mean something. To do that, we need to connect to Azure SQL Database and tie the app registration to our database. We can do that by running a code snippet like the following:
CREATE USER [AzureMLDatabaseAccess] FROM EXTERNAL PROVIDER
Where naturally, AzureMLDatabaseAccess is the name of your app registration. Then, you can grant whatever SQL Server rights you need, such as adding this app registration to specific roles, all the way up to granting it
db_owner because what’s the worst that could happen, amirite?
Oh, by the way, make sure you’re connected to the database using Azure Active Directory. Otherwise, you’ll get an error like this:
If you’re like me and you don’t have an Azure Active Directory tenant set up, you can connect to the SQL server (and not the SQL database), navigate to Active Directory admin in the Settings menu, select Set admin, and choose your app registration. Be careful with this, however—that app registration can now do anything on the server, so don’t go sharing details all willy-nilly.
Save that and you’re golden. Now let’s go to Azure Machine Learning.
Configure the Azure Machine Learning Datastore
Navigate to Azure Machine Learning Studio and select your workspace. Then, in the Manage menu, select Datastores. Select + New datastore to create a new datastore.
I’m going to call this datastore
expense-reports because that’s the name of my database. I’ll choose Azure SQL database as the Datastore type and select my subscription and database. Change the Authentication type to Service principal and you’ll need to enter a Tenant ID, Client ID, and Client secret. Those are the three things I just happened to have you write down. It’s great how we sometimes luck out that way. With all of that done, we can create the datastore and move on.
The good news is that we have a datastore. The bad news is that we have no idea if it worked. So let’s figure that out. Navigate to Datasets in the Assets menu and choose + Create dataset and then From datastore.
I’m going to call my dataset
expense-reports because that’s what it is. Keep the Dataset type as Tabular and move to the next section.
In this section, I’ll select Previously created datastore and pick the expense_reports datastore. Then, choose Select datastore. As a quick note, I want to make sure that the Skip data validation box is unchecked. We absolutely want to validate the data, as we’re using this test to make sure that our app registration is set up correctly.
I can then enter my SQL query and select Next to continue.
If your security setup is correct, you’ll be able to continue. Otherwise, you’ll see an error like the following:
The real error message is in the More details link. In this case, my error code was 40615. Check out the guide for troubleshooting connectivity errors in Azure SQL Database and Azure SQL Managed Instance and 40615 is a firewall issue. This one’s pretty easy to fix. Return to the SQL server and navigate to Firewalls and virtual networks in the Security menu. The easy button here is to allow Azure services and resources to access the server. The hard button is to figure out the relevant IP addresses that Azure Machine Learning will use and add those. I voted for easy.
After saving and updating the firewall rule, give it a few minutes to propagate and try again. If you paid your Azure bill and included the
bribes special payments, you should get your dataset preview.
Now that we have this in place, let’s test it all out.
Building a Notebook
For this last part, we’ll want to have a table in Azure SQL Database to store predictions. Here’s the table creation statement:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[ExpenseReportPrediction]( [ExpenseReportPredictionID] [int] IDENTITY(1,1) NOT NULL, [EmployeeID] [int] NOT NULL, [EmployeeName] [nvarchar](50) NOT NULL, [ExpenseCategoryID] [tinyint] NOT NULL, [ExpenseCategory] [nvarchar](50) NOT NULL, [ExpenseDate] [date] NOT NULL, [ExpenseYear] [int] NOT NULL, [Amount] [decimal](5, 2) NOT NULL, [PredictedAmount] [decimal](5, 2) NOT NULL ) ON [PRIMARY] GO ALTER TABLE [dbo].[ExpenseReportPrediction] ADD CONSTRAINT [PK_ExpenseReportPrediction] PRIMARY KEY CLUSTERED ( [ExpenseReportPredictionID] ASC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] GO
And here’s a picture of the output:
In this post, we learned a few things.
- It is possible to read from Azure SQL Database in Azure Machine Learning. It’s pretty easy, even.
- It is not easy at all to write to Azure SQL Database from Azure Machine Learning.
- Service Principals are pretty cool.