Upcoming Events: Charlotte BI Group

Key Details

What: Charlotte BI Group
Where: Microsoft Teams
When: Tuesday, August 4th.
You can RSVP via Meetup or just show up on Twitch.

What I’m Presenting

6:00 PM — 7:30 PM EDT — Forensic Accounting with SQL and R

I really enjoy this talk, as it lets me dig into the question of, “What does it mean to know your data?”

Upcoming Events: Data Platform Summit

Key Details

What: Data Platform Summit 2020
Where: Online
When: November 30th through December 8th
Register at the Data Platform Summit site.

What I’m Presenting

Monday, December 7th and Tuesday, December 8th, 12 PM through 4 PM EST — Data Analysis in Python and SQL

This is a paid post-event training. I promise to spend no more than about 20 minutes complaining about how data analysis is a lot easier to do in R… But I’m really looking forward to this—two separate 4-hour days digging into data analysis techniques using T-SQL as well as Python.

I’m also scheduled to present a regular session, though that hasn’t been announced yet, so stay tuned for that.

My Presentations at PASS Summit 2020

I have two sessions at PASS Summit 2020, one a full-day training and the other a general session.

SQL Server on the Edge: a Full-Day Training

I posted about this when Summit pre-cons were announced and the info is still good. The short version is, I’m going to take people through a real-ish IoT scenario involving SQL Server, .NET Core, Azure, and some IoT devices. We’ll see how things work, dive into Azure SQL Edge, and have some fun with Raspberry Pis along the way.

Here is the PASS Summit writeup if you’d like more information. And if you do decide to register for this full-day training, you’ll save $200 off of what it would have been had we attended PASS Summit in person.

The Curated Data Platform

I am really excited about this talk. Several years ago, I had a talk called Big Data, Small Data, and Everything in Between, and the idea of the talk was to walk through various data platform technologies and see where they fit.

That talk isn’t that out of date, but I decided to revamp it entirely, taking advantage of my insanity (er, dedication) as a Curator to give it a better name and a better theme.

The idea now is, let’s take a fictional but realistic company, walk through the types of data problems it experiences, and see which data platform technologies solve its problems, along with the biggest players in those spaces, and some reference architectures to boot.

The talk is currently under development and I plan to revise it a fair bit between now and Summit, but here’s a sneak peek of the agenda:

Register for PASS Summit

Registration is still open for PASS Summit 2020, so join me for the virtual event. And as a special offer, I’m giving away free hot takes for anyone who wants one. If that doesn’t seal the deal, I don’t know what would.

Upcoming Events: TriPASS Data Science

Key Details

What: TriPASS Data Science sub-group
Where: TriPASS on Twitch.
When: Tuesday, July 26th.
You can RSVP via Meetup or just show up on Twitch.

What I’m Presenting

6:00 PM — 7:30 PM EDT — IoT and Machine Learning in Azure

This won’t be a formal talk so much as a discussion of IoT strategies around Azure. I’ll talk about combining several Azure services, point out where the pain points are, and discuss a few alternative strategies for processing and analyzing data.

With ML Services, Watch Those Resource Groups

I wanted to cover something that has bitten me in two separate ways regarding SQL Server Machine Learning Services and Resource Governor.

Resource Governor and Default Memory

If you install a brand new copy of SQL Server and enable SQL Server Machine Learning Services, you’ll want to look at sys.resource_governor_external_resource_pools:

That’s a mighty fine cap you’re wearing.

By default, SQL Server caps R and Python scripts at 20% of available memory. The purpose of this limit is to prevent you from hurting server performance with expensive external scripts (like, say, training large neural networks on a SQL Server).
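
If you want to check these settings on your own instance, a query along the following lines should do it. This is a minimal sketch: the column names are the ones I believe the view exposes, so fall back to SELECT * if your version differs.

```sql
-- Inspect the external resource pools that govern R and Python scripts.
-- On a fresh install, expect a single pool named "default".
SELECT
    external_pool_id,
    name,
    max_cpu_percent,
    max_memory_percent,
    max_processes
FROM sys.resource_governor_external_resource_pools;
```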

Here’s the kicker: this affects you even if you don’t have Resource Governor enabled. If you see out-of-memory exceptions in Python or error messages about memory allocation in R, I’d recommend bumping this max memory percent up above 20, and I have scripts to help you with the job. Of course, making this change assumes that your server isn’t stressed to the breaking point; if it is, you might simply want to offload that work somewhere else.
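
For a rough idea of what that change looks like, here is a sketch that raises the cap on the default external pool. The value 50 is purely illustrative, not a recommendation, and this is not necessarily how my own scripts do it. One caveat, as I read the documentation: RECONFIGURE also turns Resource Governor on if it was off, so check that this is what you want before running it.

```sql
-- Raise the memory cap for external scripts on the default external pool.
-- 50 is an illustrative value; pick one that suits your server.
ALTER EXTERNAL RESOURCE POOL "default"
WITH (MAX_MEMORY_PERCENT = 50);

-- Apply the pending Resource Governor configuration change.
ALTER RESOURCE GOVERNOR RECONFIGURE;
```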

Resource Governor and CPU

Notice that by default, the max CPU percent for external pools is 100, meaning that we get to push the server to its limits with respect to CPU.

Well, what happens if you accidentally change that? I found out the answer the hard way!

In my case, our servers were accidentally scaled down to 1% max CPU utilization. The end result was that even something as simple as print("Hello") in either R or Python would fail after 30 seconds. I thought it had to do with the Launchpad service causing problems, but after investigation, this was the culprit.

Identities blurred to protect the innocent.
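
If you suspect the same problem, something like the following sketch can confirm it and put things back. It assumes the affected pool is the default external pool; adjust the pool name if yours is a custom one.

```sql
-- Look for external pools whose CPU cap has been lowered from 100.
SELECT name, max_cpu_percent
FROM sys.resource_governor_external_resource_pools
WHERE max_cpu_percent < 100;

-- Restore the cap on the default external pool (assumed to be the affected one).
ALTER EXTERNAL RESOURCE POOL "default"
WITH (MAX_CPU_PERCENT = 100);

-- Apply the pending Resource Governor configuration change.
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Smoke test: a trivial script that should come back quickly once the cap is sane.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'print("Hello")';
```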

The trickiest part about diagnosing this was that the error messages gave no indication of what the problem was: the error itself was a vague “could not connect to Launchpad” message, and the Launchpad error logs didn’t have any messages about the failed queries. So that’s one more thing to keep in mind when troubleshooting Machine Learning Services failures.