Back to Blogging

It’s been a hot minute since I’ve regularly blogged, but I’m working on getting back into it. Let’s see a few of the things I’ve been up to lately, where “lately” is, oh, most of this year…


I’ve launched a couple of trainings on Teachable, one on the APPLY operator and one entitled the Curated Data Platform. The goal of these trainings is to provide in-depth training on topics at reasonable prices. For now, I’m probably not going to develop any more of these trainings, as they take a lot of time to put together and the ROI isn’t there today.

Another Book

Speaking of “the ROI isn’t there,” I’ve agreed to write a book on anomaly detection in Python. The working title of the book is Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python. I definitely was not anticipating writing a second book after PolyBase Revealed, but I have spent a lot of time in the world of anomaly detection the last few years, and it’s an area where I think I can make a contribution. Most books on anomaly detection tend to have the feel of textbooks: heavy on the statistical and mathematical underpinnings of techniques, but light on implementation. The goal of Finding Ghosts in Your Data is to straddle the line between academic work and tutorial. I’ll still get into a lot of detail on anomaly detection techniques, but the intended audience for this book is a software developer who has forgotten most of his statistics course from university days.

I also intend to do a fair amount of blogging on the book as I write it. I won’t give away the whole thing, but I will share a lot along the way.

New Talks! Some of Them In Person!

Right now, I’m in the midst of developing four new talks, all of which have to be done before the end of the month.

“Keeping It Classy: Designing a Great Classifier” and “Building Your First Data Pipeline in Apache Spark” are going to debut at the PASS Data Community Summit as part of two separate learning paths. The first talk provides a solid foundation for what a classification algorithm is in the data science world, different types of classification algorithms, and when you might choose one over the others. I cover a variety of tree-based (e.g., CART, random forest, XGBoost) and non-tree (e.g., kNN, Naive Bayes, Passive-Aggressive) algorithms, explain at a high level how they work, and show how you can work with them using libraries like scikit-learn.
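To give a flavor of what one of those non-tree algorithms does, here is a minimal sketch of kNN in plain Python. It is not taken from the talk: the function name, the toy data, and the choice of k are all made up for illustration, and in practice you would more likely reach for a library class such as scikit-learn’s KNeighborsClassifier.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training rows by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (features, label) pairs forming two obvious clusters.
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"),
    ((7.0, 6.0), "large"),
    ((6.5, 7.0), "large"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.8, 6.2)))
```

The whole idea is “look at the neighbors and take a vote,” which is why kNN makes a handy first example before moving on to the tree-based methods.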

The second talk, meanwhile, provides an introduction to Apache Spark by way of Azure Databricks. In it, I’ll cover the basic details of what Apache Spark is, how Databricks fits into it all, and how we can create data pipelines. Trust me when I say that I stretch the pipeline metaphor as far as it goes, and maybe a little further.

Riding the Rails: Railway-Oriented Programming with F# is the third talk I’m currently working on. It follows the excellent Scott Wlaschin’s Railway-Oriented Programming metaphor and talk, and I plan to give it my own spin by including more code in the talk itself. The cost of focusing more on the code is a loss of some of the depth of discussion that Scott hits, but I hope that trade-off is worthwhile, as I really like the ROP metaphor / Either monad. You can find this at the Azure Community Conference.
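For anyone who hasn’t run into the metaphor before, here is a rough sketch of the idea. The talk itself uses F#; this version is in Python purely for illustration, and every name in it (Ok, Error, bind, and the two validation steps) is hypothetical rather than code from the talk or from Scott’s material.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    """Success track: carries a value forward to the next step."""
    value: T


@dataclass(frozen=True)
class Error:
    """Failure track: carries a message and bypasses the remaining steps."""
    message: str


Result = Union[Ok[T], Error]


def bind(result: Result, step: Callable[[T], Result]) -> Result:
    """Run the next step only if we are still on the success track."""
    if isinstance(result, Ok):
        return step(result.value)
    return result


# Hypothetical validation steps, invented for this example.
def parse_quantity(raw: str) -> Result:
    try:
        return Ok(int(raw))
    except ValueError:
        return Error(f"not a number: {raw!r}")


def require_positive(quantity: int) -> Result:
    if quantity > 0:
        return Ok(quantity)
    return Error("quantity must be positive")


def validate(raw: str) -> Result:
    # Each step either stays on the success rail or switches to the failure rail.
    return bind(parse_quantity(raw), require_positive)


print(validate("3"))
print(validate("-2"))
print(validate("abc"))
```

Once a step returns an Error, every later step is skipped and the error rides the failure track to the end, so the calling code only has to check the final result.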

Finally, the fourth talk debuting this month is entitled Saving your Wallet from the Cloud. It is intended to help you understand how pricing in the cloud works and which methods you can use to trim that bill. This one will debut at SQL Saturday #1021 in Orlando.

I’ll probably have a blog series for each talk over the next couple of months, once the time constraints have softened a bit.

Microsoft Cloud Workshops

One of the things I do for Solliance is create and maintain Microsoft Cloud Workshops. Right now, I have two on my plate: Big data and virtualization, and Innovate and modernize apps with Data and AI. Both of them have updates scheduled, and they’re both pretty big ones.

DataCamp Courses

A few months ago, I was the subject matter expert for DataCamp’s Data Modeling in Power BI course. You won’t see or hear me there, but I shaped the course design, developed most of the content, and handed all of that off to DataCamp folks so that I don’t have to think about it any longer…

I’m currently doing the same on a course around data visualization in Power BI. This course should be particularly interesting because I’m combining psychological concepts (knowing your audience, getting an emotional response, reducing cognitive load, tracking focal points, etc.) with a grand overview of most Power BI visuals, including custom visuals and Python/R visuals. In addition to those, there’s an entire lesson on designing for accessibility. For approximately 4-6 learner hours of training, there’s a lot of content packed in there. I’m about 2/3 of the way through this course, so we’ll probably see it release in December.

More on the Plate

There’s a bit more that I’m working on as well, but by this point, I’m now convinced I live on a planet with 36-hour days…or I’m over-booked, one of the two.

Upcoming Events: Machine Learning Week

Key Details

What: Machine Learning Week, sponsored by Predictive Analytics World.
Where: This is a virtual event.
When: Sunday, May 31st through Thursday, June 4th.
Registration is $990 for the livestream and $1490 for livestream and recordings. Register on the Predictive Analytics World website.

What I’m Presenting

Wednesday, 2:45 PM — 3:05 PM PDT — Forecasting Demand in the e-Commerce Space

This is the first time I’m giving a public presentation explicitly related to what my company does. Typically, I present on tangential things: database administration, data science, security. But here, it’s front-and-center. This made it an interesting challenge to provide useful information without laying out any proprietary company information.

Upcoming Events: SQL Saturday Brisbane (Virtual)

Key Details

What: SQL Saturday Brisbane.
Where: Internet Australia. You have to turn your monitor upside-down to see it correctly.
When: Saturday, May 30th.
Admission is free. Register on the SQL Saturday website.

What I’m Presenting

12:15 PM — 1:15 PM AEST — Data Virtualization with PolyBase

Please note that all of the times are in Australian Eastern Standard Time (AEST). This is UTC+10, so if you’re in the Eastern Daylight Time zone like me (UTC-4), Brisbane is 14 hours ahead of you. In other words, I present on Friday starting at 10:15 PM EDT.
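If you want to check the conversion for your own time zone, here is a small sketch using only the Python standard library. It uses the fixed offsets quoted above; the year 2020 is an assumption on my part, chosen because May 30th fell on a Saturday that year.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the text: AEST is UTC+10, EDT is UTC-4.
AEST = timezone(timedelta(hours=10), "AEST")
EDT = timezone(timedelta(hours=-4), "EDT")

# Assumption: the year is 2020, when May 30th was a Saturday.
start_aest = datetime(2020, 5, 30, 12, 15, tzinfo=AEST)

# Convert the session start time into Eastern Daylight Time.
start_edt = start_aest.astimezone(EDT)

# Difference between the two UTC offsets, expressed in hours.
hours_ahead = (AEST.utcoffset(None) - EDT.utcoffset(None)) / timedelta(hours=1)

print(start_edt.strftime("%A %H:%M %Z"))
print(hours_ahead)
```

Swap in your own offset for EDT to see when the session starts where you are.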

SQL on the Edge: Full-Day Training at PASS Summit 2020

I’m presenting a full-day training at PASS Summit again this year. Here are the details:

SQL Server on the Edge: IoT with SQL Server and .NET Core

In this day-long training session, you will learn about Azure SQL Database Edge, the version of SQL Server intended to run on Internet of Things (IoT) devices. We will discuss the types of scenarios we might try to solve using IoT devices. From there, we will learn about device management through the Azure IoT Hub, including installation of applications from the Azure marketplace, as well as the development and deployment of custom Docker containers in an Azure Container Registry. Over the course of the day, we will build out a practical scenario and take a look at features in Azure SQL Database Edge, including handling time series data and machine learning on edge devices. As we expand out from a single device, we will learn how to automate deployment and updates at scale using capabilities in Azure IoT Hub.

Prerequisites: A solid knowledge of T-SQL, a basic understanding of Docker and containers, some knowledge of .NET (C# or F#).

What Can You Expect?

I’m still working out all of the details, but here are the top-level items:

Module 0 — Prep Work

In the first module, we will understand why we might want to care about the Internet of Things, looking at the types of scenarios IoT can solve.

Module 1 — Configuring Azure IoT Edge Devices

In this module, we will set up Azure IoT Hub, take a look at the Raspberry Pi 4, and install Azure IoT Edge on the Pi. We will also see how to connect a virtual machine to Azure IoT Hub to assist with testing.

Module 2 — Azure SQL Database Edge Installation

Once we have a device in Azure IoT Hub, we will see how to install Azure SQL Database Edge, including configuration and deployment of dacpacs on a VM and on a Raspberry Pi 4.

Module 3 — Developing and Deploying an Application

With a database in place, we will work on an IoT solution in .NET Core and connect to our Azure SQL Database Edge instance.

Module 4 — Diving into Time Series

One of the main promises of Azure SQL Database Edge is the support for time series, and we will investigate what is available on that front.

Module 5 — Machine Learning on the Edge

In this module, we will review ONNX, the Open Neural Network Exchange. We will see how to train a model on a host, deploy it to Azure SQL Database Edge, and predict using the native PREDICT operator.

Module 6 — Device Management

The final module will extend us beyond a single device, as scale is the name of the game with IoT. We will also look at tools available for monitoring and providing insights.

Course Objectives

Upon completion of this course, attendees will be able to:

  • Configure an Azure IoT Hub
  • Connect IoT edge devices (such as the Raspberry Pi) to Azure IoT Hub
  • Deploy Azure SQL Database Edge to edge devices en masse
  • Develop and deploy custom .NET code using containers
  • Deploy machine learning models to edge devices

If this sounds interesting to you, be sure to register for PASS Summit 2020.

Upcoming Events: SQL Saturday Richmond (Virtual)

Key Details

What: SQL Saturday Richmond.
Where: On your computer. It’s a virtual event.
When: Saturday, April 25th.
Admission is free. Register on the SQL Saturday website.

What I’m Presenting

1:45 PM — 2:50 PM — Data Virtualization with PolyBase
3:00 PM — 4:00 PM — Optimizing Backup Performance Using Data Science Techniques

This is the first-ever virtual SQL Saturday, so I’m glad I could take part in it. Although it’d be nicer if I could be in Richmond and present in person, I’m happy to at least have the opportunity to present from afar.

Need a Remote User Group Speaker?

If your user group needs a remote speaker, especially over the next couple of months, I’d be happy to step in. Check out my list of presentations. If that didn’t scare you off and you still want to get ahold of me, hit me up on Twitter, LinkedIn, e-mail (if you know the address), or put your contact details in a trap cleverly designed to look like a SQL Saturday.

It gets me every time.

Or Just Watch Our Group

Also, if you are a user group attendee jonesing for a bit more content and missing your user group over the next couple of months, come check out the Triangle SQL Server User Group. We’re going to be remote-only through April, but we meet three times a month and broadcast nearly all of our user group meetings on Twitch. You don’t need an account to watch, but if you want to follow and chat, registration there is free.

Upcoming Events: SQL Saturday Baton Rouge BI

Key Details

What: SQL Saturday Baton Rouge, BI Edition.
Where: LSU Patrick Taylor Hall, Baton Rouge, Louisiana.
When: Saturday, March 7th.
Admission is free. Register on the SQL Saturday website.

What I’m Presenting

11:30 AM — 12:30 PM — Classification with Naive Bayes
1:45 PM — 2:45 PM — Launching a Data Science Project: Cleaning is Half the Battle

It took me several years to make it to Baton Rouge for a SQL Saturday, and now I’m going back for the BI edition as well.

Upcoming Events: SQL Saturday South Florida BI

Key Details

What: SQL Saturday South Florida, BI Edition.
Where: Microsoft FLL, 6750 N Andrews Ave, Suite #400, Fort Lauderdale, Florida, 33309
When: Saturday, February 22nd.
Admission is free. Register on the SQL Saturday website.

What I’m Presenting

4:00 PM — 4:50 PM — Data Virtualization with PolyBase

This will be my second time at a SQL Saturday South Florida event, and I think this is a great time of year to host one.