It’s been a hot minute since I’ve regularly blogged, but I’m working on getting back into it. Let’s see a few of the things I’ve been up to lately, where “lately” is, oh, most of this year…
I’ve launched a couple of trainings on Teachable, one on the APPLY operator and one entitled the Curated Data Platform. The goal of these trainings is to provide in-depth training on topics at reasonable prices. For now, I’m probably not going to develop any more of these trainings, as they take a lot of time to put together and the ROI isn’t there today.
Speaking of “the ROI isn’t there,” I’ve agreed to write a book on anomaly detection in Python. The working title of the book is Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python. I definitely was not anticipating writing a second book after PolyBase Revealed, but I have spent a lot of time in the world of anomaly detection the last few years, and it’s an area where I think I can make a contribution. Most books on anomaly detection tend to have the feel of textbooks: heavy on the statistical and mathematical underpinnings of techniques, but light on implementation. The goal of Finding Ghosts in Your Data is to straddle the line between academic work and tutorial. I’ll still get into a lot of detail on anomaly detection techniques, but the intended audience for this book is a software developer who has forgotten most of his statistics course from university days.
I also intend to do a fair amount of blogging on the book as I write it. I won’t give away the whole thing, but I will share a lot along the way.
New Talks! Some of Them In Person!
Right now, I’m in the midst of developing four new talks, all of which have to be done before the end of the month.
Keeping It Classy: Designing a Great Classifier and Building Your First Data Pipeline in Apache Spark are going to debut at the PASS Data Community Summit as part of two separate learning paths. The first talk provides a solid foundation for what a classification algorithm is in the data science world, different types of classification algorithms, and when you might choose one over the others. I cover a variety of tree-based (e.g., CART, random forest, XGBoost) and non-tree (e.g., kNN, Naive Bayes, Passive-Aggressive) algorithms, explain at a high level how they work, and show how you can work with them using libraries like
The second talk, meanwhile, provides an introduction to Apache Spark by way of Azure Databricks. In it, I’ll cover the basic details of what Apache Spark is, how Databricks fits into it all, and how we can create data pipelines. Trust me when I say that I stretch the pipeline metaphor as far as it goes, and maybe a little further.
Riding the Rails: Railway-Oriented Programming with F# is the third talk I’m currently working on. It follows the excellent Scott Wlaschin’s Railway-Oriented Programming metaphor and talk, and I plan to give it my own spin by including more code in the talk itself. The cost of focusing more on the code is a loss of some of the depth of discussion that Scott hits, but I hope that trade-off is worthwhile, as I really like the ROP metaphor / Either monad. You can find this at the Azure Community Conference.
Finally, the fourth talk debuting this month is entitled Saving your Wallet from the Cloud, and it is intended to serve as a way of understanding how pricing in the cloud works and different methods you can choose to slice that bill. This one will debut for SQL Saturday #1021 in Orlando.
I’ll probably have a blog series for each talk over the next couple of months, once the time constraints have softened a bit.
Microsoft Cloud Workshops
One of the things I do for Solliance is create and maintain Microsoft Cloud Workshops. Right now, I have two on my plate: Big data and virtualization, and Innovate and modernize apps with Data and AI. Both of them have updates scheduled, and they’re both pretty big ones.
A few months ago, I was the subject matter expert for DataCamp’s Data Modeling in Power BI course. You won’t see or hear me there, but I shaped the course design, developed most of the content, and handed all of that off to DataCamp folks so that I don’t have to think about it any longer…
I’m currently doing the same on a course around data visualization in Power BI. This course should be particularly interesting because I’m combining psychological concepts (knowing your audience, getting an emotional response, reducing cognitive load, tracking focal points, etc.) with a grand overview of most Power BI visuals, including custom visuals and Python/R visuals. In addition to those, there’s an entire lesson on designing for accessibility. For approximately 4-6 learner hours of training, there’s a lot of content packed in there. I’m about 2/3 of the way through this course, so we’ll probably see it release in December.
More on the Plate
There’s a bit more that I’m working on as well, but by this point, I’m now convinced I live on a planet with 36-hour days…or I’m over-booked, one of the two.