R Training In Washington DC

Are you interested in learning R but don’t know where to begin?  Do you have corporate training funds burning a hole in your pocket and you desperately need to spend them before the year runs out?  Or alternatively, do you want some good training before an action-packed SQL Saturday but don’t have a big budget?

I am giving a full-day R training on Friday, December 7th in Washington DC.  Tickets are still available for this training event.  Here’s the abstract:

DESCRIPTION

Enter The Tidyverse: R for the Data Professional

In this day-long training, you will learn about R, the premier language for data analysis. We will approach the language from the standpoint of data professionals: database developers, database administrators, and data scientists. We will see how data professionals can translate existing skills with SQL to get started with R. We will also dive into the tidyverse, an opinionated set of libraries which has modernized R development. We will see how to use libraries such as dplyr, tidyr, and purrr to write powerful, set-based code. In addition, we will use ggplot2 to create production-quality data visualizations.
Over the course of the day, we will look at several problem domains. For database administrators, areas of note will include visualizing SQL Server data, predicting error occurrences, and estimating backup times for new databases. We will also look at areas of general interest, including analysis of open source data sets.
No experience with R is necessary. The only requirements are a laptop and an interest in leveling up your data professional skillset.

I’ve done this training a few times now and have settled on Azure Notebooks, which makes it easier for people to take this stuff home and play around (for free!) on their own.  If you attend, you get a jam-packed day of training as well as two dozen notebooks full of material to get you from “new to R” to solving practical problems in your environment.

Sign up today!


PASS Summit 2018 Evaluation Ratings & Comments

Following up on Brent Ozar’s post on the topic, I figured I’d post my own ratings (mostly because they’re not awful!).  This was my first PASS Summit at which I was a speaker, so I don’t have a good comp for scores except what other speakers publish.  I had the privilege of giving three presentations at PASS Summit this year, and I’m grateful for everyone who decided to sit in on these rather than some other talk.  All numeric responses are on a 1-5 scale with 5 being the best.

Applying Forensic Accounting Techniques Using SQL and R

This was a talk that I’ve given a few times and even have an extra-long director’s cut version available.  I had 71 attendees and 14 responses.

Eval Question | Avg Rating
Rate the value of the session content. | 4.21
How useful and relevant is the session content to your job/career? | 3.43
How well did the session’s track, audience, title, abstract, and level align with what was presented? | 4.21
Rate the speaker’s knowledge of the subject matter. | 4.86
Rate the overall presentation and delivery of the session content. | 4.29
Rate the balance of educational content versus that of sales, marketing, and promotional subject matter. | 4.64

The gist of the talk is, here are techniques that forensic accountants use to find fraud; you can use them to learn more about your data.  I fell flat in making that connection, as the low “useful” score shows.  That’s particularly bad because I think this is probably the most “immediately useful” talk that I did.

Event Logistics Comments

  • Room was very cold
  • very cold rooms, very aggressive air-con
  • The stage was squeaky and made banging noises when the speaker was trying to present. Not their fault! The stage just didn’t seem very stable. Also the room had a really unpleasant smell.
  • Everything was great!

The squeak was something I noticed before the talk.  I thought about staying in place to avoid the squeak, but this is a talk where I want to gesticulate to emphasize points—like moving from one side of the stage to the other to represent steps in a process.  My hope was that the squeak wouldn’t be too noticeable but the microphone may have picked it up.

Speaker Comments

  • I was just curious about the topic, but the speaker inspired me with many smaller, but very feasible tips and tricks of how to look at a data! Thank You!
  • The Jupyter notebooks were awesome. I felt the speaker really knew their stuff.  But the downsides were that the analysis methods discussed weren’t really shown to us, or were so far out of context I didn’t quite see how to use them or how they related to the demos. Multiple data sets were used and maybe just focusing all the methods on one of them may have worked better? I just felt overall it was a really interesting topic with a lot of work done but it just didn’t come together for me. Sorry.
  • I like understanding how fraud got uncovered by looking at data. Thanks
  • Very interesting session. Speak made the subject very interesting. I’ve picked up a few ideas I can use in my job.
  • Some examples of discovery of fraud would have been more effective.
  • One of my favorite sessions at PASS. Thank you for making the jupyter notebook available for download.
  • Great speaker/content, would attend again.

The second comment is exactly the kind of comment I want.  My ego loves the rest of the comments, but #2 makes me want to tear this talk apart and rebuild it better.  The biggest problem that I have with the talk is that my case study involved actual fraud, but none of the data sets I have really show fraud.  I’m thinking of rebuilding this talk using just one data set where I seed in fraudulent activities and expose them with various techniques.  Ideally, I’d get a copy of the case study’s data, but I never found it anywhere.  Maybe I could do a FOIA request or figure out some local government contact.
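As a rough illustration of the "seed fraud into one data set, then expose it" idea, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the amounts are synthetic, the 10,000 approval limit is hypothetical, and the first-digit check (Benford's law) is just one well-known forensic accounting technique, not necessarily one the talk would use.

```python
import math
import random
from collections import Counter

random.seed(42)  # fixed seed so the synthetic data is reproducible

# Synthetic "legitimate" amounts: log-uniform between 10 and 100,000,
# a distribution whose leading digits follow Benford's law.
legit = [10 ** random.uniform(1, 5) for _ in range(1000)]

# Seeded "fraud": payments kept just under a hypothetical 10,000 approval limit.
fraud = [random.uniform(9000, 9999) for _ in range(100)]


def first_digit_shares(amounts):
    """Share of amounts whose leading digit is each of 1 through 9."""
    # Scientific notation puts the leading digit first, e.g. "9.412000e+03".
    counts = Counter(int(f"{a:e}"[0]) for a in amounts)
    return {d: counts[d] / len(amounts) for d in range(1, 10)}


# Benford's law: P(leading digit = d) = log10(1 + 1/d).
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
observed = first_digit_shares(legit + fraud)

for d in range(1, 10):
    # The "twice the expected share" cutoff is an arbitrary illustrative threshold.
    flag = "  <-- suspicious" if observed[d] > 2 * expected[d] else ""
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}{flag}")
```

Because the seeded payments all start with a 9, that digit shows up far more often than Benford's law predicts. A single data set built this way would let every technique in the talk point at the same planted problem.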

Getting Started with Apache Spark

My second session was Friday morning.  I had 100 somewhat-awake attendees but only 5 responses, so take the ratings with a grain of salt.

Eval Question | Avg Rating
Rate the value of the session content. | 4.80
How useful and relevant is the session content to your job/career? | 4.60
How well did the session’s track, audience, title, abstract, and level align with what was presented? | 4.80
Rate the speaker’s knowledge of the subject matter. | 5.00
Rate the overall presentation and delivery of the session content. | 4.80
Rate the balance of educational content versus that of sales, marketing, and promotional subject matter. | 5.00

This is a talk that I created specifically for PASS Summit.  I’m happy that it turned out well, considering that there was a good chance of complete demo failure:  my Portable Hadoop Cluster was finicky that morning and wanted to connect to the Internet to grab updates before it would let me run anything.  Then I had to restart the Apache Zeppelin service mid-talk to run any notebooks, but once that restarted successfully, the PHC ran like a champ.

Event Logistics Comments

  • Good

Speaker Comments

  • Session was 100, but I would say 200-300
  • Great presentation!

Getting rating levels right is always tricky.  In this case, I chose 100 rather than 200 because I spent the first 30+ minutes going through the history of Hadoop & Spark and a fair amount of the remaining time looking at Spark SQL.  But I did have a stretch where I got into RDD functions, and most T-SQL developers will be unfamiliar with map, reduce, aggregate, and other functions.  So that’s a fair point, and calling it a 200-level talk doesn’t bother me.
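For readers coming from T-SQL, here is a minimal sketch of what map, reduce, and aggregate do. It is plain Python and not Spark's API: the `aggregate` helper is hypothetical and only mimics the zero value / per-partition function / combine function shape of an RDD aggregate, and the nested lists stand in for partitions.

```python
from functools import reduce

# Hypothetical data: one number per "row", split into two partitions
# the way Spark splits an RDD across workers.
partitions = [[1, 2, 3], [4, 5, 6]]

# map: apply a function to every element independently.
squared = [[x * x for x in part] for part in partitions]

# reduce: fold all elements into one value with an associative function.
total = reduce(lambda a, b: a + b, (x for part in squared for x in part))


def aggregate(parts, zero, seq_op, comb_op):
    """Fold within each partition with seq_op, then merge partition results with comb_op."""
    per_partition = [reduce(seq_op, part, zero) for part in parts]
    return reduce(comb_op, per_partition, zero)


# aggregate: build a (sum, count) pair in one pass so a mean can be derived.
sum_count = aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),     # add one element to a partition's running pair
    lambda a, b: (a[0] + b[0], a[1] + b[1]),     # merge two partitions' pairs
)
mean = sum_count[0] / sum_count[1]

print(total, sum_count, mean)
```

In SQL terms, the map step is like a computed column, reduce is like `SUM`, and aggregate is what an engine does behind `AVG`: keep a running sum and count per worker, then merge them.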

Cleaning is Half the Battle:  Launching a Data Science Project

This was my last PASS Summit talk, which I presented in the last session slot on Friday.  I had 31 attendees and 7 responses.

Eval Question | Avg Rating
Rate the value of the session content. | 4.71
How useful and relevant is the session content to your job/career? | 4.71
How well did the session’s track, audience, title, abstract, and level align with what was presented? | 4.86
Rate the speaker’s knowledge of the subject matter. | 4.86
Rate the overall presentation and delivery of the session content. | 4.57
Rate the balance of educational content versus that of sales, marketing, and promotional subject matter. | 4.71

Again, small sample size bias applies.
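To put the sample-size caveat in numbers, here is a small sketch in Python using the attendee and response counts and the "value of the session content" averages reported above. The shortened session names are labels chosen for this example, and the pooled figure is only approximate because it is built from already-rounded averages.

```python
# (attendees, responses, average "value of the session content" rating),
# copied from the three sessions in this post.
sessions = {
    "Forensic Accounting": (71, 14, 4.21),
    "Apache Spark": (100, 5, 4.80),
    "Data Science Project": (31, 7, 4.71),
}

# Fraction of each session's attendees who filled in an evaluation.
response_rates = {
    name: responses / attendees
    for name, (attendees, responses, _) in sessions.items()
}

# Response-weighted average across all three sessions.
total_responses = sum(responses for _, responses, _ in sessions.values())
pooled_value = (
    sum(responses * rating for _, responses, rating in sessions.values())
    / total_responses
)

for name, rate in response_rates.items():
    print(f"{name}: {rate:.1%} of attendees responded")
print(f"{total_responses} responses in total, pooled rating {pooled_value:.2f}")
```

With response rates this low, one unhappy or delighted attendee can move a session's average noticeably, which is why the individual comments are more useful than the numbers.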

Event Logistics Comments

  • Good
  • Great

Speaker Comments

  • You have a lot content, leading to a rushed talk. Also, your jokes have potential, if u slowed down and sold them better
  • Good presentation. I expected more on getting the project off the ground, but enjoyed the info.
  • Funny and informative–I truly enjoyed your presentation!
  • Great
  • Kevin knows how to present, especially for getting stuck w/ the last session of the last day. He brought a lot of energy to the room. Content was on key too, he helped me understand more about handling data that we don’t want to model.

The slowing down comment is on point.  This is a 90-minute talk by its nature.  I did drop some content (like skipping slides on data cleansing and analysis and just showing the demos) so that I could spend a little more time on the neural network portion of the show, but I had to push to keep on time and technically went over by a minute or two.  I was okay with the overage because it was the final session, so I wasn’t going to block anybody.

Synthesis and Commentary

The ratings numbers are something to take with several grains of salt:  26 ratings over 3 sessions isn’t nearly a large enough sample to know for sure how these turned out.  But here are my thoughts from the speaker’s podium.

  • I speak fast.  I know it and embrace it.  I know the (generally good) advice is to go so slow that it feels painful, but I’ll never be that person.  On the continuum from Ben Stein to Billy Mays, I’d rather err on the OxiClean side.
  • I need to cut down on material.  The first and last talks could both be better with less.  The problem with cutting material in the data science process talk is that I’d like to cover the whole process with a realistic-enough example and that takes time.  So this is something I’ll have to work on.
  • I might need to think of a different title for my data science process talk.  I explicitly call out that it’s about launching a data science project, but as I was sitting in a different session, I overheard a couple of people mention the talk and one person said something along the lines of not being interested because he’s already seen data cleansing talks.  The title is a bit jokey and has a punchline in the middle of the session, so I like it, but maybe something as simple as swapping the order of the segments to “Launching a Data Science Project:  Cleaning is Half the Battle” would be enough.
  • Using a timer was a really good idea.  I normally don’t use timers and instead go by feel at SQL Saturdays and user group meetings, and that leads to me sometimes running short on time.  I tend to practice talks with a timer to get an idea of how long they should last, but rarely re-time myself later, so they tend to shift in length as I do them.  Having the timer right in front of me helped keep me on track.
  • For the Spark talk, I think when I create my normal slide deck, I’m going to include the RDD (or “Spark 1.0”) examples as inline code segments and walk through them more carefully.  For an example of what I mean, I have a section in my Classification with Naive Bayes talk where I walk through classification of text based on word usage.  Normally, I’d make mention of the topic and go to a notebook where I walk through the code.  But that might have been a little jarring for people brand new to Spark.
  • I tend to have a paucity of images in talks, making up for it by drawing a lot on the screen.  I personally like the effect because action and animation keep people interested and it’s a lot easier for me to do that by drawing than by creating the animations myself…  It does come with the downside of making the slides a bit more turgid and making it harder for people to review the slides later as they lose some of that useful information.  As I’ve moved presentations to GitPitch I’ve focused on adding interesting but not too obtrusive backgrounds in the hopes that this helps.  Still, some of the stuff that I regularly draw should probably show up as images.

So it’s not perfect, but I didn’t have people hounding me with pitchforks and torches after any of the sessions.  I have some specific areas of focus and intend to take a closer look at most of my talks to improve them.