Enter The Tidyverse, Columbus Edition

In conjunction with SQL Saturday Columbus, I am giving a full-day training session entitled Enter the Tidyverse:  R for the Data Professional on Friday, July 27th.  I gave this training earlier in the year in Madison, Wisconsin, and aside from having no voice at the end, I think it went really well.  I’ve tweaked a couple of things to make this training even better; it’s well worth the low, low price of $100 for a full day of training on the R programming language.

I use the term “data professional” on purpose:  part of what I do with this session is show attendees how, even if they are database administrators, it can pay to know a bit about the R programming language.  Database developers, application developers, and budding data scientists will also pick up a good bit of useful information during this training, so it’s fun for the whole data platform.

Throughout the day, we will use a number of data sources which should be familiar to database administrators:  wait stats, database backup times, Reporting Services execution log metrics, CPU utilization statistics, and plenty more.  These are the types of things which database administrators need to deal with on a daily basis, and I’ll show you how you can use R to make your life easier.

If you sign up for the training in Columbus, the cost is only $100 and you’ll walk away with a better knowledge of how you can level up your database skills with the help of a language specially designed for analysis.  Below is the full abstract for my training session.  If this sounds interesting to you, sign up today!  I’m not saying you should go out and buy a couple dozen tickets today, but you should probably buy one dozen today and maybe a dozen more tomorrow; pace yourself, that’s all I’m saying.

Course Description

In this day-long training, you will learn about R, the premier language for data analysis.  We will approach the language from the standpoint of data professionals:  database developers, database administrators, and data scientists.  We will see how data professionals can translate existing skills with SQL to get started with R.  We will also dive into the tidyverse, an opinionated set of libraries which has modernized R development.  We will see how to use libraries such as dplyr, tidyr, and purrr to write powerful, set-based code.  In addition, we will use ggplot2 to create production-quality data visualizations.

Over the course of the day, we will look at several problem domains.  For database administrators, areas of note will include visualizing SQL Server data, predicting error occurrences, and estimating backup times for new databases.  We will also look at areas of general interest, including analysis of open source data sets.

No experience with R is necessary.  The only requirements are a laptop and an interest in leveling up your data professional skillset.

Intended Audience

  • Database developers looking to tame unruly data
  • Database administrators with an interest in visualizing SQL Server metrics
  • Data analysts and budding data scientists looking for an overview of the R landscape
  • Business intelligence professionals needing a powerful language to cleanse and analyze data efficiently

Contents

Module 0 — Prep Work

  • Review data sources we will cover during the training
  • Ensure laptops are ready to go

Module 1 — Basics of R

  • What is R?
  • Basic mechanics of R
  • Embracing functional programming in R
  • Connecting to SQL Server with R
  • Identifying missing values, outliers, and obvious errors

Module 2 — Intro To The Tidyverse

  • What is the Tidyverse?
  • Tidyverse principles
  • Tidyverse basics:  dplyr, tidyr, readr, tibble

Module 3 — Dive Into The Tidyverse

  • Data loading:  rvest, httr, readxl, jsonlite, xml2
  • Data wrangling:  stringr, lubridate, forcats, broom
  • Functional programming:  purrr

Module 4 — Plotting

  • Data visualization principles
  • Chartjunk
  • Types of plots:  good, bad, and ugly
  • Plotting data with ggplot2
    • Exploratory plotting
    • Building professional quality plots

Module 5 — Putting It Together:  Analyzing and Predicting Backup Performance

  • A capstone notebook which brings together many of the topics from the day, focusing on database administration use cases
  • Use cases include:
    • Gathering CPU statistics
    • Analyzing disk utilization
    • Analyzing wait stats
    • Investigating expensive reports
    • Analyzing temp table creation stats
    • Analyzing backup times

Course Objectives

Upon completion of this course, attendees will be able to:

  • Perform basic data analysis with the R programming language
  • Take advantage of R functions and libraries to clean up dirty data
  • Build a notebook with Jupyter Notebooks
  • Create data visualizations with ggplot2

Prerequisites

No experience with R is necessary, though it would be helpful.  Please bring a laptop to follow along with exercises and get the most out of this course.
