DBAs: Come Learn R With Me

In conjunction with SQL Saturday Madison, I am giving my first full-day training session entitled Enter the Tidyverse:  R for the Data Professional on Friday, April 6th.  I'm using the term "data professional" in particular because I want to hit a relatively under-served part of the community:  database administrators.  I should note that if you're a…

Data Processing: The Other 90%

This is part three of a series on launching a data science project. The Three Steps Of Data Processing: Data processing is made up of a few different activities:  data gathering, data cleansing, and data analysis.  Most estimates are that data scientists spend about 80% of their time in data processing (particularly in data cleansing). …

Tidy Data And Normalization

In Hadley Wickham's paper on tidy data, he makes a few points that I really appreciated. Data sets are made up of variables and observations.  In the database world, we'd call variables attributes and observations entities.  In the spreadsheet world, we'd call variables/attributes columns and observations/entities rows. Each variable contains all values which measure the…

ggplot2: cowplot

This is part seven of a series on ggplot2. Up to this point, I've covered what I consider to be the basics of ggplot2.  Today, I want to cover a library which is still easy to use, but helps you create more advanced visuals:  cowplot.  I was excited by the name cowplot, but once I…

We Speak Linux

I'm pleased to announce the launch of We Speak Linux, a site dedicated to helping Windows administrators and developers become familiar with Linux.  This has been Tracy Boggiano's pet project for several months.  Along for the ride are Brian Carrig (who still needs to update his blog), Mark Wilkinson, Anthony Nocentino, and me.  I have…