To wrap up this mini-series on R, I’m going to link to a notebook that touches on some more advanced topics, framed as a one-hour presentation that starts from scratch. In this last post, I plan to gloss over the details, give you the notebook links, and let you see what’s going on.
This notebook started as a lab assignment for the Polyglot .NET group in the Raleigh-Durham area. Nobody got all the way through the lab, but people did get far enough that I’d call the results a success.
Basically, there are two versions of this notebook: the lab version (which I gave to attendees) and the completed version. If you feel like digging through the lab for a couple of hours, you might get more out of it than simply looking at the completed version.
The Data Set
The original data set came from the town of Cary, North Carolina’s open data portal. This data set includes restaurant inspections over the course of approximately one decade. It includes some very interesting details, but is missing one huge part: it has address information, but no latitude or longitude.
My original plan for turning address data into latitude and longitude was to hit the Google Maps API, which offers 1000 free requests per day. 1000 hits per day is plenty most days, but when you have over 13,000 requests, it’s not that great an option. As a result, I ended up using the street2coordinates function in RDSTK. This was a bit slower than Google Maps and doesn’t have the nice geocoding helpers that ggmap provides for Google Maps, but I was able to let the process run overnight and get latitude and longitude details. From there, I saved the file as a text file and made it available for general use.
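As a rough sketch of that overnight geocoding run, here’s how you might wire up street2coordinates. The column names (Address, City, State, Zip) and the output file name are my assumptions for illustration, not the actual data set’s schema, and the latitude/longitude column names returned by RDSTK are worth verifying against your installed version.

```r
library(RDSTK)

# Build a single lookup string from hypothetical address components.
make_address <- function(street, city, state, zip) {
  paste(street, city, state, zip, sep = ", ")
}

geocode_restaurants <- function(restaurants) {
  addresses <- make_address(restaurants$Address, restaurants$City,
                            restaurants$State, restaurants$Zip)
  # street2coordinates takes a vector of addresses and returns a data
  # frame that includes latitude and longitude columns. It's slow, so
  # expect a run like this to take hours for 13,000+ addresses.
  coords <- street2coordinates(addresses)
  cbind(restaurants, coords[, c("latitude", "longitude")])
}

# Persist the enriched data so nobody has to repeat the overnight run.
# write.csv(geocode_restaurants(restaurants), "restaurants-geocoded.csv",
#           row.names = FALSE)
```

The actual geocoding call is left commented out because it requires network access to the Data Science Toolkit service.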
This notebook takes you through my version of the data analysis, with some false starts thrown in to aid understanding. I like examples with built-in failure points because we see failures in real life far more often than clean successes. Showing how I react after hitting a brick wall, and the ways I worked around each issue, should help you later when you hit a similar wall of your own.
Cool Things We Do
This notebook is full of things that I consider really nice about R:
- Plotting data on real Google Maps images with ggmap, giving users immense context
- Cleansing data with dplyr, including filtering results, grouping them, and creating new summary variables that we can plot on a map
- Making area plots of mean restaurant scores for the whole of Wake County as well as the town of Cary
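To give a flavor of the dplyr side of that list, here is a minimal filter-group-summarize pipeline. The tiny inspections data frame and its columns are made up for illustration; the notebook works against the real Wake County data.

```r
library(dplyr)

# Hypothetical inspection data standing in for the real data set.
inspections <- data.frame(
  Restaurant = c("A", "A", "B", "B", "C"),
  City = c("Cary", "Cary", "Raleigh", "Raleigh", "Cary"),
  Score = c(95.5, 92.0, 88.5, 90.0, 97.0)
)

# The same dplyr verbs the notebook leans on: filter out low scores,
# group by city, then compute summary variables per group.
city_scores <- inspections %>%
  filter(Score >= 90) %>%
  group_by(City) %>%
  summarize(MeanScore = mean(Score), Inspections = n())
```

A summary frame like city_scores is exactly the kind of per-area variable you can then hand to ggmap to plot on top of a real map.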
This post wraps up my R mini-series. It barely scratches the surface of what you can do with R, but I’m hoping it inspires SQL Server developers to get in on the data analysis game.