Not too long ago, I looked at using dplyr and tidyr to clean up data, and ended the post with an ugly-looking plot on top of a map of the Raleigh area. Today, we’re going to look at using a heat map to understand crime data in Raleigh a little more easily.
Let’s Heat It Up
I found a good example of heatmaps in R. I’d now like to apply this to our Raleigh example. Let’s start out by building back up our data set:
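install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggmap")

library(ggplot2)
library(dplyr)
library(tidyr)
library(ggmap)

# Load the incident data and fix up column types.
raleighcrime = read.csv('/opt/data/Police.csv')
raleighcrime$INCIDENT.DATE = as.Date(raleighcrime$INC.DATETIME, format="%m/%d/%Y %I:%M:%S %p")
raleighcrime$BEAT = as.factor(raleighcrime$BEAT)
raleighcrime$LOCATION = as.character(raleighcrime$LOCATION)

# Drop rows without a location, then split LOCATION into numeric coordinates.
rcf <- filter(raleighcrime, LOCATION != "")
rcf <- separate(rcf, LOCATION, c("LATITUDE", "LONGITUDE"), ",", 1)
# Note: extract_numeric() is deprecated in newer tidyr releases;
# readr::parse_number() is the suggested replacement.
rcf$LATITUDE <- round(extract_numeric(rcf$LATITUDE), 3)
rcf$LONGITUDE <- round(extract_numeric(rcf$LONGITUDE), 3)

# Fetch a base map centered on the mean incident location.
# Note: recent ggmap versions need a Google Maps API key (see register_google()).
raleighmap <- get_map(location = c(lon = mean(rcf$LONGITUDE), lat = mean(rcf$LATITUDE)), zoom = 11, maptype = "roadmap", scale = 2)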
At this point, we’re going to overlay the rcf data frame on raleighmap. I settled on the zoom level for raleighmap through trial and error, so be willing to try out different levels here.
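To make the LOCATION cleanup step more concrete, here is a minimal base-R sketch of the same split-and-round logic on a few made-up strings. The "(lat, lon)" string format and the sample coordinates are assumptions for illustration, and to_number is a hypothetical helper; check the format against the actual Police.csv before relying on this.

```r
# Hypothetical LOCATION values; the real file's format may differ.
location <- c("(35.7796, -78.6382)", "", "(35.8321, -78.6414)")

# Drop blank locations, as filter(raleighcrime, LOCATION != "") does.
location <- location[location != ""]

# Split on the comma, then strip everything except digits, '.', and '-'.
parts <- strsplit(location, ",", fixed = TRUE)
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# Round to three decimal places, as the post's code does.
coords <- data.frame(
  LATITUDE  = round(to_number(sapply(parts, `[`, 1)), 3),
  LONGITUDE = round(to_number(sapply(parts, `[`, 2)), 3)
)
print(coords)
```

Rounding to three decimal places snaps incidents to a grid of roughly 110 meters in latitude, which groups nearby incidents together before any density is estimated.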
Now that I have a map, let’s make it look nice:
ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE), size = 0.3) +
  stat_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE, fill = ..level.., alpha = ..level..), size = 0.01, bins = 16, geom = "polygon") +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0, 0.3), guide = FALSE)
We’ve got some new code here, so let’s dig into it. Before I describe the function calls, let’s look at the map:
This looks a lot better than the previous map, and having worked with this crime data in a different format before, I think it gives a pretty decent view of the data set. We already know ggmap from the last post; it displays a Google Maps map.
The next command is geom_density2d. It builds those blue contour lines based on density. I think that contour lines work out well here because this crime data set does follow a contour pattern: crime does tend to radiate out, with high-crime areas being near other high-crime areas and dissipating with distance. Anyhow, on geom_density2d, there aren’t many interesting parameters. We assign a data set, create an aesthetic binding latitude and longitude, and set size small enough to draw a thin line.
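For a sense of what sits behind those contour lines: ggplot2's 2D density stat estimates density with MASS::kde2d and then traces lines of equal density. The sketch below reproduces that idea in base R on synthetic points. The cluster center and spread are made up for illustration and are not Raleigh data.

```r
library(MASS)  # provides kde2d()

set.seed(42)
# Synthetic stand-in for incident coordinates: a single cluster.
lon <- rnorm(500, mean = -78.64, sd = 0.03)
lat <- rnorm(500, mean = 35.78, sd = 0.02)

# Kernel density estimate on a 50 x 50 grid.
dens <- kde2d(lon, lat, n = 50)

# Each contour line joins locations with the same estimated density.
lines_list <- contourLines(dens$x, dens$y, dens$z, nlevels = 8)
levels_found <- sort(unique(sapply(lines_list, function(cl) cl$level)))
print(levels_found)

# The density peaks near the cluster center and falls off around it.
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
cat("Peak near lon", dens$x[peak[1, 1]], "lat", dens$y[peak[1, 2]], "\n")
```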
By itself, geom_density2d draws some blue lines, which is cool. But stat_density2d lets us build density maps, like so:
Contours were nice, but this really helps us see high-crime areas more clearly. Our stat_density2d has two new terms in the aesthetic: fill and alpha. Both are mapped to “..level..”, which refers to a value the stat computes itself: the estimated density of each contour.
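One way to see where level comes from is to inspect the data the stat produces. The sketch below uses synthetic points rather than the Raleigh data, and it assumes ggplot2 3.3.0 or later, where after_stat(level) is the newer spelling of ..level..; pts is a made-up data frame.

```r
library(ggplot2)

set.seed(1)
# Synthetic stand-in for the rcf data frame.
pts <- data.frame(
  LONGITUDE = rnorm(300, -78.64, 0.03),
  LATITUDE  = rnorm(300, 35.78, 0.02)
)

p <- ggplot(pts, aes(x = LONGITUDE, y = LATITUDE)) +
  stat_density2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                 bins = 16, geom = "polygon")

# layer_data() returns what the stat computed, including the 'level' column.
built <- layer_data(p, 1)
print(head(built[, c("x", "y", "level")]))
print(range(built$level))
```

Every polygon carries one level value, so mapping fill and alpha to it makes denser bands both redder and more opaque.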
We have three parameters that we’ve set: size, bins, and geom. To show bins in action, I bumped the number up to 60 and rebuilt the plot. What we end up with is a much busier-looking map:
What we can learn from this is that more bins aren’t necessarily a good thing, as they just make our crime map look noisier. We still see the highest-crime points, but somebody trying to pick out the most important details of this crime map has to look harder.
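To put a number on the effect of bins, a small sketch like the following counts how many distinct contour levels end up in the plot data. It runs on synthetic points, and count_levels is a hypothetical helper, so treat it as an illustration rather than a measurement of the Raleigh map.

```r
library(ggplot2)

set.seed(1)
# Synthetic stand-in for the rcf data frame.
pts <- data.frame(
  LONGITUDE = rnorm(300, -78.64, 0.03),
  LATITUDE  = rnorm(300, 35.78, 0.02)
)

# Build the density layer with a given bin count and count distinct levels.
count_levels <- function(bins) {
  p <- ggplot(pts, aes(x = LONGITUDE, y = LATITUDE)) +
    stat_density2d(bins = bins, geom = "polygon")
  length(unique(layer_data(p, 1)$level))
}

cat("bins = 16:", count_levels(16), "levels\n")
cat("bins = 60:", count_levels(60), "levels\n")
```

More levels means more polygons stacked over the same area, which is where the busier look comes from.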
As for geom, the help isn’t really that helpful: “The geometric object to use display the data.” I’ve confirmed that you can use polygon, tile, and density2d, but I don’t know the full list.
In our above map, the color range is blacks and blues. It looks pretty nice, but there’s a color scheme which makes a bit more sense: red is worse. That’s what scale_fill_gradient does here: we move from green (at the low end) to red (at the high end).
This makes more intuitive sense: the redder the area, the higher the crime rate.
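As a rough illustration of what a two-color gradient does, the base-R sketch below interpolates between green and red and assigns made-up density levels to the resulting colors. Note that colorRampPalette interpolates in RGB, which is not exactly how ggplot2 blends colors, so the intermediate shades are only indicative; the level values are hypothetical.

```r
# Five evenly spaced colors from green (low) to red (high).
ramp <- colorRampPalette(c("green", "red"))
cols <- ramp(5)
print(cols)

# Hypothetical density levels, binned into five equal-width intervals.
level <- c(5, 20, 40, 80, 100)
idx <- cut(level, breaks = 5, labels = FALSE)

# Higher levels land on colors closer to red.
print(data.frame(level = level, color = cols[idx]))
```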
The big difference between the picture immediately above and the final version is a pass through scale_alpha. This lets us tone down the image a bit and bring the contour lines back into focus.
Here are three versions with different alpha ratios:
0-0.3 (my final map):
The difference here is a bit more subtle than some of the other transformations, but playing around with a few different alpha levels helps you get a feel for the effect. I don’t like how much alpha level 0.5 bleeds. Alpha level 0.3 is a very understated result, and I think it looks best of the three, although ideally I’d like just a little more color in that plot.
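The range argument to scale_alpha works by linearly rescaling the mapped values into that interval. Here is a minimal sketch of the arithmetic, using made-up density levels and a hypothetical helper named rescale_alpha:

```r
# Linearly rescale values into an alpha range, as a continuous alpha scale does.
rescale_alpha <- function(level, range = c(0, 0.3)) {
  lo <- min(level)
  hi <- max(level)
  range[1] + (level - lo) / (hi - lo) * diff(range)
}

level <- c(10, 40, 70, 100)  # hypothetical density levels

alpha_03 <- rescale_alpha(level, c(0, 0.3))
alpha_05 <- rescale_alpha(level, c(0, 0.5))
print(alpha_03)
print(alpha_05)
```

With a lower bound of 0, the lowest-density bands are fully transparent, so only the upper bound decides how strongly the hot spots cover the map underneath.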
Building graphs and maps is all about making decisions to explain information in a concise manner. R has some fantastic methods for doing this, one of which is the heat map. Overlaid on a real-life map, a heat map brings our data to life.