Not too long ago, I looked at using dplyr and tidyr to clean up data, and ended the post with an ugly-looking plot on top of a map of the Raleigh area.  Today, we’re going to look at using a heat map to make the crime data in Raleigh a little easier to understand.

Let’s Heat It Up

I found a good example of heat maps in R.  I’d now like to apply this to our Raleigh example.  Let’s start out by rebuilding our data set:

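# Install the packages once per machine; skip these four lines if they are already installed.
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggmap")
# Load the packages for this session.
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggmap)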

raleighcrime = read.csv('/opt/data/Police.csv')
raleighcrime$INCIDENT.DATE = as.Date(raleighcrime$INC.DATETIME, format="%m/%d/%Y %I:%M:%S %p")
raleighcrime$BEAT = as.factor(raleighcrime$BEAT)
raleighcrime$LOCATION = as.character(raleighcrime$LOCATION)

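# Keep only rows that have coordinates, then split LOCATION into two columns.
rcf <- filter(raleighcrime, LOCATION != "")
rcf <- separate(rcf, LOCATION, c("LATITUDE", "LONGITUDE"), sep = ",", remove = TRUE)
# extract_numeric() is deprecated in newer tidyr releases; readr::parse_number() is the suggested replacement.
rcf$LATITUDE <- round(extract_numeric(rcf$LATITUDE), 3)
rcf$LONGITUDE <- round(extract_numeric(rcf$LONGITUDE), 3)
# Center the map on the average incident location.  Newer ggmap releases need a Google API key (see register_google()) first.
raleighmap <- get_map(location = c(lon = mean(rcf$LONGITUDE), lat = mean(rcf$LATITUDE)), zoom = 11, maptype = "roadmap", scale = 2)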

At this point, we’re going to overlay the rcf data frame on top of raleighmap.  I settled on the zoom level for raleighmap through trial and error, so be willing to try out different levels here.
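If you want to see what the coordinate clean-up does without loading the full data set, here is a minimal base R sketch of the same idea.  The sample LOCATION strings and their "(latitude, longitude)" layout are my assumptions for illustration, not values copied from Police.csv, and the to_number helper is a hypothetical stand-in for extract_numeric.

```r
# Hypothetical LOCATION values; the "(lat, lon)" layout is an assumption.
location <- c("(35.7796, -78.6382)", "", "(35.8436, -78.6450)")

# Drop empty locations, much like filter(LOCATION != "").
location <- location[location != ""]

# Split each string on the comma, much like separate().
parts <- strsplit(location, ",", fixed = TRUE)

# Strip everything except digits, decimal points, and minus signs.
to_number <- function(x) as.numeric(gsub("[^0-9.-]+", "", x))

# Round to three decimal places so nearby incidents share a coordinate.
coords <- data.frame(
  LATITUDE  = round(to_number(vapply(parts, `[`, character(1), 1)), 3),
  LONGITUDE = round(to_number(vapply(parts, `[`, character(1), 2)), 3)
)
print(coords)
```

Rounding to three decimal places snaps points to a grid of roughly a hundred meters in latitude, which is plenty of detail for a city-wide density plot.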

Now that I have a map, let’s make it look nice:

ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x=LONGITUDE,y=LATITUDE),size = 0.3) +
  stat_density2d(data=rcf, aes(x=LONGITUDE, y=LATITUDE, fill=..level.., alpha=..level..), size=0.01, bins=16, geom="polygon") +
  scale_fill_gradient(low="green",high="red") +
  scale_alpha(range = c(0,0.3), guide=FALSE)

We’ve got some new code here, so let’s dig into it.  Before I describe the function calls, let’s look at the map:

[Image: raleighcrimeheatmap]

This looks a lot better than the previous map, and I think it gives a pretty decent view of the crime data set, having previously dealt with it in a different format.  We already know ggmap from the last post; it displays a Google Maps map.

Geom_Density2D

The next command is geom_density2d.  It builds those blue contour lines based on density.  I think that contour lines work out well here because this crime data set does follow a contour pattern:  crime does tend to radiate out, with high-crime areas being near other high-crime areas and dissipating with distance.  Anyhow, on geom_density2d, there aren’t many interesting parameters.  We assign a data set, create an aesthetic binding latitude and longitude, and set size to a small value so that the lines stay thin.
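To get a feel for where contour lines like these come from, here is a small sketch outside of ggplot2.  The ggplot2 documentation says the 2D density estimate is done with MASS::kde2d, so I call that directly and then trace lines of equal density with contourLines.  The points are synthetic (two made-up clusters), not the Raleigh data, and the grid size and number of levels are arbitrary choices of mine.

```r
library(MASS)  # ships with R; provides kde2d()

set.seed(42)

# Synthetic points: two made-up clusters standing in for incident locations.
lon <- c(rnorm(300, mean = -78.64, sd = 0.02), rnorm(150, mean = -78.60, sd = 0.03))
lat <- c(rnorm(300, mean = 35.78, sd = 0.02), rnorm(150, mean = 35.85, sd = 0.03))

# Estimate a smooth density surface on a 100 x 100 grid.
dens <- kde2d(lon, lat, n = 100)

# Trace lines of equal density across that surface.
contours <- contourLines(dens$x, dens$y, dens$z, nlevels = 10)

# How many contour pieces were traced, and at which density levels?
length(contours)
range(vapply(contours, function(cl) cl$level, numeric(1)))
```

Each element of contours holds one level plus the x and y coordinates of a line, which is essentially what a contour layer ends up drawing.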

Stat_Density2D

By itself, geom_density2d draws some blue lines, which is cool.  But stat_density2d lets us build density maps, like so:

[Image: statdensitymap]

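Contours were nice, but this really helps us see high-crime areas more clearly.  Our stat_density2d has two new terms in the aesthetic:  fill and alpha.  Both of these are marked as “..level..”  Here’s the quick explanation:  the double dots tell ggplot2 to use a variable computed by the stat rather than a column in our data frame, and level is the estimated density at each contour.  Denser areas therefore get both a different fill color and a higher opacity.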
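One way to convince yourself that level is computed by the stat is to build a plot and look at the data behind the layer.  This is a sketch on synthetic points rather than the Raleigh data, and it assumes a reasonably recent ggplot2 (3.3.0 or later), where after_stat(level) is the newer spelling of ..level..

```r
library(ggplot2)

set.seed(42)

# Synthetic points standing in for incident coordinates.
pts <- data.frame(
  LONGITUDE = rnorm(500, mean = -78.64, sd = 0.03),
  LATITUDE  = rnorm(500, mean = 35.78, sd = 0.03)
)

# Same idea as the map layer: fill and alpha both follow the computed level.
p <- ggplot(pts, aes(x = LONGITUDE, y = LATITUDE)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", bins = 16)

# Pull out the data frame that the stat produced for layer 1.
computed <- layer_data(p, 1)
names(computed)
summary(computed$level)
```

The pts data frame has no level column, yet the layer data does, which is the whole point of the double-dot (or after_stat) notation.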

We have three parameters that we’ve set:  size, bins, and geom.  To show bins in action, I bumped the number up to 60 and rebuilt the plot.  What we end up with is a much busier-looking map:

[Image: scale60bins]

What we can learn from this is that more bins aren’t necessarily a good thing, as they just make our crime map look noisier.  We still see the highest-crime points, but somebody trying to pick out the most important details of this crime map has to look harder.
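You can also check the effect of bins without eyeballing a map.  The sketch below counts how many distinct density levels end up in the layer for 16 and for 60 bins.  It uses synthetic points and a hypothetical helper I named count_levels, so treat the exact counts as illustrative only.

```r
library(ggplot2)

set.seed(42)

# Synthetic points standing in for incident coordinates.
pts <- data.frame(
  LONGITUDE = rnorm(500, mean = -78.64, sd = 0.03),
  LATITUDE  = rnorm(500, mean = 35.78, sd = 0.03)
)

# Hypothetical helper: build the layer and count its distinct contour levels.
count_levels <- function(bins) {
  p <- ggplot(pts, aes(x = LONGITUDE, y = LATITUDE)) +
    stat_density_2d(geom = "polygon", bins = bins)
  length(unique(layer_data(p, 1)$level))
}

levels_16 <- count_levels(16)
levels_60 <- count_levels(60)
c(bins_16 = levels_16, bins_60 = levels_60)
```

More levels mean more polygons stacked on top of each other, which is where the busier look comes from.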

As for geom, the help isn’t really that helpful:  “The geometric object to use display the data.”  I’ve confirmed that you can use polygon, tile, and density2d, but don’t know a full list.

Scale_Fill_Gradient

In our above map, the color range is blacks and blues.  It looks pretty nice, but there’s a color scheme which makes a bit more sense:  red is worse.  That’s what scale_fill_gradient does here:  we move from green (at the low end) to red (at the high end).

[Image: scalefillgradient]

This makes more intuitive sense:  the redder the area, the higher the density of reported incidents.
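To see which colors a green-to-red gradient actually produces, you can call the palette function from the scales package directly.  As far as I can tell, scale_fill_gradient builds on seq_gradient_pal, but take this as a sketch of the idea rather than a look at ggplot2's internals.  The five input positions are arbitrary values between 0 (lowest density) and 1 (highest density).

```r
library(scales)  # installed alongside ggplot2

# Palette function that interpolates from green (low) to red (high).
green_to_red <- seq_gradient_pal(low = "green", high = "red")

# Positions along the gradient, from lowest to highest density.
positions <- c(0, 0.25, 0.5, 0.75, 1)

# Hex color codes for each position.
fill_colours <- green_to_red(positions)
fill_colours
```

Swapping in other values for low and high is a quick way to preview a color scheme before rebuilding the whole map.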

Scale_Alpha

The big difference between the picture immediately above and the final version is the addition of scale_alpha.  This lets us tone down the image a bit and bring the contour lines back into focus.

Here are three versions with different alpha ratios:

0-0.3 (my final map):

[Image: raleighcrimeheatmap]

0-0.4:

[Image: scalealpha0.4]

0-0.5:

[Image: scalealpha0.5]

The difference here is a bit more subtle than some of the other transformations, but playing around with a few different alpha levels helps you get a feel for the effect.  I don’t like how much alpha level 0.5 bleeds.  Alpha level 0.3 is a very understated result, and I think it looks best of the three, although ideally I’d like just a little more color in that plot.
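If the three alpha ranges above feel abstract, it may help to see the numbers.  A continuous alpha scale maps the smallest level to the bottom of the range and the largest to the top, with everything else in between.  The sketch below imitates that with scales::rescale on a few made-up density levels; alpha_for is a hypothetical helper name.

```r
library(scales)  # installed alongside ggplot2

# Made-up density levels, lowest to highest.
density_levels <- c(10, 20, 40, 80)

# Hypothetical helper: map the levels linearly onto an alpha range of 0 to upper.
alpha_for <- function(upper) rescale(density_levels, to = c(0, upper))

alpha_03 <- alpha_for(0.3)
alpha_04 <- alpha_for(0.4)
alpha_05 <- alpha_for(0.5)

# One row per range, one column per density level.
rbind(`0-0.3` = alpha_03, `0-0.4` = alpha_04, `0-0.5` = alpha_05)
```

Every level gets proportionally more opaque as the upper bound rises, which matches the extra bleed in the 0-0.5 version.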

Conclusion

Building graphs and maps is all about making decisions to explain information in a concise manner.  R has some fantastic methods for doing this, one of which is the heat map.  When overlaid on top of a real-life map, a heat map brings our data to life.
