This is part one of a series on ggplot2.
I’m starting a new series on using ggplot2 to create high-quality visuals. But in order to understand why ggplot2 behaves the way it does, we need to understand a little bit about the grammar of graphics. Leland Wilkinson published The Grammar of Graphics in 1999, with a revised edition in 2005. By 2006, Hadley Wickham had created ggplot (as mentioned in his presentation A grammar of graphics: past, present, and future) as an implementation of the grammar of graphics in R. In 2010, Wickham published A Layered Grammar of Graphics, which explains the reasoning behind ggplot2.
In this first post of the series, I want to give you an idea of why we should think about the grammar of graphics. From there, we’ll go into detail with ggplot2, starting simple and building up to more complex plots. By the end of the series, I want to build high-quality, publication-worthy visuals. With that flow in mind, let’s get started!
What Is The Grammar of Graphics?
First, my confession. I haven’t read Wilkinson’s book and probably never will. That’s not at all a knock on the book itself, but rather an indication that it is not for everybody, not even for everyone interested in data visualization.
Instead, we will start with Wickham’s paper on ggplot2. This gives us the basic motivation behind the grammar of graphics by covering what a grammar does for us: “A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence” (3).
With a language, we have different language components like nouns (which can be subjects, direct objects, or indirect objects), verbs, adjectives, adverbs, etc. We put together combinations of those individual components to form complete sentences and transmit ideas. Our particular word choice and language component usage will affect the likelihood of success in idea transmission, but to an extent, we can work iteratively on a sentence, switching words or adding phrases to get the point across the way we desire.
With graphics, we can do the same thing. Instead of thinking of “a graph” as something which exists in and of itself, we should think of different objects that we combine into a final product: a graph.
Implementing The Grammar
In the ggplot2 grammar, we have different layers of objects. In some particular order, we have:
- The data itself, and a mapping explaining which portions of the data we want to represent as parts of our graph. This mapping ties the data to the things we see on the graph: the aesthetics. Aesthetic elements include the x axis, y axis, color, fill color, and so on.
- The statistical transformation we want to use. For example, there are stats for boxplots, jitter, qq plots, and summarization (page 11 of the paper). Stats transform an input data frame into an output data frame that your plot can use. For instance, stat_density() computes a density estimate from the input data.
- The geometric object (aka geom) we want to draw. This could be a histogram, a bar or column chart, a line chart, a radar chart, or whatever. Geoms relate closely to stats: each geom has a default stat, and each stat has a default geom.
- Scales and coordinates, which give us the axes and legends.
- Accompaniments to the visual. These include data labels and annotations. This is how we can mark specific points on the graph or give the graph a nice title.
- The ability to break our visual into facets, that is, splitting into multiple graphs. If we have multiple graphs, we can see how different pieces of the data interact.
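To make the list above concrete, here is a minimal sketch with one ggplot2 call per component. It assumes the ggplot2 package is installed and uses the mtcars data set that ships with R. The column choices, labels, and title are my own illustrative picks, not anything from Wickham's paper.

```r
library(ggplot2)

# Data plus aesthetic mapping. The columns chosen here are illustrative.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  # Geom and stat: points drawn with the identity stat (the default for geom_point).
  geom_point(stat = "identity") +
  # Scale and coordinate system: these control the axes and the legend.
  scale_x_continuous(name = "Weight (1000 lbs)") +
  coord_cartesian() +
  # Accompaniments: a title and the remaining labels.
  labs(title = "Fuel economy by weight", y = "Miles per gallon", color = "Cylinders") +
  # Facets: one panel per value of gear.
  facet_wrap(~ gear)

# A stat doing visible work: stat_density() turns the raw mpg values into a
# density estimate, which is then drawn with a line geom.
density_plot <- ggplot(mtcars, aes(x = mpg)) +
  stat_density(geom = "line")

print(p)
print(density_plot)
```

Each piece is added with +, so you can comment out any one line (the facets, say) and still get a valid plot.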
The key insight in ggplot2 is that these different layers are independent of one another: you can change the geometric object from a line chart to a bar chart without needing to change the title, for example. This lets you program graphs iteratively, starting with very simple graphs and adding on more polish as you go.
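As a rough illustration of that independence, the sketch below keeps the data, mapping, and title in one object and swaps only the geom. It assumes ggplot2 is installed and uses the economics data set bundled with the package. The object names (base, line_version, col_version) and the title text are made up for this example.

```r
library(ggplot2)

# Shared pieces: data, aesthetic mapping, and labels.
base <- ggplot(economics, aes(x = date, y = unemploy)) +
  labs(title = "US unemployment over time", x = NULL, y = "Unemployed (thousands)")

# Only the geom differs; the title and axis labels carry over untouched.
line_version <- base + geom_line()
col_version <- base + geom_col()

print(line_version)
print(col_version)
```

Because the plot is an ordinary R object, this is also how iterative polishing tends to work in practice: keep the base, try a geom, and add labels or themes once the shape looks right.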
As a follow-on to this, you can use more than one geometric object in the same plot. So if you want to draw a column chart with a line chart in front of it, that’s two geometric objects and not one special line+column chart. This lets you construct graphics at the level of complexity that you need.
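A small sketch of the column-plus-line idea, assuming ggplot2 is installed. The sales data frame is hypothetical data invented purely for this example. Layers are drawn in the order they are added, so the line and points end up in front of the columns.

```r
library(ggplot2)

# Hypothetical monthly figures, made up for illustration only.
sales <- data.frame(
  month = 1:6,
  units = c(12, 15, 11, 18, 21, 19)
)

p <- ggplot(sales, aes(x = month, y = units)) +
  geom_col(fill = "grey70") +        # first layer: columns, drawn at the back
  geom_line(color = "steelblue") +   # second layer: a line over the columns
  geom_point(color = "steelblue")    # third layer: points marking each month

print(p)
```

All three geoms inherit the same data and mapping from the ggplot() call, which is why none of them needs its own aes().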
Even though the Wickham paper is nearing 8 years old by this point and the ggplot2 library has expanded considerably in the meantime, it remains a good introduction to the grammar of graphics and gives the motivation behind ggplot2.
Over the rest of this series, we will dig into ggplot2 in some detail, generating some low-quality images at first but building up to better and better things.