ggplot2: cowplot

This is part seven of a series on ggplot2.

Up to this point, I’ve covered what I consider to be the basics of ggplot2.  Today, I want to cover a library which is still easy to use, but helps you create more advanced visuals:  cowplot.  I was excited by the name cowplot, but once I learned that it had nothing to do with cattle (instead, the author’s name is Claus O. Wilke), that did diminish the charm a little bit.  Nevertheless, there are a couple of great things you can do with this library and we’ll see one of them today.

If you’re interested in cowplot, I recommend reading the vignette first, as it provides several useful examples.  For our case, we are going to use cowplot to stack two related charts.

Charting Genocide

To this point, we have been using the gapminder data set to compare GDP and life expectancy across continents, but without looking at any countries in particular.  In today’s post, I want to show a comparison between one country and the world.

First up, let’s load our libraries:

library(tidyverse)
library(gapminder)
library(ggthemes)
library(ggrepel)
library(extrafont)
if(.Platform$OS.type == "windows") {
    loadfonts(device="win")
} else {
    loadfonts()
}

Next up, I want to build a plot showing GDP and life expectancy changes over time across the globe.  The gapminder data set has a number of individual year-GDP-expectancy points, so we’re going to summarize them first in a data frame.  After I do that, I will plot them using ggplot.

global_avg <- gapminder %>%
    group_by(year) %>%
    summarize(m_lifeExp = mean(lifeExp), m_gdpPercap = mean(gdpPercap)) %>%
    select(year, m_lifeExp, m_gdpPercap)

plot_global <- ggplot(data = global_avg, mapping = aes(x = m_gdpPercap, y = m_lifeExp)) +
    geom_point() +
    geom_path(color = "#999999") +
    scale_x_continuous(labels = scales::dollar) +
    geom_text_repel(
        mapping = aes(label = year),
        nudge_y = 0.7,
        nudge_x = -120,
        segment.alpha = 0,
        family = "Gill Sans MT",
        size = 4
    ) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )
plot_global

Global changes in GDP and life expectancy over time

Notice that I used geom_path().  This is a geom I did not cover earlier in the series.  It’s not a common geom, though it does show up in charts like this where we want to display data for three variables.  The geom_line() geom connects points in order of the variable on the x axis, which suits the basic rules for a line:  that the variable on the y axis is a function of the variable on the x axis, which means that for each element of the domain, there is one and only one corresponding element of the range (and I have a middle school algebra teacher who would be very happy right now that I still remember the definition she drilled into our heads all those years ago).

But when you have two variables which change over time, there’s no guarantee that this will be the case, and that’s where geom_path() comes in.  The geom_path() geom does not connect points in order of their x values, but instead connects them in the order in which they appear in the data, which effectively lets a third variable drive the path.  The trick is, though, that we don’t define this third variable in the plot; it’s implicit in the row order of the data set.  In our case, our data frame comes in ordered by year, but we could decide to order by, for example, life expectancy by setting data = arrange(global_avg, m_lifeExp).  Note that for these global numbers, geom_line() and geom_path() produce the same output because we’ve seen consistent improvements in both GDP per capita and life expectancy over the 55-year data set.  So let’s look at a place where that’s not true.

ggplot(data = filter(gapminder, country == "Cambodia"), mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    geom_path(color = "#999999") +
    scale_x_continuous(labels = scales::dollar) +
    geom_text_repel(
        mapping = aes(label = year),
        segment.alpha = 0,
        family = "Gill Sans MT",
        size = 4
    ) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )

Cambodian changes in GDP and life expectancy over time
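
To see the difference between the two geoms in isolation, it helps to try them on a tiny made-up data set.  This is a minimal sketch:  the toy data frame and its step, x, and y columns are hypothetical and not part of gapminder.  It uses ggplot2’s layer_data() function to look at the order in which each geom will connect the points.

```r
library(ggplot2)

# Hypothetical toy data: x wanders back and forth as step increases.
toy <- data.frame(
    step = 1:5,
    x = c(1, 3, 2, 5, 4),
    y = c(10, 20, 30, 40, 50)
)

plot_line <- ggplot(data = toy, mapping = aes(x = x, y = y)) + geom_line()
plot_path <- ggplot(data = toy, mapping = aes(x = x, y = y)) + geom_path()

# layer_data() returns the data frame that each layer will actually draw.
line_x <- layer_data(plot_line)$x   # geom_line() orders rows by x before connecting them
path_x <- layer_data(plot_path)$x   # geom_path() keeps the row order of the data

print(line_x)
print(path_x)
```

Comparing the two printed vectors shows which geom reorders the points.  Sorting toy by a different column before plotting (for example with arrange()) should change the path but not the line.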

Cambodia starts out similar to the rest of the world, seeing some growth in GDP per capita and life expectancy through 1967, but a precipitous drop in both during the 1970s.  The reason was the Khmer Rouge, one of the nastiest communist governments.  This graph alone is evidence of disaster, but I really want to drive the point home:  I want a direct comparison between what happened in Cambodia versus the rest of the world at the same time, and that’s where cowplot comes in.

Plotting A Grid

We’ve seen facet_wrap() and facet_grid() already in ggplot2, but cowplot’s plot_grid() has something very helpful for us:  the rel_heights parameter.  This lets us state how much of the total vertical space each chart should take relative to the others.  Let’s take the global plot, attach the Cambodian plot, and clean up titles and axes.  Then we’ll call cowplot’s plot_grid() function.  Here’s the full code:

plot_cambodia <- ggplot(data = filter(gapminder, country == "Cambodia"), mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    geom_path(color = "#999999") +
    scale_x_continuous(labels = scales::dollar) +
    geom_text_repel(
        mapping = aes(label = year),
        segment.alpha = 0,
        family = "Gill Sans MT",
        size = 4
    ) +
    theme_minimal() +
    labs(
        x = NULL,
        y = NULL,
        title = "The Khmer Rouge Legacy",
        subtitle = "Charting Cambodian life expectancy and GDP over time, compared to global averages."
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )

global_avg <- gapminder %>%
    group_by(year) %>%
    summarize(m_lifeExp = mean(lifeExp), m_gdpPercap = mean(gdpPercap)) %>%
    select(year, m_lifeExp, m_gdpPercap)

plot_global <-
    ggplot(data = global_avg, mapping = aes(x = m_gdpPercap, y = m_lifeExp)) +
    geom_point() +
    geom_path(color = "#999999") +
    scale_x_continuous(labels = scales::dollar) +
    geom_text_repel(
        mapping = aes(label = year),
        nudge_y = 0.7,
        nudge_x = -120,
        segment.alpha = 0,
        family = "Gill Sans MT",
        size = 4
    ) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )

cowplot::plot_grid(plot_cambodia, plot_global, rel_heights = c(0.55, 0.45), ncol=1)

Comparing Cambodia to the rest of the world

We used relative heights of 55% versus 45% for this plot.  If you squeeze the world chart down, the line flattens out and distorts the image, so we want to keep the two plots fairly close in size.
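
The values in rel_heights are relative to one another, so they do not have to add up to 1.  As a minimal sketch (the simple_ plots are unstyled stand-ins for the ones above, and their names are my own), the two calls below should lay out the charts in the same 55/45 proportions:

```r
library(dplyr)
library(ggplot2)
library(gapminder)

global_avg <- gapminder %>%
    group_by(year) %>%
    summarize(m_lifeExp = mean(lifeExp), m_gdpPercap = mean(gdpPercap))

# Unstyled stand-ins for plot_cambodia and plot_global.
simple_cambodia <- ggplot(data = filter(gapminder, country == "Cambodia"), mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    geom_path(color = "#999999")

simple_global <- ggplot(data = global_avg, mapping = aes(x = m_gdpPercap, y = m_lifeExp)) +
    geom_point() +
    geom_path(color = "#999999")

# The split written as fractions of the total height...
grid_fractions <- cowplot::plot_grid(simple_cambodia, simple_global, rel_heights = c(0.55, 0.45), ncol = 1)

# ...and the same split written as a ratio.
grid_ratio <- cowplot::plot_grid(simple_cambodia, simple_global, rel_heights = c(11, 9), ncol = 1)

grid_ratio
```

Writing the heights as a ratio can be easier to reason about once there are three or more charts in the stack.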

One thing I don’t like about this chart is that the year labels still end up overlapping the lines.  The ggrepel library will have text shift away from data points, but it doesn’t appear to prevent overlapping lines in a geom_path() geom.  I tried different nudge values but nothing quite worked right.

Keeping The Same X Axis

In this next chart, we’re going to look at Rwanda, another country which experienced a well-known genocide.  This time, instead of plotting both GDP per capita and life expectancy, we’re only going to look at life expectancy changes over time.  In the top chart, I’ll show Rwanda’s figures.  In the bottom chart, I’ll show a line chart with global averages over the same time frame.  Because we’ll use the same X axis, I don’t want two separate X axes for the two charts; I want them to blend.

plot_rwanda <- ggplot(data = filter(gapminder, country == "Rwanda"), mapping = aes(x = year, y = lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    theme_minimal() +
    labs(
        x = NULL,
        y = NULL,
        title = "The Rwandan Genocide",
        subtitle = "Charting Rwandan life expectancy over time, compared to the global average."
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_blank()
    )

plot_global <-
    ggplot(data = global_avg, mapping = aes(x = year, y = m_lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    theme_minimal() +
    labs(
        x = NULL,
        y = NULL,
        subtitle = "Global Average",
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )

cowplot::plot_grid(plot_rwanda, plot_global, rel_heights = c(0.55, 0.45), ncol=1)

Seeing the Rwandan genocide in stark contrast to global averages

There are a couple of changes here.  Because I have a consistent X axis, I removed the ticks from the top graph.  I also removed the text labels, as we are now showing year explicitly instead of implicitly through the data path.
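
One caveat with stacked charts that share an X axis:  if the Y axis labels of the two charts have different widths, the panels may not line up exactly.  cowplot’s plot_grid() has align and axis arguments for this.  The following is a sketch using simplified stand-ins for the two plots above (the simple_ names are my own), and I have not tuned it against the styled versions.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

global_avg <- gapminder %>%
    group_by(year) %>%
    summarize(m_lifeExp = mean(lifeExp))

# Top chart: Rwanda, with the X axis text and ticks removed.
simple_rwanda <- ggplot(data = filter(gapminder, country == "Rwanda"), mapping = aes(x = year, y = lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    theme_minimal() +
    labs(x = NULL, y = NULL) +
    theme(
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()
    )

# Bottom chart: global average, which keeps its X axis.
simple_global_line <- ggplot(data = global_avg, mapping = aes(x = year, y = m_lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    theme_minimal() +
    labs(x = NULL, y = NULL)

aligned <- cowplot::plot_grid(
    simple_rwanda,
    simple_global_line,
    align = "v",    # align the plots vertically
    axis = "lr",    # line up the left and right edges of the panels
    rel_heights = c(0.55, 0.45),
    ncol = 1
)
aligned
```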

Conclusion

This is only one of the uses for cowplot, but it’s a good one.  We are also not limited to two charts:  we could just as easily stack an indefinite number of charts on top of one another and define relative sizes for each chart.  We can also combine cowplot’s plot_grid() with facet_wrap() to group together a set of charts and fit it in relationship to another chart.  This would be helpful if, say, we compared one country’s change in life expectancy over time to plots of similar countries’ changes over time.
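
As a rough sketch of that last idea, here is one way it might look.  The choice of comparison countries is arbitrary and only for illustration, and the variable names are my own.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Hypothetical comparison set, picked only for illustration.
neighbors <- c("Burundi", "Tanzania", "Uganda")

# Top chart: the single country we want to focus on.
plot_focus <- ggplot(data = filter(gapminder, country == "Rwanda"), mapping = aes(x = year, y = lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    theme_minimal() +
    labs(x = NULL, y = NULL, title = "Rwanda")

# Bottom chart: one small panel per comparison country.
plot_neighbors <- ggplot(data = filter(gapminder, country %in% neighbors), mapping = aes(x = year, y = lifeExp)) +
    geom_point() +
    geom_line(color = "#999999") +
    facet_wrap(~ country, ncol = 3) +
    theme_minimal() +
    labs(x = NULL, y = NULL)

# Stack the single chart on top of the faceted set.
combined <- cowplot::plot_grid(plot_focus, plot_neighbors, rel_heights = c(0.6, 0.4), ncol = 1)
combined
```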
