ggplot Basics: Facets

This is part six of a series on ggplot2.

Up to this point, we’ve looked at single graphs.  But sometimes, a single graph can get a little too complicated for us.  Let’s go back to our gapminder data set showing data by continent:

[Figure: The relationship between wealth and longevity across the world.]

I’d like to see if these relationships hold within the five different continents.  I can easily change the R code to give me five smoothed lines, one per continent:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5, mapping = aes(color = continent)) +
    geom_smooth(method = "lm", se = FALSE, mapping = aes(color = continent)) +
    scale_color_brewer(type = "qual", palette = "Dark2") +
    scale_x_log10(labels = scales::dollar) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        title = "Wealth And Longevity",
        subtitle = "Charting the relationship between a country's prosperity and its residents' life expectancy.",
        caption = "Source:  Gapminder data set, 2010",
        color = "Continent"
    ) +
    guides(color = guide_legend(title = "Continent:")) +
    theme(
        legend.position = "bottom",
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    )

[Figure: That definitely cleared things up.]

That’s pretty ugly. How about we show each continent as a separate plot instead? We could write R code to plot each one individually, but then we’d need to know every category ahead of time. Instead, let’s use the facet functionality in ggplot: facet_wrap() and facet_grid().

Facet Wrap

The facet_wrap() function lays out one panel after another, wrapping onto a new row once the current row is full. Because we’re only displaying two variables per scatter plot (continent now decides which panel a point lands in rather than its color), we can remove the separate colors and go back to a single, consistent color for each graph.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE) +
    scale_x_log10(labels = scales::dollar) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        title = "Wealth And Longevity",
        subtitle = "Charting the relationship between a country's prosperity and its residents' life expectancy.",
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    ) +
    facet_wrap(facets = ~continent, ncol = 3)

[Figure: Using facet_wrap(), we can easily create independent but related graphs.]

Notice that we create a graph per continent by setting facets = ~continent. The tilde there is important: it makes this a one-sided formula. You could also write c("continent") or vars(continent) if that’s clearer to you.
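To make that equivalence concrete, here is a minimal sketch that builds the same faceted plot three ways. The names base_plot and panel_count are my own, the labels and theme are left off to keep it short, and I’m assuming a ggplot2 version recent enough to have vars() (3.0.0 or later).

```r
library(ggplot2)
library(gapminder)

# A stripped-down version of the plot above (labels and theme omitted).
base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar)

# Three ways to say "one panel per continent".
p_formula   <- base_plot + facet_wrap(~continent, ncol = 3)
p_character <- base_plot + facet_wrap(c("continent"), ncol = 3)
p_vars      <- base_plot + facet_wrap(vars(continent), ncol = 3)

# Hypothetical helper: count the panels that ggplot2 lays out for a plot.
panel_count <- function(p) nrow(ggplot_build(p)$layout$layout)

# All three specifications should produce the same number of panels.
counts <- sapply(list(p_formula, p_character, p_vars), panel_count)
counts
```

Which form to use is mostly a matter of taste. The character vector is handy when the faceting column name arrives as a string, for example from a function argument.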

I also set the number of columns, guaranteeing that we see no more than 3 columns of panels. I could alternatively set nrow, which would guarantee we see no more than a certain number of rows.
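If you want to see how ncol and nrow translate into a layout before drawing anything, ggplot2 exports a small helper, wrap_dims(), that does the arithmetic. The sketch below is only an illustration; the variable names are mine.

```r
library(ggplot2)
library(gapminder)

# The gapminder data set has five continents, so we get five panels.
n_panels <- nlevels(gapminder$continent)

# wrap_dims() returns c(rows, columns) for a given number of panels.
dims_ncol <- wrap_dims(n_panels, ncol = 3)  # cap at 3 columns: 2 rows by 3 columns
dims_nrow <- wrap_dims(n_panels, nrow = 1)  # force one row: 1 row by 5 columns

# The same idea applied to a plot: every continent side by side in one row.
p_one_row <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar) +
    facet_wrap(~continent, nrow = 1)
```

With five panels and ncol = 3, the last row is only partly filled, which is exactly what the chart above shows.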

One other interesting feature of facet_wrap() is the scales parameter. We can set scales = "free" if we want to draw each panel as if the others did not exist, or "free_x" or "free_y" to free just one axis. By default, scales is “fixed” to ensure that everything plots on the same scale. I prefer that for this exercise because it lets us more easily see those continental clusters.
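Here is a quick sketch of the scale options side by side, again with the labels and theme stripped out. The object names are my own. Print each plot to compare them.

```r
library(ggplot2)
library(gapminder)

base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar)

# Default behavior: every panel shares the same X and Y ranges.
p_fixed <- base_plot + facet_wrap(~continent, ncol = 3)

# Each panel gets its own X and Y ranges.
p_free <- base_plot + facet_wrap(~continent, ncol = 3, scales = "free")

# Only the Y axis varies by panel; the X axis stays shared.
p_free_y <- base_plot + facet_wrap(~continent, ncol = 3, scales = "free_y")
```

Free scales make the spread within each panel easier to see, but they make it harder to compare one panel against another, so I would reach for them only when the panels cover very different ranges.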

Facet Grid

The facet_grid() function builds a matrix of panels. Unlike facet_wrap(), there is no ncol or nrow parameter. Instead, we write a two-sided formula: the variable on the left-hand side defines the rows and the variable on the right-hand side defines the columns, with a period standing in for a side we don’t want to facet on.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        title = "Wealth And Longevity",
        subtitle = "Charting the relationship between a country's prosperity and its residents' life expectancy.",
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    ) +
    facet_grid(continent ~ .)

[Figure: A chart with grid faceting by row.]

Note that I took the smoothed line off in this case. That way, we can more easily see the data points themselves. I’ve got one variable of interest on the left-hand side; that is, one variable which defines the rows of this grid. Because the right-hand side is just a period, there is no column variable, so the panels stack vertically and share the X axis. This particular setup lets us contrast PPP GDP by continent fairly easily.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar) +
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        title = "Wealth And Longevity",
        subtitle = "Charting the relationship between a country's prosperity and its residents' life expectancy.",
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    ) +
    facet_grid(. ~ continent)

[Figure: A chart with grid faceting by column.]

And here’s what happens when I put continent on the right-hand side. Now we have a shared Y axis, letting us see relative life expectancy clusters by continent.
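If the formula notation feels backwards, facet_grid() also takes rows and cols arguments with vars(). The sketch below pairs each formula with what I understand to be its vars() equivalent and checks the resulting grid shape. The grid_dims helper is my own invention, and I’m assuming ggplot2 3.0.0 or later.

```r
library(ggplot2)
library(gapminder)

base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar)

# Continent defines the rows: panels stack vertically and share the X axis.
p_rows_formula <- base_plot + facet_grid(continent ~ .)
p_rows_vars    <- base_plot + facet_grid(rows = vars(continent))

# Continent defines the columns: panels sit side by side and share the Y axis.
p_cols_formula <- base_plot + facet_grid(. ~ continent)
p_cols_vars    <- base_plot + facet_grid(cols = vars(continent))

# Hypothetical helper: report how many rows and columns of panels a plot has.
grid_dims <- function(p) {
    layout <- ggplot_build(p)$layout$layout
    c(rows = max(layout$ROW), cols = max(layout$COL))
}

dims_by_row <- grid_dims(p_rows_vars)  # 5 rows, 1 column
dims_by_col <- grid_dims(p_cols_vars)  # 1 row, 5 columns
```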

So what happens if we define both sides? Then we start building out our grid:

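# filter() here is dplyr's; calling it as dplyr::filter() avoids picking up stats::filter() by mistake.
ggplot(data = dplyr::filter(gapminder, year %in% c(1982, 2007)), mapping = aes(x = gdpPercap, y = lifeExp)) +
    geom_point(alpha = 0.5) +
    scale_x_log10(labels = scales::dollar) +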
    theme_minimal() +
    labs(
        x = "GDP (PPP, normalized to 2005 USD)",
        y = NULL,
        title = "Wealth And Longevity",
        subtitle = "Charting the relationship between a country's prosperity and its residents' life expectancy.",
        caption = "Source:  Gapminder data set, 2010"
    ) +
    theme(
        text = element_text(family = "Gill Sans MT"),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 9),
        legend.title = element_text(size = 9),
        axis.title = element_text(size = 10)
    ) +
    facet_grid(year ~ continent)

[Figure: A chart with grid faceting by row and by column.]

In this example, I am looking at the years 1982 and 2007 and comparing life expectancy to income per continent—that is, four separate variables in one plot. It’s getting a bit too busy on this chart, but we can make out some trends, like a big boost in life expectancy across the board, but particularly in Asia.
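When a chart gets this busy, a small table can back up what the eye thinks it sees. This is just one way to check the trend, using base R and median life expectancy as the summary. The choice of median and the variable names are my own assumptions, not part of the plot above.

```r
library(gapminder)

# Keep the same two years used in the faceted chart.
two_years <- subset(gapminder, year %in% c(1982, 2007))

# Median life expectancy for each continent in each year.
medians <- aggregate(lifeExp ~ continent + year, data = two_years, FUN = median)

# Put the two years side by side: one row per continent.
wide <- reshape(medians, idvar = "continent", timevar = "year", direction = "wide")

# Change in median life expectancy between 1982 and 2007.
wide$change <- wide$lifeExp.2007 - wide$lifeExp.1982

# Largest change first.
wide[order(-wide$change), ]
```

Each row of the result corresponds to one column of panels in the chart, so the table and the plot can be read against each other.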

Conclusion

Faceting is one way to introduce one or more “extra” variables into a plot.  By breaking data out into multiple, connected plots, we can make relationships clearer.  Doing so runs the risk of information overload, however:  if I try to fit 20 or 30 graphs on the same page, I’m probably going to be doing more confusing than elucidating.

In the next post, I’ll look at another way of arranging graphs using an external library.
