DEV Community

Chris Greening

Visualizing shapefiles in R with sf and ggplot2!

Introduction

As data scientists, the ability to investigate and visualize the geographic world around us is often a critical part of our toolkit

Whether we're

  • tracking the spread of a global pandemic
  • investigating the effects of climate change
  • or helping a city develop its public transportation system

there are limitless insights to be gleaned and communicated from geographic data

With geographic data being as complex as it is, it's essential to have the right tools to simplify our lives (and code) as much as possible

By leveraging R and its rich package ecosystem, we're able to take advantage of powerful tools such as sf and ggplot2 to bring our geographic data to life and quickly spin up meaningful analyses and graphics

So let's jump in and learn how to combine these packages to create quick visualizations from shapefiles!

Screenshot of Italy regions from the geom_sf plot

NOTE: The complete code and dataset for this blog post can be found on GitHub here

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Prerequisites and installation

To follow along with this blog post, you'll need the sf, ggplot2, and tidyverse packages installed (tidyverse also brings in dplyr, which we'll use for filtering)

To install them, open RStudio (or wherever you run R) and run:

install.packages(c("sf", "ggplot2", "tidyverse"))

What is sf?

If you've worked with geographic data before, you may have come across a wide range of file formats, tools, and jargon

It can get overwhelming, especially when we just need to perform a quick spatial analysis or generate some visually appealing maps. That's where the sf package in R comes in handy

Short for "Simple Features", sf is a package designed to simplify spatial data handling within R via the Simple Features standard that specifies a common storage and access model for geographic data

By using sf, we're able to leverage its:

  • Ease of use: sf enables us to work with spatial data as if it were a regular data.frame, data.table, or tibble (pick your poison). This intuitive handling makes life much easier as it opens up the robust and familiar support R has for tabular data
  • Integration with other tools: sf seamlessly integrates with popular R packages like tidyverse and ggplot2, allowing us to wrangle and visualize spatial data with some of our favorite tools
  • Versatility: From reading and writing various spatial file formats to spatial operations like joining and aggregating data, sf is wonderfully versatile. It supports a wide array of geometric operations, coordinate reference systems, etc.
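
To make the "regular data.frame" point concrete, here's a minimal sketch that builds a tiny sf object by hand. The city names and coordinates are made-up illustration data (approximate values, not part of this post's dataset), and crs = 4326 assumes plain WGS 84 longitude/latitude:

```r
library(sf)

# Hypothetical example data: three Italian cities with approximate coordinates
cities <- data.frame(
    name = c("Rome", "Milan", "Naples"),
    lon = c(12.50, 9.19, 14.27),
    lat = c(41.90, 45.46, 40.85)
)

# Turn the lon/lat columns into a geometry column (WGS 84 is assumed here)
cities.sf <- sf::st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# The result is still a data.frame, just with an extra geometry column
class(cities.sf)

# So ordinary data.frame subsetting keeps working
milan <- cities.sf[cities.sf$name == "Milan", ]
```

Because the geometry simply rides along as one more column, the usual tabular tools apply without any special handling.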

Now that we know what sf is, let's jump into shapefiles and learn what they are and how we can leverage them for quick visualizations!

What are shapefiles and why do they matter?

Shapefiles are a crucial part of geographic data handling and if you're working with spatial data you'll undoubtedly encounter them. But what exactly are they and why are they so important?

Let's break it down:

  • Definition: A shapefile is a common geospatial vector data format used in Geographic Information System (GIS) software. It stores the geometric location and attribute information of geographic features
  • Components: A shapefile is not just a single file but consists of at least three mandatory files:
    • .shp: Holds the feature geometry stored as a set of vector coordinates
    • .shx: Stores a positional index of the feature geometry so individual shapes can be located quickly
    • .dbf: Contains the attribute table (one row of attributes per shape) in dBASE format

Screenshot of file explorer showing the shapefiles

  • Use cases: Shapefiles can be applied to a wide variety of fields and use cases:
    • Mapping: Shapefiles enable the creation of maps, displaying roads, rivers, landforms, etc.
    • Analysis: They facilitate spatial analyses such as distance calculations, area measurements, and overlays
    • Data integration: Shapefiles are versatile and can be integrated with other data types, aiding comprehensive analyses
  • Widespread use: Shapefiles are one of the most commonly used formats in GIS
  • Accessibility: Many governmental and environmental organizations provide data in shapefile format, making it widely accessible
  • Compatibility: They are supported by various GIS software, enhancing their usability
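
One way to see the multi-file structure for yourself is to write a tiny shapefile and list what lands on disk. This is only a sketch: the two points, the points.shp file name, and the temporary directory are all made up for illustration:

```r
library(sf)

# Hypothetical example data: two points in WGS 84 longitude/latitude
pts <- sf::st_as_sf(
    data.frame(id = 1:2, x = c(12.50, 9.19), y = c(41.90, 45.46)),
    coords = c("x", "y"),
    crs = 4326
)

# Write into a fresh temporary directory so nothing gets overwritten
out.dir <- tempfile("shp_demo")
dir.create(out.dir)
sf::st_write(pts, file.path(out.dir, "points.shp"), quiet = TRUE)

# One "shapefile" shows up as several sibling files sharing a base name
shp.files <- list.files(out.dir)
print(shp.files)
```

Alongside the three mandatory files you'll typically also find a .prj file holding the coordinate reference system, which is why shapefiles are usually shared as a folder or zip archive rather than a single file.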

In short, shapefiles act as a bridge between raw geographic data and the insights we can draw from it

Their accessibility and flexibility make them indispensable in the world of geographic data analysis, and by leveraging the sf package in R we can effortlessly load these files, manipulate them, and turn them into informative visuals

Loading shapefiles with sf

Reading a shapefile with sf is incredibly straightforward and requires only a single line of code using sf::st_read

For this example, we're going to load geographic regions of Europe from a shapefile sourced directly from an official European Union data source and filter it to only show Italian geographic regions using dplyr::filter

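library(sf)
library(dplyr)

# Read the NUTS shapefile, then keep only the Italian NUTS-3 regions.
# In Eurostat's NUTS files, CNTR_CODE holds the two-letter country code
# and LEVL_CODE holds the NUTS level (0 to 3)
shape.data <- sf::st_read("NUTS_RG_01M_2021_3035.shp") %>%
    dplyr::filter(CNTR_CODE == "IT", LEVL_CODE == 3)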

Output of shape.data printed to RStudio's console

And that's it! Our geographic data is now ready for analyzing and visualizing
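
If you don't have the NUTS file handy, the same read-then-filter pattern can be tried on nc.shp, a small North Carolina counties shapefile that ships with the sf package. This is a sketch on a different dataset than the one used in this post, and the "Ashe" county filter is just an arbitrary example:

```r
library(sf)
library(dplyr)

# nc.shp is bundled with sf, so nothing needs to be downloaded
nc.path <- system.file("shape/nc.shp", package = "sf")

# Same pattern as above: read the shapefile, then filter like any data frame
nc.data <- sf::st_read(nc.path, quiet = TRUE) %>%
    dplyr::filter(NAME == "Ashe")

# A few quick checks that are handy after loading any shapefile
sf::st_crs(nc.data)                      # coordinate reference system
unique(sf::st_geometry_type(nc.data))    # kind of geometry (polygons, points, ...)
names(nc.data)                           # attribute columns plus the geometry column
```

Checking the coordinate reference system and column names right after loading is a cheap way to catch a mismatched projection or a misspelled filter column before plotting.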

Visualizing the geographic data with ggplot2

To visualize our shape data, we can now use ggplot2 just like we would in any other analysis, adding the geometry layer with ggplot2::geom_sf!

ggplot2::ggplot(data = shape.data) +
    ggplot2::geom_sf() +
    ggplot2::labs(title = "Italian NUTS-3 regions") +
    ggplot2::theme(
        panel.background = ggplot2::element_blank(),
        axis.text = ggplot2::element_blank(),
        axis.title = ggplot2::element_blank(),
        axis.ticks = ggplot2::element_blank(),
        legend.position = "none"
    )

Screenshot of Italy regions from the geom_sf plot
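
Since geom_sf behaves like any other ggplot2 layer, regular aesthetics work on it too. As a rough sketch, here's a filled map built from the nc.shp sample that ships with sf (not this post's Italy data); AREA is a column in that sample file, and the title and theme choices are just illustrative:

```r
library(sf)
library(ggplot2)

# Sample shapefile bundled with sf (North Carolina counties)
nc.data <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Map an attribute column to the fill aesthetic, as with any other geom
p <- ggplot2::ggplot(data = nc.data) +
    ggplot2::geom_sf(ggplot2::aes(fill = AREA)) +
    ggplot2::labs(title = "North Carolina counties by area", fill = "Area") +
    ggplot2::theme_void()

print(p)
```

Here theme_void() is used as a shortcut for blanking out the background and axes, which gets a similar look to the manual theme() call above.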

Conclusion

In this post, we've explored how R makes working with and visualizing geographic data accessible and efficient through the sf and ggplot2 packages

Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄

Cheers!

