Introduction
As data scientists, being able to investigate and visualize the geographic world around us is often a critical tool in our toolkit
Whether we're
- tracking the spread of a global pandemic
- investigating the effects of climate change
- or helping a city develop its public transportation system
there are limitless insights to be gleaned and communicated from geographic data
With geographic data being as complex as it is, it's essential to have the right tools to simplify our lives (and code) as much as possible
By leveraging R and its rich package ecosystem, we're able to take advantage of powerful tools such as sf and ggplot2 to bring our geographic data to life and quickly spin up meaningful analyses and graphics
So let's jump in and learn how to combine these packages to create quick visualizations from shapefiles!
NOTE: The complete code and dataset for this blog post can be found on GitHub here
Table of contents
- Prerequisites and installation
- What is sf?
- What are shapefiles and why do they matter?
- Loading shapefiles with sf
- Visualizing the geographic data with ggplot2
- Conclusion
- Additional resources
Prerequisites and installation
You'll need the following packages installed to follow along with this blog post!
To install them, open RStudio (or whichever R console you use) and run:
install.packages(c("sf", "ggplot2", "tidyverse"))
What is sf?
If you've worked with geographic data before, you may have come across a wide range of file formats, tools, and jargon
It can get overwhelming, especially when we just need to perform a quick spatial analysis or generate some visually appealing maps. That's where the sf package in R comes in handy
Short for "Simple Features", sf is a package designed to simplify spatial data handling within R via the Simple Features standard, which specifies a common storage and access model for geographic data
By using sf, we're able to leverage its:
- Ease of use: sf enables us to work with spatial data as if it were a regular data.frame, data.table, or tibble (pick your poison). This intuitive handling makes life much easier as it opens up the robust and familiar support R has for tabular data
- Integration with other tools: sf seamlessly integrates with popular R packages like tidyverse and ggplot2, allowing us to wrangle and visualize spatial data with some of our favorite tools
- Versatility: From reading and writing various spatial file formats to spatial operations like joining and aggregating data, sf is wonderfully versatile. It supports a wide array of geometric operations, coordinate reference systems, etc.
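To make those three points concrete, here is a minimal sketch. The cities data frame and its approximate coordinates are made up for illustration, and EPSG:3035 is chosen only because the "3035" suffix of the shapefile name used later in this post suggests that projection.

```r
library(sf)
library(dplyr)

# Hypothetical example data: three Italian cities with approximate coordinates
cities <- data.frame(
  name = c("Rome", "Milan", "Naples"),
  lon  = c(12.50, 9.19, 14.27),
  lat  = c(41.90, 45.46, 40.85)
)

# Ease of use: promote the plain data.frame to an sf object (WGS 84 lon/lat)
cities_sf <- sf::st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)
inherits(cities_sf, "data.frame")

# Integration: the usual dplyr verbs work and keep the geometry column
milan <- cities_sf %>%
  dplyr::filter(name == "Milan")

# Versatility: reproject to another coordinate reference system (EPSG:3035)
cities_laea <- sf::st_transform(cities_sf, 3035)
sf::st_crs(cities_laea)$epsg
```

Because the result of each step is still an sf object, the steps can be chained in a single pipeline if you prefer.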
Now that we're situated with what sf is, let's jump into shapefiles and learn what they are and how we can leverage them for quick visualizations!
What are shapefiles and why do they matter?
Shapefiles are a crucial part of geographic data handling and if you're working with spatial data you'll undoubtedly encounter them. But what exactly are they and why are they so important?
Let's break it down:
- Definition: A shapefile is a common geospatial vector data format used in Geographic Information System (GIS) software. It stores the geometric location and attribute information of geographic features
- Components: A shapefile is not a single file but a set of at least three mandatory files:
  - .shp: Holds the feature geometry stored as a set of vector coordinates
  - .shx: Holds a positional index of the feature geometry so that records can be looked up quickly
  - .dbf: Holds the attribute table, with one record of attributes per shape
- Use cases: Shapefiles can be applied to a wide variety of fields and use cases:
  - Mapping: Shapefiles enable the creation of maps, displaying roads, rivers, landforms, etc.
  - Analysis: They facilitate spatial analyses such as distance calculations, area measurements, and overlays
  - Data integration: Shapefiles can be joined with other datasets, aiding comprehensive analyses
- Widespread use: Shapefiles are one of the most commonly used formats in GIS
- Accessibility: Many governmental and environmental organizations provide data in shapefile format, making it widely accessible
- Compatibility: They are supported by various GIS software, enhancing their usability
In short, shapefiles act as a bridge between raw geographic data and the insights we can draw from it
Their accessibility and flexibility make them indispensable in the world of geographic data analysis, and by leveraging the sf package in R, we can effortlessly load these files, manipulate them, and turn them into informative visuals
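One way to see the multi-file nature of a shapefile is to write one and look at what lands on disk. The sketch below uses a made-up single-point dataset and a temporary directory; the file name rome.shp is arbitrary. Depending on the GDAL version behind sf, you may see extra sidecar files (such as a .prj holding the projection) next to the three mandatory ones.

```r
library(sf)

# Hypothetical single-point dataset, used only to have something to write
pt <- sf::st_sf(
  name     = "Rome",
  geometry = sf::st_sfc(sf::st_point(c(12.50, 41.90)), crs = 4326)
)

# Write into a fresh temporary directory so nothing else is listed
out_dir <- tempfile("shapefile_demo_")
dir.create(out_dir)
sf::st_write(pt, file.path(out_dir, "rome.shp"), quiet = TRUE)

# One "shapefile" on disk is really several sibling files
written <- list.files(out_dir)
extensions <- sort(tools::file_ext(written))
extensions
```

This is also why a shapefile should be copied or shared as a whole folder (or zip archive): sf::st_read is pointed at the .shp file, but it needs the sibling files sitting next to it.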
Loading shapefiles with sf
Reading a shapefile with sf is incredibly straightforward and requires only a single line of code using sf::st_read
For this example, we're going to load geographic regions of Europe from a shapefile sourced directly from an official European Union data source and filter it to only show Italian geographic regions using dplyr::filter
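library(sf)
library(dplyr)

# Read every NUTS region, then keep only the Italian ones.
# Eurostat NUTS shapefiles store the two-letter country code in CNTR_CODE.
shape.data <- sf::st_read("NUTS_RG_01M_2021_3035.shp") %>%
  dplyr::filter(CNTR_CODE == "IT")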
And that's it! Our geographic data is now ready for analyzing and visualizing
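Before filtering, it often helps to check which attribute columns and which coordinate reference system a shapefile actually carries. The sketch below is not the Eurostat file from this post: it uses nc.shp, a small North Carolina counties shapefile that ships with sf, so it runs without any download. The same calls should apply to any object returned by sf::st_read.

```r
library(sf)
library(dplyr)

# nc.shp is bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- sf::st_read(nc_path, quiet = TRUE)

# Check the attribute columns before writing a filter against them
names(nc)

# Inspect the coordinate reference system and the geometry type
sf::st_crs(nc)$input
unique(as.character(sf::st_geometry_type(nc)))

# Filter on an attribute column, exactly as with a regular data.frame
wake <- nc %>%
  dplyr::filter(NAME == "Wake")
nrow(wake)
```

Running names() first is a cheap way to catch a misspelled column name before it turns into a confusing filter error.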
Visualizing the geographic data with ggplot2
To visualize our shape data, we can now leverage ggplot2 like we would with any other analysis, using ggplot2::geom_sf!
ggplot2::ggplot(data = shape.data) +
  ggplot2::geom_sf() +
  ggplot2::labs(title = "Italian NUTS-3 regions") +
  ggplot2::theme(
    panel.background = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.title = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.position = "none"
  )
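A natural next step is to color each region by one of its attributes. The sketch below again uses the nc.shp dataset bundled with sf rather than the Eurostat file, and the choice of the BIR74 column, the viridis scale, and the output path are all just illustrative assumptions.

```r
library(sf)
library(ggplot2)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Map an attribute column (BIR74) to the fill colour of each county
p <- ggplot2::ggplot(data = nc) +
  ggplot2::geom_sf(ggplot2::aes(fill = BIR74), colour = "white") +
  ggplot2::scale_fill_viridis_c(name = "BIR74") +
  ggplot2::labs(title = "North Carolina counties") +
  ggplot2::theme_void()

# Save to a temporary file; the path and dimensions are arbitrary
out_file <- file.path(tempdir(), "nc_births.png")
ggplot2::ggsave(out_file, plot = p, width = 8, height = 4, dpi = 150)
```

Here ggplot2::theme_void() stands in for the manual element_blank() calls above, and the legend is kept because the fill colors need it to be readable.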
Conclusion
In this post, we've explored how R makes working with and visualizing geographic data accessible and efficient through the sf and ggplot2 packages
Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄
Cheers!