Mastering Data Import in R — Your 2025 Guide
Importing data into R remains a vital first step in any analytics workflow—and with today’s diverse file formats and data sources, getting it right matters more than ever. Here's how to confidently navigate data importing in R—smartly, efficiently, and using the most relevant tools available.
Set Up for Success: Clean Start & Environment
Start each session fresh:
- Clear the workspace so leftover objects from an earlier session don't cause confusing errors:
rm(list = ls())
- Define and verify your working directory:
setwd("path/to/your/directory")
getwd()
This ensures clarity in your script, avoids confusion over file paths, and keeps your workspace clean.
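If an import fails, the culprit is usually the working directory rather than the import function. A quick check like the one below (using a placeholder file name) fails early with a clear message:
# Stop with an informative error if the file isn't where the script expects it
if (!file.exists("data.csv")) {
  stop("data.csv not found in ", getwd())
}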
Import Flat Files: Text, CSV, and Co.
Base R
read.table("data.txt", header = TRUE, sep = "\t")
read.csv("data.csv") # comma-separated
read.csv2("data_semicolon.csv") # semicolon-separated
Modern Tidyverse Style
library(readr)
df_csv <- read_csv("data.csv")
df_tsv <- read_delim("data.tsv", delim = "\t")
These functions are faster and work seamlessly with tidy workflows.
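readr also lets you declare what you expect to find, which surfaces malformed files at import time. A minimal sketch; the column names and types here are illustrative rather than taken from a real file:
library(readr)
# Spell out the expected column types instead of relying on type guessing
df_csv <- read_csv(
  "data.csv",
  col_types = cols(
    id          = col_integer(),
    signup_date = col_date(),
    score       = col_double()
  )
)
problems(df_csv)  # lists any values that could not be parsed as declared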
Work with JSON, XML, and HTML Data
JSON:
library(jsonlite)
json_data <- fromJSON("data.json") # from file or URL
json_df <- as.data.frame(json_data)
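For nested JSON, such as a typical API response, jsonlite can flatten nested objects into ordinary columns. A short sketch; the URL is a placeholder:
library(jsonlite)
# flatten = TRUE turns nested objects into prefixed columns of a single data frame
api_df <- fromJSON("https://example.com/api/records.json", flatten = TRUE)
str(api_df)  # inspect the resulting structure before analysing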
XML / HTML:
library(xml2)
xml_doc <- read_xml("data.xml")
library(rvest)  # provides html_table() and the %>% pipe
html_tab <- xml2::read_html("webpage.html") %>%
rvest::html_table(fill = TRUE) %>%
.[[1]]  # html_table() returns a list of tables; keep the first one
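Reading the document is only half the job; values are usually extracted with XPath. A small sketch in which the node name record and the attribute id are assumptions about the file's structure:
library(xml2)
xml_doc  <- read_xml("data.xml")
records  <- xml_find_all(xml_doc, "//record")  # every <record> node
ids      <- xml_attr(records, "id")            # the id attribute of each node
contents <- xml_text(records)                  # the text content of each node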
Excel and Spreadsheet Files
The readxl package is the gold standard:
library(readxl)
df_default <- read_excel("workbook.xlsx") # reads first sheet
df_sheet3 <- read_excel("workbook.xlsx", sheet = 3)
It's fast, avoids external dependencies, and handles headers gracefully.
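When a workbook has many tabs or extra header rows, readxl can inspect and target them. A short sketch; the sheet name and skip count are illustrative:
library(readxl)
excel_sheets("workbook.xlsx")  # list every sheet name in the workbook
df_named <- read_excel("workbook.xlsx", sheet = "Sales", skip = 2)  # named sheet, skip 2 leading rows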
Data from Statistical & Scientific Files
The haven package streamlines importing data from other statistical tools:
library(haven)
df_sas <- read_sas("data.sas7bdat")
df_spss <- read_sav("data.sav")
df_stata <- read_dta("data.dta")
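SPSS and Stata files often carry value labels, which haven stores as labelled vectors rather than factors. A minimal sketch of converting one such column; "region" is a hypothetical labelled column, not something guaranteed to exist in your file:
library(haven)
df_spss <- read_sav("data.sav")
df_spss$region <- as_factor(df_spss$region)  # turn the labelled vector into an R factor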
For MATLAB files:
library(R.matlab)
mat_data <- readMat("data.mat") # returns a named list of the saved MATLAB variables
And for Octave files:
library(foreign)
oct_data <- read.octave("data.octave") # also returns a list of the saved variables
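Because both functions return lists, individual variables still need to be pulled out before analysis. A hedged sketch; the element name "measurements" is a hypothetical variable inside the .mat file:
library(R.matlab)
mat_data <- readMat("data.mat")
names(mat_data)  # see which variables were saved
df_mat <- as.data.frame(mat_data$measurements)  # coerce one matrix into a data frame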
Connect to Databases and Query Large Data
ODBC connections let you pull data straight from relational systems:
library(RODBC)
conn <- odbcConnect("DSN_Name", uid = "user", pwd = "password")
df1 <- sqlFetch(conn, "Table1")
df2 <- sqlQuery(conn, "SELECT * FROM Table2")
odbcClose(conn)
Perfect for relational data sources—SQL Server, Access, and others.
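With large tables, it usually pays to filter on the database side rather than pull everything into R. A small sketch reusing the same RODBC connection pattern; the columns and filter in the query are placeholders:
library(RODBC)
conn <- odbcConnect("DSN_Name", uid = "user", pwd = "password")
df_recent <- sqlQuery(conn, "SELECT col1, col2 FROM Table2 WHERE year >= 2024")  # let the database do the filtering
odbcClose(conn)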
Pro Tips: Best Practices & Workflow Polish
- Keep column headers clean: no spaces, no special characters; use snake_case or camelCase.
- Use consistent naming and avoid duplicates.
- Treat missing values deliberately: map placeholder codes to NA at import time (see the sketch after this list).
- Use consistent code style (like the tidyverse style guide) for readability and maintainability.
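A small sketch of how these habits look in practice, assuming the janitor package is installed (its clean_names() converts headers to snake_case) and using illustrative missing-value codes:
library(readr)
library(janitor)
df <- read_csv("data.csv", na = c("", "NA", "-99"))  # map common placeholder codes to NA
df <- clean_names(df)  # spaces and special characters become clean snake_case headers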
Final Thoughts
Importing data into R now spans powerful tools like readr, readxl, haven, and ODBC connectors. By starting clean, applying modern packages, and enforcing consistent naming and style, you streamline your analysis—and set yourself up for speed and clarity.
This article was originally published on Perceptive Analytics.
Our mission is simple: to enable businesses to unlock value in data. For over 20 years, we've partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, helping them solve complex data analytics challenges. As a leading Power BI Consultant in New York and Tableau Consultant in New York, we turn raw data into strategic insights that drive better decisions.