Importing data into R is often the very first step in any data analysis journey. But if you’ve worked with R for a while, you’ll know that it has a different function for nearly every file format. At first, this can feel confusing—even frustrating—because you might mix up the functions or their arguments.
The good news is that once you know which functions and packages to use for which file types, the process becomes smooth and straightforward. This tutorial provides a comprehensive reference guide for importing data into R, covering the most common file formats along with practical examples and code.
So the next time you find yourself Googling “how to load [file type] into R”, you’ll have everything you need right here.
We’ll explore:
Reading TXT, CSV, JSON, Excel files
Working with SAS, SPSS, Stata, MATLAB, and Octave datasets
Importing data from HTML/XML files
Connecting directly to relational databases with ODBC
And even a handy hack for quick, ad-hoc data loading.
Let’s dive in!
Preparing Your R Workspace
Before importing data, it’s a good idea to set up your environment properly.
Setting the Working Directory
Most datasets are stored in a dedicated folder for each project. You can tell R to treat that folder as its working directory.
getwd() # check current working directory
setwd("") # set a new working directory
By doing this, you can use relative file paths instead of typing long absolute paths each time.
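For example, a minimal sketch, where the folder and file names are hypothetical placeholders:
setwd("C:/projects/sales_analysis") # absolute path, set once per session
df <- read.csv("data/monthly_sales.csv") # relative path resolved against the working directory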
Cleaning the Environment
Your R environment often contains leftover objects from previous sessions, which can cause errors. To start fresh, run:
rm(list = ls())
This clears all objects, functions, and variables. Alternatively, you can choose not to save the workspace when closing R.
💡 Pro Tip: Always start with a clean environment for smoother imports and fewer debugging headaches.
Importing TXT, CSV, and Delimited Files
Reading Text Files
Text files usually contain data separated by tabs, commas, or semicolons. Here’s a simple example of a tab-delimited file:
Category V1 V2
A 3 2
B 5 6
B 2 3
A 4 8
Use read.table() to import such files:
df <- read.table("", header = TRUE, sep = "\t")
You can adjust the sep argument for other delimiters.
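For instance, a semicolon- or pipe-delimited file could be read as follows; the file names are placeholders:
df <- read.table("data_semicolon.txt", header = TRUE, sep = ";")
df <- read.table("data_pipe.txt", header = TRUE, sep = "|")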
Reading CSV Files
CSV files are either comma-separated (,) or semicolon-separated (;). R provides wrapper functions around read.table() for convenience:
df <- read.csv("") # for comma-separated
df <- read.csv2("") # for semicolon-separated
Both functions work like read.table() but come with defaults tailored for each format.
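To see what those defaults amount to, the wrappers are roughly equivalent to the read.table() calls below (the file name is a placeholder):
df <- read.table("data.csv", header = TRUE, sep = ",", dec = ".") # roughly what read.csv() does
df <- read.table("data.csv", header = TRUE, sep = ";", dec = ",") # roughly what read.csv2() does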
Quick Copy-Paste Hack
Need a fast way to test or analyze some data? Copy the data to your clipboard and run:
df <- read.table("clipboard", header = TRUE)
It may not always parse perfectly, and the special "clipboard" connection works most reliably on Windows, but it's great for quick ad-hoc analysis.
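On macOS the "clipboard" connection is not available; a common workaround is to read from the system pbpaste command instead:
df <- read.table(pipe("pbpaste"), header = TRUE)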
Using Packages for Data Import
For more complex file formats, you’ll need specialized packages.
Install and load packages with:
install.packages("")
library("")
Reading JSON Files
Use the rjson package:
install.packages("rjson")
library(rjson)
# From file
jsonData <- fromJSON(file = "")
# From URL
jsonData <- fromJSON(file = "")
JSON data is loaded as a list. To convert it into a data frame:
jsonDF <- as.data.frame(jsonData)
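If the JSON file holds an array of records, converting the whole list at once can produce a single wide row; a common pattern, assuming every record has the same fields, is to convert each record and bind the rows:
jsonDF <- do.call(rbind, lapply(jsonData, as.data.frame))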
Importing XML and HTML Tables
For XML and HTML data, a common choice is the XML package, paired with RCurl for fetching remote pages:
library(XML)
library(RCurl)
# Parse XML file
xmlData <- xmlTreeParse("")
# Convert to data frame
xmlDF <- xmlToDataFrame("")
For HTML tables:
htmlData <- readHTMLTable(getURL(""))
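readHTMLTable() returns a list with one data frame per table found on the page, so you usually pick the one you need by position or name; the index below is illustrative:
firstTable <- htmlData[[1]]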
Reading Excel Workbooks
There are several options, such as XLConnect, xlsx, and gdata, but readxl is typically the simplest and fastest.
install.packages("readxl")
library(readxl)
# Read first sheet
df <- read_excel("")
# Read by sheet name or index
df <- read_excel("", sheet = "Sheet3")
df <- read_excel("", sheet = 3)
Importing Data from Statistical Software
R can directly import data from SAS, SPSS, and Stata using the haven package:
install.packages("haven")
library(haven)
# SAS
df_sas <- read_sas("data.sas7bdat")
# SPSS
df_spss <- read_sav("data.sav")
# Stata
df_stata <- read_dta("data.dta")
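haven keeps SPSS and Stata value labels as labelled columns. If you prefer ordinary R factors, a quick sketch is to convert them after import (shown for the SPSS data frame above):
df_spss <- as_factor(df_spss) # converts labelled columns to factors; other columns are left untouched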
For MATLAB:
install.packages("R.matlab")
library(R.matlab)
matData <- readMat("")
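readMat() returns a named list with one entry per variable stored in the .mat file, so you access individual objects by name; the variable name below is hypothetical:
signal <- matData$signal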
For Octave:
library(foreign)
octData <- read.octave("")
Importing Data from Relational Databases
Use the RODBC package to connect with databases like Microsoft SQL Server or Access:
install.packages("RODBC")
library(RODBC)
# Connect to database
con <- odbcConnect("dsn", uid = "username", pwd = "password")
# Fetch data
df1 <- sqlFetch(con, "Table1")
df2 <- sqlQuery(con, "SELECT * FROM Table2")
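# Optional: list the tables visible through this connection before querying
# (sqlTables() is part of RODBC; useful when you do not know the exact table names)
tables <- sqlTables(con)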
# Close connection
odbcClose(con)
Tips for Smooth Data Imports
Use the first row for column headers
Keep column names unique, and remember that R treats them as case-sensitive
Stick to simple naming conventions (e.g., var_name, varName)
Replace missing values with NA (see the sketch after this list)
Remove comments or extra symbols from files
Keep code style consistent for readability
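For instance, if missing values in a raw CSV are coded with a custom token, you can map them to NA at import time; the file name and the -999 code below are hypothetical:
df <- read.csv("survey.csv", na.strings = c("", "NA", "-999"))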
Conclusion
Importing data into R is just the beginning of your analysis journey. In this guide, we walked through methods to bring in TXT, CSV, JSON, Excel, and XML/HTML files, as well as data from SAS, SPSS, Stata, MATLAB, Octave, and relational databases.
With these tools and tricks, you’ll be able to handle almost any dataset you encounter. And remember, R often has multiple ways to achieve the same goal—so explore and find the method that best suits your workflow.
This article was originally published on Perceptive Analytics.
In the United States, our mission is simple: to enable businesses to unlock value in data. For over 20 years, we have partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, helping them solve complex data analytics challenges. As leading Excel consultants, we turn raw data into strategic insights that drive better decisions.