Importing data into R is a task in itself. There seems to be a different importing function for almost every file format, and it is easy to mix up those functions and their arguments. But once you know which packages and commands to use for which file types, it is fairly simple. To help you with this, here is a comprehensive tutorial for your reference, covering the commands to import files into your R session.
So, next time you find yourself searching ‘how to load xyz type of file into R’, you’d know where to look.
We will discuss importing the most common file types such as .txt, .csv, .json, Excel worksheets, SAS/SPSS/Stata datasets, MATLAB files, etc., with examples and code. We will also see how to use ODBC connections to import tables from relational databases directly into R. You are also in for a handy importing hack which I find very useful for quick-and-dirty ad-hoc analysis.
Let’s dive in!!
How to prepare workspace environment before data importing
Generally, files to be loaded for a particular task are stored in a single directory. Further, this directory can be assigned as the “working directory” inside your R environment for ease of importing. To know the location of your current working directory use -
getwd()
You could bring your files to the location returned by getwd(). It could also be that you want to change the working directory to a location of your choice. Then you must use –
setwd("")
R would now know which folder you’re working in. This makes it easier to import and export files using relative path rather than the long absolute path to file location.
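As a small, self-contained sketch of this idea (using a temporary scratch folder in place of a real project directory; the file name sample.csv is invented):

```r
# A minimal sketch: switch the working directory to a scratch folder,
# write a file there, and read it back by its relative name alone.
old_wd <- getwd()
dir <- file.path(tempdir(), "project_data")
dir.create(dir, showWarnings = FALSE)
setwd(dir)

writeLines(c("x,y", "1,2"), "sample.csv")  # lands in the working directory
df <- read.csv("sample.csv")               # a relative path is enough now

setwd(old_wd)                              # restore the original directory
```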
Your workspace environment would often be filled with data and values from the last R session. It is often required to clean the environment before proceeding afresh. You can do it using
rm(list=ls())
It’s better to start with a clean environment. An alternative can be - not saving the workspace on ending the R session. Objects from the previous R sessions can lead to errors which are often very hard to debug. The above command gets rid of all the variables, functions and objects in your current environment. My advice – use it wisely.
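A tiny sketch of what the command does (the objects x and f are throwaways made up for illustration):

```r
# Create a couple of objects, list them, then clear the environment.
x <- 1
f <- function(a) a + 1
ls()            # lists "f" and "x" (among anything else you defined)
rm(list = ls()) # wipes all objects in the current environment
```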
Yayy, you’re finally done with the setup phase!
However, do keep in mind that the setup phase is important for eliminating the majority of kinks you would otherwise come across in subsequent phases of your project. Now, it's time to get down to business.
Loading TXT/CSV/JSON/other separator files into R
Reading text files
Data in text files is of a form where values are separated by a delimiter such as tab, or comma or semicolon. Your file will look similar to the below format –
Category V1 V2
A 3 2
B 5 6
B 2 3
A 4 8
A 7 3
Above is an example of a tab delimited file. Similarly, a comma or semicolon could replace the tab separator as shown in the next section. Flat files like these can be read using the function read.table() –
df <- read.table("", header = FALSE)
(Note that you could skip writing the path in the above example if your file is in the working directory)
read.table() is the most common way of bringing simple files into your workspace. It is also very flexible. For instance, to import files delimited by a character other than tab, you can pass the sep argument indicating which character is the separator.
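For example, a semicolon-delimited file (file name and contents invented here) can be read like this:

```r
# Sketch: write a small semicolon-delimited file, then read it back
# with read.table(), passing sep = ";" and header = TRUE.
path <- file.path(tempdir(), "scores.txt")
writeLines(c("Category;V1;V2", "A;3;2", "B;5;6"), path)

df <- read.table(path, header = TRUE, sep = ";")
str(df)   # 2 observations of 3 variables
```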
Reading CSV files
A CSV file is one where entries are separated by a comma (,) or a semicolon (;). Your file will look like this –
Category,V1,V2
A,3,2
B,5,6
B,2,3
A,4,8
A,7,3
To load files like this, you can use the read.csv() or read.csv2() function. For (,) separated files use read.csv(), and for (;) separated files use read.csv2(). Remember, files of both kinds can also be read using the read.table() function; read.csv() and read.csv2() are essentially just wrapper functions around read.table(). Look at their source code:
read.csv
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
read.csv2
function (file, header = TRUE, sep = ";", quote = "\"", dec = ",",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
You'll find that read.csv and read.csv2 are just wrapper functions around read.table with a few changes to their default arguments.
Essentially,
df <- read.table("", sep = ",", header = TRUE)
is the same as:
df <- read.csv("")
and
df <- read.table("", sep = ";", dec = ",", header = TRUE)
is the same as:
df <- read.csv2("")
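A quick sketch verifying this equivalence on a throwaway file (the file name and contents are invented):

```r
# Check the wrapper claim: for the same file, read.csv() and
# read.table(sep = ",", header = TRUE) should give identical data frames.
path <- file.path(tempdir(), "demo.csv")
writeLines(c("Category,V1,V2", "A,3,2", "B,5,6"), path)

via_csv   <- read.csv(path)
via_table <- read.table(path, sep = ",", header = TRUE)
identical(via_csv, via_table)   # TRUE
```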
Similarly, for files with other separators, you can use read.table() with the appropriate sep argument, or read.delim() for tab-delimited files.
Copy and paste from files directly into R
This is a quick and dirty hack that can save you a lot of time and effort. Select the data, copy it, and run this command to bring it into R –
df <- read.table("clipboard")
Sometimes, when your data is not in an appropriate format, it won't be read correctly. But this is a good compromise when you want to do some ad-hoc analysis on the fly.
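Note that the "clipboard" connection works out of the box on Windows; on macOS you would read from pipe("pbpaste") instead. When no clipboard is available at all (say, on a server), a related trick is to paste the copied text into a string and read it through textConnection() – a minimal sketch, with the pasted block standing in for whatever you copied:

```r
# Sketch: read a pasted block of text as if it were a file.
pasted <- "Category V1 V2
A 3 2
B 5 6"

df <- read.table(textConnection(pasted), header = TRUE)
df$V2   # 2 6
```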
Packages
We've seen how to import files through base R commands. You can also do this by loading packages and using their functions. In fact, almost all of the more complicated file types require a dedicated package to be imported into R.
Some Workspace Prepping (yet again)
To use the import functions in a package, you first need to install the package and load it into your environment. Simply run the following two commands:
install.packages("")
library("")
Reading JSON files
For reading JSON files into R, you'll need the 'rjson' package. Run this command to download and install the package –
install.packages("rjson")
And load it into the environment by
library(rjson)
Now, you can use the fromJSON() function to read JSON data.
# 1. Import data from a JSON file in your working directory
JsonData <- fromJSON(file = "")
# 2. Import data from a JSON file at a specific URL
JsonData <- fromJSON(file = "")
Tip: The input JSON file is stored in the JsonData object in your R environment as a list. You can view its contents by printing JsonData. You can also convert it into a data frame and print that:
JsonDF <- as.data.frame(JsonData)
print(JsonDF) # prints the final data frame
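A self-contained sketch of the round trip, assuming the rjson package is installed (the file name and contents are invented):

```r
library(rjson)

# Write a tiny JSON file to a temporary location, then read it back.
path <- file.path(tempdir(), "people.json")
writeLines('{"name": ["Ana", "Bo"], "age": [31, 45]}', path)

JsonData <- fromJSON(file = path)    # a named list
JsonDF   <- as.data.frame(JsonData)  # two rows, columns name and age
```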
Importing XML data and data from HTML tables
Mostly while working with data on the World Wide Web, you’ll have to deal with HTML and XML formats.
Both XML and HTML data can be imported easily by the XML package in R. It offers many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP.
First, install and load the package, just as demonstrated above. Then use the xmlTreeParse() function to parse the XML file, as shown in the code snippet –
# Activate the libraries
library(XML)
library(RCurl)
# Read an XML file by URL
XmlData <- xmlTreeParse("")
# Read an XML file in the working directory
XmlData <- xmlTreeParse("")
Tip: Again, the xmlTreeParse() function returns a list into the XmlData object. The same data can be read in the more usable data-frame format by running the code below –
xmldataframe <- xmlToDataFrame("")
print(xmldataframe)
Another useful function from the XML package reads data from one or more HTML tables. The readHTMLTable() function and its methods provide robust ways of extracting data from the HTML tables in a document, whether that document comes from a file, an (http: or ftp:) URL, or has already been parsed with htmlParse() (XML package). One can also pick out a specific table via the which argument.
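A minimal sketch with an invented in-memory table, assuming the XML package is installed; which = 1 picks the first table and returns it directly as a data frame:

```r
library(XML)

html <- '<html><body><table>
  <tr><th>Category</th><th>V1</th></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>5</td></tr>
</table></body></html>'

doc <- htmlParse(html, asText = TRUE)            # parse the HTML string
tab <- readHTMLTable(doc, which = 1,
                     stringsAsFactors = FALSE)   # first table as a data frame
```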