TwitterDev

Getting Started with R and the Twitter API

Jessica Garson ・4 min read

I've played around with R a few times in my career but never got to the point where I felt I could say that I know R. Considering that R is one of the most popular languages for common data science tasks like time-series analysis, statistical tests, and even visualization, I decided to revisit R using the Twitter API and document some handy tips for getting started.

This tutorial is aimed at entry-level developers. It guides you through the process I used for setting up an environment, running your first “hello world” program, searching for Tweets, and saving your dataset of Tweets to a CSV. From there, you will be well situated to take the next steps and use R’s more advanced capabilities when working with Twitter data.

Environment set up

First, you need to download R. Then, to set up an environment for working with R, you can use RStudio, the R extension pack for Visual Studio Code, or a Jupyter notebook if you come from the Python world.

To confirm you've set up the environment correctly, start with a ‘hello world’ program. When you are in R, you should see a prompt that looks like this: >. You can test that everything is configured correctly by running the following line.

print("hello world!")

If everything is set up correctly, you should get back the following line:

[1] "hello world!"

Installing a library to connect to the Twitter API

To connect to the Twitter API and pull in some Tweets to analyze with R, you can use the rtweet package. This package is one of many open-source libraries built by the Twitter developer community.

To install this package enter the following into your console:

install.packages("rtweet")

To start using the package, you need to call the library. When you call a library you are telling R you are going to work with the package rtweet.

library(rtweet)
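If you rerun your script often, you may not want to reinstall the package every time. The following is a minimal sketch of one way to install a package only when it is missing. The helper name ensure_package is hypothetical and not part of rtweet; it relies only on base R's requireNamespace and install.packages.

```r
# ensure_package() is a hypothetical helper written for this sketch.
# It installs a package only if R cannot already find it, then reports
# whether the package is available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check succeeds without downloading anything.
ensure_package("stats")

# For this tutorial you would run ensure_package("rtweet"),
# followed by library(rtweet).
```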

Using the Twitter API

You will need to have an approved developer account to use any one of the Twitter APIs. If you don’t have one already, you can apply for a developer account. For this tutorial, you’ll be connecting to the API using the standard search endpoint, which allows you to retrieve recent Tweets that match queries.

Searching Tweets

Imagine that you want to search for Tweets that have the hashtag “#synths” and limit the number of responses to 5 Tweets so that you don’t get overwhelmed. You can use n as shown below to define a limit on the number of Tweets to be returned (note: the standard search endpoint returns at most 100 Tweets per request, so rtweet makes multiple requests when n is larger). You can also use include_rts to indicate whether you want Retweets to be included in the results (let’s assume you don’t in this case). In R, <- is the assignment operator and works like = in other languages.

synths <- search_tweets(
  "#synths", n = 5, include_rts = FALSE
)

If the request goes through, you will be taken to a page in your browser that lets you authorize the app to make calls on behalf of your Twitter account.

This means that you don’t have to pass your credentials via code. You will see a message that says “Authorize the rstats2twitter app by logging into Twitter, or selecting 'Authorize app'”. After you select “Authorize app”, you will see the message “Authentication complete. Please close this page and return to R”. You only have to authorize the app for your first request, so later requests run without these extra steps.
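Once the search returns, a quick look at the shape of the result can be helpful before doing anything else. The sketch below uses a small stand-in data frame named synths_example so that it runs without API access. The values are made up, and the column names are an assumption about what your version of rtweet returns, so check names(synths) on your own result.

```r
# Stand-in for the result of search_tweets(). The values are invented and
# the column names are assumed, not taken from a real API response.
synths_example <- data.frame(
  screen_name = c("user_a", "user_b", "user_c"),
  text = c("New patch day #synths", "Modular jam #synths", "Studio tour #synths"),
  favorite_count = c(4, 0, 11),
  stringsAsFactors = FALSE
)

nrow(synths_example)           # how many Tweets came back
names(synths_example)          # which columns are available
head(synths_example$text, 2)   # the text of the first two Tweets
```

With a real result, you would run the same three calls on synths instead of synths_example.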

Descriptive statistics on a column of our dataset

If you wanted to learn more about your dataset, a simple way to do so would be by running descriptive statistics. As an example, try calculating the mean, median, min, max and range of the favorite_count of your synths dataset. The term favorite_count refers to the number of likes.

To calculate the mean (or average) of favorite_count for our dataset, run the following command:

mean(synths$favorite_count)

For the median, which is the value that separates the higher half from the lower half of a dataset, run this:

median(synths$favorite_count)

To get the lowest value of our dataset, run this line:

min(synths$favorite_count)

For the highest value of our dataset you could run:

max(synths$favorite_count)

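Last, to get the range of our dataset, you can run the line of code below. Note that in R, range() returns the lowest and highest values as a pair rather than the difference between them; wrap the call in diff() if you want the difference itself: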

range(synths$favorite_count)
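To see what each of these functions returns, here is a small worked example on a made-up vector of like counts (the name likes and its values are illustrative, not real Tweet data). It also shows summary(), a base R function that reports several of these statistics at once, and the na.rm argument, which may be useful if a column contains missing values.

```r
# Made-up like counts used only for illustration.
likes <- c(0, 2, 3, 5, 10)

mean(likes)          # 4
median(likes)        # 3
min(likes)           # 0
max(likes)           # 10
range(likes)         # 0 10 (the lowest and highest values)
diff(range(likes))   # 10 (the difference between them)

# summary() reports the minimum, quartiles, median, mean, and maximum together.
summary(likes)

# A missing value makes mean() return NA unless na.rm = TRUE is set.
mean(c(likes, NA), na.rm = TRUE)   # 4
```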

Viewing and saving our data to a CSV

To simply view the Tweets from your query you can run this line:

View(synths)

You can also save your synths dataset to a CSV, allowing you to run more analysis on it at a later time:

write_as_csv(
  synths, file_name = "synths.csv", 
  prepend_ids = TRUE, na = "", 
  fileEncoding = "UTF-8"
)
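As I understand the rtweet documentation, prepend_ids = TRUE puts an "x" in front of each Twitter ID so that spreadsheet tools and CSV readers do not treat the long IDs as numbers and round them. The sketch below imitates that behavior with base R only, using a stand-in data frame with invented IDs, to show how you might read the file back and recover the original IDs. The names tweets, to_save, and restored are placeholders for this example.

```r
# Stand-in data frame with invented IDs. IDs are stored as strings because
# they are too long to be represented exactly as doubles.
tweets <- data.frame(
  status_id = c("1234567890123456789", "1234567890123456790"),
  favorite_count = c(4, 11),
  stringsAsFactors = FALSE
)

# Imitate the effect of prepend_ids = TRUE by prefixing each ID with "x".
to_save <- tweets
to_save$status_id <- paste0("x", to_save$status_id)

# Write to a temporary file so the example leaves nothing behind.
csv_path <- tempfile(fileext = ".csv")
write.csv(to_save, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back and strip the prefix to recover the original IDs.
restored <- read.csv(csv_path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
restored$status_id <- sub("^x", "", restored$status_id)

# TRUE when the IDs survived the round trip unchanged.
identical(restored$status_id, tweets$status_id)
```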

Conclusion

While there is much more you can do with R and the Twitter API, this tutorial should help you get set up and ready to go with some basic operations. Be sure to check out the rtweet package documentation for more inspiration about what you could do next. Code from this tutorial can be found on our TwitterDev GitHub.

If you have any questions, or if this tutorial inspires you to build anything interesting, let us know on our community forums. We are also in the midst of building the future of the Twitter API with Twitter Developer Labs. If you have ideas for new features, be sure to give us feedback. You can keep an eye on upcoming Labs features by viewing our roadmap.
