The R programming language is widely used for statistical computing, which makes it a natural candidate for financial analysis. Here at lemon.markets, we’re big fans of R as it allows us (and more importantly, our users 🍋) to perform the necessary data exploration and analysis to inform (automated) trade decisions. lemon.markets is a Berlin-based 🇩🇪 brokerage API built by developers, for developers, so that they can build their own brokerage experience at the stock market. In this article, we’ve compiled a list of 10 great R packages that can be used to work with stock market data. After reading, you’ll be able to manipulate your data and begin performing technical analysis on it.
The focus of this article is on stock market data, which means that we’re primarily concerned with importing, manipulating, visualising and reporting on data. And this is where R Shiny-s 😉 the brightest due to its statistics focus! We also don’t want to overwhelm you with the large number of R packages available, but we’re sure this article can have a part two (and three, and four, and…).
lemon.markets offers two APIs: the trading API and the market data API. In this article, we’ll be focusing on the latter. After retrieving, for example, OHLC data on a few financial instruments from our API, there are several directions you can take. Perhaps you want to use these prices to forecast future price movement, or maybe you want to use them to generate (real-time) technical indicators. Regardless of which path you choose to follow, you’ll need to collect, (pre-)process and maybe visualise your stock market data. We’ve collected packages that cover these three necessary steps.
Should I be using R? 🏴‍☠️
R is one of the most widely used languages in the data analytics sector. It’s primarily used in academia, but large companies such as Uber, Facebook and Airbnb also use R for data visualisation and statistical inference. One of the most powerful features of R is that it is open-source, and anyone can contribute their own R packages, which means that there are almost endless options to choose from. We’ve actually had to narrow down the list of packages we’re going to share with you today.
When using R for finance, you’ll probably use some more general packages that deal with data management and other finance-specific packages. In fact, many packages actually build upon each other — you’ll come to learn that the R ecosystem is highly interconnected. ♻️
A quick note on the tidyverse 🧼
We can’t write an article about data manipulation without mentioning the tidyverse. It’s a collection of R packages designed specifically for data science. It’s somewhat akin to the SciPy stack for Python. The tidyverse can, as the name suggests, tidy up your data. But, it can also provide additional functionalities such as data visualisation and manipulation. We’ll get into it later, but if the name pops up, you’ll know what we’re talking about.
Collecting stock market data
Before we can do anything with our market data, we need to make sure that we actually have some. This means that we need to collect it. The lemon.markets Market Data API can be used to retrieve historic market data in H1/D1/M1 format, the latest quotes and the latest trades for specific instruments. For example, if you want to request hourly OHLC data for Apple, a request to our API can look as follows:
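install.packages("httr")
library(httr)

# Replace the placeholder with your own lemon.markets API key
api_key <- "YOUR-API-KEY"
market_url <- "https://data.lemon.markets/v1/ohlc/h1?isin=US0378331005"

# Request hourly OHLC data for Apple, authenticating with a bearer token
response <- httr::GET(
  url = market_url,
  add_headers(Authorization = paste("Bearer", api_key))
)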
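Once the request succeeds, the JSON body still has to be turned into something R can work with. The snippet below is only a sketch: the payload is hypothetical and its field names (`results`, `t`, `o`, `h`, `l`, `c`) are our assumptions for illustration, so check the actual response before relying on them.

```r
library(jsonlite)

# With a live request, the body would come from the response object:
#   httr::stop_for_status(response)
#   payload <- httr::content(response, as = "text", encoding = "UTF-8")
# Here a hypothetical payload stands in; the field names are assumptions.
payload <- '{"results": [
  {"t": "2021-10-29T09:00:00", "o": 149.1, "h": 150.2, "l": 148.9, "c": 150.0},
  {"t": "2021-10-29T10:00:00", "o": 150.0, "h": 150.6, "l": 149.7, "c": 150.3}
]}'

# fromJSON() simplifies an array of objects into a data.frame
ohlc <- fromJSON(payload)$results

# Convert the timestamp strings into proper date-time objects
ohlc$t <- as.POSIXct(ohlc$t, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

str(ohlc)
```

From here on, the data is an ordinary data.frame, so every package in the rest of this article can pick it up.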
For simplicity’s sake, you can also use the R package set up by Mario at Quantargo, which you can find here! He’s one of our prized community members building things with and for the lemon.markets API to make it accessible to all kinds of developers #opensource. Quantargo is a platform that can help you build up data science skills through courses, workshops (also for businesses) and a browser-based workspace where you can immediately deploy your projects. They also have a wonderful Introduction to R course, so check it out.
📚Readr
When you are working with large datasets that need structuring, readr is the right choice for you. It provides functions for reading in rectangular data, whether that be a .csv, .tsv or .fwf file. Parsing a file with readr allows you to specify the data type per column (or it will smartly guess it for you). In addition, readr will output a tibble, the workhorse of the tidyverse, which is a type of data.frame that handles more complex data more gracefully than the native format in R. A simple implementation is as follows:
install.packages("readr")
library(readr)

# Declare the type of each column up front instead of letting readr guess
data <- read_csv("filename.csv", col_types = cols(
  var1 = col_double(),
  var2 = col_integer(),
  var3 = col_datetime()
))
The related tidyverse packages readxl and googlesheets4 also allow you to read directly from an Excel spreadsheet or Google Sheets. Check out this cheatsheet to learn more.
Quantmod
The quantmod package can load data, chart data and obtain relevant technical signals. This package works with several sources, including (but not limited to) Yahoo Finance and FRED, and it can also fetch data from something like a MySQL database. In the code snippet below, we show you how to load historical price data for AAPL from Yahoo Finance (do note that these prices are in USD):
install.packages("quantmod")
library(quantmod)
getSymbols("AAPL", src="yahoo")
chartSeries(AAPL, subset="last 6 months", theme=chartTheme("white"))
addMACD()
We chose to chart the last 6 months of the OHLCV (that is, Open High Low Close Volume) data, which can conveniently be specified verbatim (see the documentation for other formats). We also added the Moving Average Convergence Divergence (MACD) indicator, which shows the relationship between two moving averages of AAPL’s price. This produces the chart below:
As you can see, quantmod can be used for (pre-)processing your data too. And there’s a lot more that can be done with it: this is just a taste. Try it out for yourself!
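To make the MACD a little less of a black box, here is a rough sketch of how such an indicator can be computed by hand. It uses the EMA() function from TTR (covered further down, and loaded alongside quantmod) on made-up prices. The 12, 26 and 9 period windows are the commonly used settings and are our assumption here, so check the documentation for what your chart actually uses.

```r
library(TTR)

# Hypothetical closing prices (a random walk), used only for illustration
set.seed(42)
close <- 150 + cumsum(rnorm(100))

# MACD line: fast EMA minus slow EMA of the closing price
macd_line <- EMA(close, n = 12) - EMA(close, n = 26)

# Signal line: an EMA of the MACD line itself
signal_line <- EMA(macd_line, n = 9)

# Histogram: positive values mean the MACD line sits above its signal line
histogram <- macd_line - signal_line

tail(histogram)
```

The first values are NA because the slow average needs 26 observations before it can produce anything.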
(Pre-)processing your stock market data
Obtaining raw data often means that you’ll need to perform one or more alterations on it: perhaps you have irrelevant data, missing values or data in the wrong format, or you might want to obtain some metrics from this data. Welcome to the (pre-)processing stage, where there are more than enough R packages to help you address the above issues.
⏰ Xts
install.packages("xts")
library(xts)
The xts package is the go-to package for handling time-series data (and it extends the popular zoo package, which means even more methods are available to you). As financial data often takes the form of time-series data, we expect xts objects to come in handy: think of them as time-indexed matrices. You can perform lots of different operations on these matrices, such as extracting time-specific segments of data. For example, if you want to forecast prices, but you don’t want to include the volatile market open and close, you might choose to omit these two time intervals from your model. This guide gives a good overview of what can be done with the two packages.
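As a small sketch of the open/close example, the snippet below builds an xts object from made-up hourly prices and keeps only the hours in between. The prices, the date and the 10:00 to 16:00 window are all assumptions chosen for illustration.

```r
library(xts)

# Hypothetical hourly prices for one trading day, 09:00 to 17:00 UTC
idx <- seq(as.POSIXct("2021-10-29 09:00:00", tz = "UTC"),
           by = "hour", length.out = 9)
prices <- xts(c(150.1, 150.4, 150.2, 150.9, 151.3, 151.0, 150.7, 150.8, 151.2),
              order.by = idx)

# Time-of-day subsetting: keep 10:00 to 16:00, dropping the first and last hour
core_hours <- prices["T10:00/T16:00"]

core_hours
```

The same "T…/T…" pattern applies across many days at once, which is what makes it handy for intraday data.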
🌪️ Dplyr
install.packages("dplyr")
library(dplyr)
The dplyr package can be used for data manipulation: it can filter, sort, summarise, select and mutate your data. In financial analysis, this could be useful when you’re:
- finding financial instruments that are related to each other,
- obtaining certain metrics, e.g. standard deviation, mean, range, etc.,
- aggregating price information from different stock exchanges.
This cheatsheet will tell you everything you need to know about using dplyr.
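To give an idea of the second point, here is a minimal sketch that computes the mean, standard deviation and range of closing prices per instrument. The symbols and prices are invented for the example.

```r
library(dplyr)

# Hypothetical closing prices for two made-up instruments
prices <- data.frame(
  symbol = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  close  = c(10, 12, 14, 100, 90, 110)
)

# One row of summary metrics per symbol
summary_stats <- prices %>%
  group_by(symbol) %>%
  summarise(
    mean_close  = mean(close),
    sd_close    = sd(close),
    range_close = max(close) - min(close)
  )

summary_stats
```

Swapping group_by(symbol) for a column that identifies the stock exchange would give you the per-venue aggregation from the third point.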
📅 Lubridate
Lubridate is yet another component of the tidyverse. This package’s role? Ensuring that your date-time objects are correctly formatted and/or combined. For example, the following code snippet,
install.packages("lubridate")
library(lubridate)
date <- as_datetime(1635592026)
will return 2021-10-30 11:07:06 UTC. It’s robust against timezones, leap years and any other time anomalies you can think of. This might be useful if you’re working with more than one data source (with different formats) or if your trading platform only accepts certain formats.
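As a sketch of the "more than one data source" case, the snippet below parses the same moment from three different representations and then shifts it to another timezone. The input strings are made up for illustration.

```r
library(lubridate)

# The same instant, as delivered by three hypothetical data sources
from_epoch <- as_datetime(1635592026)                         # Unix timestamp
from_iso   <- ymd_hms("2021-10-30 11:07:06", tz = "UTC")      # year-month-day
from_eu    <- dmy_hms("30/10/2021 11:07:06", tz = "UTC")      # day/month/year

# All three parse to the same point in time
same_instant <- from_epoch == from_iso && from_iso == from_eu

# Display the instant in Berlin local time without changing the moment itself
berlin <- with_tz(from_epoch, tzone = "Europe/Berlin")
format(berlin, "%H:%M:%S")  # "13:07:06", as Berlin was on UTC+2 that day
```

Once every source is parsed into the same date-time class, joining and comparing them becomes straightforward.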
🚦TTR
install.packages("TTR")
library(TTR)
Technical Trading Rules (TTR) is a popular choice when it comes to technical trading signals. It includes over 50 technical indicators, such as the more obscure Chande Momentum Oscillator (CMO) or the well-known Relative Strength Index (RSI). If you’d like to learn more about how trading signals can be used to motivate your strategies, you can read our article on beginner-friendly trading strategies.
TTR can also be used to obtain several volatility measures, such as True Range (TR) or the Chaikin Volatility (VT). You can use them to determine how much risk you are exposing yourself to and whether this aligns with your trading philosophy.
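Here is a small sketch of both ideas, an indicator and a volatility measure, on made-up prices. The random-walk closes and the fixed high/low band around them are assumptions purely for illustration, as is the 14-period window.

```r
library(TTR)

# Hypothetical price series: a random-walk close with a fixed high/low band
set.seed(1)
close <- 100 + cumsum(rnorm(60))
high  <- close + 1
low   <- close - 1

# Relative Strength Index over 14 periods (values lie between 0 and 100)
rsi <- RSI(close, n = 14)

# Average True Range needs High, Low and Close columns, in that order
hlc <- cbind(High = high, Low = low, Close = close)
atr <- ATR(hlc, n = 14)

tail(rsi, 3)
tail(atr[, "atr"], 3)
```

The result of ATR() also carries the raw true range in its "tr" column, in case you want the unsmoothed values.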
🧹 Tidyquant
Tidyquant is the bridge between the tidyverse and zoo, xts, quantmod and TTR. It basically makes working with the aforementioned packages easier by formatting the data in a tibble. For example, the kind of data loading we did in the ‘quantmod’ section can be written, this time for GOOG, as:
install.packages("tidyquant")
library(tidyquant)
google <- tq_get(x = "GOOG")
This ensures a tibble as output, meaning we can use many of the featured data manipulation tools on the OHLCV data without having any formatting issues! See this page for the core functionalities of the package.
Visualising stock market data
Visualising your data can also be an important component in determining trade decisions. You might be able to spot patterns and anomalies that aren’t immediately apparent by looking at the raw price data. Check out this article to get an idea.
📈 Ggplot2
Ggplot2 is another member of the tidyverse. It can be used to create graphs from your data and gain insight into your dataset. For example, we can plot the stock prices of two financial instruments on the same graph to (visually) determine whether there is co-movement (do note that this should be confirmed with a statistical test, e.g. the Engle-Granger test):
install.packages(c("tidyquant", "ggplot2", "dplyr"))
library(tidyquant)
library(ggplot2)
library(dplyr)

# Download daily prices for both tickers
multiple_stocks <- tq_get(c("GOOG", "AMZN"),
                          get  = "stock.prices",
                          from = "2021-01-01",
                          to   = "2021-10-31")

# Plot the opening prices, one line per symbol
# (%in% is vectorised, unlike ||, which only compares single values)
ggplot(data = filter(multiple_stocks, symbol %in% c("GOOG", "AMZN")),
       aes(x = date, y = open, color = symbol)) +
  geom_line()
This code snippet will output a graph that looks as follows:
From visual inspection alone, it appears that the Google time series includes a drift (time trend), whereas Amazon appears to somewhat oscillate around a mean. These insights suggest that these two stocks are likely not very appropriate for a Pairs Trading Strategy.
Check out this tutorial for more inspiration on how ggplot2 can be used for financial data.
Python & R
In the realm of data science, you’ll rarely work with just one programming language and/or platform. For example, if you’re working with a multidisciplinary team, you might need to jump from one language to another, or find a way to integrate them into a single script. At lemon.markets, we’re partial to using both Python and R to design our trading strategies, so we thought it might be useful to find a way to combine the two.
➡️ Reticulate
The reticulate package allows you to embed a Python session within an R script, which makes the transition between the two more seamless. This could be useful if you are, for example, using R for data exploration and Python to automate your trading strategy.
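As a rough sketch of what that looks like, the snippet below defines a tiny Python function from within R and calls it on R data. It assumes a working Python installation that reticulate can find. The function name simple_return and the prices are hypothetical.

```r
library(reticulate)

# Hypothetical prices held in an ordinary R vector
prices <- c(101.5, 102.0, 100.8)

# Define a Python function inside the embedded Python session
py_run_string("
def simple_return(first, last):
    return (last - first) / first
")

# Objects in Python's main module are reachable from R through `py`
ret <- py$simple_return(prices[1], prices[3])
ret
```

Values are converted between the two languages automatically, so the result comes back as a plain R number.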
All in all, there are plenty of R packages that can be useful in the finance context. We’ve discussed the benefits of using the tidyverse (tibbles!), how certain financial packages can be used in conjunction with the tidyverse and how to obtain technical signals. But the surface has only been scratched!
Are there any other R packages that you think are unmissable when it comes to finance and automated trading? Share them below! And if you’re not yet part of lemon.markets, join our waitlist. We’d love to see your R projects.
Marius from lemon.markets 🍋