# Analyzing League of Legends Data with R (2023)

## Sourcing the data

If you want to analyze gameplay from public matches, Riot has a great API. Personally, I like to look at the pros in competitive play, and an excellent resource for that is Oracle's Elixir. Specifically, I use the match data files to start all of my research.

## The R Part

As I said, we'll be using the match data files provided by Oracle's Elixir; specifically, the Summer 2019 file. If you're not familiar with R yet, that's OK! We'll keep it simple today. All you need to get started is RStudio. There are two libraries we'll be using: `tidyverse` (a suite of several libraries for data manipulation) and `openxlsx` (you may have noticed the file is an Excel document; this lets us open it with ease). With these two libraries, we can get started!
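Here is a minimal sketch of the setup and load step. The file name `2019-summer-match-data.xlsx` is a placeholder I made up, so swap in whatever your downloaded file is actually called. The read itself is the single `read.xlsx()` call; the `file.exists()` guard is only there so the snippet doesn't error if the file isn't in your working directory yet.

```r
# install.packages(c("tidyverse", "openxlsx"))  # run once if not installed
library(tidyverse)
library(openxlsx)

# Placeholder file name (an assumption): use the name of your own download.
match_file <- "2019-summer-match-data.xlsx"

if (file.exists(match_file)) {
  # read.xlsx() reads the first worksheet into a data frame by default.
  matches <- read.xlsx(match_file)

  # Quick look at the column names and the types R guessed for them.
  glimpse(matches)
}
```
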

With one line of code, we're ready to rock! Once you run that, we'll have the data loaded in, but before we can start an analysis, we need to clean it up a bit. For example, R thinks `patchno` is a number (11.1, for example), but we really should treat it as a string. `gamelength` should be a double. The `date` is also in an Excel-specific format, so we should make that a usable date. So let's take care of these things:
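A minimal sketch of that cleanup might look like this. To keep it runnable without the real file, it uses a tiny made-up data frame in place of the loaded sheet. The values, and the assumption that `gamelength` and `date` arrive as text, are mine for illustration and may not match what your file contains.

```r
library(dplyr)

# Toy stand-in for the loaded sheet; these values are invented.
matches <- data.frame(
  patchno    = c(9.13, 9.14),
  gamelength = c("32.5", "41.2"),
  date       = c("43631", "43638"),  # Excel serial dates stored as text
  stringsAsFactors = FALSE
)

matches_clean <- matches %>%
  mutate(
    # Patch numbers are labels, not quantities, so keep them as strings.
    patchno    = as.character(patchno),
    gamelength = as.double(gamelength),
    # Text -> number -> Date, counting days from Excel's origin.
    date       = as.Date(as.numeric(date), origin = "1899-12-30")
  )

str(matches_clean)
```
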

You'll notice that the date cleaning looks a little more involved than the others. The reason is that we first need to represent the column as a number, and then convert that number to a date. The trick here is that you need to provide the `origin` parameter, which for Excel dates is December 30, 1899. I don't know why, but thanks Google!

## Analysis

Finally, let's do a simple analysis using the Tidyverse packages!

Once we have the data thoroughly cleaned and ready to rock, we first filter the data by region (in this case, the column is called "league") to isolate the LCS (the North American region). Then we filter down to the "Team" rows; as opposed to the results for individual players, the "Team" rows represent aggregate performance for the entire team. Then we group by the actual teams. Finally, we create a summary for each team, where we tell it to create a column for each of the summaries that we want to see.
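The pipeline described above can be sketched roughly like this. It runs on a small made-up data frame so you can try it without the real file. The column names `player`, `team`, and `result`, and the summary columns I picked, are assumptions for illustration, so check them against the headers in your own file.

```r
library(dplyr)

# Toy stand-in for the cleaned match data; all values are invented.
matches_clean <- data.frame(
  league     = c("LCS", "LCS", "LCS", "LCS", "LEC", "LCS"),
  player     = c("Team", "Team", "Team", "Team", "Team", "Player A"),
  team       = c("Team A", "Team B", "Team A", "Team B", "Team C", "Team A"),
  result     = c(1, 0, 1, 1, 1, 1),      # 1 = win, 0 = loss
  gamelength = c(30, 30, 40, 36, 28, 30),
  stringsAsFactors = FALSE
)

team_summary <- matches_clean %>%
  filter(league == "LCS") %>%     # keep only the North American league
  filter(player == "Team") %>%    # keep team-level rows, drop player rows
  group_by(team) %>%              # one group per team
  summarise(
    games          = n(),
    win_rate       = mean(result),
    avg_gamelength = mean(gamelength)
  )

print(team_summary)
```

Each argument to `summarise()` becomes one column in the result, so adding another statistic is just a matter of adding another line there.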

## Conclusion

In the future, I'll be sharing how to do a more in-depth analysis, but hopefully this is a great place for you to get started. Let me know if you have any questions or ideas for types of analysis to perform!