DEV Community

Megan

Videogame Text Datasets Release

Background

In 2016 I released LibraryofCodexes, a website that aimed to gather videogame text into one uniform place (think in-game notes, books, letters, audio recordings, etc.). I built it because I was often too engaged in finishing a quest or killing a monster to take the time to read the text I found, and while wikis exist, they can be tedious to navigate. The website has gone through a few iterations since 2016. Most of the original design was stripped away in favor of an eBook repository, and the database that holds each individual entry is now private.

However, I've recently started reading through a few academic papers on Natural Language Processing applied to videogames (it's a rather small domain). I realized that there is a lack of easily accessible text, and that I've been sitting on a dataset for the past few years that just needed to be formatted and released.

Datasets

I've gone ahead and released the full dataset in JSON format on GitHub. At the time of release, the repository covers a slew of different game series (see the full list below). Each videogame has its own README, which details what data has been collected, any quirks in that data, and the degree of sanitization.
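If you want a quick look at what a local clone contains before reading each README, something like the following could be a starting point. It is only a sketch: it assumes the repository has been cloned to a folder on disk and that the data lives in files ending in .json somewhere under it. The function names are made up for illustration, and nothing here assumes a particular schema, since that varies per game and is described in each README.

```python
import json
from pathlib import Path


def load_json_files(root):
    """Return {relative_path: parsed_json} for every .json file under root.

    Assumption: root is a local clone of the dataset repository. The folder
    layout is not assumed; every .json file found is loaded.
    """
    root = Path(root)
    loaded = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.relative_to(root).as_posix()] = json.load(handle)
    return loaded


def summarize(loaded):
    """Map each file to (top-level JSON type, number of top-level items)."""
    summary = {}
    for name, data in loaded.items():
        # Lists and objects report their length; any other value counts as 1.
        size = len(data) if isinstance(data, (list, dict)) else 1
        summary[name] = (type(data).__name__, size)
    return summary
```

Calling summarize(load_json_files("path/to/clone")) returns one entry per file, which makes it easy to spot which files are lists of entries and which are single objects before writing any game-specific parsing.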

Videogame Series List

  • Assassin's Creed
  • Baldur's Gate
  • Battlefield
  • Crysis
  • Dead Space
  • Destiny
  • Deus Ex
  • Diablo
  • Doom
  • Dragon Age
  • Dying Light
  • Fable
  • Fallout
  • Gears of War
  • Horizon Zero Dawn
  • Kingdoms of Amalur
  • Mass Effect
  • Metroid Prime
  • Middle-Earth
  • Nier
  • Red Dead Redemption
  • Resident Evil
  • Star Wars: The Old Republic
  • System Shock
  • The Division
  • The Elder Scrolls
  • The Last of Us
  • The Witcher
  • Tomb Raider
  • Watch Dogs
  • World of Warcraft

Hopefully this can help make someone's research just a little bit easier. I will continue to update the repository in the future as I update LibraryofCodexes with new games.
