DEV Community

Cover image for The Elder Scrolls V: Skyrim Special Edition - Analysis of Dialogues
Haider Ali Punjabi
Haider Ali Punjabi

Posted on

The Elder Scrolls V: Skyrim Special Edition - Analysis of Dialogues

Read this on my blog

Background

After my last visualisation of Harry Potter data, I decided to use some other data to create word clouds. I am a huge Skyrim fan and always wanted to learn how to use xEdit Scripts. As a result, here I am with another Word Cloud.

Disclaimer: A lot of the code is shared between my previous project and this one

Visualisations

WordCloud of Most Occurring Words in Dialogues of Skyrim Special Edition
Graph of DLC contribution towards dialogues

Getting the data

I don't like opening my Windows installation (I have a dual boot setup, and use Manjaro mainly), and looked around the internet for some sort of data dump of Skyrim Dialogues. Unfortunately, I couldn't find any and then decided to extract the data myself. I had recently formatted my Windows Partition so had to reinstall the game. It also provided the benefit that no mods would pollute the data. (I had over 150 mods before the format). I downloaded the latest xEdit and used the Export dialogues.pas script that comes with it to export all the dialogues. (It took me 22:05 minutes).

I am going to look into other data I can extract this way, and maybe make some other stuff.

Processing the data

In the CSV, there were two columns of data I was interested in RESPONSE TEXT & TOPIC TEXT. Response Text was the larger one, with over 40k unique dialogues. Topic Text had only around 5.5K unique dialogues and also needed some additional processing. Topic Text contained some game constants such as RoomCost , HorseCost, and other prices, which had to be filtered out. I did all that in csv_to_json.py.

Counting the Words

Like the previous visualisation, I used nltk's stopwords corpus, along with a modified version of 20k most common words by Google. Interestingly, the modifications I did for Harry Potter were valid for Skyrim as well, because there is no dialogue with names like Harry, Ron, Arthur, etc and they share words like vampires, magic, etc.

I counted both RESPONSE TEXT & TOPIC TEXT data separately and then merged them into a single file count.json

Additional Tip: progress is a great Python Package to show progress in your scripts.

Making the WordCloud

I used pretty much the same process as the last visualisation. I changed the maximum font size to depict the variation properly and used a custom font this time.

To make the WordCloud, I used the wordcloud package. For the mask, I used Skyrim Logo Vector. For the font, I used Sovngarde font.

Making the Graph

I initially planned on making a set of graphs from the data, but wasn't able to due to two reasons:

  1. Some of the data was weird. Argneir had the highest dialogue count due to dialogues of many NPCs (including General Tullius, I think) is assigned to him.
  2. Some of the data doesn't; produce interesting visualisations. Nords have the highest dialogue count, and after the difference between the first few races and the remaining is so huge that a lot of the races aren't visible.

Since I had already made this, I thought of sharing it here, in case someone is interested in the image or its code.

Future Plans

I will look into making custom scripts (if someone already has them, do share it with me) to extract other interesting data from Skyrim and see what I can do with them.

References:

Top comments (0)