Introduction
A look at reverse engineering game files from the French MMORPG Dofus
In my previous article about d2i files, I did not explain the thought process behind reversing the file and understanding its structure. I thought this could interest a few people.
This is the thought process behind that article: http://medericburlet.com/dofus-d2i-reverse-engineering/
About d2i files
Having done some work on Dofus emulation as well as on the bot side of it, I knew that information such as weapon names, dialogues, maps, music etc. was stored in local files.
I knew that d2i files were the ones responsible for NPC dialogues, so I decided to try to understand them. You can follow this tutorial with the same file I used, which can be found here: i18n_fr.d2i
Pattern Recognition
Since we are working with a video game, the data must be stored in a way that is easy and fast to read; otherwise there would be no point keeping it locally. After all, the whole principle of local game resources is to speed up loading times.
My first approach was to open the file in Atom: with a bit of luck it would be something readable like JSON. And it turns out part of the file was readable. Lucky us, that meant the text at least was not encrypted.
However, some of the lines were completely unreadable.
My next step was to open it in a hex editor, since the jumbled text meant it was not a plain-text format.
Upon opening the file in a hex editor, something caught my eye straightaway!
There was a pattern, or at least a visible one! I did not know what it represented nor how to interpret it, but at least there was hope.
We can clearly see some repetition in the right-hand (text) panel, and in the hex data there are a lot of 2-byte groups ending in 00 or 01.
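To make the two panels concrete, here is a minimal TypeScript sketch of what a hex editor displays: an offset, the bytes in hex (grouped 4 at a time, which is only a display choice), and a text panel where non-printable bytes become dots. `hexDump` is a hypothetical helper written for illustration; the sample bytes are the start of i18n_fr.d2i as quoted in this article (0191F9E1 followed by 0022).

```typescript
// Hypothetical helper: formats bytes the way a hex editor shows them.
// groupSize only changes the display; the underlying bytes are the same.
function hexDump(bytes: Uint8Array, bytesPerRow = 16, groupSize = 4): string[] {
  const toHex = (b: number) => b.toString(16).padStart(2, "0").toUpperCase();
  const rows: string[] = [];
  for (let row = 0; row < bytes.length; row += bytesPerRow) {
    const slice = bytes.subarray(row, row + bytesPerRow);
    const groups: string[] = [];
    for (let g = 0; g < slice.length; g += groupSize) {
      groups.push(Array.from(slice.subarray(g, g + groupSize), toHex).join(""));
    }
    // Right-hand panel: printable ASCII as-is, everything else as "."
    const ascii = Array.from(slice, (b) =>
      b >= 0x20 && b <= 0x7e ? String.fromCharCode(b) : "."
    ).join("");
    const offset = row.toString(16).padStart(8, "0").toUpperCase();
    rows.push(`${offset}  ${groups.join(" ")}  ${ascii}`);
  }
  return rows;
}

// First six bytes of i18n_fr.d2i as quoted in the article.
const start = new Uint8Array([0x01, 0x91, 0xf9, 0xe1, 0x00, 0x22]);
console.log(hexDump(start).join("\n"));
console.log(hexDump(start, 16, 2).join("\n")); // same bytes, 2-byte grouping
```

Note that 0x22 is the ASCII code for a double quote, so the length prefix 0022 shows up as a stray `"` in the text panel, one reason the text view looks jumbled.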
Making sense of the Hex data
I first looked at the beginning of the file and noticed that the first string started right after a group of 4 bytes followed by a group of 2 bytes.
These 6 bytes looked like a "lucky" header, since they could just as easily be three 2-byte integers as one 4-byte integer followed by one 2-byte integer.
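The ambiguity is easy to see by decoding the same six bytes both ways. This TypeScript sketch uses the standard `DataView` API and assumes big-endian byte order (which is what makes 0022 read as 34 and 0191F9E1 as 26343905, the values used in this article); the byte values are the start of i18n_fr.d2i as quoted in the article.

```typescript
// The first six bytes of i18n_fr.d2i, as quoted in the article.
const firstSix = new Uint8Array([0x01, 0x91, 0xf9, 0xe1, 0x00, 0x22]);
const view = new DataView(firstSix.buffer);

// Reading 1: three unsigned 16-bit integers (DataView defaults to big-endian).
const asThreeShorts = [view.getUint16(0), view.getUint16(2), view.getUint16(4)];

// Reading 2: one 32-bit integer followed by one 16-bit integer.
const asIntAndShort = [view.getInt32(0), view.getUint16(4)];

console.log(asThreeShorts); // [ 401, 63969, 34 ]
console.log(asIntAndShort); // [ 26343905, 34 ]
```

Both readings agree on the last two bytes (34); they only differ on how to split the first four.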
While still looking at the beginning of the file, I noticed something else that was interesting. The first string in the file is written twice (red), and both copies start with the same 2 unreadable bytes (blue).
0022 converted to an integer gives us 34, which is exactly the length of the string. Looking further, we can see that every string starts with a 2-byte integer that specifies its length.
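Reading one of these length-prefixed strings takes only a few lines. Below is a minimal TypeScript sketch assuming the 2-byte length is big-endian (as 0022 giving 34 suggests) and that the text bytes are UTF-8, which fits the accented French text but is my assumption, not something established above. Under that assumption the prefix counts bytes, so it matches the character count only for plain ASCII. `readPrefixedString` is a hypothetical name.

```typescript
// Hypothetical helper: reads a 2-byte big-endian length followed by that many
// bytes of (assumed) UTF-8 text, starting at `offset`.
function readPrefixedString(bytes: Uint8Array, offset: number): { text: string; next: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const length = view.getUint16(offset); // e.g. 0x0022 -> 34
  const start = offset + 2;
  const text = new TextDecoder("utf-8").decode(bytes.subarray(start, start + length));
  // `next` is the offset of whatever follows this string.
  return { text, next: start + length };
}

// Usage on a small hand-built buffer (hypothetical data).
const body = new TextEncoder().encode("Dofus");
const demo = new Uint8Array(2 + body.length);
demo.set([0x00, body.length], 0);
demo.set(body, 2);
console.log(readPrefixedString(demo, 0)); // { text: 'Dofus', next: 7 }
```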
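If we read the first 4 bytes as an integer (big-endian, like the 0022 above), we get 26343905 (0191F9E1).
If we select all the bytes from the start of the file to the end of the string listing, the selection is exactly that size. So the first 4 bytes of a d2i file give the size of the string data, counted from the very start of the file, which also makes them the offset at which the string listing ends.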
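Since the header value marks where the string listing ends, the listing can be walked string by string from offset 4 until that offset is reached. The TypeScript sketch below does this on a small synthetic buffer; `buildStringSection` and `listStrings` are hypothetical helpers, the data is made up, and it assumes big-endian integers and UTF-8 text. A real file continues with more sections after the string data.

```typescript
// Hypothetical builder: header (end offset of the string data) + length-prefixed strings.
function buildStringSection(strings: string[]): Uint8Array {
  const encoded = strings.map((s) => new TextEncoder().encode(s));
  const dataEnd = 4 + encoded.reduce((sum, e) => sum + 2 + e.length, 0);
  const bytes = new Uint8Array(dataEnd);
  const view = new DataView(bytes.buffer);
  view.setInt32(0, dataEnd); // big-endian, counted from the start of the file
  let pos = 4;
  for (const e of encoded) {
    view.setUint16(pos, e.length);
    bytes.set(e, pos + 2);
    pos += 2 + e.length;
  }
  return bytes;
}

// Walks the string listing: strings start at offset 4 and run until the header offset.
function listStrings(bytes: Uint8Array): { offset: number; text: string }[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const dataEnd = view.getInt32(0);
  const decoder = new TextDecoder("utf-8");
  const out: { offset: number; text: string }[] = [];
  let pos = 4; // skip the header itself
  while (pos < dataEnd) {
    const length = view.getUint16(pos);
    out.push({ offset: pos, text: decoder.decode(bytes.subarray(pos + 2, pos + 2 + length)) });
    pos += 2 + length;
  }
  return out;
}

// Made-up data mimicking a string and its copy without capitals or accents.
const file = buildStringSection(["Épée", "epee"]);
console.log(listStrings(file).map((s) => s.text)); // [ 'Épée', 'epee' ]
```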
If we read the 4 bytes that follow the string listing as an integer, we get 1957787 (001DDF9B). Skipping that many bytes ahead leads us to the beginning of another string listing.
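The offset arithmetic can be written down as code. This is a minimal TypeScript sketch assuming big-endian integers and assuming that the 4-byte value at the end of the string data counts only the bytes after itself; `locateSections` is a hypothetical name and the tiny buffer in the usage is made up.

```typescript
// Hypothetical helper: follows the two integers described above.
function locateSections(bytes: Uint8Array) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const dataEnd = view.getInt32(0);            // 26343905 in i18n_fr.d2i
  const nextLength = view.getInt32(dataEnd);   // 1957787 in i18n_fr.d2i
  const blockStart = dataEnd + 4;              // first byte after that length field
  const nextListing = blockStart + nextLength; // where the following listing begins
  return { dataEnd, nextLength, blockStart, nextListing };
}

// Made-up buffer: header, one string "A", a length of 3, 3 filler bytes, then the next listing.
const tiny = new Uint8Array([0, 0, 0, 7, 0, 1, 0x41, 0, 0, 0, 3, 9, 9, 9, 0, 0, 0, 0]);
console.log(locateSections(tiny)); // dataEnd 7, nextLength 3, blockStart 11, nextListing 14
```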
Hex pointers . . . Wait what???
Yes, now we are getting to the boring (or interesting) stuff. If we go back to the end of our first string listing, we notice a pattern in the hex data. Right after that 4-byte length, the first 4-byte group is a perfect set: 00000001.
If we look closer, we see a pattern emerge:
Here we have a value that keeps getting incremented, on the same principle as an ID, which makes sense since these files are data storage for the game.
We also notice the 01 that repeats after each ID. Between it and the next ID there are another 8 bytes, which we can easily read as two 4-byte numbers:
4 (00000004) and 40 (00000028)
From there I just followed my gut: for ID 01 we get the number 4, which is the position of the first string in our file (position 4, because the first 4 bytes hold the size of the data). As for the second number, 40: if we look at offset 40 from the start of the file, we find the same string as the first one, but without capital letters or accents. That offset is no coincidence: the first string starts at 4, its length prefix takes 2 bytes and its text 34, and 4 + 2 + 34 = 40, so the second copy starts right where the first one ends.
From this I deduced the following format, which is explained in my previous blog post: http://medericburlet.com/dofus-d2i-reverse-engineering/
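Each index entry is made of: the ID (orange), a "diacritic exists?" flag (cyan), the string pointer (brown) and the diacritic pointer (blue), which points to the copy without capital letters or accents.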
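Here is that entry layout as a TypeScript sketch, decoding the exact bytes read above for ID 1 (00000001 01 00000004 00000028). It assumes big-endian integers and assumes the second pointer is only present when the flag byte is non-zero (when it is absent, the sketch reuses the main pointer); `readIndexEntry` and its field names are hypothetical.

```typescript
// Hypothetical entry shape: `size` is how many bytes the entry occupies.
interface IndexEntry {
  id: number;
  hasPlain: boolean;
  pointer: number;
  plainPointer: number;
  size: number;
}

// Decodes one entry at `pos`: ID (4 bytes), flag (1 byte), string pointer (4 bytes),
// then, assumed only when the flag is set, the pointer to the plain copy (4 bytes).
function readIndexEntry(view: DataView, pos: number): IndexEntry {
  const id = view.getInt32(pos);
  const hasPlain = view.getUint8(pos + 4) !== 0;
  const pointer = view.getInt32(pos + 5);
  // Without the flag, fall back to the main string for the plain copy (assumption).
  const plainPointer = hasPlain ? view.getInt32(pos + 9) : pointer;
  return { id, hasPlain, pointer, plainPointer, size: hasPlain ? 13 : 9 };
}

// The first entry as read in the article: 00000001 01 00000004 00000028
const firstEntry = new Uint8Array([0, 0, 0, 1, 1, 0, 0, 0, 4, 0, 0, 0, 0x28]);
console.log(readIndexEntry(new DataView(firstEntry.buffer), 0));
// id 1, flag set, pointer 4, plainPointer 40, size 13
```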
I then checked the second ID to see if it corresponded to the second string, and it did. From there I wrote a simple parser, which then became a fully fledged reader: https://github.com/crimson-med/FastD2IReader
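For readers who want to try this themselves, here is roughly what such a simple parser can look like in TypeScript. It is a minimal sketch, not the code of FastD2IReader or d2i-reader: it relies on the layout described above (big-endian integers, 2-byte length-prefixed strings) plus two assumptions of mine, namely that the text is UTF-8 and that the second pointer only exists when the flag byte is non-zero. `D2iSketch` is a hypothetical name.

```typescript
// Minimal d2i reader sketch: parses the index once, decodes strings on demand.
class D2iSketch {
  private readonly bytes: Uint8Array;
  private readonly view: DataView;
  private readonly decoder = new TextDecoder("utf-8");
  private readonly index = new Map<number, { pointer: number; plainPointer: number }>();

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const dataEnd = this.view.getInt32(0);           // end of the string data
    const indexLength = this.view.getInt32(dataEnd); // size of the index section
    let pos = dataEnd + 4;
    const end = pos + indexLength;
    while (pos < end) {
      const id = this.view.getInt32(pos);
      const hasPlain = this.view.getUint8(pos + 4) !== 0;
      const pointer = this.view.getInt32(pos + 5);
      const plainPointer = hasPlain ? this.view.getInt32(pos + 9) : pointer;
      this.index.set(id, { pointer, plainPointer });
      pos += hasPlain ? 13 : 9;
    }
  }

  // Returns the text for an ID (or its copy without capitals/accents), or undefined.
  getText(id: number, plain = false): string | undefined {
    const entry = this.index.get(id);
    if (entry === undefined) return undefined;
    const offset = plain ? entry.plainPointer : entry.pointer;
    const length = this.view.getUint16(offset);
    return this.decoder.decode(this.bytes.subarray(offset + 2, offset + 2 + length));
  }
}

// Made-up file: strings "Hi" (offset 4) and "hi" (offset 8), then an index with
// ID 1 -> "Hi" (plain copy "hi") and ID 2 -> "hi" (no flag).
const sample = new Uint8Array([
  0, 0, 0, 12, 0, 2, 72, 105, 0, 2, 104, 105, // header + string data
  0, 0, 0, 22,                                // index length
  0, 0, 0, 1, 1, 0, 0, 0, 4, 0, 0, 0, 8,      // ID 1, flag, pointer 4, plain pointer 8
  0, 0, 0, 2, 0, 0, 0, 0, 8,                  // ID 2, no flag, pointer 8
]);
const reader = new D2iSketch(sample);
console.log(reader.getText(1), reader.getText(1, true), reader.getText(2)); // Hi hi hi
```

With Node.js the real file could be passed in as `fs.readFileSync("i18n_fr.d2i")`, since a Node `Buffer` is a `Uint8Array`. The sketch deliberately ignores the UI message listing and any extra data after the index.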
Conclusion
This was a very fun challenge that forced me to use logic, understand my environment and follow my gut. It was important to keep in mind that the file came from a game, so reading speed was as important as accessibility.
crimson-med / FastD2IReader
A fast .d2i (D2I) Dofus file reader based on reverse engineering the file.
FastD2IReader
Having reverse engineered the .d2i files from Dofus for fun, I decided to make a simple reader for them.
This is based on the 2.10 version.
There is now a TypeScript Version available: https://github.com/crimson-med/d2i-reader
How to use
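```vb
' Open the d2i file with FastReader
Dim MyReader As New FastReader("MyFile.d2i", True)
' Look up the text stored under ID 41903
MyReader.GetText(41903)
' Release the reader when done
MyReader.Dispose()
```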
Normal Load VS Fast Load
[Figures: Normal Load, Fast Load]
D2I Files
Introduction
The D2I format is used by Ankama to store strings such as item names, dialogues and more. The file varies with the language, but its structure stays the same.
The Structure
The File
The file itself is made up of 4 main parts:
- The Data
- The Indexes
- The UI Messages
- Extra data
Each of these parts, except the extra data, starts with an index (4 bytes) giving the size of the data that follows.
The Data
…
Source: http://medericburlet.com/reverse-engineering-gaming-files-d2i-from-dofus/
Top comments (10)
yo dude this is so dope. been trying to do this since I was a kid. you know the game diablo the first one. we have disket so we can move to other pc. I knew there's a way to get rare items by copy paste some parts of the code. damn! teach me your ways
Hey man, call me senpai! (joking).
I'm glad that you enjoyed the post; it took me quite a while to retake all the screenshots since I had lost the previous ones hahaha. I have never looked at Diablo, but that could be a fun challenge to do!
I noticed the #security tag isn't very active so I'm gonna post more and try to present more articles like this.
Senpai!! tried doing the same thing. I was young and had no experience. I was like 16 at that time. I notice the owner of the computer shops character is so vamp. That's what he does. He copies and pastes some code into his game file. that's why he wants to keep the disket for "safe keeping".
btw opened the github account you used visual basic right? Thought ur doing it with c or c++.
thanks senpai!
Haha yeah I remember being shocked and lost by code when I was younger 😂
Yes, for the reader I used VB.NET since the reader can be incorporated into emulators and bots. Most emulators and bots for Dofus use .NET, so it would be easily integrated.
Nice! I liked it, very informative. Let me suggest to “rewrite” the lucky header paragraph as it seemed to me a bit confusing and misleading. Maybe “three 2 bytes” to make it clear you’re referring to a group of 2 bytes pattern? Also it’s just a bit confusing when you refer to the lucky header pattern using an image showing a dword column grid (for anyone else reading, a “word” is a group of two bytes together and the picture shows a DOUBLE word grid. Two “nibbles” together form a byte, and each nibble is a single hex digit, 0-9 or A-F, which is a 4-bit value. Which means this progression: 1 bit -> 4 bits (a nibble, half a byte, from 0 to F) -> 8 bits (1 byte, from 00 to FF) -> 2 bytes (a “word”, from 0000 to FFFF) -> 4 bytes (dword or double word) -> 8 bytes (qword or quad word) -> bigger words are usually named after their bit count, like a 128-bit or 256-bit word, but are not that common...) After all, a binary file is a stream of bytes and this is only a way for us to make sense of it. The column organization can be changed at any time in any hex editor as it is only visual, not hard coded or anything. It's just convenient depending on what type of data structure you are looking at, like for example this d2i file where there is no fixed-size data and everything depends on string lengths. A smaller grid division would make it a bit harder to spot the pattern. Sometimes you prefer a single-byte grid when working with embedded electronics, as data sizes are shorter. Congrats 👍🏻
Hey man, thanks for the input, I will defo make an edit during the day. I totally agree that that part is a little hazy. It's my first time writing a lengthy article and I got a little lost in the midst of it; I will try my best to make clearer and more interesting articles.
Thanks for your work! We're waiting for the d2o file :p
I agree with you ;)
Nice Job!
Thank you! It was a lot of fun!