The war isn't going anywhere for now, so every couple of days I have to go through the following steps to update the Russian losses tracker:
- download zip from Kaggle
- optionally, verify that the data looks right with git diff, as occasionally there's a typo which makes losses go backwards (it has happened only a few times, and it always gets corrected in the next update)
The annoying part is that Kaggle requires me to be logged in in a browser to download data, so I can't just replace that step with a
So let's try to improve this flow a bit.
First, you need to create an account on Kaggle.
Then go to your account settings by clicking the icon in the top right and selecting Account.
There's a "Create New API Token" button, which creates a new account token and downloads it as kaggle.json.
Create the ~/.kaggle folder, and save that file to ~/.kaggle/kaggle.json.
Kaggle will complain if you don't secure the file, so run this:
chmod 0600 ~/.kaggle/kaggle.json
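The token setup can also be sketched as one small shell function. This is only a sketch: install_kaggle_token is a hypothetical helper name, and the ~/Downloads location in the usage comment is an assumption about where your browser saves files.

```shell
# install_kaggle_token SRC [DEST_DIR]
# Moves a downloaded kaggle.json into DEST_DIR (default: ~/.kaggle)
# and makes it readable and writable by the owner only.
install_kaggle_token() {
  src=$1
  dest_dir=${2:-"$HOME/.kaggle"}
  mkdir -p "$dest_dir" || return 1
  mv "$src" "$dest_dir/kaggle.json" || return 1
  chmod 0600 "$dest_dir/kaggle.json"
}

# Example (the Downloads path is an assumption, adjust as needed):
# install_kaggle_token ~/Downloads/kaggle.json
```

The optional second argument is there mostly so the function can be tried out on a scratch directory before touching the real ~/.kaggle.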
If you have Python 3 installed, you just need to run:
pip3 install kaggle
The user name and the ID of the dataset are in the URL, so to download https://www.kaggle.com/datasets/piterfm/2022-ukraine-russian-war you need to run:
$ kaggle datasets download piterfm/2022-ukraine-russian-war
It will save it as 2022-ukraine-russian-war.zip. There are extra options, such as choosing where to download it or unzipping it.
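As an illustration of those extra options, here is a sketch that downloads into a chosen directory and unpacks the archive there, using the --path and --unzip flags of kaggle datasets download. The download_dataset name and the KAGGLE override variable are hypothetical; the override only exists so the command can be dry-run with KAGGLE=echo.

```shell
# download_dataset DATASET DIR
# Downloads a Kaggle dataset into DIR and unpacks it there.
# KAGGLE is a hypothetical override (e.g. KAGGLE=echo for a dry run);
# when it is unset, the real kaggle CLI is called.
download_dataset() {
  mkdir -p "$2" || return 1
  "${KAGGLE:-kaggle}" datasets download "$1" --path "$2" --unzip
}

# Example (the target directory name is an assumption):
# download_dataset piterfm/2022-ukraine-russian-war data
```

I don't use this variant myself, since my update script expects the zip file, but it is handy if you only want the CSV files.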
Now I can automate the whole process:
$ kaggle datasets download piterfm/2022-ukraine-russian-war
$ ./update_csv 2022-ukraine-russian-war.zip
$ trash 2022-ukraine-russian-war.zip
$ git add -u
$ git ci -m 'Data Update'
$ git push
And since it's just a series of commands, I can even make it run automatically every day, without any intervention.
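For unattended runs, the same commands could be wrapped so that the first failing step stops everything after it. This is a minimal sketch, not a finished script: update_tracker is a hypothetical name, git commit stands in for git ci on the assumption that ci is an alias for commit, and the paths in the crontab comment are placeholders.

```shell
# update_tracker
# Runs the update steps in order; the && chain stops at the first
# failing step, so a failed download or update is never committed.
update_tracker() {
  zip=2022-ukraine-russian-war.zip
  kaggle datasets download piterfm/2022-ukraine-russian-war &&
    ./update_csv "$zip" &&
    trash "$zip" &&
    git add -u &&
    git commit -m 'Data Update' &&
    git push
}

# When saved as a script, call the function at the end:
# update_tracker
#
# A crontab line to run it daily at 06:00 (paths are placeholders):
# 0 6 * * * cd /path/to/tracker && ./update.sh >> update.log 2>&1
```

One thing to watch for: cron runs with a very small PATH, so kaggle and trash may need full paths there. Also, git commit exits with an error when there is nothing to commit, which conveniently skips the push on days without new data.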
I could also add some kind of data checks to the process, so if there's anything weird like numbers going backwards, it would stop the update and wait for the next day. But overall, I'm happy with how it all ended up.
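Such a check could look something like the sketch below, which fails if any numeric column ever decreases from one row to the next. The assumptions are mine and may not match the real files: comma-separated data with one header row, rows sorted oldest first, and no quoted fields containing commas. check_monotonic is a hypothetical name.

```shell
# check_monotonic FILE
# Exits non-zero if any numeric column decreases between consecutive rows.
# Cells that are not plain non-negative integers (dates, blanks, text)
# are skipped and do not reset the last seen value for their column.
check_monotonic() {
  awk -F, '
    { sub(/\r$/, "") }                    # tolerate Windows line endings
    NR == 1 { next }                      # skip the header row
    {
      for (i = 1; i <= NF; i++) {
        if ($i !~ /^[0-9]+$/) continue
        if ((i in prev) && $i + 0 < prev[i]) {
          printf "line %d, column %d: %s < %s\n", NR, i, $i, prev[i]
          bad = 1
        }
        prev[i] = $i + 0
      }
    }
    END { exit bad + 0 }
  ' "$1"
}
```

In the update flow it could sit between ./update_csv and git add -u, so that a failing check leaves the changes uncommitted until the next day's data arrives.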