Ok, so you don't have that many options for courses with just four hours available. I'd recommend looking into some algorithms and data structures, how and why to avoid in-memory workloads, and how to use streaming data. You can find plenty of resources online, so be ready to read a lot of tutorials.
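To make "avoid in-memory workloads" a bit more concrete, here is a minimal sketch of the streaming idea using plain Python generators. The file is read one line at a time and only a running count and sum are kept, so memory use stays flat no matter how large the input gets. The one-number-per-line format and the function names are assumptions for illustration only.

```python
import os
import tempfile
from typing import Iterator


def read_numbers(path: str) -> Iterator[float]:
    """Yield one number per non-blank line; the file is never loaded whole."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # file objects are iterated lazily, line by line
            line = line.strip()
            if line:
                yield float(line)


def streaming_mean(values: Iterator[float]) -> float:
    """Compute a mean in constant memory with a running count and sum."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
    if count == 0:
        raise ValueError("no values to average")
    return total / count


if __name__ == "__main__":
    # Hypothetical input file, written on the fly just for the demo.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("1\n2\n3\n\n4\n")
        demo_path = tmp.name
    try:
        print(streaming_mean(read_numbers(demo_path)))  # 2.5
    finally:
        os.remove(demo_path)
```

The same shape (a generator that yields records, and a consumer that keeps only a small running state) scales from a toy file to inputs far larger than RAM.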
Personally, I'd go with standard Python for the task. You can use the csv module to read and write your CSVs; it has less overhead than pandas, and it lets you stream rows in and out one at a time, which I think is great.
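A minimal sketch of that streaming approach with the standard csv module, assuming a hypothetical input with `name` and `amount` columns: `csv.DictReader` yields one row at a time, a small transform cleans it, and `csv.DictWriter` writes it out immediately, so the whole file is never held in memory. The column names and the cleaning rules are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transform: tidy names, skip rows without an amount."""
    for row in rows:
        amount = (row.get("amount") or "").strip()  # missing field -> ""
        if not amount:
            continue  # skip incomplete records instead of failing the whole run
        yield {"name": row["name"].strip().title(), "amount": f"{float(amount):.2f}"}


def stream_csv(source: TextIO, sink: TextIO) -> int:
    """Read, transform and write one row at a time; return rows written."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(sink, fieldnames=["name", "amount"])
    writer.writeheader()
    written = 0
    for row in transform(reader):
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    src = io.StringIO("name,amount\n  alice ,3\nbob,\ncarol smith,4.5\n")
    dst = io.StringIO()
    print(stream_csv(src, dst))  # 2
    print(dst.getvalue())
```

With real files, open both with `newline=""` as the csv module documentation recommends, e.g. `open("input.csv", newline="")` and `open("output.csv", "w", newline="")`; those file names are placeholders.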
What I would do:
Use as much logging as you can. It will save your day.
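As an illustration of "as much logging as you can", here is a minimal sketch using the standard logging module: timestamped output configured once, a named logger per ETL step, a warning for every bad record, and a summary line at the end. The logger name `etl.load`, the `load_rows` step and the "missing id" rule are all hypothetical.

```python
import logging
import sys
from typing import Sequence

# One named logger per ETL component; "etl.load" is a hypothetical name.
logger = logging.getLogger("etl.load")


def configure_logging(level: int = logging.INFO) -> None:
    """Send timestamped log lines to stderr; call once at program start."""
    logging.basicConfig(
        level=level,
        stream=sys.stderr,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def load_rows(rows: Sequence[dict]) -> int:
    """Hypothetical load step that logs bad records instead of crashing."""
    loaded = 0
    for index, row in enumerate(rows, start=1):
        if "id" not in row:
            logger.warning("row %d skipped: missing 'id' (%r)", index, row)
            continue
        loaded += 1
        logger.debug("row %d loaded: id=%s", index, row["id"])
    logger.info("loaded %d of %d rows", loaded, len(rows))
    return loaded


if __name__ == "__main__":
    configure_logging(logging.DEBUG)
    load_rows([{"id": 1}, {"name": "no id"}, {"id": 3}])
```

When a nightly run goes wrong, the per-row warnings and the final count are usually what tell you which record broke and how far the job got.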
There is a really nice up-and-coming project called Bonobo (bonobo-project.org/). I have been watching this one really closely.
I love Pandas for ETL, but I really like the graphing and visualization that Bonobo provides.
No course to reference, but this was a good read for me.
To answer your first question: I would recommend Introduction to CS and Programming Using Python on edX, which is free, and also the Python courses on Teamtreehouse (paid membership, but free for 30 days).
There are quite a few good channels on YouTube. Personally, I love Socratica, but look around and see which one you like the most.
Books: I don't think I ever picked one up for Python, so I can't help you there.
If you are facing problems with datasets that do not fit in memory, PySpark might be the way to go, though I guess working with it effectively will take more than 4 hours. If you are really interested in this direction, Udacity has a "Data Engineering" nanodegree. I cannot recommend it without reservations; nevertheless, combined with additional study of the surrounding topics, I learned quite a bit from it.
Got any advice here?