Working with it effectively will probably take more than 4 hours, but if you are facing datasets that do not fit in memory, PySpark might be the way to go. If you are really interested in this direction, Udacity offers a "Data Engineering" nanodegree. I cannot recommend the degree without reservations; nevertheless, combined with additional study of the surrounding topics, I learned quite a bit from it.