The Book:
Thinking in Pandas by Hannah Stepanek
Short Summary:
"Thinking in Pandas" is a book about how to optimize your Pandas code. It starts with an introduction to the Pandas library, the basics of loading and merging data, and how Pandas works under the hood. The middle chapters cover the typical things you would use Pandas for and how to optimize those operations. It ends with ways to use tools outside the library for speed, followed by a look at the library's future.
What I liked:
I liked the level of detail Hannah went into because it helped me understand the library as a whole. It wasn't a cookbook-style book, but more like a course on Pandas and why one approach is better than another. I also enjoyed the diverse set of optimizations.
What I disliked:
Although the book is excellent overall, there are two things I wish I could change. First are the code examples. Aesthetically, they were tough on the eyes, and a little more work could have gone into making them look nice. Second, Hannah doesn't dive into distributed solutions for large datasets. I like to think that most people will use Spark for their distributed solutions, but those who want to stay close to the Pandas API can use a library like Dask.
Review round-up:
This book is a must-read for any data engineer or Pandas user interested in developing better habits. It has some shortcomings, but they aren't enough to warrant skipping it. I used some of the suggestions immediately, and I'm grateful for all the work Hannah put into it.
Rating:
8/10 Python Snakes