
MLOps Community

MLOps Meetup #25 // Python and Dask: Scaling the DataFrame // Dan Gerlanc - Founder of Enplus Advisors

Python's most popular data science libraries—pandas, numpy, and scikit-learn—were designed to run on a single computer, and in some cases, using a single processor. Whether this computer is a laptop or a server with 96 cores, your compute and memory are constrained by the size of the biggest computer you have access to.

In this course, you'll learn how to use Dask, a Python library for parallel and distributed computing, to bypass this constraint by scaling your compute and memory across multiple cores and machines. Dask provides integrations with Python libraries like pandas, numpy, and scikit-learn, so you can scale your computations without having to learn completely new libraries or significantly refactor your code.

Daniel Gerlanc has worked as a data scientist for more than a decade and has written software professionally for 15 years. He spent 5 years as a quantitative analyst with two Boston hedge funds before starting Enplus Advisors. At Enplus, he works with clients on data science and custom software development, with a particular focus on projects requiring expertise in both areas. He teaches data science and software development at introductory through advanced levels. He has co-authored several open source R packages, published in peer-reviewed journals, and is active in local predictive analytics groups.

Join our Slack community: https://join.slack.com/t/mlops-community/shared_invite/zt-391hcpnl-aSwNf_X5RyYSh40MiRe9Lw
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://zoom.us/webinar/register/WN_a_nuYR1xT86TGIB2wp9B1g

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Chris Sterry on LinkedIn: https://www.linkedin.com/in/chrissterry/
Connect with Daniel on LinkedIn: https://www.linkedin.com/in/dgerlanc/


