DEV Community

MLOps Community

Fixing Your ML Data Blind Spots // Yash Sheth // MLOps Coffee Sessions #102

MLOps Coffee Sessions #102 with Yash Sheth, Fixing Your ML Data Blind Spots, co-hosted by Adam Sroka.

// Abstract
Improving your dataset quality is absolutely critical for effective ML. Finding errors in your datasets is generally a slow, iterative, and painstaking process.    

Data scientists should be proactively fixing their model's blind spots by improving their training data. In this talk, Yash discusses how Galileo helps data scientists identify, fix, and track data across the entire ML workflow.

// Bio
Yash Sheth is co-founder and VP of Engineering at Galileo. Prior to starting Galileo, Yash spent the last decade working on Automatic Speech Recognition (ASR) at Google, leading the core speech recognition platform team that powers speech-to-text across 20+ Google products in over 80 languages, along with thousands of businesses through the Cloud Speech API.

// MLOps Jobs board  
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://www.printful.com/

// Related Links
Website: https://www.rungalileo.io/
Trade-Off: Why Some Things Catch On, and Others Don't, book by Kevin Maney:
https://www.amazon.com/Trade-Off-Some-Things-Catch-Others/dp/0385525958

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Adam on LinkedIn: https://www.linkedin.com/in/aesroka/
Connect with Yash on LinkedIn: https://www.linkedin.com/in/yash-sheth-72111216/

Timestamps:
[00:00] Introduction to Yash Sheth
[02:53] Takeaways
[04:35] Why unstructured data?
[06:59] Fitting in the workflow
[10:56] Digging into the different pains
[18:33] Vision around the democratization of machine learning
[24:41] Unstructured data problem
[25:59] Galileo handling unified tools
[27:31] Calculus for ML
[28:55] Gatekeep
[30:03] Synthetic data in the unstructured data world of Galileo
[33:20] Tips for data scientists that have unstructured data but with a small data set
[35:10] How users benefit from Galileo
[37:25] Business case for dummies
[42:46] War stories
[45:03] Rapid fire questions
[51:06] Wrap up