MLOps Community

Feature Store at Shopify and Skyscanner // Matt Delacour and Mike Moran // Reading Group #4

MLOps Reading Group meeting on February 11, 2022

Reading Group session on feature stores with Matt Delacour and Mike Moran

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Connect with us on LinkedIn: https://www.linkedin.com/company/mlopscommunity/
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, plus blogs on feature stores and machine learning monitoring: https://mlops.community/

Timestamps:
[00:05] Matt's intro
[00:26] Mike's intro
[01:09] Matt’s talk: Feature store system at Shopify
[01:45] What is Shopify?
[02:05] Shopify Use Case
[02:38] Choosing a solution
[03:19] Managed service vs In-house vs Open-source (Feast)
[06:01] Why did we choose Feast?
[11:25] Implementation Strategy (multi-repo vs mono-repo approaches)
[13:01] Mono-repo approach breakdown
[14:30] Internal SDK
[17:01] Q&A: Does Feast scale to meet Shopify's latency requirements for online inference?
[19:05] Q&A: Do you rely on Feast to serialize data to the online store?
[20:13] Q&A: Is your mono-repo library a subset of Feast?
[21:18] Q&A: Did you consider using git submodules for a multi-repo?
[23:02] Q&A: Are you storing embeddings with Feast?
[24:30] Q&A: In the mono-repo, which modules are responsible for feature engineering? How do you ensure that feature engineering logic can be reused by many data scientists?
[27:58] Mike’s talk: Feature store at Skyscanner
[28:08] Kaleidoscope System
[28:25] Background and context of the Feature store
[29:30] Initial state of the feature store
[30:13] How the marketing team also leverages the feature store
[31:04] Current state of the feature store (marketing & machine learning)
[31:44] SDK approach of creating schemas with dataframes (easy access)
[32:16] Reusability across the marketing and DS teams
[33:06] GDPR constraints
[33:34] Data updates in the feature store
[36:09] Q&A: When a DS updates a feature, how are you communicating that across teams?
[38:25] Q&A: Are you applying different levels of feature engineering to increase the likelihood of a DS going back to a previous checkpoint of processing?
[40:55] Q&A: In what languages are you implementing the feature store?
[44:28] Q&A: Performance-wise, how do you decide which code stays in Apache Spark and which goes in SQL?
[49:00] Wrap-up
