If you've ever trained a beautiful model in a Jupyter notebook, watched the metrics shine, and then realized you have no idea how to actually put it in front of users, congratulations: you've just discovered why MLOps exists.
In this series, we are going to walk together from a notebook to a fully deployed, monitored, and self-retraining ML system, one tiny step at a time. But before we write any code, let's get the foundations straight.
So, what is MLOps?
MLOps (short for Machine Learning Operations) is the set of practices, tools, and culture that lets you ship machine learning models to production reliably and repeatedly. Think of it as DevOps' younger sibling: same spirit (automation, reproducibility, monitoring), but adapted to the weirdness of ML, where your code is not the only thing that changes: your data changes, your model changes, and the world your model lives in changes too.
A useful way to picture it is the ML lifecycle:
1. Data collection & versioning — where does the data come from, and which version did we train on?
2. Experimentation — which features, which model, which hyperparameters?
3. Training & evaluation — does it actually work, and is it better than what we had?
4. Packaging — wrap the model in something deployable
5. Deployment — serve predictions to real users (batch or real-time)
6. Monitoring — is it still working? Did the data drift?
7. Retraining — close the loop and start again
Traditional software has steps 4–6. ML has all seven, and steps 1–3 keep coming back to haunt you.
Why "it works on my machine" is worse in ML
In classical software, if your code runs locally, it has a decent chance of running in production. In ML, that's a trap, because the model's behavior depends on three moving things, not one:
- Code: the training script, the preprocessing, the inference logic
- Data: the exact dataset (and its version) you trained on
- Environment: Python version, library versions, CUDA versions, OS
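To make those three moving parts concrete, here's a minimal, stdlib-only sketch of what "pinning down" data and environment looks like. (The function names are ours for illustration; the real tools we'll meet later, like DVC and MLflow, do this far more thoroughly.)

```python
import hashlib
import platform
import sys


def fingerprint_dataset(path: str) -> str:
    """Return a SHA-256 hash of the raw dataset file, so two training
    runs can prove they used byte-identical data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large datasets don't blow up memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def environment_snapshot() -> dict:
    """Record the interpreter and OS a model was trained under."""
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
```

Log the dataset hash and the environment snapshot next to every experiment, and "which data did we train on?" becomes a lookup instead of an archaeology project.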
Change any of these three and your "great model from Tuesday" becomes "mysterious garbage on Friday." This is why ML teams need stricter versioning, tracking, and packaging discipline than most web teams.
What problems does MLOps actually solve?
Concrete pains you'll feel without MLOps, and that we'll fix in this series:
- "Which dataset gave us that 0.94 F1 score? Nobody remembers."
- "The model works locally but crashes in the Docker container."
- "We retrained the model and accuracy dropped, but we can't roll back."
- "Production is silently degrading and we noticed two weeks later."
- "Every deploy is a hand-crafted artisanal disaster."
Each of these has a tool and a workflow that solves it, and we are going to meet (almost) all of them, one by one.
The MLOps stack we'll build
Here's a sneak peek of the tools we'll touch in the next articles:
- DVC for data versioning
- MLflow for experiment tracking and the model registry
- FastAPI for serving
- Docker for packaging (we'll lean a bit on Clelia's 1minDocker series here)
- GitHub Actions for CI/CD
- Evidently for monitoring data and model drift (we can use Prometheus and Grafana too)
- A cloud provider (we'll pick one later) for actually deploying it all
Don't worry if some of these names sound intimidating: we'll introduce them gently, one per article, and always with a working example.
What you need to follow along
Nothing fancy:
- Python 3.10+
- Git installed
- A GitHub account
- Docker installed (we highly recommend following this series: https://dev.to/astrabert/1mindocker-1-what-is-docker-3baa)
- A laptop and ~1 minute per article 😉
In the next article, we'll get our hands dirty: we'll take a small dataset, version it with DVC, and finally answer the question "which data did we train on?" without crying.
Stay tuned and have fun!