You know that moment when you finish training a model? That little spark of excitement? The "this could actually work" feeling?
Then reality hits.
You need to write a Flask app. Dockerize it. Write Kubernetes manifests. Set up CI/CD. Configure monitoring. Get security reviews. Deploy to staging. Wait for approval. Hope it works.
Three weeks later, that spark is gone. You're just tired.
I've been there. At startups, at scale-ups, at enterprises. The story is always the same: brilliant people spending 40% of their time on infrastructure instead of machine learning.
So I'm building the platform I wish I had.
## It Starts With a Decorator
```python
from mlops import track

@track
def train_churn_model():
    # Your actual ML code here
    model = train_random_forest(X_train, y_train)
    accuracy = test_model(model, X_test, y_test)
    return {"model": model, "accuracy": accuracy}
```
That's it. No manual logging. No setting up experiment tracking. Just train your model.
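To make that feel less like magic, here's a rough sketch of what a tracking decorator along these lines *could* do internally. This is purely illustrative (the names, the `last_run` attribute, and the in-process storage are my assumptions, not the actual SDK); a real version would ship the metadata to an experiment tracker like MLflow instead of keeping it on the function.

```python
import functools
import time

def track(fn):
    """Record run metadata and returned numeric metrics around a training function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        duration = time.time() - start
        # Pull out anything numeric from the return value as a "metric".
        metrics = {k: v for k, v in result.items() if isinstance(v, (int, float))}
        # Illustrative only: a real SDK would log this to a tracking server.
        wrapper.last_run = {
            "name": fn.__name__,
            "duration_s": duration,
            "metrics": metrics,
        }
        return result
    return wrapper

@track
def train_toy_model():
    return {"model": object(), "accuracy": 0.93}

train_toy_model()
print(train_toy_model.last_run["metrics"])  # {'accuracy': 0.93}
```

The point of the pattern: the data scientist's function stays untouched, and everything worth logging rides along on the return value.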
Then:
```bash
$ mlops deploy --env production
```
One command. From Jupyter notebook to production API.
## Why Now? Why Me?
Because I'm tired of the status quo. I've built internal MLOps platforms at multiple companies. Each time we:
- Cut deployment time from weeks to hours
- Reduced production incidents by 70%
- Got data scientists actually excited about shipping models
And each time I thought: "This should exist as open source. Every team doing ML should have this."
So I'm building it. For real this time.
## What Makes This Different
This isn't another experiment tracking tool. We have MLflow for that (and we're using it).
This isn't another model registry. We have plenty of those.
This is the glue that actually gets models to production.
Here's what you're getting:
- **Real Deployment.** Not just "save the model file." Actual, production-ready deployments to Kubernetes with:
  - Health checks
  - Auto-scaling
  - Rolling updates
  - Built-in monitoring
- **Actual Monitoring.** Not just CPU usage. Real ML monitoring:
  - Prediction latency distributions
  - Feature drift detection
  - Model accuracy tracking (when you have ground truth)
  - Business metric integration
- **Sane Defaults.** I've seen what breaks in production, so this comes with:
  - Automatic retries on failure
  - Request timeouts that make sense
  - Resource limits that actually work
  - Security settings that won't get you fired
- **It's Open Source.** No "community edition" with half the features missing. No enterprise sales calls. Just code that works.
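As a taste of what the feature drift detection mentioned above might look like under the hood, here's a minimal sketch using the Population Stability Index (PSI), a common drift metric. All names and thresholds here are illustrative assumptions, not the platform's actual implementation:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[i] += 1
        total = len(values)
        # Smooth empty buckets so the log term stays finite.
        return [max(c / total, 1e-6) for c in counts]

    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Common rule of thumb: PSI < 0.1 means no meaningful drift,
# PSI > 0.25 means the feature distribution has shifted significantly.
training_sample = [0.1 * i for i in range(100)]
drifted_sample = [0.1 * i + 5 for i in range(100)]
print(psi(training_sample, training_sample) < 0.1)   # identical samples
print(psi(training_sample, drifted_sample) > 0.25)   # shifted samples
```

In production you'd run something like this per feature, comparing live traffic against the training distribution, and fire an alert when the score crosses your threshold.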
## The Tech Stack (Because Engineers Care)
- **Backend in Go:** Fast, reliable, compiles to a single binary. I've written enough Python microservices to know when to use something else.
- **Python SDK:** Where the ML happens. It has to feel natural to data scientists.
- **Kubernetes:** It won the container orchestration war. We're building for reality.
- **MLflow:** Great for experiment tracking. We're integrating, not competing.
- **Prometheus/Grafana:** The monitoring stack that actually gets used.
## Who This Is For
- Data scientists who want to deploy models without becoming DevOps experts
- ML engineers tired of rebuilding the same deployment scripts
- Startups that can't afford fancy enterprise MLOps platforms
- Enterprises where ML deployment takes longer than model development
## Join Me
I'm building this out in the open. Code's going on GitHub as I write it. Decisions are being made in public. There will be bugs. There will be bad decisions. There will be late nights.
But there will also be a working platform at the end of it.
If you've ever:
- Spent more time on Docker than on data
- Lost sleep over a production model going down
- Wished deploying ML was as easy as deploying a website
This is your invitation.
Star the repo. Join the Discord. Open an issue with your pain points. Or just watch from the sidelines and laugh at my mistakes.
Let's fix ML deployment. Together.