I have been running ML pipelines in production for years. Tens of millions of predictions a day, real money on the line, no tolerance for guesswork.
PulseFlow started as something I built for myself. A reference architecture I kept recreating from scratch at every company because nothing open source matched what production actually demands.
Today I packaged it, published it to PyPI, and put a live demo on Hugging Face. Here is what it covers and how to run it in under ten minutes.
What PulseFlow is
A production-grade MLOps pipeline you can clone and run immediately. Not a tutorial. Not a toy dataset. A real stack.
pip install pulseflow-mlops
Five components wired together:
- ETL pipeline: ingestion and preprocessing with Pandas and SQLAlchemy
- Training pipeline: model training with MLflow experiment tracking
- Deployment service: FastAPI microservice for real-time inference
- Orchestration: Apache Airflow DAGs for end-to-end automation
- Full Docker Compose stack: one command to run everything
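To give a feel for the ETL layer, here is a minimal sketch of the kind of preprocessing step the pipeline runs. This is illustrative only, not the code in etl/data_preprocessing.py: the function name and the drop-nulls-then-scale logic are assumptions for the example.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, then min-max scale every numeric column."""
    df = df.dropna().copy()
    numeric = df.select_dtypes("number").columns
    # Min-max scaling: each numeric column is mapped onto [0, 1].
    df[numeric] = (df[numeric] - df[numeric].min()) / (
        df[numeric].max() - df[numeric].min()
    )
    return df
```

The real stage adds SQLAlchemy ingestion and schema checks on top, but the shape is the same: a pure DataFrame-in, DataFrame-out function that is trivial to test.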
The architecture
Every enterprise ML system I have built follows the same pattern. Raw data in, predictions out, everything in between observable and reproducible.
Raw Data → ETL → Feature Store → Training → MLflow Registry → FastAPI → Clients
                   ↑
           Airflow Scheduler
PulseFlow makes this concrete with actual code, not diagrams.
Run it locally in four commands
git clone https://github.com/anilatambharii/PulseFlow.git
cd PulseFlow
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
Then run each stage:
python etl/data_ingestion.py
python etl/data_preprocessing.py
python training/train_model.py
uvicorn deployment.app.main:app --reload
MLflow logs to ./mlruns locally. No server required. If you want the full UI:
mlflow ui --port 5000
Or bring up the complete stack:
docker-compose up --build
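For orientation, a Compose file for a stack like this typically wires the services together along these lines. This sketch is not the repo's docker-compose.yml; the service names, ports, and image are assumptions:

```yaml
# Illustrative sketch only; see the repo's docker-compose.yml for the real stack.
services:
  api:
    build: ./deployment
    ports:
      - "8000:8000"
  mlflow:
    image: ghcr.io/mlflow/mlflow
    command: mlflow server --host 0.0.0.0 --port 5000
    ports:
      - "5000:5000"
```

One `docker-compose up --build` then gives you the inference API and the tracking server on known local ports.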
Why I built this as open source
Three reasons.
First, I kept seeing junior engineers spend weeks building pipeline scaffolding that should take days. PulseFlow collapses that to a git clone.
Second, enterprise ML has a credibility problem with open source. Most OSS ML projects are notebooks or toy pipelines. PulseFlow is the kind of code I would put in front of a Duke Energy production environment.
Third, I am building ARGUS-AI alongside this. ARGUS is an LLM observability platform that evaluates every model output across six dimensions: Groundedness, Accuracy, Reliability, Variance, Inference Cost, Safety. PulseFlow is what you run your models through. ARGUS is how you know they are not degrading in production.
They compose. PulseFlow trains and serves. ARGUS monitors and evaluates.
What is in the repo
PulseFlow/
├── etl/ # Data ingestion and preprocessing
├── training/ # Model training with MLflow tracking
├── deployment/ # FastAPI inference service
├── airflow/ # Orchestration DAGs
├── models/ # Model artifacts
├── ci_cd/ # GitHub Actions workflows
├── docker-compose.yml # Full stack in one command
└── pyproject.toml # pip install pulseflow-mlops
Live demo on Hugging Face
You can run the full ETL, training, and inference pipeline without installing anything:
PulseFlow MLOps Demo on Hugging Face Spaces
Three tabs. Load sample data, configure hyperparameters, run inference against the FastAPI endpoint simulation. All in the browser.
The production gap no one talks about
Most MLOps content stops at "train a model and log it to MLflow." That is maybe 20 percent of what production demands.
The other 80 percent:
- What happens when your data source schema changes at 2 AM?
- How do you roll back a model that passed validation but is failing on live traffic?
- Who gets paged when inference latency exceeds SLA?
- How do you prove to your compliance team that the model version in production matches what was approved?
PulseFlow gives you the structural patterns to answer all of these. It does not answer them for you because every organization's answers are different. But it gives you the right skeleton.
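The 2 AM schema-change question, for instance, is answered structurally by validating incoming data against an expected contract before anything downstream runs. Here is a minimal sketch of that pattern; the column names and dtypes are made up for the example:

```python
import pandas as pd

# Hypothetical contract: column name -> expected pandas dtype.
EXPECTED_SCHEMA = {"user_id": "int64", "amount": "float64"}

def validate_schema(df: pd.DataFrame, expected=EXPECTED_SCHEMA) -> list[str]:
    """Return a list of human-readable schema violations (empty means OK)."""
    errors = []
    for col, dtype in expected.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    return errors
```

Run this as the first task in the Airflow DAG and a 2 AM upstream change fails loudly at ingestion instead of silently corrupting training data.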
What I am adding next
- LangChain integration for LLM pipeline orchestration
- ARGUS-AI integration for automatic G-ARVIS scoring on inference outputs
- Kubernetes deployment manifests (production-grade, not tutorials)
- Prometheus metrics endpoint on the FastAPI service
Connect
- GitHub: github.com/anilatambharii/PulseFlow
- PyPI: pypi.org/project/pulseflow-mlops
- ARGUS-AI (the observability layer): github.com/anilatambharii/argus-ai
- Hugging Face: huggingface.co/AmbhariiLabs
- LinkedIn newsletter Field Notes: Production AI: linkedin.com/in/anilsprasad
If you are building ML systems in production and running into the gaps PulseFlow addresses, reach out. This is open source because I want it to be the reference architecture the community builds on.
28 years of production AI. All opinions are mine. All lessons were expensive.
#HumanWritten #ExpertiseFromField