Introduction
As someone diving into the world of MLOps and DevOps, I recently completed my first end-to-end machine learning operations project. This project took me through the complete journey of building a diabetes prediction model and deploying it to Kubernetes. Here's what I learned and the best practices I followed along the way!
GitHub Repository: https://github.com/amco-f22/first-mlops-project
Project Overview
This project implements a complete MLOps pipeline for a diabetes prediction model using:
- Machine Learning: Random Forest Classifier on the Pima Indians Diabetes Dataset
- API Framework: FastAPI for creating REST endpoints
- Containerization: Docker for packaging the application
- Orchestration: Kubernetes for deployment and scaling
The model predicts diabetes risk based on five key health metrics: Pregnancies, Glucose levels, Blood Pressure, BMI, and Age.
Project Structure and Best Practices
1. Clean Code Organization
The project follows a simple, modular structure:
first-mlops-project/
├── train.py              # Model training script
├── main.py               # FastAPI application
├── requirements.txt      # Python dependencies
├── Dockerfile            # Container configuration
├── k8s-deploy.yml        # Kubernetes deployment manifest
└── diabetes_model.pkl    # Trained model (generated)
Best Practice: Keeping training and inference code separate makes the codebase maintainable and allows independent updates to each component.
Model Training (train.py)
What I Implemented:
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load dataset from a reliable source
url = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
df = pd.read_csv(url)
# Feature selection
X = df[["Pregnancies", "Glucose", "BloodPressure", "BMI", "Age"]]
y = df["Outcome"]
# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Model training
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Model persistence
joblib.dump(model, "diabetes_model.pkl")
Key Learnings:
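- Reproducibility: random_state=42 fixes the train/test split, so every run trains and evaluates on the same rows. RandomForestClassifier() is not seeded in this script, so the fitted model can still vary slightly between runs unless it is given its own random_state.
- Model Serialization: joblib is efficient for saving scikit-learn models and preserves the entire fitted model state
- Feature Selection: Used five of the dataset's columns to keep the model simple and interpretable
- Data Source Management: Loading the dataset from a hosted URL makes the project portable and easy to reproduce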
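To illustrate why a fixed seed gives a repeatable split, here is a minimal plain-Python sketch. It is not scikit-learn's implementation; `seeded_split` is a hypothetical helper written only to show the idea behind `random_state`.

```python
import random


def seeded_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then cut off a test portion."""
    shuffled = list(rows)
    # A private RNG keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


rows = list(range(10))
first = seeded_split(rows)
second = seeded_split(rows)
# Same seed, same partition on every call.
print(first == second)
```

Changing `seed` produces a different partition, which is why a fixed value matters when comparing results between runs.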
API Development (main.py)
FastAPI Implementation:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load("diabetes_model.pkl")

class DiabetesInput(BaseModel):
    Pregnancies: int
    Glucose: float
    BloodPressure: float
    BMI: float
    Age: int

@app.get("/")
def read_root():
    return {"message": "Diabetes Prediction API is live"}

@app.post("/predict")
def predict(data: DiabetesInput):
    input_data = np.array([[
        data.Pregnancies,
        data.Glucose,
        data.BloodPressure,
        data.BMI,
        data.Age
    ]])
    prediction = model.predict(input_data)[0]
    return {"diabetic": bool(prediction)}
Best Practices Followed:
- Input Validation: Pydantic models automatically validate request data and provide clear error messages
- Type Safety: Explicit type hints improve code readability and catch errors early
- RESTful Design: Separate endpoints for health checks (/) and predictions (/predict)
- Auto-Documentation: FastAPI generates interactive API docs at /docs automatically
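The type hints in `DiabetesInput` check types but not plausibility, so a request with a negative glucose value would still reach the model. Below is a hedged, standard-library sketch of range checking. The bounds and the `check_ranges` helper are assumptions for illustration and are not part of the project; in Pydantic the same idea can be expressed with `Field(ge=..., le=...)` constraints.

```python
# Hypothetical plausibility bounds (inclusive); tune these to your data.
BOUNDS = {
    "Pregnancies": (0, 20),
    "Glucose": (0, 300),
    "BloodPressure": (0, 200),
    "BMI": (0, 80),
    "Age": (0, 120),
}


def check_ranges(payload):
    """Return a list of readable problems; an empty list means the payload passes."""
    problems = []
    for name, (low, high) in BOUNDS.items():
        if name not in payload:
            problems.append(f"{name}: missing")
        elif not low <= payload[name] <= high:
            problems.append(f"{name}: {payload[name]} outside [{low}, {high}]")
    return problems
```

Returning a list of problems rather than raising on the first one lets the API report every invalid field in a single response.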
Containerization (Dockerfile)
Docker Configuration:
FROM python:3.10
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Key Best Practices:
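- Base Image: python:3.10 is the full Debian-based image, which is convenient but large; a slim variant such as python:3.10-slim would reduce the image size
- Working Directory: WORKDIR keeps the container filesystem organized
- Port Exposure: Binding Uvicorn to 0.0.0.0 lets the API accept connections from outside the container once the port is published with -p
- Layer Optimization: Copying requirements.txt and installing dependencies before copying the rest of the code would enable better layer caching (future improvement!)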
Commands Used:
# Build the image
docker build -t diabetes-prediction-model .
# Run locally
docker run -p 8000:8000 diabetes-prediction-model
# Test the API
curl http://localhost:8000/
Kubernetes Deployment
Deployment Manifest (k8s-deploy.yml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-api
  labels:
    app: diabetes-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: diabetes-api
  template:
    metadata:
      labels:
        app: diabetes-api
    spec:
      containers:
        - name: diabetes-api
          image: aman202004/diabetes-api:latest
          ports:
            - containerPort: 8000
          imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: diabetes-api-service
spec:
  selector:
    app: diabetes-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
MLOps Best Practices:
- High Availability: replicas: 2 keeps the service available if one pod fails
- Label Management: Consistent labels connect Deployments, Pods, and Services
- Service Abstraction: LoadBalancer Service provides a stable endpoint for clients
- Image Policy: imagePullPolicy: Always makes every new pod pull the image again, so restarted pods pick up whatever the latest tag currently points to
- Port Mapping: Clear separation between service port (80) and container port (8000)
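The label point is what ties the manifest together: the Service sends traffic to any Pod whose labels include every key/value pair in its selector. Here is a simplified Python model of that equality-based matching. `selector_matches` and the pod names are hypothetical, and the sketch ignores the set-based selectors that some Kubernetes resources also support.

```python
def selector_matches(selector, pod_labels):
    """True when every selector pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "diabetes-api"}

# Hypothetical pods; extra labels on a pod do not prevent a match.
pods = {
    "diabetes-api-1": {"app": "diabetes-api", "pod-template-hash": "abc123"},
    "other-app-1": {"app": "other"},
}

targets = [name for name, labels in pods.items()
           if selector_matches(service_selector, labels)]
print(targets)
```

This is why a typo in either the Pod template labels or the Service selector leaves the Service with no endpoints even though every object applies cleanly.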
Deployment Commands:
# Deploy to Kubernetes
kubectl apply -f k8s-deploy.yml
# Check deployment status
kubectl get deployments
kubectl get pods
kubectl get services
# Test the API through a local port-forward (no external IP needed)
kubectl port-forward service/diabetes-api-service 8000:80
Testing the API
Sample Request:
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "Pregnancies": 2,
    "Glucose": 130,
    "BloodPressure": 70,
    "BMI": 28.5,
    "Age": 45
  }'
Response:
{
  "diabetic": true
}
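The same request can be made from Python. The sketch below uses only the standard library; `build_predict_request` and `predict` are hypothetical helper names, and the base URL is assumed to be the local address from the Docker run or the port-forward.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=5):
    """Send the request and decode the JSON body of the response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the API running, calling `predict("http://localhost:8000", {...})` with the five fields from the curl example should return a dictionary containing the `diabetic` key.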
What I Learned
Technical Skills:
- End-to-End MLOps Workflow: Understanding the complete pipeline from training to deployment
- API Development: Creating production-ready REST APIs with FastAPI
- Containerization: Packaging ML applications for consistent environments
- Kubernetes Fundamentals: Deploying and managing containerized applications
- Model Serialization: Properly saving and loading ML models for production
DevOps Practices:
- Reproducibility: Using virtual environments and requirements.txt
- Version Control: Tracking all code and configurations in Git
- Documentation: Writing clear README files and inline comments
- Infrastructure as Code: Defining infrastructure in YAML manifests
- Container Orchestration: Understanding pods, services, and deployments
Best Practices:
- Separation of Concerns: Keeping training, serving, and deployment separate
- Input Validation: Using Pydantic for robust API contracts
- Health Checks: Implementing basic health endpoints
- Scalability: Using Kubernetes for easy horizontal scaling
- Portability: Docker ensures the application runs anywhere
Key Takeaways
What Worked Well:
- FastAPI's automatic documentation saved significant development time
- Docker eliminated "works on my machine" problems
- Kubernetes simplified scaling from 1 to N replicas
- Using a public dataset made the project easily reproducible
- Simple project structure kept complexity manageable
Areas for Future Improvement:
- Add CI/CD pipeline with GitHub Actions
- Implement model versioning and A/B testing
- Add monitoring with Prometheus and Grafana
- Include automated testing (unit and integration tests)
- Optimize Docker image size using multi-stage builds
- Add model performance metrics endpoint
- Implement proper logging with structured logs
- Add authentication and rate limiting
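As a possible starting point for the structured-logging item, here is a minimal sketch using only the standard library. The `JsonFormatter` class, the logger name, and the `context` field are assumptions for illustration; a dedicated logging library may fit better in practice.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Carry over optional structured context passed via `extra=`.
        if hasattr(record, "context"):
            entry["context"] = record.context
        return json.dumps(entry)


logger = logging.getLogger("diabetes-api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("prediction served", extra={"context": {"diabetic": True}})
```

One JSON object per line is easy for log collectors to parse, which also helps with the monitoring item in the same list.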
Tools & Technologies Used
| Category | Technology |
|---|---|
| Language | Python 3.10 |
| ML Framework | scikit-learn |
| API Framework | FastAPI, Uvicorn |
| Containerization | Docker |
| Orchestration | Kubernetes |
| Data Handling | pandas, numpy |
| Model Persistence | joblib |
| Validation | Pydantic |
Conclusion
This project gave me hands-on experience with the complete MLOps lifecycle. Moving from a Jupyter notebook to a production-ready Kubernetes deployment taught me that MLOps is as much about software engineering practices as it is about machine learning.
For anyone starting their MLOps journey, I highly recommend building a similar end-to-end project. The practical experience of connecting all these tools together is invaluable.
GitHub Repository: https://github.com/amco-f22/first-mlops-project
Feel free to clone, experiment, and extend this project! I'm always open to feedback and suggestions.
Resources
- Pima Indians Diabetes Dataset
- FastAPI Documentation
- Docker Documentation
- Kubernetes Documentation
- Scikit-learn Documentation
What's your experience with MLOps? Have you built similar projects? Share your thoughts in the comments!




