Deploying Our Instrumented ML App to Kubernetes
Welcome to Part 3! If you’ve followed along so far, by the end of Part 2 you had:
- A FastAPI-based machine learning app
- Instrumented with OpenTelemetry for full-stack observability
- Dockerized and ready to ship
Now, it's time to bring in the big orchestration guns — Kubernetes.
Understanding Kubernetes Deployment & Service
Before we throw YAML at a cluster, let’s understand what these two crucial building blocks do:
Deployment
A Deployment in Kubernetes manages a set of replicas (identical Pods running our app). It provides:
- Declarative updates: You describe what you want, K8s makes it so.
- Rolling updates: Smooth upgrades without downtime.
- Self-healing: If a Pod dies, K8s spins up a new one.
Think of it as a smart manager for your app's pods.
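If you want to see the self-healing behaviour for yourself once the app is deployed (the kubectl steps are further down this post), delete one of its pods and watch the Deployment, via its ReplicaSet, bring up a replacement. A rough sketch, assuming the mlapp namespace and labels used later:
# List the pods managed by the Deployment (names are generated, so yours will differ)
kubectl -n mlapp get pods -l app=house-price-service

# Delete any one of them
kubectl -n mlapp delete pod <one-of-the-pod-names>

# Watch a replacement pod get scheduled almost immediately
kubectl -n mlapp get pods -l app=house-price-service -w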
Service
A Service exposes your app inside the cluster (or externally, if needed). It:
- Provides a stable DNS name.
- Load balances traffic between pods.
- In our case, exposes:
  - Port 80 → App port 8000 (FastAPI HTTP)
  - Port 4317 → OTLP gRPC (Telemetry)
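As a quick illustration of that stable DNS name (assuming the manifests and the mlapp namespace defined later in this post are already applied), any workload in the cluster can reach the app at house-price-service.mlapp.svc.cluster.local. For example, from a throwaway curl pod (curlimages/curl is just one convenient public image):
# Spin up a temporary pod, call the Service by its cluster DNS name, then clean up
kubectl -n mlapp run curl-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s -X POST "http://house-price-service.mlapp.svc.cluster.local/predict/" \
  -H "Content-Type: application/json" \
  -d '{"features": [1200]}'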
Kubernetes Manifest Breakdown
Let’s break down the configuration:
Deployment: house-price-service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-price-service
We declare a Deployment that manages our app.
spec:
  replicas: 2
We want 2 replicas of our app running — high availability for the win.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
Kubernetes replaces pods gradually: maxSurge lets the rollout temporarily create up to 25% more pods than desired, while maxUnavailable caps how many of the desired pods may be down at any moment at 25%.
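With 2 replicas, those percentages work out to at most one extra pod (Kubernetes rounds maxSurge up) and zero unavailable pods (it rounds maxUnavailable down), so both replicas keep serving throughout an upgrade. Once the app is deployed (see the kubectl section below), you can trigger and watch a rollout like this; the v3 tag is purely hypothetical:
# Point the Deployment at a (hypothetical) new image tag to start a rolling update
kubectl -n mlapp set image deployment/house-price-service app=house-price-predictor:v3

# Follow the rollout until all replicas are updated and ready
kubectl -n mlapp rollout status deployment/house-price-service

# Roll back to the previous revision if something misbehaves
kubectl -n mlapp rollout undo deployment/house-price-service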
containers:
  - name: app
    image: house-price-predictor:v2
We use the Docker image built in Part 2, deployed as a container.
ports:
  - containerPort: 8000   # App port
  - containerPort: 4317   # OTLP telemetry port
Complete Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-price-service
  labels:
    app: house-price-service
spec:
  replicas: 2
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%         # Allow 25% more pods than desired during update
      maxUnavailable: 25%   # Allow 25% of desired pods to be unavailable during update
  selector:
    matchLabels:
      app: house-price-service
  template:
    metadata:
      labels:
        app: house-price-service
    spec:
      containers:
        - name: app
          image: house-price-predictor:v2
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: "10m"
              memory: "128Mi"
            limits:
              cpu: "20m"
              memory: "256Mi"
          ports:
            - containerPort: 8000   # Application Port
            - containerPort: 4317   # OTLP gRPC Port
Service: house-price-service
apiVersion: v1
kind: Service
metadata:
  name: house-price-service
  labels:
    app: house-price-service
This ClusterIP Service lets other K8s workloads communicate with our app.
ports:
  - port: 80
    targetPort: 8000
  - port: 4317
    targetPort: 4317
The Service maps:
- Port 80 → App HTTP server
- Port 4317 → OTLP spans, metrics, and logs
Complete Service Manifest File:
apiVersion: v1
kind: Service
metadata:
  name: house-price-service
  labels:
    app: house-price-service
spec:
  selector:
    app: house-price-service
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8000
    - name: otlp-grpc
      protocol: TCP
      port: 4317
      targetPort: 4317
  type: ClusterIP
Save both manifests in a single file, house-price-app.yaml, separated by a --- line.
Deploying with kubectl
Before deploying the app, let's create a Kubernetes namespace. This helps group related resources together.
kubectl create namespace mlapp
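Optionally, sanity-check the combined manifest with a client-side dry run before touching the cluster:
# Validates the YAML and shows what would be created, without applying anything
kubectl -n mlapp apply --dry-run=client -f house-price-app.yaml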
Run the following to deploy your app:
kubectl -n mlapp apply -f house-price-app.yaml
To check the deployment status:
kubectl -n mlapp get deployments
kubectl -n mlapp get pods
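For more detail, such as the configured rollout strategy, replica status, and recent events, describe the Deployment:
# Shows strategy, replica counts, conditions, and events for the Deployment
kubectl -n mlapp describe deployment house-price-service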
To see pod logs (structured JSON + OpenTelemetry info):
kubectl -n mlapp logs -f -l app=house-price-service
To view the exposed service:
kubectl -n mlapp get svc -l app=house-price-service
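To confirm the Service's selector actually matched the running pods, check its endpoints; with 2 replicas you should see two pod IPs listed:
# Lists the pod IP:port pairs the Service load balances across
kubectl -n mlapp get endpoints house-price-service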
Testing the App in Kubernetes
Get the Service's ClusterIP:
API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r '.items[].spec.clusterIP')
Test it with curl or Postman:
curl -X POST "http://${API_ENDPOINT_IP}:80/predict/" \
-H "Content-Type: application/json" \
-d '{"features": [1200]}'
You should get a prediction response like:
{"predicted_price": 170000.0}
And voilà — telemetry data is flowing.
What’s Next: Meet the OpenTelemetry Collector
In Part 4, we’ll introduce the OpenTelemetry Collector Agent:
- Deploy it as a DaemonSet alongside your app
- Configure it to collect traces, metrics, and logs
- Route the data to a gateway, and onward to backends like Prometheus, Jaeger, and Loki
TL;DR: It’s where the real observability magic begins.
{
"author" : "Kartik Dudeja",
"email" : "kartikdudeja21@gmail.com",
"linkedin" : "https://linkedin.com/in/kartik-dudeja",
"github" : "https://github.com/Kartikdudeja"
}