Srinivasaraju Tangella

MLOps and AIOps for Beginners: Build, Deploy, Monitor, and Scale an ML Model on Kubernetes

Let's build a simple House Price Prediction Model and then see where MLOps and AIOps fit.

**Step 1: Business Problem**

Suppose a real estate company wants to predict house prices.
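Input (house size, bedrooms) and output (price):

| House Size (sqft) | Bedrooms | Price |
|---|---|---|
| 1000 | 2 | 50 Lakhs |
| 1500 | 3 | 75 Lakhs |
| 2000 | 4 | 1 Crore (100 Lakhs) |
| 2500 | 5 | 1.25 Crore (125 Lakhs) |
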
Goal:

House Details → ML Model → Predicted Price

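**Step 2: Build a Basic ML Model**

Using Python and Scikit-Learn:

```python
from sklearn.linear_model import LinearRegression

# Each row is one house: [size in sqft, number of bedrooms]
X = [
    [1000, 2],
    [1500, 3],
    [2000, 4],
    [2500, 5],
]

# Prices in lakhs (1 crore = 100 lakhs)
y = [50, 75, 100, 125]

# Fit a linear model to the training data
model = LinearRegression()
model.fit(X, y)

# Predict the price of an 1800 sqft, 3-bedroom house
prediction = model.predict([[1800, 3]])
print(prediction)
```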

What happened?

Training Data → Learning Algorithm → Trained Model

The model learned:

More Size = Higher Price
More Bedrooms = Higher Price
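
To make "what the model learned" concrete: in this tiny dataset the bedroom count is always size / 500, so size alone already determines the price. The sketch below fits a one-feature least-squares line by hand. It is a simplified stand-in for the kind of fit LinearRegression performs, not scikit-learn's implementation, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


sizes = [1000, 1500, 2000, 2500]   # sqft, from the training table
prices = [50, 75, 100, 125]        # lakhs, same values as y in the model code

slope, intercept = fit_line(sizes, prices)
print(f"price = {slope:.2f} lakhs per sqft * size + {intercept:.2f}")
print(f"1800 sqft -> {slope * 1800 + intercept:.1f} lakhs")
```

On this data every extra square foot adds about 0.05 lakhs to the predicted price, which is the "More Size = Higher Price" relationship in numeric form.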

**Step 3: Save the Model**

```python
import joblib

# Serialize the trained model to disk so it can be shipped and loaded later
joblib.dump(model, "house-price-model.pkl")
```

Now we have an artifact: house-price-model.pkl

Think of it like a Java build:

```
Java Source Code
      ↓
mvn package
      ↓
employee-service.jar
```

For ML:

```
Training Data
      ↓
Model Training
      ↓
house-price-model.pkl
```



**Step 4: Deploy Model as API**

Using FastAPI:



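```python
from fastapi import FastAPI
import joblib

app = FastAPI()

# Load the trained model once, when the app starts
model = joblib.load("house-price-model.pkl")

@app.get("/predict")
def predict(size: int, bedrooms: int):
    # size and bedrooms arrive as query parameters, e.g. /predict?size=1800&bedrooms=3
    result = model.predict([[size, bedrooms]])
    return {"price": float(result[0])}
```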


Now:



```
User
 ↓
REST API
 ↓
ML Model
 ↓
Prediction
```
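
As a rough illustration of the "User → REST API" hop, here is a minimal client sketch using only the standard library. It assumes the API is reachable at `http://localhost:8000` (the port used later in the article); `build_predict_url` and `fetch_prediction` are hypothetical helper names, not part of FastAPI.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_predict_url(base_url, size, bedrooms):
    """Build the GET URL for the /predict endpoint with its two query parameters."""
    query = urlencode({"size": size, "bedrooms": bedrooms})
    return f"{base_url}/predict?{query}"


def fetch_prediction(base_url, size, bedrooms):
    """Call the running API and return the predicted price (needs a live server)."""
    with urlopen(build_predict_url(base_url, size, bedrooms), timeout=5) as resp:
        return json.load(resp)["price"]


# Only builds and prints the URL, so it runs without a server
print(build_predict_url("http://localhost:8000", 1800, 3))
```

Calling `fetch_prediction("http://localhost:8000", 1800, 3)` would return the price only while the service is actually running.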



**Step 5: Containerize**

Dockerfile:



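```Dockerfile
FROM python:3.11

# Copy the API code, the saved model, and requirements.txt into the image
COPY . /app

WORKDIR /app

# requirements.txt must list fastapi, uvicorn, scikit-learn, and joblib
RUN pip install -r requirements.txt

# "app:app" means the FastAPI object named app inside app.py
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```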



Build:



```docker build -t house-price:v1 .```



Run:



```docker run -p 8000:8000 house-price:v1```



**Step 6: Deploy to Kubernetes**

Deployment and Service:



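```YAML
# Deployment: runs three replicas of the model API
apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-price
spec:
  replicas: 3
  # selector and Pod template (container image, port) omitted here
---
# Service: gives the Pods one stable address
apiVersion: v1
kind: Service
metadata:
  name: house-price
# spec (selector, ports) omitted here
```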



Now:



```
Client
   ↓
Service
   ↓
Pods
   ↓
ML Model
```
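
The Deployment and Service above are abbreviated; a real Deployment also needs a selector and a Pod template, and a Service needs a selector and ports. The sketch below fills those in as Python dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The `app: house-price` label is an assumption made for illustration; the image name and port come from the Docker step.

```python
import json

APP_LABELS = {"app": "house-price"}  # assumed label, not in the original manifests


def build_deployment(replicas=3, image="house-price:v1", port=8000):
    """Deployment whose selector matches the labels on its Pod template."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "house-price"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    "containers": [{
                        "name": "house-price",
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }]
                },
            },
        },
    }


def build_service(port=8000):
    """Service that routes traffic to every Pod carrying APP_LABELS."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "house-price"},
        "spec": {
            "selector": APP_LABELS,
            "ports": [{"port": port, "targetPort": port}],
        },
    }


manifest = {"apiVersion": "v1", "kind": "List",
            "items": [build_deployment(), build_service()]}
print(json.dumps(manifest, indent=2))
```

The Service finds its Pods only because its selector equals the Pod template labels, which is why both are built from the same constant. The image must also be available to the cluster, for example through a registry.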



At this point we enter the MLOps world.

**Where MLOps Starts**

Most beginners think:

```
Model Built
   ↓
Job Done
```

Reality:

```
Model Built
   ↓
Deploy
   ↓
Monitor
   ↓
Retrain
   ↓
Version
   ↓
Govern
```

**MLOps Layer 1 - Versioning**

DevOps:

```
employee-service-v1.jar
employee-service-v2.jar
```

ML:

```
house-model-v1.pkl
house-model-v2.pkl
house-model-v3.pkl
```

Need to track: dataset version, code version, and model version.

Tools: Git and MLflow.
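
To show what tracking the three versions together might look like, here is a minimal sketch that fingerprints the dataset and model bytes and pairs them with a Git commit. It does not use the MLflow API; `fingerprint`, `build_version_record`, and the commit string are hypothetical, and a real setup would read the actual files.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Short SHA-256 digest that changes whenever the bytes change."""
    return hashlib.sha256(data).hexdigest()[:12]


def build_version_record(dataset: bytes, model: bytes, code_commit: str) -> dict:
    """Tie one model artifact to the exact data and code that produced it."""
    return {
        "dataset_version": fingerprint(dataset),
        "model_version": fingerprint(model),
        "code_version": code_commit,
    }


dataset = b"1000,2,50\n1500,3,75\n2000,4,100\n2500,5,125\n"
model_bytes = b"placeholder for the bytes of house-price-model.pkl"
record = build_version_record(dataset, model_bytes, code_commit="abc1234")  # hypothetical commit
print(json.dumps(record, indent=2))
```

Because the fingerprints are derived from content, retraining on changed data yields a new dataset version without anyone having to remember to bump a number.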
**MLOps Layer 2 - CI/CD**

DevOps:

```
Git Push
 ↓
Jenkins
 ↓
Build
 ↓
Deploy
```

MLOps:

```
Git Push
 ↓
Training Pipeline
 ↓
Validation
 ↓
Model Registry
 ↓
Deployment
```

Pipeline:

```
Code
 ↓
Train
 ↓
Test
 ↓
Deploy Model
```
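
The Validation step is what separates an ML pipeline from a plain build. One possible gate, sketched below, promotes a newly trained model only if its error beats the model already in production. The function names and the error values are hypothetical; real pipelines often check several metrics.

```python
def should_promote(candidate_mae: float, production_mae: float,
                   min_improvement: float = 0.0) -> bool:
    """Promote only if the candidate's error is lower than production's by the margin."""
    return candidate_mae < production_mae - min_improvement


def run_pipeline(candidate_mae: float, production_mae: float) -> str:
    """Decide what the pipeline does after the Validation step."""
    if should_promote(candidate_mae, production_mae):
        return "register and deploy candidate"
    return "keep current production model"


# Hypothetical mean absolute errors, in lakhs
print(run_pipeline(candidate_mae=4.0, production_mae=6.5))
print(run_pipeline(candidate_mae=7.0, production_mae=6.5))
```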

**MLOps Layer 3 - Monitoring**

Traditional monitoring:

```
CPU
Memory
Disk
Network
```

Tools: Prometheus (prometheus.io) and Grafana (grafana.com).

But ML requires more. Monitor:

```
Prediction Count
Model Accuracy
Latency
Failed Predictions
```

Example:

```
Yesterday Accuracy = 95%

Today Accuracy = 72%
```

Alert!
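
The accuracy drop above can be turned into a simple rule. This is a minimal sketch, assuming accuracy is already being measured each day; `accuracy_alert` and the 5-point threshold are illustrative choices, not a Prometheus or Grafana feature.

```python
def accuracy_alert(previous: float, current: float, max_drop: float = 0.05):
    """Return an alert message if accuracy fell by more than max_drop, else None."""
    drop = previous - current
    if drop > max_drop:
        return f"ALERT: accuracy fell from {previous:.0%} to {current:.0%}"
    return None


print(accuracy_alert(0.95, 0.72))  # prints: ALERT: accuracy fell from 95% to 72%
```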
**MLOps Layer 4 - Retraining**

Suppose house prices change.

Old Model:

```2024 Data```

Current Market:

```2026 Data```

Predictions drift away from actual prices, because the model has only seen the old market.

Need:

```
New Data
 ↓
Retrain
 ↓
Deploy New Model
```

This is a core MLOps responsibility.
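
One way to decide when retraining is due is to compare the model's predictions with prices observed in the newer market. The sketch below uses mean absolute error with a fixed threshold; the prices and the threshold are made-up illustration values, and `needs_retraining` is a hypothetical helper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(actual, predicted, max_mae):
    """True when the model's error on recent data exceeds the tolerated level."""
    return mean_absolute_error(actual, predicted) > max_mae


# Hypothetical numbers, in lakhs: what the old model predicts vs. recent sale prices
predicted_by_old_model = [50, 75, 100]
recent_sale_prices = [60, 90, 120]

mae = mean_absolute_error(recent_sale_prices, predicted_by_old_model)
print(f"MAE on recent data: {mae:.1f} lakhs")
print("Retrain:", needs_retraining(recent_sale_prices, predicted_by_old_model, max_mae=5))
```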
**Where AIOps Starts**

Now imagine:

```
100 Kubernetes Clusters
500 Nodes
5000 Pods
```

Humans cannot analyze everything at that scale. AIOps applies AI to IT operations.

**Traditional Monitoring**

Prometheus says:

```CPU = 95%```

An engineer investigates manually.
**AIOps Monitoring**

AI analyzes:

```
CPU Spike
+
Memory Spike
+
Deployment Event
+
Application Error
```

AI concludes:

```
Root Cause:
Deployment version v2.1.3
```

and automatically opens a ticket.
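
The correlation idea can be sketched with a simple time-window rule: the deployment followed by the most symptoms within a few minutes becomes the root-cause candidate. This is a rule-based heuristic for illustration only, much simpler than what AIOps products do; the event format, timestamps, and 10-minute window are all assumptions.

```python
from datetime import datetime, timedelta


def find_root_cause(events, window=timedelta(minutes=10)):
    """Return the deployment followed by the most symptoms within the window, or None."""
    deployments = [e for e in events if e["type"] == "deployment"]
    symptoms = [e for e in events if e["type"] != "deployment"]
    best, best_count = None, 0
    for deploy in deployments:
        count = sum(
            1 for s in symptoms
            if timedelta(0) <= s["time"] - deploy["time"] <= window
        )
        if count > best_count:
            best, best_count = deploy, count
    return best


t0 = datetime(2026, 1, 1, 12, 0)  # hypothetical timestamps
events = [
    {"type": "deployment", "time": t0, "detail": "v2.1.3"},
    {"type": "cpu_spike", "time": t0 + timedelta(minutes=2), "detail": "CPU spike"},
    {"type": "memory_spike", "time": t0 + timedelta(minutes=3), "detail": "Memory spike"},
    {"type": "app_error", "time": t0 + timedelta(minutes=4), "detail": "Application error"},
]

cause = find_root_cause(events)
print(f"Root cause candidate: deployment {cause['detail']}")
```

A ticketing step would hang off the returned event; it is left out here because it depends entirely on the tracker in use.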
**AIOps for Our House Model**

Suppose:

```Prediction Latency Increased```

AIOps engine sees:

```
Node CPU 95%
Memory 90%
Model Requests Increased
```

AI Recommendation:

```
Scale Deployment
From 3 Pods
To 8 Pods
```

or

```
Rollback Model v3
Deploy Model v2
```
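
A scaling recommendation like "3 Pods to 8 Pods" can come from a proportional rule of the kind the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies it to the 95% CPU reading; the 40% target utilization is an assumption chosen for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_cpu: float, target_cpu: float) -> int:
    """Proportional scaling: grow the replica count by the ratio of actual to target load."""
    return math.ceil(current_replicas * current_cpu / target_cpu)


# 3 Pods at 95% CPU, aiming for 40% CPU per Pod (assumed target)
print(desired_replicas(3, 95, 40))  # prints: 8
```

With a different target the recommendation changes, so the target is a policy decision rather than something the formula provides.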

**Complete Architecture**

```
                 Data
                   │
                   ▼
           Train ML Model
                   │
                   ▼
            Save Model
                   │
                   ▼
           Docker Image
                   │
                   ▼
             Kubernetes
                   │
                   ▼
            User Requests
                   │
                   ▼
             Predictions
                   │
       ┌───────────┴───────────┐
       ▼                       ▼
    MLOps                 AIOps
(Model Lifecycle)    (Operations Intelligence)

Versioning           Root Cause Analysis
Training Pipelines   Anomaly Detection
Model Registry       Auto Remediation
Retraining           Capacity Forecasting
Monitoring           Predictive Alerts
```

**DevOps Engineer Perspective**

If you already know Linux, Git, Jenkins, Docker, Kubernetes, Prometheus, Grafana, and Terraform, then you're already **roughly 70–80% of the way to MLOps.**

You add: Python, basic ML, model serving, MLflow, and Kubeflow.

For AIOps, you add: log analytics, anomaly detection, AI agents, root cause analysis, and predictive operations.

This is why many experienced DevOps engineers are moving toward MLOps, AIOps, and agentic AI operations: these fields build directly on the operational foundation they already have.