This series is meant to serve as a guide to understanding and getting started with the concepts surrounding AI. I am writing this as a backend engineer trying to level up.
I am using Google (yep, old school) and AI tools to form a roadmap, ask questions, and understand the concepts. The goal is to document the steps along the way.
Questions that will hopefully get answered by the end:
- What are LLMs, agents, DAGs, RAG, vector DBs, etc.?
- How do these things really work? Not the math or the neural nets, but the flows built around LLMs.
- More importantly: how can I, as a backend engineer, get started and contribute to building AI tools?
If you're also exploring, or already working in the field, do share your feedback, corrections, or suggestions in the comments.
Current plan:
Model Serving --> Airflow --> Vector DBs --> RAG-style Q&A + LlamaIndex
Part 1: Model Serving with FastAPI & TorchVision
In this step, I learned how a model is served behind an API, and that's it. These are the same models from the earlier "ML" days (e.g., classification models), now exposed cleanly via an API.
Serving is about making the model available for real-time or batch predictions, efficiently, securely, and at scale.
What is a Model?
A model is the core of an ML system: a program trained on data to recognize patterns and make predictions.
What is Model Serving?
Model Serving is the process of putting that trained model behind an API (e.g., FastAPI), so it can take input and return predictions (inference).
Instead of bundling the model inside every client app, you host it once, centrally.
Real-World Examples:
- Image → API → Model returns "cat" or "dog"
- Chatbot message → API → LLM replies
- Transaction → Fraud model → "fraud" or "legit"
Key Terms I Came Across
- Inference: running the model on new (unseen) input data
- Model Hosting: putting the model on a server (local or cloud) and exposing an API
Read: Model Serving 101 (Paywalled...)
Key Takeaways from the Article:
Model serving introduces a distinct set of challenges compared to a typical CRUD backend. It's as if the heavy-lifting data pipelines we used to run in the background now need to respond to client requests in real time, with strict performance and scalability requirements.
Key factors to balance:
- Throughput: predictions/sec
- Latency: response time
- Cost: infra & compute
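To make throughput and latency concrete for myself, here is a minimal sketch that times a stand-in predict function. Note that fake_predict, the 2 ms sleep, and the benchmark helper are all assumptions made up for illustration; swap in a real model call to measure anything meaningful.

```python
import statistics
import time


def fake_predict(x):
    """Hypothetical stand-in for a model call; sleeps ~2 ms to mimic work."""
    time.sleep(0.002)
    return x * 2


def benchmark(fn, inputs):
    """Call fn on each input sequentially and report latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        fn(item)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start

    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "requests": len(latencies),
        "mean_latency_ms": statistics.mean(latencies) * 1000,
        "p95_latency_ms": ordered[p95_index] * 1000,
        "throughput_per_sec": len(latencies) / total,
    }


stats = benchmark(fake_predict, range(50))
print(stats)
```

Because this loop is sequential, throughput is roughly 1 / latency. Batching or parallel workers are the usual ways to raise throughput, typically at some cost in latency or infrastructure.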
3 Fundamental Deployment Types:
- Online Real-Time Inference
- Asynchronous Inference
- Offline Batch Transform
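The three deployment types above differ mainly in when the caller gets the answer. Here is a rough sketch of that difference using only the standard library. The fake_predict fraud rule, the in-memory queue, and the submit helper are hypothetical stand-ins, not how any particular serving platform implements these modes.

```python
import queue
import threading
import uuid


def fake_predict(amount):
    """Hypothetical stand-in for a fraud model."""
    return "fraud" if amount > 1000 else "legit"


# 1. Online real-time inference: the caller blocks and gets the answer right away.
online_result = fake_predict(2500)

# 2. Asynchronous inference: the caller gets a job id immediately,
#    and a background worker fills in the result later.
jobs = queue.Queue()
results = {}


def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: stop the worker
                return
            job_id, amount = job
            results[job_id] = fake_predict(amount)
        finally:
            jobs.task_done()


def submit(amount):
    job_id = str(uuid.uuid4())
    jobs.put((job_id, amount))
    return job_id


threading.Thread(target=worker, daemon=True).start()
job_id = submit(40)
jobs.join()  # a real client would poll for the result or receive a callback
async_result = results[job_id]
jobs.put(None)

# 3. Offline batch transform: score a whole dataset in one go, no caller waiting.
batch_results = [fake_predict(amount) for amount in [10, 5000, 300]]

print(online_result, async_result, batch_results)
```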
Tools Used
FastAPI
A high-performance Python web framework.
Official Tutorial: FastAPI Docs
TorchVision
A PyTorch library that provides:
- Pretrained computer vision models (like resnet18 and mobilenet), used for image classification
- Tools for image transformations and loading
Why it's great: You don't need to train from scratch. You can just load and serve a powerful image model in minutes.
Read: TorchVision Basics
Step-by-Step: Serving a Model via API
Step 1: Set Up Environment
mkdir model-serving && cd model-serving
python3 -m venv venv
venv\Scripts\activate # On Mac/Linux: source venv/bin/activate
# python-multipart is required by FastAPI to parse file uploads (UploadFile)
pip install fastapi uvicorn python-multipart torch torchvision pillow requests
Step 2: Create Model Loader
model.py
import requests
import torch
from PIL import Image
from torchvision import models, transforms

# Load ResNet-18 with pretrained ImageNet weights and switch to inference mode.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Preprocessing: resize, convert to a tensor, normalize with ImageNet statistics.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Download the 1000 ImageNet class names once, at import time.
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
response = requests.get(LABELS_URL, timeout=10)
response.raise_for_status()
labels = [line.strip() for line in response.text.splitlines()]


def predict(image_path):
    """Return the single most likely label for the image."""
    image = Image.open(image_path).convert("RGB")
    input_tensor = transform(image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        output = model(input_tensor)
    pred_index = output.argmax().item()
    return labels[pred_index]


def predict_topk(img_path):
    """Return the five most likely labels with their softmax probabilities."""
    image = Image.open(img_path).convert("RGB")
    input_tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        output = model(input_tensor)
    probs = torch.nn.functional.softmax(output[0], dim=0)
    top_p, top_i = torch.topk(probs, 5)
    top_labels = [(labels[idx], round(prob.item(), 4)) for idx, prob in zip(top_i, top_p)]
    return top_labels
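A side note on the transform pipeline in model.py: ToTensor rescales 8-bit pixel values from 0-255 to 0.0-1.0, and Normalize then applies (value - mean) / std per channel. Here is that arithmetic for a single RGB pixel in plain Python. It is a sketch for intuition, not what TorchVision runs internally, and normalize_pixel is a hypothetical helper name.

```python
# Per-channel ImageNet statistics, same values as in the transform in model.py.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Mimic ToTensor + Normalize for one (R, G, B) pixel with 0-255 values."""
    scaled = [value / 255 for value in rgb]  # ToTensor step: 0-255 -> 0.0-1.0
    return [(s - m) / sd for s, m, sd in zip(scaled, MEAN, STD)]  # Normalize step


gray = normalize_pixel((128, 128, 128))
black = normalize_pixel((0, 0, 0))
print(gray)
print(black)
```

The same mid-gray input ends up with a different value in each channel because each channel has its own mean and std. The pretrained weights expect inputs on this scale, which is why the serving code has to repeat the preprocessing.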
Interesting learning: models.ResNet18_Weights.DEFAULT loads weights pre-trained on the 1000 ImageNet categories. The labels come from a public file maintained by PyTorch (based on the ImageNet dataset). The model outputs one raw score (logit) per category; applying softmax turns these scores into a probability distribution, and the index with the highest score maps to the predicted label.
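Since the labels file is fetched over the network every time model.py is imported, one option is to cache it on disk. The sketch below is an assumption on my part rather than anything the article prescribes: load_labels and the cache path are hypothetical, and the demo uses a three-line sample file so that no download happens.

```python
import tempfile
import urllib.request
from pathlib import Path

LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"


def load_labels(cache_path, url=LABELS_URL, timeout=10):
    """Return the label list, downloading it only if the cache file is missing."""
    cache_path = Path(cache_path)
    if not cache_path.exists():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            cache_path.write_bytes(resp.read())
    text = cache_path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo with a pre-filled cache (three sample lines), so no network call is made.
demo_cache = Path(tempfile.mkdtemp()) / "labels_demo.txt"
demo_cache.write_text("tench\ngoldfish\ngreat white shark\n", encoding="utf-8")

demo_labels = load_labels(demo_cache)
predicted_index = 1  # pretend the model's highest score was at index 1
print(demo_labels[predicted_index])
```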
I have included both the predict and predict_topk functions to demonstrate how you can work with the model's output. While predict gives you just the top result, predict_topk provides the top 5 predictions along with confidence scores. This is useful when you want more insight into what the model "thinks" the image could be, especially in ambiguous cases.
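To see what happens between the raw scores and those confidence values, here is a small pure-Python sketch of softmax, argmax, and top-k on a toy three-class example. The labels and logits are made-up values, and softmax and top_k are illustrative helpers, not the torch functions used in model.py.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, round(p, 4)) for label, p in ranked[:k]]


toy_labels = ["cat", "dog", "fox"]   # made-up classes
toy_logits = [2.0, 1.0, 0.1]         # made-up raw model scores

probs = softmax(toy_logits)
best = toy_labels[probs.index(max(probs))]  # what predict returns: the argmax
top2 = top_k(probs, toy_labels, 2)          # what predict_topk returns, with k=2

print(best)
print(top2)
```

Softmax preserves the ordering of the scores, so taking the argmax of the raw logits (as predict does) and the argmax of the probabilities gives the same label.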
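Step 3: Create FastAPI App
app.py
import os
import shutil
import tempfile

from fastapi import FastAPI, File, UploadFile

from model import predict

app = FastAPI()


@app.post("/predict")
async def classify_image(file: UploadFile = File(...)):
    # Save the upload under a generated temp name instead of file.filename:
    # a client-supplied name may contain path separators, and "/tmp" does not exist on Windows.
    with tempfile.NamedTemporaryFile(delete=False) as buffer:
        shutil.copyfileobj(file.file, buffer)
        temp_path = buffer.name
    try:
        result = predict(temp_path)  # or predict_topk(temp_path)
    finally:
        os.remove(temp_path)  # clean up the temp file
    return {"prediction": result}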
Step 4: Run the API
uvicorn app:app --reload
Go to: http://127.0.0.1:8000/docs
Upload any image → get a prediction
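Besides the /docs page, the endpoint can be called from a script. Below is a minimal client sketch using only the standard library. It assumes the server from Step 4 is running on 127.0.0.1:8000 and that the upload field is named "file" (matching the parameter in app.py). build_multipart, classify, and cat.jpg are hypothetical names for illustration.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "http://127.0.0.1:8000/predict"  # assumes the local uvicorn server


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Build a multipart/form-data body holding a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def classify(image_path, url=API_URL):
    """POST an image file to the prediction endpoint and return the parsed JSON."""
    guessed = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "file", os.path.basename(image_path), f.read(), guessed
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())


# Example (needs the server running and a local image file):
# print(classify("cat.jpg"))
```

Since requests is already installed in Step 1, requests.post(url, files={"file": open(path, "rb")}) is a shorter way to send the same kind of upload; the hand-built version just shows what goes over the wire.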
That's it! You've served your first model. Now you can integrate this into real-world applications or scale it using cloud services.
GitHub Repo
Other Parts:
Coming Up Next
Next, I plan to explore Apache Airflow and how it's used for ML workflows and pipelines, going one layer deeper each time.
Due to technical issues on Windows, this topic had to be pushed to later. Take a look at RAG and VectorDB instead.