<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raju C</title>
    <description>The latest articles on DEV Community by Raju C (@raju_ch_0f28d).</description>
    <link>https://dev.to/raju_ch_0f28d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1540506%2F2ef6bd86-5137-4684-b7f9-6c275018ab46.jpg</url>
      <title>DEV Community: Raju C</title>
      <link>https://dev.to/raju_ch_0f28d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raju_ch_0f28d"/>
    <language>en</language>
    <item>
      <title>Week 5: RAG Systems and AI Agents - Where Distributed Systems Meet LLMs</title>
      <dc:creator>Raju C</dc:creator>
      <pubDate>Sun, 12 Apr 2026 22:25:02 +0000</pubDate>
      <link>https://dev.to/raju_ch_0f28d/week-5-rag-systems-and-ai-agents-where-distributed-systems-meet-llms-4mfa</link>
      <guid>https://dev.to/raju_ch_0f28d/week-5-rag-systems-and-ai-agents-where-distributed-systems-meet-llms-4mfa</guid>
      <description>&lt;p&gt;Week 5 done.&lt;/p&gt;

&lt;p&gt;This week: RAG systems and AI agents - making LLMs actually useful with real data.&lt;/p&gt;

&lt;p&gt;This week was about &lt;strong&gt;building systems around LLMs, not just calling APIs.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shift: From Models to Systems
&lt;/h2&gt;

&lt;p&gt;Week 4 was about training models. Week 5 was about building systems around them.&lt;/p&gt;

&lt;p&gt;Neural networks are the engine. RAG and agents are the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. ArXiv Research Assistant (RAG)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; I want to ask natural-language questions about the latest AI/ML research papers from ArXiv. Those papers postdate the LLM's training data, so answers from its memory alone would be stale or hallucinated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Build a RAG system that grounds answers in actual papers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ArXiv RSS Feed (cs.AI + cs.LG)
    ↓
Fetch 30 recent papers → chunk into 300-word segments
    ↓
Embed with sentence-transformers → ChromaDB
    ↓
Question → Semantic Search → Top 5 chunks → GPT-4o-mini → Answer + Citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Embed the question
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Search vector database
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chroma_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Build context from retrieved chunks
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Augment LLM prompt with context
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on these papers:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Generate grounded answer
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In distributed systems, we cache expensive operations: &lt;code&gt;Request → Cache → Database → Compute&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;RAG is the same: &lt;code&gt;Query → Vector DB → Document Store → LLM&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunk size: 300 words&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller than my earlier experiments (1000 tokens)&lt;/li&gt;
&lt;li&gt;Research papers need precise citations&lt;/li&gt;
&lt;li&gt;300 words = one key finding per chunk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Overlap strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No overlap initially → context cuts mid-sentence&lt;/li&gt;
&lt;li&gt;Added semantic overlap → better coherence&lt;/li&gt;
&lt;/ul&gt;
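&lt;p&gt;The chunking strategy above can be sketched as a plain word-based splitter. This is a minimal illustration, not the exact code I shipped; the function name is mine, and the defaults mirror the numbers above (the 50-word overlap is an assumed value):&lt;/p&gt;

```python
def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into roughly size-word chunks; the last `overlap`
    words of each chunk are repeated at the start of the next one."""
    words = text.split()
    chunks = []
    step = size - overlap  # advance less than a full chunk to create overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final chunk already covers the tail
    return chunks
```

&lt;p&gt;Because each chunk reopens with the tail of the previous one, a sentence cut at a boundary still appears whole in at least one chunk.&lt;/p&gt;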

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indexed 30 recent papers in ~1 minute&lt;/li&gt;
&lt;li&gt;Query latency: ~500ms end-to-end&lt;/li&gt;
&lt;li&gt;Answers cite actual paper sections, not hallucinations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Multi-Phase Task Assistant (AI Agents)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; LLMs can't take actions, maintain state across tasks, or orchestrate multi-step workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Build an agent system with tool calling and phase management.&lt;/p&gt;

&lt;p&gt;Built a task assistant that manages multi-phase workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User conversation → LLM decides which tools to call → Execute tools → Update state → Next phase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;get_weather_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# External API
&lt;/span&gt;            &lt;span class="n"&gt;decrypt_message_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Utility function
&lt;/span&gt;            &lt;span class="n"&gt;generate_cipher_tool&lt;/span&gt;   &lt;span class="c1"&gt;# Utility function
&lt;/span&gt;        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_state&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Phase tracking
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Build system prompt with current state
&lt;/span&gt;        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_phase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# LLM decides which tools to call
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Execute tool calls
&lt;/span&gt;        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if phase should advance
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;should_advance_phase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tool calling in action:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "I'm heading to Berlin for a task. What should I prepare?"

Agent thinking:
1. Detects location mention → calls get_weather("Berlin")
2. Weather API returns: 4°C, overcast, wind 15km/h
3. LLM uses real data: "Berlin is 4°C and overcast. Pack warm layers..."

User: "I received this encrypted note: KHOOR DJHQW with shift 3"

Agent thinking:
1. Detects cipher pattern → calls decrypt_caesar("KHOOR DJHQW", 3)
2. Decrypt tool returns: "HELLO AGENT"
3. LLM responds: "Decoded message: HELLO AGENT. Proceed to checkpoint."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
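&lt;p&gt;For reference, the cipher utility that transcript relies on is only a few lines of Python. A minimal sketch (the real tool additionally exposes a schema so the LLM can call it):&lt;/p&gt;

```python
def decrypt_caesar(text: str, shift: int) -> str:
    """Shift each letter back by `shift` positions, preserving case;
    non-letters (spaces, digits) pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)
```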



&lt;p&gt;&lt;strong&gt;Phase management:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent tracks multi-step workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 4-phase workflow example
&lt;/span&gt;&lt;span class="n"&gt;phases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;travel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preparation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Each phase has validation logic
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_phase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;phase&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;travel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;check_destination_confirmed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;phase&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preparation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;check_equipment_selected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... etc
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;State persistence:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"preparation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"destination"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Berlin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"weather"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4°C, overcast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"equipment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"winter coat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"encrypted device"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"conversation_history"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In microservices: services maintain state in distributed databases.&lt;br&gt;
In AI agents: agents maintain state in JSON/database, rebuild context each turn.&lt;/p&gt;
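&lt;p&gt;A sketch of that pattern, assuming state lives in a local JSON file; the file path and default fields here are illustrative, not the exact ones I used:&lt;/p&gt;

```python
import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")  # illustrative location

def load_state() -> dict:
    """Load persisted session state, or start a fresh session."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"current_phase": "travel", "context": {}, "conversation_history": []}

def save_state(state: dict) -> None:
    """Persist state after every turn so a server restart loses nothing."""
    STATE_FILE.write_text(json.dumps(state, indent=2))
```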

&lt;p&gt;&lt;strong&gt;Real-time architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Flask app + Socket.IO
    ↓
User sends message via websocket
    ↓
Agent processes with tool calling loop
    ↓
Tools execute (API calls, computations)
    ↓
State updates (phase advancement, context)
    ↓
Response streamed back via websocket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What surprised me:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM is remarkably good at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deciding when to call tools (no explicit instructions needed)&lt;/li&gt;
&lt;li&gt;Extracting parameters from natural language&lt;/li&gt;
&lt;li&gt;Chaining multiple tool calls to solve complex requests&lt;/li&gt;
&lt;li&gt;Understanding phase context and constraints&lt;/li&gt;
&lt;/ul&gt;
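&lt;p&gt;Worth spelling out why that works: per tool, the model only ever sees a name, a description, and a parameter schema. Here is roughly what the weather tool's declaration looks like in the OpenAI function-calling format (the exact fields are my assumption):&lt;/p&gt;

```python
# Tool declaration passed in the `tools` list of a chat completion request.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Berlin"}
            },
            "required": ["city"],
        },
    },
}
```

&lt;p&gt;Good descriptions are what make tool selection work without explicit instructions: the model matches the user's intent against these strings.&lt;/p&gt;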

&lt;h3&gt;
  
  
  3. MCP Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Every integration needs custom code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; MCP (Model Context Protocol), an open standard for connecting AI applications to tools and data sources.&lt;/p&gt;

&lt;p&gt;Built an MCP server exposing REST API endpoints as tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp_server.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_papers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;vector_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP is like a service mesh for AI - standard protocol, any tool plugs in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Frustrated Me Most
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;State management across tool calls.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an agent makes multiple tool calls in one turn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Weather API call → wait for response&lt;/li&gt;
&lt;li&gt;Use weather data to decide next tool&lt;/li&gt;
&lt;li&gt;Call cipher tool with extracted params&lt;/li&gt;
&lt;li&gt;Build final response with both results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Managing this flow, tracking conversation history, and rebuilding context each turn is complex.&lt;/p&gt;

&lt;p&gt;Solution: Treat each turn as a transaction - load state, process, update, persist.&lt;/p&gt;
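&lt;p&gt;One way to make a turn transactional, sketched with an injected &lt;code&gt;process&lt;/code&gt; callback standing in for the agent loop; every name here is illustrative:&lt;/p&gt;

```python
import copy

def handle_turn(state: dict, user_input: str, process) -> dict:
    """One turn as a transaction: work on a copy, commit only on success.

    `process` stands in for the agent loop (LLM call plus tool execution);
    it receives the working state and returns the assistant's reply.
    """
    working = copy.deepcopy(state)  # "load": never mutate committed state
    working["conversation_history"].append({"role": "user", "content": user_input})
    reply = process(working)        # "process": may call tools, advance phase
    working["conversation_history"].append({"role": "assistant", "content": reply})
    return working                  # "commit": caller persists this atomically
```

&lt;p&gt;If &lt;code&gt;process&lt;/code&gt; raises halfway through its tool calls, the committed state is untouched and the turn can simply be retried.&lt;/p&gt;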

&lt;h2&gt;
  
  
  The Debugging Moment
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Agent called tools repeatedly in a loop, making 10+ API calls for one user message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I tried:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Added rate limiting → Still looped&lt;/li&gt;
&lt;li&gt;Changed prompt to discourage multiple calls → Still looped&lt;/li&gt;
&lt;li&gt;Inspected tool call sequence → Found the issue ✓&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt;&lt;br&gt;
The conversation history included tool results, but the LLM kept "forgetting" it had already called the tool. The messages weren't formatted correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong - LLM doesn't recognize tool result
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Right - Proper tool result format
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After fixing the message format, the agent called each tool exactly once.&lt;/p&gt;

&lt;p&gt;Like debugging distributed systems: inspect the protocol, not just the logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I Made
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Didn't handle tool execution failures&lt;/strong&gt;&lt;br&gt;
What if the weather API is down? What if the cipher has invalid input?&lt;/p&gt;

&lt;p&gt;Should have: Wrapped tool calls in try/except and returned graceful error messages to the LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No token tracking&lt;/strong&gt;&lt;br&gt;
Multi-turn conversations + tool results = token count explosion.&lt;br&gt;
Hit 16K context limit after 15 turns.&lt;/p&gt;

&lt;p&gt;Fix: Implement conversation pruning - keep recent messages, summarize old ones.&lt;/p&gt;
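&lt;p&gt;A minimal pruning pass might keep the system prompt plus the newest messages and collapse everything in between into a single summary message. The summarizer below is a stub; in practice you would ask the LLM to write the summary:&lt;/p&gt;

```python
def prune_history(messages: list[dict], keep_recent: int = 10) -> list[dict]:
    """Keep system messages and the newest `keep_recent` messages;
    collapse the middle of the conversation into one summary message."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if keep_recent >= len(rest):
        return system + rest  # nothing to prune yet
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of {len(old)} earlier messages: (stub)"}
    return system + [summary] + recent
```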

&lt;p&gt;&lt;strong&gt;3. Built tools without testing independently&lt;/strong&gt;&lt;br&gt;
Integrated everything at once, couldn't tell if bugs were in tools or orchestration.&lt;/p&gt;

&lt;p&gt;Fix: Unit test each tool, then integration test the agent loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Forgot to persist state between server restarts&lt;/strong&gt;&lt;br&gt;
Agent lost all context when Flask restarted during development.&lt;/p&gt;

&lt;p&gt;Fix: Save state to JSON after each turn, load on startup.&lt;/p&gt;
&lt;h2&gt;
  
  
  Connection to Distributed Systems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agents = Microservice orchestration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool calling     = Service-to-service RPC
State management = Distributed transactions
Retry logic      = Fault tolerance
Phase tracking   = Workflow orchestration (Temporal, Airflow)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RAG = Multi-tier caching:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Vector DB    = L1 cache (~10ms)
Doc Store    = L2 cache (~50ms)  
LLM          = Compute (~500ms)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conversation history = Event sourcing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every message = Event&lt;/li&gt;
&lt;li&gt;Rebuild state by replaying events&lt;/li&gt;
&lt;li&gt;Can replay conversation from any point&lt;/li&gt;
&lt;/ul&gt;
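&lt;p&gt;The replay idea in miniature: rebuild derived state purely by folding over the message log. The fields tracked here are invented for illustration:&lt;/p&gt;

```python
def rebuild_state(events: list[dict]) -> dict:
    """Fold over the conversation log to reconstruct derived state."""
    state = {"turns": 0, "tools_used": []}
    for event in events:
        if event["role"] == "user":
            state["turns"] += 1
        elif event["role"] == "tool":
            state["tools_used"].append(event["name"])
    return state
```

&lt;p&gt;Replaying a prefix of the log gives you the state at any earlier point in the conversation.&lt;/p&gt;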

&lt;p&gt;&lt;strong&gt;MCP = Service mesh:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard protocol&lt;/li&gt;
&lt;li&gt;Service discovery&lt;/li&gt;
&lt;li&gt;Centralized monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Time Spent This Week
&lt;/h2&gt;

&lt;p&gt;~12 hours&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Taking Forward
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agents are about orchestration, not intelligence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM provides reasoning. System design matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools to expose&lt;/li&gt;
&lt;li&gt;How to handle tool failures&lt;/li&gt;
&lt;li&gt;When to advance workflow phases&lt;/li&gt;
&lt;li&gt;How to maintain conversation context&lt;/li&gt;
&lt;li&gt;When to prune history vs persist state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same principles as designing microservices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calling is RPC with natural language.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Microservices: service_a.call(service_b.get_data(params))
AI Agents:     llm.call(weather_tool.get_data(city="Berlin"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM is the orchestrator, deciding which services to call and when.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State management is the hard part.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not the LLM, not the tools - managing state across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple turns&lt;/li&gt;
&lt;li&gt;Multiple tool calls per turn&lt;/li&gt;
&lt;li&gt;Phase transitions&lt;/li&gt;
&lt;li&gt;Server restarts&lt;/li&gt;
&lt;li&gt;Concurrent sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is distributed systems 101.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG is a data pipeline, not magic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pipeline stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest&lt;/li&gt;
&lt;li&gt;Chunk&lt;/li&gt;
&lt;li&gt;Embed&lt;/li&gt;
&lt;li&gt;Index&lt;/li&gt;
&lt;li&gt;Retrieve&lt;/li&gt;
&lt;li&gt;Generate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each stage needs tuning. Same as any distributed system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Still Hard
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Token management (conversation + tool results + context)&lt;/li&gt;
&lt;li&gt;Multi-hop reasoning (questions needing multiple retrievals)&lt;/li&gt;
&lt;li&gt;Tool selection (when to call vs when to answer directly)&lt;/li&gt;
&lt;li&gt;Error propagation (how to surface tool failures to LLM)&lt;/li&gt;
&lt;li&gt;Cost optimization (each tool call + LLM call costs money)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Approach That Works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Single tool, single turn&lt;/li&gt;
&lt;li&gt;Add multi-turn conversation&lt;/li&gt;
&lt;li&gt;Add multiple tools&lt;/li&gt;
&lt;li&gt;Add state management&lt;/li&gt;
&lt;li&gt;Add phase orchestration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step works before adding complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test components independently:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool execution (unit tests)&lt;/li&gt;
&lt;li&gt;LLM tool calling (integration tests)&lt;/li&gt;
&lt;li&gt;State persistence (end-to-end tests)&lt;/li&gt;
&lt;li&gt;Then full agent loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same debugging process I use for microservices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log everything:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-12T10:32:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Weather in Berlin?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools_called"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Berlin"&lt;/span&gt;&lt;span class="p"&gt;}}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"4°C, overcast"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_response"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Berlin is 4°C..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"preparation"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When debugging, replay the conversation. Just like distributed tracing.&lt;/p&gt;
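&lt;p&gt;That replay idea fits in a few lines. A minimal sketch, assuming log entries shaped like the JSON above (the field names come from that example entry, not from any particular framework):&lt;/p&gt;

```python
import json

# One log entry per agent turn, shaped like the example log above.
log_lines = [
    '{"user_input": "Weather in Berlin?", '
    '"tools_called": [{"tool": "get_weather", "args": {"city": "Berlin"}}], '
    '"tool_results": ["4C, overcast"], '
    '"agent_response": "Berlin is 4C..."}'
]

def replay(lines):
    """Flatten logged turns into (role, content) steps, oldest first."""
    steps = []
    for raw in lines:
        entry = json.loads(raw)
        steps.append(("user", entry["user_input"]))
        for call, result in zip(entry["tools_called"], entry["tool_results"]):
            steps.append(("tool", f'{call["tool"]}({call["args"]}) -> {result}'))
        steps.append(("agent", entry["agent_response"]))
    return steps

for role, content in replay(log_lines):
    print(f"{role:>5}: {content}")
```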

&lt;p&gt;Week 5 down. Built RAG systems and agents that work with real data and real workflows.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building AI systems? What distributed systems patterns have you applied?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>agenticsystems</category>
    </item>
    <item>
      <title>Week 4: From Theory to Training - My First Neural Networks</title>
      <dc:creator>Raju C</dc:creator>
      <pubDate>Sun, 29 Mar 2026 03:47:31 +0000</pubDate>
      <link>https://dev.to/raju_ch_0f28d/week-4-from-theory-to-training-my-first-neural-networks-1mk3</link>
      <guid>https://dev.to/raju_ch_0f28d/week-4-from-theory-to-training-my-first-neural-networks-1mk3</guid>
      <description>&lt;p&gt;Week 4 done.&lt;/p&gt;

&lt;p&gt;Last week: Shallow algorithms (Linear Regression, Logistic Regression).&lt;br&gt;
This week: Neural networks - actually building and training them.&lt;/p&gt;

&lt;p&gt;Still not LLMs. Still not ChatGPT integrations. Still "boring" ML.&lt;/p&gt;

&lt;p&gt;But here's why: &lt;strong&gt;I want to understand what's actually happening, not just call APIs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The difference? Last week I learned &lt;em&gt;what&lt;/em&gt; models predict.&lt;br&gt;
This week I learned &lt;em&gt;how&lt;/em&gt; they learn.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Shift: From Equations to Architectures
&lt;/h2&gt;

&lt;p&gt;This week was about understanding when complexity is worth it.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Actually Built
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Handwritten Digit Recognition (MNIST)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Recognize handwritten digits (0-9) from 28×28 pixel images.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DigitClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;784&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Result: 97% accuracy
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In distributed systems, we build pipelines: &lt;code&gt;Data → Transform → Store → Serve&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Neural networks are the same: &lt;code&gt;Input → Hidden Layers → Output&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The transformation happens through learning, not hardcoding rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Experimenting with Layer Architectures
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dropout Layers&lt;/strong&gt; - Forces redundant representations. Like fault-tolerant systems - if one node fails, others handle the load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Convolutional Layers&lt;/strong&gt; - Respects spatial structure. Same filter slides across the image (parameter sharing). Like using the same load balancing algorithm across all services.&lt;/p&gt;

&lt;p&gt;Dense layers: 97% accuracy → Conv layers: 99.2% accuracy&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BatchNorm&lt;/strong&gt; - Stabilizes training by normalizing the inputs to each layer. Like circuit breakers in microservices - it stops instability in one component from cascading through the rest.&lt;/p&gt;

&lt;p&gt;Without BatchNorm: Stuck at 85% → With BatchNorm: 97%&lt;/p&gt;
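&lt;p&gt;A minimal PyTorch sketch of what a Conv + BatchNorm stack looks like for MNIST; the layer sizes here are illustrative, not the exact architecture I trained:&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Conv + BatchNorm block: the same 3x3 filters slide across the whole
# image (parameter sharing), and BatchNorm normalizes each layer's
# inputs so training stays stable.
conv_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1 input channel (grayscale)
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # 10 digit classes
)

x = torch.randn(8, 1, 28, 28)   # batch of 8 fake MNIST images
print(conv_net(x).shape)        # torch.Size([8, 10])
```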

&lt;h3&gt;
  
  
  3. Image Denoising with Autoencoders
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The architecture:&lt;/strong&gt; &lt;code&gt;Noisy Image → [Encoder] → Bottleneck → [Decoder] → Clean Image&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The bottleneck (32 dimensions) is key:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too large (128): Memorizes noise&lt;/li&gt;
&lt;li&gt;Too small (8): Loses detail&lt;/li&gt;
&lt;li&gt;Just right (32): Learns what matters ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is lossy compression with learned parameters. Like designing a caching layer - except the model learns the patterns.&lt;/p&gt;
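&lt;p&gt;A minimal sketch of that architecture in PyTorch, with the 32-dimension bottleneck (the other layer sizes are illustrative):&lt;/p&gt;

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Noisy image -> 32-dim bottleneck -> reconstructed image."""
    def __init__(self, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, bottleneck),           # compression happens here
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),    # pixels back in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

model = DenoisingAutoencoder()
noisy = torch.rand(4, 1, 28, 28)
print(model(noisy).shape)   # torch.Size([4, 1, 28, 28])
```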

&lt;p&gt;Extra: in-painting (reconstructing obscured regions). Simple digits: 90% success; complex digits: 60%.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Frustrated Me Most
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hyperparameter hell.&lt;/strong&gt; Every decision affects everything - learning rate, layers, dropout, bottleneck size. I've spent years tuning JVM heaps and thread pools. This feels similar but with 10x more knobs.&lt;/p&gt;

&lt;p&gt;Solution: Start with known-good defaults. Change one thing at a time. Keep notes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Debugging Moment
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Autoencoder producing blurry reconstructions despite loss decreasing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Changed optimizer from SGD to Adam.&lt;/p&gt;

&lt;p&gt;Adam adapts learning rates per parameter. SGD uses same rate for everything. Like auto-scaling different services based on their individual load patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I Made
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Tested on training data&lt;/strong&gt; (again!)&lt;br&gt;
I should know this by now: always test on unseen data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Forgot &lt;code&gt;model.eval()&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Dropout was randomly disabling neurons during testing!&lt;br&gt;
Training mode: 82% → Eval mode: 97%&lt;/p&gt;
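&lt;p&gt;Worth verifying the fix: in eval mode, Dropout becomes a no-op, so repeated forward passes on the same input are identical (toy layer, purely for illustration):&lt;/p&gt;

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5))
x = torch.randn(1, 10)

net.train()               # training mode: dropout zeroes neurons at random
noisy_out = net(x)        # varies from call to call

net.eval()                # eval mode: dropout is a no-op
c = net(x)
d = net(x)
print(torch.equal(c, d))  # True - deterministic once eval() is set
```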

&lt;p&gt;&lt;strong&gt;3. Picked architecture randomly&lt;/strong&gt;&lt;br&gt;
5 layers? 256 neurons? Dropout 0.8? Model barely learned.&lt;br&gt;
Fix: Started with proven architectures (LeNet), modified incrementally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Didn't normalize input data&lt;/strong&gt;&lt;br&gt;
Raw pixels (0-255): unstable, loss exploding&lt;br&gt;
Normalized (0-1): stable, converging ✓&lt;/p&gt;
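&lt;p&gt;The fix is a one-liner. A minimal NumPy sketch (random pixels standing in for real images):&lt;/p&gt;

```python
import numpy as np

# Raw MNIST-style pixels come in as integers 0-255.
raw = np.random.randint(0, 256, size=(4, 28, 28)).astype(np.float32)

# Scale to [0, 1] before training - gradients stay in a sane range.
normalized = raw / 255.0

print(normalized.min() >= 0.0 and normalized.max() <= 1.0)  # True
```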

&lt;h2&gt;
  
  
  The Pattern Recognition
&lt;/h2&gt;

&lt;p&gt;What I've learned leading teams applies here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start simple, add complexity only when needed&lt;/strong&gt;&lt;br&gt;
Basic: 92% → +dropout: 95% → +BatchNorm: 97% → +Conv: 99%&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understand trade-offs&lt;/strong&gt;&lt;br&gt;
More layers = more capacity = slower training = higher overfitting risk&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment systematically&lt;/strong&gt;&lt;br&gt;
Bottleneck sizes: 8 (blurry) → 16 (better) → 32 (clean ✓) → 64 (memorizing) → 128 (overfitting)&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection to Distributed Systems
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Encoder-Decoder = Data Pipeline&lt;/strong&gt;&lt;br&gt;
The bottleneck is like network bandwidth - compress to fit through it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter Sharing = Code Reuse&lt;/strong&gt;&lt;br&gt;
One "edge detector" works everywhere. Efficient and effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overfitting = Over-optimization&lt;/strong&gt;&lt;br&gt;
I've seen systems optimized for one traffic pattern that broke when patterns changed.&lt;br&gt;
Solution: regularization / graceful degradation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time Spent This Week
&lt;/h2&gt;

&lt;p&gt;About 8-10 hours this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Taking Forward
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Neural networks aren't a stepping stone to "real" AI. They ARE real AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most production ML uses these techniques. LLMs get the hype. But understanding backpropagation, gradients, and optimization matters.&lt;/p&gt;

&lt;p&gt;I could be building LLM wrappers right now. But I wouldn't understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How training actually works&lt;/li&gt;
&lt;li&gt;Why models fail in specific ways&lt;/li&gt;
&lt;li&gt;When to use what architecture&lt;/li&gt;
&lt;li&gt;How to debug learning problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Starting with fundamentals means I can build real intuition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The approach that works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with fundamentals. Build something small (like MNIST). Don't start with "I'm going to build ChatGPT." Master the basics. Understand debugging. Build intuition. Then scale up.&lt;/p&gt;

&lt;p&gt;Same advice I give for learning any new tech stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Still Hard
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Choosing architecture (Conv vs Dense? How many layers?)&lt;/li&gt;
&lt;li&gt;Hyperparameter tuning (still trial and error)&lt;/li&gt;
&lt;li&gt;Knowing when to stop (97% vs 99%?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These feel like architectural decisions I make daily - but with less intuition.&lt;/p&gt;

&lt;p&gt;Week 4 down. Built neural networks that actually learn.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Learning deep learning as a senior engineer? What surprised you most about the transition?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>learning</category>
    </item>
    <item>
      <title>Week 3: Why I'm Learning 'Boring' ML Before Building with LLMs</title>
      <dc:creator>Raju C</dc:creator>
      <pubDate>Sun, 22 Mar 2026 14:33:15 +0000</pubDate>
      <link>https://dev.to/raju_ch_0f28d/week-3-why-im-learning-boring-ml-before-building-with-llms-pml</link>
      <guid>https://dev.to/raju_ch_0f28d/week-3-why-im-learning-boring-ml-before-building-with-llms-pml</guid>
      <description>&lt;p&gt;Week 3 done.&lt;/p&gt;

&lt;p&gt;This week I learned shallow algorithms - Linear Regression, Logistic Regression, DBSCAN, PCA.&lt;/p&gt;

&lt;p&gt;Not LLMs. Not ChatGPT integrations. Not the AI applications everyone's building.&lt;/p&gt;

&lt;p&gt;Basic machine learning algorithms from decades ago.&lt;/p&gt;

&lt;p&gt;And I kept asking myself: why am I doing this?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question I Keep Getting
&lt;/h2&gt;

&lt;p&gt;"You're learning AI, right? When are you building something with GPT or Claude?"&lt;/p&gt;

&lt;p&gt;Fair question. I could skip straight to LLM applications. Plenty of people do.&lt;/p&gt;

&lt;p&gt;But here's what I realized this week: &lt;strong&gt;I want to understand what's actually happening, not just call APIs.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Shallow Algorithms First
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. They're what's actually running in production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most companies aren't running massive neural networks. They're running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logistic regression for fraud detection&lt;/li&gt;
&lt;li&gt;Linear regression for demand forecasting
&lt;/li&gt;
&lt;li&gt;Clustering for customer segmentation&lt;/li&gt;
&lt;li&gt;PCA for feature reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "boring" algorithms power real systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. They teach you how ML actually works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I call an LLM API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm using ML. I'm not understanding ML.&lt;/p&gt;

&lt;p&gt;When I implement Linear Regression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LinearRegression&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I see: training data → learning patterns → making predictions.&lt;/p&gt;

&lt;p&gt;It's the same process neural networks use, just simpler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. I can actually debug them&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If my Linear Regression model performs poorly, I can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check the features&lt;/li&gt;
&lt;li&gt;Look at the coefficients&lt;/li&gt;
&lt;li&gt;Understand what's being weighted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If my LLM call gives weird results? I have no idea what's happening inside.&lt;/p&gt;

&lt;p&gt;Starting simple means I can build intuition before jumping to black boxes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt; - Predicting house prices from features (square feet, bedrooms, age)&lt;/p&gt;

&lt;p&gt;The model learns: &lt;code&gt;price = w1×sqft + w2×bedrooms + w3×age + bias&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It finds the best weights (w1, w2, w3) by minimizing prediction error.&lt;/p&gt;

&lt;p&gt;This clicked because I've spent years optimizing systems. Same concept - iteratively adjust parameters to minimize error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logistic Regression&lt;/strong&gt; - Classifying patients as healthy vs disease&lt;/p&gt;

&lt;p&gt;Despite the name, it's for classification, not regression. This confused me for days.&lt;/p&gt;

&lt;p&gt;It outputs probabilities (0 to 1). Above 0.5 → disease, below → healthy.&lt;/p&gt;
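&lt;p&gt;The probability-then-threshold behavior is easy to see in scikit-learn. A sketch with toy data (one made-up feature, not a real medical dataset):&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, label 1 ("disease") when the value is high.
X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

probs = clf.predict_proba([[9.5]])   # [P(healthy), P(disease)]
print(probs[0, 1] > 0.5)             # True
print(clf.predict([[9.5]]))          # [1] - above 0.5, so "disease"
```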

&lt;p&gt;&lt;strong&gt;DBSCAN&lt;/strong&gt; - Grouping similar pixels in images&lt;/p&gt;

&lt;p&gt;Clusters dense regions automatically. No need to specify number of clusters upfront.&lt;/p&gt;

&lt;p&gt;Reminded me of finding hot spots in distributed systems - same density-based grouping concept.&lt;/p&gt;
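&lt;p&gt;A minimal scikit-learn sketch; &lt;code&gt;eps&lt;/code&gt; ("how close counts as similar") and &lt;code&gt;min_samples&lt;/code&gt; ("how many points make a cluster") are the two knobs to turn:&lt;/p&gt;

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one isolated point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],     # blob A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],     # blob B
    [20.0, 20.0],                           # outlier
])

# No cluster count up front - density decides.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)
print(labels)   # two clusters, and the outlier labeled -1 (noise)
```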

&lt;p&gt;&lt;strong&gt;PCA&lt;/strong&gt; - Reducing 100 features down to 10&lt;/p&gt;

&lt;p&gt;Keeps the most important information, throws away the noise.&lt;/p&gt;

&lt;p&gt;Like compressing data in a pipeline - lose some detail but keep what matters.&lt;/p&gt;
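&lt;p&gt;A minimal scikit-learn sketch of that 100-to-10 reduction (random data standing in for real features):&lt;/p&gt;

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))   # 200 samples, 100 features

pca = PCA(n_components=10)        # keep the 10 strongest directions
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)            # (200, 10)
# Fraction of the original variance the 10 components retain:
print(pca.explained_variance_ratio_.sum())
```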

&lt;h2&gt;
  
  
  The Part That Frustrated Me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hyperparameter tuning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every algorithm has knobs to turn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DBSCAN: How close is "similar"? How many points make a cluster?&lt;/li&gt;
&lt;li&gt;PCA: How many components to keep?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The examples work fine. My own experiments? Trial and error.&lt;/p&gt;

&lt;p&gt;I tried clustering an image and got either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in one giant cluster (threshold too loose)&lt;/li&gt;
&lt;li&gt;Everything labeled as noise (threshold too strict)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still figuring out the intuition here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I Made
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Tested on training data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Train on all data
&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Test on same data
# Score: 98%! Amazing!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Except the model had already seen the answers. Not a real test.&lt;/p&gt;

&lt;p&gt;Should have split train/test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Score: 73%. More honest.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Coming from software engineering where we have staging environments, I should have known better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Mixed up regression and classification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I kept using Linear Regression when I should've used Logistic Regression.&lt;/p&gt;

&lt;p&gt;Finally internalized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicting a number (price, temperature, age) → Regression&lt;/li&gt;
&lt;li&gt;Predicting a category (yes/no, cat/dog, disease) → Classification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Took more failed experiments than I'd like to admit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Forgot to scale features&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Features with wildly different scales
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;  &lt;span class="c1"&gt;# square_feet=2000, bedrooms=3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Square footage dominates because the numbers are bigger. Had to normalize everything to the same scale first.&lt;/p&gt;
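&lt;p&gt;The fix, sketched with scikit-learn's &lt;code&gt;StandardScaler&lt;/code&gt; (toy housing rows, made up for illustration):&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# square_feet dwarfs bedrooms numerically, so it dominates the fit.
X = np.array([[2000.0, 3], [1500.0, 2], [2500.0, 4], [1800.0, 3]])

X_scaled = StandardScaler().fit_transform(X)

# After scaling, each column has mean ~0 and standard deviation ~1,
# so both features get an equal say.
print(np.allclose(X_scaled.mean(axis=0), 0.0))   # True
print(np.allclose(X_scaled.std(axis=0), 1.0))    # True
```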

&lt;h2&gt;
  
  
  The Pattern That Helped
&lt;/h2&gt;

&lt;p&gt;Every scikit-learn algorithm follows the same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SomeAlgorithm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Learn from data
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Make predictions
&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Evaluate
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once I saw this pattern, experimenting with new algorithms got easier.&lt;/p&gt;

&lt;p&gt;Want to try a different classifier? Swap the algorithm. Same interface.&lt;/p&gt;

&lt;p&gt;Reminded me of how Kafka, Flink, and other stream-processing tools have different internals but similar APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection to Distributed Systems
&lt;/h2&gt;

&lt;p&gt;Gradient descent (how these models learn) works like load balancer tuning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load balancing:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Try a configuration&lt;/li&gt;
&lt;li&gt;Measure performance&lt;/li&gt;
&lt;li&gt;Adjust based on results&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Machine learning:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make predictions&lt;/li&gt;
&lt;li&gt;Measure errors&lt;/li&gt;
&lt;li&gt;Adjust weights based on errors&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same iterative optimization. Different domain.&lt;/p&gt;

&lt;p&gt;This mental model helped when ML concepts felt foreign.&lt;/p&gt;
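&lt;p&gt;The two loops map one-to-one onto a minimal gradient descent implementation. A plain-Python sketch, fitting y = w * x to toy data where the true weight is 3:&lt;/p&gt;

```python
# Fit y = w * x by the loop above:
# predict -> measure error -> adjust -> repeat.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated with w = 3

w = 0.0     # initial guess ("try a configuration")
lr = 0.05   # learning rate ("how big an adjustment to make")

for step in range(200):
    # 1-2. Make predictions and measure errors
    #      (gradient of mean squared error with respect to w)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    # 3-4. Adjust the weight based on the errors, then repeat
    w -= lr * grad

print(round(w, 3))   # 3.0
```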

&lt;h2&gt;
  
  
  Time Spent This Week
&lt;/h2&gt;

&lt;p&gt;About 8-10 hours this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Taking Forward
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Shallow algorithms aren't a stepping stone to "real" ML. They ARE real ML.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most production systems use these techniques. Neural networks get the hype. Logistic regression gets deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding fundamentals before jumping to LLMs makes sense.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I could be building GPT wrappers right now. But I wouldn't understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How training works&lt;/li&gt;
&lt;li&gt;Why models fail&lt;/li&gt;
&lt;li&gt;When to use what approach&lt;/li&gt;
&lt;li&gt;How to debug problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Starting simple means I can build intuition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can be productive without understanding everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I can use these algorithms effectively even if I don't fully grasp every mathematical detail.&lt;/p&gt;

&lt;p&gt;Understanding deepens with practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Still Unclear
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Picking the right algorithm for a new problem (I Google this constantly)&lt;/li&gt;
&lt;li&gt;Tuning hyperparameters systematically (still trial and error)&lt;/li&gt;
&lt;li&gt;Knowing when a model is "good enough"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm three weeks in, not three years. Still learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;In a few weeks, I'll start building LLM applications. RAG systems, agents, whatever.&lt;/p&gt;

&lt;p&gt;But I'll understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What "training" means&lt;/li&gt;
&lt;li&gt;How models learn patterns&lt;/li&gt;
&lt;li&gt;Why evaluation matters&lt;/li&gt;
&lt;li&gt;When simpler approaches work better&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I won't just be calling APIs. I'll understand what those APIs are doing under the hood.&lt;/p&gt;

&lt;p&gt;That's worth spending time on "boring" algorithms.&lt;/p&gt;

&lt;p&gt;Week 3 down. Built and broke ML models this week.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Learning ML fundamentals before diving into LLMs? Or went straight to GPT APIs? Curious what path others are taking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>learning</category>
    </item>
    <item>
      <title>Week 2: Python Essentials and My First AI/ML Concepts</title>
      <dc:creator>Raju C</dc:creator>
      <pubDate>Sun, 15 Mar 2026 15:28:02 +0000</pubDate>
      <link>https://dev.to/raju_ch_0f28d/week-2-python-essentials-and-my-first-aiml-concepts-4l2c</link>
      <guid>https://dev.to/raju_ch_0f28d/week-2-python-essentials-and-my-first-aiml-concepts-4l2c</guid>
      <description>&lt;p&gt;Week 2 done.&lt;/p&gt;

&lt;p&gt;This week wasn't about fancy ML models or neural networks.&lt;/p&gt;

&lt;p&gt;It was about something more fundamental: &lt;strong&gt;building the Python muscle I'll need to debug AI code, even when Claude Code writes most of it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;New here? Read &lt;a href="https://dev.to/raju_ch_0f28d/why-im-leaving-my-comfort-zone-staff-engineer-ai-1h70"&gt;Week 1: Why I'm making this transition&lt;/a&gt; first.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Did
&lt;/h2&gt;

&lt;p&gt;This week I focused on the tools, not the theory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python fundamentals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NumPy for numerical operations&lt;/li&gt;
&lt;li&gt;Pandas for data manipulation&lt;/li&gt;
&lt;li&gt;Matplotlib for visualizations&lt;/li&gt;
&lt;li&gt;Jupyter notebooks as my workspace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI/ML concepts I started exploring:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embeddings (turning text into numbers)&lt;/li&gt;
&lt;li&gt;Prompt engineering basics&lt;/li&gt;
&lt;li&gt;Tool calling with LLMs&lt;/li&gt;
&lt;li&gt;Basic LLM API calls through notebooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not glamorous. But necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Realization: I Need to Read AI Code, Not Just Generate It
&lt;/h2&gt;

&lt;p&gt;Here's what hit me this week.&lt;/p&gt;

&lt;p&gt;I've been using Claude Code to build POCs 2-3x faster. That's great.&lt;/p&gt;

&lt;p&gt;But when something breaks? When the generated code doesn't do what I expect? When I need to understand WHY it works?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I need to read Python.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not just generate it. Not just copy-paste it. Actually understand what's happening.&lt;/p&gt;

&lt;p&gt;Coming from 18 years in other languages, I could've skipped Python basics. "It's just another language, I'll figure it out."&lt;/p&gt;

&lt;p&gt;Bad idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Jupyter Notebooks: I Was Wrong
&lt;/h2&gt;

&lt;p&gt;I was skeptical.&lt;/p&gt;

&lt;p&gt;"Why not just use .py files like a normal engineer?"&lt;/p&gt;

&lt;p&gt;Then I actually used notebooks for a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What clicked:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Write code → Run cell → See output immediately.&lt;/p&gt;

&lt;p&gt;No "run entire script and wait."&lt;br&gt;
No "add print statements everywhere to debug."&lt;br&gt;
No "recompile everything for one change."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cell 1: Load data
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# See it immediately
&lt;/span&gt;
&lt;span class="c1"&gt;# Cell 2: Try something
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;column&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Instant feedback
&lt;/span&gt;
&lt;span class="c1"&gt;# Cell 3: Visualize
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;column&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Plot appears inline
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For &lt;strong&gt;exploration&lt;/strong&gt;, this is perfect.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;production code&lt;/strong&gt;, I'll still use .py files.&lt;/p&gt;

&lt;p&gt;But for learning ML? Notebooks make sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  NumPy, Pandas, Matplotlib: The Foundation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NumPy&lt;/strong&gt; - operations on arrays of numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Instead of loops
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;doubled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# [2, 4, 6, 8, 10]
&lt;/span&gt;
&lt;span class="c1"&gt;# Fast. Clean. No explicit loops.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the foundation. Nearly every Python ML library builds on NumPy arrays under the hood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pandas&lt;/strong&gt; - data manipulation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Like SQL, but in Python
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;high_earners&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;55000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After years of SQL and data pipelines, Pandas clicked fast.&lt;/p&gt;

&lt;p&gt;It's what feeds data into ML models.&lt;/p&gt;
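&lt;p&gt;To make the SQL parallel concrete, here's a quick sketch (column names are made up for illustration). The groupby below is the &lt;code&gt;GROUP BY&lt;/code&gt; I would have written in SQL:&lt;/p&gt;

```python
import pandas as pd

# SQL I'd have written: SELECT dept, AVG(salary) FROM df GROUP BY dept
df = pd.DataFrame({
    "dept": ["eng", "eng", "sales"],
    "salary": [50000, 70000, 60000],
})

# The Pandas equivalent: group rows by dept, average each group's salary
avg_by_dept = df.groupby("dept")["salary"].mean()
print(avg_by_dept["eng"])    # 60000.0
print(avg_by_dept["sales"])  # 60000.0
```

&lt;p&gt;Same mental model, different syntax. That's why it clicked fast.&lt;/p&gt;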

&lt;p&gt;&lt;strong&gt;Matplotlib&lt;/strong&gt; - seeing what the data looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Turns out &lt;strong&gt;visualizing data isn't optional&lt;/strong&gt; in ML. It's how you understand what's actually happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting to Understand AI Concepts
&lt;/h2&gt;

&lt;p&gt;This week I dipped into actual AI/ML concepts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The idea: convert text into numbers (vectors) that capture meaning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;distributed systems&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;microservices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Each text is now a 384-dimensional vector
# Similar meanings = similar vectors
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can USE this now. Do I understand HOW the model creates these vectors?&lt;/p&gt;

&lt;p&gt;Not yet. That's future weeks.&lt;/p&gt;
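&lt;p&gt;But I can already verify the "similar meanings = similar vectors" claim with a little math. This sketch uses tiny made-up 3-dimensional vectors (real embeddings are 384-dimensional), just to show how cosine similarity compares them:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings
distributed = [0.9, 0.1, 0.0]
microservices = [0.8, 0.2, 0.1]
ml = [0.1, 0.1, 0.9]

print(round(cosine_similarity(distributed, microservices), 3))  # 0.984 - similar
print(round(cosine_similarity(distributed, ml), 3))             # 0.121 - not similar
```

&lt;p&gt;Related concepts point in similar directions; unrelated ones don't. That's the whole trick behind semantic search.&lt;/p&gt;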

&lt;p&gt;&lt;strong&gt;Prompt Engineering &amp;amp; Tool Calling:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Started experimenting with LLM APIs in notebooks.&lt;/p&gt;

&lt;p&gt;Followed OpenAI's documentation examples to make basic API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain embeddings simply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not building anything production-ready yet. Just understanding the mechanics.&lt;/p&gt;

&lt;p&gt;But this shift - from using AI through a chat interface to calling it programmatically - matters.&lt;/p&gt;

&lt;p&gt;This is the foundation of &lt;strong&gt;building&lt;/strong&gt; AI applications, not just using them.&lt;/p&gt;
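&lt;p&gt;Tool calling is the part that felt most like backend work. Roughly: you describe your functions to the model in a JSON schema, the model replies with the name and arguments of the one it wants, and your code dispatches the call. Here's a minimal sketch of that dispatch step. The &lt;code&gt;get_weather&lt;/code&gt; function and its data are made up, and the model response is simulated rather than fetched from the API:&lt;/p&gt;

```python
import json

# A hypothetical local function we want the LLM to be able to call
def get_weather(city):
    fake_data = {"Austin": "31C, sunny"}  # toy stand-in for a real API
    return fake_data.get(city, "unknown")

# Describe the function to the model in the Chat Completions tool schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# The model would respond with a tool call like this; simulated here
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Austin"})}

# Our code dispatches: look up the function by name, parse args, call it
available = {"get_weather": get_weather}
args = json.loads(tool_call["arguments"])
result = available[tool_call["name"]](**args)
print(result)  # 31C, sunny
```

&lt;p&gt;The model never runs code. It just asks; your system executes and sends the result back. That boundary is what makes these systems debuggable.&lt;/p&gt;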

&lt;h2&gt;
  
  
  What's Still Fuzzy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How embeddings actually work:&lt;/strong&gt; I can use them. Don't understand the training process yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mathematical foundations:&lt;/strong&gt; Linear algebra, probability - I know I need these. Haven't dived deep yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use what:&lt;/strong&gt; Lots of ML concepts flying around. Don't have mental models yet for "when would I use X vs Y?"&lt;/p&gt;

&lt;p&gt;That's okay. It's week 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Claude Code Insight
&lt;/h2&gt;

&lt;p&gt;Here's why this week mattered.&lt;/p&gt;

&lt;p&gt;Claude Code generates Python. Fast.&lt;/p&gt;

&lt;p&gt;But when that code uses NumPy array slicing I don't understand? Or Pandas operations I've never seen? Or tries to fix a bug I can't diagnose?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'm stuck.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This week was about building enough Python muscle to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read what Claude generates&lt;/li&gt;
&lt;li&gt;Understand what it's doing&lt;/li&gt;
&lt;li&gt;Debug when it's wrong&lt;/li&gt;
&lt;li&gt;Modify when I need something different&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not about becoming a Python expert.&lt;/p&gt;

&lt;p&gt;About being &lt;strong&gt;competent enough to work WITH the AI tools&lt;/strong&gt;, not just be dependent on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection to My Background
&lt;/h2&gt;

&lt;p&gt;Something I noticed: &lt;strong&gt;ML data preparation is just ETL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Extract, Transform, Load - but for models instead of databases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load data (Extract)
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Clean and transform (Transform)
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;new_feature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Feed to model (Load)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've spent 18 years building large-scale distributed systems, streaming APIs, backend infrastructure.&lt;/p&gt;

&lt;p&gt;The last few years leading teams building data pipelines.&lt;/p&gt;

&lt;p&gt;ML preprocessing? It's the same ETL pattern I know. Different destination.&lt;/p&gt;

&lt;p&gt;That helped it click.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time Investment
&lt;/h2&gt;

&lt;p&gt;This week: ~12 hours&lt;/p&gt;

&lt;p&gt;Mostly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practicing Python in notebooks&lt;/li&gt;
&lt;li&gt;Working through NumPy/Pandas basics&lt;/li&gt;
&lt;li&gt;Playing with embeddings&lt;/li&gt;
&lt;li&gt;First LLM API calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More hands-on than Week 1. Less "watching tutorials," more "writing code and breaking things."&lt;/p&gt;

&lt;p&gt;Week 2 down. Building the muscle I'll actually need.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're also learning Python for AI/ML - what tripped you up coming from other languages?&lt;/em&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>learning</category>
    </item>
    <item>
      <title>Why I'm Leaving My Comfort Zone: From Engineering Leadership to AI-First Engineering</title>
      <dc:creator>Raju C</dc:creator>
      <pubDate>Sat, 07 Mar 2026 17:28:29 +0000</pubDate>
      <link>https://dev.to/raju_ch_0f28d/why-im-leaving-my-comfort-zone-staff-engineer-ai-1h70</link>
      <guid>https://dev.to/raju_ch_0f28d/why-im-leaving-my-comfort-zone-staff-engineer-ai-1h70</guid>
      <description>&lt;p&gt;I've spent my career architecting distributed systems — designing fault-tolerant pipelines, making trade-offs between consistency and availability, and owning systems end-to-end from design through production.&lt;/p&gt;

&lt;p&gt;Data pipelines. Streaming infrastructure. Backend at scale. I've built systems processing terabytes of data, architected platforms handling millions of requests.&lt;/p&gt;

&lt;p&gt;I'm good at what I do.&lt;/p&gt;

&lt;p&gt;So why am I starting over as a complete beginner in AI/ML?&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;Two things made this impossible to ignore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, my company started hiring AI/ML engineers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Suddenly there were people in meetings talking about RAG, agentic systems, MCP, and I'd just nod along.&lt;/p&gt;

&lt;p&gt;I had no idea what they were actually building.&lt;/p&gt;

&lt;p&gt;All my experience, and I couldn't contribute to the most important projects at my company.&lt;/p&gt;

&lt;p&gt;That hurt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, I started using Claude Code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Game changer. POCs that took me days? Done in hours. New features? 2-3x faster. Going from 0 to 1 on projects became almost effortless.&lt;/p&gt;

&lt;p&gt;But here's the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I didn't understand how it worked.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was using AI. I wasn't building it. Couldn't explain it. Couldn't tell if solutions were actually good or just looked convincing.&lt;/p&gt;

&lt;p&gt;Someone asked me: "How does Claude Code actually work?"&lt;/p&gt;

&lt;p&gt;No answer.&lt;/p&gt;

&lt;p&gt;That's when it hit me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters to Me
&lt;/h2&gt;

&lt;p&gt;I've always believed &lt;strong&gt;understanding fundamentals lets you solve complex problems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I learned distributed systems - really learned them, not just used Kafka but understood partitioning, replication, consensus - that's when I stopped using tools and started building systems.&lt;/p&gt;

&lt;p&gt;That's when I became valuable.&lt;/p&gt;

&lt;p&gt;I need to do the same with AI.&lt;/p&gt;

&lt;p&gt;Not just use it. Understand it. Build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Scares Me
&lt;/h2&gt;

&lt;p&gt;Transitioning from expert to beginner after leading distributed systems teams.&lt;br&gt;
I'm back to asking foundational questions.&lt;/p&gt;

&lt;p&gt;I should've started this in 2024. Every month I waited, AI moved faster.&lt;br&gt;
But waiting for the "perfect time" would mean never starting.&lt;/p&gt;

&lt;p&gt;Here I am.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Pulls Me Forward
&lt;/h2&gt;

&lt;p&gt;Two things make this worth the fear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I want to be part of the conversation.&lt;/strong&gt; Not the person nodding along while AI/ML engineers talk. I want to understand what they're building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I want to build AI systems myself.&lt;/strong&gt; Not just use Claude Code. Build things like it. Understand models, architectures, trade-offs.&lt;/p&gt;

&lt;p&gt;I want to become an AI-first engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Now and Not Later
&lt;/h2&gt;

&lt;p&gt;Because I can't afford to fall further behind.&lt;/p&gt;

&lt;p&gt;AI is accelerating too fast. Every day I wait, the gap widens.&lt;/p&gt;

&lt;p&gt;I already feel late. Another year won't make this easier.&lt;/p&gt;

&lt;p&gt;So I'm starting today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Plan
&lt;/h2&gt;

&lt;p&gt;I'm learning AI/ML fundamentals from the ground up.&lt;/p&gt;

&lt;p&gt;No shortcuts. No just-use-the-framework-without-understanding approach.&lt;/p&gt;

&lt;p&gt;I want to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How models like Claude actually work&lt;/li&gt;
&lt;li&gt;How to build AI systems from scratch&lt;/li&gt;
&lt;li&gt;How to architect production AI solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Timeline?&lt;/strong&gt; Don't have one. Just committed to the journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal?&lt;/strong&gt; However long it takes - I want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lead or contribute to AI projects&lt;/li&gt;
&lt;li&gt;Understand how these models actually work&lt;/li&gt;
&lt;li&gt;Become a true AI-first engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm documenting this journey here. Not polished tutorials. Real learning in public.&lt;/p&gt;

&lt;p&gt;Wins. Struggles. Confusion. Breakthroughs.&lt;/p&gt;

&lt;p&gt;Week 1 starts now.&lt;/p&gt;

&lt;p&gt;I'm nervous. I'm excited. I'm deep into my career and starting over.&lt;/p&gt;

&lt;p&gt;Let's see where this goes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're also making a career transition into AI/ML, I'd love to hear about it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>career</category>
      <category>ai</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
