<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ege Pakten</title>
    <description>The latest articles on DEV Community by Ege Pakten (@egepakten).</description>
    <link>https://dev.to/egepakten</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2054823%2Fa7415b5b-d68c-4462-a993-62c029abd337.png</url>
      <title>DEV Community: Ege Pakten</title>
      <link>https://dev.to/egepakten</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/egepakten"/>
    <language>en</language>
    <item>
      <title>The Machine Learning Lifecycle: 10 Steps From Problem to Production (And Why Most Projects Fail at Step 3)</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sat, 25 Apr 2026 09:15:35 +0000</pubDate>
      <link>https://dev.to/egepakten/the-machine-learning-lifecycle-10-steps-from-problem-to-production-and-why-most-projects-fail-at-b38</link>
      <guid>https://dev.to/egepakten/the-machine-learning-lifecycle-10-steps-from-problem-to-production-and-why-most-projects-fail-at-b38</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Every ML tutorial jumps straight to model training. But in the real world, training is step 7 out of 10 — and the steps before it are where projects succeed or fail. This post walks through the full Machine Learning Lifecycle, from defining your problem to keeping your model healthy in production, with real examples and practical advice at every stage.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Big Picture
&lt;/h2&gt;

&lt;p&gt;Machine Learning is an &lt;strong&gt;iterative and structured process&lt;/strong&gt;. It's not "throw data at an algorithm and hope for magic." It's a cycle — and most teams loop through it multiple times before they get something that works in production.&lt;/p&gt;

&lt;p&gt;Here are the 10 stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Problem Definition → 2. Data Collection → 3. Data Cleaning &amp;amp; Preprocessing
→ 4. Exploratory Data Analysis (EDA) → 5. Feature Engineering &amp;amp; Selection
→ 6. Model Selection → 7. Model Training → 8. Model Evaluation &amp;amp; Tuning
→ 9. Model Deployment → 10. Monitoring &amp;amp; Maintenance → (back to 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go through each one.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Problem Definition — "What are we actually solving?"
&lt;/h2&gt;

&lt;p&gt;This is where most failed ML projects go wrong. Before touching any data or code, you need to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What business problem am I solving?&lt;/strong&gt; Not "I want to use AI," but "I want to reduce customer churn by 15%" or "I want to detect fraudulent transactions in real time."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is ML even the right tool?&lt;/strong&gt; Sometimes a simple rule-based system or a SQL query is better. ML is expensive overkill for problems that have clear, deterministic rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What does success look like?&lt;/strong&gt; Define a measurable metric: accuracy, precision, recall, revenue impact, latency requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What type of ML problem is this?&lt;/strong&gt; Classification? Regression? Clustering? Recommendation? This dictates everything downstream.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The problem definition dictates the type of data you need.&lt;/strong&gt; If you define the problem wrong, you'll collect the wrong data, build the wrong model, and ship something nobody wanted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A bank wants to "use AI." That's not a problem definition. "Predict which credit card transactions are fraudulent with less than 0.1% false positive rate and under 200ms latency" — &lt;em&gt;that's&lt;/em&gt; a problem definition.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Data Collection — "Do we have enough?"
&lt;/h2&gt;

&lt;p&gt;Once you know what you're solving, you need data. This step is about gathering enough high-quality, relevant data to train a model.&lt;/p&gt;

&lt;p&gt;Key questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Where does the data come from?&lt;/strong&gt; Internal databases, APIs, web scraping, third-party vendors, public datasets, user-generated content?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much data do I need?&lt;/strong&gt; Depends on complexity. A simple classifier might need 1,000 examples. A computer vision model might need 100,000+ labeled images. An LLM needs billions of tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the data labeled?&lt;/strong&gt; For supervised learning, you need labels (the "right answers"). Labeling is often the most expensive and time-consuming part.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the data representative?&lt;/strong&gt; If you train a facial recognition system only on photos of one demographic, it will fail on others. Your data must represent the real-world distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common pitfalls:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assuming you have "big data" when you actually have big &lt;em&gt;noise&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Not checking for sampling bias&lt;/li&gt;
&lt;li&gt;Ignoring data privacy regulations (GDPR, KVKK, HIPAA)&lt;/li&gt;
&lt;li&gt;Collecting too many features and not enough samples&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Data Cleaning and Preprocessing — "Garbage in, garbage out"
&lt;/h2&gt;

&lt;p&gt;This is where you spend &lt;strong&gt;60-80% of your actual project time.&lt;/strong&gt; Raw data is messy. Always.&lt;/p&gt;

&lt;p&gt;What you're doing here:&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Missing Values
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Some rows have blank fields. Do you fill them with the mean? The median? A prediction? Or drop them entirely?&lt;/li&gt;
&lt;li&gt;The right answer depends on &lt;em&gt;why&lt;/em&gt; the data is missing. "Random missing" and "systematically missing" require different approaches.&lt;/li&gt;
&lt;/ul&gt;
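&lt;p&gt;As a quick illustration, here is median imputation in plain Python (a hypothetical salary column; in practice you'd reach for &lt;code&gt;pandas.DataFrame.fillna&lt;/code&gt;):&lt;/p&gt;

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    return [med if v is None else v for v in values]

# Hypothetical salary column with two missing entries
salaries = [52000, None, 61000, 58000, None]
print(impute_median(salaries))  # [52000, 58000, 61000, 58000, 58000]
```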

&lt;h3&gt;
  
  
  Removing Duplicates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate records distort your model's understanding of the distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Fixing Inconsistent Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"New York", "new york", "NY", "N.Y." are the same city but four different strings.&lt;/li&gt;
&lt;li&gt;Date formats: "04/20/2026" vs "2026-04-20" vs "20 April 2026"&lt;/li&gt;
&lt;li&gt;Units: meters vs feet, Celsius vs Fahrenheit&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Handling Outliers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A salary dataset where most values are $40K-$120K but one entry says $99,999,999. Is it real or a typo? Outliers can destroy model performance or provide critical signal — you have to decide.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Type Conversions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Categorical variables need encoding (one-hot, label encoding)&lt;/li&gt;
&lt;li&gt;Text needs tokenization&lt;/li&gt;
&lt;li&gt;Images need resizing, normalization&lt;/li&gt;
&lt;li&gt;Dates need feature extraction (day of week, month, holiday flag)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Normalization and Scaling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Features on different scales (age: 0-100, salary: 20,000-500,000) can bias models that use distance calculations. Standard scaling (z-score) or min-max scaling fixes this.&lt;/li&gt;
&lt;/ul&gt;
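&lt;p&gt;Both techniques fit in a few lines. A minimal pure-Python sketch (real projects usually use scikit-learn's &lt;code&gt;StandardScaler&lt;/code&gt; and &lt;code&gt;MinMaxScaler&lt;/code&gt;):&lt;/p&gt;

```python
def z_score(values):
    """Standard scaling: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

def min_max(values):
    """Min-max scaling: squeeze values into the 0..1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 40, 60]
print(min_max(ages))  # [0.0, 0.5, 1.0]
print(z_score(ages))  # centered on 0, unit variance
```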

&lt;p&gt;&lt;strong&gt;The motto: garbage in, garbage out.&lt;/strong&gt; No model, no matter how sophisticated, can learn good patterns from bad data.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Exploratory Data Analysis (EDA) — "What does the data actually look like?"
&lt;/h2&gt;

&lt;p&gt;Before building any model, you need to &lt;strong&gt;understand your data&lt;/strong&gt;. EDA is about getting the big picture.&lt;/p&gt;

&lt;p&gt;What you're looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Distributions&lt;/strong&gt; — Is your target variable balanced? If 99% of transactions are legitimate and 1% are fraud, you have a class imbalance problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlations&lt;/strong&gt; — Which features are related to each other? Which features predict your target?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Patterns and trends&lt;/strong&gt; — Seasonal effects? Time-based shifts? Geographic clusters?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality issues&lt;/strong&gt; you missed in step 3 — Sometimes problems only become visible in visualization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools: histograms, scatter plots, correlation matrices, box plots, pair plots. Libraries: Pandas, Matplotlib, Seaborn, Plotly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; You're building a house price predictor. EDA reveals that "number of bedrooms" and "square footage" are highly correlated (0.92). Including both might cause multicollinearity. You might drop one or combine them.&lt;/p&gt;

&lt;p&gt;EDA often sends you &lt;strong&gt;back to step 2 or 3&lt;/strong&gt; — you realize you need more data, or your data has problems you didn't see before.&lt;/p&gt;
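&lt;p&gt;The bedroom/square-footage check above is just a Pearson correlation, which you can compute directly (Pandas' &lt;code&gt;DataFrame.corr&lt;/code&gt; does this for every feature pair at once):&lt;/p&gt;

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical listings: square footage and bedroom count move together
sqft     = [800, 1200, 1500, 2000, 2600]
bedrooms = [2, 2, 3, 4, 5]
print(round(pearson(sqft, bedrooms), 2))  # close to 1: nearly redundant features
```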




&lt;h2&gt;
  
  
  5. Feature Engineering and Selection — "Good features &amp;gt; fancy models"
&lt;/h2&gt;

&lt;p&gt;This is often the difference between a mediocre model and a great one. Feature engineering is the art of &lt;strong&gt;creating new input variables&lt;/strong&gt; that help the model learn patterns better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature Engineering (Creating)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;From dates:&lt;/strong&gt; extract day_of_week, is_weekend, month, quarter, days_since_last_event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From text:&lt;/strong&gt; word count, sentiment score, TF-IDF values, embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From location:&lt;/strong&gt; distance to nearest city, population density, latitude buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combining features:&lt;/strong&gt; price_per_sqft = price / square_footage, BMI = weight / height²&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain knowledge:&lt;/strong&gt; a doctor knows that "blood pressure × age" interaction matters; encode that&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Feature Selection (Removing)
&lt;/h3&gt;

&lt;p&gt;Not all features help. Some add noise. Too many features cause overfitting and slow training. Techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Correlation analysis&lt;/strong&gt; — drop features that are highly correlated with each other&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature importance&lt;/strong&gt; from tree-based models (Random Forest, XGBoost)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive Feature Elimination (RFE)&lt;/strong&gt; — iteratively remove the least important feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1 Regularization (Lasso)&lt;/strong&gt; — automatically zeroes out unimportant features during training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key insight: a simple model with great features almost always beats a complex model with bad features.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Model Selection — "Choose the right tool for the job"
&lt;/h2&gt;

&lt;p&gt;Now you pick which algorithm(s) to try. This depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem type:&lt;/strong&gt; Classification → Logistic Regression, Random Forest, SVM, Neural Network. Regression → Linear Regression, XGBoost, Neural Network. Clustering → K-Means, DBSCAN. Sequence → RNN, LSTM, Transformer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data size:&lt;/strong&gt; Small data → simpler models (logistic regression, SVM). Large data → deep learning can shine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interpretability needs:&lt;/strong&gt; Healthcare and finance often need explainable models (decision trees, linear models). Recommendation engines can afford black boxes (deep learning).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency requirements:&lt;/strong&gt; Real-time inference needs fast models. Batch processing can afford slower ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; Start simple. Try logistic regression or a decision tree first. If it gets 85% accuracy, you have a strong baseline. Then try more complex models and see if the improvement justifies the complexity.&lt;/p&gt;

&lt;p&gt;You often try &lt;strong&gt;3-5 different models&lt;/strong&gt; and compare their performance.&lt;/p&gt;
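&lt;p&gt;One concrete way to "start simple": before training anything, measure the majority-class baseline. Any model you build has to beat this number to justify its complexity. A sketch:&lt;/p&gt;

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

# Imbalanced toy data: mostly legitimate transactions
train = ["legit"] * 95 + ["fraud"] * 5
test  = ["legit"] * 18 + ["fraud"] * 2
print(majority_baseline(train, test))  # 0.9 without learning anything
```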




&lt;h2&gt;
  
  
  7. Model Training — "The model learns from the data"
&lt;/h2&gt;

&lt;p&gt;This is the step everyone thinks ML is about — but as you've seen, it's step 7 of 10.&lt;/p&gt;

&lt;p&gt;Training means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feed data into the algorithm&lt;/strong&gt; — the model sees examples and adjusts its internal parameters (weights) to minimize error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Split data into train/validation/test sets&lt;/strong&gt; — typically 70/15/15 or 80/10/10. Never evaluate on data the model trained on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose a loss function&lt;/strong&gt; — the mathematical definition of "what is wrong." Cross-entropy for classification, MSE for regression, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set hyperparameters&lt;/strong&gt; — learning rate, batch size, epochs, regularization strength. These are not learned by the model; you set them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterate&lt;/strong&gt; — training is rarely one-shot. You train, look at results, adjust, retrain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key concept: train/test split.&lt;/strong&gt; If you evaluate your model on the same data it trained on, you get misleadingly high scores. It's like grading a student using the exact exam questions they practiced on.&lt;/p&gt;
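&lt;p&gt;A minimal reproducible split in plain Python (in practice &lt;code&gt;sklearn.model_selection.train_test_split&lt;/code&gt; handles this, including stratification):&lt;/p&gt;

```python
import random

def split(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then slice into train/validation/test sets."""
    rows = rows[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # fixed seed for reproducibility
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

train, val, test = split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```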




&lt;h2&gt;
  
  
  8. Model Evaluation and Tuning — "How is your model doing?"
&lt;/h2&gt;

&lt;p&gt;Training is done. Now: is the model actually good?&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation Metrics
&lt;/h3&gt;

&lt;p&gt;Different problems need different metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt; — % of correct predictions. Misleading with imbalanced data (99% accuracy on fraud detection means nothing if you just predict "not fraud" every time).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision&lt;/strong&gt; — Of all things the model flagged as positive, how many were actually positive?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recall&lt;/strong&gt; — Of all actual positives, how many did the model catch?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;F1 Score&lt;/strong&gt; — Harmonic mean of precision and recall. Good when you need to balance both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AUC-ROC&lt;/strong&gt; — Area under the curve. Measures how well the model separates classes across all thresholds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MSE / RMSE / MAE&lt;/strong&gt; — For regression: how far off are predictions from actual values?&lt;/li&gt;
&lt;/ul&gt;
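&lt;p&gt;Precision, recall, and F1 fall straight out of the confusion counts. A self-contained sketch:&lt;/p&gt;

```python
def classification_metrics(y_true, y_pred, positive="fraud"):
    """Precision, recall, and F1 computed from raw label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["fraud", "legit", "fraud", "legit", "fraud"]
y_pred = ["fraud", "fraud", "legit", "legit", "fraud"]
print(classification_metrics(y_true, y_pred))  # each is 2/3 on this toy example
```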

&lt;h3&gt;
  
  
  Hyperparameter Tuning
&lt;/h3&gt;

&lt;p&gt;If results aren't good enough, adjust hyperparameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grid Search&lt;/strong&gt; — try every combination of predefined values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Search&lt;/strong&gt; — randomly sample combinations (often faster than grid search)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bayesian Optimization&lt;/strong&gt; — smart search that learns from previous trials&lt;/li&gt;
&lt;/ul&gt;
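&lt;p&gt;Grid search is conceptually just a nested loop over candidate values. A toy sketch, with a fake scoring function standing in for "train the model and return its validation score":&lt;/p&gt;

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every combination in `grid` (a dict of name to candidate values)
    and return the best-scoring combination."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score_fn(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical stand-in: the score peaks at lr=0.01, batch_size=32
def fake_validation_score(learning_rate, batch_size):
    return -abs(learning_rate - 0.01) - abs(batch_size - 32) / 1000

grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
print(grid_search(fake_validation_score, grid))
```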

&lt;h3&gt;
  
  
  Dealing with Problems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt; (training score high, test score low) → more data, simpler model, regularization, dropout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Underfitting&lt;/strong&gt; (both scores low) → more complex model, more features, longer training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class imbalance&lt;/strong&gt; → oversampling (SMOTE), undersampling, class weights, different metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This step often sends you back to steps 3, 4, 5, or 6.&lt;/strong&gt; That's the iterative nature of ML.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Model Deployment — "Integrating the model into the real world"
&lt;/h2&gt;

&lt;p&gt;Your model works in a Jupyter notebook. Now it needs to work in production — handling real users, real data, and real scale.&lt;/p&gt;

&lt;p&gt;Deployment means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Packaging the model&lt;/strong&gt; — save weights, serialize with ONNX, TorchScript, or pickle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating an API&lt;/strong&gt; — wrap the model in a REST API (Flask, FastAPI) or gRPC endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt; — where does it run? AWS SageMaker, Google Vertex AI, Azure ML, self-hosted Kubernetes, or edge devices?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt; — handle 10 requests/second? 10,000? Auto-scaling, load balancing, caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD for ML&lt;/strong&gt; — automated testing, model versioning, rollback capabilities&lt;/li&gt;
&lt;/ul&gt;
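&lt;p&gt;Packaging can be as simple as serializing the trained artifact with a version tag. A sketch using &lt;code&gt;pickle&lt;/code&gt; (fine for internal use; prefer a framework-neutral format like ONNX for anything shared):&lt;/p&gt;

```python
import os
import pickle
import tempfile

# A trained "model" stands in here for any fitted estimator
model = {"version": "v1", "weights": [0.4, -1.2], "bias": 0.1}

path = os.path.join(tempfile.gettempdir(), "model_v1.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)        # serialize the artifact

with open(path, "rb") as f:
    restored = pickle.load(f)    # what the serving process does at startup

print(restored == model)  # True: the artifact round-trips intact
```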

&lt;p&gt;&lt;strong&gt;Common deployment patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time inference&lt;/strong&gt; — API call, response in milliseconds (fraud detection, chatbot)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch inference&lt;/strong&gt; — process large datasets periodically (weekly churn predictions, nightly recommendations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge deployment&lt;/strong&gt; — model runs on device (mobile app, IoT sensor, self-driving car)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deployment is NOT the finish line. It's where the real work begins.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Monitoring and Maintenance — "Keeping the model healthy"
&lt;/h2&gt;

&lt;p&gt;A deployed model is a living system. It degrades over time because the real world changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to Monitor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model performance&lt;/strong&gt; — are accuracy/precision/recall staying stable?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data drift&lt;/strong&gt; — is incoming data different from training data? (seasonal changes, new user demographics, market shifts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concept drift&lt;/strong&gt; — has the relationship between features and target changed? (what predicted churn in 2023 might not in 2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; — is inference speed within requirements?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource usage&lt;/strong&gt; — CPU, memory, cost&lt;/li&gt;
&lt;/ul&gt;
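&lt;p&gt;Data drift can be quantified. One common choice is the Population Stability Index (PSI); a rough industry convention, not a law, treats values above about 0.2 as drift worth investigating. A sketch:&lt;/p&gt;

```python
import math
import random
from bisect import bisect_right

def psi(expected, actual, bins=10):
    """Population Stability Index between training-time and live values."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]
    def dist(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # small floor keeps empty bins from producing log(0)
        return [max(c / len(values), 1e-4) for c in counts]
    e, a = dist(expected), dist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(0)
train_scores = [rng.gauss(0.30, 0.1) for _ in range(1000)]
live_scores  = [rng.gauss(0.45, 0.1) for _ in range(1000)]  # the world shifted
print(round(psi(train_scores, live_scores), 2))  # well above the 0.2 alarm level
```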

&lt;h3&gt;
  
  
  When to Retrain
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Performance drops below a threshold&lt;/li&gt;
&lt;li&gt;Data distribution shifts significantly&lt;/li&gt;
&lt;li&gt;Business requirements change&lt;/li&gt;
&lt;li&gt;New data categories appear that the model has never seen&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Set up &lt;strong&gt;automated alerts&lt;/strong&gt; for performance degradation&lt;/li&gt;
&lt;li&gt;Keep a &lt;strong&gt;champion/challenger&lt;/strong&gt; system: new model version runs alongside the old one; switch only when the new one proves better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything&lt;/strong&gt;: predictions, input data, confidence scores. You'll need this for debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version your models&lt;/strong&gt; like you version code. Know exactly which model version produced which prediction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monitoring is vital.&lt;/strong&gt; A model that was 95% accurate at launch can silently drop to 70% if nobody's watching.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary — The Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ML is an iterative and structured process.&lt;/strong&gt; It's a cycle, not a line. You will loop back to earlier steps repeatedly — that's normal, not failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data quality and feature engineering are critical.&lt;/strong&gt; Steps 3 and 5 have more impact on final model performance than the choice of algorithm at step 6. Good features beat fancy models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation and tuning improve model performance.&lt;/strong&gt; Don't ship the first model that trains. Rigorously evaluate, tune hyperparameters, and test on data the model has never seen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment isn't the end; monitoring is vital.&lt;/strong&gt; The real world changes. Your model will degrade. Monitor, retrain, and iterate continuously.&lt;/p&gt;

&lt;p&gt;The lifecycle is a loop. The best ML teams are the ones that spin through it fastest — not the ones with the fanciest models.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this helped you see the full picture of ML beyond "just training," drop a reaction.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Two Main Sources of ML Models: Pre-trained vs Custom — Which One Should You Use?</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Mon, 20 Apr 2026 09:07:42 +0000</pubDate>
      <link>https://dev.to/egepakten/two-main-sources-of-ml-models-pre-trained-vs-custom-which-one-should-you-use-24m6</link>
      <guid>https://dev.to/egepakten/two-main-sources-of-ml-models-pre-trained-vs-custom-which-one-should-you-use-24m6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You want to build an AI-powered feature. Maybe a chatbot, an image classifier, or a recommendation engine. The first decision you'll face is: &lt;strong&gt;should I use someone else's model or train my own?&lt;/strong&gt; This post breaks down both paths, when to pick which, and the real-world tradeoffs nobody tells you about upfront.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Before you write a single line of code, this one decision determines three things: how fast you ship, how much you spend, and how well your product works. Pick wrong and you'll either waste months reinventing something that already exists, or ship a generic solution that doesn't actually solve your problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Source 1: Open Source Pre-trained Models
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are They?
&lt;/h3&gt;

&lt;p&gt;Pre-trained models are models that someone else (usually a big tech company or research lab) has &lt;strong&gt;already trained on massive datasets&lt;/strong&gt;. They've spent the GPU hours, the engineering time, and the data collection effort. You just download and use.&lt;/p&gt;

&lt;p&gt;Think of it like buying a car vs building one from scratch. The car (pre-trained model) already works — you just need to learn to drive it and maybe customize the seats.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Examples
&lt;/h3&gt;

&lt;p&gt;Here are some of the most well-known pre-trained models and what they do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Natural Language Processing (NLP):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BERT&lt;/strong&gt; (Google) — understands text, great for search, Q&amp;amp;A, classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT&lt;/strong&gt; (OpenAI) — generates text, powers ChatGPT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLaMA&lt;/strong&gt; (Meta) — open-weight LLM, fine-tunable for custom use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;T5&lt;/strong&gt; (Google) — text-to-text framework, versatile for many NLP tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Computer Vision:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ResNet&lt;/strong&gt; — image classification, object detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YOLO&lt;/strong&gt; (You Only Look Once) — real-time object detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLIP&lt;/strong&gt; (OpenAI) — connects text and images in the same space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Speech AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Whisper&lt;/strong&gt; (OpenAI) — speech-to-text in 99 languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wav2Vec&lt;/strong&gt; (Meta) — speech recognition with minimal labeled data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Healthcare &amp;amp; Science:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AlphaFold&lt;/strong&gt; (DeepMind) — protein structure prediction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BioGPT&lt;/strong&gt; (Microsoft) — biomedical text generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Art &amp;amp; Creative:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stable Diffusion&lt;/strong&gt; — text-to-image generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MusicGen&lt;/strong&gt; (Meta) — text-to-music&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Saves development time.&lt;/strong&gt; Training a model like BERT from scratch would take weeks on expensive GPUs. Downloading it takes 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Great for transfer learning.&lt;/strong&gt; This is a key concept. The model has already learned general patterns (what a face looks like, how sentences are structured, etc.). You can then &lt;strong&gt;fine-tune&lt;/strong&gt; it on your specific data to adapt it for your use case — often with very little data.&lt;/p&gt;

&lt;p&gt;For example: BERT was trained on all of Wikipedia and BookCorpus. If you fine-tune it on 1,000 customer support tickets, it can classify support requests with 90%+ accuracy — because it already understands language, it just needs to learn your categories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community and ecosystem.&lt;/strong&gt; Popular pre-trained models have huge communities, tutorials, and tooling. Platforms like Hugging Face host thousands of pre-trained models you can try in minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Pre-trained Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You're solving a &lt;strong&gt;common problem&lt;/strong&gt; (text classification, image recognition, translation, Q&amp;amp;A)&lt;/li&gt;
&lt;li&gt;You need to &lt;strong&gt;ship fast&lt;/strong&gt; (prototype in days, not months)&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;limited data&lt;/strong&gt; (transfer learning works with small datasets)&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;limited budget&lt;/strong&gt; (no GPU cluster needed)&lt;/li&gt;
&lt;li&gt;Your domain isn't extremely specialized&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source 2: Training Custom Models
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are They?
&lt;/h3&gt;

&lt;p&gt;You collect your own data and &lt;strong&gt;train a model from scratch&lt;/strong&gt; — or from a very early starting point. You design the architecture, curate the dataset, define the training process, and iterate until it works.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Custom Models
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;When pre-trained models aren't specific enough.&lt;/strong&gt; If you're doing something that generic models haven't seen — like detecting manufacturing defects on a very specific assembly line, or classifying rare medical conditions from proprietary scan data — a pre-trained model won't cut it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For proprietary, domain-specific problems.&lt;/strong&gt; Some industries have data that no public model has ever seen: financial fraud patterns unique to your bank, satellite imagery of your specific crop types, sonar data for underwater pipeline inspection. These need custom models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Tailored performance.&lt;/strong&gt; A custom model trained on &lt;em&gt;your&lt;/em&gt; data for &lt;em&gt;your&lt;/em&gt; problem will almost always outperform a generic one — if you have enough data and expertise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control over training data and bias handling.&lt;/strong&gt; You decide exactly what goes in. You can audit, clean, balance, and de-bias your dataset. With a pre-trained model, you inherit whatever biases were in &lt;em&gt;their&lt;/em&gt; training data — and you often can't even see what that data was.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Cost
&lt;/h3&gt;

&lt;p&gt;Training custom models is expensive in multiple ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data collection and labeling&lt;/strong&gt; — You need thousands to millions of labeled examples. Getting quality labels is slow and costly (think: paying domain experts to annotate medical images).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute&lt;/strong&gt; — Training from scratch needs powerful GPUs/TPUs, often for days or weeks. A single training run for a large model can cost $10,000–$100,000+.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expertise&lt;/strong&gt; — You need ML engineers who understand model architecture, loss functions, hyperparameter tuning, evaluation metrics, overfitting, and deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt; — Months of iteration before you have something production-ready.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;Here's how to actually choose. Ask yourself these questions in order:&lt;/p&gt;

&lt;h3&gt;
  
  
  Question 1: Has someone already solved this?
&lt;/h3&gt;

&lt;p&gt;Search Hugging Face, TensorFlow Hub, PyTorch Hub. If a pre-trained model exists for your exact task — use it. Don't reinvent the wheel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Question 2: Is my problem &lt;em&gt;close&lt;/em&gt; to something that's been solved?
&lt;/h3&gt;

&lt;p&gt;If yes, &lt;strong&gt;start with a pre-trained model and fine-tune it&lt;/strong&gt;. This is the sweet spot for most companies. You get 80% of the benefit of custom training at 10% of the cost.&lt;/p&gt;

&lt;p&gt;Fine-tuning means: take a pre-trained model, freeze most of its layers, and retrain just the last few layers on your specific data. The model keeps its general knowledge but adapts to your task.&lt;/p&gt;
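&lt;p&gt;The idea can be sketched without any ML framework: below, &lt;code&gt;frozen_base&lt;/code&gt; stands in for the pre-trained layers (fixed, never updated) and only a tiny logistic-regression "head" is trained on top. A toy illustration, not a real fine-tuning run:&lt;/p&gt;

```python
import math
import random

def frozen_base(x):
    """Stand-in for the pre-trained layers: maps raw input to features
    and is never updated during fine-tuning."""
    return [x, math.sin(x), x * x]

def train_head(data, epochs=500, lr=0.1):
    """Train a logistic-regression head on top of the frozen features."""
    rng = random.Random(0)
    w = [rng.uniform(-0.1, 0.1) for _ in range(3)]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_base(x)
            z = sum(wi * fi for wi, fi in zip(w, feats)) + b
            p = 1 / (1 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * fi for wi, fi in zip(w, feats)]
            b = b - lr * g
    return w, b

def predict(x, w, b):
    z = sum(wi * fi for wi, fi in zip(w, frozen_base(x))) + b
    return 1 if z > 0 else 0

# Tiny labeled dataset: class 1 when x is clearly above 1
data = [(-2, 0), (-1, 0), (0, 0), (2, 1), (3, 1), (4, 1)]
w, b = train_head(data)
print([predict(x, w, b) for x, _ in data])  # the head learns the boundary
```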

&lt;h3&gt;
  
  
  Question 3: Is my problem truly unique?
&lt;/h3&gt;

&lt;p&gt;If your data is proprietary, your domain is narrow, no pre-trained model comes close, and you have the budget and expertise — then train custom.&lt;/p&gt;

&lt;p&gt;But even here, most teams &lt;strong&gt;start from a pre-trained base&lt;/strong&gt; and do heavy fine-tuning rather than literally training from a random weight initialization.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Spectrum
&lt;/h3&gt;

&lt;p&gt;In practice, it's rarely a binary choice. Think of it as a spectrum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use as-is ← Fine-tune ← Heavy fine-tune ← Train from scratch
(cheapest)                                    (most expensive)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most real-world ML projects land somewhere in the middle.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Terms You Should Know
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Transfer Learning&lt;/strong&gt; — Taking a model trained on one task and adapting it for a different (but related) task. The core technique that makes pre-trained models so powerful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; — Retraining some or all layers of a pre-trained model on your specific dataset. Keeps general knowledge, adds specialized knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature Extraction&lt;/strong&gt; — Using a pre-trained model as a fixed "feature extractor" — you feed data through it, take the intermediate representations, and train a simple classifier on top. Even cheaper than fine-tuning.&lt;/p&gt;
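&lt;p&gt;A minimal sketch of the feature-extraction pattern. Everything here is invented for illustration — a toy "extractor" (two cheap text statistics) stands in for a pre-trained network's intermediate layer, and a nearest-centroid classifier stands in for the simple model trained on top:&lt;/p&gt;

```python
from collections import defaultdict
import math

def extract_features(text):
    """Frozen 'extractor' stand-in: in a real pipeline this would be a
    pre-trained network's intermediate representation; here, it's just
    (word count, average word length)."""
    words = text.lower().split()
    return (len(words), sum(len(w) for w in words) / max(len(words), 1))

def train_centroids(examples):
    """Train ONLY the classifier on top: the mean feature vector per label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for text, label in examples:
        f = extract_features(text)
        s = sums[label]
        s[0] += f[0]; s[1] += f[1]; s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}

def predict(centroids, text):
    """Classify by nearest centroid in feature space."""
    f = extract_features(text)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))
```

&lt;p&gt;The extractor never changes — only the tiny classifier is "trained" — which is exactly why this approach is so cheap.&lt;/p&gt;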

&lt;p&gt;&lt;strong&gt;Hyperparameters&lt;/strong&gt; — Settings you choose &lt;em&gt;before&lt;/em&gt; training starts: learning rate, batch size, number of epochs, model architecture. Getting these right is critical and often requires experimentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overfitting&lt;/strong&gt; — When a model performs great on training data but poorly on new data. More common with small datasets and complex models. Regularization, dropout, and data augmentation help prevent it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Augmentation&lt;/strong&gt; — Artificially expanding your dataset by creating modified versions of existing data (rotating images, adding noise, paraphrasing text). Especially useful when you have limited training data.&lt;/p&gt;
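&lt;p&gt;For text, even crude augmentations like word dropout and adjacent-word swaps can multiply a small dataset. A sketch of the idea — the specific transformations and the 50/50 split are arbitrary choices for illustration, not a recommendation:&lt;/p&gt;

```python
import random

def augment(sentence, n_variants=3, seed=0):
    """Expand a dataset by creating modified copies of a sentence:
    random word dropout or adjacent-word swaps (a toy text analogue
    of rotating or noising images)."""
    rng = random.Random(seed)          # seeded for reproducibility
    words = sentence.split()
    variants = []
    for _ in range(n_variants):
        w = words[:]
        if rng.random() < 0.5 and len(w) > 2:
            del w[rng.randrange(len(w))]       # drop one word
        else:
            i = rng.randrange(len(w) - 1)
            w[i], w[i + 1] = w[i + 1], w[i]    # swap two neighbours
        variants.append(" ".join(w))
    return variants
```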

&lt;p&gt;&lt;strong&gt;Model Hub&lt;/strong&gt; — Platforms that host thousands of pre-trained models ready to download: Hugging Face, TensorFlow Hub, PyTorch Hub, AWS SageMaker JumpStart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; — Using a trained model to make predictions on new data. This is what happens in production. Different from training — inference is cheap and fast on a per-request basis.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mistake 1: Training from scratch when a pre-trained model exists.&lt;/strong&gt; This is the most common waste of time and money in ML projects. Always search first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake 2: Using a pre-trained model without evaluation.&lt;/strong&gt; Just because a model is on Hugging Face doesn't mean it works for &lt;em&gt;your&lt;/em&gt; data. Always benchmark it on your own test set before shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake 3: Ignoring bias.&lt;/strong&gt; Pre-trained models inherit biases from their training data. If you're deploying in a sensitive domain (hiring, lending, healthcare), you must audit for fairness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake 4: Underestimating data quality.&lt;/strong&gt; A custom model is only as good as its training data. Garbage in, garbage out — no amount of GPU power fixes bad labels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistake 5: Skipping the "boring" baseline.&lt;/strong&gt; Before reaching for deep learning, try a simple approach (logistic regression, decision tree, keyword matching). Sometimes it's 90% as good at 1% of the complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pre-trained models&lt;/strong&gt; are great for common tasks and faster deployment. They save you time and money, and let you benefit from massive datasets and research you couldn't replicate yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom models&lt;/strong&gt; offer flexibility for solving unique or proprietary problems. They give you full control over performance and bias, but at significantly higher cost and complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing the right source&lt;/strong&gt; depends on your goal, your data, and your resources. Most teams start with pre-trained, fine-tune to their needs, and only train from scratch when they absolutely must.&lt;/p&gt;

&lt;p&gt;The best ML engineers aren't the ones who build everything from zero — they're the ones who know &lt;em&gt;when not to&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this helped, drop a reaction. More AWS notes coming soon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What is RAG? A Beginner's Guide to Retrieval-Augmented Generation (With a Full Pipeline Walkthrough)</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sat, 18 Apr 2026 09:21:50 +0000</pubDate>
      <link>https://dev.to/egepakten/what-is-rag-a-beginners-guide-to-retrieval-augmented-generation-with-a-full-pipeline-walkthrough-3djm</link>
      <guid>https://dev.to/egepakten/what-is-rag-a-beginners-guide-to-retrieval-augmented-generation-with-a-full-pipeline-walkthrough-3djm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;If you've ever wondered &lt;em&gt;how&lt;/em&gt; ChatGPT-style apps can suddenly "know" about your company's internal documents, product manuals, or legal files without being retrained, the answer is almost always &lt;strong&gt;RAG&lt;/strong&gt; — Retrieval-Augmented Generation. In this post, we'll break down what RAG is, why it exists, and walk through the full pipeline step-by-step with a real example.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. What is RAG?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; is an AI framework that integrates an &lt;em&gt;information retrieval&lt;/em&gt; component into the generation process of Large Language Models (LLMs) to improve &lt;strong&gt;factuality&lt;/strong&gt; and &lt;strong&gt;relevance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In plain English:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of making the LLM &lt;em&gt;remember&lt;/em&gt; everything, we let it &lt;strong&gt;look things up&lt;/strong&gt; in a knowledge base right before answering.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The term RAG was coined in a &lt;strong&gt;2020 research paper&lt;/strong&gt; by Patrick Lewis et al. ("Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"), presented at NeurIPS 2020. The core insight: combine a &lt;em&gt;parametric&lt;/em&gt; memory (the LLM's weights) with a &lt;em&gt;non-parametric&lt;/em&gt; memory (a searchable document store) — and you get the best of both worlds.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why RAG? The Motivation
&lt;/h2&gt;

&lt;p&gt;Three big problems drove the invention of RAG:&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Limitations
&lt;/h3&gt;

&lt;p&gt;LLMs are frozen snapshots. Once a model is trained, it only knows what was in its training data. It doesn't know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What your company policies say&lt;/li&gt;
&lt;li&gt;What happened after its training cutoff&lt;/li&gt;
&lt;li&gt;What's in your private documents&lt;/li&gt;
&lt;li&gt;What yesterday's sales numbers were&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And even with what it &lt;em&gt;does&lt;/em&gt; know, it can hallucinate confidently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost of Retraining vs. Dynamic Retrieval
&lt;/h3&gt;

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; retrain or fine-tune the model every time your data changes. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retraining a large model can cost tens of thousands to millions of dollars&lt;/li&gt;
&lt;li&gt;It takes days or weeks&lt;/li&gt;
&lt;li&gt;You have to do it &lt;em&gt;again&lt;/em&gt; every time the data updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dynamic retrieval (looking things up at query time) is &lt;strong&gt;vastly&lt;/strong&gt; cheaper and always up-to-date.&lt;/p&gt;

&lt;h3&gt;
  
  
  Need for Grounded, Up-to-Date Knowledge
&lt;/h3&gt;

&lt;p&gt;For regulated industries (finance, healthcare, legal), you can't ship answers that come from "the model's memory." You need answers backed by &lt;strong&gt;sources&lt;/strong&gt; you can cite and audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG addresses all three challenges by decoupling knowledge from the model.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The RAG Pipeline — Step-by-Step With a Real Example
&lt;/h2&gt;

&lt;p&gt;This is the part most tutorials rush through. We're going to slow down.&lt;/p&gt;

&lt;p&gt;Let's use a concrete example. Imagine you're building an &lt;strong&gt;internal developer assistant&lt;/strong&gt; at a company called &lt;em&gt;Acme Corp&lt;/em&gt;. Employees can ask it questions about the engineering handbook, API docs, and on-call runbooks.&lt;/p&gt;

&lt;p&gt;A developer asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How do I rotate the database credentials for the billing service?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's exactly what happens behind the scenes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 1: Indexing (Done Once, Ahead of Time)
&lt;/h3&gt;

&lt;p&gt;Before anyone can ask anything, we need to prepare the knowledge base.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1a — Knowledge Corpus
&lt;/h4&gt;

&lt;p&gt;First, we gather &lt;strong&gt;every document&lt;/strong&gt; we want the assistant to know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The engineering handbook (Markdown files)&lt;/li&gt;
&lt;li&gt;API documentation (HTML + Swagger specs)&lt;/li&gt;
&lt;li&gt;Runbooks (Confluence pages)&lt;/li&gt;
&lt;li&gt;Past incident post-mortems (Google Docs)&lt;/li&gt;
&lt;li&gt;Security policies (PDFs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's say this gives us &lt;strong&gt;8,000 documents&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1b — Document Chunking
&lt;/h4&gt;

&lt;p&gt;An LLM can't efficiently search through a 50-page PDF. And you don't want to return a whole 50-page PDF to the user either — you want the &lt;em&gt;one paragraph&lt;/em&gt; that actually answers their question.&lt;/p&gt;

&lt;p&gt;So we &lt;strong&gt;chunk&lt;/strong&gt; each document into smaller pieces. A common approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;500 tokens per chunk (~300 words)&lt;/li&gt;
&lt;li&gt;50 token overlap between chunks (so we don't split an idea across a boundary)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One chunk in our knowledge base might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Chunk #4729 — Source: runbooks/billing-service.md]
"To rotate database credentials for the billing service:
1. Generate a new password in AWS Secrets Manager.
2. Update the 'billing-db' secret with the new value.
3. Trigger a rolling restart via: kubectl rollout restart deploy/billing.
4. Verify health endpoints return 200 OK.
5. Revoke the old credentials after 24h grace period."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After chunking, our 8,000 documents become maybe &lt;strong&gt;120,000 chunks&lt;/strong&gt;.&lt;/p&gt;
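&lt;p&gt;A minimal version of that chunking step, approximating tokens with whitespace-separated words (a real pipeline would use the embedding model's own tokenizer):&lt;/p&gt;

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks. Tokens are approximated by
    whitespace-separated words here; a real pipeline would count tokens
    with the embedding model's tokenizer."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):   # last window reached the end
            break
    return chunks
```

&lt;p&gt;With a 50-token overlap, the tail of each chunk reappears at the head of the next — that's what keeps an idea from being split across a boundary.&lt;/p&gt;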

&lt;h4&gt;
  
  
  Step 1c — Vector Embeddings
&lt;/h4&gt;

&lt;p&gt;For each chunk, we call an &lt;strong&gt;embedding model&lt;/strong&gt; (like BERT, OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt;, or Cohere's embedder). This turns each chunk into a &lt;strong&gt;vector&lt;/strong&gt; — a list of ~1,536 numbers that represents the meaning of that chunk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Chunk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;4729&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.08&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;536&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;numbers)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 1d — Vector Database
&lt;/h4&gt;

&lt;p&gt;We store all 120,000 of these vectors in a &lt;strong&gt;vector database&lt;/strong&gt; — something like FAISS, Pinecone, Weaviate, Milvus, or Qdrant. The database indexes them so we can search across all of them in &lt;em&gt;milliseconds&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexing is done.&lt;/strong&gt; This usually runs as a background job, and you only re-run it when documents change.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 2: Retrieval (Happens at Query Time)
&lt;/h3&gt;

&lt;p&gt;Now a developer types:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How do I rotate the database credentials for the billing service?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 2a — User Query
&lt;/h4&gt;

&lt;p&gt;The question comes in as plain text.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2b — Query Embedding
&lt;/h4&gt;

&lt;p&gt;We run the &lt;strong&gt;same embedding model&lt;/strong&gt; on the question, producing a query vector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.48&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.87&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is critical: you must embed the query with the &lt;em&gt;same model&lt;/em&gt; you used to embed the chunks, otherwise the vectors live in different spaces and similarity becomes meaningless.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2c — Similarity Search
&lt;/h4&gt;

&lt;p&gt;Now we ask the vector database: &lt;strong&gt;"Which chunks have vectors closest to this query vector?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Closeness is measured with a &lt;strong&gt;similarity metric&lt;/strong&gt;, most commonly &lt;strong&gt;cosine similarity&lt;/strong&gt; — it measures the angle between two vectors. The smaller the angle, the more similar the meaning.&lt;/p&gt;

&lt;p&gt;Under the hood, the database uses &lt;strong&gt;Approximate Nearest Neighbors (ANN)&lt;/strong&gt; tricks to search 120,000 vectors in ~5 milliseconds instead of comparing one by one.&lt;/p&gt;
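&lt;p&gt;Here's the brute-force version of what the vector database is doing. A toy letter-frequency "embedding" stands in for a real model — real systems replace both pieces: the embedding with a trained model, and the linear scan with an ANN index:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': normalized letter frequencies. A stand-in for a
    real embedding model, just to show the search mechanics."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) or 1
    return {c: n / total for c, n in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, chunks, k=3):
    """Brute-force nearest neighbours; a vector DB gets the same answer
    faster with an ANN index."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, reverse=True)[:k]
```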

&lt;h4&gt;
  
  
  Step 2d — Relevant Passages
&lt;/h4&gt;

&lt;p&gt;The database returns the &lt;strong&gt;top-k&lt;/strong&gt; most similar chunks (typically k=3 to k=10). For our query, we might get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Chunk #4729 (score 0.94) — billing-service runbook, credential rotation
2. Chunk #3180 (score 0.89) — AWS Secrets Manager general guide
3. Chunk #5512 (score 0.85) — rolling restart playbook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the passages most likely to contain the answer.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 3: Augmentation
&lt;/h3&gt;

&lt;p&gt;Now we have relevant chunks, but we don't just &lt;em&gt;show&lt;/em&gt; them to the user. We want the LLM to write a nice, synthesized answer using them.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3a — Original Prompt
&lt;/h4&gt;

&lt;p&gt;The user's raw question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How do I rotate the database credentials for the billing service?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3b — Augmented Prompt
&lt;/h4&gt;

&lt;p&gt;We wrap it in a &lt;strong&gt;prompt template&lt;/strong&gt; that injects the retrieved chunks as context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Acme Corp's internal engineering assistant.
Answer the user's question using ONLY the context below.
If the answer isn't in the context, say you don't know.

---CONTEXT---
[Chunk #4729]: To rotate database credentials for the billing
service: 1. Generate a new password in AWS Secrets Manager...
[Chunk #3180]: AWS Secrets Manager allows you to store and
rotate database credentials...
[Chunk #5512]: Rolling restarts can be triggered via kubectl
rollout restart deploy/&amp;lt;service-name&amp;gt;...
---END CONTEXT---

USER QUESTION: How do I rotate the database credentials
for the billing service?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the "A" in RAG — &lt;strong&gt;Augmentation&lt;/strong&gt;. We're augmenting the prompt with retrieved knowledge.&lt;/p&gt;
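&lt;p&gt;In code, the augmentation step is little more than string assembly. A sketch that mirrors the template above (the function name and chunk format are illustrative):&lt;/p&gt;

```python
def build_prompt(question, retrieved_chunks):
    """Assemble the augmented prompt: instructions, retrieved context,
    then the user's question."""
    context = "\n".join(f"[{chunk_id}]: {text}"
                        for chunk_id, text in retrieved_chunks)
    return (
        "You are Acme Corp's internal engineering assistant.\n"
        "Answer the user's question using ONLY the context below.\n"
        "If the answer isn't in the context, say you don't know.\n\n"
        f"---CONTEXT---\n{context}\n---END CONTEXT---\n\n"
        f"USER QUESTION: {question}"
    )
```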




&lt;h3&gt;
  
  
  Phase 4: Generation
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Step 4a — LLM Call
&lt;/h4&gt;

&lt;p&gt;We send the augmented prompt to an LLM (GPT-3/4/5, Claude, LLaMA, T5, Mistral — your choice).&lt;/p&gt;

&lt;p&gt;Because the model has the actual documentation &lt;em&gt;right there in its context window&lt;/em&gt;, it doesn't need to remember anything — it just reads and synthesizes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4b — Final Response
&lt;/h4&gt;

&lt;p&gt;The model produces a grounded answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"To rotate the database credentials for the billing service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a new password in AWS Secrets Manager.&lt;/li&gt;
&lt;li&gt;Update the &lt;code&gt;billing-db&lt;/code&gt; secret with the new value.&lt;/li&gt;
&lt;li&gt;Trigger a rolling restart with &lt;code&gt;kubectl rollout restart deploy/billing&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Verify that the health endpoints return 200 OK.&lt;/li&gt;
&lt;li&gt;Revoke the old credentials after a 24-hour grace period.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Source: runbooks/billing-service.md"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Notice three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The answer is &lt;strong&gt;accurate&lt;/strong&gt; — it comes from your actual docs.&lt;/li&gt;
&lt;li&gt;The answer is &lt;strong&gt;current&lt;/strong&gt; — if you update the runbook, the next query uses the new version. No retraining needed.&lt;/li&gt;
&lt;li&gt;The answer can be &lt;strong&gt;cited&lt;/strong&gt; — you know exactly which document it came from.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;That's the whole RAG pipeline.&lt;/strong&gt; Indexing → Retrieval → Augmentation → Generation.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Retrieval Component in Detail
&lt;/h2&gt;

&lt;p&gt;Three pieces make retrieval work:&lt;/p&gt;

&lt;h3&gt;
  
  
  Embedding Models
&lt;/h3&gt;

&lt;p&gt;The model that turns text into vectors. Examples: &lt;strong&gt;BERT&lt;/strong&gt;, &lt;code&gt;text-embedding-3-small&lt;/code&gt;, Cohere Embed, Sentence-BERT. Choose one that's trained well for your language and domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector Stores
&lt;/h3&gt;

&lt;p&gt;Databases optimized for vector similarity search. Popular options: &lt;strong&gt;FAISS&lt;/strong&gt; (an open-source library from Meta that runs in-process), &lt;strong&gt;Pinecone&lt;/strong&gt; (managed), &lt;strong&gt;Weaviate&lt;/strong&gt;, &lt;strong&gt;Milvus&lt;/strong&gt;, &lt;strong&gt;Qdrant&lt;/strong&gt;, and pgvector (Postgres extension).&lt;/p&gt;

&lt;h3&gt;
  
  
  Similarity Metrics
&lt;/h3&gt;

&lt;p&gt;How we measure "closeness" between vectors. The go-to is &lt;strong&gt;cosine similarity&lt;/strong&gt;, but Euclidean distance and dot product also show up. Cosine similarity is popular because it ignores vector length and focuses on &lt;em&gt;direction&lt;/em&gt; — which is what semantic meaning lives in.&lt;/p&gt;
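&lt;p&gt;The three metrics side by side, in plain Python. Note how scaling a vector leaves cosine similarity untouched but changes both the dot product and the Euclidean distance:&lt;/p&gt;

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.dist(a, b)

def cosine(a, b):
    # Angle-based: normalizes away vector length, keeping only direction.
    return dot(a, b) / (math.hypot(*a) * math.hypot(*b))

v = [1.0, 2.0]
scaled = [2.0, 4.0]   # same direction, twice the length

# cosine(v, scaled) stays ≈ 1.0, while dot and euclidean change with length
```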




&lt;h2&gt;
  
  
  5. Augmentation &amp;amp; Generation in Detail
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt Templates
&lt;/h3&gt;

&lt;p&gt;The structure that tells the LLM how to use the retrieved context. Good templates specify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The assistant's role&lt;/li&gt;
&lt;li&gt;What to do if context is missing&lt;/li&gt;
&lt;li&gt;Output format (JSON, bullet points, prose)&lt;/li&gt;
&lt;li&gt;Citation rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Managing Model Context
&lt;/h3&gt;

&lt;p&gt;The LLM only has so much context window. If retrieval returns 30 chunks but each chunk is 500 tokens, that's 15,000 tokens just for context. You have to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick top-k carefully (more isn't always better)&lt;/li&gt;
&lt;li&gt;Rerank retrieved chunks&lt;/li&gt;
&lt;li&gt;Sometimes summarize chunks before injection&lt;/li&gt;
&lt;/ul&gt;
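&lt;p&gt;A simple way to enforce the budget is a greedy pass over scored chunks, approximating tokens with word counts — a sketch only; real systems count tokens with the LLM's tokenizer and often rerank with a dedicated model first:&lt;/p&gt;

```python
def fit_to_budget(scored_chunks, budget_tokens=4000):
    """Greedily keep the highest-scoring chunks until the context budget
    is spent. Tokens are approximated by word counts here."""
    kept, used = [], 0
    for score, text in sorted(scored_chunks, reverse=True):
        cost = len(text.split())
        if used + cost <= budget_tokens:
            kept.append((score, text))
            used += cost
    return kept
```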

&lt;h3&gt;
  
  
  LLM Choices
&lt;/h3&gt;

&lt;p&gt;Any generative LLM can work: &lt;strong&gt;GPT-3/4/5&lt;/strong&gt;, &lt;strong&gt;T5&lt;/strong&gt;, &lt;strong&gt;LLaMA&lt;/strong&gt;, &lt;strong&gt;Claude&lt;/strong&gt;, &lt;strong&gt;Mistral&lt;/strong&gt;, &lt;strong&gt;Gemini&lt;/strong&gt;. The RAG pipeline is mostly model-agnostic.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Applications and Benefits
&lt;/h2&gt;

&lt;p&gt;RAG is behind a huge number of real-world AI products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge-centric chatbots&lt;/strong&gt; — customer support bots grounded in your docs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document summarization &amp;amp; Q&amp;amp;A&lt;/strong&gt; — ask questions about contracts, research papers, medical records&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise search &amp;amp; knowledge management&lt;/strong&gt; — "Glean for your company" style tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No retraining required when data changes&lt;/li&gt;
&lt;li&gt;Answers are traceable back to sources&lt;/li&gt;
&lt;li&gt;Private data stays in your vector DB — never baked into model weights&lt;/li&gt;
&lt;li&gt;Cheaper than fine-tuning for most use cases&lt;/li&gt;
&lt;li&gt;Can mix multiple knowledge bases with one model&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7. Challenges and Future Directions
&lt;/h2&gt;

&lt;p&gt;RAG isn't magic. Here are the real tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source Reliability &amp;amp; Bias
&lt;/h3&gt;

&lt;p&gt;Garbage in, garbage out. If your knowledge base has outdated or biased content, your RAG system will confidently repeat it. Curation matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Latency &amp;amp; System Complexity
&lt;/h3&gt;

&lt;p&gt;A RAG query is actually: embed → ANN search → rerank → build prompt → LLM call. That's a lot of moving parts. Each step adds latency, and each step can fail. Production RAG systems require serious observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy &amp;amp; Security Safeguards
&lt;/h3&gt;

&lt;p&gt;Your vector DB now contains sensitive content. Access control, encryption, and embedding leakage (yes, embeddings can leak information) all need attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Research Frontiers
&lt;/h3&gt;

&lt;p&gt;Where RAG is heading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-hop retrieval&lt;/strong&gt; — answering questions that need multiple retrieval rounds (e.g., "Find X, then look up Y for X")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adapters&lt;/strong&gt; — lightweight modules that specialize the LLM for using retrieved content better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-improving retrieval&lt;/strong&gt; — the model learns which retrievals helped and which didn't&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up — The TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is RAG?&lt;/strong&gt;&lt;br&gt;
A framework that lets an LLM look things up in a knowledge base before answering, so its responses are grounded in real, current, specific information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does it exist?&lt;/strong&gt;&lt;br&gt;
Because retraining is expensive, LLMs hallucinate, and most real-world apps need to answer from &lt;em&gt;your&lt;/em&gt; data — not what the model memorized during training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the pipeline work?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Indexing&lt;/strong&gt; — Chunk documents → embed each chunk → store vectors in a vector DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; — Embed the user query → find nearest chunks with cosine similarity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augmentation&lt;/strong&gt; — Inject retrieved chunks into a prompt template.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt; — Send augmented prompt to an LLM → return grounded answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Where do you use it?&lt;/strong&gt;&lt;br&gt;
Anywhere you need an AI that can answer from your own content: internal docs, product support, legal research, medical Q&amp;amp;A, academic search, knowledge management, developer assistants, and more.&lt;/p&gt;

&lt;p&gt;Once you understand the four-phase pipeline — &lt;strong&gt;Indexing, Retrieval, Augmentation, Generation&lt;/strong&gt; — every RAG system you encounter becomes a variation on the same theme.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this helped you finally "get" RAG, drop a reaction. More notes coming soon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Embeddings Explained: The Secret Language AI Uses to Understand the World</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sat, 18 Apr 2026 09:20:32 +0000</pubDate>
      <link>https://dev.to/egepakten/embeddings-explained-the-secret-language-ai-uses-to-understand-the-world-3e0o</link>
      <guid>https://dev.to/egepakten/embeddings-explained-the-secret-language-ai-uses-to-understand-the-world-3e0o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;If you've ever wondered how ChatGPT "knows" that &lt;em&gt;king&lt;/em&gt; and &lt;em&gt;queen&lt;/em&gt; are related, or how Spotify recommends songs you actually like, the answer is almost always the same: &lt;strong&gt;embeddings&lt;/strong&gt;. This post breaks down what embeddings are, how they work, where they're used, and what you can actually &lt;em&gt;do&lt;/em&gt; with them — no PhD required.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;At their core, &lt;strong&gt;embeddings are just numbers&lt;/strong&gt; — more specifically, a list of numbers (a vector) that represents something like a word, a sentence, an image, or even a user.&lt;/p&gt;

&lt;p&gt;Computers don't understand the word "cat." They understand numbers. So we need a way to turn "cat" into numbers &lt;em&gt;in a way that preserves its meaning&lt;/em&gt;. That's what an embedding does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"cat"&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.44&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;(e.g.,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;numbers)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"dog"&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.41&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.06&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"car"&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;-0.72&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;-0.44&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how &lt;code&gt;cat&lt;/code&gt; and &lt;code&gt;dog&lt;/code&gt; have similar-looking numbers, while &lt;code&gt;car&lt;/code&gt; looks very different. That's not an accident — it's the whole point. &lt;strong&gt;Similar meanings produce similar vectors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key idea:&lt;/strong&gt; Embeddings are a way of placing concepts on a giant invisible map, where things that mean similar things end up close together, and things that mean different things end up far apart.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. How Do Embeddings Work?
&lt;/h2&gt;

&lt;p&gt;Embeddings don't appear out of nowhere. They're &lt;em&gt;learned&lt;/em&gt; by a model during training. There are three core mechanisms worth understanding:&lt;/p&gt;

&lt;h3&gt;
  
  
  a) Self-Supervised Contrastive Learning
&lt;/h3&gt;

&lt;p&gt;The model looks at massive amounts of raw data (text, images, etc.) and learns by playing a game: &lt;strong&gt;"pull similar things together, push dissimilar things apart."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, during training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A sentence and a slightly rephrased version of it → should be &lt;em&gt;close&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;A sentence about cats and a sentence about quantum physics → should be &lt;em&gt;far apart&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No human has to label anything. The model figures it out from the structure of the data itself. That's the "self-supervised" part.&lt;/p&gt;

&lt;h3&gt;
  
  
  b) Contextual Embeddings
&lt;/h3&gt;

&lt;p&gt;Older embeddings gave every word a &lt;em&gt;single&lt;/em&gt; fixed vector. That's a problem, because words can mean different things in different contexts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I deposited money at the &lt;strong&gt;bank&lt;/strong&gt;." (financial institution)&lt;/li&gt;
&lt;li&gt;"We had a picnic by the river &lt;strong&gt;bank&lt;/strong&gt;." (side of a river)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern embeddings (like those from BERT or GPT) generate a &lt;em&gt;different&lt;/em&gt; vector depending on the surrounding words. The model reads the whole sentence first, then decides what "bank" means &lt;em&gt;here&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  c) Dimensionality Reduction
&lt;/h3&gt;

&lt;p&gt;Raw data (like a full image or a giant sparse word matrix) has way too many numbers. Embeddings compress this into a smaller, dense, meaningful representation — typically &lt;strong&gt;256, 512, 768, or 1536 dimensions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it like writing a movie review: instead of describing every pixel in every frame, you capture the essence in a paragraph.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Embeddings Deep Dive
&lt;/h2&gt;

&lt;p&gt;Let's go one layer deeper. Three properties make embeddings actually useful:&lt;/p&gt;

&lt;h3&gt;
  
  
  Mapping to a Vector Space
&lt;/h3&gt;

&lt;p&gt;Every piece of data becomes a point in a multi-dimensional space. You can't visualize 768 dimensions, but you can imagine a 3D version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        cat •
   dog •
         • kitten
                               • airplane
                                      • rocket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cats, dogs, and kittens cluster together. Airplanes and rockets cluster together. The space itself has &lt;strong&gt;meaning baked into distance and direction&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Preserving Semantic Relationships
&lt;/h3&gt;

&lt;p&gt;The famous example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;king - man + woman ≈ queen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can literally do math on meanings. This works because "royalty," "gender," and other concepts become &lt;em&gt;directions&lt;/em&gt; in the embedding space.&lt;/p&gt;
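&lt;p&gt;You can see the idea with hand-made toy vectors. The dimension labels below are invented for illustration; a real model learns such directions on its own, smeared across hundreds of dimensions:&lt;/p&gt;

```python
# Toy 3-d vectors. Dimension 0 loosely encodes "royalty", dimension 1 "male",
# dimension 2 "female". Hand-made for illustration only.
king  = [0.9, 0.7, 0.1]
man   = [0.1, 0.7, 0.1]
woman = [0.1, 0.1, 0.7]
queen = [0.9, 0.1, 0.7]

# king - man + woman, element by element
result = [k - m + w for k, m, w in zip(king, man, woman)]
# result lands (up to float rounding) on our toy "queen" vector
```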

&lt;h3&gt;
  
  
  Efficient Processing
&lt;/h3&gt;

&lt;p&gt;Once everything is a vector, you can do fast operations on millions or billions of items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare two things? → compute cosine similarity (essentially one dot product)&lt;/li&gt;
&lt;li&gt;Find the nearest match? → use Approximate Nearest Neighbors (ANN)&lt;/li&gt;
&lt;li&gt;Cluster similar items? → run k-means&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why embeddings power huge real-world systems.&lt;/p&gt;
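&lt;p&gt;The first of those operations fits in a few lines. A sketch in plain Python with tiny hand-made vectors (production code would use NumPy or a vector database for speed):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """One dot product and two norms. On pre-normalized vectors this
    collapses to just the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, dog))  # high: roughly 0.99
print(cosine_similarity(cat, car))  # low: roughly 0.30
```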




&lt;h2&gt;
  
  
  4. Types of Embeddings
&lt;/h2&gt;

&lt;p&gt;Not all embeddings are created equal. Here are the major families:&lt;/p&gt;

&lt;h3&gt;
  
  
  Static Word Embeddings — Word2Vec, GloVe
&lt;/h3&gt;

&lt;p&gt;These were the breakthrough that started it all. Each word gets exactly one vector, learned from how words co-occur in giant text corpora.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Fast, simple, very cheap to use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Can't handle context ("bank" is always the same vector).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Contextual Embeddings — ELMo, BERT
&lt;/h3&gt;

&lt;p&gt;These read the whole sentence and produce a vector for each word &lt;em&gt;in context&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Much more accurate for real language understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Heavier to compute, need a bigger model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sentence / Document Embeddings — Universal Sentence Encoder, Sentence-BERT
&lt;/h3&gt;

&lt;p&gt;Instead of one vector per word, you get one vector for an entire sentence, paragraph, or document. Super useful for search, clustering, and classification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal Embeddings — CLIP
&lt;/h3&gt;

&lt;p&gt;These put text &lt;em&gt;and&lt;/em&gt; images in the &lt;em&gt;same&lt;/em&gt; vector space. A photo of a beach and the sentence "a sunny day at the ocean" end up close together. This is what powers most modern image search and text-to-image tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Key Use Cases — Where Embeddings Actually Shine
&lt;/h2&gt;

&lt;p&gt;This is the "so what" section. Here's what you can &lt;em&gt;build&lt;/em&gt; with embeddings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Search
&lt;/h3&gt;

&lt;p&gt;Forget keyword matching. With embeddings, a user can search for:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How do I stop my laptop from overheating?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…and you can return a document that says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Thermal management tips for portable computers"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No shared keywords, but the meaning is almost identical — and the vectors are close. This is the foundation of modern search, documentation bots, and &lt;strong&gt;RAG (Retrieval Augmented Generation)&lt;/strong&gt;.&lt;/p&gt;
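&lt;p&gt;A minimal sketch of that ranking step. Here &lt;code&gt;embed()&lt;/code&gt; is a stub that looks up hand-made toy vectors; in a real system it would call an embedding model such as Sentence-BERT or a hosted API:&lt;/p&gt;

```python
import math

# Stand-in for a real embedding model: hand-made toy vectors.
TOY_VECTORS = {
    "How do I stop my laptop from overheating?":      [0.9, 0.8, 0.1],
    "Thermal management tips for portable computers": [0.85, 0.75, 0.2],
    "Best hiking trails in the Alps":                 [0.1, 0.2, 0.9],
}

def embed(text):
    return TOY_VECTORS[text]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query, documents):
    """Return the document whose embedding is closest to the query's."""
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

docs = ["Thermal management tips for portable computers",
        "Best hiking trails in the Alps"]
best = search("How do I stop my laptop from overheating?", docs)
# best is the thermal-management document, despite zero shared keywords
```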

&lt;h3&gt;
  
  
  Clustering and Recommendation
&lt;/h3&gt;

&lt;p&gt;Group similar items automatically. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix grouping movies you'd like based on what you've watched&lt;/li&gt;
&lt;li&gt;Spotify building "Discover Weekly" playlists&lt;/li&gt;
&lt;li&gt;Customer segmentation for marketing&lt;/li&gt;
&lt;li&gt;Automatically grouping support tickets by topic&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Anomaly Detection
&lt;/h3&gt;

&lt;p&gt;If everything "normal" clusters in one region of the vector space, then anything far away from that cluster is probably weird. This is used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Credit card fraud detection&lt;/li&gt;
&lt;li&gt;Network intrusion detection&lt;/li&gt;
&lt;li&gt;Spotting defective products on factory lines&lt;/li&gt;
&lt;li&gt;Finding unusual user behavior&lt;/li&gt;
&lt;/ul&gt;
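&lt;p&gt;A hedged sketch of the simplest version: model "normal" as the centroid of known-good vectors and flag anything too far away. The threshold below is made up for illustration; real systems tune it against labeled incidents:&lt;/p&gt;

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_anomaly(vector, normal_vectors, threshold=0.5):
    """Flag anything far from the center of 'normal' behavior."""
    return distance(vector, centroid(normal_vectors)) > threshold

normal = [[0.9, 0.1], [0.85, 0.15], [0.95, 0.05]]
print(is_anomaly([0.9, 0.1], normal))  # typical point, inside the cluster
print(is_anomaly([0.1, 0.9], normal))  # far from the cluster
```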

&lt;h3&gt;
  
  
  Classification
&lt;/h3&gt;

&lt;p&gt;Train a lightweight classifier on top of embeddings for things like spam detection, sentiment analysis, or intent recognition. You can get strong accuracy with surprisingly little labeled data, because the embeddings already encode most of the meaning.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Properties and Best Practices
&lt;/h2&gt;

&lt;p&gt;If you're going to actually use embeddings, here are the things that matter in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Normalize Your Vectors
&lt;/h3&gt;

&lt;p&gt;Most similarity math works better when vectors are normalized to length 1. This means you're comparing &lt;strong&gt;direction&lt;/strong&gt;, not magnitude — which is usually what you want semantically.&lt;/p&gt;
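&lt;p&gt;Normalization itself is one line of math. A plain-Python sketch (NumPy or your vector database will usually do this for you):&lt;/p&gt;

```python
import math

def normalize(v):
    """Scale a vector to unit length, so comparisons measure direction only."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

v = [3.0, 4.0]    # length 5
u = normalize(v)  # [0.6, 0.8], length 1
```

&lt;p&gt;A bonus: once everything is unit length, cosine similarity reduces to a plain dot product, the cheapest comparison you can make.&lt;/p&gt;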

&lt;h3&gt;
  
  
  Pick the Right Dimensionality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smaller (128–384):&lt;/strong&gt; Faster, cheaper storage, less memory. Good for mobile or massive-scale systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Larger (768–1536+):&lt;/strong&gt; More expressive, better accuracy, higher cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no free lunch. Start small, go bigger only if quality suffers.&lt;/p&gt;
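&lt;p&gt;The cost difference is easy to estimate. A back-of-envelope sketch for raw float32 storage (index structures and metadata come on top of this):&lt;/p&gt;

```python
def storage_gb(num_vectors, dims, bytes_per_value=4):
    """Raw float32 storage in GB, ignoring index overhead and metadata."""
    return num_vectors * dims * bytes_per_value / 1e9

small = storage_gb(10_000_000, 384)   # 10M docs at 384 dims: about 15 GB
large = storage_gb(10_000_000, 1536)  # same docs at 1536 dims: about 61 GB
```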

&lt;h3&gt;
  
  
  Use Proper Indexing
&lt;/h3&gt;

&lt;p&gt;If you have millions of vectors, you can't compare them one by one. Use a vector database or library:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FAISS&lt;/strong&gt; (Facebook)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pinecone&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weaviate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Milvus&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qdrant&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These use tricks like ANN (Approximate Nearest Neighbors) to search billions of vectors in milliseconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Match the Embedding Model to Your Task
&lt;/h3&gt;

&lt;p&gt;A general-purpose embedding model is fine to start. But for specialized domains (medical, legal, code), a fine-tuned or domain-specific model can substantially improve retrieval accuracy.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Challenges and Limitations
&lt;/h2&gt;

&lt;p&gt;Embeddings are powerful, but they are not magic. Know the tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory and Compute
&lt;/h3&gt;

&lt;p&gt;Storing a billion 1536-dimensional float vectors is not cheap. High-dimensional search can get expensive quickly. You'll eventually need to think about quantization, sharding, and cost.&lt;/p&gt;
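&lt;p&gt;Quantization is the usual first lever. A hedged sketch of the simplest form, scalar int8 quantization, which stores each float32 value in one signed byte for a 4x saving at a small accuracy cost (production systems often use fancier schemes such as product quantization):&lt;/p&gt;

```python
def quantize_int8(vector):
    """Scalar quantization: map each float to a signed byte in [-127, 127]."""
    scale = max(abs(x) for x in vector) / 127
    return [round(x / scale) for x in vector], scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return [x * scale for x in q]

q, scale = quantize_int8([0.5, -0.25, 1.0])
approx = dequantize(q, scale)
# approx is close to the original vector, at a quarter of the storage
```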

&lt;h3&gt;
  
  
  Privacy and Data Leakage
&lt;/h3&gt;

&lt;p&gt;Here's something that surprises most people: &lt;strong&gt;embeddings can leak information&lt;/strong&gt;. Even though a vector looks like "just numbers," research has shown attackers can sometimes &lt;em&gt;reconstruct&lt;/em&gt; or &lt;em&gt;infer&lt;/em&gt; parts of the original text from an embedding ("embedding inversion attacks").&lt;/p&gt;

&lt;p&gt;If you're embedding sensitive data (medical records, private messages, internal docs), treat the embeddings themselves as sensitive and protect them like you would the raw data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpretability
&lt;/h3&gt;

&lt;p&gt;A 1536-dimensional vector is a black box. You can't easily explain &lt;em&gt;why&lt;/em&gt; two things are close. For regulated industries (finance, healthcare, EU AI Act compliance), this is a real concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bias
&lt;/h3&gt;

&lt;p&gt;Embeddings learn from data, and data contains human biases. If your training text associates certain jobs with certain genders, your embeddings will too — and any downstream system will inherit that bias.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Future Directions
&lt;/h2&gt;

&lt;p&gt;Where is this all heading?&lt;/p&gt;

&lt;h3&gt;
  
  
  Hierarchical Embeddings
&lt;/h3&gt;

&lt;p&gt;Instead of one flat vector, future systems will learn representations at multiple levels — word → sentence → paragraph → document — all connected, all meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Continual and Federated Learning
&lt;/h3&gt;

&lt;p&gt;Today, most embedding models are trained once and frozen. The future is models that &lt;strong&gt;keep learning safely&lt;/strong&gt;, updating over time without forgetting old knowledge — and learning across devices (federated learning) without centralizing private data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Richer Multimodal Embeddings
&lt;/h3&gt;

&lt;p&gt;Text + image is just the beginning. Expect models that unify &lt;strong&gt;text, image, audio, video, sensor data, and 3D scenes&lt;/strong&gt; all in the same space. Search "the sound of rain on a metal roof" and get back audio clips &lt;em&gt;and&lt;/em&gt; matching videos.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up — The TL;DR
&lt;/h2&gt;

&lt;p&gt;Let's tie it all together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is an embedding?&lt;/strong&gt;&lt;br&gt;
A list of numbers that represents the &lt;em&gt;meaning&lt;/em&gt; of something (a word, image, sentence, user, product) in a way a computer can work with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where are embeddings used?&lt;/strong&gt;&lt;br&gt;
Semantic search, RAG systems, recommendations, clustering, anomaly detection, fraud detection, classification, and multimodal search — basically anywhere you need a machine to understand "similarity" or "meaning."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can you actually do with them?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a search engine that understands meaning, not just keywords&lt;/li&gt;
&lt;li&gt;Power a chatbot with RAG using your own documents&lt;/li&gt;
&lt;li&gt;Detect fraud, spam, or defects&lt;/li&gt;
&lt;li&gt;Group customers, songs, movies, or articles automatically&lt;/li&gt;
&lt;li&gt;Search images with text, or text with images&lt;/li&gt;
&lt;li&gt;Add semantic understanding to almost any existing product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Embeddings are the quiet backbone of almost every modern AI system. You won't see them in the UI — but they're doing most of the real work behind the scenes. Once you understand embeddings, a huge amount of what seems "magical" about modern AI suddenly makes sense.&lt;/p&gt;




&lt;p&gt;If this post helped embeddings finally click for you, drop a reaction.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>AI Agents Explained: 5 Types, Components, Frameworks, and Real-World Use Cases</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sun, 12 Apr 2026 09:55:27 +0000</pubDate>
      <link>https://dev.to/egepakten/ai-agents-explained-5-types-components-frameworks-and-real-world-use-cases-52i4</link>
      <guid>https://dev.to/egepakten/ai-agents-explained-5-types-components-frameworks-and-real-world-use-cases-52i4</guid>
<description>&lt;blockquote&gt;
&lt;p&gt;AI agents are no longer just chatbots. They think, plan, use tools, and work together to solve complex problems autonomously. In this post, I break down the 5 types of AI agents, how they work, ReAct vs ReWOO frameworks, multi-agent systems, and guardrails — written so anyone can understand.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction: AI Doesn't Just "Answer" Anymore
&lt;/h2&gt;

&lt;p&gt;For most of us, our first experience with AI was simple: ask a question, get an answer, done. But there is a quiet revolution happening. &lt;strong&gt;AI agents&lt;/strong&gt; are systems that don't just respond — they &lt;strong&gt;think&lt;/strong&gt;, &lt;strong&gt;plan&lt;/strong&gt;, &lt;strong&gt;use tools&lt;/strong&gt;, and even &lt;strong&gt;talk to each other&lt;/strong&gt; to solve complex tasks without human hand-holding.&lt;/p&gt;

&lt;p&gt;This post is based on notes I took from IBM's "What Are AI Agents?" page. My goal is to write something clear enough that even someone who has never heard the term "AI agent" can walk away understanding the full picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is an AI Agent?
&lt;/h2&gt;

&lt;p&gt;The simplest definition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An &lt;strong&gt;AI agent&lt;/strong&gt; is an AI tool that can &lt;strong&gt;autonomously&lt;/strong&gt; perform complex tasks that would otherwise require human involvement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The key word is "autonomously." A regular chatbot needs you to tell it what to do at every step. An AI agent takes a &lt;strong&gt;goal&lt;/strong&gt;, &lt;strong&gt;plans&lt;/strong&gt; how to reach it on its own, &lt;strong&gt;selects&lt;/strong&gt; the right tools, and delivers the result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic vs Non-Agentic: What's the Difference?
&lt;/h2&gt;

&lt;p&gt;Not every AI is an "agent." This distinction matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-Agentic AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No tools — can only say what it already knows&lt;/li&gt;
&lt;li&gt;No memory — doesn't remember previous conversations&lt;/li&gt;
&lt;li&gt;Limited reasoning — can't plan ahead&lt;/li&gt;
&lt;li&gt;Needs constant human input to function&lt;/li&gt;
&lt;li&gt;Example: A basic FAQ chatbot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agentic AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has access to tools — can search the web, call APIs, read files&lt;/li&gt;
&lt;li&gt;Has memory — remembers past interactions and learns from them&lt;/li&gt;
&lt;li&gt;Performs reasoning — creates step-by-step plans to reach goals&lt;/li&gt;
&lt;li&gt;Works autonomously — minimal human intervention needed&lt;/li&gt;
&lt;li&gt;Example: A hospital agent that handles insurance authorizations end to end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it this way: non-agentic AI is a &lt;strong&gt;calculator&lt;/strong&gt; — press a button, get a result. Agentic AI is an &lt;strong&gt;accountant&lt;/strong&gt; — say "prepare my tax return" and it figures out the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3 Core Components of Every AI Agent
&lt;/h2&gt;

&lt;p&gt;No matter how simple or complex, every AI agent runs on three building blocks:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Goal (from the User)
&lt;/h3&gt;

&lt;p&gt;Everything starts with a goal. The user tells the agent what needs to happen: "get insurance approval for this patient" or "find broken links on my website and report them." The agent takes this goal and performs &lt;strong&gt;task decomposition&lt;/strong&gt; — breaking it into smaller, manageable subtasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tools
&lt;/h3&gt;

&lt;p&gt;Agents rarely have all the information they need on their own. So they reach out to external tools. These can be web search engines, APIs (weather, stock market, health records), databases, or even other AI agents (yes, an agent can use another agent as a tool).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Agentic Reasoning
&lt;/h3&gt;

&lt;p&gt;This is the agent's "brain." It evaluates the information it perceives, uses its memory, and selects the best action to move toward the goal. It uses conditional logic, heuristics, and feedback loops to make decisions continuously.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 Types of AI Agents
&lt;/h2&gt;

&lt;p&gt;According to IBM, there are 5 distinct types, from simplest to most sophisticated:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Simple Reflex Agent
&lt;/h3&gt;

&lt;p&gt;The most basic type. It works on "if X happens, do Y" rules. No memory, no planning — just pre-programmed reflexes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; A night light that turns on automatically when the room gets dark. Sense → react, nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A temperature sensor that turns on the AC when it hits 30°C. It doesn't know &lt;em&gt;why&lt;/em&gt; the temperature rose — it just follows the rule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; If it encounters a situation it has no rule for, it's stuck.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Model-Based Reflex Agent
&lt;/h3&gt;

&lt;p&gt;One step above simple reflex. The key difference: it &lt;strong&gt;has memory&lt;/strong&gt;. It maintains a model of its environment and can operate in changing, partially observable settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; A robot vacuum. It maps the room, remembers which areas it already cleaned, and navigates around furniture. It won't re-clean the same spot because it &lt;em&gt;remembers&lt;/em&gt; it was already done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; Still rule-based — smarter, but can't go beyond its predefined rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Goal-Based Agent
&lt;/h3&gt;

&lt;p&gt;This is where things get serious. This agent doesn't just react — it has a &lt;strong&gt;goal&lt;/strong&gt; and uses &lt;strong&gt;planning&lt;/strong&gt; and &lt;strong&gt;reasoning&lt;/strong&gt; to achieve it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; GPS navigation. When you say "take me to the airport," it doesn't just look at the current street. It examines the map, calculates traffic, evaluates alternative routes, and picks the best path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A customer service agent. Goal: "resolve the customer's issue." The agent understands the problem, searches a knowledge base, escalates if needed — always moving toward the goal.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Utility-Based Agent
&lt;/h3&gt;

&lt;p&gt;Takes goal-based thinking one step further. This agent doesn't just achieve the goal — it aims to achieve it in the &lt;strong&gt;best possible way&lt;/strong&gt;. It uses a &lt;strong&gt;utility function&lt;/strong&gt; to evaluate different actions and pick the one that maximizes overall benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; An investment advisor. Doesn't just say "make money." It considers your risk tolerance, market conditions, time horizon, and recommends the strategy with the &lt;strong&gt;best balance&lt;/strong&gt; of risk and reward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A logistics optimization agent. There are many ways to deliver a package, but this agent calculates the cheapest, fastest, and least risky route.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Learning Agent
&lt;/h3&gt;

&lt;p&gt;The most advanced type. It has all the capabilities of the other types, plus the ability to &lt;strong&gt;learn&lt;/strong&gt;. It improves with every experience, updates its knowledge base, and makes better decisions over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world analogy:&lt;/strong&gt; A medical resident. On day one, they're inexperienced, but with every patient interaction they learn. Years later, their diagnoses are far more accurate because of accumulated experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; An e-commerce recommendation engine. At first, it recommends the same products to everyone. Over time, it learns individual preferences and delivers personalized suggestions.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Agent Frameworks: ReAct vs ReWOO
&lt;/h2&gt;

&lt;p&gt;How do agents actually "think"? Different architectural approaches have been developed. Two stand out:&lt;/p&gt;

&lt;h3&gt;
  
  
  ReAct (Reasoning + Acting)
&lt;/h3&gt;

&lt;p&gt;ReAct follows a "Think → Act → Observe" loop. The agent pauses at each step, thinks, and plans its next move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent looks at the goal and thinks: "I need this information to solve this"&lt;/li&gt;
&lt;li&gt;Uses a tool (e.g., web search)&lt;/li&gt;
&lt;li&gt;Observes the tool's result&lt;/li&gt;
&lt;li&gt;Thinks again: "Is this enough, or do I need more?"&lt;/li&gt;
&lt;li&gt;Loop continues until the goal is met&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Advantage:&lt;/strong&gt; Highly flexible — can adapt its strategy at every step and handle unexpected results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantage:&lt;/strong&gt; The "thinking" at each step burns extra tokens and takes time. Higher cost and latency.&lt;/p&gt;
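&lt;p&gt;The loop itself is simple to sketch. Everything below is a stub: &lt;code&gt;think&lt;/code&gt; stands in for an LLM call and &lt;code&gt;act&lt;/code&gt; for a real tool, but the Think → Act → Observe control flow is the actual ReAct pattern:&lt;/p&gt;

```python
def think(goal, observations):
    """Stub reasoning step: a real agent would ask an LLM what to do next."""
    if not observations:
        return ("search", goal)           # no information yet: go get some
    return ("finish", observations[-1])   # enough information: answer

def act(action, arg):
    """Stub tool call: a real agent would hit a search API, database, etc."""
    if action == "search":
        return f"search results for: {arg}"

def react(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):               # Think → Act → Observe, repeated
        action, arg = think(goal, observations)
        if action == "finish":
            return arg
        observations.append(act(action, arg))  # observe the tool's result
    return "gave up"

answer = react("find cheapest flight to Berlin")
```

&lt;p&gt;Note that &lt;code&gt;think&lt;/code&gt; runs on every iteration; that repeated reasoning is exactly where the extra tokens and latency come from.&lt;/p&gt;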

&lt;h3&gt;
  
  
  ReWOO (Reasoning Without Observation)
&lt;/h3&gt;

&lt;p&gt;ReWOO follows a "plan first, execute later" approach. The agent does all its thinking &lt;strong&gt;upfront&lt;/strong&gt;, creates a complete plan, and then executes the steps sequentially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent looks at the goal and generates a &lt;strong&gt;complete plan&lt;/strong&gt;: "Step 1 is this, Step 2 is this, Step 3 is this"&lt;/li&gt;
&lt;li&gt;Executes the plan in order — doesn't rethink at each step&lt;/li&gt;
&lt;li&gt;Collects results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Advantage:&lt;/strong&gt; Uses approximately &lt;strong&gt;80% fewer tokens&lt;/strong&gt; than ReAct. Much cheaper and faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantage:&lt;/strong&gt; Struggles when it has limited context about its environment. Harder to handle unexpected situations since the plan was made upfront.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Should You Use?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic, unpredictable environments&lt;/strong&gt; (customer service, chat, research) → &lt;strong&gt;ReAct&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Well-defined, repetitive tasks&lt;/strong&gt; (data processing, report generation, batch operations) → &lt;strong&gt;ReWOO&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Multi-Agent Systems
&lt;/h2&gt;

&lt;p&gt;A single agent is powerful, but some problems need a team. In &lt;strong&gt;multi-agent&lt;/strong&gt; systems, multiple specialized agents work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why isn't one agent enough?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of a human team: one person can't be the researcher, editor, &lt;em&gt;and&lt;/em&gt; graphic designer efficiently. Same logic applies to AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent structure example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Researcher Agent&lt;/strong&gt;: Gathers data and finds sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic Agent&lt;/strong&gt;: Audits data quality and fact-checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writer Agent&lt;/strong&gt;: Synthesizes results and produces output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent focuses on its specialty. According to IBM, multi-agent systems produce &lt;strong&gt;higher quality and more reliable&lt;/strong&gt; outcomes than single agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; A supply chain management system. One agent monitors inventory levels, another forecasts demand, a third optimizes logistics routes. They communicate with each other to make holistic decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Agent Memory: Why It Matters
&lt;/h2&gt;

&lt;p&gt;The real power of agents lies in their memory. Traditional AI processes each task independently — it doesn't remember yesterday's conversation. Agents use multiple types of memory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working Memory:&lt;/strong&gt; Holds information about the active task. Like knowing which sources you've already checked during a research assignment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-term Memory:&lt;/strong&gt; Stores past experiences and learnings. When a similar task comes up again, the agent draws on previous experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural Memory:&lt;/strong&gt; Stores learned skills and automated behaviors. When an agent repeats a complex operation it has done before, it doesn't need to reason through every step from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Guardrails: The Safety Barriers
&lt;/h2&gt;

&lt;p&gt;Agents are powerful, but powerful tools can be dangerous. &lt;strong&gt;Guardrails&lt;/strong&gt; are boundaries an agent should never cross.&lt;/p&gt;

&lt;p&gt;IBM describes guardrails as highway barriers: they don't slow the car down, but they keep it from going off the road.&lt;/p&gt;

&lt;p&gt;What guardrails protect against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Harmful content generation&lt;/strong&gt; — preventing offensive, misleading, or dangerous outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive data exposure&lt;/strong&gt; — preventing the agent from leaking personal or confidential information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authority overreach&lt;/strong&gt; — preventing the agent from making decisions outside its scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With proper guardrails, agents can improve continuously while staying safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The thing that struck me most reading IBM's page was this: AI agents are no longer a technical concept sitting in research papers. They are &lt;strong&gt;real operational systems&lt;/strong&gt; running in hospitals, supply chains, customer service centers, and more.&lt;/p&gt;

&lt;p&gt;If you're new to this space, here's my suggested learning path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understand the 5 agent types&lt;/strong&gt; — know which one fits which problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learn the ReAct vs ReWOO difference&lt;/strong&gt; — framework choice directly affects cost and performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never skip guardrails&lt;/strong&gt; — a powerful agent without controls is a dangerous agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think multi-agent&lt;/strong&gt; — a single agent can attempt everything, but a team of specialists usually does it better&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I study one new AI topic per day from IBM's resources. This approach has been great for building deep understanding fast. I highly recommend the same: read, watch, but most importantly &lt;strong&gt;write&lt;/strong&gt;. Writing forces you to truly understand the material — far more than passive consumption.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is based on IBM's "What Are AI Agents?" page. For more detail, I recommend checking the original source.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What Is AgentOps? A Beginner-Friendly Guide Using a Real Hospital Use Case</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sat, 11 Apr 2026 08:20:07 +0000</pubDate>
      <link>https://dev.to/egepakten/what-is-agentops-a-beginner-friendly-guide-using-a-real-hospital-use-case-3okp</link>
      <guid>https://dev.to/egepakten/what-is-agentops-a-beginner-friendly-guide-using-a-real-hospital-use-case-3okp</guid>
      <description>&lt;p&gt;AI agents are no longer just chatbots; they are autonomous workers running real operations. But how do we know they are doing a good job? AgentOps is the answer. In this post, I break down the three layers of AgentOps using a hospital scenario anyone can follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: AI Agents Have Grown Up
&lt;/h2&gt;

&lt;p&gt;A few years ago, when most people heard "AI", they pictured a simple chatbot: you ask a question, you get an answer, end of story. That world is gone.&lt;/p&gt;

&lt;p&gt;Today's &lt;strong&gt;AI agents&lt;/strong&gt; can think for themselves, talk to other software systems, read and write files, call APIs, and even hand off tasks to other agents. They are starting to behave less like a search box and more like a junior employee who shows up every day, takes assignments, and tries to get things done.&lt;/p&gt;

&lt;p&gt;This is exciting. But it raises a serious question: &lt;strong&gt;how do we know these "employees" are doing a good job?&lt;/strong&gt; What happens if an agent makes a mistake — especially in a setting like a hospital where mistakes can affect a patient's life?&lt;/p&gt;

&lt;p&gt;This is exactly the gap that &lt;strong&gt;AgentOps&lt;/strong&gt; fills. This post is built on notes I took while watching an IBM Technology video on the topic. My goal is to write something that even someone who has never heard the term "AI agent" can read and walk away understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, What Is AgentOps?
&lt;/h2&gt;

&lt;p&gt;Here is the simplest way to define it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AgentOps&lt;/strong&gt; is the discipline of &lt;strong&gt;managing&lt;/strong&gt;, &lt;strong&gt;improving&lt;/strong&gt;, and &lt;strong&gt;monitoring&lt;/strong&gt; AI agents in production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you have a software background, you may know "DevOps". DevOps is about continuously running, watching, and improving software systems. AgentOps is the same idea, but for AI agents. The difference is that the thing you are monitoring is not a static piece of code — it's a non-deterministic, decision-making "AI worker".&lt;/p&gt;

&lt;p&gt;AgentOps is built on three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Evaluation&lt;/li&gt;
&lt;li&gt;Optimization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We will walk through each layer using one running example so the concepts don't feel abstract.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Running Example: Two AI Agents in a Hospital
&lt;/h2&gt;

&lt;p&gt;Imagine a hospital. A doctor prescribes a new medication to a patient. Before the patient can pick it up, the insurance company needs to approve it. Traditionally this is a long, painful process involving phone calls, faxes, paperwork, and lots of human waiting.&lt;/p&gt;

&lt;p&gt;Now imagine we automate it with two AI agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Clinical Documentation Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This agent connects to the hospital's Electronic Health Record (EHR) system. It pulls the doctor's notes, lab results, the patient's medical history, and any prior treatments. Then it bundles all the relevant information into a clean package that an insurance company would expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Payer Authorization Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This second agent takes that package, logs into the insurance company's portal, fills out the authorization form, submits it, and waits for approval. Once it gets a green light, it notifies the pharmacy and the doctor.&lt;/p&gt;

&lt;p&gt;These two agents talk to each other (we call this &lt;strong&gt;A2A&lt;/strong&gt;, short for &lt;em&gt;agent-to-agent&lt;/em&gt;) and they each call out to external systems like the EHR, the insurance portal, and the pharmacy.&lt;/p&gt;

&lt;p&gt;Sounds great. But how reliable is this system? How fast? How expensive? What if it makes a mistake one day? That is why we need AgentOps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: Observability
&lt;/h2&gt;

&lt;p&gt;Observability answers the question: &lt;strong&gt;"What is my agent doing right now, and how long is it taking?"&lt;/strong&gt; Instead of treating the agent like a mysterious black box, you turn it into a glass box.&lt;/p&gt;

&lt;p&gt;In our hospital example, there are four key signals we want to watch.&lt;/p&gt;

&lt;h3&gt;
  
  
  End-to-End (E2E) Trace Duration
&lt;/h3&gt;

&lt;p&gt;This is the total time from the moment the user makes a request to the moment they get an answer back. In our case, from the second the doctor says "get insurance approval for this patient" to the second that approval is confirmed.&lt;/p&gt;

&lt;p&gt;If that takes 10 seconds, fantastic. If it takes 4 hours, something is wrong somewhere in the pipeline and you need to find it.&lt;/p&gt;
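&lt;p&gt;The tracing idea can be sketched in a few lines of Python. This is a toy illustration, not a real observability stack: the two lambdas and the span names are placeholders standing in for the actual agents and tools.&lt;/p&gt;

```python
import time

def traced(span_name, fn, spans):
    """Run fn, record its wall-clock duration under span_name, return its result."""
    start = time.perf_counter()
    result = fn()
    spans[span_name] = time.perf_counter() - start
    return result

def handle_authorization():
    """One end-to-end run; each lambda stands in for a real agent or tool call."""
    spans = {}
    start = time.perf_counter()
    package = traced("clinical_documentation", lambda: {"notes": "..."}, spans)
    decision = traced("payer_authorization", lambda: "approved", spans)
    spans["e2e_trace_duration"] = time.perf_counter() - start
    return spans

spans = handle_authorization()
print(sorted(spans))
```

&lt;p&gt;The same per-span timing also gives you the A2A handoff and tool-latency numbers discussed next: every step gets its own span, and the E2E duration is the outermost one.&lt;/p&gt;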

&lt;h3&gt;
  
  
  Agent-to-Agent (A2A) Handoff Latency
&lt;/h3&gt;

&lt;p&gt;In our example, the Clinical Documentation Agent finishes its work and hands the task off to the Payer Authorization Agent. &lt;strong&gt;How long does that handoff take?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This matters more than people realize. Sometimes the time spent passing tasks between agents is longer than the time spent doing the actual work. A clean handoff protocol can save you minutes per request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Execution Latency
&lt;/h3&gt;

&lt;p&gt;Agents rarely work alone. They use &lt;strong&gt;tools&lt;/strong&gt; — calling the EHR system is a tool, opening the insurance portal is a tool, sending a message to the pharmacy is a tool.&lt;/p&gt;

&lt;p&gt;How long does each tool take to respond? Maybe the insurance portal is slow and bottlenecking everything. Tool execution latency lets you spot exactly where the slowdown is, instead of blaming "the AI" in general.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost per Authorization
&lt;/h3&gt;

&lt;p&gt;How much does a single insurance approval actually cost us? AI agents are not free. Every model call burns tokens, every tool call costs money. If a single approval costs $50 to run while a human staff member could do the same job for $10, the math doesn't work.&lt;/p&gt;

&lt;p&gt;This is the metric that tells you whether your AI investment is actually paying off — or quietly bleeding the budget.&lt;/p&gt;
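&lt;p&gt;Here is a minimal sketch of how you might compute this metric. All the prices and token counts below are made-up placeholders; real numbers come from your model provider's billing and your own traces.&lt;/p&gt;

```python
# All prices are hypothetical; real rates depend on your model and tool providers.
PRICE_PER_1K_INPUT_TOKENS = 0.003   # dollars per 1,000 prompt tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # dollars per 1,000 completion tokens
PRICE_PER_TOOL_CALL = 0.01          # e.g. a metered insurance-portal API

def cost_per_authorization(input_tokens, output_tokens, tool_calls):
    """Rough dollar cost of one end-to-end authorization run."""
    llm_cost = (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return llm_cost + tool_calls * PRICE_PER_TOOL_CALL

print(round(cost_per_authorization(12_000, 2_000, 5), 3))  # 0.116
```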

&lt;h2&gt;
  
  
  Layer 2: Evaluation
&lt;/h2&gt;

&lt;p&gt;Observability tells you &lt;strong&gt;what is happening&lt;/strong&gt;. Evaluation tells you whether &lt;strong&gt;what's happening is actually good&lt;/strong&gt;. An agent that responds in two seconds but gives wrong answers is worse than useless — it's dangerous.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Completion Rate
&lt;/h3&gt;

&lt;p&gt;Out of every 100 requests, how many does my agent finish successfully &lt;strong&gt;without a human having to step in&lt;/strong&gt;? If that number is 95%, great. If it's 40%, your humans are still doing more than half the work and the automation barely exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Factual Accuracy
&lt;/h3&gt;

&lt;p&gt;Are the things your agent says &lt;strong&gt;actually true&lt;/strong&gt;? In healthcare this is not optional — it's life or death.&lt;/p&gt;

&lt;p&gt;If the Clinical Documentation Agent records "no penicillin allergy" when the patient actually has one, the consequences can be catastrophic. Factual accuracy measures whether the agent's outputs match the real underlying data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guardrail Violations
&lt;/h3&gt;

&lt;p&gt;Think of "guardrails" like the railings on a highway — boundaries the agent should never cross. For a hospital, those might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never leak patient information to anyone unauthorized&lt;/li&gt;
&lt;li&gt;Always comply with HIPAA and similar privacy laws&lt;/li&gt;
&lt;li&gt;Never make decisions outside its authority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The guardrail violation rate measures how often the agent crosses one of those lines. The closer this number is to zero, the safer your system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clinical Appropriateness
&lt;/h3&gt;

&lt;p&gt;This is a healthcare-specific check. Are the agent's decisions &lt;strong&gt;medically reasonable&lt;/strong&gt;? For example, approving an adult dosage for a pediatric patient is technically a "decision" — but it's not clinically appropriate. This is usually scored using rules designed by actual clinicians.&lt;/p&gt;

&lt;h3&gt;
  
  
  First-pass Approval Rate
&lt;/h3&gt;

&lt;p&gt;Of all the insurance authorizations the agent submits, what percentage get approved on the &lt;strong&gt;first try&lt;/strong&gt;? This single metric secretly measures two things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How well the agent prepares documentation&lt;/li&gt;
&lt;li&gt;How well the agent understands the insurance company's rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your first-pass approval rate is 85%, the other 15% have to be redone, which means lost time and lost money.&lt;/p&gt;
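&lt;p&gt;Both of these evaluation metrics (task completion rate and first-pass approval rate) boil down to counting outcomes over a log of runs. A minimal sketch, with a hypothetical four-run log:&lt;/p&gt;

```python
# Each record is a hypothetical outcome log for one authorization request.
runs = [
    {"completed_without_human": True,  "approved_first_try": True},
    {"completed_without_human": True,  "approved_first_try": False},
    {"completed_without_human": False, "approved_first_try": False},
    {"completed_without_human": True,  "approved_first_try": True},
]

def rate(records, key):
    """Fraction of records where the given outcome flag is True."""
    return sum(r[key] for r in records) / len(records)

print(rate(runs, "completed_without_human"))  # 0.75
print(rate(runs, "approved_first_try"))       # 0.5
```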

&lt;h2&gt;
  
  
  Layer 3: Optimization
&lt;/h2&gt;

&lt;p&gt;The first two layers tell you what is happening and whether it's good. The third layer answers: &lt;strong&gt;"How do we make it better?"&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt Token Efficiency
&lt;/h3&gt;

&lt;p&gt;Language models are billed by "tokens". Every token is money. This metric asks: &lt;strong&gt;"How much output quality am I getting for each token I spend?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're shoving a 5000-token mega-prompt into the model and only getting a one-line answer back, you are wasting money. If you can get the same quality with 500 tokens, you just made the system 10x cheaper. Multiplied over thousands of requests a day, this is a huge deal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flow Step Efficiency
&lt;/h3&gt;

&lt;p&gt;How many steps does the agent take to complete a task? Sometimes agents loop, double-check things they already know, or ask the same question twice. For example, an agent might query the patient's name five times when it could have stored it once and reused it.&lt;/p&gt;

&lt;p&gt;Cutting unnecessary steps makes the agent faster and cheaper at the same time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval Precision (at K)
&lt;/h3&gt;

&lt;p&gt;This one is a little technical, but it's important. Most agents don't know everything — they pull information from a knowledge base on demand. This is called &lt;strong&gt;RAG&lt;/strong&gt;, short for &lt;em&gt;Retrieval Augmented Generation&lt;/em&gt;. For example, an agent might pull the patient's previous lab reports to figure out their condition.&lt;/p&gt;

&lt;p&gt;"Retrieval Precision at K" measures: out of the K documents the agent pulled, &lt;strong&gt;how many were actually relevant?&lt;/strong&gt; If the agent grabs 10 documents but only 2 are useful, the other 8 are &lt;strong&gt;noise&lt;/strong&gt;. Noise slows the agent down, confuses the model, and can lead to wrong decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handoff Success Rate
&lt;/h3&gt;

&lt;p&gt;How often do handoffs between agents succeed cleanly? The Clinical Documentation Agent prepares a file, but does the Payer Authorization Agent actually receive it correctly? Or does it get a half-broken version? Failed handoffs are silent killers — the system looks like it's running but the wrong things are flowing through it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improvement Velocity
&lt;/h3&gt;

&lt;p&gt;How fast is your agent actually getting better over time? This is a meta-metric. Are you continuously testing, measuring, tweaking, and re-deploying? Or is the agent today exactly as good as it was on day one? The real power of AgentOps is creating a tight feedback loop where the system improves itself week after week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to the Hospital: What the Numbers Could Look Like
&lt;/h2&gt;

&lt;p&gt;Imagine we apply all three layers in our hospital. After a few months of running this system with proper AgentOps, we might see results like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to get an insurance approval reduced by &lt;strong&gt;85%&lt;/strong&gt; (from days to hours)&lt;/li&gt;
&lt;li&gt;Cases requiring human intervention dropped by &lt;strong&gt;50%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Cost per authorization down to &lt;strong&gt;$0.47&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not just cute numbers on a dashboard. They translate into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Patients getting their medications faster&lt;/li&gt;
&lt;li&gt;Nurses and doctors spending less time on paperwork and more time with patients&lt;/li&gt;
&lt;li&gt;The hospital running more efficiently as a business&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the real promise of agents in healthcare — and AgentOps is the discipline that makes it possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Cannot Skip AgentOps
&lt;/h2&gt;

&lt;p&gt;Putting an AI agent into a hospital, a bank, or any other critical environment is like handing the car keys to a teenager. Can they drive? Maybe. Would you take your eyes off them? Absolutely not.&lt;/p&gt;

&lt;p&gt;AgentOps is the "eyes" of your AI system. It watches, it grades, it improves. Running an AI agent without AgentOps is like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Driving a car with no speedometer&lt;/li&gt;
&lt;li&gt;Running a company with no financial reports&lt;/li&gt;
&lt;li&gt;Treating a patient without a thermometer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's technically possible, but it is asking for trouble.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The thing that hit me the hardest watching the IBM video was this: AI agents are no longer "experiments". They are becoming &lt;strong&gt;real operational systems&lt;/strong&gt;. And real systems need real metrics.&lt;/p&gt;

&lt;p&gt;To recap the three layers in plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; What is happening, how long is it taking, and how much does it cost?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation:&lt;/strong&gt; Is what's happening correct, safe, and useful?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization:&lt;/strong&gt; How do we make it better, cheaper, and faster?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ever deploy your own AI agent into something that matters, do not skip these three layers. Building a flashy demo is easy. Building an agent that runs reliably every hour of every day is only possible with a solid AgentOps practice behind it.&lt;/p&gt;

&lt;p&gt;I'm trying to learn one new AI concept per day by watching IBM Technology videos. If you're doing something similar, I highly recommend turning what you watch into written notes like this one. Writing forces you to actually understand the material — much more than just hitting play and nodding along.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is based on notes I took from an IBM Technology video on AgentOps. If you want to go deeper, I recommend watching the original.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Basics You Need to Know Before You Start Your Certification</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Fri, 27 Mar 2026 16:30:23 +0000</pubDate>
      <link>https://dev.to/egepakten/aws-basics-but-needs-to-be-known-before-you-start-your-certification-3201</link>
      <guid>https://dev.to/egepakten/aws-basics-but-needs-to-be-known-before-you-start-your-certification-3201</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You don't need to memorize 200 services to start your AWS journey. But you &lt;strong&gt;do&lt;/strong&gt; need to understand the foundations. Here's everything I wish someone had explained to me before I started studying for my AWS certification.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. What Even Is Cloud Computing?
&lt;/h2&gt;

&lt;p&gt;Let's start from the very beginning. Cloud computing sounds fancy, but the core idea is simple: &lt;strong&gt;there's a physical server somewhere, and you're renting it over the internet.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of buying your own hardware, setting up a server room, and hiring people to maintain it — you use someone else's infrastructure (like AWS) and pay only for what you use.&lt;/p&gt;

&lt;p&gt;Here's the official NIST definition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of **configurable computing resources&lt;/em&gt;* that can be rapidly provisioned and released with minimal management effort or service provider interaction."*&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's break that down into human language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-demand&lt;/strong&gt; → You get resources whenever you want them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared pool&lt;/strong&gt; → Multiple customers share the same physical infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurable&lt;/strong&gt; → You choose how much CPU, RAM, storage you need&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapidly provisioned and released&lt;/strong&gt; → Spin up a server in minutes, shut it down when you're done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal management effort&lt;/strong&gt; → No need to physically touch any hardware&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. What's Actually Behind the "Cloud"?
&lt;/h2&gt;

&lt;p&gt;Behind the cloud, there are real, physical server racks sitting in massive data centers around the world. AWS has these data centers spread across the globe, and the infrastructure is organized in a clear hierarchy:&lt;/p&gt;

&lt;h3&gt;
  
  
  Region
&lt;/h3&gt;

&lt;p&gt;A large geographic area (e.g., &lt;code&gt;eu-west-1&lt;/code&gt; for Ireland, &lt;code&gt;us-east-1&lt;/code&gt; for N. Virginia). You choose a region based on where your users are, compliance requirements, or pricing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Availability Zone (AZ)
&lt;/h3&gt;

&lt;p&gt;Each Region contains multiple AZs (e.g., &lt;code&gt;eu-west-1a&lt;/code&gt;, &lt;code&gt;eu-west-1b&lt;/code&gt;, &lt;code&gt;eu-west-1c&lt;/code&gt;). An AZ is one or more data centers with independent power, networking, and connectivity. Multiple AZs exist for &lt;strong&gt;redundancy&lt;/strong&gt; — if one goes down, the others keep running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Edge Locations / Points of Presence (PoP)
&lt;/h3&gt;

&lt;p&gt;These are smaller, lightweight locations spread even more widely than Regions. They're used primarily for &lt;strong&gt;caching content closer to end users&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Edge Networks and CDN — Speed Matters
&lt;/h2&gt;

&lt;p&gt;Imagine your origin server is in the US, but a user in Southeast Asia wants to load your website. Without any optimization, every single request travels across the Pacific Ocean and back. That's slow.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Edge Networks&lt;/strong&gt; and &lt;strong&gt;CDN (Content Delivery Network)&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your original content lives on the &lt;strong&gt;Origin Server&lt;/strong&gt; (e.g., in &lt;code&gt;us-east-1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AWS caches copies of your static content (images, CSS, JS, videos) at &lt;strong&gt;PoP / Edge Locations&lt;/strong&gt; around the world&lt;/li&gt;
&lt;li&gt;When a user in Singapore requests your site, they get the cached version from the nearest PoP — not from the US&lt;/li&gt;
&lt;li&gt;Result: &lt;strong&gt;dramatically lower latency&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
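&lt;p&gt;The caching idea behind the steps above can be sketched as a toy model. The paths, content, and class here are invented for illustration; CloudFront's real behavior involves TTLs, invalidation, and much more.&lt;/p&gt;

```python
# Toy model of edge caching: serve from the nearest PoP when possible,
# fall back to the (slow, faraway) origin on a cache miss.
ORIGIN = {"/logo.png": "binary-image-data", "/app.css": "css-text"}

class EdgeLocation:
    def __init__(self):
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.cache:      # cache miss: cross the ocean to the origin
            self.origin_fetches += 1
            self.cache[path] = ORIGIN[path]
        return self.cache[path]         # cache hit: served from the edge

singapore_pop = EdgeLocation()
singapore_pop.get("/logo.png")  # first request goes all the way to the origin
singapore_pop.get("/logo.png")  # repeat requests are served locally
print(singapore_pop.origin_fetches)  # 1
```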

&lt;p&gt;&lt;strong&gt;CDN vs Edge Network — are they the same thing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not exactly. The &lt;strong&gt;Edge Network&lt;/strong&gt; is the infrastructure — the distributed network of servers worldwide. &lt;strong&gt;CDN&lt;/strong&gt; is the most well-known &lt;em&gt;use case&lt;/em&gt; of that infrastructure. In practice, people use the terms interchangeably, and that's mostly fine.&lt;/p&gt;

&lt;p&gt;AWS's CDN service is called &lt;strong&gt;CloudFront&lt;/strong&gt;, and it leverages these Edge Locations.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Virtualization and the Hypervisor
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. When you launch an EC2 instance, you're &lt;strong&gt;not&lt;/strong&gt; getting a dedicated physical server. Here's what's actually happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Host Computer&lt;/strong&gt; → The physical AWS server rack in a data center&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host OS&lt;/strong&gt; → The operating system running on that physical machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hypervisor&lt;/strong&gt; → Software that sits on top of the Host OS and splits the physical machine into multiple &lt;strong&gt;virtual machines&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your EC2 Instance&lt;/strong&gt; → One of those virtual machines&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So if a physical server has 64 CPUs and 256 GB RAM, the hypervisor might split it into four isolated instances of 16 CPUs / 64 GB RAM each. Different customers can be using the same physical hardware without ever knowing about each other — completely isolated.&lt;/p&gt;

&lt;p&gt;AWS built their own hypervisor called &lt;strong&gt;Nitro&lt;/strong&gt;, which operates at the hardware level for minimal performance overhead.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; EC2 instances are virtual slices of physical servers, managed by a hypervisor. Multiple instances from different customers can live on the same physical machine, fully isolated from each other.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. The Shared Responsibility Model — Who's Responsible for What?
&lt;/h2&gt;

&lt;p&gt;This is probably the &lt;strong&gt;most important concept&lt;/strong&gt; for your certification exam. AWS and the customer share security responsibilities, but the line between them is very clear.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security &lt;strong&gt;OF&lt;/strong&gt; the Cloud → AWS's Job
&lt;/h3&gt;

&lt;p&gt;AWS is responsible for protecting the infrastructure that runs all the services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Physical security&lt;/strong&gt; — data center access, cameras, biometric entry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware&lt;/strong&gt; — servers, storage, networking equipment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software&lt;/strong&gt; — compute, storage, database, networking services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global infrastructure&lt;/strong&gt; — Regions, Availability Zones, Edge Locations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host OS &amp;amp; Hypervisor&lt;/strong&gt; — you can't even access these&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security &lt;strong&gt;IN&lt;/strong&gt; the Cloud → Your Job
&lt;/h3&gt;

&lt;p&gt;You are responsible for everything you put in and configure on the cloud:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customer data&lt;/strong&gt; — whatever you upload, store, or process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform, applications &amp;amp; IAM&lt;/strong&gt; — who has access to what, roles, permissions, secret keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS &amp;amp; firewall configuration&lt;/strong&gt; — if you launched an EC2 with Ubuntu, patching it is on you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption&lt;/strong&gt; — client-side, server-side, in-transit, at-rest decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network traffic protection&lt;/strong&gt; — security groups, NACLs, VPN configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Three Control Categories
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Who?&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inherited&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS only&lt;/td&gt;
&lt;td&gt;Physical &amp;amp; environmental security, host OS, physical servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Shared&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Patch management, configuration management, awareness &amp;amp; training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customer-Specific&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Customer only&lt;/td&gt;
&lt;td&gt;Guest OS, custom applications, data encryption strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Quick Quiz — Test Yourself
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Patching your EC2 instance's operating system?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;You&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guest OS security patches?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;You&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running the host OS and virtualization layer?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managing IAM user access and secret keys?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;You&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintaining the servers that run your Lambda functions?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physical security of data centers?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Encryption-at-rest strategy for your RDS database?&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;You&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Simple rule of thumb:&lt;/strong&gt; If you can configure it in the AWS Console or CLI → it's your responsibility. If you can't even touch it → it's AWS's responsibility. If both sides need to do their part → it's shared.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of it like renting an apartment: AWS gives you a secure building (locked doors, fire alarms, security guards). But whether you lock your own door, put your valuables in a safe, or leave your windows open — that's entirely on you.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. AWS Support Plans — Developer vs Business
&lt;/h2&gt;

&lt;p&gt;When you create an AWS account, you'll need to choose a support plan. Here's the practical difference between the two most common ones:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Developer (~$29/mo)&lt;/th&gt;
&lt;th&gt;Business (~$100/mo or % of usage)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Who can open tickets&lt;/td&gt;
&lt;td&gt;1 person (primary contact)&lt;/td&gt;
&lt;td&gt;Unlimited team members&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response time (critical)&lt;/td&gt;
&lt;td&gt;12 hours&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1 hour&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support channels&lt;/td&gt;
&lt;td&gt;Email only&lt;/td&gt;
&lt;td&gt;Email + &lt;strong&gt;Phone + Chat&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trusted Advisor&lt;/td&gt;
&lt;td&gt;Limited checks&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Full access&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party software support&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Which one should you pick?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer&lt;/strong&gt; → Great for learning, experimenting, and building prototypes. Start here if you're just getting started.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business&lt;/strong&gt; → Essential when you're running production workloads with real users. The 1-hour critical response time alone is worth it when something goes down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Good news:&lt;/strong&gt; You can start with Developer and &lt;strong&gt;upgrade to Business anytime&lt;/strong&gt; from the AWS Console (&lt;code&gt;Support Center &amp;gt; Change Plan&lt;/code&gt;). Billing is prorated, so you only pay for what you use. You can also downgrade later if needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Before diving into specific services like S3, Lambda, or DynamoDB, make sure these foundational concepts are solid:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cloud = renting virtual resources&lt;/strong&gt; from physical infrastructure over the internet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regions → AZs → Edge Locations&lt;/strong&gt; form the backbone of AWS's global presence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN / Edge Networks&lt;/strong&gt; cache content close to users for speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hypervisors&lt;/strong&gt; split physical servers into isolated virtual machines (EC2 instances)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Responsibility Model&lt;/strong&gt; — know what's yours vs what's AWS's (this &lt;em&gt;will&lt;/em&gt; be on the exam)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Plans&lt;/strong&gt; can be upgraded/downgraded as your needs evolve&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These aren't glamorous topics, but they're the bedrock everything else is built on. Nail these, and the rest of your certification journey will make a lot more sense.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Happy cloud learning! If you found this helpful, drop a reaction or follow for more AWS certification notes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>infrastructure</category>
      <category>cloud</category>
      <category>programming</category>
    </item>
    <item>
      <title>What I Learned from "The Mom Test" - Chapter 8: Running the Process</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Wed, 11 Mar 2026 09:59:08 +0000</pubDate>
      <link>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-8-running-the-process-242h</link>
      <guid>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-8-running-the-process-242h</guid>
      <description>&lt;p&gt;&lt;em&gt;A developer's guide to turning customer conversations from a solo guessing game into a team-wide learning machine&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You've learned the skills: asking good questions, avoiding compliments, pushing for commitments, finding conversations, and choosing your customers. But skills alone aren't enough. &lt;strong&gt;If you don't have a process to capture and share what you learn, it all falls apart.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chapter 8 is the final chapter of The Mom Test, and it ties everything together. It's about the nuts and bolts of actually running customer conversations as a repeatable, team-wide process — not just something the "business person" does while everyone else codes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Outline
&lt;/h2&gt;

&lt;p&gt;In this post, I'll break down Chapter 8 into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prepping for the Conversation&lt;/strong&gt; — Why going in with a clear list of your riskiest assumptions saves you from aimless chats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Who Should Show Up&lt;/strong&gt; — Why the whole founding team needs to hear customer conversations firsthand, and what goes wrong when they don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to Write It Down&lt;/strong&gt; — The note-taking system that actually works: exact quotes, emotions, and constraints — not just "they liked it."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewing with Your Team&lt;/strong&gt; — How to turn a pile of messy notes into clear decisions, and why reviewing together prevents the "telephone game" problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talking to Customers Is a Team Sport&lt;/strong&gt; — Why delegating all customer conversations to one person is a recipe for disaster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt; — Fitzpatrick's lightweight, practical process for weaving customer learning into your weekly rhythm.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Prepping for the Conversation
&lt;/h2&gt;

&lt;p&gt;Before you walk into a customer conversation, you need to know what you're trying to learn. This sounds obvious, but most people skip it. They just show up and "see what happens."&lt;/p&gt;

&lt;p&gt;Fitzpatrick recommends sitting down with your team before any batch of conversations and asking: &lt;strong&gt;"What are the three biggest questions we need to answer right now?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These should be your scariest, riskiest assumptions — the ones that could sink the whole business if you're wrong. Maybe it's "Do people actually switch tools for this?" or "Will managers pay for something their team uses?" or "Is this problem painful enough to justify the effort?"&lt;/p&gt;

&lt;p&gt;If you go in without a clear list, you'll default to comfortable topics. You'll talk about features you're excited about instead of risks you're afraid of. And you'll come out feeling good but learning nothing.&lt;/p&gt;

&lt;p&gt;You don't need a rigid script — that kills the natural flow of conversation. But you need a cheat sheet of the big questions. Glance at it before the meeting. Make sure you hit the important stuff before the conversation ends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you don't know what you're trying to learn, you're not ready for the conversation. Prep your three big questions before every batch of meetings.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should Show Up
&lt;/h2&gt;

&lt;p&gt;This one surprised me. Fitzpatrick is adamant: &lt;strong&gt;the whole founding team should be in customer meetings.&lt;/strong&gt; Not just the CEO. Not just the "business person." Everyone.&lt;/p&gt;

&lt;p&gt;Why? Because customer learning that passes through a middleman always gets distorted. It's like a game of telephone. The person who was in the meeting says "they seemed really excited about feature X" and everyone else nods along. But were they actually excited? Or were they just being polite? Only the person in the room can judge the body language, the tone, the hesitations.&lt;/p&gt;

&lt;p&gt;When the whole team hears the same thing at the same time, two things happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You avoid arguments about what the customer "really meant."&lt;/strong&gt; Everyone heard it themselves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The technical co-founders hear constraints and problems directly&lt;/strong&gt;, which means they can often come up with better solutions than what the customer (or the business co-founder) would have suggested.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Obviously, this doesn't scale forever. You can't have five people show up to every coffee chat. But in the early days — when every conversation shapes your product direction — it's worth the investment.&lt;/p&gt;

&lt;p&gt;If you absolutely can't have everyone attend, rotate who goes. But NEVER let one person become the sole keeper of customer knowledge. That's a single point of failure, and when they're wrong about something, the whole team is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Everyone on the team who is making big product or design decisions should be sitting in on at least some customer conversations. Don't let secondhand summaries replace firsthand experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Write It Down
&lt;/h2&gt;

&lt;p&gt;Taking notes during customer conversations is critical, but most people do it wrong. They write down summaries, interpretations, and vague impressions: "She seemed interested in the analytics dashboard" or "He thought the pricing was fair."&lt;/p&gt;

&lt;p&gt;The problem? Those aren't facts — they're your interpretations. And interpretations are where bias creeps in.&lt;/p&gt;

&lt;p&gt;Fitzpatrick recommends a specific note-taking format. Write down:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Exact quotes&lt;/strong&gt; — The actual words they used. Not your paraphrase. Not your interpretation. Their exact words. Put them in quotation marks so you can distinguish them from your own notes later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Emotions&lt;/strong&gt; — Did they light up when talking about a particular problem? Did they seem bored? Angry? Dismissive? Emotions reveal what matters to people in a way that words sometimes don't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constraints&lt;/strong&gt; — Hard facts about their situation. "Their team has 12 people." "They spend $2,000/month on the current tool." "They've been looking for a solution for 6 months." These are the concrete details that help you make decisions later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What NOT to write down: your own ideas, feature requests you thought of during the conversation, or your emotional reactions. Those go in a separate section. Keep the raw customer data clean and uncontaminated.&lt;/p&gt;

&lt;p&gt;Fitzpatrick also suggests using shorthand symbols to quickly flag important moments. For instance, use a smiley face or an exclamation mark for strong emotions, a dollar sign for anything related to money or budget, and a star for particularly important quotes.&lt;/p&gt;

&lt;p&gt;After the meeting, &lt;strong&gt;spend five minutes cleaning up your notes while the conversation is still fresh.&lt;/strong&gt; Fill in the gaps, clarify the shorthand, and highlight the most important bits. If you wait until the next day, you'll have forgotten half the nuance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Write down exact quotes, emotions, and constraints. Keep your interpretations separate from the raw data. Spend five minutes cleaning up notes immediately after every conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reviewing with Your Team
&lt;/h2&gt;

&lt;p&gt;Raw notes from individual conversations aren't very useful on their own. The magic happens when you &lt;strong&gt;review them together as a team.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fitzpatrick recommends a regular review session — ideally weekly — where the team sits down and goes through the recent conversation notes together. The goal isn't just to share what you heard. It's to look for patterns, update your beliefs, and decide what to do next.&lt;/p&gt;

&lt;p&gt;Here's how a good review works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Share the raw notes.&lt;/strong&gt; Read out the exact quotes, the emotions, the constraints. Don't editorialize. Let the data speak for itself first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Look for patterns.&lt;/strong&gt; Are multiple people saying the same thing? Are you hearing the same pain point from different customers? That's a strong signal. Are you hearing wildly different things? That might mean your segment is too broad (go back to Chapter 7).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update your assumptions.&lt;/strong&gt; Remember those three big questions you prepped? Based on what you've heard, which ones have been answered? Which ones are still open? What new questions have emerged?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decide on next steps.&lt;/strong&gt; What conversations do you need to have next? Do you need to talk to more of the same type of customer, or a different segment? Do you have enough signal to start building something?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key is that this review should be a conversation, not a presentation. If one person just summarises what they heard and everyone else nods, you've missed the point. Everyone should engage with the raw data and form their own conclusions.&lt;/p&gt;

&lt;p&gt;This process also catches mistakes. If one team member misinterpreted a conversation, the others can catch it during the review. If someone got excited about a compliment disguised as a commitment (Chapter 5 flashback!), the team can flag it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Don't let customer learnings sit in one person's notebook. Review notes together as a team regularly, look for patterns, and update your beliefs based on the evidence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Talking to Customers Is a Team Sport
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick keeps hammering this point home because it's so often ignored: &lt;strong&gt;customer learning is not one person's job.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In many startups, there's a division of labour: the "business person" talks to customers, and the "technical person" builds the product. This feels efficient, but it's actually a disaster.&lt;/p&gt;

&lt;p&gt;Here's why: the technical co-founder ends up building based on secondhand information. They hear "customers want feature X" and build it. But if they had been in the room, they might have heard the underlying problem behind the request and come up with a completely different (and better) solution.&lt;/p&gt;

&lt;p&gt;Engineers tend to think of customer conversations as someone else's responsibility — something that takes them away from "real work." But talking to customers IS real work. It's arguably the most important work in the early stages of a startup, because it determines whether you're building the right thing.&lt;/p&gt;

&lt;p&gt;Fitzpatrick even goes so far as to say: if someone on your team refuses to participate in customer conversations, that's a serious red flag. It means they're comfortable building in the dark, which is a recipe for building something nobody wants.&lt;/p&gt;

&lt;p&gt;This doesn't mean every engineer needs to lead conversations. Some people are naturally better at it than others. But everyone should at least be present for some conversations, take turns leading when possible, and participate actively in the team review sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Customer conversations are everyone's responsibility. If only one person on the team talks to customers, you have a bottleneck that will eventually break.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Process
&lt;/h2&gt;

&lt;p&gt;So how does all of this come together in practice? Fitzpatrick outlines a lightweight process that you can adapt to your situation. It's not complicated, but it requires discipline:&lt;/p&gt;

&lt;h3&gt;
  
  
  Before a Batch of Conversations:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sit down as a team and identify your &lt;strong&gt;three biggest learning goals&lt;/strong&gt; (your scariest assumptions)&lt;/li&gt;
&lt;li&gt;Decide &lt;strong&gt;who you're going to talk to&lt;/strong&gt; (your customer segment from Chapter 7)&lt;/li&gt;
&lt;li&gt;Decide &lt;strong&gt;where to find them&lt;/strong&gt; (your "who-where" pair)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  During Each Conversation:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One person leads&lt;/strong&gt; the conversation, asking questions and steering the discussion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Another person takes notes&lt;/strong&gt; — exact quotes, emotions, constraints&lt;/li&gt;
&lt;li&gt;Keep it casual (Chapter 4) and follow the Mom Test rules (Chapters 1-3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push for commitment&lt;/strong&gt; at the end (Chapter 5)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Immediately After:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Spend &lt;strong&gt;five minutes&lt;/strong&gt; reviewing and cleaning up notes together&lt;/li&gt;
&lt;li&gt;Flag anything surprising or particularly important&lt;/li&gt;
&lt;li&gt;Note any follow-up actions or commitments you made&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Weekly (or After Every Batch):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Team review session&lt;/strong&gt; — go through notes together&lt;/li&gt;
&lt;li&gt;Look for patterns across conversations&lt;/li&gt;
&lt;li&gt;Update your assumptions and beliefs&lt;/li&gt;
&lt;li&gt;Decide: do you need more conversations, or is it time to build?&lt;/li&gt;
&lt;li&gt;Identify your &lt;strong&gt;next three big questions&lt;/strong&gt; for the next batch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The beauty of this process is that it's fast. Each cycle takes about a week. In a single week, you can have 5-10 conversations, review them as a team, and make a clear decision about what to do next. Compare that to spending months building something in isolation and then discovering nobody wants it.&lt;/p&gt;

&lt;p&gt;Fitzpatrick emphasises that the process should be lightweight. If it feels like bureaucracy, you've over-complicated it. The notes should be quick. The reviews should be short. The goal is to learn fast and stay nimble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Keep the process simple. Prep → Conversations → Notes → Review → Decide → Repeat. If you're spending more time on process than on actual conversations, something's wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways from Chapter 8
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prep before every batch of conversations.&lt;/strong&gt; Identify your three scariest assumptions and make sure you address them. Don't just "wing it."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The whole team should hear customer conversations.&lt;/strong&gt; Secondhand summaries always lose nuance. Everyone who makes product decisions needs firsthand exposure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Take notes the right way.&lt;/strong&gt; Exact quotes, emotions, and constraints — not interpretations. Keep the raw data separate from your opinions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Review notes as a team.&lt;/strong&gt; Look for patterns, catch misinterpretations, update your beliefs, and decide on next steps together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Customer learning is everyone's job.&lt;/strong&gt; Don't delegate it to one person. Engineers, designers, and founders all benefit from being in the room.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep the process lightweight.&lt;/strong&gt; Prep → Talk → Note → Review → Decide → Repeat. One cycle per week. Don't let process become bureaucracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learning is the goal, not meetings.&lt;/strong&gt; Every conversation should move you closer to a clear picture of what to build and who to build it for.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-7-choosing-your-customers-6l6"&gt;Chapter 7 - Choosing Your Customers&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That's a wrap on The Mom Test! If you've been following along, I hope these posts have been as useful to you as writing them was for me. Now go talk to your customers — the right way.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>startup</category>
      <category>speaking</category>
      <category>learning</category>
    </item>
    <item>
      <title>What I Learned from "The Mom Test" - Chapter 7: Choosing Your Customers</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Mon, 09 Mar 2026 10:58:49 +0000</pubDate>
      <link>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-7-choosing-your-customers-6l6</link>
      <guid>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-7-choosing-your-customers-6l6</guid>
      <description>&lt;p&gt;&lt;em&gt;A developer's guide to why talking to "everyone" is the fastest way to learn nothing&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a saying in the startup world: &lt;strong&gt;startups don't starve, they drown.&lt;/strong&gt; You never have too few options, too few leads, or too few ideas — you have too many. You get overwhelmed. You do a little bit of everything and make progress on nothing.&lt;/p&gt;

&lt;p&gt;Chapter 7 is about the antidote: &lt;strong&gt;customer segmentation.&lt;/strong&gt; Choosing who to focus on so you can actually learn something useful instead of drowning in mixed signals.&lt;/p&gt;




&lt;h2&gt;
  
  
  Outline
&lt;/h2&gt;

&lt;p&gt;In this post, I'll break down Chapter 7 into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Segmentation: Why Starting Broad Kills You&lt;/strong&gt; — How Google, PayPal, and Evernote all started narrow, and why you should too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Babies or Body Builders?&lt;/strong&gt; — A real-world example of what happens when you try to serve everyone with one product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Big Brands or Mom &amp;amp; Pop?&lt;/strong&gt; — Fitzpatrick's own painful lesson of choosing a customer segment that was way too broad.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;But What Does It Mean?&lt;/strong&gt; — Why "students" or "advertisers" aren't real customer segments, and what happens when you treat them like they are.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Slicing&lt;/strong&gt; — A practical step-by-step technique for narrowing down your segment until you know exactly who to talk to and where to find them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talking to the Wrong People&lt;/strong&gt; — The three traps that lead you to waste time on the wrong conversations.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Segmentation: Why Starting Broad Kills You
&lt;/h2&gt;

&lt;p&gt;When we look at the big tech successes, they seem to serve the whole world. Google lets anyone find anything. PayPal helps anyone send money anywhere. Evernote backs up everyone's notes.&lt;/p&gt;

&lt;p&gt;But here's the thing — &lt;strong&gt;they didn't start there.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In its early days, Google helped PhD students find obscure bits of code. PayPal helped collectors buy and sell Pez dispensers and Beanie Babies more efficiently. Evernote helped moms save and share recipes.&lt;/p&gt;

&lt;p&gt;Every one of them started with a tiny, specific group. They nailed it for that group, then expanded. If you try to start with "everyone" as your customer, three things go wrong:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You get overwhelmed by options&lt;/strong&gt; and don't know where to start&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You aren't moving forward&lt;/strong&gt; but can't prove yourself wrong either&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You receive mixed feedback&lt;/strong&gt; and can't make sense of it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The feedback from a Fortune 500 CFO and a freelance designer will be wildly different — even if they're both technically "potential customers." If you're talking to both at once, their conflicting needs cancel each other out and you end up paralysed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you aren't finding consistent problems and goals, you don't have a specific enough customer segment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Babies or Body Builders?
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick shares a brilliant example. A woman had developed a powdered condiment — sweet like cinnamon brown sugar but packed with the nutrition of a multivitamin. An all-natural superfood you could survive on indefinitely.&lt;/p&gt;

&lt;p&gt;She said it had countless uses: moms could sprinkle it on breakfast to trick their kids into being healthy, restaurants could leave it on tables as a sugar alternative, bodybuilders could mix it into protein shakes.&lt;/p&gt;

&lt;p&gt;Sounds like a huge market, right? Wrong. She was running in circles. The bodybuilders wanted one thing, the restaurants wanted another, and the moms needed a third. Making one group happy always disappointed the others. She didn't know where to start. Even simple decisions — like what colour to use for the label — were impossible to make because every group had different preferences.&lt;/p&gt;

&lt;p&gt;The fix? She had to choose. She focused on &lt;strong&gt;moms with young kids who were already shopping at health food stores.&lt;/strong&gt; Now she knew who to talk to, where to find them, and what they cared about.&lt;/p&gt;

&lt;p&gt;Her next move was clever: she went to small, independent health food stores and asked them to place a few bottles of her condiment beside the breakfast foods. This was a great commitment to ask for — it gave her real shelf space and helped distinguish between stores that were just being polite and ones that were genuinely interested. She'd return in a week to check if the product sold and to talk to store owners about their experience.&lt;/p&gt;

&lt;p&gt;Choosing a specific segment felt like losing all the other options. But it was the only way to actually make progress. As Fitzpatrick puts it: before we can serve &lt;em&gt;everyone&lt;/em&gt;, we have to serve &lt;em&gt;someone&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Big Brands or Mom &amp;amp; Pop?
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick shares his own painful lesson here. In one of his startups, he was thrilled that his customer segment was "advertisers." Everyone advertises somehow, so the market was practically infinite!&lt;/p&gt;

&lt;p&gt;He talked to mom-and-pop shops, e-tailers, big brands, creative agencies, SMEs, music labels — anyone who spent money on advertising. And the result? Complete chaos.&lt;/p&gt;

&lt;p&gt;Everything they tried &lt;em&gt;sort of&lt;/em&gt; worked. Everything was &lt;em&gt;somewhat&lt;/em&gt; promising. Some people were talking about paying $10,000/month while others scoffed at $10. Every new feature was moderately popular. But if they tried to cut any feature, someone would scream because it was their favourite part.&lt;/p&gt;

&lt;p&gt;The fundamental problem: &lt;strong&gt;they couldn't prove themselves right or wrong.&lt;/strong&gt; They were paying attention to so many customer types that there was pretty much always &lt;em&gt;someone&lt;/em&gt; who liked a new idea. But making a so-so product for a bunch of audiences isn't the same as making an incredible product for one.&lt;/p&gt;

&lt;p&gt;Eventually they noticed unusually strong signals from creative agencies who wanted to be edgy. They ignored everyone else, cut a bunch of features, and were finally able to get a clear picture of what was working and what wasn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  But What Does It Mean?
&lt;/h2&gt;

&lt;p&gt;This section is one of the most eye-opening in the entire book. Fitzpatrick describes two founders who were doing everything right — asking good Mom Test questions, pushing for commitments, using every sales meeting as a learning opportunity. And yet they were still completely confused.&lt;/p&gt;

&lt;p&gt;After 20 conversations, they had 20 different must-have features and 20 separate must-solve problems. The more people they talked to, the more confused they got. What was going on?&lt;/p&gt;

&lt;p&gt;Their customer segment was too broad, but in a sneaky way. They were building something for "students." Sounds specific enough, right? But think about what "students" actually means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A PhD student at a research university&lt;/li&gt;
&lt;li&gt;An ambitious teenager at a prep school&lt;/li&gt;
&lt;li&gt;A homeschooling parent who wants to use it with her kid&lt;/li&gt;
&lt;li&gt;A child in a rural Indian village self-educating through a shared computer&lt;/li&gt;
&lt;li&gt;A student in Africa running the app off a shaky cellphone connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All are technically "students." But they have completely different needs, workflows, devices, budgets, and goals. These founders weren't having 20 conversations with their customers — they were having &lt;strong&gt;one conversation each with 20 different types of customers.&lt;/strong&gt; That's why the feedback was so inconsistent.&lt;/p&gt;

&lt;p&gt;The same thing happens with segments like "small businesses," "developers," or "sales organisations." They sound specific but contain enormous variation underneath.&lt;/p&gt;




&lt;h2&gt;
  
  
  Customer Slicing
&lt;/h2&gt;

&lt;p&gt;So how do you narrow down a broad segment into something useful? Fitzpatrick introduces a technique called &lt;strong&gt;Customer Slicing.&lt;/strong&gt; It's simple but powerful.&lt;/p&gt;

&lt;p&gt;Start with a broad segment and keep slicing it into smaller and smaller sub-sets by asking these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Within this group, which type of person would want it most?&lt;/li&gt;
&lt;li&gt;Would everyone in this group buy/use it, or only some?&lt;/li&gt;
&lt;li&gt;Why does that sub-set want it? What is their specific problem?&lt;/li&gt;
&lt;li&gt;Does everyone in the group have that motivation, or only some?&lt;/li&gt;
&lt;li&gt;What additional motivations are there?&lt;/li&gt;
&lt;li&gt;Which other types of people have these motivations?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You repeat this process until you end up with a segment that is specific enough to &lt;strong&gt;find and reach&lt;/strong&gt; in the real world.&lt;/p&gt;

&lt;p&gt;Here's Fitzpatrick's example: Say you're building a high-end fitness gadget for busy professionals. Your first instinct might be: "finance professionals, age 25-35, living in a major city." That sounds specific, but it's actually useless — it doesn't help you make product decisions and it doesn't help you find them.&lt;/p&gt;

&lt;p&gt;Slice further: finance professionals in London who are currently training for a marathon. Better! Now slice again: the sub-sub-subset who go to the gym during their lunch hour. Now you know exactly where to find them — at a gym in London's financial district at lunchtime. You can have all the customer conversations you want for the price of a gym membership.&lt;/p&gt;

&lt;p&gt;The key test: &lt;strong&gt;if there isn't a clear physical or digital location where you can find your customer segment, it's probably still too broad.&lt;/strong&gt; Go back and slice it into finer pieces.&lt;/p&gt;

&lt;p&gt;Once you have a bunch of "who-where" pairs, decide who to start with based on three criteria:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Profitable or big&lt;/strong&gt; — Is this segment worth pursuing financially?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to reach&lt;/strong&gt; — Can you actually find and talk to these people?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personally rewarding&lt;/strong&gt; — Do you enjoy working with this group?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Don't overthink it. Spend a few minutes choosing a concrete initial segment, find a few of them, and start learning. You can always broaden your segment later once you've nailed it for the first group.&lt;/p&gt;

&lt;p&gt;Fitzpatrick adds a personal note that I really appreciate: the third factor — personally rewarding — matters more than people think. This stuff is hard work, and it can become a real grind if you're cynical about the people or the industry you're serving. Choose customers you admire and enjoy being around.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Good customer segments are a who-where pair. If you don't know where to go to find your customers, keep slicing your segment into smaller pieces until you do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Talking to the Wrong People
&lt;/h2&gt;

&lt;p&gt;Even with good segmentation, you can still end up talking to the wrong people. Fitzpatrick identifies three ways this happens:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Your segment is too broad and you're talking to everyone
&lt;/h3&gt;

&lt;p&gt;We've covered this extensively. If your feedback is all over the place, you're probably mixing multiple customer types together. The fix: slice your segment narrower.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. You have multiple customer segments and missed some of them
&lt;/h3&gt;

&lt;p&gt;Sometimes it's obvious — if you're a two-sided marketplace (like Airbnb), you clearly have hosts and guests as separate segments. But sometimes it's sneakier. If you're building an app for kids, you need to understand both the kids AND their parents. If you're building for public schools, you need the teachers, the students, the administration, and potentially even the parent-teacher association and taxpayers.&lt;/p&gt;

&lt;p&gt;You'll also need to worry about important partners — whether for manufacturing, distribution, or promotion. If your business relies on them, you need to understand their goals and constraints just as well as your customers'.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. You're selling to businesses with complicated buying processes and have overlooked some stakeholders
&lt;/h3&gt;

&lt;p&gt;This is the B2B trap. Don't fall into only talking to the most senior or impressive people you can find. You want to talk to people who are &lt;strong&gt;representative of your actual customers&lt;/strong&gt;, not the ones who sound impressive on your status report.&lt;/p&gt;

&lt;p&gt;Fitzpatrick admits making this mistake himself: when building interactive advertising products, he spent lots of time talking to executives and none talking to the kids who were supposed to actually love the products.&lt;/p&gt;

&lt;p&gt;If you're in a multi-sided marketplace, yes, you need to run customer conversations separately for all the various segments. But hopefully that isn't as scary now that you know how to keep conversations casual and efficient.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways from Chapter 7
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Startups drown, they don't starve.&lt;/strong&gt; Having too many options is the real killer. Segmentation is your life raft.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every big company started small.&lt;/strong&gt; Google, PayPal, Evernote — they all began by serving a tiny, specific group before expanding to the world.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"Everyone" is not a customer segment.&lt;/strong&gt; If your feedback is inconsistent, your segment is too broad. Slice it down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Customer Slicing.&lt;/strong&gt; Keep asking "within this group, who wants it most?" until you have a who-where pair — a specific person you can physically go find and talk to.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Choose segments that are profitable, reachable, and personally rewarding.&lt;/strong&gt; Don't just pick the biggest market — pick one you can actually serve and enjoy serving.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Watch out for hidden segments.&lt;/strong&gt; Multi-sided marketplaces, products for kids (parents are the buyers), B2B with complex buying processes — make sure you're talking to all the relevant groups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't over-plan.&lt;/strong&gt; Spend a few minutes choosing an initial segment, go talk to them, and adjust as you learn. You can always broaden later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;This is part of my series where I break down each chapter of The Mom Test by Rob Fitzpatrick. If you're building a product and talking to customers, this book is essential reading.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-6-finding-conversations-4bjc"&gt;Chapter 6 - Finding Conversations&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Next up: Chapter 8 - Running the Process&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>startup</category>
      <category>speaking</category>
      <category>learning</category>
    </item>
    <item>
      <title>What I Learned from "The Mom Test" - Chapter 6: Finding Conversations</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Fri, 06 Mar 2026 10:16:10 +0000</pubDate>
      <link>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-6-finding-conversations-4bjc</link>
      <guid>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-6-finding-conversations-4bjc</guid>
      <description>&lt;p&gt;&lt;em&gt;A developer's guide to actually finding people to talk to — and making them want to talk to you&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;So you know how to ask good questions (Chapters 1-3), keep things casual (Chapter 4), and push for real commitments (Chapter 5). Great. But there's one problem nobody talks about enough: &lt;strong&gt;where do you actually find these people to talk to?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chapter 6 is Fitzpatrick's playbook for finding conversations. And the core insight surprised me: you don't need to be a networking wizard. You just need to be strategic about where you show up and how you ask.&lt;/p&gt;




&lt;h2&gt;
  
  
  Outline
&lt;/h2&gt;

&lt;p&gt;In this post, I'll break down Chapter 6 into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Going to Them: Cold Outreach&lt;/strong&gt; — Why cold conversations are a necessary evil and how to make the most of them (cold calls, serendipity, finding excuses, immersing yourself, landing pages, and getting clever).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bringing Them to You&lt;/strong&gt; — How to flip the dynamic so customers find you instead (organising meetups, speaking &amp;amp; teaching, industry blogging).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating Warm Intros&lt;/strong&gt; — The gold standard for conversations and how to manufacture them (7 degrees of bacon, industry advisors, universities, investors, and cashing in favours).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asking For and Framing the Meeting&lt;/strong&gt; — The 5-element framework (Vision/Framing/Weakness/Pedestal/Ask) for getting meetings without sounding salesy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;To Commute or to Call&lt;/strong&gt; — Why in-person beats phone calls, and when exceptions make sense.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Advisory Flip&lt;/strong&gt; — A powerful mindset shift that makes customer conversations feel natural instead of desperate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How Many Meetings Do You Actually Need?&lt;/strong&gt; — When to stop talking and start building.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Going to Them: Cold Outreach
&lt;/h2&gt;

&lt;p&gt;Let's be real — cold outreach sucks. Nobody likes cold calls, and nobody likes receiving them. But sometimes, especially at the very beginning, you have no choice. You don't know anyone in the industry yet, and you need to start somewhere.&lt;/p&gt;

&lt;p&gt;The key insight Fitzpatrick shares is this: &lt;strong&gt;the goal of cold conversations is to stop having them.&lt;/strong&gt; You hustle together the first one or two from wherever you can, treat people's time respectfully, genuinely try to solve their problem, and those cold conversations start turning into warm intros. The snowball starts rolling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold Calls
&lt;/h3&gt;

&lt;p&gt;If you reach out to 100 people and 98 of them ignore you, what does that mean? It means you now have 2 conversations in play. That's it. Unless your entire business model depends on cold outreach, the rejection rate is irrelevant. You only need a few yeses to get the ball rolling.&lt;/p&gt;

&lt;p&gt;One team successfully used cold LinkedIn messages to reach C-level executives at major UK retailers. They were ignored by practically everyone, but they only needed one "yes" to get started. And that one "yes" led to warm intros to others.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seizing Serendipity
&lt;/h3&gt;

&lt;p&gt;Beyond cold outreach, stay open to unexpected opportunities. Fitzpatrick shares a story about being at an engagement party and overhearing someone mention a speaking engagement. He walked over, started a genuine conversation about her career, and she ended up becoming his first committed alpha user.&lt;/p&gt;

&lt;p&gt;The lesson? If you stop thinking of customer conversations as formal "interviews" and start thinking of them as genuine conversations about people's lives and problems, opportunities are everywhere. People love talking about their problems — by showing genuine interest, you become more interesting than 99% of people they've met.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If it's not a formal meeting, you don't need to make excuses about why you're there or even mention that you're starting a business. Just ask about their life.&lt;/p&gt;

&lt;h3&gt;
  
  
  Find a Good Excuse
&lt;/h3&gt;

&lt;p&gt;Sometimes you need a reason to start a conversation with a stranger. Fitzpatrick tells a great story about an entrepreneur who was building a product for cafe owners. He'd been hitting the pavement for weeks, getting turned away from cafe after cafe. The fix? Walking into a cafe and saying: "This coffee is amazing — I wanted to ask about the story behind the beans." Suddenly, the conversation was natural. The owner wasn't around, but the manager gave him the owner's contact details.&lt;/p&gt;

&lt;p&gt;The trade-off with using an excuse is that it's hard to transition into a product or sales conversation later without revealing the initial deception. So think of these as one-time learning opportunities, not the start of an ongoing relationship.&lt;/p&gt;

&lt;p&gt;The ultimate excuse? Having a PhD student on your team. "I'm doing research on X for my dissertation" opens almost any door. If you're really desperate, you can always say you're "writing a book" on the topic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If it's a topic you both care about, find an excuse to talk about it. Your idea never needs to enter the equation, and you'll both enjoy the chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Immerse Yourself in Where They Are
&lt;/h3&gt;

&lt;p&gt;This is one of the most underrated strategies. Instead of trying to find customers from the outside, go to where they already gather.&lt;/p&gt;

&lt;p&gt;Fitzpatrick wanted to build tools for conference organisers and professional speakers. He didn't know any of the big names. So he hit the conference circuit and gave free talks everywhere he could. The speakers' lounge became his personal customer conversation machine. By immersing himself in the community, he met a load of people and had all the connections and conversations he could handle.&lt;/p&gt;

&lt;p&gt;Interestingly, he ultimately decided that big speakers and big conferences were a bad customer segment and walked away. Not every conversation has to validate your idea — sometimes the most valuable learning is discovering what NOT to build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Landing Pages
&lt;/h3&gt;

&lt;p&gt;Joel Gascoigne did a classic landing page test with his startup Buffer. He described the value proposition and collected emails. But here's what most people miss: it wasn't the conversion rate metrics that convinced him to move forward. It was the conversations that resulted from him emailing every single person who signed up and saying hello.&lt;/p&gt;

&lt;p&gt;Landing pages are a great way to collect qualified leads that you can then reach out to and have real conversations with. Paul Graham suggests a similar approach: get your product out there, see who seems to like it most, and then reach out to those types of users for deeper learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Clever
&lt;/h3&gt;

&lt;p&gt;Every business is different, and sometimes you need a creative approach. One guy wanted to sell to top-tier universities like Stanford and Harvard. He needed to understand their problems and be taken seriously by decision-makers. His solution? He organised a semi-monthly "knowledge exchange" call between department heads of top universities to discuss shared challenges. By simply organising the call and playing host, he immediately absorbed all the credibility of the participating universities and got direct phone access to a pile of great leads.&lt;/p&gt;

&lt;p&gt;The point is: don't just copy what someone else is doing. Consider your unique situation and get creative about how to manufacture conversations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bringing Them to You
&lt;/h2&gt;

&lt;p&gt;When you're the one initiating conversations, you're always on the back foot. The other person is suspicious and trying to figure out if you're wasting their time. The better approach? Find ways to make them come to you. This saves time, reduces friction, and makes people take you more seriously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organise Meetups
&lt;/h3&gt;

&lt;p&gt;For marginally more effort than attending an event, you can organise your own and benefit from being the centre of attention. Want to understand HR professionals' problems? Organise an "HR professionals happy hour." People will assume you're credible just because you're the one who sent the invite emails. It's the fastest and most unfair trick for rapid customer learning, and it also bootstraps your industry credibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speaking &amp;amp; Teaching
&lt;/h3&gt;

&lt;p&gt;Teaching is massively under-valued as both a learning and selling tool. If you're building better project management software, you probably have expertise and strong opinions about how things could be better. That's the magic combination for being an effective teacher.&lt;/p&gt;

&lt;p&gt;Teach at conferences, workshops, through online videos, blogging, and by doing free consulting or office hours. You'll refine your message, connect with potential customers who take you seriously, and learn which parts of your offering resonate — all before you've even built it. Then simply chat up the attendees who are most keen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Industry Blogging
&lt;/h3&gt;

&lt;p&gt;Even if you have no audience, blogging about your industry is incredibly helpful. When Fitzpatrick sent cold emails from his blog email address, people would often meet with him because they had checked his domain, seen his industry blog, and figured he was an interesting person to talk to. The traffic and audience were almost irrelevant — the blog served as a credibility signal.&lt;/p&gt;

&lt;p&gt;Blogging is also a great exercise for getting your thoughts in a row. It makes you a better customer conversationalist because you've already spent time thinking deeply about the industry's problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating Warm Intros
&lt;/h2&gt;

&lt;p&gt;Warm intros are the gold standard. Conversations are infinitely easier when you get an introduction through a mutual friend that establishes your credibility and reason for being there.&lt;/p&gt;

&lt;h3&gt;
  
  
  7 Degrees of Bacon
&lt;/h3&gt;

&lt;p&gt;The world is smaller than you think. Everyone knows someone. You just have to ask.&lt;/p&gt;

&lt;p&gt;Fitzpatrick tells a story about a team of recent graduates who needed to reach McKinsey-style consultants. They were at a co-working space, so one of them stood on a chair and yelled: "Does anyone here know anyone who works at McKinsey? Can we talk to you for a second? We'll buy you a beer!" They bought three beers, had three quick chats, and left with a diary full of intros.&lt;/p&gt;

&lt;p&gt;For consumer products, it's even easier — everyone knows a recent mom, an amateur athlete, or a theatre enthusiast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; You can find anyone you need if you ask for it a couple of times. Kevin Bacon's 7 degrees of separation applies to customer conversations too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Industry Advisors
&lt;/h3&gt;

&lt;p&gt;Advisors can be a goldmine for intros. In his first company, Fitzpatrick relied heavily on 5 advisors who each had around half a percent of equity. Their main job was to make credible introductions. He met with each one once per month, and with the meetings staggered across the five of them, that worked out to a fresh batch of intros roughly every week without being a huge time burden for anyone.&lt;/p&gt;

&lt;p&gt;You'd also be surprised by the quality of people willing to join your advisory board. The first conversation with a good advisor looks a lot like the first conversation with a flagship customer — you're both talking passionately about a space you care about. You can sometimes recruit great advisors directly from your early customer conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Universities
&lt;/h3&gt;

&lt;p&gt;If you're still in or recently out of university, professors are a goldmine. They get their grant funding from friendly, high-level industry folks, and since they're investing in research, those industry contacts are self-selected to be excited about new projects. Plus, professors are easy to reach — they post their emails publicly and you can generally just wander into their office.&lt;/p&gt;

&lt;h3&gt;
  
  
  Investors
&lt;/h3&gt;

&lt;p&gt;Top-tier investors are awesome for B2B intros. Beyond their own rolodex and company portfolio, they can usually pull off cold intros to practically any industry. They can also help you close better advisors and directors than you'd be able to wrangle on your own. This applies to anyone who is a "big deal" and has already bought into your idea — always ask: who can they connect you to?&lt;/p&gt;

&lt;h3&gt;
  
  
  Cash in Favours
&lt;/h3&gt;

&lt;p&gt;Remember all those people who brushed you off saying "Sounds great, keep me in the loop and let me know how I can help"? Now's the time to call in those favours. Reply to that old email and tell them you're ready for an intro to that person they know. Use the framing format from the next section to make their lives easy and reassure them you won't waste anyone's time.&lt;/p&gt;

&lt;p&gt;You'll get ignored a lot, but who cares? You're not trying to minimise your failure rate — you're trying to get a few conversations going. Just don't make a habit of cashing in favours; lean on the same people too often and you'll burn bridges.&lt;/p&gt;




&lt;h2&gt;
  
  
  Asking For and Framing the Meeting
&lt;/h2&gt;

&lt;p&gt;Sometimes a proper meeting can't be avoided — you need the full hour with someone senior. In those cases, how you frame the meeting request makes all the difference.&lt;/p&gt;

&lt;p&gt;If you don't frame it properly, it becomes a sales meeting by default, which is bad for three reasons: the customer closes up about pricing, attention shifts to you instead of them, and it's going to be the worst sales meeting ever because you aren't ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Five Key Elements
&lt;/h3&gt;

&lt;p&gt;Fitzpatrick outlines a framework for requesting meetings that works incredibly well. The five elements are: &lt;strong&gt;Vision / Framing / Weakness / Pedestal / Ask&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The mnemonic is: &lt;strong&gt;"Very Few Wizards Properly Ask [for help]."&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vision&lt;/strong&gt; — You're an entrepreneur trying to solve a problem or achieve a vision. Don't mention your specific idea.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framing&lt;/strong&gt; — Set expectations about what stage you're at and, if true, that you don't have anything to sell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weakness&lt;/strong&gt; — Show vulnerability by mentioning a specific problem you're struggling with. This also clarifies you're not a time waster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pedestal&lt;/strong&gt; — Put them on a pedestal by showing how much they, in particular, can help.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask&lt;/strong&gt; — Explicitly ask for help.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's an example of what this looks like in practice. As Fitzpatrick suggests, a good meeting request might look like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hey Pete,&lt;/p&gt;

&lt;p&gt;I'm trying to make desk &amp;amp; office rental less of a pain for new businesses (&lt;em&gt;vision&lt;/em&gt;). We're just starting out and don't have anything to sell, but want to make sure we're building something that actually helps (&lt;em&gt;framing&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;I've only ever come at it from the tenant's side and I'm having a hard time understanding how it all works from the landlord's perspective (&lt;em&gt;weakness&lt;/em&gt;). You've been renting out desks for a while and could really help me cut through the fog (&lt;em&gt;pedestal&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Do you have time in the next couple weeks to meet up for a chat? (&lt;em&gt;ask&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;— Rob Fitzpatrick, &lt;em&gt;The Mom Test&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;People like to help entrepreneurs, but they also hate wasting their time. This kind of opening tells them you know what you need and that they'll be able to make a real difference.&lt;/p&gt;

&lt;p&gt;Once the meeting starts, you need to grab the reins quickly. Repeat what you said in the email and immediately drop into your first question. If someone else made the introduction, use them as a voice of authority. Have a plan for the meeting and be assertive about keeping it on track.&lt;/p&gt;




&lt;h2&gt;
  
  
  To Commute or to Call
&lt;/h2&gt;

&lt;p&gt;One common shortcut is to move conversations to phone or video calls. Fitzpatrick is not a fan, and for good reason.&lt;/p&gt;

&lt;p&gt;When you're in person, you get access to body language, facial expressions, and the natural rapport that comes from sharing a physical space. On the phone, people are trying to squeeze calls between other activities, wondering when they can hang up, and the whole thing ends up feeling more like a scripted interview than a natural conversation.&lt;/p&gt;

&lt;p&gt;In-person meetings also have a huge advantage for building ongoing relationships. Nobody becomes friends over the phone. And those friendships are what lead to warm intros and future meetings.&lt;/p&gt;

&lt;p&gt;That said, some experienced practitioners in the field do recommend phone calls, and they can work. But Fitzpatrick's advice is to &lt;strong&gt;start in person&lt;/strong&gt; first. It's too easy to use phone calls as an excuse to skip the awkwardness of meeting someone face-to-face, rather than as a considered trade-off.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Advisory Flip
&lt;/h2&gt;

&lt;p&gt;This is a subtle but powerful mindset shift. Instead of going into conversations thinking &lt;em&gt;"I need to find customers"&lt;/em&gt;, think &lt;em&gt;"I'm looking for industry advisors."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you're looking for customers, you feel needy. You're on the back foot. You're basically asking: "Please buy my thing." But when you're looking for advisors, the whole dynamic flips. You're evaluating them. You're the one deciding if they're a good fit. Even if the topics you discuss are the same, both you and the other person will notice the difference.&lt;/p&gt;

&lt;p&gt;This isn't about explicitly telling people you're looking for advisors (unless you already like them and it comes up naturally). It's about orienting your state of mind to give you a helpful internal narrative and consistent front.&lt;/p&gt;

&lt;p&gt;The sales-advisor switch also puts you firmly in control of the meeting, since you're now evaluating them. You set the agenda, you keep it on topic, and you propose next steps.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Many Meetings Do You Actually Need?
&lt;/h2&gt;

&lt;p&gt;Every meeting has an opportunity cost. When you're travelling to that meeting, you aren't writing code or building your product. So how many conversations is enough?&lt;/p&gt;

&lt;p&gt;The UX community says: &lt;strong&gt;keep talking to people until you stop hearing new information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In practice, if your initial assumptions are mostly correct and you're in a simple industry, it might only take 3-5 conversations to confirm what you already believe. But you usually won't get that lucky. It often takes more conversations before you start hearing a consistent message.&lt;/p&gt;

&lt;p&gt;Here's a useful diagnostic: if you've run more than 10 conversations and the results are all over the map, your customer segment is probably too vague. You might be mashing together feedback from multiple different types of customers.&lt;/p&gt;

&lt;p&gt;The goal isn't to have a thousand meetings. It's to learn what you need quickly and then get back to building. In most cases, you should be able to answer almost any question about your business or customers within a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Keep having conversations until you stop hearing new stuff. If you're still getting wildly different answers after 10+ conversations, your customer segment is too broad.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways from Chapter 6
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The goal of cold outreach is to stop doing cold outreach.&lt;/strong&gt; Hustle the first few conversations, then convert them into warm intros.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Go where your customers already are.&lt;/strong&gt; Conferences, meetups, online communities — immerse yourself in their world instead of trying to pull them into yours.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make them come to you.&lt;/strong&gt; Organise events, teach, blog. Being the host or the expert flips the power dynamic in your favour.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Warm intros beat everything.&lt;/strong&gt; Ask your network, advisors, investors, and even previous contacts for introductions. Everyone knows someone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frame your meeting request properly.&lt;/strong&gt; Use the VFWPA framework (Vision/Framing/Weakness/Pedestal/Ask) to get meetings without sounding salesy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start in person.&lt;/strong&gt; Phone calls are a shortcut that often backfires. In-person conversations build better relationships and give you better data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flip your mindset.&lt;/strong&gt; You're not looking for customers — you're looking for advisors. This one mental switch changes everything about how the conversation feels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know when to stop.&lt;/strong&gt; Keep talking until you stop learning new things, then get back to building.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;This is part of my series where I break down each chapter of The Mom Test by Rob Fitzpatrick. If you're building a product and talking to customers, this book is essential reading.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/egepakten/the-mom-test-chapter-5-commitment-and-advancement-1k27"&gt;Chapter 5 - Commitment and Advancement&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Next up: Chapter 7 - Choosing Your Customers&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>startup</category>
      <category>speaking</category>
      <category>learning</category>
    </item>
    <item>
      <title>The Mom Test - Chapter 5: Commitment and Advancement</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Mon, 02 Mar 2026 11:03:54 +0000</pubDate>
      <link>https://dev.to/egepakten/the-mom-test-chapter-5-commitment-and-advancement-1k27</link>
      <guid>https://dev.to/egepakten/the-mom-test-chapter-5-commitment-and-advancement-1k27</guid>
      <description>&lt;p&gt;In the previous chapters, we learned how to have proper customer conversations — avoiding compliments, digging into specifics, and not pitching too early. But here's a question that kept bugging me: &lt;strong&gt;How do I know if a meeting actually went well?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chapter 5 answers exactly that. And the answer is brutally simple: &lt;strong&gt;a meeting went well only if it ends with a commitment.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Outline
&lt;/h2&gt;

&lt;p&gt;In this post, I'll break down Chapter 5 into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;There's No Such Thing as a Meeting That "Went Well"&lt;/strong&gt; — Why every meeting either succeeds or fails, and how compliments trick you into thinking you're making progress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commitment and Advancement: Two Sides of the Same Coin&lt;/strong&gt; — The two key concepts of the chapter and why they always come together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Currencies of Commitment&lt;/strong&gt; — The three types of commitment (Time, Reputation, Money) and how they escalate in seriousness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Spectrum: From Zombie Lead to Committed Customer&lt;/strong&gt; — How to read the signals and know exactly where you stand with a potential customer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why We Don't Ask for Commitments (And Why We Should)&lt;/strong&gt; — The two traps that prevent us from getting real signals: fishing for compliments and not asking for next steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Crazy" First Customers: Your Early Evangelists&lt;/strong&gt; — Why your first customers won't be "normal" buyers, and why that's a feature, not a bug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to Push for Commitment Without Being a Used Car Salesman&lt;/strong&gt; — A practical framework for asking for commitments without feeling pushy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't Ask for Commitment Too Early&lt;/strong&gt; — Why timing matters and how to match your ask to the stage of the relationship.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  There's No Such Thing as a Meeting That "Went Well"
&lt;/h2&gt;

&lt;p&gt;This was a mindset shift for me. I used to walk out of meetings thinking &lt;em&gt;"That went great! They loved the idea!"&lt;/em&gt; — and then... nothing happened. No follow-up, no next steps, just silence.&lt;/p&gt;

&lt;p&gt;Fitzpatrick puts it bluntly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every meeting either succeeds or fails.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A meeting &lt;strong&gt;fails&lt;/strong&gt; when you leave with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A compliment: &lt;em&gt;"That's a really cool idea!"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;A stalling tactic: &lt;em&gt;"Let's circle back after the holidays."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A meeting &lt;strong&gt;succeeds&lt;/strong&gt; when you leave with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;commitment&lt;/strong&gt; to the next step&lt;/li&gt;
&lt;li&gt;Something concrete that &lt;strong&gt;advances&lt;/strong&gt; the relationship&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tricky part? The subtle stalls don't feel like rejection. &lt;em&gt;"We should definitely talk again soon"&lt;/em&gt; sounds positive, but it's just a polished version of &lt;em&gt;"Don't call me, I'll call you."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you leave a meeting feeling good but without a concrete next step, you probably got played by a compliment, not a commitment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Commitment and Advancement: Two Sides of the Same Coin
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick introduces two key concepts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commitment&lt;/strong&gt; — When someone gives you something they value. This proves they're serious and not just being polite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advancement&lt;/strong&gt; — When the relationship moves to the next concrete step in your sales or learning process.&lt;/p&gt;

&lt;p&gt;These two almost always come together. To advance to the next step, someone has to commit something. And if someone commits something, the process naturally advances.&lt;/p&gt;

&lt;p&gt;For example: You want to demo your product to a company's decision-maker. To get that meeting (advancement), your current contact needs to introduce you to their boss (reputation commitment). One doesn't happen without the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Commitment and advancement are functionally the same thing. If you're getting one, you're usually getting both. If you're getting neither, the meeting failed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Currencies of Commitment
&lt;/h2&gt;

&lt;p&gt;Not all commitments are created equal. Fitzpatrick breaks them down into three "currencies" — and they escalate in seriousness:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Time Commitment
&lt;/h3&gt;

&lt;p&gt;This is the lightest form. The person is investing their time to engage with you further.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agreeing to a follow-up meeting with clear next steps&lt;/li&gt;
&lt;li&gt;Sitting down for a longer, deeper conversation&lt;/li&gt;
&lt;li&gt;Trying out your prototype or beta and giving feedback&lt;/li&gt;
&lt;li&gt;Coming to your office (or going out of their way) for a meeting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If someone won't even give you another 30 minutes of their time, that's a pretty clear signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Reputation Commitment
&lt;/h3&gt;

&lt;p&gt;This is heavier. The person is putting their name and credibility on the line for you.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introducing you to their boss or a decision-maker&lt;/li&gt;
&lt;li&gt;Introducing you to a peer or potential customer&lt;/li&gt;
&lt;li&gt;Giving you a public testimonial or case study&lt;/li&gt;
&lt;li&gt;Posting about you on social media or their company Slack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When someone introduces you to their boss, they're essentially saying &lt;em&gt;"I believe in this enough to risk looking stupid if it doesn't work out."&lt;/em&gt; That's real skin in the game.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Financial Commitment
&lt;/h3&gt;

&lt;p&gt;The ultimate signal. Money talks, everything else walks.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A letter of intent (LOI) or pre-order&lt;/li&gt;
&lt;li&gt;A deposit or partial payment&lt;/li&gt;
&lt;li&gt;Pre-paying for the product before it's built&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If someone says &lt;em&gt;"I'd definitely pay for that"&lt;/em&gt; — that means nothing. If someone says &lt;em&gt;"Here's $500, let me know when it's ready"&lt;/em&gt; — that means everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; The more someone gives you (time → reputation → money), the more seriously you can take their signal. Compliments cost nothing. Commitments cost something. That's the whole difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Spectrum: From Zombie Lead to Committed Customer
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick describes a spectrum of signals you might get from potential customers, and it's incredibly useful for figuring out where you actually stand:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold signals (the meeting failed):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"That's cool, I like it"&lt;/em&gt; → compliment, worthless&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Looks interesting, keep me in the loop"&lt;/em&gt; → polite brush-off&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Let's grab coffee sometime"&lt;/em&gt; → stalling, no specifics&lt;/li&gt;
&lt;li&gt;No follow-up after the meeting → they forgot you exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warm signals (getting somewhere):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Can you show this to my team next Tuesday?"&lt;/em&gt; → time + reputation commitment&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Send me the beta link, I'll try it this week"&lt;/em&gt; → time commitment with a deadline&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Let me introduce you to our Head of Product"&lt;/em&gt; → reputation commitment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hot signals (you're onto something):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"How much would this cost? Can we do a pilot?"&lt;/em&gt; → moving toward financial commitment&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"We'd like to pre-order 50 licenses"&lt;/em&gt; → money on the table&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Here's a deposit, build it"&lt;/em&gt; → they're all in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you can't tell where someone falls on this spectrum, you didn't push hard enough for a commitment at the end of the meeting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Don't Ask for Commitments (And Why We Should)
&lt;/h2&gt;

&lt;p&gt;So if commitments are so important, why don't we ask for them? Fitzpatrick identifies two main traps:&lt;/p&gt;

&lt;h3&gt;
  
  
  Trap 1: You're Fishing for Compliments
&lt;/h3&gt;

&lt;p&gt;Instead of asking &lt;em&gt;"Would you be willing to pay for this?"&lt;/em&gt; or &lt;em&gt;"Can I show this to your boss?"&lt;/em&gt;, we ask soft questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"What do you think of the idea?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Would you use something like this?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions are begging for a compliment, not a commitment. And guess what? People are happy to give you a compliment because it costs them nothing and gets you out of the room.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trap 2: You're Not Asking for Next Steps
&lt;/h3&gt;

&lt;p&gt;The meeting is going well. You're vibing. You're having a great conversation. And then... you just let it end. No ask. No push. You walk away with warm feelings and zero concrete progress.&lt;/p&gt;

&lt;p&gt;This is fear dressed up as politeness. We don't want to be "pushy" so we don't ask. But here's the thing — if your product is genuinely solving their problem, asking for a next step isn't pushy. It's helpful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Always know what commitment you want before the meeting starts. Then ask for it before the meeting ends. If you don't ask, you won't get it. Period.&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Crazy" First Customers: Your Early Evangelists
&lt;/h2&gt;

&lt;p&gt;Fitzpatrick makes an important point about who your first customers will be. They won't be normal, rational, cautious buyers. Your first customers will be a little bit "crazy" — and that's a good thing.&lt;/p&gt;

&lt;p&gt;Your early evangelists typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Have the problem right now&lt;/strong&gt;, not "someday"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know they have the problem&lt;/strong&gt; — they're not in denial&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have already tried to solve it&lt;/strong&gt; (maybe with spreadsheets, duct tape, or a competitor)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have the budget&lt;/strong&gt; or authority to actually pay for a solution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are desperate enough&lt;/strong&gt; to try an unfinished, unpolished product from an unknown startup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think about it: a normal person wouldn't use a half-built product from two people in a garage. But someone who's in pain RIGHT NOW and has been looking for a solution? They'll tolerate bugs, missing features, and a terrible UI — because you're solving their burning problem.&lt;/p&gt;

&lt;p&gt;These people are gold. They give you real feedback, real money, and real validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you can't find anyone who's desperate enough to use your product in its current state, you either haven't found your real customer segment, or you're not solving a painful enough problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Push for Commitment Without Being a Used Car Salesman
&lt;/h2&gt;

&lt;p&gt;A common fear: &lt;em&gt;"But I don't want to be pushy!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fitzpatrick's answer: you're not being pushy if you're genuinely trying to help. Here's his framework:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know your ask before the meeting.&lt;/strong&gt; What's the ideal next step? An intro to the boss? A pilot program? A pre-order? Know this going in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ask at the end of the meeting.&lt;/strong&gt; Don't let the meeting fizzle out. Before wrapping up, clearly state what you'd like to happen next.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accept the answer gracefully.&lt;/strong&gt; If they say no, that's actually great information. A clear "no" is infinitely more useful than a wishy-washy "maybe." At least now you know where you stand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interpret the response honestly.&lt;/strong&gt; If they dodge, stall, or give you a compliment instead of a commitment — recognize it for what it is. Don't lie to yourself.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples of good asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Would you be willing to do a trial run with your team next month?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Could you introduce me to [decision-maker] so I can understand their perspective?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"If we build this by March, would you commit to being a pilot customer?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Can I get a letter of intent so we can prioritize building this feature for you?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; If you're afraid to ask for a commitment because you think the person will say no — that's exactly why you need to ask. A "no" now saves you months of chasing a dead lead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Don't Ask for Commitment Too Early
&lt;/h2&gt;

&lt;p&gt;Here's the balance: pushing for commitment is essential, but timing matters.&lt;/p&gt;

&lt;p&gt;If you push for money or a huge commitment during what's supposed to be an early learning conversation, you'll scare people away. The first few conversations should be about &lt;strong&gt;learning&lt;/strong&gt; — understanding their problem, their workflow, their pain.&lt;/p&gt;

&lt;p&gt;Once you've validated the problem and have something to show (even a rough prototype), THEN you start pushing for commitments.&lt;/p&gt;

&lt;p&gt;The progression looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Early conversations:&lt;/strong&gt; Learn about the problem. No pitch, no ask. Just listen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Problem validated:&lt;/strong&gt; Start showing your solution concept. Ask for time commitments (follow-up meetings, beta testing).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution takes shape:&lt;/strong&gt; Push for reputation commitments (introductions, referrals).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product is tangible:&lt;/strong&gt; Push for financial commitments (pre-orders, deposits, LOIs).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Skipping steps or pushing too hard too early is just as bad as never pushing at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of Thumb:&lt;/strong&gt; Match your ask to the stage of the relationship. Early = learn. Middle = time and reputation. Late = money.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways from Chapter 5
&lt;/h2&gt;

&lt;p&gt;Let me sum up the core lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Meetings don't "go well."&lt;/strong&gt; They either produce a commitment or they fail. Stop fooling yourself with compliments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commitments come in three currencies:&lt;/strong&gt; Time, Reputation, and Money — in escalating order of seriousness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Always push for a next step.&lt;/strong&gt; Know your ask before the meeting and make it before the meeting ends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compliments ≠ Commitments.&lt;/strong&gt; "That's a great idea" is worthless. "Here's my credit card" is priceless.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Your first customers will be "crazy."&lt;/strong&gt; They have the problem now, they know it, and they're desperate enough to use your unfinished product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A "no" is better than a "maybe."&lt;/strong&gt; Rejection gives you clarity. Wishy-washiness wastes your time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Match your ask to the stage.&lt;/strong&gt; Don't ask for money when you should be asking questions. Don't ask for opinions when you should be asking for money.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;This is part of my series where I break down each chapter of The Mom Test by Rob Fitzpatrick. If you're building a product and talking to customers, this book is essential reading.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-4-why-you-should-keep-customer-conversations-casual-40b"&gt;Chapter 4 - Why You Should Keep Customer Conversations Casual&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Next up: Chapter 6 - Finding Conversations&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>mentorship</category>
      <category>speaking</category>
      <category>startup</category>
    </item>
    <item>
      <title>What I Learned from "The Mom Test" - Chapter 4: Why You Should Keep Customer Conversations Casual</title>
      <dc:creator>Ege Pakten</dc:creator>
      <pubDate>Sat, 24 Jan 2026 12:50:12 +0000</pubDate>
      <link>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-4-why-you-should-keep-customer-conversations-casual-40b</link>
      <guid>https://dev.to/egepakten/what-i-learned-from-the-mom-test-chapter-4-why-you-should-keep-customer-conversations-casual-40b</guid>
      <description>&lt;p&gt;&lt;em&gt;A developer's guide to learning from customers without the meeting overhead&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you've been following my series on "The Mom Test" by Rob Fitzpatrick, you know we've been exploring how to have better conversations with customers. Today, I want to share what I learned from Chapter 4: &lt;strong&gt;Keeping It Casual&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's the big idea: &lt;strong&gt;formal meetings can actually hurt your customer learning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sounds counterintuitive, right? Let me explain.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Formal Meetings
&lt;/h2&gt;

&lt;p&gt;We developers love structure. When we want to learn something from customers, our first instinct is often to schedule a meeting. But Rob Fitzpatrick argues this is actually a trap.&lt;/p&gt;

&lt;p&gt;Think about it. When you schedule a formal meeting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The customer knows you want something from them&lt;/li&gt;
&lt;li&gt;They put on their "meeting behavior"&lt;/li&gt;
&lt;li&gt;They try to be polite instead of honest&lt;/li&gt;
&lt;li&gt;Everything becomes awkward and formal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The author shares that Steve Blank, in his book "The Four Steps to the Epiphany," recommends &lt;strong&gt;3 separate meetings&lt;/strong&gt;: one about the customer's problem, one about your solution, and one to sell the product. Splitting them up is meant to avoid what Fitzpatrick calls the "Pathos Problem"—pitching your idea too early, which invites polite reassurance instead of honest facts.&lt;/p&gt;

&lt;p&gt;But here's the thing: setting up 3 formal meetings is incredibly time-consuming. Once you factor in scheduling, travel, preparation, and the actual meeting, a single 1-hour meeting can cost you 4+ hours of your time.&lt;/p&gt;

&lt;p&gt;And as the author puts it: &lt;strong&gt;"The most precious resource in a startup is its founders' time."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Keep It Casual
&lt;/h2&gt;

&lt;p&gt;So what's the alternative? Keep your conversations casual and informal.&lt;/p&gt;

&lt;p&gt;Instead of sending a calendar invite, just have a chat. Instead of a conference room, talk at a coffee shop. Instead of an interview, have a conversation.&lt;/p&gt;

&lt;p&gt;Here's a great example from the book. Imagine you're at a conference and you bump into someone in your target industry. Instead of saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can I schedule a meeting to interview you about your problems?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey, I'm curious—how did you end up getting this gig?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;See the difference? One feels like work. The other feels like genuine human curiosity.&lt;/p&gt;

&lt;p&gt;When you &lt;strong&gt;strip all the formality&lt;/strong&gt; from the process, something magical happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No meetings to schedule&lt;/li&gt;
&lt;li&gt;No "interviews" to conduct&lt;/li&gt;
&lt;li&gt;Conversations become fast and lightweight&lt;/li&gt;
&lt;li&gt;You can talk to a dozen people at a single industry meetup&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Meeting Anti-Pattern
&lt;/h2&gt;

&lt;p&gt;The author introduces a concept called "The Meeting Anti-Pattern." This is the &lt;strong&gt;tendency to relegate&lt;/strong&gt; every customer conversation to a calendar block.&lt;/p&gt;

&lt;p&gt;Beyond being a bad use of time, our &lt;strong&gt;over-reliance&lt;/strong&gt; on formal meetings makes us miss &lt;strong&gt;serendipitous&lt;/strong&gt; learning opportunities.&lt;/p&gt;

&lt;p&gt;Here's a funny example from the book: Imagine you're at a café, and your dream customer sits next to you. Instead of just starting a natural conversation, you &lt;strong&gt;psych yourself up&lt;/strong&gt; and then &lt;strong&gt;fumble&lt;/strong&gt; through an awkward pitch asking if maybe they want to get coffee sometime... at a different place.&lt;/p&gt;

&lt;p&gt;That's ridiculous, right? You're already having coffee together! Just talk to them like a normal human being.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Formal Is Too Formal?
&lt;/h2&gt;

&lt;p&gt;Here are some warning signs that your conversation is too formal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"So, first off, thanks for agreeing to this interview..."&lt;/li&gt;
&lt;li&gt;"On a scale of 1 to 5, how much would you say..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you start with phrases like these, you immediately put the other person in "interview mode." They feel like they're doing you a favor, which creates a stilted atmosphere where they'll say whatever they think you want to hear.&lt;/p&gt;

&lt;p&gt;Learning from customers doesn't mean wearing a suit and sipping boardroom coffee. The right questions are fast, interesting, and touch on topics people genuinely enjoy discussing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb from the book: If it feels like they're doing you a favor by talking to you, it's probably too formal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At their best, these conversations are a pleasure for both parties. You're probably the first person in a long time to be truly interested in the &lt;strong&gt;petty&lt;/strong&gt; annoyances of their daily work.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Long Should These Conversations Be?
&lt;/h2&gt;

&lt;p&gt;This surprised me: early conversations can be incredibly short.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5 minutes&lt;/strong&gt; is enough to learn whether a problem exists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10-15 minutes&lt;/strong&gt; gets you into their workflow, time usage, and what they've tried before&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30+ minutes&lt;/strong&gt; happens naturally when you hit a topic they love (people enjoy talking about themselves!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The longer conversations are easier to &lt;strong&gt;facilitate&lt;/strong&gt; because once someone starts explaining their work, they'll often slip into a monologue. You just need to point them in the right direction.&lt;/p&gt;

&lt;p&gt;But here's the catch with formal B2B meetings: the duration is often determined by the arbitrary calendar block (usually 30 minutes or 1 hour), not by what you actually need to learn. You might lose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5 minutes to &lt;strong&gt;people trickling in late&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;5 minutes to saying hello and small talk&lt;/li&gt;
&lt;li&gt;10 minutes to product demos&lt;/li&gt;
&lt;li&gt;5 minutes to figuring out next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's half your meeting gone before you even start learning!&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It Together: A Real Example
&lt;/h2&gt;

&lt;p&gt;The author shares a great story about trying to get feedback from busy investors who manage their &lt;strong&gt;dealflow&lt;/strong&gt; with hundreds of meetings per month.&lt;/p&gt;

&lt;p&gt;Instead of trying to squeeze yet another meeting into their packed calendars, he showed up to an industry event and, during casual small talk, asked how they handle the flood of applications.&lt;/p&gt;

&lt;p&gt;The investor replied, "Our analysts kill most of them before they ever reach us," then pointed at some sticky notes on the wall and shared exactly how their screening process worked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It took 5 minutes.&lt;/strong&gt; No formal meeting. No biases. No compliments. Just concrete facts.&lt;/p&gt;

&lt;p&gt;Now compare that to someone who makes a 2-hour &lt;strong&gt;commute&lt;/strong&gt; to attend a formal meeting. They might get the same information (or worse), but at a much higher cost.&lt;/p&gt;

&lt;p&gt;The lesson? Sometimes a casual 5-minute chat is worth more than an hour-long formal meeting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Visionary Leap
&lt;/h2&gt;

&lt;p&gt;Here's the beautiful part: once you've collected all these casual insights, you can take what the author calls the "visionary &lt;strong&gt;leap&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;You take everything you've learned—all the problems, all the frustrations, all the workarounds—and you come up with a specific offering that makes your customers' lives better. Then you ask them to commit to it.&lt;/p&gt;

&lt;p&gt;But you can only make this leap if you've been listening properly. And proper listening happens in casual, honest conversations—not in formal interviews where people tell you what you want to hear.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Let me summarize the main lessons from this chapter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Formal meetings create bias.&lt;/strong&gt; People behave differently when they know they're being "interviewed."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Casual conversations are faster and more honest.&lt;/strong&gt; You can learn in 5 minutes what might take an hour in a formal setting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The first conversation doesn't need to be a meeting.&lt;/strong&gt; It works better as a casual chat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Your time is precious.&lt;/strong&gt; Don't waste 4 hours on meeting overhead when a 10-minute conversation would give you the same insights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serendipity is your friend.&lt;/strong&gt; Be open to learning opportunities everywhere—conferences, coffee shops, random encounters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;If they feel like they're doing you a favor, it's too formal.&lt;/strong&gt; The best conversations are enjoyable for both parties.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Rule of Thumb
&lt;/h2&gt;

&lt;p&gt;I'll leave you with the book's main rule for this chapter:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Learning about a customer and their problems works better as a quick and casual chat than a long, formal meeting."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And one more:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Give as little information as possible about your idea while still nudging the discussion in a useful direction."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;That's it for Chapter 4! The message is clear: stop hiding behind formal meetings. Get out there, talk to people like a normal human being, and you'll learn so much more.&lt;/p&gt;

&lt;p&gt;In the next chapter, we'll explore more techniques for having effective customer conversations. Stay tuned!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is part of my series on "The Mom Test" by Rob Fitzpatrick. I'm a developer learning product management and customer validation, sharing my learnings with fellow developers who want to build products that people actually want. All examples and stories are credited to the original author.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What do you think? Have you experienced the difference between casual and formal customer conversations? Let me know in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>startup</category>
      <category>speaking</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
