<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammad Mairaj</title>
    <description>The latest articles on DEV Community by Muhammad Mairaj (@thisismairaj).</description>
    <link>https://dev.to/thisismairaj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3512436%2Fa4f6bf28-568e-4cc1-b587-ece30565b1f4.jpg</url>
      <title>DEV Community: Muhammad Mairaj</title>
      <link>https://dev.to/thisismairaj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thisismairaj"/>
    <language>en</language>
    <item>
      <title>How to build a Claude Sonnet 4.5 agent in 10 minutes</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Tue, 30 Sep 2025 21:31:08 +0000</pubDate>
      <link>https://dev.to/thisismairaj/how-to-build-a-claude-sonnet-45-agent-in-10-minutes-1390</link>
      <guid>https://dev.to/thisismairaj/how-to-build-a-claude-sonnet-45-agent-in-10-minutes-1390</guid>
      <description>&lt;p&gt;We are going to build an AI agent that uses Claude Sonnet 4.5. We will use the agent primitive by Langbase.&lt;/p&gt;

&lt;p&gt;The agent primitive is a runtime LLM agent: you specify all parameters at runtime and get the response back. Under the hood it uses Langbase’s unified LLM API, which provides a consistent interface for interacting with 600+ LLMs across all the top providers.&lt;/p&gt;

&lt;p&gt;See the full &lt;a href="https://langbase.com/models" rel="noopener noreferrer"&gt;list of supported models&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Okay, let’s get to building.
&lt;/h3&gt;

&lt;h2&gt;
  
  
  Step 1: Install Langbase SDK
&lt;/h2&gt;

&lt;p&gt;The Langbase SDK is available in both &lt;strong&gt;TypeScript&lt;/strong&gt; and &lt;strong&gt;Python&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For TypeScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;langbase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langbase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Get your Langbase API Key
&lt;/h2&gt;

&lt;p&gt;Every request you send to Langbase needs an API key. To generate one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign up at &lt;a href="https://langbase.com/" rel="noopener noreferrer"&gt;Langbase.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;From the sidebar, click API Keys&lt;/li&gt;
&lt;li&gt;Create a new API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details, follow the &lt;a href="https://langbase.com/docs/api-reference/api-keys" rel="noopener noreferrer"&gt;API key guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In your .env file, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LANGBASE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_langbase_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since we’re using Claude Sonnet 4.5 (Anthropic), you’ll also need your Anthropic API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_anthropic_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Write the Code
&lt;/h2&gt;

&lt;p&gt;TypeScript version (index.ts):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;langbase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic:claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful AI Agent.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Explain what an AI Engineer does.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsx index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python version (index.py):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langbase&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Langbase&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;langbase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Langbase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LANGBASE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;langbase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI Agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain what an AI Engineer does.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python index.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: That’s it 🚀
&lt;/h2&gt;

&lt;p&gt;That’s basically 5–6 lines of code.&lt;/p&gt;

&lt;p&gt;With Langbase’s agent primitive, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamically modify the prompt, model, or instructions&lt;/li&gt;
&lt;li&gt;Enable streaming responses&lt;/li&gt;
&lt;li&gt;Switch between 600+ AI models with the same API&lt;/li&gt;
&lt;li&gt;Use both open models (DeepSeek, Kimi K2) and frontier models (Claude Sonnet 4.5, GPT-5, Gemini 2.0, Grok 4, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What are you waiting for? Build your Claude Sonnet 4.5 agent!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The complete guide to evals</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Tue, 30 Sep 2025 21:24:07 +0000</pubDate>
      <link>https://dev.to/thisismairaj/the-complete-guide-to-evals-3nd3</link>
      <guid>https://dev.to/thisismairaj/the-complete-guide-to-evals-3nd3</guid>
      <description>&lt;h2&gt;
  
  
  What are evals?
&lt;/h2&gt;

&lt;p&gt;Evals, short for evaluations, are systematic processes designed to measure and benchmark the performance of AI models, prompts, and workflows. In the context of AI and machine learning, evals refer to structured methods of testing whether a system produces outputs that meet predefined quality, accuracy, or safety standards.&lt;br&gt;
Key terms to understand here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model evaluation: Testing how well a model performs on tasks like summarization, classification, or reasoning.&lt;/li&gt;
&lt;li&gt;Prompt evaluation: Measuring the consistency, accuracy, and reliability of a prompt across variations.&lt;/li&gt;
&lt;li&gt;Benchmarking: Comparing one system against others using shared datasets or criteria.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Put simply, evals are the feedback loops that ensure AI systems are reliable, safe, and useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why evals matter in 2025
&lt;/h2&gt;

&lt;p&gt;With the rise of AI-powered applications across industries, evals have become critical to trust and adoption.&lt;br&gt;
According to Stanford's 2024 AI Index, 52% of companies report challenges in measuring the reliability of generative AI outputs, and nearly 68% of executives say they are investing in evaluation frameworks to reduce hallucinations and compliance risks.&lt;br&gt;
As Andrew Ng put it: "AI systems are only as good as the evaluations we run on them. Without proper testing, you're essentially flying blind."&lt;br&gt;
Key reasons evals matter today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliability: Users need consistent answers from AI systems.&lt;/li&gt;
&lt;li&gt;Safety: Evals catch harmful or biased outputs before deployment.&lt;/li&gt;
&lt;li&gt;Regulation: Governments are introducing audit requirements (e.g., EU AI Act).&lt;/li&gt;
&lt;li&gt;Trust: Businesses and end-users rely on transparent evaluation scores.&lt;/li&gt;
&lt;li&gt;Iteration: Continuous evals help refine prompts, models, and workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, evals are no longer optional - they are a competitive and compliance necessity.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to implement evals in your AI workflows
&lt;/h2&gt;

&lt;p&gt;Implementing evals requires a structured approach that balances automation with human review. Here's a step-by-step framework:&lt;br&gt;
&lt;strong&gt;Step 1: Define evaluation goals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What do you want to measure - accuracy, safety, style, or factual correctness? Choose metrics aligned with your business use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Select test datasets&lt;/strong&gt;&lt;br&gt;
Use real-world samples from your domain (customer queries, documents).&lt;br&gt;
Consider synthetic datasets for edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Choose evaluation methods&lt;/strong&gt;&lt;br&gt;
Automated metrics: BLEU, ROUGE, accuracy, latency.&lt;br&gt;
LLM-as-a-judge: Using another model to grade outputs.&lt;br&gt;
Human review: Domain experts validating results.&lt;/p&gt;
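
&lt;p&gt;The LLM-as-a-judge method can be sketched in a few lines of Python. This is an illustrative sketch only: &lt;code&gt;call_grader&lt;/code&gt; is a hypothetical stand-in for whatever LLM API you use as the grading model.&lt;/p&gt;

```python
# Sketch of LLM-as-a-judge: a grader model scores an output against a rubric.
# `call_grader` is a hypothetical placeholder, not a real API.
def call_grader(prompt):
    # Placeholder: send `prompt` to a grading model and return its reply.
    return "4"

def judge(output, rubric):
    """Ask the grader to score `output` from 1 to 5 against `rubric`."""
    prompt = (
        "Score the following output from 1 to 5 against the rubric.\n"
        f"Rubric: {rubric}\nOutput: {output}\nReply with a single digit."
    )
    reply = call_grader(prompt).strip()
    # Guard against the grader replying with something other than a digit.
    return int(reply) if reply.isdigit() else None
```

&lt;p&gt;In practice you would average judge scores across many outputs and treat the grading rubric with the same care as the prompt itself.&lt;/p&gt;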

&lt;p&gt;&lt;strong&gt;Step 4: Run iterative evaluations&lt;/strong&gt;&lt;br&gt;
Automate periodic evals during development.&lt;br&gt;
Track performance trends over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Integrate with deployment pipeline&lt;/strong&gt;&lt;br&gt;
Add evals as CI/CD checks before shipping changes.&lt;br&gt;
Automate alerts if scores drop below thresholds.&lt;/p&gt;
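
&lt;p&gt;Steps 4 and 5 can be wired together as a simple gate. Here’s a minimal Python sketch, assuming a hypothetical &lt;code&gt;run_system&lt;/code&gt; function that stands in for your model or pipeline call and an exact-match metric:&lt;/p&gt;

```python
# Illustrative eval gate for a CI pipeline: score the system on a small
# test set and fail the build if accuracy drops below a threshold.
# `run_system` is a hypothetical stand-in for your model or prompt call.
def run_system(query):
    # Placeholder: replace with a real model/pipeline call.
    canned = {"What is the capital of France?": "Paris"}
    return canned.get(query, "unknown")

def evaluate(test_cases, threshold=0.9):
    """Return (accuracy, passed) over exact-match test cases."""
    correct = sum(
        1 for case in test_cases
        if run_system(case["input"]).strip().lower() == case["expected"].lower()
    )
    accuracy = correct / len(test_cases)
    return accuracy, accuracy >= threshold

# In CI, exit non-zero (fail the build) whenever `passed` is False.
accuracy, passed = evaluate([
    {"input": "What is the capital of France?", "expected": "Paris"},
])
print(f"accuracy={accuracy:.2f} passed={passed}")
```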

&lt;p&gt;Next step: once you’ve completed basic evals, ask how to expand your evaluation system to cover fairness, safety, and compliance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evals vs alternatives
&lt;/h2&gt;

&lt;p&gt;While evals are central, they are often compared against other quality-assurance approaches like A/B testing, user surveys, or synthetic monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature comparison table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F220b186e74b4793st1z2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F220b186e74b4793st1z2.png" alt=" " width="740" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and cons of evals
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Objective and repeatable&lt;/li&gt;
&lt;li&gt;Scalable across tasks and models&lt;/li&gt;
&lt;li&gt;Detects hidden weaknesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires dataset preparation&lt;/li&gt;
&lt;li&gt;May not capture "real-world" nuance without human input&lt;/li&gt;
&lt;li&gt;Setup can be resource-intensive initially&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use case scenarios&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise AI apps: Ensure compliance with legal standards.&lt;/li&gt;
&lt;li&gt;Consumer chatbots: Test reliability before mass rollout.&lt;/li&gt;
&lt;li&gt;Healthcare AI: Verify factual correctness and reduce bias.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Frequently asked questions
&lt;/h3&gt;

&lt;p&gt;Q1: How are evals different from benchmarks?&lt;br&gt;
A: Benchmarks are shared, static test sets, while evals can be customized for your use case. Evals are ongoing, whereas benchmarks are more like snapshots.&lt;/p&gt;

&lt;p&gt;Q2: Do I always need humans in the loop for evals?&lt;br&gt;
A: Not always. Automated metrics and LLM-as-a-judge methods are effective for many tasks, but human validation is critical in regulated or high-stakes domains.&lt;/p&gt;

&lt;p&gt;Q3: Can evals prevent hallucinations?&lt;br&gt;
A: They don't eliminate hallucinations entirely, but they can detect and reduce them significantly when paired with guardrails.&lt;/p&gt;

&lt;p&gt;Q4: How often should I run evals?&lt;br&gt;
A: Best practice is to run them continuously - at least before every major deployment or prompt update.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>performance</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Agentic vs Graph RAG: Two paths to smarter AI systems</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Thu, 25 Sep 2025 19:26:05 +0000</pubDate>
      <link>https://dev.to/thisismairaj/agentic-vs-graph-rag-two-paths-to-smarter-ai-systems-44fb</link>
      <guid>https://dev.to/thisismairaj/agentic-vs-graph-rag-two-paths-to-smarter-ai-systems-44fb</guid>
      <description>&lt;p&gt;When I first started working with LLMs, retrieval felt like magic.&lt;/p&gt;

&lt;p&gt;You drop in a vector database, point it at your documents, and suddenly the model can “remember” everything it couldn’t fit in the context window.&lt;/p&gt;

&lt;p&gt;But the more I used it, the more I realized retrieval alone isn’t the endgame.&lt;/p&gt;

&lt;p&gt;It’s like giving a student a stack of textbooks. Yes, they can look things up. But what you really want is for them to understand, reason, and make connections on their own.&lt;/p&gt;

&lt;p&gt;That’s where two new approaches come in: agentic RAG and graph RAG.&lt;/p&gt;

&lt;p&gt;Both take retrieval and stretch it in new directions. Both are attempts to get closer to actual intelligence.&lt;/p&gt;

&lt;p&gt;And they couldn’t be more different.&lt;/p&gt;

&lt;h3&gt;
  
  
  What agentic RAG does
&lt;/h3&gt;

&lt;p&gt;Agentic RAG is about giving models the ability to act.&lt;/p&gt;

&lt;p&gt;Instead of just fetching documents, the model becomes an agent that decides what to look for, how to look for it, and when to stop.&lt;/p&gt;

&lt;p&gt;It’s like the difference between a librarian fetching you a single book and a research assistant who knows your goal, can read the books, summarize them, and then run off to find the next lead.&lt;/p&gt;

&lt;p&gt;This kind of system feels alive in a way plain RAG doesn’t. It’s iterative, goal-driven, and flexible.&lt;/p&gt;

&lt;p&gt;The downside is complexity. Once you make the model an agent, you also inherit the messiness of agents: loops, dead ends, hallucinations, and cost.&lt;/p&gt;

&lt;p&gt;But when it works, it feels like magic.&lt;/p&gt;

&lt;p&gt;Building agentic RAG becomes easier with Langbase SDK. &lt;a href="https://langbase.com/docs/examples/build-agentic-rag" rel="noopener noreferrer"&gt;Here's a guide&lt;/a&gt; if you are interested.&lt;/p&gt;
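
&lt;p&gt;The decide-retrieve-stop loop described above can be sketched in a few lines of Python. Everything here is illustrative: &lt;code&gt;ask_model&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; are hypothetical stand-ins for an LLM call and a retriever, not real APIs.&lt;/p&gt;

```python
# Minimal sketch of an agentic retrieval loop. The agent decides what to
# look for, keeps retrieving, and decides when to stop.
# `ask_model` and `search` are hypothetical placeholders.
def ask_model(prompt):
    # Placeholder LLM call: decide the next action given the context so far.
    return {"action": "answer", "text": "stub answer"}

def search(query):
    # Placeholder retriever: return documents matching the query.
    return ["doc about " + query]

def agentic_rag(question, max_steps=3):
    context = []
    for _ in range(max_steps):
        step = ask_model(f"Question: {question}\nContext: {context}")
        if step["action"] == "answer":  # the agent decides when to stop
            return step["text"]
        context.extend(search(step["text"]))  # otherwise, keep retrieving
    # Cap the loop so the agent cannot run (and spend) forever.
    return "no answer within step budget"
```

&lt;p&gt;The &lt;code&gt;max_steps&lt;/code&gt; cap is exactly the kind of guardrail you need once you inherit the messiness of agents: loops, dead ends, and cost.&lt;/p&gt;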

&lt;h3&gt;
  
  
  What graph RAG does
&lt;/h3&gt;

&lt;p&gt;Graph RAG goes in the opposite direction.&lt;/p&gt;

&lt;p&gt;Instead of making the model act like an agent, it structures the knowledge itself.&lt;/p&gt;

&lt;p&gt;Imagine taking all your data and turning it into a graph of entities, relationships, and connections. Instead of raw chunks of text, you have a map of how ideas fit together.&lt;/p&gt;

&lt;p&gt;When the model queries this graph, it’s no longer just pulling a paragraph. It’s pulling an entire web of meaning.&lt;/p&gt;

&lt;p&gt;This makes answers more grounded and less brittle. You don’t have to hope the right chunk happens to be retrieved. The graph gives you the relationships directly.&lt;/p&gt;

&lt;p&gt;Graph RAG feels less flashy than agentic RAG, but it’s sturdier. It’s the difference between a curious assistant and a well-organized library.&lt;/p&gt;
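
&lt;p&gt;To make the idea concrete, here’s a toy Python sketch: knowledge stored as entities and relations, and retrieval that walks the graph instead of hoping the right text chunk comes back. The graph contents are made up for illustration.&lt;/p&gt;

```python
# Toy illustration of graph RAG's core idea: knowledge as an entity graph,
# retrieval as a walk over relations rather than a text-chunk lookup.
# The graph below is invented purely for illustration.
knowledge_graph = {
    "Ada Lovelace": [("wrote notes on", "Analytical Engine")],
    "Analytical Engine": [("designed by", "Charles Babbage")],
}

def retrieve_subgraph(entity, depth=2):
    """Collect (subject, relation, object) facts reachable from `entity`."""
    facts, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, target in knowledge_graph.get(node, []):
                facts.append((node, relation, target))
                next_frontier.append(target)
        frontier = next_frontier
    return facts
```

&lt;p&gt;A query about Ada Lovelace pulls in the Babbage connection too, even though no single text chunk mentions both. That is the web of meaning the graph gives you directly.&lt;/p&gt;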

&lt;h3&gt;
  
  
  Choosing between the two
&lt;/h3&gt;

&lt;p&gt;The funny thing is you don’t actually have to choose.&lt;/p&gt;

&lt;p&gt;Agentic RAG and graph RAG are two different bets on the same problem: how do we get models to reason over knowledge instead of just parroting it back?&lt;/p&gt;

&lt;p&gt;If you care about exploration and discovery, agentic RAG will take you further.&lt;/p&gt;

&lt;p&gt;If you care about accuracy and structure, graph RAG is safer.&lt;/p&gt;

&lt;p&gt;The smartest systems I’ve seen combine both. An agent that can reason and plan, but also a knowledge graph to keep it grounded.&lt;/p&gt;

&lt;p&gt;One gives you flexibility, the other gives you stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters now
&lt;/h3&gt;

&lt;p&gt;I don’t think we’ve seen the final form of RAG yet.&lt;/p&gt;

&lt;p&gt;Right now, everyone is experimenting. Some are pushing towards agentic systems. Others are betting on graphs.&lt;/p&gt;

&lt;p&gt;The reason it matters is simple. Retrieval is the foundation of every serious AI system. If you can make retrieval smarter, you make everything smarter.&lt;/p&gt;

&lt;p&gt;That’s why I think agentic RAG and graph RAG are more than passing fads. They’re the first real attempts to move beyond raw text search and into reasoning.&lt;/p&gt;

&lt;p&gt;The next decade of AI might be decided by which of these paths works best—or how we combine them.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Why most AI demos fail in production</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Thu, 25 Sep 2025 18:57:22 +0000</pubDate>
      <link>https://dev.to/thisismairaj/why-most-ai-demos-fail-in-production-60g</link>
      <guid>https://dev.to/thisismairaj/why-most-ai-demos-fail-in-production-60g</guid>
      <description>&lt;p&gt;AI demos are intoxicating.&lt;/p&gt;

&lt;p&gt;They make you feel like the future has arrived.&lt;/p&gt;

&lt;p&gt;A few clicks, a few prompts, and suddenly you are looking at something that feels like science fiction.&lt;/p&gt;

&lt;p&gt;But here’s the problem.&lt;/p&gt;

&lt;p&gt;The same demo that dazzles on stage almost always collapses when you try to turn it into a product.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because a demo is theater, and production is reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The best case bias
&lt;/h2&gt;

&lt;p&gt;A demo is built to impress, not to last.&lt;/p&gt;

&lt;p&gt;It only shows the happy path.&lt;/p&gt;

&lt;p&gt;The presenter knows what to type.&lt;/p&gt;

&lt;p&gt;They avoid the weird edge cases.&lt;/p&gt;

&lt;p&gt;The inputs are clean, the timing is perfect, and the audience only sees the system at its best.&lt;/p&gt;

&lt;p&gt;Production is the opposite.&lt;/p&gt;

&lt;p&gt;Real users are unpredictable.&lt;/p&gt;

&lt;p&gt;They type half-formed thoughts, use slang, and ask things the system was never designed to handle.&lt;/p&gt;

&lt;p&gt;If the demo is a polished photo, production is a stress test.&lt;/p&gt;

&lt;p&gt;Most demos are not built for that test.&lt;/p&gt;

&lt;h2&gt;
  
  
  The missing infrastructure
&lt;/h2&gt;

&lt;p&gt;Another reason demos fail is that they don’t show the scaffolding.&lt;/p&gt;

&lt;p&gt;What looks like a single model output is often supported by hidden tricks: a preloaded context, hand-picked data, or a carefully engineered prompt.&lt;/p&gt;

&lt;p&gt;In production, those tricks don’t scale.&lt;/p&gt;

&lt;p&gt;You need infrastructure.&lt;/p&gt;

&lt;p&gt;You need ways to manage memory, handle retrieval, track costs, and monitor reliability.&lt;/p&gt;

&lt;p&gt;Without that, you have a toy, not a product.&lt;/p&gt;

&lt;p&gt;And toys break when people start using them in ways you didn’t expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fragility of prompts
&lt;/h2&gt;

&lt;p&gt;Prompts are like duct tape.&lt;/p&gt;

&lt;p&gt;They hold demos together.&lt;/p&gt;

&lt;p&gt;But duct tape doesn’t hold under stress.&lt;/p&gt;

&lt;p&gt;A prompt that works in one demo often fails with different inputs.&lt;/p&gt;

&lt;p&gt;Models change.&lt;/p&gt;

&lt;p&gt;Users stretch the boundaries.&lt;/p&gt;

&lt;p&gt;Suddenly, the system that looked smart in a five-minute demo looks lost when exposed to the chaos of production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost problem
&lt;/h2&gt;

&lt;p&gt;No one talks about cost in a demo.&lt;/p&gt;

&lt;p&gt;You can burn through tokens without worrying.&lt;/p&gt;

&lt;p&gt;But production is a different story.&lt;/p&gt;

&lt;p&gt;When you go from ten queries to ten thousand, the bill starts to matter.&lt;/p&gt;

&lt;p&gt;And scaling an AI system isn’t just about efficiency.&lt;/p&gt;

&lt;p&gt;It’s about trade-offs: do you use a smaller model and risk worse results, or pay for a larger one and risk unsustainable costs?&lt;/p&gt;

&lt;p&gt;Most demos ignore that question.&lt;/p&gt;

&lt;p&gt;Production forces you to answer it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The missing feedback loop
&lt;/h2&gt;

&lt;p&gt;A demo doesn’t need to improve.&lt;/p&gt;

&lt;p&gt;It’s a one-time performance.&lt;/p&gt;

&lt;p&gt;But a real product has to get better over time.&lt;/p&gt;

&lt;p&gt;You need a feedback loop.&lt;/p&gt;

&lt;p&gt;You need to capture when the system fails, learn from it, and adapt.&lt;/p&gt;

&lt;p&gt;Without that, the quality slowly declines.&lt;/p&gt;

&lt;p&gt;Users lose trust.&lt;/p&gt;

&lt;p&gt;And once trust is gone, the product is dead.&lt;/p&gt;

&lt;h2&gt;
  
  
  What really matters
&lt;/h2&gt;

&lt;p&gt;The lesson is simple.&lt;/p&gt;

&lt;p&gt;Anyone can build a demo.&lt;/p&gt;

&lt;p&gt;The hard part is building something that survives messy inputs, unpredictable users, and real-world economics.&lt;/p&gt;

&lt;p&gt;That requires engineering.&lt;/p&gt;

&lt;p&gt;It requires discipline.&lt;/p&gt;

&lt;p&gt;It requires treating AI not as magic, but as a component in a larger system that needs to be designed, tested, and maintained.&lt;/p&gt;

&lt;p&gt;Demos are fun.&lt;/p&gt;

&lt;p&gt;But products change the world.&lt;/p&gt;

&lt;p&gt;And the gap between the two is where most teams fail.&lt;/p&gt;

</description>
      <category>product</category>
      <category>testing</category>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Learning in public: The fastest way to grow as a software engineer</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Wed, 24 Sep 2025 20:16:00 +0000</pubDate>
      <link>https://dev.to/thisismairaj/learning-in-public-the-fastest-way-to-grow-as-a-software-engineer-1edf</link>
      <guid>https://dev.to/thisismairaj/learning-in-public-the-fastest-way-to-grow-as-a-software-engineer-1edf</guid>
      <description>&lt;p&gt;Learning is a lifelong journey, especially in the fast-paced world of technology. While many people consume content quietly, the concept of "learning in public" offers a powerful way to accelerate your growth as a developer. Inspired by Shawn Wang’s essay on &lt;a href="https://www.swyx.io/learn-in-public" rel="noopener noreferrer"&gt;Learning in Public&lt;/a&gt;, this post explores how sharing your learning process openly can transform your skills, network, and career.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Learning in Public?
&lt;/h2&gt;

&lt;p&gt;Learning in public is about creating a habit of sharing what you learn as you learn it. Instead of keeping your progress private, you document your journey through blogs, tutorials, videos, or social media posts. This "learning exhaust" not only helps you solidify your knowledge but also benefits others who are on similar paths.&lt;/p&gt;

&lt;p&gt;The key idea is to create the resources you wish you had when you started. Whether it’s a blog post explaining a tricky concept, a YouTube video walking through a coding challenge, or a Reddit thread answering a question, your contributions add value to the community while reinforcing your own understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Learn in Public?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. You Become Your Own Best Teacher
&lt;/h3&gt;

&lt;p&gt;When you explain concepts to others, you’re forced to clarify your own understanding. Writing a blog post or recording a tutorial requires you to break down complex ideas into digestible pieces, which deepens your mastery. As Shawn Wang puts it, the biggest beneficiary of learning in public is &lt;em&gt;future you&lt;/em&gt;. By documenting your progress, you create a personal knowledge base that you can revisit and build upon.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It’s Okay to Be Wrong
&lt;/h3&gt;

&lt;p&gt;Learning in public means embracing vulnerability. You might make mistakes or share incomplete knowledge, but that’s part of the process. The internet will correct you, and that feedback is invaluable. As Wang advises, “Wear your noobyness on your sleeve.” When critics point out flaws, listen, learn, and improve. This iterative process helps you grow faster than learning in isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. You Build a Network of Mentors
&lt;/h3&gt;

&lt;p&gt;When you share your work publicly, you attract the attention of experienced developers who notice your genuine curiosity. These individuals often become informal mentors, offering guidance and opportunities. Wang emphasizes, “Pick up what they put down.” When a senior engineer asks for help on a project, seize the chance to contribute. These interactions can lead to one-on-one mentorship that you can’t buy.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. You Amplify Your Impact
&lt;/h3&gt;

&lt;p&gt;By teaching others, you amplify the knowledge of those around you. As Wang notes, “By teaching you, they teach many.” Your beginner’s perspective is a unique asset—your questions and explanations resonate with others who are just starting out. Over time, people will seek you out for help, mistaking you for an expert. Answer to the best of your ability and lean on your mentors when you’re stuck.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Opportunities Follow Visibility
&lt;/h3&gt;

&lt;p&gt;Shawn Wang shares the story of Chris Coyier, who built a massive audience through his site CSS-Tricks by consistently sharing what he learned. While Coyier and his peers started at similar skill levels, his willingness to teach publicly led to a successful career, including raising nearly $90,000 for a site redesign. Learning in public creates visibility, which can open doors to speaking engagements, job offers, and even paid opportunities.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Start Learning in Public
&lt;/h2&gt;

&lt;p&gt;Ready to take the plunge? Here are practical ways to start learning in public today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write Blogs or Tutorials&lt;/strong&gt;: Share your coding journey on platforms like Medium or a personal blog. Write about a new framework you’re learning or a bug you solved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create Videos or Streams&lt;/strong&gt;: Record a YouTube video or Twitch stream walking through a project. Talking through your code helps you process and teaches others.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute to Open Source&lt;/strong&gt;: Make pull requests to libraries you use or build your own small projects. Cloning a tool you admire from scratch can deepen your understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engage on Public Forums&lt;/strong&gt;: Answer questions on Stack Overflow or Reddit. Avoid private platforms like Slack or Discord—focus on public spaces where your contributions are discoverable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize and Share&lt;/strong&gt;: After attending a conference or workshop, write a summary of what you learned. This reinforces your knowledge and helps others who couldn’t attend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build a Knowledge Base&lt;/strong&gt;: Document your progress in a persistent format, like a personal wiki or GitHub repository. Over time, this becomes a valuable resource for you and others.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overcoming the Fear of Learning in Public
&lt;/h2&gt;

&lt;p&gt;Sharing your work can feel daunting. You might worry about being judged or making mistakes. But as Wang points out, discomfort is a sign you’re pushing yourself. Embrace the impostor syndrome—it means you’re growing. If someone criticizes you, ask for specific feedback and use it to improve. Block abusive comments and keep moving forward.&lt;/p&gt;

&lt;p&gt;Learning in public isn’t just about immediate gains—it’s about building a reputation as a curious, collaborative developer. Over time, your contributions will compound. You’ll develop a portfolio of work, a network of supporters, and a deeper understanding of your craft. As Wang says, “Eventually, they’ll want to pay you for your help too. A lot more than you think.”&lt;/p&gt;

&lt;h2&gt;
  
  
  It's a mindset shift
&lt;/h2&gt;

&lt;p&gt;Learning in public is a mindset shift that transforms how you approach your development journey. By sharing your knowledge, embracing feedback, and engaging with the community, you accelerate your growth and open doors to new opportunities. Start small—write a blog post, answer a question, or share a project. The key is to begin. As you learn in public, you’ll not only help yourself but also inspire others to do the same.&lt;/p&gt;

&lt;p&gt;What’s one thing you’ve learned recently that you could share with the world? Start today, and let future you reap the rewards.&lt;/p&gt;

</description>
      <category>learning</category>
      <category>ai</category>
      <category>software</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Context engineering 101: Branch AI conversations with Langbase</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Wed, 24 Sep 2025 20:04:02 +0000</pubDate>
      <link>https://dev.to/thisismairaj/context-engineering-101-branch-ai-conversations-with-langbase-pe8</link>
      <guid>https://dev.to/thisismairaj/context-engineering-101-branch-ai-conversations-with-langbase-pe8</guid>
      <description>&lt;p&gt;Long conversations with AI often go off track. Topics pile up, irrelevant details linger, and eventually responses degrade. This is what we call context rot.&lt;/p&gt;

&lt;p&gt;The fix? Branching conversations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Branch Conversations?
&lt;/h2&gt;

&lt;p&gt;Branching is a form of context engineering. Instead of keeping one messy thread, you split the conversation at decision points. Each branch evolves independently, while the original conversation remains intact.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prevent context drift between different discussion paths&lt;/li&gt;
&lt;li&gt;Reduce token usage by trimming unnecessary history&lt;/li&gt;
&lt;li&gt;Explore in parallel without polluting the main thread&lt;/li&gt;
&lt;li&gt;Merge insights back when you’re ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as Git for conversations: fork, explore, and (optionally) merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Langbase
&lt;/h2&gt;

&lt;p&gt;Branching with &lt;a href="https://langbase.com/" rel="noopener noreferrer"&gt;Langbase&lt;/a&gt; takes just a few lines of code. Let’s walk through it step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Dependencies
&lt;/h3&gt;

&lt;p&gt;Install the Langbase SDK in your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i langbase dotenv
# or
pnpm add langbase dotenv
# or
yarn add langbase dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.1: Langbase API key
&lt;/h3&gt;

&lt;p&gt;Every request you send to Langbase needs an API key. Generate one by following these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://langbase.com/" rel="noopener noreferrer"&gt;Langbase.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;From the sidebar, click on API Keys&lt;/li&gt;
&lt;li&gt;From here, you can create a new API key. For more details, follow &lt;a href="https://langbase.com/docs/api-reference/api-keys" rel="noopener noreferrer"&gt;this guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file and add your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LANGBASE_API_KEY=xxxxxxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize Langbase with your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import dotenv from 'dotenv';
import { Langbase, ThreadMessage } from 'langbase';

dotenv.config();

const langbase = new Langbase({
  apiKey: process.env.LANGBASE_API_KEY!,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Create the Initial Conversation
&lt;/h3&gt;

&lt;p&gt;Let’s start with a practical example: choosing state management for a React app.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function createConversation() {
  const thread = await langbase.threads.create({
    messages: [
      { role: 'user', content: 'I need to add state management to my React app' },
      { role: 'assistant', content: 'How complex is your app and what are your main requirements?' },
      { role: 'user', content: "It's medium-sized, with user data, API calls, and real-time updates" },
      { role: 'assistant', content: 'You could use Redux for its ecosystem, or Zustand for simplicity. Which do you prefer?' },
    ],
  });

  return thread.id;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Branch the Conversation
&lt;/h3&gt;

&lt;p&gt;At the decision point (Redux vs Zustand), create a new branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function branchThread(threadId: string, branchAt: number) {
  const messages = await langbase.threads.messages.list({ threadId });
  const messagesToKeep = messages.slice(0, branchAt);

  const branch = await langbase.threads.create({
    messages: messagesToKeep as ThreadMessage[],
    metadata: {
      parent: threadId,
      branchedAt: branchAt.toString(),
    },
  });

  return branch.id;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Continue Each Branch
&lt;/h3&gt;

&lt;p&gt;Now both threads evolve independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function main() {
  const originalId = await createConversation();
  const branchId = await branchThread(originalId, 4);

  // Continue with Redux
  await langbase.threads.append({
    threadId: originalId,
    messages: [
      { role: 'user', content: "Let's go with Redux" },
      { role: 'assistant', content: 'Great choice! Redux Toolkit makes setup easier. Here’s how…' },
    ],
  });

  // Explore Zustand
  await langbase.threads.append({
    threadId: branchId,
    messages: [
      { role: 'user', content: 'Tell me about Zustand' },
      { role: 'assistant', content: "Zustand is lightweight and only 2KB. Here’s how to get started…" },
    ],
  });
}

main();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have two independent discussions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Original&lt;/strong&gt; → continues with Redux&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branch&lt;/strong&gt; → explores Zustand&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Run It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx tsx index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see two clean, focused threads in your console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Branching prevents conversations from collapsing under their own weight. Instead of a single, tangled thread, you get structured trees of thought:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep threads modular and adaptive&lt;/li&gt;
&lt;li&gt;Reuse branches for future work&lt;/li&gt;
&lt;li&gt;Merge insights or summaries back into the main conversation&lt;/li&gt;
&lt;/ul&gt;
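&lt;p&gt;Merging insights back can be as lightweight as appending a one-message summary of the branch to the original thread. Here is a minimal sketch of that idea on plain data, with no API calls — the &lt;code&gt;mergeSummary&lt;/code&gt; helper is hypothetical, not part of the SDK:&lt;/p&gt;

```typescript
// Hypothetical helper: fold a branch's conclusion back into the main
// thread as one compact summary message, instead of replaying every turn.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

function mergeSummary(mainThread: Message[], branchSummary: string): Message[] {
  // Return a new array: the original thread stays untouched.
  return [
    ...mainThread,
    { role: 'system', content: `Summary of explored branch: ${branchSummary}` },
  ];
}

const original: Message[] = [{ role: 'user', content: "Let's go with Redux" }];
const merged = mergeSummary(original, 'Zustand is simpler but has a smaller ecosystem.');
// merged has 2 messages; original still has 1.
```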

&lt;p&gt;With &lt;a href="https://langbase.com" rel="noopener noreferrer"&gt;Langbase&lt;/a&gt;, branching isn’t just possible — it’s simple.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>typescript</category>
    </item>
    <item>
      <title>9 AI primitives that simplify building AI agents</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Wed, 24 Sep 2025 19:38:23 +0000</pubDate>
      <link>https://dev.to/thisismairaj/9-ai-primitives-that-simplify-building-ai-agents-3m1l</link>
      <guid>https://dev.to/thisismairaj/9-ai-primitives-that-simplify-building-ai-agents-3m1l</guid>
      <description>&lt;p&gt;Building AI agents at scale is hard. You need to juggle tools, memory, conversation state, observability, and cost management — all while ensuring your workflows remain reliable and efficient.  &lt;/p&gt;

&lt;p&gt;That’s exactly where the Langbase SDK comes in. It provides a set of AI primitives — simple, composable building blocks — that let you create powerful AI systems without reinventing the wheel.  &lt;/p&gt;

&lt;p&gt;Here are 9 primitives from Langbase that make it easier to build production-ready AI applications:  &lt;/p&gt;

&lt;h2&gt;
  
  
  1. Pipe
&lt;/h2&gt;

&lt;p&gt;Pipe is the foundation. It’s a unified primitive with everything you need to build serverless AI agents: tools, memory, threads, dynamic variables, observability, tracing, and cost controls.  &lt;/p&gt;

&lt;p&gt;In short: Pipe is your “all-in-one” building block. With &lt;a href="https://langbase.com/studio" rel="noopener noreferrer"&gt;Langbase Studio&lt;/a&gt;, you can also manage access, add safety, and track usage with ease.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg8q68oudopvgjexphb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg8q68oudopvgjexphb1.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Memory
&lt;/h2&gt;

&lt;p&gt;Agents need memory to feel “human-like.”  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://langbase.com/memory" rel="noopener noreferrer"&gt;Langbase Memory&lt;/a&gt; primitive gives your agents long-term, semantic memory — essentially RAG as an infinitely scalable API. It’s also 30–50x more cost-efficient compared to alternatives, making it practical for real-world apps.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F249uuol9rglyqvhh5af0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F249uuol9rglyqvhh5af0.jpg" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Agent
&lt;/h2&gt;

&lt;p&gt;Agent is the runtime primitive that powers serverless AI agents. It offers a unified API over 600+ LLMs, with support for advanced features like:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming
&lt;/li&gt;
&lt;li&gt;Tool calling
&lt;/li&gt;
&lt;li&gt;Structured outputs
&lt;/li&gt;
&lt;li&gt;Vision models
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Agent, you can swap and extend LLMs seamlessly.  &lt;/p&gt;

&lt;h2&gt;
  
  
  4. Workflow
&lt;/h2&gt;

&lt;p&gt;Real-world agents often require multiple steps. Workflow helps you chain those steps reliably, with built-in durability, retries, and timeouts.  &lt;/p&gt;

&lt;p&gt;Every step is traceable, and detailed logging ensures you know exactly what’s happening in your pipeline.  &lt;/p&gt;
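&lt;p&gt;To make the retry idea concrete, here is a toy, synchronous sketch of the pattern a durable workflow automates for you — an illustration of the concept, not the Langbase Workflow API:&lt;/p&gt;

```typescript
// Toy sketch of step retries, the pattern durable workflows automate.
// Synchronous for brevity; real workflow steps are async.
function withRetries(step: () => string, maxAttempts = 3): string {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return step(); // success: return immediately
    } catch (err) {
      lastError = err; // failure: remember the error and try again
    }
  }
  throw lastError; // every attempt failed
}

// A flaky step that fails twice, then succeeds on the third attempt.
let calls = 0;
const result = withRetries(() => {
  calls++;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
});
// → result is 'ok' after 3 calls
```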

&lt;h2&gt;
  
  
  5. Threads
&lt;/h2&gt;

&lt;p&gt;Managing context across long conversations is tricky. Threads solve this by storing and handling conversation history automatically — no need for custom databases.  &lt;/p&gt;

&lt;p&gt;Better yet, Threads support branching, so you can avoid “context rot” when conversations drift in multiple directions.  &lt;/p&gt;

&lt;h2&gt;
  
  
  6. Tools
&lt;/h2&gt;

&lt;p&gt;Tools extend your agents beyond the LLM. With just a few lines, you can:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform web searches
&lt;/li&gt;
&lt;li&gt;Crawl webpages for content
&lt;/li&gt;
&lt;li&gt;Add custom functionalities tailored to your app
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes your AI agents far more capable than a vanilla LLM.  &lt;/p&gt;

&lt;h2&gt;
  
  
  7. Parser
&lt;/h2&gt;

&lt;p&gt;Structured and unstructured documents are everywhere.  &lt;/p&gt;

&lt;p&gt;Parser helps extract text from files like CSV, PDF, and more, so you can turn them into clean text for further analysis or use in pipelines.  &lt;/p&gt;

&lt;h2&gt;
  
  
  8. Chunker
&lt;/h2&gt;

&lt;p&gt;Working with large documents? Use Chunker.  &lt;/p&gt;

&lt;p&gt;It splits text into smaller, manageable sections — essential for RAG pipelines or when you need fine-grained control over what part of the document your model sees.  &lt;/p&gt;
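&lt;p&gt;A minimal sketch of what fixed-size chunking with overlap looks like — an illustration of the idea, not the Langbase Chunker implementation:&lt;/p&gt;

```typescript
// Fixed-size chunking with overlap: neighboring chunks share context.
// Assumes size > overlap, or the loop would not advance.
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached
  }
  return chunks;
}

const chunks = chunkText('abcdefghij', 4, 2);
// → ['abcd', 'cdef', 'efgh', 'ghij']
```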

&lt;h2&gt;
  
  
  9. Embed
&lt;/h2&gt;

&lt;p&gt;Embed converts text into vector embeddings, unlocking capabilities like:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic search
&lt;/li&gt;
&lt;li&gt;Text similarity comparisons
&lt;/li&gt;
&lt;li&gt;Other advanced NLP tasks
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Embed, you can build retrieval pipelines, recommendation systems, and intelligent search features.  &lt;/p&gt;
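&lt;p&gt;Under the hood, semantic search compares embedding vectors, typically with cosine similarity. A self-contained sketch with tiny made-up vectors — real embeddings have hundreds or thousands of dimensions:&lt;/p&gt;

```typescript
// Cosine similarity over embedding vectors: the core operation behind
// semantic search. The vectors below are made up for illustration.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const happy = [0.9, 0.1, 0.2];
const joy = [0.85, 0.15, 0.25];
const sad = [-0.8, 0.3, 0.1];

const simJoy = cosineSimilarity(happy, joy); // close to 1: similar meaning
const simSad = cosineSimilarity(happy, sad); // much lower: distant meaning
```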

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Langbase SDK gives you a complete suite of AI primitives that scale from hobby projects to production-grade AI systems.  &lt;/p&gt;

&lt;p&gt;Instead of reinventing the wheel, you can focus on building smarter, more capable AI agents — faster.  &lt;/p&gt;

&lt;p&gt;Explore the docs and start building: &lt;a href="https://langbase.com/docs/sdk?utm_source=DevRel&amp;amp;utm_campaign=Threads" rel="noopener noreferrer"&gt;Langbase SDK Documentation&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>llms</category>
      <category>aiagents</category>
      <category>aiframeworks</category>
    </item>
    <item>
      <title>20 AI concepts, explained clearly</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Tue, 23 Sep 2025 19:47:28 +0000</pubDate>
      <link>https://dev.to/thisismairaj/20-ai-concepts-explained-clearly-5a24</link>
      <guid>https://dev.to/thisismairaj/20-ai-concepts-explained-clearly-5a24</guid>
      <description>&lt;p&gt;Artificial Intelligence (AI) has exploded in recent years, powering everything from chatbots and recommendation engines to image generators and autonomous agents.&lt;/p&gt;

&lt;p&gt;But if you’re just starting out, the jargon can feel overwhelming.&lt;br&gt;
Terms like &lt;em&gt;transformers, embeddings, RAG,&lt;/em&gt; and &lt;em&gt;fine-tuning&lt;/em&gt; pop up everywhere.&lt;/p&gt;

&lt;p&gt;This guide breaks down &lt;strong&gt;20 fundamental AI concepts&lt;/strong&gt; in plain language.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Large Language Models (LLMs)
&lt;/h3&gt;

&lt;p&gt;At the core of today’s AI revolution are LLMs like GPT, Claude, and Llama.&lt;br&gt;
They’re essentially giant neural networks trained to &lt;strong&gt;predict the next word&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Input: &lt;em&gt;“All that glitters…”&lt;/em&gt;&lt;br&gt;
Output: &lt;em&gt;“…is not gold.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That simple predictive mechanism unlocks surprising intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tokenization
&lt;/h3&gt;

&lt;p&gt;Before text can be fed into a model, it’s chopped into &lt;strong&gt;tokens&lt;/strong&gt; (small chunks of text).&lt;/p&gt;

&lt;p&gt;Example: &lt;em&gt;“dancing” → [“danc”, “ing”]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This allows the model to work with language at a granular, structured level.&lt;/p&gt;
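&lt;p&gt;Real tokenizers use learned algorithms such as byte-pair encoding (BPE); the following is only a toy suffix-splitting sketch to mirror the example above:&lt;/p&gt;

```typescript
// Toy illustration only: real tokenizers learn their splits from data.
// This sketch just peels off a known suffix, e.g. "dancing" → ["danc", "ing"].
const KNOWN_SUFFIXES = ['ing', 'ed', 'ly'];

function toyTokenize(word: string): string[] {
  for (const suffix of KNOWN_SUFFIXES) {
    if (word.endsWith(suffix) && word.length > suffix.length) {
      return [word.slice(0, word.length - suffix.length), suffix];
    }
  }
  return [word]; // no known suffix: keep the word as one token
}

const tokens = toyTokenize('dancing');
// → ['danc', 'ing']
```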

&lt;h3&gt;
  
  
  3. Vectors (Embeddings)
&lt;/h3&gt;

&lt;p&gt;Models convert tokens into vectors: numerical points in multi-dimensional space.&lt;br&gt;
Words with similar meanings end up close together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“happy” sits near “joy”&lt;/li&gt;
&lt;li&gt;“sad” is further away&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how AI “understands” meaning.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Memory
&lt;/h3&gt;

&lt;p&gt;Memory lets LLMs retain information from past interactions.&lt;/p&gt;

&lt;p&gt;It helps them remember context, preferences, or facts across conversations.&lt;/p&gt;

&lt;p&gt;This makes AI more personal and consistent over time.&lt;br&gt;
Try &lt;a href="https://langbase.com/docs/memory" rel="noopener noreferrer"&gt;Langbase Memory&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Self-Supervised Learning
&lt;/h3&gt;

&lt;p&gt;Instead of humans labeling data, models learn by filling in blanks:&lt;/p&gt;

&lt;p&gt;“All that glitters ___ not gold.”&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;self-supervised&lt;/strong&gt; method scales to trillions of tokens with no need for endless manual labeling.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Tools
&lt;/h3&gt;

&lt;p&gt;Tools are external functions or APIs that an LLM can call.&lt;/p&gt;

&lt;p&gt;They extend what the model can do, like fetching data or querying databases.&lt;/p&gt;

&lt;p&gt;Think of them as the model’s “hands” to interact with the world.&lt;/p&gt;

&lt;p&gt;To learn about tools in detail, &lt;a href="https://langbase.com/docs/features/tool-calling" rel="noopener noreferrer"&gt;see this guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Fine-Tuning
&lt;/h3&gt;

&lt;p&gt;A base model is general-purpose.&lt;br&gt;
Fine-tuning = retraining it on &lt;strong&gt;specialized data&lt;/strong&gt; (legal, medical, financial, etc.).&lt;/p&gt;

&lt;p&gt;The result: a model adapted to your domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Few-Shot Prompting
&lt;/h3&gt;

&lt;p&gt;Instead of retraining, sometimes you just need to show examples in the prompt:&lt;/p&gt;

&lt;p&gt;Q: “Where’s my parcel?”&lt;br&gt;
A: (an example answer written in your desired style)&lt;/p&gt;

&lt;p&gt;The model learns to mimic the pattern instantly.&lt;/p&gt;
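&lt;p&gt;In code, few-shot prompting is just string construction: embed a couple of example pairs before the real question. A minimal sketch with made-up support examples:&lt;/p&gt;

```typescript
// Few-shot prompt builder: examples go inline in the prompt, so the
// model can mimic the Q → A pattern without any retraining.
type Example = { question: string; answer: string };

function buildFewShotPrompt(examples: Example[], query: string): string {
  const shots = examples
    .map(e => `Q: ${e.question}\nA: ${e.answer}`)
    .join('\n\n');
  // End with a trailing "A:" for the model to complete.
  return `${shots}\n\nQ: ${query}\nA:`;
}

const prompt = buildFewShotPrompt(
  [
    { question: 'Where is my parcel?', answer: 'Let me check the tracking number for you.' },
    { question: 'Can I change my address?', answer: 'Yes, I can update that before dispatch.' },
  ],
  'When will my order arrive?',
);
```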

&lt;h3&gt;
  
  
  9. Retrieval-Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;LLMs don’t know real-time info.&lt;br&gt;
RAG fixes this by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieving relevant docs from a database&lt;/li&gt;
&lt;li&gt;Feeding them into the model&lt;/li&gt;
&lt;li&gt;Generating grounded answers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the backbone of many AI apps today. To learn about RAG in detail, &lt;a href="https://langbase.com/docs/explained/rag" rel="noopener noreferrer"&gt;see this guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Vector Databases
&lt;/h3&gt;

&lt;p&gt;To power RAG, we need special databases.&lt;br&gt;
They store &lt;strong&gt;document embeddings&lt;/strong&gt; (vectors) and quickly find the most relevant ones.&lt;/p&gt;

&lt;p&gt;Examples: &lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt;, &lt;a href="https://weaviate.io/" rel="noopener noreferrer"&gt;Weaviate&lt;/a&gt;, &lt;a href="https://milvus.io/" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;, &lt;a href="https://python.langchain.com/docs/integrations/vectorstores/faiss/" rel="noopener noreferrer"&gt;FAISS&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Model Context Protocol (MCP)
&lt;/h3&gt;

&lt;p&gt;LLMs can’t browse or act on their own.&lt;br&gt;
MCP is an open standard that lets them connect to external tools and data sources, extending their abilities.&lt;/p&gt;

&lt;p&gt;Imagine asking an AI: &lt;em&gt;“Book me a flight tomorrow.”&lt;/em&gt;&lt;br&gt;
MCP makes it possible.&lt;/p&gt;

&lt;p&gt;Try building &lt;a href="https://langbase.com/docs/mcp-servers" rel="noopener noreferrer"&gt;MCP-powered agents&lt;/a&gt; with Langbase.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Context Engineering
&lt;/h3&gt;

&lt;p&gt;Prompt engineering was just the beginning.&lt;br&gt;
Context engineering means carefully shaping the information fed into an LLM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG&lt;/li&gt;
&lt;li&gt;Few-shot examples&lt;/li&gt;
&lt;li&gt;Summarization&lt;/li&gt;
&lt;li&gt;External tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal: deliver the &lt;strong&gt;right context&lt;/strong&gt; at the right time.&lt;/p&gt;

&lt;h3&gt;
  
  
  13. Agents
&lt;/h3&gt;

&lt;p&gt;Agents are LLM-powered programs that can use tools, call APIs, and orchestrate tasks. They don’t just answer; they plan, fetch data, and take actions.&lt;/p&gt;

&lt;p&gt;Example: a travel agent that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finds flights&lt;/li&gt;
&lt;li&gt;Books hotels&lt;/li&gt;
&lt;li&gt;Emails your itinerary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Try building one with &lt;a href="https://langbase.com/docs/agent" rel="noopener noreferrer"&gt;Langbase runtime agents&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  14. Reinforcement Learning from Human Feedback (RLHF)
&lt;/h3&gt;

&lt;p&gt;How do models become more &lt;em&gt;human-aligned&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;They generate multiple outputs.&lt;br&gt;
Humans rate them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Good answers → rewarded&lt;/li&gt;
&lt;li&gt;Bad answers → penalized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time, the model learns human preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  15. Chain of Thought (CoT)
&lt;/h3&gt;

&lt;p&gt;Instead of spitting out answers, models show step-by-step reasoning.&lt;br&gt;
This helps with math, logic, and complex problem-solving.&lt;/p&gt;

&lt;h3&gt;
  
  
  16. Multimodal Models
&lt;/h3&gt;

&lt;p&gt;The future isn’t just text.&lt;br&gt;
Multimodal AI handles &lt;strong&gt;text, images, audio, and video&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload a chart → get a summary&lt;/li&gt;
&lt;li&gt;Ask it to generate music&lt;/li&gt;
&lt;li&gt;Describe an image → get variations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  17. Small Language Models (SLMs)
&lt;/h3&gt;

&lt;p&gt;Not all models need to be massive.&lt;br&gt;
SLMs are compact, domain-specific, and cheaper to run.&lt;/p&gt;

&lt;p&gt;Perfect for enterprises that need private, efficient AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  18. Distillation
&lt;/h3&gt;

&lt;p&gt;How do you make big models smaller without losing smarts?&lt;br&gt;
Distillation = training a small “student” model to mimic a large “teacher” model.&lt;/p&gt;

&lt;p&gt;This makes deployment lighter + faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  19. Reasoning Models
&lt;/h3&gt;

&lt;p&gt;Beyond prediction.&lt;br&gt;
Reasoning models can &lt;strong&gt;plan, break down problems, and explore solutions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of them as AI that &lt;em&gt;thinks&lt;/em&gt; more than just &lt;em&gt;guesses&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  20. Foundation Models
&lt;/h3&gt;

&lt;p&gt;The giants that start it all.&lt;br&gt;
Trained on massive datasets, they act as &lt;strong&gt;base layers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From there, developers fine-tune or adapt them into specialized smaller models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;AI can feel intimidating, but at its core, it’s about patterns, context, and reasoning.&lt;/p&gt;

&lt;p&gt;If you understand these 20 concepts, you’ll have a strong foundation to explore deeper, whether you’re building AI products, researching, or just curious about the tech shaping our future.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llms</category>
      <category>tech</category>
    </item>
    <item>
      <title>RAG vs fine-tuning vs prompt engineering</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Tue, 23 Sep 2025 19:36:44 +0000</pubDate>
      <link>https://dev.to/thisismairaj/rag-vs-fine-tuning-vs-prompt-engineering-fcg</link>
      <guid>https://dev.to/thisismairaj/rag-vs-fine-tuning-vs-prompt-engineering-fcg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybn38k0lqbw3nnavug9i.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybn38k0lqbw3nnavug9i.jpeg" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Improving outputs from large language models is rarely a question of "which single tool" to use. It is a design choice that balances accuracy, latency, cost, maintenance, and safety.&lt;/p&gt;

&lt;p&gt;This article provides a thorough, practical comparison of the three dominant approaches (prompt engineering, RAG, and fine-tuning) so you can choose and combine them effectively for real products.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR for each approach
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering&lt;/strong&gt;: Change the input (the prompt) to better activate the model's existing knowledge and skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt;: Give the model fresh, domain-specific evidence by retrieving external content and appending it to the prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt;: Change the model itself by training it on domain examples so the knowledge and behavior are baked into its weights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt engineering: what's it all about?
&lt;/h2&gt;

&lt;p&gt;Prompt engineering shapes how the model interprets and prioritizes information that already exists in its parameters.&lt;/p&gt;

&lt;p&gt;The challenge with prompt engineering is consistency. Prompts evolve over time, and subtle changes can produce significantly different results. This is where Langbase Pipes become useful.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can experiment with variations side by side.&lt;/li&gt;
&lt;li&gt;You can roll back to earlier versions.&lt;/li&gt;
&lt;li&gt;You can track which prompts deliver consistent, high-quality results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach allows you to iterate rapidly without losing control, much like developers use Git for code.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;By the way, Langbase provides &lt;a href="https://langbase.com/docs/memory" rel="noopener noreferrer"&gt;memory agents&lt;/a&gt;, the most inexpensive RAG solution.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key techniques
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Role and instruction framing (system or lead-in statement that defines tone, role, and constraints).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Few-shot prompting (showing examples to demonstrate desired style/logic).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chain-of-thought or step-by-step prompts to improve reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output constraints (formatting, JSON schema, length, explicit style rules).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prompt templates and variable substitution for repeatable tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
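&lt;p&gt;The last technique is simple to sketch: keep the template fixed and versioned, and substitute only the variables per call. A minimal illustration — the &lt;code&gt;{{name}}&lt;/code&gt; placeholder syntax is an assumption, not a standard:&lt;/p&gt;

```typescript
// Prompt template with variable substitution: the template stays under
// version control, only the variables change per call.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    name in vars ? vars[name] : match, // leave unknown placeholders intact
  );
}

const template =
  'You are a {{role}}. Answer in {{format}} and keep it under {{limit}} words.';

const rendered = renderTemplate(template, {
  role: 'support agent',
  format: 'JSON',
  limit: '100',
});
// → 'You are a support agent. Answer in JSON and keep it under 100 words.'
```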

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fast iteration, prototyping, and low-cost improvements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When you cannot or do not want to change model weights or add infra.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To enforce formatting and to reduce simple ambiguity in user input.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations and risks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cannot add factual knowledge that the model does not already have.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Brittleness: minor wording changes can produce different results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Can't solve problems that require current data beyond the model's cutoff.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluation is often empirical and requires careful A/B testing and versioning of prompts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Maintain prompt templates under version control and track experiments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use unit tests and automated checks for format and safety.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Combine with lightweight verification (e.g., regex checks, parsers) to catch format violations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  RAG: how it works and doesn't
&lt;/h2&gt;

&lt;p&gt;RAG augments model output with material retrieved from a document corpus (internal knowledge base, web, PDFs, etc.). Technically this is "retrieve, augment, generate."&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://langbase.com/docs/memory" rel="noopener noreferrer"&gt;Langbase Memory&lt;/a&gt;, you can implement RAG directly. Memory lets you ingest documents, store embeddings, and retrieve relevant chunks during a conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-level architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ingest&lt;/strong&gt;: documents are preprocessed, split into chunks, and embedded.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Store&lt;/strong&gt;: embeddings and chunk metadata are stored in a vector database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieve&lt;/strong&gt;: for each user query, compute an embedding and fetch top-k semantically similar chunks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Re-rank and filter&lt;/strong&gt;: optionally re-score retrieved results with a secondary model or heuristics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Augment prompt&lt;/strong&gt;: concatenate the selected chunks or their summaries with the original query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate&lt;/strong&gt;: the LLM produces an answer conditioned on the augmented prompt.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
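&lt;p&gt;The steps above can be sketched end to end with a stand-in embedding function — bag-of-words counts here purely for illustration; a real pipeline calls an embedding model and a vector database:&lt;/p&gt;

```typescript
// Toy retrieve-and-augment sketch; embed() is a bag-of-words stand-in
// for a real embedding model.
type Chunk = { text: string; vector: number[] };

const VOCAB = ['refund', 'shipping', 'invoice', 'password'];

function embed(text: string): number[] {
  const lower = text.toLowerCase();
  return VOCAB.map(word => (lower.includes(word) ? 1 : 0));
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Retrieve: score every stored chunk against the query, keep top-k.
function retrieve(store: Chunk[], query: string, k: number): string[] {
  const q = embed(query);
  return [...store]
    .sort((a, b) => dot(b.vector, q) - dot(a.vector, q))
    .slice(0, k)
    .map(chunk => chunk.text);
}

// Ingest + store: embed each document chunk once, up front.
const docs = [
  'Refunds are processed within 5 business days.',
  'Standard shipping takes 2 to 4 days.',
  'Reset your password from the account page.',
];
const store: Chunk[] = docs.map(text => ({ text, vector: embed(text) }));

// Augment: prepend the retrieved evidence to the user query.
const top = retrieve(store, 'How do refunds work?', 1);
const augmentedPrompt = `Context:\n${top.join('\n')}\n\nQuestion: How do refunds work?`;
```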

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppju3xcn54e3pyd6szpf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppju3xcn54e3pyd6szpf.jpg" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://langbase.com/docs/memory" rel="noopener noreferrer"&gt;Langbase Memory (RAG)&lt;/a&gt; architecture&lt;/p&gt;

&lt;h3&gt;
  
  
  Why RAG is valuable
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Gives access to up-to-date, domain-specific facts without retraining the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enables provenance: you can link answers back to documents or passages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Good for domains where the base model's cutoff or coverage is insufficient.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operational trade-offs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt;: retrieving and re-ranking adds time per query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;: requires embedding services, a vector DB, and periodic re-ingestion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: embedding + storage + retrieval + LLM calls can be materially more expensive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hallucination risk&lt;/strong&gt;: the model can still hallucinate or over-generalize even with retrieved context; requiring explicit citations and grounding helps.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chunk documents with overlap to preserve context but avoid redundancy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Precompute and refresh embeddings when sources change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use re-rankers (BM25, cross-encoders) to improve precision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limit context length and prioritize high-quality sources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Surface provenance (document id, snippet, URL) with each claim.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
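&lt;p&gt;The chunk-with-overlap practice can be sketched as follows; this is a simplified illustration, not a production splitter:&lt;/p&gt;

```typescript
// Minimal sketch of fixed-size chunking with overlap. Sizes are in
// characters for clarity; production pipelines usually measure chunks
// in tokens and prefer splitting on sentence or section boundaries.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // each chunk starts `step` characters after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

&lt;p&gt;The overlap means each boundary sentence appears in two chunks, so retrieval does not lose context that straddles a split point.&lt;/p&gt;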

&lt;h2&gt;
  
  
  Fine-tuning
&lt;/h2&gt;

&lt;p&gt;Fine-tuning updates model weights by training on a labeled, domain-specific dataset. Variants include full fine-tuning and parameter-efficient methods (LoRA, adapters, PEFT), which change fewer parameters for lower cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  What fine-tuning achieves
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Embeds domain knowledge and preferred behaviors directly into the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improves consistency for specialized tasks and can reduce the need for long context windows at inference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eliminates the per-query retrieval overhead if all needed knowledge can be encoded in the model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Requirements and costs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data&lt;/strong&gt;: high-quality, well-labeled examples are essential. Thousands of curated examples are typical for nontrivial tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compute&lt;/strong&gt;: training requires GPUs or managed training services; costs can escalate for large models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;: to update knowledge you need to retrain or adapt the model; versioning and rollback mechanisms are necessary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Risks&lt;/strong&gt;: catastrophic forgetting (losing general capabilities), overfitting to training data, possible introduction of biases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to choose fine-tuning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When you need very high performance on a narrow, stable domain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When latency must be minimal and predictable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When privacy/regulatory constraints require on-device or on-premise models with no external retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Hold out evaluation and test sets that reflect production prompts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use parameter-efficient methods where possible to reduce compute and avoid full retrains.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitor general-purpose performance post-tuning to detect catastrophic forgetting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep fine-tuned models versioned and provide an easy rollback path.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparing the approaches
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt engineering&lt;/strong&gt; improves clarity and control without infrastructure changes but cannot expand a model's knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; provides fresh, domain-specific evidence at the cost of extra infrastructure, latency, and complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; embeds deep expertise into the model itself, delivering faster inference and specialized behavior, but requires data, compute, and maintenance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most production systems use a hybrid: fine-tune where stable expertise is needed, use RAG to add recent or large external corpora, and apply prompt engineering to shape output and enforce constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Legal AI agent (detailed pipeline)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ingest firm knowledge: policies, playbooks, annotated past briefs → chunk, embed, store in a secure vector DB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fine-tune a core model on firm templates and permitted language to internalize style, disclaimers, and firm policy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At query time:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Compute query embedding; retrieve top-k passages from vector DB.&lt;/li&gt;
&lt;li&gt;Re-rank passages via a cross-encoder or BM25 hybrid.&lt;/li&gt;
&lt;li&gt;Construct a controlled prompt that includes: the most relevant passages, an instruction to cite sources, and a JSON output schema.&lt;/li&gt;
&lt;li&gt;Run generation on the fine-tuned model.&lt;/li&gt;
&lt;li&gt;Run a verifier that checks claims against retrieved passages; if inconsistencies appear, flag for human review.&lt;/li&gt;
&lt;li&gt;Return response with inline citations and an "evidence" panel for the user to inspect.&lt;/li&gt;
&lt;/ol&gt;
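&lt;p&gt;Step 3 of this pipeline, constructing the controlled prompt, might look like the following sketch (the passage ids and output schema are assumptions for illustration, not a Langbase API):&lt;/p&gt;

```typescript
// Hypothetical sketch of step 3: a controlled prompt that embeds the
// re-ranked passages, an instruction to cite sources, and a JSON output
// schema. Passage ids and the schema shape are illustrative assumptions.
type Passage = { id: string; text: string };

function buildControlledPrompt(query: string, passages: Passage[]): string {
  // Label each passage with its id so the model can cite it verbatim.
  const evidence = passages.map(p => `[${p.id}] ${p.text}`).join("\n");
  return [
    "Answer using ONLY the passages below. Cite passage ids for every claim.",
    'Respond as JSON: {"answer": string, "citations": string[]}',
    "Passages:",
    evidence,
    `Question: ${query}`,
  ].join("\n\n");
}
```

&lt;p&gt;The verifier in step 5 can then check that every id in &lt;code&gt;citations&lt;/code&gt; actually appears among the retrieved passages.&lt;/p&gt;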

&lt;p&gt;This hybrid approach gives fast, policy-compliant writing, up-to-date legal citations, and auditability through provenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision flow: which to pick first
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want immediate, low-cost improvements?&lt;/strong&gt; Start with prompt engineering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Need current facts or large corpora accessible at query time?&lt;/strong&gt; Implement RAG.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Need high, repeatable accuracy in a narrow domain and can afford training?&lt;/strong&gt; Fine-tune (or use parameter-efficient tuning).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complex, production use cases often require all three: fine-tuning for domain rules, RAG for fresh evidence, and prompt engineering for consistent outputs and safety controls.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://langbase.com/?utm_source=Medium.com&amp;amp;utm_campaign=DevRel" rel="noopener noreferrer"&gt;Langbase&lt;/a&gt;, we build, deploy and scale AI agents, co-powered by these approaches.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llms</category>
      <category>promptengineering</category>
      <category>rag</category>
    </item>
    <item>
      <title>The shortest AI agent you can build</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Thu, 18 Sep 2025 20:41:27 +0000</pubDate>
      <link>https://dev.to/thisismairaj/the-shortest-ai-agent-you-can-build-57p4</link>
      <guid>https://dev.to/thisismairaj/the-shortest-ai-agent-you-can-build-57p4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuwnc1avp0syi1yplj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuwnc1avp0syi1yplj0.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're going to build an AI agent that uses GPT-5. We'll use the agent primitive by &lt;strong&gt;Langbase&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The agent works as a runtime LLM agent. You can specify all parameters at runtime and get the response from the agent. The agent uses Langbase's unified LLM API to provide a consistent interface for interacting with &lt;strong&gt;600+ LLMs&lt;/strong&gt; across all the top LLM providers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://langbase.com/docs/models" rel="noopener noreferrer"&gt;See the list of supported models and providers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Okay, let's get to building! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Start by installing the Langbase SDK (comes in both TypeScript and Python).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;langbase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Langbase API Key
&lt;/h2&gt;

&lt;p&gt;Every request you send to Langbase needs an API key. You need to generate your API key by following these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://langbase.com" rel="noopener noreferrer"&gt;Langbase.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;From the sidebar, click on the &lt;strong&gt;API keys&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From here, you can create a new API key. For more details, follow &lt;a href="https://langbase.com/docs/api-keys" rel="noopener noreferrer"&gt;this guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file and place the Langbase API key like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LANGBASE_API_KEY=your_langbase_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another environment variable you need is for the LLM model you decide to use. In this example, we are using OpenAI's GPT-5. So add the LLM API Key like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_API_KEY=your_openai_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up the Code
&lt;/h2&gt;

&lt;p&gt;Next, create an &lt;code&gt;index.ts&lt;/code&gt; file and import the necessary packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Langbase&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;langbase&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dotenv/config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's initialize Langbase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;langbase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Langbase&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LANGBASE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The AI Agent Code
&lt;/h2&gt;

&lt;p&gt;Finally, the code for the AI agent. It might be the &lt;strong&gt;shortest AI agent ever&lt;/strong&gt;. Here it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;langbase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai:gpt-5-mini-2025-08-07&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful AI Agent.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Who is an AI Engineer?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running Your Agent
&lt;/h2&gt;

&lt;p&gt;Finally, run your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsx index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  That's It!
&lt;/h2&gt;

&lt;p&gt;That's like &lt;strong&gt;5–6 lines of code&lt;/strong&gt;. The code is pretty much the same in Python too.&lt;/p&gt;

&lt;p&gt;You can dynamically modify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The prompt&lt;/li&gt;
&lt;li&gt;The model&lt;/li&gt;
&lt;li&gt;The system prompt (instructions param)&lt;/li&gt;
&lt;li&gt;Whether the output streams (the &lt;code&gt;stream&lt;/code&gt; param)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's up to you! Langbase supports &lt;strong&gt;600+ AI models&lt;/strong&gt;. This includes models like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kimi K2&lt;/li&gt;
&lt;li&gt;DeepSeek R1
&lt;/li&gt;
&lt;li&gt;Grok 4&lt;/li&gt;
&lt;li&gt;And many more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What are you waiting for? &lt;strong&gt;Build your AI agent today!&lt;/strong&gt; &lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>llms</category>
    </item>
    <item>
      <title>Context engineering: What, why and how to engineer context</title>
      <dc:creator>Muhammad Mairaj</dc:creator>
      <pubDate>Thu, 18 Sep 2025 20:30:28 +0000</pubDate>
      <link>https://dev.to/thisismairaj/context-engineering-what-why-and-how-to-engineer-context-be9</link>
      <guid>https://dev.to/thisismairaj/context-engineering-what-why-and-how-to-engineer-context-be9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b4nubly1jlm0owzchqy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b4nubly1jlm0owzchqy.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;In the past two years, prompt engineering has risen as a crucial skill for getting the most out of AI systems. But as context windows grow bigger, a new discipline is emerging: &lt;strong&gt;context engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Today, the underlying models are powerful for most tasks. The reason agents fail is that they are not provided the right context. The bottleneck shifts from model capability to system design. That's where context engineering comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the heck is context engineering?
&lt;/h2&gt;

&lt;p&gt;Context engineering is designing systems to deliver context to the LLM. It involves filling the LLM's context window with exactly what it needs to succeed at a specific step in a complex workflow. It is setting the stage for the models to perform effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It is the complete environment for the model.&lt;/strong&gt; That includes background, tone, intent, history, tools, and guardrails.&lt;/p&gt;

&lt;p&gt;Compare an LLM system to an OS, where the CPU is the model and RAM is its working memory. Just as an OS curates what fits into RAM, context engineering is the science of filling the context window with just the right amount of the right information.&lt;/p&gt;

&lt;p&gt;Enough information to do the task, but not so much that it confuses the LLM or wastes tokens. AI agents get their context from multiple dynamic sources, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; - prompts, memories, guardrails, and preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt; - retrieval systems to fetch relevant information
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool outputs&lt;/strong&gt; - data flowing in from web search and APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they form the model's perception of the task at hand. Context engineering is about orchestrating this perception with precision, ensuring that every token counts and every part of the context serves a purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context window
&lt;/h2&gt;

&lt;p&gt;The context window is the amount of information the model sees at a time. It is limited, and there is ample debate around token optimization to best manage the window.&lt;/p&gt;

&lt;p&gt;Context engineering is the craft of carefully filling this window with the right information. There is so much context available to the model at any time. For example, a deep-research agent might retrieve hundreds of pages of content through search tools - far exceeding what the model can fit into its context window.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It's like packing a bag for a hike. Take too little and you're lost. Take too much and you're overwhelmed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you provide too little information, the model gives vague answers. If you provide too much, the context window overflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why context engineering is important
&lt;/h2&gt;

&lt;p&gt;In multi-agent systems, where a complex task is distributed across several agents, managing context becomes absolutely critical.&lt;/p&gt;

&lt;p&gt;Consider a scenario where a task is being divided between multiple agents. The subagents receive a fragment of the overall context and a subtask. The agent that divided the task has the full context of the task. The subagents don't have that broader context. They can easily miscommunicate and produce nothing close to the original task.&lt;/p&gt;

&lt;p&gt;This happens often in multi-agent architectures. Agent-to-agent communication can fix this miscommunication, but that tooling is still early. Until it matures, make sure a task is truly parallelizable before splitting it across agents.&lt;/p&gt;

&lt;p&gt;One way to curb this problem is &lt;strong&gt;context compression&lt;/strong&gt;. Agent interactions can span hundreds of turns and may have token-heavy tool calls, resulting in context overflow. At each turn, compress the context to only forward high-value tokens.&lt;/p&gt;
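&lt;p&gt;A crude but concrete form of context compression is a recency-based budget, sketched here under simplified assumptions:&lt;/p&gt;

```typescript
// Illustrative context compression: always keep the system message, then
// keep only the most recent turns that fit a budget. A character budget
// stands in for real token counting; a production version might also
// summarize the turns it drops instead of discarding them outright.
type Turn = { role: "system" | "user" | "assistant"; content: string };

function compressContext(history: Turn[], budget: number): Turn[] {
  const system = history.filter(t => t.role === "system");
  const rest = history.filter(t => t.role !== "system");
  let used = system.reduce((sum, t) => sum + t.content.length, 0);
  const kept: Turn[] = [];
  // Walk backwards so the most recent turns are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    if (used + rest[i].content.length > budget) break;
    kept.unshift(rest[i]);
    used += rest[i].content.length;
  }
  return [...system, ...kept];
}
```

&lt;p&gt;At each turn, only the compressed history is forwarded, so token-heavy tool outputs from early turns stop crowding out the current task.&lt;/p&gt;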

&lt;p&gt;This is one of the techniques of context engineering. General principles for building agents are still in their infancy, so there is a lot of ongoing experimentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond prompts: A system approach
&lt;/h2&gt;

&lt;p&gt;A prompt is the set of instructions within a single interaction for accomplishing a task. Prompt engineering involves specific phrasings, examples, or formatting that trigger desired responses. Context engineering requires systematic approaches: database design, information architecture, retrieval systems, and knowledge management.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt is what you ask.&lt;/strong&gt; E.g. "Translate this paragraph to Spanish"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context is what the model knows when you ask it.&lt;/strong&gt; E.g. "User is a South Asian. Previous conversations were about visiting Spain."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt is just the tip of the iceberg. Context is everything underneath that makes it possible.&lt;/p&gt;

&lt;p&gt;Engineering context is a much larger undertaking. Agents running in production for months need systems designed to deliver the right context reliably. Those structures must be scalable and rigorously tested so they don't break real-world production environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shift from prompts to context is not just semantic, it's systematic.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering a great context
&lt;/h2&gt;

&lt;p&gt;To engineer effective context, you have to make decisions in real time - about what to include, exclude and carry forward. Key questions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the broader task at hand?&lt;/li&gt;
&lt;li&gt;What information does the model need?&lt;/li&gt;
&lt;li&gt;What should it remember from earlier steps?&lt;/li&gt;
&lt;li&gt;What should it forget to avoid getting confused?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Context engineering is still an emerging discipline, and best practices continue to evolve. However, it is not an optional skill. &lt;strong&gt;It's the core of how powerful AI systems will work from here on.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llms</category>
      <category>contextengineering</category>
      <category>promptengineering</category>
    </item>
  </channel>
</rss>
