Day 1: I'm Done Writing Prompts by Hand — Meet DSPy

#ai #python #dspy #llm

Let me paint you a picture that probably feels familiar.

You spend 45 minutes crafting the perfect prompt. You test it. It works. You ship it. Two days later your colleague tries it with slightly different input and... it falls apart completely. So you're back at it — tweaking a word here, rearranging a sentence there, re-testing, repeat.

Sound familiar? I've been there more times than I'd like to admit. And it turns out there's a name for this exhausting loop: prompt engineering. More importantly, there's now a smarter way to escape it.

I'm kicking off a series where I read and share insights from Building LLM Applications with DSPy by Serj Smorodinsky and William Brett Kennedy (Manning Publications, MEAP V01, 2026). One chapter a day, straight to the point. Let's get into Day 1.

The Problem with Prompt Engineering (Be Honest, You Know It)

Prompt engineering is essentially trial and error dressed up as a skill. You're manually rephrasing the same instruction hoping the LLM eventually "gets it." The authors describe it well — even prompts that seem equivalent can produce wildly different results, and once your prompts grow to dozens of lines, you lose track of why each phrase is even there.

Here's what makes this particularly painful in production systems:

Every time you switch LLMs (GPT-4 → Claude → Gemini), you start over
You can't easily track which prompt variant performed best
Complex apps with multiple LLM calls compound errors fast — one bad prompt poisons the whole pipeline

Enter DSPy: Prompt Programming, Not Prompt Engineering

DSPy (short for Declarative Self-improving Python, pronounced dee-ess-pie) flips the script. Instead of writing prompts, you write code that describes what you want. DSPy handles generating, evaluating, and optimizing the actual prompts automatically.

The book frames this as a natural evolution — similar to how we no longer hand-write assembly code when building apps in Python. We work at the right level of abstraction. Prompt programming is that next level for LLM apps.

Here's what that looks like in practice:

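```python
import os

import dspy

# Configure the language model once; DSPy modules use it by default.
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.settings.configure(lm=lm)

# A signature string declares inputs (left of ->) and outputs (right of ->).
predictor = dspy.Predict("question, context -> answer, confidence")
prediction = predictor(question="What is the capital of France?", context="")

print(prediction.answer, prediction.confidence)
```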

No prompt string in sight. You declare what goes in (a question plus optional context) and what should come out (an answer and a confidence score), and DSPy figures out how to ask the LLM for it. Clean, readable, testable.

What DSPy Actually Does Under the Hood

This is the part I found genuinely impressive. DSPy doesn't just wrap your LLM call — it optimizes your prompts systematically. Given a task and some evaluation criteria, it:

Generates many candidate prompts automatically
Evaluates each one against your metric
Uses techniques like hill climbing, genetic algorithms, and Bayesian optimization to find the best performer
Iterates until it converges, i.e. until new candidates stop beating the best prompt found so far
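To make that loop concrete, here's a toy, pure-Python version of just the hill-climbing part. Everything in it is hypothetical: the instruction fragments, the tiny dev set, and the fake "model" that gets an example right only when the prompt contains the cues it needs. DSPy's real optimizers call an actual LLM and search far more cleverly; this only shows the generate → evaluate → keep the best → repeat shape.

```python
# Hypothetical pool of instruction fragments the search can switch on or off.
FRAGMENTS = (
    "Answer concisely.",
    "Think step by step.",
    "Reply with a single word.",
    "Add a friendly greeting.",
)

# Hypothetical dev set. Each example records which fragments the fake model
# needs in order to get it right, and which fragments make it go wrong.
DEVSET = [
    {"needs": {"Think step by step."}, "breaks_on": {"Add a friendly greeting."}},
    {"needs": {"Reply with a single word."}, "breaks_on": set()},
    {"needs": {"Reply with a single word."}, "breaks_on": {"Add a friendly greeting."}},
    {"needs": {"Answer concisely."}, "breaks_on": set()},
]


def fake_model_is_correct(prompt: frozenset[str], example: dict) -> bool:
    """Stub for 'call the LLM with this prompt and check the answer'."""
    return example["needs"] <= prompt and not (example["breaks_on"] & prompt)


def metric(prompt: frozenset[str]) -> float:
    """Fraction of dev examples the prompt gets right."""
    return sum(fake_model_is_correct(prompt, ex) for ex in DEVSET) / len(DEVSET)


def candidates(prompt: frozenset[str]):
    """Generate neighboring prompts by toggling one fragment on or off."""
    for fragment in FRAGMENTS:
        yield prompt ^ {fragment}


def hill_climb(start: frozenset[str] = frozenset()) -> tuple[frozenset[str], float]:
    """Move to the best-scoring neighbor until no neighbor improves the metric."""
    current, current_score = start, metric(start)
    while True:
        best, best_score = max(
            ((c, metric(c)) for c in candidates(current)), key=lambda pair: pair[1]
        )
        if best_score <= current_score:  # no candidate beats the current prompt
            return current, current_score
        current, current_score = best, best_score


best_prompt, best_score = hill_climb()
print(sorted(best_prompt), best_score)
```

Each step scores every candidate with the metric and keeps the winner; the search stops as soon as no neighbor does better. Note that it learns to leave out the "friendly greeting" fragment purely because the metric punishes it, which is the whole point of optimizing against a metric instead of intuition.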

The authors note that experiments pitting DSPy against professional prompt engineers found DSPy produced stronger prompts in less time. That's not a small claim.

The Recommended Dev Workflow

The book lays out a clean three-step process:

Baseline → Evaluate → Optimize

You start by building a simple working version of your app. Then you evaluate it rigorously (DSPy gives you the tools for this). Then you let DSPy optimize the prompts across your full pipeline — not just one call, but all of them together, tuned for the best combined outcome.

This matters a lot in agentic systems, where LLM calls chain together and one weak prompt cascades into a messy failure downstream.

When Should You Actually Use DSPy?

DSPy shines when:

You have complex prompts or multiple LLM calls in a pipeline
You need prompts to be reliable and production-grade
You're experimenting with different LLMs and don't want to rewrite everything

For casual, one-off LLM interactions? Direct prompting is probably fine. DSPy has a learning curve — it's a framework, not a magic wand. But once you're past the basics, the payoff is real.

My Takeaway from Chapter 1

What struck me most isn't the automation — it's the mindset shift. Prompt engineering is fundamentally reactive: you write something, see what breaks, fix it. Prompt programming is systematic: you define what "good" means, and the framework finds the path there.

As someone who builds production AI systems, that distinction matters enormously. Less time debugging prompts means more time shipping features.

What's Next

Tomorrow I'm covering Chapter 2: Basic Prompting and DSPy — where we get into the actual anatomy of a well-formed prompt and build our first real DSPy application. If you've ever wondered what goes into those multi-section prompt templates, that chapter breaks it down nicely.

📚 Source: Building LLM Applications with DSPy by Serj Smorodinsky and William Brett Kennedy — Manning Publications, MEAP V01, 2026. manning.com/books/building-llm-applications-with-dspy

All concepts, examples, and code snippets referenced in this post are drawn from the book above. This series is a reader's journal, not a reproduction — pick up the book for the full depth.

Are you still hand-writing your prompts? Drop a comment — I'm curious how many of you have tried DSPy already and what your experience was like.