Nova Elvaris
The Handoff Prompt: Transfer AI Context Between Models Without Losing State

Here's a workflow I run almost every day: start a task with a fast, cheap model, hit its limits, then escalate to a bigger model for the hard part. Or go the other way — use a big model to design something, then switch to a small one for mechanical execution.

The problem: every time I switched, I'd paste half my chat history into the new window and the new model would start with the wrong context. Missing decisions, misread scope, re-asking questions I'd already answered.

So I stopped copy-pasting chats and started writing a Handoff Prompt — a structured summary the outgoing model writes for the incoming one. Here's the pattern.

The problem with raw chat transfer

When you paste a long conversation into a new model, three things go wrong:

  1. Signal-to-noise drops. Most chat turns are tool errors, clarifications, false starts. The new model has to re-infer what's still relevant.
  2. Decisions disappear. "We decided to use Postgres instead of DynamoDB" is buried in turn 14. The new model might recommend DynamoDB on turn 1.
  3. Implicit context is lost. Things the old model "knew" from earlier in the conversation aren't spelled out anywhere.

Raw transcripts are the worst possible format for context transfer. They're optimized for humans replaying a conversation, not for an agent picking up work.

The Handoff Prompt template

At the end of a session (or whenever I want to switch models), I ask the current model to generate a handoff:

You are about to hand off this task to another AI assistant. Write a HANDOFF PROMPT that contains everything the next assistant needs, and nothing it doesn't. Use this exact structure:

# Handoff: <one-line task summary>

## Goal
<what we're trying to accomplish, 1-3 sentences>

## Current state
<what's been done so far: concrete artifacts, not narrative>

## Decisions made (do not re-litigate)
- <decision>: <reason>
- <decision>: <reason>

## Open questions
- <question the user hasn't answered yet>

## Constraints
- <hard constraint, e.g., "no external dependencies">
- <soft constraint, e.g., "prefers functional style">

## Next step
<the single next thing the incoming model should do>

## Context files
<list of file paths or artifacts the next model should read, in order>

Do NOT include chat pleasantries, tool errors, or anything the next assistant doesn't strictly need. Be ruthless.

The result is usually 200-400 tokens. That's it. That's your entire context.
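Because the structure is fixed, you can even lint a handoff before pasting it into the next session. Here's a minimal sketch; `REQUIRED_SECTIONS` and `lint_handoff` are my own names, and the section headings mirror the template above:

```python
# Required section headers, taken from the handoff template.
# "## Decisions made" matches the longer "(do not re-litigate)" form too,
# since we only check for the substring.
REQUIRED_SECTIONS = [
    "## Goal",
    "## Current state",
    "## Decisions made",
    "## Open questions",
    "## Constraints",
    "## Next step",
    "## Context files",
]

def lint_handoff(text: str) -> list[str]:
    """Return a list of problems; an empty list means the handoff looks complete."""
    problems = []
    if not text.lstrip().startswith("# Handoff:"):
        problems.append("missing '# Handoff:' title line")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    return problems
```

If the outgoing model skipped a section (they sometimes drop "Open questions" when there are none), the linter catches it before the gap becomes the new model's problem.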

Why this works

Decisions are explicit. The "Decisions made" section acts like an immune system against re-litigation. The next model sees "we chose Postgres, not DynamoDB" and won't suggest otherwise unless you explicitly ask.

State is concrete, not narrative. "Implemented the user auth endpoint, tests passing, see api/auth.py" beats "We've been working on the auth stuff and it's mostly done I think."

The next step is pre-committed. Instead of the new model deciding what to do, it inherits a specific instruction. Momentum is preserved.

Open questions surface. The old model often knows it's blocked on something. Writing it down means the new model can ask the human directly instead of stumbling into the same wall.
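The whole switch fits in a few lines of glue. This is a sketch under one big assumption: `ask(model, prompt)` stands in for whatever chat API you actually use; only the prompt assembly is real here, and the instruction text is a condensed version of the template request above.

```python
# Condensed handoff request (see the full template earlier in the post).
HANDOFF_INSTRUCTION = (
    "You are about to hand off this task to another AI assistant. "
    "Write a HANDOFF PROMPT using the exact structure we agreed on. "
    "Be ruthless: no pleasantries, no tool errors."
)

def transfer(ask, outgoing_model: str, incoming_model: str, next_user_msg: str) -> str:
    """Ask the outgoing model for a handoff, then start the incoming model with it."""
    handoff = ask(outgoing_model, HANDOFF_INSTRUCTION)
    first_prompt = (
        "You are taking over an in-progress task. Read this handoff, then do "
        "the 'Next step' without re-litigating decisions.\n\n"
        f"{handoff}\n\n{next_user_msg}"
    )
    return ask(incoming_model, first_prompt)
```

The key detail is the incoming model's framing: it's told up front that decisions are settled, so the handoff's "do not re-litigate" section has teeth from turn one.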

A real example

Here's a handoff I generated yesterday, switching from a small model (design phase) to a bigger one (implementation phase):

# Handoff: Build a CLI tool that syncs markdown notes to a SQLite FTS5 index

## Goal
A Python CLI `notesync` that indexes all .md files in a directory into SQLite FTS5, supports search, and handles incremental updates.

## Current state
- Requirements gathered
- Schema designed (see below)
- No code written yet

## Decisions made (do not re-litigate)
- Python 3.11+, stdlib sqlite3 only, no ORMs
- FTS5 table with columns: path, title, body, mtime
- Incremental sync via mtime comparison
- CLI uses argparse, not click (user preference)
- Search output: path + 1-line snippet, ranked by bm25

## Open questions
- None currently

## Constraints
- Must work on macOS and Linux
- No external dependencies beyond Python stdlib
- Single file preferred (notesync.py)

## Next step
Write notesync.py with three commands: `init`, `sync`, `search`. Start with `init` (create schema).

## Context files
- /Users/nova/projects/notesync/SCHEMA.md
- /Users/nova/projects/notesync/REQUIREMENTS.md

The new model reads that in one pass and starts coding. No "let me understand the requirements first." No re-asking about Click vs argparse. No re-exploring the schema.
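For a sense of how little ambiguity is left: here's roughly what the pre-committed `init` step might look like, built directly from the "Decisions made" section (stdlib sqlite3, FTS5 table with path, title, body, mtime). A sketch, not the actual notesync.py, and it assumes your Python's SQLite was compiled with FTS5:

```python
import sqlite3

def init(db_path: str) -> None:
    """Create the FTS5 schema. Safe to run twice (IF NOT EXISTS)."""
    con = sqlite3.connect(db_path)
    with con:
        # mtime is UNINDEXED: we compare it for incremental sync,
        # but never full-text search on it.
        con.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5("
            "path, title, body, mtime UNINDEXED)"
        )
    con.close()
```

Every choice in that snippet was made in the design session, not the coding session. That's the point of the handoff.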

When to use it

  • Model switch mid-task (cheap → expensive or vice versa)
  • Ending a session you'll resume tomorrow (future-you is another model, basically)
  • Delegating a subtask to a separate agent (parallel work, clean context)
  • Debugging a long thread that's gone off the rails (handoff cuts the garbage)

The handoff prompt is one of those techniques that sounds obvious once you've seen it, but I spent months awkwardly pasting chat logs before I started writing them explicitly. Try it once on a task you're about to switch models on — the difference is stark.


Question for you: How do you currently transfer context between AI tools or sessions? Got a better template than this one? I'd genuinely love to see what other people use.
