Gunnar Grosch

Writing System Prompts That Actually Work: The RISEN Framework for AI Agents

You've probably written a system prompt that looks like this:

You are a helpful assistant. Help the user with their request.

It works. The model responds. But the output is unpredictable. Ask it to review code and you get a mix of style comments and security findings with no consistent structure. Ask it to diagnose an incident and it gives you a wall of text that buries the actionable steps. Ask it to design an architecture and it picks services without explaining trade-offs.

If you're building agents that need to produce consistent, structured output, whether that's a single-agent workflow or a multi-agent system, the problem isn't the model. It's the prompt. A vague system prompt gives the model no framework for structuring its reasoning, so it improvises every time. Sometimes the improvisation is great. Sometimes it misses the point entirely. You can't build reliable agents on "sometimes."

The RISEN Framework

RISEN is a structured approach to writing system prompts. Each letter represents a component:

Component What it does
Role Who the agent is. Expertise, experience, specialization.
Instructions What you want it to do. The core task.
Steps How to get there. The ordered workflow.
Expectation What the output should look like. Format, structure, sections.
Narrowing What to exclude. Constraints, boundaries, scope limits.

You'll see the E defined as "End Goal" in some formulations. Expectation is a deliberate choice here: for agents, what matters is the structural contract for the output, not a vague goal statement. "Produce a useful architecture" is an end goal. "Return sections for Requirements Summary, Service Selection with trade-off tables, SAM Template, and Cost Estimate" is an expectation.

Most people only write the I part. "Review the code." "Diagnose the issue." "Design an architecture." That's an instruction with no context about who's doing the work, what process to follow, what format to use, or what to leave out.

RISEN fills in the rest. The result isn't just a prompt. It's a behavioral contract. The agent knows what role it's playing, what steps to follow, what structure to produce, and what boundaries to respect.
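The five components also map naturally onto code. Here's a minimal TypeScript sketch of a builder that assembles them into a system prompt string; the interface and function names are illustrative, not from any particular SDK:

```typescript
// One object per agent: the five RISEN components as typed fields.
interface RisenPrompt {
  role: string;
  instructions: string;
  steps: string[];      // ordered workflow
  expectation: string;  // structural contract for the output
  narrowing: string[];  // constraints and scope limits
}

// Render the components into the markdown-style sections used in this post.
function buildSystemPrompt(p: RisenPrompt): string {
  return [
    `# Role\n${p.role}`,
    `# Instructions\n${p.instructions}`,
    `# Steps\n${p.steps.map((s, i) => `${i + 1}. ${s}`).join("\n")}`,
    `# Expectation\n${p.expectation}`,
    `# Narrowing\n${p.narrowing.map((n) => `- ${n}`).join("\n")}`,
  ].join("\n\n");
}

const reviewer = buildSystemPrompt({
  role: "You are a senior AWS security engineer specializing in serverless application security.",
  instructions: "Review the provided code for security vulnerabilities.",
  steps: [
    "Identify the AWS services and patterns in use.",
    "Check for security, performance, and reliability issues.",
    "For each finding, provide severity and a corrected snippet.",
  ],
  expectation: "Structured review organized by severity, ending with a summary count.",
  narrowing: ["Focus on production impact. Ignore style preferences."],
});
```

Keeping the components as data rather than one long string makes it easy to reuse a Role across agents or tighten a single Narrowing constraint without touching the rest of the prompt.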

Why This Matters for Agents

System prompts matter more for agents than for simple chat. In a chat application, a vague system prompt means the user gets a mediocre answer and can follow up. In an agentic workflow, a vague system prompt means the agent takes actions based on an ambiguous understanding of its role. It might use the wrong tools, skip steps, or produce output that downstream agents can't parse.

In multi-agent systems (whether you're using protocols like A2A and MCP, or frameworks like Strands Agents), each agent's system prompt is its behavioral contract with the rest of the system. A warehouse management agent in a logistics pipeline needs to know exactly what decisions it owns, what format to return, and what to escalate. "You are a warehouse assistant" doesn't cut it.

The SwiftShip demo from re:Invent session DEV415 is a good example. It's a logistics platform with four agents (Triage, Order, Payment, Warehouse) that coordinate to resolve delivery exceptions. Every agent has a RISEN-structured system prompt. The Triage Agent's Steps section is a full decision tree: classify the exception, determine the resolution strategy, invoke the right specialist agents in the right order (Payment before Warehouse before Order for replacements), and produce a resolution summary. The Narrowing section prevents it from handling general customer inquiries and enforces that it never processes refunds without confirming the exception type. That's not a prompt. That's an orchestration contract.

This is also where Narrowing earns its place. Without explicit constraints, agents over-deliver. An incident response agent might suggest rewriting the application code when all you need right now is "switch DynamoDB to on-demand capacity." Narrowing keeps the agent focused on what's useful for the current context.

The Difference in Practice

I put together a demo repo with three scenarios that show the difference between basic and RISEN system prompts. Each scenario sends the same user prompt to the same model twice: once with a one-sentence system prompt, once with a RISEN-structured prompt. Same model, same input, different guidance. The demo uses Strands Agents with Amazon Bedrock.
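The harness itself is simple. Here's a sketch of the comparison loop in TypeScript, with `invoke` standing in for whatever model call you use (the demo uses Strands Agents on Amazon Bedrock; this stub is not that API):

```typescript
// Any function that takes a system prompt and a user prompt and
// returns the model's text will do.
type Invoke = (systemPrompt: string, userPrompt: string) => Promise<string>;

// Same model, same user prompt, run twice: only the system prompt differs.
async function compare(
  invoke: Invoke,
  basicPrompt: string,
  risenPrompt: string,
  userPrompt: string
): Promise<{ basic: string; risen: string }> {
  const basic = await invoke(basicPrompt, userPrompt);
  const risen = await invoke(risenPrompt, userPrompt);
  return { basic, risen };
}
```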

Scenario 1: Incident response

The user prompt is a DynamoDB throttling alert: 4,850 WCU consumed against 1,000 provisioned, 2,347 throttled requests, and a new Lambda version deployed 45 minutes ago.

Basic prompt:

You are an incident response assistant. Help diagnose and resolve
AWS issues.

RISEN prompt (abbreviated, full version in the demo repo):

# Role
You are an AWS site reliability engineer on an on-call rotation
with 10 years of experience operating production serverless workloads.

# Instructions
Perform a structured diagnosis. Identify the most likely root cause,
provide immediate mitigation steps, and recommend longer-term fixes.

# Steps
1. Parse the alert details: service, metric, threshold, duration.
2. List the top 3 most likely root causes in order of probability.
3. For each, describe evidence that would confirm or rule it out.
4. Provide immediate mitigation steps executable in under 5 minutes.
5. Recommend longer-term fixes with estimated effort.

# Expectation
Sections: Alert Summary, Probable Root Causes (ranked), Diagnostic
Steps, Immediate Mitigation, Long-Term Fixes. Include specific
metric names, CLI commands, and thresholds.

# Narrowing
- Operator has CLI access but cannot deploy code changes during
  the incident.
- Focus on mitigation first. Restoring service is the priority.
- Do not suggest "contact AWS Support" as a first step.
- All commands should use AWS CLI v2 syntax.

The basic prompt gives a solid response. It correctly identifies the new Lambda deployment as the likely cause, provides useful CLI commands, and suggests scaling up DynamoDB. But it's organized as a narrative with emoji headers and ends by asking the operator what to do:

## 🚨 Immediate Issue
Your write capacity is being consumed at **485% of provisioned capacity**...

## 🔍 Root Cause Hypothesis
Given the timeline, the new Lambda deployment is the likely culprit.

...

**What would you like to do first? Scale the table or rollback the Lambda?**

That question is the wrong instinct for an incident response agent. At 2 AM, you don't want a conversation. You want a ranked action plan.

The RISEN prompt produces exactly that. Root causes are ranked with confidence percentages:

## 1. **Lambda Write Amplification (90% confidence)**
## 2. **Hot Partition Key Issue (70% confidence)**
## 3. **SQS Message Backlog Processing (60% confidence)**

Each mitigation option includes cost and impact:

## Option A: Increase DynamoDB Write Capacity (60 seconds)
aws dynamodb update-table \
  --table-name order-events-prod \
  --provisioned-throughput ReadCapacityUnits=1000,WriteCapacityUnits=5000
Impact: Eliminates throttling immediately. Table update takes 30-60 seconds.
Cost: ~$2.56/hour additional ($2,336/month vs $467/month baseline)

And long-term fixes come with effort estimates ("Effort: 15 minutes", "Effort: 4 hours") so you can prioritize. The Narrowing constraint about not deploying code during an incident kept the response focused on what an on-call engineer can actually do without waking up the development team.

Scenario 2: Architecture decision

This scenario adds a twist: both agents get the same AWS MCP server tools for searching AWS documentation, checking service limits, and validating recommendations. Same tools, same model, same user prompt. The only difference is the system prompt.

The user prompt describes requirements for a real-time order notification system: 50,000 orders per day, multiple notification channels, customer preferences, 30-second delivery SLA, under $500/month.

Basic prompt:

You are an AWS solutions architect. Help design cloud architectures.

RISEN prompt (abbreviated):

# Role
You are a principal AWS solutions architect specializing in
event-driven serverless architectures.

# Instructions
Design an AWS architecture. Evaluate service options, justify
choices with trade-offs, and provide a SAM template snippet.
Use the AWS documentation tools to validate your recommendations.

# Steps
1. Restate requirements as functional and non-functional.
2. Identify the core architectural pattern.
3. For each component, list 2-3 service options with trade-offs.
   Use the documentation tools to verify current service limits
   and pricing.
4. Select and justify the recommended option.
5. Describe the data flow end to end.
6. Provide a SAM template snippet.
7. Call out operational considerations.

# Expectation
Sections: Requirements Summary, Architecture Pattern, Service
Selection (with trade-off tables), Data Flow, SAM Template,
Operational Considerations. Include a monthly cost estimate.

# Narrowing
- Prefer serverless over instance-based.
- Use managed services only.
- SAM templates should be valid YAML, not pseudocode.
- Cost estimates using current us-east-1 pricing.
- Use the documentation tools only to verify specific facts
  (pricing, limits, quotas). Do not use them to generate
  the architecture itself.

Both agents used the MCP tools. But look at what they did with them.

The basic prompt queried the documentation and jumped straight to a recommendation:

## Recommended Architecture: Event-Driven Real-Time Order Notification System
...
### Core Components
#### 1. Event Ingestion Layer
- **Amazon EventBridge**: Central event bus for order events
...
Would you like me to:
1. Generate CDK/CloudFormation templates for this architecture?
2. Create the Lambda function code with full error handling?

No alternatives evaluated. No trade-offs. And it ends by asking what to do next.

The RISEN prompt used the same tools to verify facts, then produced trade-off tables for every component:

### 3.1 Event Ingestion Layer

| Service           | Pros                         | Cons                       | Verdict      |
|-------------------|------------------------------|----------------------------|--------------|
| EventBridge       | Native filtering, $1/M events| Limited transformation     | SELECTED     |
| Kinesis Streams   | Replay, high throughput      | $11/month min, overkill    |              |
| SQS               | Simple, cheap                | No native fanout           |              |

Decision: EventBridge - 8.33 events/sec peak << 10,000/sec limit

The Steps guided the agent to evaluate before deciding. The Narrowing constraint "Use the documentation tools only to verify specific facts" kept the tool usage focused: the agent looked up pricing and limits, not architectures. The result was a full architecture document with a SAM template, a cost breakdown ($253.51/month against the $500 budget), and operational considerations including scaling limits and monitoring.

Scenario 3: Code review

The user prompt is a Lambda function with several issues: SDK client instantiated inside the handler, no input validation, sensitive data (SSN) returned in the API response, wildcard CORS headers, and no error handling.

Basic prompt:

You are a code review assistant. Review code for issues and suggest improvements.

RISEN prompt (abbreviated):

# Role
You are a senior AWS security engineer specializing in serverless
application security.

# Instructions
Review the provided code for security vulnerabilities, performance
issues, and AWS best practice violations. Prioritize findings by
severity and provide fix recommendations with corrected code.

# Steps
1. Identify the AWS services and patterns in use.
2. Check for security issues: injection, overly permissive IAM,
   hardcoded secrets, missing input validation.
3. Check for performance issues: cold start impact, unnecessary
   SDK client instantiation.
4. Check for reliability issues: missing error handling, no retries.
5. For each finding, provide severity, the problematic code,
   and a corrected snippet.

# Expectation
Structured review organized by severity. Each finding includes:
severity level, description, problematic code, corrected code.
End with a summary count.

# Narrowing
- Focus on production impact. Ignore style preferences.
- Do not suggest rewriting the entire function or switching runtimes.
- Limit the review to security, performance, and reliability.

Both prompts catch the SSN exposure. But look at how the output differs.

The basic prompt opens with emoji-coded sections and mixes severity with style:

## Critical Issues 🔴
### 1. **Security Vulnerability - Sensitive Data Exposure**
...
## Medium Priority Issues 🟠
### 7. **Type Safety**
- `event: any` loses type safety

It also generates a full rewrite of the function (which the reviewer didn't ask for) and ends with a brief summary.

The RISEN prompt produces a consistent structure: every finding follows the same format (Severity, Description, Problematic Code, Corrected Code) and ends with a summary table:

| Severity     | Count | Issues                                     |
|--------------|-------|--------------------------------------------|
| **Critical** | 2     | NoSQL injection, PII exposure (SSN)        |
| **High**     | 3     | Missing auth, permissive CORS, no errors   |
| **Medium**   | 3     | Cold start, input validation, null checks  |
| **Low**      | 2     | TypeScript any type, missing headers       |
| **Total**    | **10**|                                            |

The Narrowing constraint "do not suggest rewriting the entire function" kept the RISEN response focused on targeted fixes. The basic prompt had no such guardrail and generated a complete replacement.

Try It Yourself

You'll need:

  • Node.js 20+
  • AWS credentials configured for Amazon Bedrock access
  • Python 3.10+ and uvx (for the architecture scenario's AWS MCP server integration)
git clone https://github.com/gunnargrosch/risen-prompt-demo.git
cd risen-prompt-demo
npm install

Run a scenario:

npm run code-review
npm run incident
npm run architecture

Each scenario runs the basic prompt first, then the RISEN prompt, so you can see the difference in your terminal.

Building Your Own RISEN Prompts

A few things I've noticed that make RISEN prompts more effective:

Role is more than a job title

"You are a code reviewer" gives the model a vague persona. "You are a senior AWS security engineer specializing in serverless application security" tells it what lens to apply. The more specific the role, the more the model draws on relevant knowledge. Include years of experience, domain expertise, and the specific technology stack.

Get the step granularity right

Too few steps and the model skips reasoning. Too many and it gets rigid. Three to seven steps tends to work. Each step should represent a distinct phase, not a sub-task. If you find yourself writing "2a, 2b, 2c," that's one step with internal detail, not three steps.

Make Narrowing specific

The most common mistake is forgetting Narrowing entirely. The second most common is making it too vague. "Keep it focused" isn't a constraint. "Do not suggest services in preview or limited availability" is. Write constraints that you could objectively check against the output.
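One way to keep yourself honest is to phrase each constraint as a predicate you can run against the output. A TypeScript sketch, using constraints from this post as examples:

```typescript
// A Narrowing constraint that can be checked mechanically.
interface Constraint {
  label: string;
  holds: (output: string) => boolean;
}

const narrowing: Constraint[] = [
  {
    label: 'does not suggest "contact AWS Support" as a first step',
    holds: (out) => !out.toLowerCase().includes("contact aws support"),
  },
  {
    label: "does not suggest services in preview",
    holds: (out) => !/\bpreview\b/i.test(out),
  },
];

// Returns the labels of every constraint the output violates.
function violations(output: string): string[] {
  return narrowing.filter((c) => !c.holds(output)).map((c) => c.label);
}
```

If you can't write the `holds` function, the constraint is probably too vague to steer the model either.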

Don't skip Expectation for agents

For single-use prompts, Expectation is nice to have. For agents whose output feeds into other agents or structured workflows, it's required. Specify sections, ordering, format (bullet points, tables, code blocks). If you skip one section, don't skip this one.
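When the output does feed a downstream consumer, the Expectation section doubles as a validation spec. A sketch that checks for the incident-response sections from Scenario 1; the heading parsing is illustrative, not from the demo repo:

```typescript
// Required sections, in the order the Expectation specifies.
const expectedSections = [
  "Alert Summary",
  "Probable Root Causes",
  "Diagnostic Steps",
  "Immediate Mitigation",
  "Long-Term Fixes",
];

// Collect level-2 markdown headings from the output and report any
// required section that never appears.
function missingSections(output: string): string[] {
  const found = [...output.matchAll(/^##\s+(.+)$/gm)].map((m) =>
    (m[1] ?? "").replace(/\*/g, "").trim()
  );
  return expectedSections.filter(
    (name) => !found.some((heading) => heading.startsWith(name))
  );
}
```

A check like this can gate the handoff: if sections are missing, retry or flag instead of passing malformed output downstream.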

Your first RISEN prompt won't be your last. Run it against a few representative inputs and check the output against your Expectation section. If the structure is right but the content is off, adjust the Role or Steps. If the agent keeps going out of scope, tighten the Narrowing. If the output format is inconsistent, make the Expectation more specific. The framework gives you five independent levers to tune.

Other Frameworks Worth Knowing

RISEN isn't the only structured approach. Anthropic and OpenAI both publish recommended prompt structures for their models that cover similar ground: role, instructions, output format, examples, and constraints. If your agent uses tools extensively, RISE-M extends RISEN with a sixth component, Methods, which covers when and how to use each tool. The architecture scenario above is a lightweight version of this: the Steps and Narrowing sections include tool usage guidance ("verify current service limits" in Steps, "only to verify specific facts" in Narrowing). If your tool-specific constraints keep growing, a dedicated Methods section may be cleaner.
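If you go that route, Methods fits the same pattern as the other sections. A sketch with hypothetical tool names (not from any real MCP server):

```typescript
// Hypothetical tools and the conditions under which the agent may use them.
const methods = [
  { tool: "search_aws_docs", when: "to verify service limits, quotas, and pricing" },
  { tool: "get_pricing", when: "only after a service has already been selected" },
];

const methodsSection =
  "# Methods\n" + methods.map((m) => `- Use ${m.tool} ${m.when}.`).join("\n");
```

Appended after Narrowing, this keeps tool-usage guidance in one place instead of scattered across Steps and Narrowing.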

The frameworks overlap. The value isn't in picking the "right" one. It's in moving from an unstructured one-liner to any systematic approach that covers role, task, process, format, and constraints.

Start Here

If you want to try RISEN on your next agent, here's a blank template you can copy and fill in:

# Role
You are a [job title/expertise] specializing in [domain]. You have
[years of experience] with [specific technologies/tools].

# Instructions
[Core task in 1-2 sentences. What should the agent accomplish?]

# Steps
1. [First thing the agent should do]
2. [Second thing]
3. [Continue until the workflow is complete]

# Expectation
[Output format: sections, tables, code blocks, bullet points.
Specify the structure the response should follow.]

# Narrowing
- [What to exclude or ignore]
- [Scope boundaries]
- [Constraints on format, length, or approach]

Fill in Role first (it shapes everything else), then Instructions, then Steps. Expectation and Narrowing come last because they depend on knowing what the agent is doing and how.


Have you applied a structured framework to your agent system prompts? What changed in the output when you did? I'd like to hear about it in the comments.

Top comments (2)

Matthew Hou

RISEN is a good framework for thinking about prompt structure, but the part that resonates most is the implicit argument that methodology should be codified, not left to individual skill. The difference between a developer who writes good prompts and one who doesn't shouldn't be "talent" — it should be whether they're using a structural template or winging it. I've been pushing in a similar direction: turning workflow knowledge into executable files (think AGENTS.md or skill files) rather than documentation people read once and forget. The challenge is that frameworks like RISEN work well for single prompts but get harder to maintain when you're orchestrating multi-step agent workflows where each step needs its own context and constraints. Have you tried applying this to chained agent interactions?

klement Gunndu

The Narrowing component is underrated — we found that explicitly listing what the agent should NOT do cut hallucination rates more than any positive instruction. Biggest win was adding constraints like "never invent package names" to the N section.