So far, we’ve covered how to use agent tools, skills, and sub-agents. Now, it’s time to put everything into practice. I’m going to walk you through a small proof of concept (POC) project to show how these pieces actually work together.
For this POC, I built a simple DEV blog post writer using LangGraph multi-agents with memory and AWS Bedrock Nova models. The workflow starts by extracting keywords from a user prompt, researching related topics from DEV Community posts, and then generating a draft blog article. An evaluator agent reviews the output and provides feedback for refinement, creating a lightweight content improvement loop.
This is intentionally a POC rather than a production-grade or highly optimized blog-writing system. Alongside blog generation, the workflow also stores topic summaries in a memory.md file to retain useful context for future runs. The project is open for extension and experimentation, making it a practical starting point for exploring agentic content pipelines.
Why the Multi-Agent Generator & Evaluator Pattern?
- Improves output quality by combining creation (generator) with structured review (evaluator)
- Enables iterative refinement: generate → evaluate → improve until a quality threshold is met (see the sketch below)
- Increases reliability by separating responsibilities (build vs. critique)
- Works especially well for code, content, and complex reasoning tasks
How it generates & evaluates the blog post:
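At its core, this is a bounded refinement loop. Here's a minimal sketch with illustrative names only (the real LangGraph implementation follows below):

```python
# Minimal shape of the generator-evaluator loop (illustrative names, not the real API).
def refine(topic: str, max_iter: int = 3) -> str:
    draft, feedback = "", None
    for _ in range(max_iter):
        draft = generate(topic, feedback)          # generator: create or revise the draft
        verdict, feedback = evaluate_draft(draft)  # evaluator: verdict + structured critique
        if verdict == "accepted":
            break
    return draft                                   # best draft after at most max_iter rounds
```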
Whether you're exploring agent design or building your own system, this will give you a clear, practical starting point 😉
Table of Contents
- Dependencies & Configuration
- Implementing Memory
- Implementing Research Abilities
- Agents State & Evaluation Result
- Generator & Evaluator Agents Nodes
- Workflow & Route & Graph
- Run & Call AWS Nova
- All Code & Demo
- memory.md File
- Generated Blog Post
- Conclusion
- References
Dependencies & Configuration
- Please install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# deactivate
requirements.txt:
langchain>=1.0.0
langchain-aws>=1.2.0
langgraph>=1.0.0
python-dotenv>=1.0.0
boto3>=1.34.0
langchain-community>=0.4.1
duckduckgo-search>=8.1.1
ddgs>=9.14.1
requests>=2.33.0
- Enable AWS Bedrock model access in your region (e.g. eu-central-1, us-east-1): AWS Bedrock > Bedrock Configuration > Model Access > AWS Nova-Pro or Claude 3.7 Sonnet. In this code, we'll use AWS Nova-Pro because AWS serves it in multiple regions.
- After enabling model access, grant access to AWS Bedrock services in your IAM user or role: AmazonBedrockFullAccess
Two options to reach the AWS Bedrock model with your AWS account:
- AWS config: run aws configure to create the config and credentials files.
- Environment variables via a .env file: add a .env file with:

AWS_ACCESS_KEY_ID=PASTE_YOUR_ACCESS_KEY_ID_HERE
AWS_SECRET_ACCESS_KEY=PASTE_YOUR_SECRET_ACCESS_KEY_HERE
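Either way, boto3 resolves the credentials automatically. As a quick sanity check, you can list the models your account can reach (a minimal sketch, assuming the .env option and the us-east-1 region):

```python
# Sanity check: confirm Bedrock credentials resolve and model access is enabled.
import boto3
from dotenv import load_dotenv

load_dotenv()  # loads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from .env
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"][:5]:
    print(model["modelId"])
```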
Implementing Memory
- Memory Functions (init, read, append):
# Memory
def mem_init(topic: str) -> None:
    # Create memory.md with a fixed section skeleton the agents append to.
    MEM.write_text(
        f"# Memory — {topic} ({datetime.datetime.now():%Y-%m-%d %H:%M})\n\n---\n\n"
        "## Research\n\n## Sources\n\n## Critiques\n\n## Log\n",
        encoding="utf-8",
    )

def mem_read() -> str:
    return MEM.read_text(encoding="utf-8") if MEM.exists() else ""

def mem_append(section: str, content: str) -> None:
    # Insert content right below the "## <section>" heading (newest entries first).
    text = mem_read()
    eol = text.index("\n", text.index(f"## {section}") + len(f"## {section}"))
    MEM.write_text(text[:eol+1] + "\n" + content.strip() + "\n" + text[eol+1:], encoding="utf-8")
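A quick usage sketch (MEM is the output/memory.md Path defined later in the Run & Call AWS Nova section):

```python
# Usage sketch: initialize the file, then append under named sections.
mem_init("LangGraph agents")                     # writes the section skeleton
mem_append("Research", "### Iter 1\nnotes ...")  # lands right under "## Research"
mem_append("Log", "- **Iter 1** started")
print(mem_read().splitlines()[0])                # "# Memory — LangGraph agents (...)"
```

Note that mem_append inserts directly below the section heading, so the newest entries sit at the top of each section.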
Implementing Research Abilities
- Research functions (keyword generation, DuckDuckGo search, fetch, and collect):
from ddgs import DDGS
import re, sys, json, datetime, requests
# Research
def get_keywords(topic: str) -> list[str]:
    resp = GEN.invoke(
        f"Generate 8 diverse search queries for technical articles about: {topic}\n"
        "Return ONLY a JSON array of strings."
    )
    try:
        return json.loads(resp.content.strip())[:8]
    except Exception:  # model returned non-JSON: fall back to the raw topic
        return [topic]
def search(query: str) -> list[dict]:
    def run(q: str) -> list[dict]:
        with DDGS() as d:
            raw = list(d.text(q, max_results=MIN_ARTICLES * 4))
            return [{"title": r["title"], "url": r["href"], "snippet": r["body"]}
                    for r in raw if SITE in r.get("href", "")]
    # Try a site-restricted query first, then a plain query mentioning the site.
    for q in [f"site:{SITE} {query}", f"{query} {SITE}"]:
        try:
            hits = run(q)
            if hits: return hits[:MIN_ARTICLES]
        except Exception:  # search/network hiccup: try the next query form
            continue
    return []
def fetch(url: str, snippet: str = "") -> str:
    try:
        html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
        # Crude HTML-to-text: drop <style>/<script> blocks, then all remaining tags.
        text = re.sub(r"<(style|script)[^>]*>.*?</\1>", " ", html, flags=re.DOTALL)
        text = re.sub(r"<[^>]+>", " ", text)
        text = re.sub(r"\s{2,}", " ", text).strip()
        # Fall back to the search snippet when the extracted text is too thin.
        return text[:3000] if len(text) >= 300 else (snippet[:3000] or "[unavailable]")
    except Exception as e:
        return f"[failed: {e}]"
def collect(keywords: list[str]) -> tuple[str, list[str]]:
seen, found = set(), []
for kw in keywords:
if len(found) >= MIN_ARTICLES: break
for hit in search(kw):
if hit["url"] in seen or len(found) >= MIN_ARTICLES: continue
seen.add(hit["url"])
            content = fetch(hit["url"], hit["snippet"])
            # Skip thin pages and failed fetches ("[unavailable]" / "[failed: ...]").
            if len(content.strip()) < 300 or content.startswith("["): continue
found.append({**hit, "content": content})
print(f" {len(found)}/{MIN_ARTICLES} {hit['url']}")
if len(found) < MIN_ARTICLES:
print(f" WARNING: only {len(found)}/{MIN_ARTICLES} articles found")
sections = [
f"### [{a['title']}]({a['url']})\n**Snippet:** {a['snippet']}\n\n**Content:**\n{a['content']}\n"
for a in found
]
return "\n---\n".join(sections) or "_No results._", [a["url"] for a in found]
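These functions compose into a small research pipeline you can smoke-test on its own before wiring up any agents (a hedged sketch; it needs network access, and GEN, SITE, and MIN_ARTICLES come from the configuration section below):

```python
# Standalone smoke test for the research layer.
keywords = get_keywords("LangGraph multi-agent workflows")  # example topic
research_md, urls = collect(keywords)
print(f"Collected {len(urls)} articles")
print(research_md[:400])
```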
Agents State & Evaluation Result
- State is the shared data structure the agents use to communicate with each other.
- EvalResult is the structured schema the evaluator fills in: a verdict, four 1-5 scores, and optional feedback.
from typing import TypedDict, Literal, Optional
from pydantic import BaseModel, Field
class State(TypedDict):
topic: str; keywords: list[str]; blog: str
feedback: Optional[str]; verdict: str; iteration: int; max_iter: int
class EvalResult(BaseModel):
verdict: Literal["accepted", "rejected"]
depth_score: int = Field(ge=1, le=5)
recency_score: int = Field(ge=1, le=5)
structure_score: int = Field(ge=1, le=5)
writing_score: int = Field(ge=1, le=5)
feedback: Optional[str] = None
evaluator = EVAL.with_structured_output(EvalResult)
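with_structured_output makes the evaluator return a validated EvalResult object instead of free text. A sketch of what a call yields (the real prompt is assembled inside the evaluate() node below):

```python
# Hypothetical call: the actual evaluation prompt is built in evaluate().
result: EvalResult = evaluator.invoke("Evaluate this draft blog post: ...")
print(result.verdict)      # "accepted" or "rejected", enforced by Literal
print(result.depth_score)  # int constrained to 1..5 by Field(ge=1, le=5)
print(result.feedback)     # optional free-text critique, or None
```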
Generator & Evaluator Agents Nodes
- The generator creates keywords, runs the research, appends the data (links, summaries) to the memory file, and then generates the blog post.
# generator
def generator(state: State) -> dict:
n, topic = state["iteration"] + 1, state["topic"]
print(f"\n[Generator] Iter {n}/{state['max_iter']}")
keywords = state["keywords"] or get_keywords(topic)
research, urls = collect(keywords)
now = datetime.datetime.now().strftime("%H:%M:%S")
mem_append("Research", f"### Iter {n} — {now}\n**Keywords:** {', '.join(keywords)}\n\n{research}\n")
mem_append("Sources", f"### Iter {n}\n" + "\n".join(f"- {u}" for u in urls) + "\n")
mem_append("Log", f"- **Iter {n}** `{now}` — {len(urls)} articles\n")
rewrite = (f"\n\nPREVIOUS POST:\n{state['blog']}\nFEEDBACK:\n{state['feedback']}"
if state.get("feedback") and n > 1 else "")
# Strip the Log section — critiques + research are useful, raw logs are noise
memory_context = mem_read().split("## Log")[0][:4000]
blog = GEN.invoke(
f"Write a 1500-2000 word technical blog post about **{topic}** for senior AI/ML engineers.\n\n"
f"MEMORY CONTEXT (previous research, sources, critiques):\n{memory_context}\n\n"
f"CURRENT RESEARCH:\n{research[:8000]}"
f"{rewrite}\n\n"
"Requirements: cite ≥8 sources by title + URL, # title, ## sections, code examples.\n"
"End with ## References (real URLs only). Markdown only."
).content
BLOG.write_text(blog, encoding="utf-8")
print(f" Blog: {len(blog)} chars | {len(urls)} sources")
return {"blog": blog, "iteration": n, "keywords": keywords}
- The evaluator scores the post on defined criteria (depth, recency, structure, writing) and appends its critique to the memory file.
# evaluator
def evaluate(state: State) -> dict:
print(f"\n[Evaluator] Reviewing iter {state['iteration']}...")
e: EvalResult = evaluator.invoke(
f"Evaluate this blog post on **{state['topic']}** for senior AI engineers.\n\n"
f"POST:\n{BLOG.read_text(encoding='utf-8')}\n\nMEMORY:\n{mem_read()[:2000]}\n\n"
"Score 1-5 each: depth, recency (2023-2025), structure, writing. Accept only if ALL ≥ 4."
)
print(f" D:{e.depth_score} R:{e.recency_score} S:{e.structure_score} W:{e.writing_score} → {e.verdict.upper()}")
now = datetime.datetime.now().strftime("%H:%M:%S")
mem_append("Critiques",
f"### Iter {state['iteration']} — {now} — **{e.verdict.upper()}**\n"
f"| Depth | Recency | Structure | Writing |\n|---|---|---|---|\n"
f"| {e.depth_score}/5 | {e.recency_score}/5 | {e.structure_score}/5 | {e.writing_score}/5 |\n"
+ (f"\n**Feedback:** {e.feedback}\n" if e.feedback else "")
)
mem_append("Log",
f"- **Iter {state['iteration']}** `{now}` — {e.verdict.upper()} "
f"(D:{e.depth_score} R:{e.recency_score} S:{e.structure_score} W:{e.writing_score})\n"
)
return {"verdict": e.verdict, "feedback": e.feedback or ""}
Workflow & Route & Graph
- The agents are added as nodes to a LangGraph StateGraph.
- The route function decides whether to loop back for another iteration or stop.
from langgraph.graph import StateGraph, START, END
def route(state: State) -> str:
if state["iteration"] >= state["max_iter"]:
print(" Max iterations — publishing best version."); return "Accepted"
return "Accepted" if state["verdict"] == "accepted" else "Rejected"
# Graph
graph = StateGraph(State)
graph.add_node("generator", generator)
graph.add_node("evaluator", evaluate)
graph.add_edge(START, "generator")
graph.add_edge("generator", "evaluator")
graph.add_conditional_edges("evaluator", route, {"Accepted": END, "Rejected": "generator"})
pipeline = graph.compile()
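To double-check the wiring, the compiled graph can render itself as a Mermaid diagram:

```python
# Print a Mermaid diagram of the workflow: START -> generator -> evaluator -> (END | generator)
print(pipeline.get_graph().draw_mermaid())
```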
Run & Call AWS Nova
- Assign a ChatBedrockConverse client to each agent: a creative generator (temperature 0.8) and a deterministic evaluator (temperature 0).
from langchain_aws import ChatBedrockConverse
from pathlib import Path
from dotenv import load_dotenv
load_dotenv()
OUT = Path("output"); OUT.mkdir(exist_ok=True)
BLOG = OUT / "blog_post.md"
MEM = OUT / "memory.md"
SITE = "dev.to"
MIN_ARTICLES = 5
GEN = ChatBedrockConverse(model="us.amazon.nova-pro-v1:0", temperature=0.8)
EVAL = ChatBedrockConverse(model="us.amazon.nova-pro-v1:0", temperature=0)
def run(topic: str, max_iter: int = 3) -> None:
print(f"\n{'='*60}\n {topic}\n{'='*60}")
mem_init(topic)
result = pipeline.invoke({
"topic": topic, "keywords": [], "blog": "", "feedback": None,
"verdict": "", "iteration": 0, "max_iter": max_iter,
})
print(f"\n{'='*60}\n Done in {result['iteration']} iter(s) — output/ written\n{'='*60}\n")
if __name__ == "__main__":
run(" ".join(sys.argv[1:]) or "AI Agents with Memory on AWS Bedrock AgentCore")
All Code & Demo
GitHub Link: Project on GitHub
Run:
python3 agent.py
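Since run() joins sys.argv into the topic, you can also pass your own topic on the command line (any topic string works; this one is just an example):

python3 agent.py "Multi-agent RAG pipelines with LangGraph"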
memory.md File
- The generator collects blog posts from dev.to with their links and content.
- It collects all links under the Sources section for references.
## Research
### Iter 1 — 14:46:42
**Keywords:** AI Agents with Memory on AWS Bedrock AgentCore: Overview and Use Cases, Implementing Long-Term Memory for AI Agents using AWS Bedrock AgentCore, Comparing AWS Bedrock AgentCore Memory Features with Other AI Platforms, ...
### [AWS Bedrock AgentCore Memory: Give Your AI Agent a Brain That Actually ...](https://dev.to/sampathkaran/aws-bedrock-agentcore-memory-give-your-ai-agent-a-brain-that-actually-remembers-12ie)
**Snippet:** It's a managed memory service purpose-built for agents, with three distinct memory tiers and a retrieval API that plugs directly into the Bedrock agent runtime. ...
**Content:**
AWS Bedrock AgentCore Memory: Give Your AI Agent a Brain That Actually Remembers - Prerequisites Familiarity with AWS Bedrock, boto3, and building LLM-based agents. The Problem With Stateless Agents If we ship a Bedrock agent to production, we already hit this wall. Every invocation is stateless. We hack around it by stuffing conversation history into the prompt, bloating your token count, and eventually hitting context limits. Or you build your own memory layer...
## Sources
### Iter 1
- https://dev.to/sampathkaran/aws-bedrock-agentcore-memory-give-your-ai-agent-a-brain-that-actually-remembers-12ie
- https://dev.to/aws-builders/agent-memory-strategies-building-believable-ai-with-bedrock-agentcore-kn6
- https://dev.to/aws/build-production-ai-agents-with-managed-long-term-memory-2jm
- https://dev.to/sudarshangouda/ai-agent-memory-from-manual-implementation-to-mem0-to-aws-agentcore-2d7c
- https://dev.to/yuriybezsonov/ai-agent-memory-made-easy-amazon-bedrock-agentcore-memory-with-spring-ai-3bng
- The evaluator records its critiques under the Critiques section of memory.md:
## Critiques
### Iter 1 — 14:46:54 — **ACCEPTED**
| Depth | Recency | Structure | Writing |
|---|---|---|---|
| 5/5 | 5/5 | 5/5 | 5/5 |
## Log
- **Iter 1** `14:46:54` — ACCEPTED (D:5 R:5 S:5 W:5)
- **Iter 1** `14:46:42` — 5 articles
GitHub Link: memory.md on GitHub
Generated Blog Post
- It creates a full blog post draft. However, the aim is not to produce a perfect blog post.
# AI Agents with Memory on AWS Bedrock AgentCore
In the evolving landscape of AI and machine learning, the ability to create agents that can retain and utilize past interactions is paramount. AWS Bedrock AgentCore Memory provides a robust solution for building AI agents with memory capabilities. This article will delve into how AWS Bedrock AgentCore Memory works, its advantages, and how senior AI/ML engineers can leverage it to build more effective and context-aware AI agents.
## Introduction to AWS Bedrock AgentCore Memory
AWS Bedrock AgentCore Memory is a managed service designed specifically for AI agents, offering a sophisticated memory system that enhances the capabilities of these agents. Unlike traditional stateless agents, AWS Bedrock AgentCore Memory enables agents to retain context across interactions, making them more effective and efficient. This service is particularly beneficial for applications requiring long-term memory, such as customer support bots, virtual assistants, and conversational AI systems....
GitHub Link: Blog Post Output on GitHub
Conclusion
In this post, we covered:
- how to create multi-agents with LangGraph,
- how to use memory files (markdown files) between multi-agents,
- how to create nodes, routes, states, evaluation loop,
- how to use AWS Bedrock Nova.
This small POC project creates blog post drafts, but the goal is not to write a perfect blog post fully with AI. Instead, it helps people by doing tasks like research, organizing ideas, and creating a first draft. The draft can then be improved with feedback from agents and edited by a human. Writing is still a human process, and AI agents are used as assistants to save time and improve quality, not to replace the writer.
If you found the tutorial interesting, I’d love to hear your thoughts in the blog post comments. Feel free to share your reactions or leave a comment. I truly value your input and engagement 😉
For other posts 👉 https://dev.to/omerberatsezer 🧐
References
- https://docs.langchain.com/oss/python/langgraph/overview
- https://aws.amazon.com/bedrock
- https://github.com/omerbsezer/Fast-LLM-Agent-MCP/
Your comments 🤔
I’d love to hear how others are building agentic workflows.
- Are you using generator–evaluator patterns, memory layers, or multi-agent architectures in real projects?
- What do you think about LangGraph, multi-agents, and AWS Bedrock services?
Drop your thoughts, feedback, or improvements in the comments; I'm always curious to learn from different approaches. ☺️
