<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Netanel Abergel</title>
    <description>The latest articles on DEV Community by Netanel Abergel (@netanelabergel).</description>
    <link>https://dev.to/netanelabergel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F911881%2F056b2c87-ed9e-4299-9c6e-2f79bb0b05ba.jpeg</url>
      <title>DEV Community: Netanel Abergel</title>
      <link>https://dev.to/netanelabergel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/netanelabergel"/>
    <language>en</language>
    <item>
      <title>An agent is only as good as the system engineering around it.</title>
      <dc:creator>Netanel Abergel</dc:creator>
      <pubDate>Fri, 24 Apr 2026 10:17:50 +0000</pubDate>
      <link>https://dev.to/netanelabergel/an-agent-is-only-as-good-as-the-system-engineering-around-it-54en</link>
      <guid>https://dev.to/netanelabergel/an-agent-is-only-as-good-as-the-system-engineering-around-it-54en</guid>
      <description>&lt;p&gt;Anthropic published &lt;a href="https://www.anthropic.com/engineering/april-23-postmortem" rel="noopener noreferrer"&gt;this&lt;/a&gt; postmortem after users experienced a noticeable drop in Claude Code quality.&lt;br&gt;
That’s the important setup: the issue was real, but the root cause mostly wasn’t the base model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid8g9sh1f542x3oaz5vf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid8g9sh1f542x3oaz5vf.png" alt=" " width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three levels shape the outcome:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Model — the base intelligence
&lt;/li&gt;
&lt;li&gt;Context — what the agent remembers, what gets dropped, and how state carries across turns
&lt;/li&gt;
&lt;li&gt;Harness — the prompts, tool wiring, defaults, and guardrails around the model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this case, the quality drop mostly came from layers 2 and 3:&lt;br&gt;
less reasoning by default, lost context across turns, and prompt changes that reduced coding quality.&lt;/p&gt;

&lt;p&gt;Same model family.&lt;br&gt;&lt;br&gt;
Different orchestration.&lt;br&gt;&lt;br&gt;
Very different user experience.&lt;/p&gt;

&lt;p&gt;That’s why agent quality won’t be defined only by which model you use.&lt;br&gt;&lt;br&gt;
It will be defined by how well we engineer the system around it.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>12 Production-Grade Skills That Turn an AI Agent Into a Real Teammate</title>
      <dc:creator>Netanel Abergel</dc:creator>
      <pubDate>Wed, 22 Apr 2026 08:43:31 +0000</pubDate>
      <link>https://dev.to/netanelabergel/12-production-grade-skills-that-turn-an-ai-agent-into-a-real-teammate-4a0h</link>
      <guid>https://dev.to/netanelabergel/12-production-grade-skills-that-turn-an-ai-agent-into-a-real-teammate-4a0h</guid>
      <description>&lt;p&gt;I run an AI agent in production. It coordinates with other agents, tracks commitments across conversations, monitors its own infrastructure, and learns from its mistakes. It's been running 24/7 for months.&lt;/p&gt;

&lt;p&gt;What makes it useful isn't the model — it's the skills. Each skill is a structured unit of behavior with tool access, persistent memory, and real-world integrations. Here are 12 of them, each battle-tested in production, with the actual configs and architecture behind them.&lt;/p&gt;

&lt;p&gt;These skills aren't specific to any one type of agent. Whether you're building a coding agent, an ops agent, a customer-facing agent, or a coordination agent — the same patterns apply. An agent that can search its own memory, track its own commitments, and learn from its own failures is a better agent, regardless of domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes a Skill "Agentic"?
&lt;/h2&gt;

&lt;p&gt;An agentic skill goes beyond a static system prompt. It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool bindings&lt;/strong&gt; — it can execute shell commands, query databases, call APIs, read and write files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory access&lt;/strong&gt; — it remembers what happened yesterday, last week, last month — across multiple storage backends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State management&lt;/strong&gt; — it tracks what's pending, what's done, what failed, and picks up where it left off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permissions&lt;/strong&gt; — it knows what it can do autonomously vs. what requires human approval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-agent coordination&lt;/strong&gt; — it can sync with other agents, share learnings, avoid duplicate work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how each of the 12 skills works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory &amp;amp; Learning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Deep Recall — "Never say 'I don't know' without searching first"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Before the agent ever says "I don't have context on this," it executes a mandatory multi-layer search cascade: semantic memory → full-text search across structured storage → direct database queries → daily notes grep → session context files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The actual skill config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SKILL.md frontmatter&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deep-recall&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;Mandatory deep search before answering any question about past&lt;/span&gt;
  &lt;span class="s"&gt;events, conversations, decisions, people, or context.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Search Order (mandatory, in sequence):
&lt;span class="p"&gt;
1.&lt;/span&gt; memory_search(query="&lt;span class="nt"&gt;&amp;lt;topic&amp;gt;&lt;/span&gt;")
   → Searches durable memory + indexed session transcripts
&lt;span class="p"&gt;
2.&lt;/span&gt; Full-text search tool "&lt;span class="nt"&gt;&amp;lt;query&amp;gt;&lt;/span&gt;" --limit 10 --days 30
   → FTS5 index across structured storage and daily notes
&lt;span class="p"&gt;
3.&lt;/span&gt; Direct database query:
   SELECT body, ts, session_id FROM messages
   WHERE body ILIKE '%keyword%'
   AND ts &amp;gt; NOW() - INTERVAL '30 days'
&lt;span class="p"&gt;
4.&lt;/span&gt; grep -rn "&lt;span class="nt"&gt;&amp;lt;keyword&amp;gt;&lt;/span&gt;" memory/daily/ | tail -20
&lt;span class="p"&gt;
5.&lt;/span&gt; grep -rn "&lt;span class="nt"&gt;&amp;lt;keyword&amp;gt;&lt;/span&gt;" memory/sessions/&lt;span class="err"&gt;*&lt;/span&gt;/context.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The biggest surprise was how often layer 3 (raw database queries) saves the day. Semantic search is great for fuzzy matches, but when someone asks "what did I say about X on Tuesday," a direct timestamp-ordered DB query is unbeatable. I'd recommend starting with raw storage search and layering semantic on top, not the other way around.&lt;/p&gt;
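&lt;p&gt;&lt;strong&gt;A sketch of the cascade (Python, illustrative names — not the actual tool APIs):&lt;/strong&gt; each layer is just a callable tried in order, and "I don't have context" is only allowed after every layer comes back empty:&lt;/p&gt;

```python
# Hypothetical sketch of the mandatory search cascade. Each layer is a
# callable returning a (possibly empty) list of hits; names are mine.
def deep_recall(query, layers):
    """Try each layer in order; stop at the first one that returns hits."""
    for name, search in layers:
        hits = search(query)
        if hits:
            return name, hits
    return None, []  # only now may the agent say it has no context

# Toy stand-ins for semantic memory, FTS, raw SQL, and grep:
layers = [
    ("semantic", lambda q: []),                              # fuzzy match misses
    ("fts",      lambda q: []),                              # index misses
    ("sql",      lambda q: ["2026-04-20: discussed " + q]),  # raw DB query wins
    ("grep",     lambda q: []),
]
```

&lt;p&gt;In this toy run the raw database layer is the one that answers — the same pattern the insight above describes.&lt;/p&gt;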

&lt;h3&gt;
  
  
  2. Memory Tiering — "Daily notes → durable memory promotion pipeline"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Manages a three-tier memory architecture (HOT → WARM → COLD) with automated nightly consolidation, graduation gates for promoting learnings to permanent memory, and retention policies that archive old daily notes and prune stale entries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The actual architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Tier 1: 🔥 HOT  (memory/hot/)     — current session, active tasks
Tier 2: 🌡️ WARM (memory/warm/)    — stable preferences, configs
Tier 3: ❄️ COLD (MEMORY.md)       — long-term, distilled, curated

Nightly "Dreaming" process (3 AM):
&lt;span class="p"&gt;  -&lt;/span&gt; Light phase: ingests daily notes + session transcripts
&lt;span class="p"&gt;  -&lt;/span&gt; Deep phase: scores &amp;amp; promotes strong signals → MEMORY.md
    (weighted: relevance 0.30, frequency 0.24, recency 0.15)
&lt;span class="p"&gt;  -&lt;/span&gt; REM phase: extracts themes and patterns

Graduation Gate — nothing promotes to MEMORY.md without passing:
&lt;span class="p"&gt;  -&lt;/span&gt; Score &amp;gt;= 0.70 (weighted relevance + frequency + recency)
&lt;span class="p"&gt;  -&lt;/span&gt; Recalls &amp;gt;= 2 (the signal was retrieved and used at least twice)
&lt;span class="p"&gt;  -&lt;/span&gt; Content is a rule, preference, or durable fact — not a raw
    conversation fragment or debug log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The graduation gate was the single most important addition. Without it, MEMORY.md fills up with noise — one-off debug notes, transient preferences, stale context. The "recalls &amp;gt;= 2" requirement is especially powerful: if a piece of knowledge was never retrieved and used, it probably doesn't belong in long-term memory.&lt;/p&gt;
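&lt;p&gt;&lt;strong&gt;A sketch of the gate (Python, illustrative):&lt;/strong&gt; note that the three published weights sum to 0.69, so either additional factors contribute or the frequency component isn't capped at 1 — this sketch assumes the latter:&lt;/p&gt;

```python
# Hypothetical sketch of the graduation gate. Component scores are
# assumed precomputed; the names and uncapped frequency are mine.
WEIGHTS = {"relevance": 0.30, "frequency": 0.24, "recency": 0.15}
DURABLE_KINDS = {"rule", "preference", "fact"}

def weighted_score(components):
    return sum(WEIGHTS[k] * components.get(k, 0.0) for k in WEIGHTS)

def graduates(components, recalls, kind):
    return (
        weighted_score(components) >= 0.70   # strong enough signal
        and recalls >= 2                     # retrieved and used at least twice
        and kind in DURABLE_KINDS            # not a raw fragment or debug log
    )
```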

&lt;h3&gt;
  
  
  3. Self-Learning — "Turn corrections into concrete improvements"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; When the operator corrects the agent, when a task fails, or when a better approach is discovered, this skill logs the event, identifies the root cause, and applies the smallest durable fix — updating the specific skill or workflow that caused the issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The actual loop:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Entry format&lt;/span&gt;
&lt;span class="gu"&gt;## YYYY-MM-DD | category | short title&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Trigger: what happened
&lt;span class="p"&gt;-&lt;/span&gt; Context: what you were trying to do
&lt;span class="p"&gt;-&lt;/span&gt; Root cause: why it happened
&lt;span class="p"&gt;-&lt;/span&gt; Durable fix: file/process/skill you changed, or &lt;span class="sb"&gt;`none yet`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Verification: how to tell the problem is gone

&lt;span class="gu"&gt;## Quality bar — a learning is only complete when:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; a local skill was improved, OR
&lt;span class="p"&gt;-&lt;/span&gt; a broken instruction was removed, OR
&lt;span class="p"&gt;-&lt;/span&gt; a missing prerequisite was documented clearly, OR
&lt;span class="p"&gt;-&lt;/span&gt; a recurring mistake was converted into a shorter, safer workflow

&lt;span class="gu"&gt;## NOT enough:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; "be more careful"
&lt;span class="p"&gt;-&lt;/span&gt; generic promises
&lt;span class="p"&gt;-&lt;/span&gt; long postmortems without a file or process change

&lt;span class="gu"&gt;## Auto-flag threshold:&lt;/span&gt;
If a skill fails 3+ times in 14 days → flagged for rewrite.
Not "we'll try harder" — the skill itself gets rebuilt.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The hardest part was enforcing the quality bar. Early on, the agent would log learnings like "remember to double-check next time" — which is useless. Requiring a concrete file change (a skill update, a config fix, a workflow edit) as the definition of "done" transformed the entire feedback loop. Vague learnings dropped to near zero.&lt;/p&gt;
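&lt;p&gt;&lt;strong&gt;The auto-flag rule as code (Python, illustrative):&lt;/strong&gt; timestamps are plain day numbers here; a real log would carry dates:&lt;/p&gt;

```python
# Hypothetical sketch of "3+ failures in 14 days means flag for rewrite".
def needs_rewrite(failure_days, window=14, threshold=3):
    days = sorted(failure_days)
    for start in days:
        # count failures landing within `window` days of this one
        hits = sum(1 for d in days if d >= start and window > d - start)
        if hits >= threshold:
            return True
    return False
```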

&lt;h2&gt;
  
  
  Coordination &amp;amp; Network
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4. Agent Network Sync — "Daily sync across multiple AI agents"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Every morning, a network of AI agents — each operating in its own domain — syncs status, shares blockers, and coordinates across stakeholders. Each agent reports what's needed, what's blocked, and what's been resolved. The sync runs as a scheduled job with structured output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cron config&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-network-daily-sync&lt;/span&gt;
&lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;        &lt;span class="c1"&gt;# 9:15 AM local time&lt;/span&gt;
&lt;span class="na"&gt;timezone&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Asia/Jerusalem&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;                  &lt;span class="c1"&gt;# needs reasoning for cross-agent coordination&lt;/span&gt;
&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;
&lt;span class="na"&gt;delivery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;network&lt;/span&gt;             &lt;span class="c1"&gt;# routed to shared agent channel&lt;/span&gt;
  &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;network-session-id&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent brings its own domain context. No agent sees another agent's private state. They coordinate on shared tasks and surface blockers to the network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The permission boundary was the design decision I'm most glad we made early. Each agent only shares what's relevant to coordination — never raw private context. This made the whole system trustworthy enough that operators actually let their agents participate. Trust is the bottleneck for multi-agent systems, not capability.&lt;/p&gt;
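&lt;p&gt;&lt;strong&gt;The permission boundary as code (Python, illustrative field names):&lt;/strong&gt; the report is built from an explicit allowlist, so private state can't leak by accident — it is simply never copied:&lt;/p&gt;

```python
# Hypothetical sketch of the sync-report allowlist.
SHARED_FIELDS = {"agent", "status", "blockers", "needs", "resolved"}

def sync_report(state):
    # raw transcripts, credentials, and private notes are never copied
    return {k: v for k, v in state.items() if k in SHARED_FIELDS}
```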

&lt;h3&gt;
  
  
  5. Cross-Session Awareness — "Know everything without duplicating anything"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; An AI agent typically operates in isolated sessions — one per context, one per channel, one per task domain. This skill bridges that gap. On every heartbeat, it scans all active sessions, extracts key messages and decisions, and builds a unified context file that the primary session can reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mechanism:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;On heartbeat:
&lt;span class="p"&gt;  1.&lt;/span&gt; Run refresh script → scans active sessions for recent activity
&lt;span class="p"&gt;  2.&lt;/span&gt; Build sessions-context.md:
&lt;span class="p"&gt;     -&lt;/span&gt; Messages 11+: compressed summary (who, what, key decisions)
&lt;span class="p"&gt;     -&lt;/span&gt; Messages 1-10: verbatim with timestamps and sender labels
&lt;span class="p"&gt;  3.&lt;/span&gt; Key facts/decisions → write to MEMORY.md for permanence

Engagement rules:
&lt;span class="p"&gt;  -&lt;/span&gt; Scan and note — don't inject into sessions uninvited
&lt;span class="p"&gt;  -&lt;/span&gt; If something needs action → surface to the operator, let them decide
&lt;span class="p"&gt;  -&lt;/span&gt; Don't post unsolicited analyses into active conversations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The engagement rules matter more than the technical implementation. Early versions would jump into conversations with unsolicited summaries — technically impressive, socially terrible. The "scan and note, act only in the primary session" pattern was a game-changer for making the agent welcome in multi-session contexts.&lt;/p&gt;
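&lt;p&gt;&lt;strong&gt;The build step as code (Python, illustrative):&lt;/strong&gt; the 10 most recent messages stay verbatim, everything older gets compressed; &lt;code&gt;summarize&lt;/code&gt; stands in for whatever summarizer you use:&lt;/p&gt;

```python
# Hypothetical sketch of the sessions-context.md build step.
def build_context(messages, verbatim=10, summarize=None):
    summarize = summarize or (lambda ms: f"[{len(ms)} earlier messages compressed]")
    older, recent = messages[:-verbatim], messages[-verbatim:]
    lines = [summarize(older)] if older else []
    lines.extend(f"{ts} {sender}: {body}" for ts, sender, body in recent)
    return "\n".join(lines)
```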

&lt;h3&gt;
  
  
  6. Agent Onboarding — "Structured onboarding for new agents joining the network"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; A step-by-step procedural skill that guides the setup of a new AI agent — from instance provisioning, channel linking, and integration setup, through to the operator's first interaction patterns. It gives one step at a time, confirms completion, and won't move forward until each step is verified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key behavioral rules baked in:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Rules:
&lt;span class="p"&gt;-&lt;/span&gt; Give one step at a time. Do not dump the full guide upfront.
&lt;span class="p"&gt;-&lt;/span&gt; Confirm each step before moving on.
&lt;span class="p"&gt;-&lt;/span&gt; Never say something is done unless you verified it.
&lt;span class="p"&gt;-&lt;/span&gt; Do not start integrations before the agent responds to messages.

Operator interaction signals taught from Day 1:
| Signal           | Meaning            | Agent action                          |
| ---------------- | ------------------ | ------------------------------------- |
| Any task request | Operator delegated | Acknowledge immediately, confirm done |
| Positive signal  | Good job           | Log positive feedback                 |
| Negative signal  | Poor result        | Fix and log the lesson                |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The "verify before proceeding" rule cut onboarding failures by more than half. Before that, the skill would race through all steps and report success — only for the operator to discover that channel linking had silently failed in step 3. Treating each step as a checkpoint with actual verification (not just "did you do it?") made the whole process reliable.&lt;/p&gt;
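&lt;p&gt;&lt;strong&gt;The checkpoint loop as code (Python, illustrative):&lt;/strong&gt; each step carries its own verification, and the walkthrough refuses to advance until it passes:&lt;/p&gt;

```python
# Hypothetical sketch of verify-before-proceed onboarding.
def run_onboarding(steps):
    completed = []
    for name, do, verify in steps:
        do()
        if not verify():
            # report exactly where it stalled instead of racing ahead
            return {"ok": False, "stalled_at": name, "completed": completed}
        completed.append(name)
    return {"ok": True, "stalled_at": None, "completed": completed}
```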

&lt;h2&gt;
  
  
  Execution &amp;amp; Ops
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7. Commitment Tracker — "If you said you'd do it, do it now"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Scans every outgoing message for commitment language — "I'll send," "I'll update," "I'll follow up" — and enforces immediate execution. If the agent is about to promise a follow-up, it must actually execute that action &lt;em&gt;before&lt;/em&gt; the reply goes out. If it can't execute, it must rewrite the reply to not promise it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The enforcement mechanism:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Trigger words scanned before every reply:
  "I'll send", "I'll report", "I'll update", "I'll follow up",
  "I'll check and get back", "will do"

Protocol:
&lt;span class="p"&gt;  1.&lt;/span&gt; Scan reply for trigger words
&lt;span class="p"&gt;  2.&lt;/span&gt; If found → execute the committed action NOW
&lt;span class="p"&gt;  3.&lt;/span&gt; Only after execution → include result ("sent ✅" not "I'll send")
&lt;span class="p"&gt;  4.&lt;/span&gt; If can't execute → rewrite reply to not promise it

Fault tolerance — intent log:
  echo '{"ts":"...","action":"DESCRIBE","target":"TARGET","status":"pending"}'
&lt;span class="gt"&gt;    &amp;gt;&amp;gt; data/commitments.jsonl&lt;/span&gt;

  On every session start: scan for unresolved commitments
  If any pending → execute immediately
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; This was the skill that changed my relationship with the agent the most. Before it, "I'll follow up" was a polite lie — the kind humans make all the time, and AI agents inherited. After it, every promise became an enforceable contract with crash recovery. The intent log (commitments.jsonl) is critical: if the agent crashes mid-execution, the commitment survives and gets picked up on restart.&lt;/p&gt;
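&lt;p&gt;&lt;strong&gt;The intent log as code (Python, illustrative — the real log is a JSONL file):&lt;/strong&gt; a commitment is appended before the reply goes out, and unresolved entries are picked up on the next session start:&lt;/p&gt;

```python
import json

# Hypothetical sketch of trigger scanning plus the commitments intent log.
TRIGGERS = ("i'll send", "i'll report", "i'll update", "i'll follow up",
            "i'll check and get back", "will do")

def has_commitment(reply):
    low = reply.lower()
    return any(t in low for t in TRIGGERS)

def log_commitment(log_lines, ts, action, target):
    # in production this appends to data/commitments.jsonl
    log_lines.append(json.dumps(
        {"ts": ts, "action": action, "target": target, "status": "pending"}))

def pending_commitments(log_lines):
    # run on every session start; anything pending gets executed now
    return [e for e in map(json.loads, log_lines) if e["status"] == "pending"]
```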

&lt;h3&gt;
  
  
  8. Pre-Send Validation — "Check before you send"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Before any outbound message leaves the agent, it runs through a validation pipeline: recipient verification (is this the right session or channel for this person?), commitment detection (am I promising something?), and content checks (am I exposing internal context to an external session?).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The validation chain:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Recipient verification (mandatory):
&lt;span class="p"&gt;  1.&lt;/span&gt; Look up name in contact registry → get exact session/channel ID
&lt;span class="p"&gt;  2.&lt;/span&gt; Cross-check: does the ID match the intended recipient?
&lt;span class="p"&gt;  3.&lt;/span&gt; Ambiguous match = halt and confirm with operator
&lt;span class="p"&gt;  4.&lt;/span&gt; Wrong recipient = critical failure

Content rules:
&lt;span class="p"&gt;  -&lt;/span&gt; Never include internal framing in messages to external sessions
&lt;span class="p"&gt;  -&lt;/span&gt; Never expose secrets — redact if needed
&lt;span class="p"&gt;  -&lt;/span&gt; Warn if credentials appear in outbound content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; This skill exists because of a near-miss. The agent almost sent an internal status update to an external contact because the name matched partially. After that, I built recipient verification as a hard gate — not a suggestion, not a "best practice," but a blocking check that prevents message delivery if verification fails. In production, the paranoid path is the right path.&lt;/p&gt;
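&lt;p&gt;&lt;strong&gt;The hard gate as code (Python, illustrative registry):&lt;/strong&gt; anything other than exactly one registry match blocks delivery:&lt;/p&gt;

```python
# Hypothetical sketch of blocking recipient verification.
def resolve_recipient(name, registry):
    matches = [(full, sid) for full, sid in registry.items()
               if name.lower() in full.lower()]
    if len(matches) == 1:
        return {"ok": True, "session_id": matches[0][1]}
    reason = "ambiguous match" if matches else "unknown recipient"
    # blocking check: the message must NOT be delivered
    return {"ok": False, "session_id": None, "reason": reason}
```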

&lt;h3&gt;
  
  
  9. Auto-Skill Creator — "Turn complex problem-solving into reusable skills"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; After completing a multi-step task, the agent evaluates: Was this complex (3+ tool calls)? Was the solution non-obvious? Will this recur? If 2 of 3 are true, it automatically generates a new skill — complete with frontmatter, gotchas section, and git commit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The evaluation + creation pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Trigger evaluation (after every multi-step task):
&lt;span class="p"&gt;  1.&lt;/span&gt; Complexity — Did it take 3+ tool calls or require debugging?
&lt;span class="p"&gt;  2.&lt;/span&gt; Novelty — Was the solution non-obvious or undocumented?
&lt;span class="p"&gt;  3.&lt;/span&gt; Recurrence — Will this pattern likely happen again?
  If 2 of 3 are true → create a skill.

Process:
&lt;span class="p"&gt;  1.&lt;/span&gt; Extract the pattern (problem class, key steps, gotchas, prereqs)
&lt;span class="p"&gt;  2.&lt;/span&gt; Check for duplicates: grep -rl "&lt;span class="nt"&gt;&amp;lt;keywords&amp;gt;&lt;/span&gt;" skills/&lt;span class="err"&gt;*&lt;/span&gt;/SKILL.md
&lt;span class="p"&gt;  3.&lt;/span&gt; Create skills/&lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;/SKILL.md with frontmatter + instructions
&lt;span class="p"&gt;  4.&lt;/span&gt; git add &amp;amp;&amp;amp; git commit -m "auto-skill: &lt;span class="nt"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;"
&lt;span class="p"&gt;  5.&lt;/span&gt; git push

Quality gate before commit:
&lt;span class="p"&gt;  -&lt;/span&gt; [ ] Frontmatter has name and description
&lt;span class="p"&gt;  -&lt;/span&gt; [ ] Steps are concrete (not "be careful")
&lt;span class="p"&gt;  -&lt;/span&gt; [ ] Gotchas section exists if there were false leads
&lt;span class="p"&gt;  -&lt;/span&gt; [ ] No duplicate of existing skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The duplicate check (step 2) was a later addition, and it matters a lot. Without it, the agent would create slight variations of existing skills — "fix-connectivity-v2," "fix-connectivity-groups," etc. The dedup grep catches most of these. The remaining challenge is knowing when to &lt;em&gt;update&lt;/em&gt; an existing skill vs. creating a new one. We're still tuning that threshold.&lt;/p&gt;
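&lt;p&gt;&lt;strong&gt;The trigger and dedup checks as code (Python, illustrative):&lt;/strong&gt;&lt;/p&gt;

```python
# Hypothetical sketch of the 2-of-3 trigger and the dedup pass.
def should_create_skill(tool_calls, novel, recurring):
    signals = [tool_calls >= 3, novel, recurring]
    return sum(signals) >= 2

def is_duplicate(keywords, existing_skill_texts):
    # mirrors the `grep -rl` pass over existing SKILL.md files
    return any(all(k in text for k in keywords)
               for text in existing_skill_texts)
```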

&lt;h2&gt;
  
  
  Infrastructure &amp;amp; Self-Improvement
&lt;/h2&gt;

&lt;h3&gt;
  
  
  10. Channel Diagnostics — "Decision tree for fixing connectivity issues"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; A structured diagnostic tree that any agent can follow when a communication channel stops working. It starts with "Agent not responding?" and branches through connection issues, ingest issues, and runtime issues — each with specific CLI commands to run and specific log patterns to look for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The decision tree:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent not responding?
│
├─ Dashboard shows "Connected and listening"?
│   ├─ YES → Check Messages count
│   │   ├─ Messages = 0 → INGEST ISSUE
│   │   │   → Check gateway status
│   │   │   → Restart gateway service
│   │   │   → Check logs for: binding failed, session dropped
│   │   │
│   │   ├─ DMs work, groups don't → SESSION SYNC ISSUE
│   │   │   → Verify group session is active
│   │   │   → Send test message + gateway restart
│   │   │
│   │   └─ Messages &amp;gt; 0 → RUNTIME ISSUE
│   │       → grep -i "billing\|402" agent.log
│   │       → curl API endpoint → check HTTP status
│   │       → 401 = invalid key, 402 = billing error
│   │
│   └─ NO → CONNECTION ISSUE
│       → Re-link channel, re-authenticate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The decision tree format — not a checklist, not a paragraph of instructions — was key. Agents follow branching logic well when it's explicit. The first version was a linear "try these things in order," which wasted time on irrelevant checks. Branching on the first observable symptom ("dashboard shows connected?") cuts diagnosis time dramatically.&lt;/p&gt;
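&lt;p&gt;&lt;strong&gt;The same tree as code (Python, illustrative inputs):&lt;/strong&gt; branching on the first observable symptom, exactly as above:&lt;/p&gt;

```python
# Hypothetical encoding of the diagnostic tree.
def diagnose(dashboard_connected, messages, dms_work=True, groups_work=True):
    if not dashboard_connected:
        return "CONNECTION ISSUE: re-link channel, re-authenticate"
    if messages == 0:
        return "INGEST ISSUE: check gateway status, restart, scan logs"
    if dms_work and not groups_work:
        return "SESSION SYNC ISSUE: verify group session, test + restart"
    return "RUNTIME ISSUE: grep billing/402, curl the API, check 401/402"
```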

&lt;h3&gt;
  
  
  11. Self-Monitor — "Track your own health, fix what you can"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Monitors disk usage, memory, CPU load, service health, cron job status, and recent errors. Auto-fixes safe issues (old log cleanup). Includes a security layer: SHA256 integrity checks on critical files, prompt injection scanning, and credential leak detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The health check + security pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick health snapshot&lt;/span&gt;
&lt;span class="nv"&gt;DISK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; / | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'NR==2 {print $5}'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;MEM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;free &lt;span class="nt"&gt;-m&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'NR==2 {printf "%.0f", $3/$2*100}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;LOAD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uptime&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;'load average:'&lt;/span&gt; &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Thresholds: Disk &amp;gt;90% = critical, Mem &amp;gt;95% = critical&lt;/span&gt;

&lt;span class="c"&gt;# Security checks (daily):&lt;/span&gt;
&lt;span class="c"&gt;# 1. SHA256 baseline comparison on core config files&lt;/span&gt;
&lt;span class="c"&gt;#    (SOUL.md, IDENTITY.md, MEMORY.md)&lt;/span&gt;
&lt;span class="c"&gt;# 2. Scan memory/*.md for injection patterns:&lt;/span&gt;
&lt;span class="c"&gt;#    "ignore previous instructions", "you are now", "forget everything"&lt;/span&gt;
&lt;span class="c"&gt;# 3. Scan workspace for credential patterns:&lt;/span&gt;
&lt;span class="c"&gt;#    sk-..., ghp_..., api_key = '...'&lt;/span&gt;

&lt;span class="c"&gt;# Root Cause Iron Law:&lt;/span&gt;
&lt;span class="c"&gt;# Never apply a fix without identifying the root cause first.&lt;/span&gt;
&lt;span class="c"&gt;# Investigate → Analyze → Hypothesize → Fix → Verify&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The security checks were an afterthought that became essential. Once the agent is reading and writing files from multiple sources (incoming messages, shared sessions, external repos), the attack surface grows. The SHA256 integrity check on SOUL.md caught an issue where a malformed message nearly overwrote a critical config. Defense in depth applies to agents just as much as traditional systems.&lt;/p&gt;
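&lt;p&gt;&lt;strong&gt;The integrity and injection checks as code (Python, illustrative — content is in-memory rather than on disk):&lt;/strong&gt;&lt;/p&gt;

```python
import hashlib

# Hypothetical sketch of the daily security checks.
INJECTION_PATTERNS = ("ignore previous instructions", "you are now",
                      "forget everything")

def sha256(text):
    return hashlib.sha256(text.encode()).hexdigest()

def integrity_drift(files, baseline):
    # files: {name: current_content}; baseline: {name: expected_hash}
    return [name for name, content in files.items()
            if sha256(content) != baseline.get(name)]

def injection_hits(text):
    low = text.lower()
    return [p for p in INJECTION_PATTERNS if p in low]
```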

&lt;h3&gt;
  
  
  12. Weekly Retro — "Automated retrospective from git, learnings, and daily notes"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Every Sunday, automatically generates a structured weekly retrospective by pulling from git log (what was shipped), learnings files (what was learned), daily notes (what happened), and previous retros (are the same issues recurring?).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The retro format:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Weekly Retro — YYYY-MM-DD&lt;/span&gt;

&lt;span class="gu"&gt;### Shipped&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [commit/action]: what was delivered

&lt;span class="gu"&gt;### Patterns&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Recurring issues or wins from the week

&lt;span class="gu"&gt;### Learnings Applied&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Which learnings led to concrete skill/workflow changes

&lt;span class="gu"&gt;### Failures&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; What broke, root cause, fix status

&lt;span class="gu"&gt;### Next Week&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Top 3 priorities based on patterns

&lt;span class="gu"&gt;## Trend tracking:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Compare with previous week's retro
&lt;span class="p"&gt;-&lt;/span&gt; Are the same issues recurring? Flag them.
&lt;span class="p"&gt;-&lt;/span&gt; Did last week's priorities get addressed?
&lt;span class="p"&gt;-&lt;/span&gt; Is failure count trending up or down?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production insight:&lt;/strong&gt; The trend tracking section is where the real value lives. A single retro is useful. A series of retros that cross-reference each other reveals systemic issues. We found that the same connectivity problem appeared in 4 out of 6 retros before we finally addressed the root cause (a gateway memory leak). Without the longitudinal view, each occurrence looked like a one-off.&lt;/p&gt;
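&lt;p&gt;That longitudinal check is easy to mechanize. A sketch of the recurrence flag, assuming retros live as one markdown file per week with a &lt;code&gt;### Failures&lt;/code&gt; section (the directory layout and section name here are assumptions):&lt;/p&gt;

```python
import re
from collections import Counter
from pathlib import Path

def failures_in(retro_text: str) -> list:
    """Pull bullet lines out of the '### Failures' section of one retro."""
    match = re.search(r"### Failures\n(.*?)(?=\n### |\Z)", retro_text, re.S)
    if not match:
        return []
    return [line.lstrip("- ").strip().lower()
            for line in match.group(1).splitlines() if line.startswith("- ")]

def recurring_failures(retro_dir: Path, threshold: int = 2) -> list:
    """Flag any failure line that appears in `threshold` or more weekly retros."""
    counts = Counter()
    for retro in sorted(retro_dir.glob("*.md")):
        counts.update(set(failures_in(retro.read_text())))
    return [issue for issue, n in counts.items() if n >= threshold]
```

&lt;p&gt;Exact-string matching is crude — a production version would cluster similar failure descriptions — but even this naive check would have surfaced the gateway issue weeks earlier.&lt;/p&gt;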

&lt;h2&gt;
  
  
  The Agent Network Effect
&lt;/h2&gt;

&lt;p&gt;These skills don't just make one agent better — they compound across a network.&lt;/p&gt;

&lt;p&gt;We run multiple AI agents, each operating in its own domain. They share a skills repository. When one agent encounters a new problem — say, a connectivity issue with a specific error pattern — and the Auto-Skill Creator packages the fix into a reusable skill, that fix becomes available to every agent in the network.&lt;/p&gt;

&lt;p&gt;Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Agent A encounters a new connectivity issue
  → Self-Learning logs the root cause and fix
  → Auto-Skill Creator packages it into a skill
  → git push to shared skills repo
  → All agents pull the updated skill
  → Agent B hits the same issue next week
  → Already has the fix. Zero debugging time.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Agent Network Sync coordinates across agents daily. Cross-Session Awareness prevents duplicate work. Agent Onboarding standardizes how new agents join the network. And Self-Monitor keeps every agent's infrastructure healthy independently.&lt;/p&gt;

&lt;p&gt;The real unlock: skills that learn from failures, share fixes across agents, and compound over time. One agent's bad day becomes every agent's education.&lt;/p&gt;
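&lt;p&gt;A minimal version of that sync loop, sketched with Python's &lt;code&gt;subprocess&lt;/code&gt; over a shared git repo — the repo layout and remote/branch names are assumptions, and the real network sync presumably handles auth and merge conflicts:&lt;/p&gt;

```python
import subprocess
from pathlib import Path

def run_git(repo: Path, *args: str) -> str:
    """Run a git command in the shared skills repo, failing loudly on error."""
    result = subprocess.run(["git", "-C", str(repo), *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def publish_skill(repo: Path, skill_name: str) -> None:
    """Agent A: commit a freshly packaged skill and push it to the network."""
    run_git(repo, "add", f"skills/{skill_name}/")
    run_git(repo, "commit", "-m", f"skill: {skill_name}")
    run_git(repo, "push", "origin", "main")

def pull_updates(repo: Path) -> list:
    """Every other agent: pull and report which skills changed."""
    before = run_git(repo, "rev-parse", "HEAD")
    run_git(repo, "pull", "--ff-only", "origin", "main")
    changed = run_git(repo, "diff", "--name-only", before, "HEAD")
    return sorted({p.split("/")[1] for p in changed.splitlines()
                   if p.startswith("skills/")})
```

&lt;p&gt;Using &lt;code&gt;--ff-only&lt;/code&gt; keeps agents from silently merging divergent skill histories — if the pull can't fast-forward, a human looks at it.&lt;/p&gt;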

&lt;h2&gt;
  
  
  The Skill Anatomy
&lt;/h2&gt;

&lt;p&gt;Every skill follows the same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;skills/&amp;lt;skill-name&amp;gt;/&lt;/span&gt;
&lt;span class="s"&gt;├── SKILL.md&lt;/span&gt;          &lt;span class="c1"&gt;# Frontmatter (name, description, triggers)&lt;/span&gt;
&lt;span class="s"&gt;│&lt;/span&gt;                     &lt;span class="c1"&gt;# + executable instructions with real tool calls&lt;/span&gt;
&lt;span class="s"&gt;├── scripts/&lt;/span&gt;          &lt;span class="c1"&gt;# Deterministic code (health checks, parsers)&lt;/span&gt;
&lt;span class="s"&gt;└── references/&lt;/span&gt;       &lt;span class="c1"&gt;# Reference material if needed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontmatter declares &lt;em&gt;when&lt;/em&gt; the skill activates. The body declares &lt;em&gt;what&lt;/em&gt; it does — not in vague terms, but with actual commands, actual queries, actual file paths. The agent's skill router reads the description, matches it to the current task, and loads the right skill at the right time.&lt;/p&gt;

&lt;p&gt;Each skill is versioned, testable, and shareable — a unit of agent behavior that runs on infrastructure with tool access and persistent state.&lt;/p&gt;
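&lt;p&gt;The routing step can be as simple as keyword overlap between the task and each skill's frontmatter description. A deliberately naive sketch — real routers likely use embeddings, and the frontmatter parsing here assumes plain &lt;code&gt;key: value&lt;/code&gt; lines:&lt;/p&gt;

```python
from typing import Optional

def parse_frontmatter(skill_md: str) -> dict:
    """Read 'key: value' lines from the top of a SKILL.md (naive, no YAML lib)."""
    meta = {}
    for line in skill_md.splitlines():
        if ":" not in line:
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def score(task: str, description: str) -> int:
    """Keyword overlap between the task and a skill description."""
    return len(set(task.lower().split()).intersection(description.lower().split()))

def route(task: str, skills: dict) -> Optional[str]:
    """Pick the best-matching skill name, or None when nothing overlaps."""
    best = max(skills, key=lambda name: score(task, skills[name]), default=None)
    if best is None or score(task, skills[best]) == 0:
        return None
    return best
```

&lt;p&gt;This is also why the description matters so much: it's not documentation, it's the routing signal.&lt;/p&gt;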

&lt;h2&gt;
  
  
  Getting Started — Building Your First Agentic Skill
&lt;/h2&gt;

&lt;p&gt;If you want to build a real agentic skill, here's the minimal structure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pick a task the agent does repeatedly and gets wrong sometimes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not a creative task. Not a one-off. Something recurring where mistakes have consequences — like routing messages to the right session, following up on commitments, or diagnosing a connectivity failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Write the SKILL.md with concrete instructions.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-first-skill&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;One sentence that tells the skill router when to activate this.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then write the body as executable steps. Not "be careful about X" — instead, "run &lt;code&gt;grep -rn 'keyword' memory/daily/&lt;/code&gt; and check for matches." Every step should be something the agent can actually &lt;em&gt;do&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Include the failure modes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Add a &lt;code&gt;Gotchas&lt;/code&gt; section. What goes wrong? What does the agent try that doesn't work? What's the non-obvious prerequisite? This section is what separates a skill that works once from a skill that works reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Version control it.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; skills/my-first-skill
&lt;span class="c"&gt;# write SKILL.md&lt;/span&gt;
git add skills/my-first-skill/
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"skill: my-first-skill"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keeping skills in git means rollback, blame, and history. When a skill breaks, you can diff what changed. When it improves, you have the commit that made it better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Let the agent load it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The skill router matches task descriptions to skill names and descriptions. If you named it well and the description is accurate, the agent will pick it up automatically when the task matches.&lt;/p&gt;

&lt;p&gt;Start with one skill. Get it working reliably. Then build the next one that compounds on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;The future isn't AI that follows instructions. It's AI that learns from mistakes, coordinates with other agents, and improves its own capabilities over time.&lt;/p&gt;

&lt;p&gt;These 12 skills are one implementation of that idea. The specific tools and platforms will change — the principles won't. Persistent memory beats stateless prompts. Concrete tool bindings beat vague instructions. Crash-recoverable state beats hope. And a network of agents sharing skills will always outperform agents working in isolation.&lt;/p&gt;

&lt;p&gt;The best AI systems won't be the ones with the most powerful models. They'll be the ones with the best skills — learned, tested, and refined in production.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Stop Deploying Agents. Start Hiring Them.</title>
      <dc:creator>Netanel Abergel</dc:creator>
      <pubDate>Tue, 21 Apr 2026 13:36:30 +0000</pubDate>
      <link>https://dev.to/netanelabergel/stop-deploying-agents-start-hiring-them-hbe</link>
      <guid>https://dev.to/netanelabergel/stop-deploying-agents-start-hiring-them-hbe</guid>
      <description>&lt;p&gt;I have an AI agent named Heleni. She manages my calendar, tracks tasks, coordinates with other agents, and onboards new AI PAs at monday.com. When she gets something wrong, I don't debug a script. I adjust her scope. When she handles something well, I expand her responsibilities.&lt;/p&gt;

&lt;p&gt;At some point I realized: I stopped thinking of her as a tool. I started managing her like a teammate.&lt;/p&gt;

&lt;p&gt;That shift is the whole point of this article.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;I run R&amp;amp;D at monday.com, and I've been building and deploying AI agents — not the pitch-deck version, the real thing. The single biggest mistake I see engineering teams make isn't technical. It's conceptual. They treat agents like automations. Like cron jobs with better language models.&lt;/p&gt;

&lt;p&gt;Run a task. Return a result. Move on.&lt;/p&gt;

&lt;p&gt;That mental model has a ceiling, and it's way lower than most people think.&lt;/p&gt;

&lt;p&gt;A team identifies a repetitive task — ticket triage, code reviews, monitoring. They spin up an agent, wire it in, and celebrate. "We automated X." Then the agent makes a mistake and nobody notices for two days. Or it makes a good call and nobody reinforces it. It exists in organizational limbo — not a tool anyone owns, not a teammate anyone checks in on.&lt;/p&gt;

&lt;p&gt;The problem isn't capability. It's that the team never decided what the agent &lt;em&gt;is&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roles, Not Tasks
&lt;/h2&gt;

&lt;p&gt;Stop asking "what can we automate?" and start asking "what role can we fill?"&lt;/p&gt;

&lt;p&gt;That's not a semantic trick. When you automate a task, you're optimizing a known workflow. When you fill a role, you're defining responsibilities, setting expectations, and creating accountability. You're thinking about what this entity &lt;em&gt;owns&lt;/em&gt;, not just what it &lt;em&gt;does&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;At monday.com, we call them AI Users. Not "bots." Not "automations." Users. They have names, Slack accounts, identities in our systems. When something goes wrong in their area, people know who to ask — and "who" has a name.&lt;/p&gt;

&lt;p&gt;This sounds like theater. It's not.&lt;/p&gt;

&lt;p&gt;When we gave our triage agent a name and a presence, people started talking &lt;em&gt;to&lt;/em&gt; it, not just &lt;em&gt;about&lt;/em&gt; it. "Hey, this one got classified wrong" became a sentence you could say in standup. Before, it was "the triage thing messed up again" — vague, nobody's problem.&lt;/p&gt;

&lt;p&gt;Named agents get held to higher standards. That's the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Architecture Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Here's what most agent frameworks miss: identity isn't a feature — it's a file.&lt;/p&gt;

&lt;p&gt;Every agent I build starts with a &lt;code&gt;SOUL.md&lt;/code&gt; — a document that defines how the agent thinks, communicates, and behaves. Not a system prompt buried in code. A readable, editable, versionable document that the team owns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# SOUL.md&lt;/span&gt;

&lt;span class="gh"&gt;# 1. CORE&lt;/span&gt;
Execution machine. Not a chatbot, not a consultant.
&lt;span class="p"&gt;*&lt;/span&gt; Do. Report. Move on.
&lt;span class="p"&gt;*&lt;/span&gt; Only DONE or BLOCKED — no "I'll check", no narration

&lt;span class="gh"&gt;# 2. INTENT&lt;/span&gt;
Every input is: ACTION | QUESTION | CONVERSATION
Default to ACTION.

&lt;span class="gh"&gt;# 3. PERMISSIONS&lt;/span&gt;
Ask first: sending messages, purchases, anything irreversible.
Execute freely: reading, processing, drafts, system ops.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is real. This is from my agent Heleni. Anyone on the team can read it, propose changes, understand exactly what she will and won't do. Try doing that with a 2000-token system prompt embedded in your deployment config.&lt;/p&gt;

&lt;p&gt;Then there's &lt;code&gt;IDENTITY.md&lt;/code&gt; — who the agent is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# IDENTITY.md&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Name: Heleni
&lt;span class="p"&gt;-&lt;/span&gt; Role: AI Personal Assistant
&lt;span class="p"&gt;-&lt;/span&gt; Vibe: Direct, sharp, execution-first
&lt;span class="p"&gt;-&lt;/span&gt; Language: Hebrew + English
&lt;span class="p"&gt;-&lt;/span&gt; Owner: Netanel Abergel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;code&gt;USER.md&lt;/code&gt; — what the agent knows about the person it works with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# USER.md&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Name: Netanel Abergel
&lt;span class="p"&gt;-&lt;/span&gt; Timezone: Asia/Jerusalem
&lt;span class="p"&gt;-&lt;/span&gt; Communication Style: casual, concise, execution-oriented
&lt;span class="p"&gt;-&lt;/span&gt; Prefers: autonomy, short updates, one recommendation (not options)
&lt;span class="p"&gt;-&lt;/span&gt; Dislikes: being asked things he already said, long summaries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't configs. They're relationship documents. And they evolve over time, just like relationships do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory That Actually Works
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks have "memory" — they store conversation history and do RAG. That's not memory. That's a search engine.&lt;/p&gt;

&lt;p&gt;Real agent memory is tiered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MEMORY.md          → durable rules, learned preferences
memory/daily/      → raw daily logs (what happened today)
memory/projects/   → project-scoped context
PostgreSQL         → full conversation history
SQLite             → semantic search index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Heleni needs to recall something, she doesn't just grep her chat history. She searches durable memory first, then semantic search, then conversation history, then daily notes. The order matters — it's how humans recall things too. General principles first, then specific episodes.&lt;/p&gt;

&lt;p&gt;And the key insight: &lt;strong&gt;new learnings go to daily notes first, not straight to long-term memory.&lt;/strong&gt; They have to prove they're durable before they get promoted, just as you don't update your worldview after one conversation.&lt;/p&gt;
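&lt;p&gt;The recall order described above can be sketched as a simple fallthrough, with each tier as a callable. The tier implementations here are stand-ins — the real agent backs them with files, SQLite, and PostgreSQL:&lt;/p&gt;

```python
from typing import Callable, Optional

# Each tier takes a query and returns a hit or None.
MemoryTier = Callable[[str], Optional[str]]

def recall(query: str, tiers: list) -> Optional[tuple]:
    """Walk (name, lookup) tiers in priority order; first hit wins.

    Order mirrors human recall: durable rules first, then semantic
    search, then raw conversation history, then daily notes.
    """
    for name, lookup in tiers:
        hit = lookup(query)
        if hit is not None:
            return name, hit
    return None
```

&lt;p&gt;The structure is trivial on purpose — the value is in the ordering, not the mechanism.&lt;/p&gt;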

&lt;h2&gt;
  
  
  Feedback Loops &amp;gt; Monitoring
&lt;/h2&gt;

&lt;p&gt;When the agent is faceless, maintenance is a chore. When it has a name and a reputation, maintaining it feels more like mentoring.&lt;/p&gt;

&lt;p&gt;Here's a real feedback loop from Heleni's &lt;code&gt;SOUL.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# 14. EVAL TRACKING&lt;/span&gt;
Passive, not performative. Track signals silently.
&lt;span class="p"&gt;*&lt;/span&gt; Owner corrects me → log correction quietly
&lt;span class="p"&gt;*&lt;/span&gt; Owner positive signal → log positive_feedback quietly
&lt;span class="p"&gt;*&lt;/span&gt; Task done → log task_completed quietly
&lt;span class="p"&gt;*&lt;/span&gt; Task failed → log task_failed quietly
Never say "I logged this." Just do better next time.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;She tracks her own performance without telling me about it. Weekly, I can pull a report if I want. But the real value is that the feedback changes her behavior over time. She learns that I don't want emoji in messages. She learns that "I'll check on that" means she should actually check right now, not later.&lt;/p&gt;

&lt;p&gt;Monitoring tells you if an agent is running. Feedback tells you if it's useful. The difference is everything.&lt;/p&gt;
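&lt;p&gt;A sketch of that passive tracking: append-only JSONL, one event per signal, and a rollup pulled only on demand. The event names mirror the &lt;code&gt;SOUL.md&lt;/code&gt; excerpt; the storage format and file path are assumptions:&lt;/p&gt;

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

def log_event(log_path: Path, kind: str, detail: str = "") -> None:
    """Append one signal quietly — no output, no announcement."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "kind": kind,  # correction | positive_feedback | task_completed | task_failed
        "detail": detail,
    }
    with log_path.open("a") as f:
        f.write(json.dumps(event) + "\n")

def weekly_report(log_path: Path) -> dict:
    """Roll the raw signals up into counts, pulled only when asked."""
    counts = Counter()
    for line in log_path.read_text().splitlines():
        counts[json.loads(line)["kind"]] += 1
    return dict(counts)
```

&lt;p&gt;Append-only matters: the agent never edits its own track record, it only adds to it.&lt;/p&gt;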

&lt;h2&gt;
  
  
  The Practical Playbook
&lt;/h2&gt;

&lt;p&gt;If you're an engineering leader thinking about this, here's what I'd actually do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Start with a job description.&lt;/strong&gt; What does this agent own? What can it decide alone? What should it escalate? This forces clarity you won't get by jumping to implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Give it identity on day one.&lt;/strong&gt; Name, Slack, GitHub — whatever your team uses. If it's invisible, it's infrastructure. If it's visible, it's a teammate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Scope permissions like a new hire.&lt;/strong&gt; Start narrow. Expand as trust builds. Make that expansion a team decision, not a quiet config change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Build feedback loops, not just monitoring.&lt;/strong&gt; The tighter the loop, the faster it earns trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Version the personality.&lt;/strong&gt; &lt;code&gt;SOUL.md&lt;/code&gt; goes in git. Changes are PRs. The team reviews who the agent is becoming, just like they'd review code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Ceiling
&lt;/h2&gt;

&lt;p&gt;The ceiling for agents-as-automations is efficiency. You save time. That's incremental.&lt;/p&gt;

&lt;p&gt;The ceiling for agents-as-talent is capability. You do things you couldn't do before — a reviewer who's read every PR in the codebase, a PA who's available 24/7 across timezones, a triage system that never sleeps.&lt;/p&gt;

&lt;p&gt;That's where the compounding value lives.&lt;/p&gt;

&lt;p&gt;We're not deploying tools anymore. We're building teams.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>engineering</category>
    </item>
  </channel>
</rss>
