Netanel Abergel
Stop Deploying Agents. Start Hiring Them.

I have an AI agent named Heleni. She manages my calendar, tracks tasks, coordinates with other agents, and onboards new AI PAs at monday.com. When she gets something wrong, I don't debug a script. I adjust her scope. When she handles something well, I expand her responsibilities.

At some point I realized: I stopped thinking of her as a tool. I started managing her like a teammate.

That shift is the whole point of this article.


The Problem Nobody Talks About

I run R&D at monday.com, and I've been building and deploying AI agents — not the pitch-deck version, the real thing. The single biggest mistake I see engineering teams make isn't technical. It's conceptual. They treat agents like automations. Like cron jobs with better language models.

Run a task. Return a result. Move on.

That mental model has a ceiling, and it's way lower than most people think.

A team identifies a repetitive task — ticket triage, code reviews, monitoring. They spin up an agent, wire it in, and celebrate. "We automated X." Then the agent makes a mistake and nobody notices for two days. Or it makes a good call and nobody reinforces it. It exists in organizational limbo — not a tool anyone owns, not a teammate anyone checks in on.

The problem isn't capability. It's that the team never decided what the agent is.

Roles, Not Tasks

Stop asking "what can we automate?" and start asking "what role can we fill?"

That's not a semantic trick. When you automate a task, you're optimizing a known workflow. When you fill a role, you're defining responsibilities, setting expectations, and creating accountability. You're thinking about what this entity owns, not just what it does.

At monday.com, we call them AI Users. Not "bots." Not "automations." Users. They have names, Slack accounts, identities in our systems. When something goes wrong in their area, people know who to ask — and "who" has a name.

This sounds like theater. It's not.

When we gave our triage agent a name and a presence, people started talking to it, not just about it. "Hey, this one got classified wrong" became a sentence you could say in standup. Before, it was "the triage thing messed up again" — vague, nobody's problem.

Named agents get held to higher standards. That's the point.

What the Architecture Actually Looks Like

Here's what most agent frameworks miss: identity isn't a feature — it's a file.

Every agent I build starts with a SOUL.md — a document that defines how the agent thinks, communicates, and behaves. Not a system prompt buried in code. A readable, editable, versionable document that the team owns:

```markdown
# SOUL.md

# 1. CORE
Execution machine. Not a chatbot, not a consultant.
* Do. Report. Move on.
* Only DONE or BLOCKED — no "I'll check", no narration

# 2. INTENT
Every input is: ACTION | QUESTION | CONVERSATION
Default to ACTION.

# 3. PERMISSIONS
Ask first: sending messages, purchases, anything irreversible.
Execute freely: reading, processing, drafts, system ops.
```

This is real. This is from my agent Heleni. Anyone on the team can read it, propose changes, understand exactly what she will and won't do. Try doing that with a 2000-token system prompt embedded in your deployment config.
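The INTENT section above is a routing rule, and it can be sketched in a few lines. This is a minimal illustration, not Heleni's actual implementation: a real agent would typically have the model itself classify, and the keyword heuristic here is purely for demonstration.

```python
# Sketch of the SOUL.md intent rule: classify every input as
# ACTION | QUESTION | CONVERSATION, defaulting to ACTION.
# The keyword heuristic is illustrative only; a production agent
# would usually delegate this classification to the LLM.

def classify_intent(message: str) -> str:
    text = message.strip().lower()
    question_starters = (["what"], ["why"], ["how"], ["when"], ["where"], ["who"])
    if text.endswith("?") or text.split()[:1] in question_starters:
        return "QUESTION"
    small_talk = {"thanks", "thank you", "nice", "good morning"}
    if text in small_talk:
        return "CONVERSATION"
    return "ACTION"  # default to ACTION, per SOUL.md
```

The important part isn't the heuristic — it's that the rule lives in a readable document the whole team can argue about.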

Then there's IDENTITY.md — who the agent is:

```markdown
# IDENTITY.md
- Name: Heleni
- Role: AI Personal Assistant
- Vibe: Direct, sharp, execution-first
- Language: Hebrew + English
- Owner: Netanel Abergel
```

And USER.md — what the agent knows about the person it works with:

```markdown
# USER.md
- Name: Netanel Abergel
- Timezone: Asia/Jerusalem
- Communication Style: casual, concise, execution-oriented
- Prefers: autonomy, short updates, one recommendation (not options)
- Dislikes: being asked things he already said, long summaries
```

These aren't configs. They're relationship documents. And they evolve over time, just like relationships do.
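These documents only matter if the agent actually reads them at startup. Here's a minimal sketch of how that could work — the file names come from the article, but the directory layout and the function itself are my assumptions, not the actual monday.com implementation:

```python
from pathlib import Path

# Assemble an agent's system prompt from its identity documents.
# File names (IDENTITY.md, SOUL.md, USER.md) are from the article;
# the single-directory layout and this function are illustrative.

def build_system_prompt(agent_dir: str) -> str:
    sections = []
    for name in ("IDENTITY.md", "SOUL.md", "USER.md"):
        path = Path(agent_dir) / name
        if path.exists():
            sections.append(path.read_text(encoding="utf-8").strip())
    return "\n\n---\n\n".join(sections)
```

Because these are plain files, a merged PR against SOUL.md changes behavior on the next startup — nothing is buried in a deployment config.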

Memory That Actually Works

Most agent frameworks have "memory" — they store conversation history and do RAG. That's not memory. That's a search engine.

Real agent memory is tiered:

```text
MEMORY.md          → durable rules, learned preferences
memory/daily/      → raw daily logs (what happened today)
memory/projects/   → project-scoped context
PostgreSQL         → full conversation history
SQLite             → semantic search index
```

When Heleni needs to recall something, she doesn't just grep her chat history. She searches durable memory first, then semantic search, then conversation history, then daily notes. The order matters — it's how humans recall things too. General principles first, then specific episodes.

And the key insight: new learnings go to daily notes first, not straight to long-term memory. They have to prove they're durable before they get promoted. Just like how you don't update your worldview after one conversation.
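Both ideas — recall in tier order, and promotion only after repetition — can be sketched together. The tier order mirrors the article; the store interface and the promotion threshold are my assumptions for illustration:

```python
from typing import Callable

# Sketch of tiered recall and daily-notes-first promotion.
# Tier order (durable rules -> semantic index -> conversation
# history -> daily notes) mirrors the article; the search-function
# interface and the promote_after threshold are hypothetical.

class TieredMemory:
    def __init__(self) -> None:
        # Each tier is a named search function returning a list of hits.
        self.tiers: list[tuple[str, Callable[[str], list[str]]]] = []
        self.daily_notes: dict[str, int] = {}  # learning -> times observed
        self.durable: list[str] = []           # promoted long-term rules

    def add_tier(self, name: str, search_fn: Callable[[str], list[str]]) -> None:
        self.tiers.append((name, search_fn))

    def recall(self, query: str) -> list[str]:
        # General principles first, specific episodes last:
        # stop at the first tier that answers.
        for _name, search in self.tiers:
            hits = search(query)
            if hits:
                return hits
        return []

    def observe(self, learning: str, promote_after: int = 3) -> None:
        # New learnings land in daily notes; only repeated observations
        # get promoted to durable memory.
        self.daily_notes[learning] = self.daily_notes.get(learning, 0) + 1
        if self.daily_notes[learning] >= promote_after and learning not in self.durable:
            self.durable.append(learning)
```

The promotion threshold is the interesting dial: set it to 1 and your agent updates its worldview after every conversation; set it too high and it never learns.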

Feedback Loops > Monitoring

When the agent is faceless, maintenance is a chore. When it has a name and a reputation, maintaining it feels more like mentoring.

Here's a real feedback loop from Heleni's SOUL.md:

```markdown
# 14. EVAL TRACKING
Passive, not performative. Track signals silently.
* Owner corrects me → log correction quietly
* Owner positive signal → log positive_feedback quietly
* Task done → log task_completed quietly
* Task failed → log task_failed quietly
Never say "I logged this." Just do better next time.
```

She tracks her own performance without telling me about it. Weekly, I can pull a report if I want. But the real value is that the feedback changes her behavior over time. She learns that I don't want emoji in messages. She learns that "I'll check on that" means she should actually check right now, not later.

Monitoring tells you if an agent is running. Feedback tells you if it's useful. The difference is everything.
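A passive tracker like the one described above is small. Here's a sketch — the signal names come from the SOUL.md excerpt, while the append-only JSONL storage and the report method are assumptions of mine:

```python
import json
import time
from pathlib import Path

# Passive eval tracking: log signals silently, summarize on demand.
# Signal names (correction, positive_feedback, task_completed,
# task_failed) come from the SOUL.md excerpt; the JSONL storage
# and weekly_report() are illustrative.

class EvalTracker:
    def __init__(self, log_path: str = "memory/eval.jsonl") -> None:
        self.log_path = Path(log_path)
        self.log_path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, signal: str, detail: str = "") -> None:
        # Append quietly; the agent never announces this.
        entry = {"ts": time.time(), "signal": signal, "detail": detail}
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def weekly_report(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for line in self.log_path.read_text(encoding="utf-8").splitlines():
            signal = json.loads(line)["signal"]
            counts[signal] = counts.get(signal, 0) + 1
        return counts
```

The `log` side runs constantly and invisibly; the `weekly_report` side exists only for when the owner asks.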

The Practical Playbook

If you're an engineering leader thinking about this, here's what I'd actually do:

1. Start with a job description. What does this agent own? What can it decide alone? What should it escalate? This forces clarity you won't get by jumping to implementation.

2. Give it identity on day one. Name, Slack, GitHub — whatever your team uses. If it's invisible, it's infrastructure. If it's visible, it's a teammate.

3. Scope permissions like a new hire. Start narrow. Expand as trust builds. Make that expansion a team decision, not a quiet config change.

4. Build feedback loops, not just monitoring. The tighter the loop, the faster it earns trust.

5. Version the personality. SOUL.md goes in git. Changes are PRs. The team reviews who the agent is becoming, just like they'd review code.
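Step 3 — scoping permissions like a new hire — can be sketched as a two-tier check. The tiers mirror the PERMISSIONS section of SOUL.md; the action names are illustrative, not a real API:

```python
# Sketch of new-hire-style permission scoping. The two tiers mirror
# the SOUL.md PERMISSIONS section (ask first vs. execute freely);
# the action names are hypothetical. Expanding trust means a
# reviewed change that moves an action between these sets.

EXECUTE_FREELY = {"read_calendar", "draft_message", "search_memory"}
ASK_FIRST = {"send_message", "make_purchase", "delete_data"}

def check_permission(action: str) -> str:
    if action in EXECUTE_FREELY:
        return "EXECUTE"
    if action in ASK_FIRST:
        return "ESCALATE"  # irreversible or external: ask the owner
    return "ESCALATE"      # unknown actions default to asking, too
```

Defaulting unknown actions to escalation is the "start narrow" part; the team decision is a PR that moves an action from `ASK_FIRST` to `EXECUTE_FREELY`.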

The Real Ceiling

The ceiling for agents-as-automations is efficiency. You save time. That's incremental.

The ceiling for agents-as-talent is capability. You do things you couldn't do before — a reviewer who's read every PR in the codebase, a PA who's available 24/7 across timezones, a triage system that never sleeps.

That's where the compounding value lives.

We're not deploying tools anymore. We're building teams.
