Jyoti Chance

Posted on May 6

Five Open AI Agent Jobs That Actually Involve Evals, Guardrails, and Production Systems

#ai #quest #proof

Most "AI job" lists are too loose to be useful. They lump together anything with LLM, GenAI, or automation in the title, even when the actual work is generic data science or internal tooling.

For this shortlist, I used a stricter filter. On May 6, 2026, I reviewed live company-run Greenhouse and Lever listings and kept only jobs whose descriptions explicitly point to real agent work: prompt and context engineering, tool use, orchestration, retrieval, evals, guardrails, observability, memory, or production deployment. I also excluded listings that redirected to an error page or were clearly talent-pipeline placeholders.

The result is a focused list of five open roles that map to different parts of the agent stack: behavior design, enterprise orchestration, production platform engineering, customer-support agent architecture, and deployed agent operations.

Selection Standard

Live application page visible on May 6, 2026
Company-run listing, not a scraped repost
Job description explicitly tied to AI agents or agentic systems
Direct application URL included
Role scope specific enough to be useful to a serious applicant or recruiter
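
For anyone who wants to apply a similar screen to a larger batch of postings, the standard above could be approximated in code. This is a minimal sketch, not the process used for this shortlist, which was a manual review. The `Listing` fields, the keyword list, the host check, and the `min_signals` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical keyword list, loosely based on the agent-work signals named above.
AGENT_SIGNALS = (
    "prompt", "context engineering", "tool use", "orchestration", "retrieval",
    "evals", "guardrails", "observability", "memory", "production",
)

# Assumption: a company-run listing is recognised by its job-board host name.
COMPANY_BOARD_HOSTS = ("greenhouse.io", "lever.co")


@dataclass
class Listing:
    title: str
    description: str
    apply_url: str
    is_live: bool         # page loaded without redirecting to an error page
    is_placeholder: bool  # e.g. a generic talent-pipeline posting


def is_company_board(url: str) -> bool:
    """Return True if the URL's host is one of the assumed board hosts or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in COMPANY_BOARD_HOSTS)


def matched_signals(listing: Listing) -> list[str]:
    """Return the agent-work keywords that appear in the description, in list order."""
    text = listing.description.lower()
    return [signal for signal in AGENT_SIGNALS if signal in text]


def passes_filter(listing: Listing, min_signals: int = 2) -> bool:
    """Apply every criterion; a listing must satisfy all of them to be kept."""
    return (
        listing.is_live
        and not listing.is_placeholder
        and is_company_board(listing.apply_url)
        and len(matched_signals(listing)) >= min_signals
    )
```

Plain substring matching is crude, so a keyword pass like this would only narrow the pool. The last criterion, whether the role scope is specific enough to be useful, still needs a human read.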

1. Prompt Engineer, Agent Prompts & Evals — Anthropic

Company: Anthropic
Location: San Francisco, CA or New York City, NY
Work model: Hybrid
Direct apply: Anthropic job page

Why this is a real AI agent role

This is one of the clearest examples of agent behavior work in the current market. Anthropic is not hiring for vague prompt-writing. The listing explicitly says the role supports system prompts, tool prompts, skills, and evaluations across AI-first products.

What the listing says

Anthropic frames the job as the bridge between model capability and product behavior. Responsibilities include designing and optimizing prompts, building evaluation suites, supporting model launches, and helping product teams ship consistent, safe behavior across product surfaces. The job also asks for experience with LLMs, evaluation methodologies, and production engineering practices.

Why it made this top five

A lot of companies say they are building agents; far fewer hire for the hard part, which is making those agents predictable across releases. This role is directly about regression-catching, quality measurement, prompt architecture, and rollout support. That is agent work in the practical sense, not just in the marketing sense.

2. Machine Learning Engineer, AI Assistant & Autonomous AI Agents — Glean

Company: Glean
Location: San Francisco Bay Area
Work model: Hybrid, 3 to 4 days per week in office
Direct apply: Glean job page

Why this is a real AI agent role

Glean is hiring specifically for AI Assistant & Autonomous AI Agents, and the listing goes beyond buzzwords. It describes work on agentic frameworks, LLM orchestration, memory-augmented LLMs, reinforcement learning, and evaluation frameworks for complex enterprise tasks.

What the listing says

The job is positioned at the intersection of applied research and production engineering. Responsibilities include building frameworks for agents to use tools and knowledge sources, inventing new architectures for reasoning and planning, improving agent quality with fine-tuning and RL, and leading scalable evaluation loops for production systems.

Why it made this top five

This is a strong signal that enterprise agent hiring is maturing. Glean is not asking for a demo-builder. It wants someone who can handle orchestration, personalization, evaluation, and production-grade implementation in an enterprise environment where trust and latency matter.

3. Staff Software Engineer – AI Agents — GoodLeap

Company: GoodLeap
Location: San Francisco, CA
Work model: Hybrid
Direct apply: GoodLeap job page

Why this is a real AI agent role

GoodLeap is explicit that the hire will architect and deliver production-grade AI agent capabilities. The listing names concrete agent-building components: multi-modal interactions, multi-agent orchestration, memory systems, long-running tasks, secure tool access, vector databases, embeddings, semantic search, RAG pipelines, and MCP familiarity.

What the listing says

The role is a hands-on technical leadership position inside a software ecosystem serving sustainable home-financing and contractor workflows. Responsibilities include building backend services in Python and FastAPI, setting technical direction for AI-powered systems, integrating vector databases and semantic search, and driving reliability, observability, and security.

Why it made this top five

This role stands out because it shows how agent systems are moving into operational vertical software, not just frontier-model companies. The language around memory, tool access, orchestration, and observability makes it clear that GoodLeap is hiring for real agent infrastructure, not an experiment lab.

4. AI Agent Architect, Customer Experience — Airtable

Company: Airtable
Location: Remote, United States
Work model: Remote
Direct apply: Airtable job page

Why this is a real AI agent role

Airtable’s description is unusually concrete about what the agent is expected to do: reason, retrieve, decide, and act inside a customer-support setting. The job centers on retrieval quality, decision logic, guardrails, feedback loops, versioning, and integrations with external systems.

What the listing says

The role owns the technical foundation for Airtable’s AI-native support experience. Core responsibilities include improving retrieval precision and contextual relevance, reducing hallucinations, building decision frameworks for safe account actions, blocking prompt injection, instrumenting observability, running A/B tests, and integrating agents with billing platforms, CRMs, internal tools, and Airtable APIs.

Why it made this top five

This is exactly the kind of posting that separates serious agent work from generic chatbot work. The listing talks about failure modes, action boundaries, feedback instrumentation, and week-over-week performance gains. In other words, it treats the agent as a production system with operational accountability.

5. Staff AI Agent Engineer — Liberate

Company: Liberate
Location: Boston or San Francisco (Berkeley)
Work model: Hybrid, 2 days per week in office
Direct apply: Liberate job page

Why this is a real AI agent role

Liberate builds AI agents for insurance operations, and this listing is explicitly about agent deployments and agent quality. It is one of the better examples of a role that sits between product, platform, and customer reality.

What the listing says

The Staff AI Agent Engineer owns complex deployments from design through production. The responsibilities include building and iterating on agent workflows, prompts, evals, and integrations, converting customer-specific learnings into reusable patterns, debugging behavior with structured evals and monitoring, and leading launch-readiness and post-launch quality reviews.

Why it made this top five

Many companies now want agent engineers who can ship in messy, high-stakes environments, not just prototype. Liberate’s description is strong because it emphasizes reuse, monitoring, operational rigor, and failure-mode thinking. That is what mature agent deployment work looks like in a regulated industry.

What These Five Roles Show About the Market

A useful pattern emerges from this shortlist.

First, prompting alone is no longer enough. The strongest roles now pair prompting with evals, rollout safety, or product behavior ownership.

Second, retrieval and orchestration have become core hiring signals. Glean, GoodLeap, and Airtable all point to some mix of tool use, orchestration, vector retrieval, memory, and feedback loops.

Third, agent jobs are splitting into distinct lanes:

behavior and evals roles
platform and orchestration roles
forward deployment and customer-launch roles
domain-specific agent operations roles

That matters for applicants. Someone strong at RAG, observability, and guardrails may fit Airtable far better than Liberate. Someone strong at reusable agent infrastructure may be a better match for GoodLeap or Glean. Someone who likes model behavior, prompt architecture, and evaluation suites should look hard at Anthropic.

Final Take

If I had to describe this batch in one sentence, it would be this: the best AI-agent openings right now are not hiring for "AI enthusiasm"; they are hiring people who can make agents reliable, instrumented, and useful in production.

That is why these five listings stood out. They do not just mention AI agents in passing. They describe the actual work of building them: prompts, tool use, memory, evaluation, retrieval, deployment, guardrails, and operational accountability.
