Alessandro Pireno

Posted on Mar 4

10 Things That Need a Shell: Where the Filesystem Metaphor Could Fix Agent Interfaces

#ai #agents #devtools #opensource

The Pattern That Worked

I recently shipped DOMShell — an MCP server that maps Chrome's Accessibility Tree to a virtual filesystem. Instead of feeding agents screenshots or raw HTML, it lets them ls, cd, grep, and click their way through web pages.

The result: half as many API calls as screenshot-based browsing in controlled testing with Claude (4 tasks, 8 trials). The filesystem metaphor gave the model a spatial map of the page, so it spent less time exploring and more time extracting.

The insight underneath is simple: agents waste most of their cycles on orientation, not action.

The current playbook — pump screenshots into a vision model, dump 50k tokens of raw HTML into the context window, or chain brittle CSS selectors — treats the model as a brute-force parser. It works until it doesn't, and when it fails, it fails silently. You don't get an error. You get a confident wrong answer and a $4 API bill.

When you give agents a navigable, scoped, low-entropy interface instead of a high-entropy dump of raw data, they get dramatically more efficient. Not incrementally — structurally.

That got me thinking: where else are agents hitting the same wall?

The Filesystem Metaphor, Generalized

The pattern has three primitives that LLMs already deeply understand from training data:

Scope (cd) — Narrow your working context to reduce noise
Discover (ls, find, grep) — See what's available within that scope
Act (cat, click, type, call) — Do something with what you found
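To make the three primitives concrete, here's a minimal sketch of a shell over a nested Python dict: dicts are directories, everything else is a file. The `Shell` class and the sample tree are hypothetical illustrations of the pattern, not DOMShell's actual code. A real backend would swap the dict for an accessibility tree, a graph, or an API.

```python
from __future__ import annotations


class Shell:
    """Scope / discover / act over a nested dict (dicts are directories, other values are files)."""

    def __init__(self, tree: dict):
        self.root = tree
        self.path: list[str] = []  # current working directory as path segments

    def _node(self, path: list[str]):
        node = self.root
        for part in path:
            node = node[part]  # KeyError if the child doesn't exist
        return node

    def cd(self, target: str) -> str:
        """Scope: '/' goes to the root, '..' to the parent, anything else is a relative path."""
        if target == "/":
            new_path: list[str] = []
        elif target == "..":
            new_path = self.path[:-1]
        else:
            new_path = self.path + target.strip("/").split("/")
        if not isinstance(self._node(new_path), dict):
            raise NotADirectoryError(target)
        self.path = new_path
        return "/" + "/".join(self.path)

    def ls(self) -> list[str]:
        """Discover: list children, with a trailing slash on directories."""
        node = self._node(self.path)
        return sorted(k + "/" if isinstance(v, dict) else k for k, v in node.items())

    def cat(self, name: str):
        """Act: read one leaf in the current scope."""
        value = self._node(self.path)[name]
        if isinstance(value, dict):
            raise IsADirectoryError(name)
        return value


sh = Shell({"page": {"nav": {"home": "link"}, "main": {"title": "Hello"}}})
print(sh.cd("page/main"))  # /page/main
print(sh.ls())             # ['title']
print(sh.cat("title"))     # Hello
```

The agent never sees the whole tree at once: each `ls` returns only the children of the current scope, which is the point of the pattern.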

Every interface below has the same fundamental problem: agents don't have a structured way to scope, discover, and act. They get a flat API or a firehose of data and burn calls trying to orient themselves.

Here are 10 interfaces that could benefit from the shell treatment. I want to build the next one — help me pick which.

1. GraphShell — Knowledge Graph Navigator

The problem: Graph databases (Neo4j, Neptune, TigerGraph) are powerful but agents struggle with schema discovery and pathfinding. Cypher and Gremlin are expressive query languages, but agents don't know what nodes or relationships exist until they explore — and exploration is expensive.

The shell: Nodes become directories. Edges become navigable links with typed relationships. ls shows adjacent nodes with relationship types and properties. cd traverses edges. find --type Person --depth 3 does bounded breadth-first search. path A B computes shortest paths. schema shows the full ontology.
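Here's a minimal sketch of the `ls --rel` and `find --type ... --depth` semantics over a toy in-memory graph. The graph, node names, and function signatures are all hypothetical. A real GraphShell would translate these calls into Cypher or Gremlin against the database instead of walking a Python dict.

```python
from __future__ import annotations

from collections import deque

# Hypothetical graph: node id -> (label, [(relationship type, target id), ...])
GRAPH = {
    "Acme": ("Company", [("EMPLOYS", "jane"), ("EMPLOYS", "raj")]),
    "jane": ("Person", [("MANAGES", "raj"), ("MANAGES", "li")]),
    "raj": ("Person", []),
    "li": ("Person", [("WORKS_ON", "Atlas")]),
    "Atlas": ("Project", []),
}


def ls(node: str, rel: str | None = None) -> list[tuple[str, str]]:
    """ls [--rel TYPE]: adjacent nodes as (relationship, target) pairs."""
    _label, edges = GRAPH[node]
    return [(r, t) for r, t in edges if rel is None or r == rel]


def find(start: str, label: str, depth: int) -> list[str]:
    """find --type LABEL --depth N: bounded breadth-first search from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, dist = queue.popleft()
        if node != start and GRAPH[node][0] == label:
            hits.append(node)
        if dist == depth:
            continue  # the depth bound: don't expand this node's neighbors
        for _rel, target in GRAPH[node][1]:
            if target not in seen:
                seen.add(target)
                queue.append((target, dist + 1))
    return hits


print(ls("Acme", rel="EMPLOYS"))        # [('EMPLOYS', 'jane'), ('EMPLOYS', 'raj')]
print(find("Acme", "Person", depth=2))  # ['jane', 'raj', 'li']
```

The depth bound is what keeps exploration cheap: the agent asks for a neighborhood, not the whole graph.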

Why it works: Graphs aren't strictly hierarchical, but any traversal can be scoped to a starting node, the same way cd scopes a shell to a directory. An agent exploring a company knowledge graph could cd Company/Acme → ls --rel EMPLOYS → cd employees/jane → ls --rel MANAGES instead of writing a 6-line Cypher query to get the same traversal. Having spent time building on SurrealDB, a multi-model database with graph capabilities, I watched developers struggle with exactly this: the data was richly connected, but every query required you to already know the shape of the graph. Agents hit the same wall, harder.

The business case: Knowledge graph queries are the backbone of fraud detection, recommendation engines, and supply chain mapping. An agent that can traverse a graph in 4 calls instead of 12 doesn't just save API cost — it makes real-time graph-powered workflows viable for the first time.

2. APIShell — REST/GraphQL Endpoint Navigator

The problem: Agents are terrible at API discovery. They hallucinate endpoints, guess parameter names, and burn calls on 404s. Even with an OpenAPI spec in context, they struggle to chain calls correctly.

The shell: Ingests OpenAPI specs or GraphQL introspection and presents endpoints as a filesystem: /users/, /users/{userId}/orders/, /users/{userId}/orders/{orderId}/items/. ls shows available operations and parameters. cat shows request and response schemas. call GET /users/123 executes. Agents cd into nested resources and run find across the whole API surface.
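A minimal sketch of the ingestion step, assuming the OpenAPI document has already been parsed into a dict (for example with `json.load`). The sample spec and the tree layout, with operations stored under a hypothetical `_ops` key, are illustrative choices rather than a fixed format. GraphQL introspection would need a different builder.

```python
from __future__ import annotations

# A tiny stand-in for a parsed OpenAPI document; only "paths" is used here.
SPEC = {
    "paths": {
        "/users": {"get": {"summary": "List users"}, "post": {"summary": "Create user"}},
        "/users/{userId}": {"get": {"summary": "Get a user"}},
        "/users/{userId}/orders": {"get": {"summary": "List a user's orders"}},
        "/users/{userId}/orders/{orderId}/items": {"get": {"summary": "List order items"}},
    }
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def build_tree(spec: dict) -> dict:
    """Nest each path-template segment as a directory; store operations under '_ops'."""
    root: dict = {"_ops": {}}
    for path, item in spec["paths"].items():
        node = root
        for seg in path.strip("/").split("/"):
            node = node.setdefault(seg, {"_ops": {}})
        for method, op in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                node["_ops"][method.upper()] = op.get("summary", "")
    return root


def ls(tree: dict, path: str) -> dict:
    """Show child resources and the operations available at `path`."""
    node = tree
    for seg in [s for s in path.strip("/").split("/") if s]:
        node = node[seg]
    children = sorted(k + "/" for k in node if k != "_ops")
    return {"children": children, "operations": node["_ops"]}


tree = build_tree(SPEC)
print(ls(tree, "/users")["children"])  # ['{userId}/']
```

Because the tree mirrors the path templates, an agent only ever sees the next level of nesting, instead of the whole spec at once.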

Why it works: REST APIs are already hierarchical — resources nest inside resources. The shell makes that nesting navigable instead of requiring the agent to memorize the spec.

The business case: Every agent-powered integration — CRM sync, payment processing, data pipeline orchestration — starts with API discovery. Cut the discovery overhead and you cut the cost of every downstream automation.

3. K8Shell — Kubernetes Cluster Navigator

The problem: kubectl's default output is verbose, human-oriented text, and agents constantly run the wrong command or misparse what comes back. Getting the status of a single deployment requires chaining 3-4 commands.

The shell: Namespaces are top-level directories. Resource types are subdirectories. Individual resources are files. cd production/deployments/api-server, then cat shows status, replicas, and image version. logs streams container output. find --status CrashLoopBackOff searches a whole cluster in one call.
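A minimal sketch of `find --status`, assuming the shell is fed the JSON from `kubectl get pods --all-namespaces -o json` (a List whose items carry `metadata.namespace`, `metadata.name`, and `status.containerStatuses`). CrashLoopBackOff appears as a container's `state.waiting.reason`, so that's the field the search checks. The sample pods are made up, and this sketch only covers pods, not deployments.

```python
from __future__ import annotations


def build_tree(pod_list: dict) -> dict:
    """namespace -> "pods" -> pod name -> pod object."""
    tree: dict = {}
    for pod in pod_list["items"]:
        meta = pod["metadata"]
        tree.setdefault(meta["namespace"], {}).setdefault("pods", {})[meta["name"]] = pod
    return tree


def waiting_reasons(pod: dict) -> set[str]:
    """Collect container waiting reasons such as CrashLoopBackOff or ImagePullBackOff."""
    reasons = set()
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and "reason" in waiting:
            reasons.add(waiting["reason"])
    return reasons


def find_status(tree: dict, status: str) -> list[str]:
    """find --status REASON: matching pods as namespace/pods/name paths."""
    return sorted(
        f"{ns}/pods/{name}"
        for ns, kinds in tree.items()
        for name, pod in kinds.get("pods", {}).items()
        if status in waiting_reasons(pod)
    )


SAMPLE = {"items": [
    {"metadata": {"namespace": "production", "name": "api-server-7d9f"},
     "status": {"containerStatuses": [{"name": "api", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"namespace": "production", "name": "worker-5c2a"},
     "status": {"containerStatuses": [{"name": "worker", "state": {"running": {}}}]}},
    {"metadata": {"namespace": "staging", "name": "api-server-1b3e"},
     "status": {"containerStatuses": [{"name": "api", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
]}
tree = build_tree(SAMPLE)
print(find_status(tree, "CrashLoopBackOff"))
```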

Why it works: Kubernetes already has a natural hierarchy (cluster → namespace → resource type → resource → containers). It's just not exposed as navigable.

The business case: MTTR (mean time to recovery) is the metric that matters. An agent that can find a crashing pod, pull its logs, and identify the root cause in 3 calls instead of 10 turns a 20-minute incident into a 2-minute one. For a startup running $50k/month in compute, finding zombie resources alone could pay for the tooling.

4. CloudShell — AWS/GCP/Azure Resource Navigator

The problem: Cloud consoles are the worst agent interface — hundreds of services, thousands of resources, nested in regions and accounts. An agent needs 10+ API calls just to orient itself in an AWS account.

The shell: /us-east-1/ec2/instances/, /us-east-1/rds/databases/, /global/iam/roles/. ls shows resources with key metadata inline. find --type security-group --port 22 finds open SSH across every region. tree shows the blast radius of a VPC.
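Here's a sketch of `find --type security-group --port 22`. It assumes the shell has already collected one EC2 DescribeSecurityGroups response per region (for example from boto3's `describe_security_groups()`, called on one client per region). The sample data is invented. The check only looks at IPv4 ranges (`IpRanges`), so IPv6 rules in `Ipv6Ranges` would need the same treatment.

```python
from __future__ import annotations


def opens_port(perm: dict, port: int) -> bool:
    """True if one ingress rule covers `port`; IpProtocol '-1' means all traffic."""
    proto = perm.get("IpProtocol")
    if proto == "-1":
        return True
    if proto not in ("tcp", "6"):  # SSH is TCP; "6" is TCP's protocol number
        return False
    return perm.get("FromPort", 0) <= port <= perm.get("ToPort", -1)


def find_open_port(responses: dict[str, dict], port: int = 22, cidr: str = "0.0.0.0/0") -> list[str]:
    """find --type security-group --port N: groups open to `cidr`, as /region/ec2/... paths."""
    hits = []
    for region, resp in sorted(responses.items()):
        for sg in resp.get("SecurityGroups", []):
            for perm in sg.get("IpPermissions", []):  # IpPermissions holds the ingress rules
                world_open = any(r.get("CidrIp") == cidr for r in perm.get("IpRanges", []))
                if world_open and opens_port(perm, port):
                    hits.append(f"/{region}/ec2/security-groups/{sg['GroupId']}")
                    break  # one matching rule is enough for this group
    return hits


RESPONSES = {
    "us-east-1": {"SecurityGroups": [
        {"GroupId": "sg-0a1", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-0b2", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    ]},
    "eu-west-1": {"SecurityGroups": [
        {"GroupId": "sg-0c3", "IpPermissions": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
        {"GroupId": "sg-0d4", "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
    ]},
}
print(find_open_port(RESPONSES))
```

Returning paths rather than raw API objects means the agent can `cd` straight into each hit for details.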

Why it works: Cloud resources have natural hierarchy (region → service → resource type → resource) but every cloud provider's API requires you to specify the scope upfront rather than navigate to it.

The business case: Security audits, cost optimization, and compliance checks all require cross-account, cross-region visibility. An agent that can find --type security-group --port 22 across your entire AWS org in one call replaces a week of manual audit work.

5. GitShell — Repository History Navigator

The problem: Git's CLI is powerful but output is unstructured text. Agents struggle with log parsing, diff interpretation, and blame navigation. Merge conflict resolution — where agents currently fail catastrophically — is a chain of poorly-structured interactions.

The shell: Branches are directories. Commits are navigable nodes. cd main/HEAD~5 puts you at a point in time. diff shows changes in structured format. blame function_name traces authorship. find --author=alex --since=2w --path=src/ replaces gnarly git log incantations.
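A minimal sketch of two of these commands over a toy in-memory commit graph. The first resolves `branch~N` by following first parents, the same rule Git uses for the `~` suffix. The second filters every reachable commit by author, age, and path prefix. The `Commit` class, the sample history, and the function signatures are hypothetical. A real GitShell would read this data from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Commit:
    sha: str
    author: str
    when: datetime
    files: list[str]
    parents: list[str] = field(default_factory=list)  # first parent comes first


def resolve(commits: dict[str, Commit], branches: dict[str, str], ref: str) -> str:
    """Resolve 'branch~N' by following first parents N times."""
    name, _, n = ref.partition("~")
    sha = branches[name]
    for _ in range(int(n or 0)):
        sha = commits[sha].parents[0]
    return sha


def find(commits, branches, branch, author=None, since=None, path=None, now=None):
    """find --author --since --path over every commit reachable from `branch`, newest first."""
    now = now or datetime.now()
    stack, seen, hits = [branches[branch]], set(), []
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        commit = commits[sha]
        stack.extend(commit.parents)  # walk the whole DAG, merge parents included
        if author and commit.author != author:
            continue
        if since and commit.when < now - since:
            continue
        if path and not any(f.startswith(path) for f in commit.files):
            continue
        hits.append(commit)
    return [c.sha for c in sorted(hits, key=lambda c: c.when, reverse=True)]


HISTORY = [
    ("c1", "alex", datetime(2026, 1, 10), ["README.md"]),
    ("c2", "sam", datetime(2026, 2, 1), ["src/app.py"]),
    ("c3", "alex", datetime(2026, 2, 20), ["src/app.py"]),
    ("c4", "alex", datetime(2026, 2, 25), ["docs/guide.md"]),
    ("c5", "sam", datetime(2026, 3, 1), ["src/db.py"]),
    ("c6", "alex", datetime(2026, 3, 3), ["src/db.py", "tests/test_db.py"]),
]
commits: dict[str, Commit] = {}
prev = None
for sha, who, when, files in HISTORY:  # build a linear history, each commit pointing at the previous one
    commits[sha] = Commit(sha, who, when, files, [prev] if prev else [])
    prev = sha
branches = {"main": "c6"}

print(resolve(commits, branches, "main~5"))  # c1
print(find(commits, branches, "main", author="alex", since=timedelta(weeks=2),
           path="src/", now=datetime(2026, 3, 4)))  # ['c6', 'c3']
```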

Why it works: Git history is a DAG — it's already a graph structure. The shell linearizes it into something navigable.

The business case: AI-assisted code review and automated PR summaries are already shipping. The bottleneck is agent comprehension of change context — not just what changed, but why and who else touched it. A navigable git history makes those workflows reliable instead of approximate.

6. DataShell — Database Schema & Query Navigator

The problem: Agents writing SQL against unfamiliar databases waste most of their calls on INFORMATION_SCHEMA queries and DESCRIBE TABLE to understand what they're working with. They build a mental model one table at a time.

The shell: Schemas are directories. Tables are subdirectories. Columns are files. cd analytics/fact_orders then ls shows columns with types, nullability, foreign keys. sample 10 shows real data. stats revenue shows distribution. query "SELECT ..." executes.
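A minimal sketch of `ls` and `sample`, using SQLite from Python's standard library so it runs anywhere. SQLite has no schema level like `analytics`, so this covers table and column discovery only. A warehouse version would query INFORMATION_SCHEMA instead of the SQLite PRAGMAs used here. The function names and sample tables are hypothetical.

```python
import sqlite3


def ls_tables(conn: sqlite3.Connection) -> list:
    """ls at the database root: every user table."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def ls_columns(conn: sqlite3.Connection, table: str) -> list:
    """ls inside a table: name, type, nullability, primary key, and foreign key target."""
    # PRAGMA arguments can't be bound as parameters; `table` should come from ls_tables().
    fks = {row[3]: f"{row[2]}.{row[4]}" for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
    lines = []
    for _cid, name, coltype, notnull, _default, pk in conn.execute(f'PRAGMA table_info("{table}")'):
        parts = [name, coltype, "NOT NULL" if notnull else "NULL"]
        if pk:
            parts.append("PK")
        if name in fks:
            parts.append(f"FK -> {fks[name]}")
        lines.append(" ".join(parts))
    return lines


def sample(conn: sqlite3.Connection, table: str, n: int = 10) -> list:
    """sample N: the first N rows SQLite returns (no ordering guarantee)."""
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER NOT NULL PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fact_orders (
    id INTEGER NOT NULL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    revenue REAL
);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO fact_orders VALUES (1, 1, 120.0), (2, 1, 80.5);
""")
print(ls_tables(conn))  # ['customers', 'fact_orders']
print(ls_columns(conn, "fact_orders"))
```

One `ls` call returns what would otherwise take a DESCRIBE plus a separate foreign-key lookup.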

Why it works: Databases already have a three-level hierarchy (schema → table → column). The shell makes it navigable and adds introspection primitives that agents currently cobble together from metadata queries. At Snowflake, I watched analysts spend 30% of their time just figuring out which tables and columns existed before they could write a single query. Agents do the same thing, except they burn tokens instead of hours.

The business case: Text-to-SQL is a $2B+ market growing fast. The accuracy ceiling today is schema comprehension — not query generation. Fix the agent's understanding of the database and the SQL practically writes itself.

7. LogShell — Observability Stack Navigator

The problem: Log exploration (Datadog, Splunk, ELK) is one of the hardest agent tasks. The data is temporal, high-volume, and agents don't know which filters to apply until they see what's there.

The shell: Services are directories. Time ranges are navigable: cd api-server/last-1h. grep ERROR searches within scope. find --level=error --service=payments --since=30m replaces complex query syntax. trace request-id-xyz follows a distributed trace across services. stats shows error rate trends.
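A minimal sketch of `cd service/last-1h`, `grep`, and `trace` over an in-memory list of log events. Several pieces are assumptions for illustration: the event fields (`ts`, `service`, `level`, `msg`, `trace_id`), the `last-<N><m|h|d>` window syntax, and the `LogShell` class. A real version would push these filters down into the Datadog, Splunk, or Elasticsearch query language rather than filter in Python.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_window(spec: str) -> timedelta:
    """Turn a directory name like 'last-1h' or 'last-30m' into a timedelta."""
    match = re.fullmatch(r"last-(\d+)([mhd])", spec)
    if not match:
        raise ValueError(f"unrecognized time range: {spec}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


class LogShell:
    def __init__(self, events: list[dict], now: datetime):
        self.events = events
        self.now = now
        self.scope = list(events)  # start unscoped

    def cd(self, path: str) -> int:
        """cd service/last-Nh: keep one service's events inside the time window."""
        service, window = path.split("/")
        cutoff = self.now - parse_window(window)
        self.scope = [e for e in self.events if e["service"] == service and e["ts"] >= cutoff]
        return len(self.scope)

    def grep(self, pattern: str) -> list[str]:
        """Regex search over 'LEVEL message' within the current scope."""
        rx = re.compile(pattern)
        return [e["msg"] for e in self.scope if rx.search(f'{e["level"]} {e["msg"]}')]

    def trace(self, trace_id: str) -> list[str]:
        """Follow one request across every service, ignoring the current scope."""
        hops = sorted((e for e in self.events if e.get("trace_id") == trace_id), key=lambda e: e["ts"])
        return [f'{e["service"]}: {e["msg"]}' for e in hops]


NOW = datetime(2026, 3, 4, 12, 0)
EVENTS = [
    {"ts": NOW - timedelta(minutes=5), "service": "api-server", "level": "ERROR",
     "msg": "upstream timeout", "trace_id": "req-1"},
    {"ts": NOW - timedelta(minutes=20), "service": "api-server", "level": "INFO",
     "msg": "GET /orders 200", "trace_id": "req-2"},
    {"ts": NOW - timedelta(hours=3), "service": "api-server", "level": "ERROR",
     "msg": "db connection reset", "trace_id": "req-0"},
    {"ts": NOW - timedelta(minutes=6), "service": "payments", "level": "ERROR",
     "msg": "card processor returned 503", "trace_id": "req-1"},
]
logs = LogShell(EVENTS, NOW)
print(logs.cd("api-server/last-1h"))  # 2
print(logs.grep("ERROR"))             # ['upstream timeout']
print(logs.trace("req-1"))            # ['payments: card processor returned 503', 'api-server: upstream timeout']
```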

Why it works: Observability data has natural hierarchy (service → time range → severity → individual events) but every platform exposes it as a query builder instead of a navigable space.

The business case: On-call engineers spend most of their incident response time on triage, not resolution. An agent that can scope, search, and trace in structured commands turns a 45-minute triage into a 5-minute one — and makes 3am pages survivable.

8. DocShell — Large Document Navigator

The problem: PDFs, legal contracts, SEC filings — anything longer than a context window. Agents currently get truncated chunks and lose spatial awareness of where they are in the document.

The shell: Sections and headings are directories. Paragraphs are files. cd "Article 7/Indemnification" scopes to a section. find --type definition locates defined terms. diff v1.docx v2.docx shows redlines structurally. xref "Force Majeure" finds every cross-reference.
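A minimal sketch of `cd` and `xref`, assuming the document's headings have already been extracted into `#`-style markdown. PDF and .docx extraction is the hard part and isn't shown. The section-tree layout and function names are hypothetical, and `xref` here is a plain substring match rather than real cross-reference parsing.

```python
from __future__ import annotations

import re


def parse_sections(text: str) -> dict:
    """Build a heading tree from '#'-style headings; body lines attach to the latest heading."""
    root = {"title": "", "body": [], "children": {}}
    stack = [(0, root)]  # (heading level, node)
    for line in text.splitlines():
        m = re.match(r"^(#+)\s+(.*)", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            while stack[-1][0] >= level:  # close sections at the same or deeper level
                stack.pop()
            node = {"title": title, "body": [], "children": {}}
            stack[-1][1]["children"][title] = node
            stack.append((level, node))
        elif line.strip():
            stack[-1][1]["body"].append(line.strip())
    return root


def cd(root: dict, path: str) -> dict:
    """cd "Article 7/Indemnification": walk the heading tree by title."""
    node = root
    for part in path.split("/"):
        node = node["children"][part]
    return node


def xref(root: dict, term: str, prefix: str = "") -> list[str]:
    """Paths of every section whose own body mentions `term`."""
    hits = []
    for title, child in root["children"].items():
        path = f"{prefix}/{title}" if prefix else title
        if any(term in line for line in child["body"]):
            hits.append(path)
        hits.extend(xref(child, term, path))
    return hits


CONTRACT = """\
# Article 1
"Force Majeure" means any event beyond a party's reasonable control.
# Article 7
## Indemnification
Each party shall indemnify the other, except in cases of Force Majeure.
## Limitation of Liability
Liability is capped at the fees paid.
"""
root = parse_sections(CONTRACT)
print(cd(root, "Article 7/Indemnification")["body"])
print(xref(root, "Force Majeure"))  # ['Article 1', 'Article 7/Indemnification']
```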

Why it works: Documents already have hierarchical structure (TOC, headings, sections). The shell makes that structure navigable instead of forcing agents to process content linearly.

The business case: Legal review, due diligence, and regulatory compliance all involve agents processing documents that exceed context windows. A navigable document structure means the agent can answer "what does the indemnification clause say?" without ingesting 200 pages.

9. MailShell — Email Thread Navigator

The problem: Email is deceptively hard for agents. They have to untangle threading, attachments, reply chains, CC dynamics, and the social graph embedded in headers. IMAP and the Gmail API mostly hand back flat lists of messages, leaving agents to reconstruct the conversational structure.

The shell: Inbox is a directory. Threads are subdirectories. Messages are files. cd thread-xyz enters a conversation. ls --from=nick@connectifi.co --since=1w filters. attachments lists files across a thread. participants shows the social graph. find --has-attachment --unread replaces complex query syntax.
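Threads can be rebuilt from standard headers. Here's a simplified sketch using Python's `email` module. It groups messages by the first Message-ID in References, falling back to In-Reply-To and then to the message's own Message-ID. `participants` collects every address in From, To, and Cc. This simplifies proper threading algorithms such as JWZ's, which also repair broken reply chains. The sample messages are invented.

```python
from __future__ import annotations

from email import message_from_string
from email.message import Message
from email.utils import getaddresses


def thread_key(msg: Message) -> str:
    """Oldest ancestor in References, else the parent in In-Reply-To, else the message itself."""
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    return (msg.get("In-Reply-To") or msg.get("Message-ID") or "").strip()


def build_threads(raw_messages: list[str]) -> dict[str, list[Message]]:
    """Group raw RFC 5322 messages into threads keyed by their root Message-ID."""
    threads: dict[str, list[Message]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads.setdefault(thread_key(msg), []).append(msg)
    return threads


def participants(thread: list[Message]) -> list[str]:
    """participants: every address in From, To, or Cc across a thread."""
    fields = [value for m in thread for h in ("From", "To", "Cc") for value in m.get_all(h, [])]
    return sorted({addr.lower() for _name, addr in getaddresses(fields) if addr})


RAW = [
    "Message-ID: <a@example.com>\nFrom: Nick <nick@example.com>\nTo: alex@example.com\n"
    "Subject: Pricing\n\nCan we talk tiers?\n",
    "Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\nReferences: <a@example.com>\n"
    "From: alex@example.com\nTo: nick@example.com\nCc: dana@example.com\n"
    "Subject: Re: Pricing\n\nSure, Thursday?\n",
    "Message-ID: <c@example.com>\nFrom: ops@example.com\nTo: alex@example.com\n"
    "Subject: Outage\n\nResolved.\n",
]
threads = build_threads(RAW)
print(sorted(threads))  # ['<a@example.com>', '<c@example.com>']
print(participants(threads["<a@example.com>"]))
```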

Why it works: Email is already hierarchical (account → folder → thread → message → parts), but few APIs expose it that way.

The business case: AI email assistants are everywhere, but they're all working against flat APIs. A shell that preserves thread structure and social context would make "summarize this thread" and "draft a reply" dramatically more reliable — and stop agents from replying to the wrong person in the chain.

10. MeetingShell — Calendar & Transcript Navigator

The problem: Meeting transcripts, action items, and calendar context are scattered across Zoom, Google Meet, Notion, and calendar apps. No single interface connects "what was discussed" with "what was decided" and "who committed to what."

The shell: ls today shows your schedule. cd standup-2026-03-02 enters a meeting's context. transcript shows what was said. actions lists commitments. attendees shows who was there. find --action-owner=me --status=open finds outstanding commitments across all meetings. grep "pricing decision" --since=1mo searches all transcripts from the past month.
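A minimal sketch of `find --action-owner` and `grep --since`, assuming meetings have already been pulled out of the calendar and transcript tools into one in-memory list. The `Meeting` and `Action` classes, their fields, and the sample data are hypothetical. In practice the hard part is ingestion across those 4-5 apps, which isn't shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Action:
    owner: str
    text: str
    status: str = "open"


@dataclass
class Meeting:
    slug: str
    day: date
    transcript: list[str]
    actions: list[Action] = field(default_factory=list)


def find_actions(meetings: list[Meeting], owner: str, status: str = "open") -> list[tuple[str, str]]:
    """find --action-owner=OWNER --status=STATUS across every meeting."""
    return [(m.slug, a.text) for m in meetings for a in m.actions
            if a.owner == owner and a.status == status]


def grep(meetings: list[Meeting], phrase: str, since: timedelta, today: date) -> list[tuple[str, str]]:
    """Case-insensitive phrase search over transcripts of meetings held within `since` of `today`."""
    cutoff = today - since
    return [(m.slug, line) for m in meetings if m.day >= cutoff
            for line in m.transcript if phrase.lower() in line.lower()]


MEETINGS = [
    Meeting("standup-2026-03-02", date(2026, 3, 2),
            ["Dana: the pricing decision is final, we keep three tiers", "Alex: I'll update the docs"],
            [Action("alex", "Update pricing docs"), Action("dana", "Email customers", "done")]),
    Meeting("planning-2026-01-15", date(2026, 1, 15),
            ["Raj: no pricing decision yet"],
            [Action("alex", "Draft Q1 roadmap")]),
]
print(find_actions(MEETINGS, "alex"))
print(grep(MEETINGS, "pricing decision", timedelta(days=30), date(2026, 3, 4)))
```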

Why it works: Meetings have natural hierarchy (calendar → event → transcript → segments → action items) but the data is siloed across 4-5 apps.

The business case: The average knowledge worker spends 31 hours per month in meetings. The institutional memory from those meetings evaporates within days. An agent that can search across all your meeting history and surface outstanding commitments turns meetings from a time sink into a queryable knowledge base.

Which One Should I Build Next?

I'm going to build one of these. Here's how you can help:

🗳️ Vote for your top pick: strawpoll.com/YVyPvbVa2gN

Or better yet:

Build one yourself. DOMShell is open source and the MCP server pattern is reusable. Fork it, swap the Chrome extension for a different data source, and the shell primitives carry over.
Tell me I'm wrong. Maybe there's an 11th interface I haven't thought of that's even more broken for agents.
Tell me you've already built it. I'd love to see prior art.

Drop a comment, vote, or reach out. The filesystem metaphor is a 50-year-old idea — it's just taken us this long to realize it's the right abstraction for AI agents too.

Links:

DOMShell: github.com/apireno/DOMShell
npm: npx @apireno/domshell
Full benchmark data: DOMShell vs CiC experiment

Built by Pireno. I do fractional CTO/CPO work helping teams ship AI-native products — if your agents are hitting an orientation wall on a specific stack, let's figure out which shell would unlock the most value.

Top comments (1)

Alessandro Pireno • Mar 4

For anyone who wants to go deeper on the GraphShell or DataShell ideas: I've spent time at Snowflake and building on SurrealDB, and the schema discovery problem is real whether it's a human analyst or an AI agent doing the exploring. The filesystem metaphor isn't theoretical: in my testing, DOMShell already cut API calls roughly in half for browser automation.

If you've built something similar for any of these interfaces, I'd love to compare notes. And if you have a strong opinion on which one I should tackle next, vote here: strawpoll.com/YVyPvbVa2gN