<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alessandro Pireno</title>
    <description>The latest articles on DEV Community by Alessandro Pireno (@apireno).</description>
    <link>https://dev.to/apireno</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3795494%2F2bfaaf38-23c5-471c-8422-3f1ef2eb6d21.jpg</url>
      <title>DEV Community: Alessandro Pireno</title>
      <link>https://dev.to/apireno</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/apireno"/>
    <language>en</language>
    <item>
      <title>Why Your AI Agent Needs to Argue With Itself</title>
      <dc:creator>Alessandro Pireno</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:07:05 +0000</pubDate>
      <link>https://dev.to/apireno/why-your-ai-agent-needs-to-argue-with-itself-4idc</link>
      <guid>https://dev.to/apireno/why-your-ai-agent-needs-to-argue-with-itself-4idc</guid>
      <description>&lt;p&gt;&lt;em&gt;A 600-line bash script, four AI personas, zero shared context. 52 ideas in 4 minutes.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Back when I was running engineering and product teams I used to drag people from all over the office into ideation sessions. Architects, product folks, the security guy who never wanted to be there. We'd cover the walls in sticky notes, draw terrible diagrams on whiteboards, and argue about approaches until something clicked. It took me years to find a version of the &lt;a href="https://www.ideo.com/" rel="noopener noreferrer"&gt;IDEO&lt;/a&gt; process that kept things loose enough to stay creative but structured enough to actually produce something useful.&lt;/p&gt;

&lt;p&gt;Now I run a one-person company. No VP of Engineering. No product team. Nobody to pull into a conference room. But this morning four senior executives independently generated 52 ideas, cross-voted on each other's proposals with mandatory improvement suggestions, merged overlapping concepts, and produced ranked PRDs. While I was taking my kids to school.&lt;/p&gt;

&lt;p&gt;They're AI personas. We lost the sticky notes and the bad whiteboard drawings, but the workflow still works. And the output looks nothing like what a single prompt produces.&lt;/p&gt;

&lt;h2&gt;
  
  
  The single-prompt trap
&lt;/h2&gt;

&lt;p&gt;You've done this. You ask an LLM for 10 ideas and you get 10 ideas. They're fine. They all sound like they came from the same person because they did. No tension. No cross-pollination. Nobody saying "that's clever but have you considered the security implications?"&lt;/p&gt;

&lt;p&gt;The problem isn't intelligence. It's perspective diversity. One LLM instance optimizes for coherence within a single worldview. It won't generate an idea and then attack it from a completely different angle because that would be incoherent within the same generation.&lt;/p&gt;

&lt;p&gt;You can't get genuine disagreement from a single context window.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: zero shared context
&lt;/h2&gt;

&lt;p&gt;Run multiple LLM instances. Give each its own process and its own system prompt, but the same goal. No persona sees another's output until voting begins.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: ideate    4 independent LLM calls (one per persona)
Phase 2: vote      4 independent LLM calls (each sees others' ideas, not their own)
Phase 3: merge     1 facilitator call (consolidates + ranks)
Phase 4: produce   1 call (VP Product drafts PRDs from winners)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's an IDEO-style sprint. Ideate without judgment, then converge through structured critique. Automated end-to-end in bash.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personas that create friction
&lt;/h2&gt;

&lt;p&gt;Each persona is a markdown file. Not "pretend you're an engineer." Specific concerns that collide productively.&lt;/p&gt;

&lt;p&gt;VP Engineering cares about architecture purity, test coverage, performance budgets.&lt;br&gt;
VP Product cares about user value, adoption paths, metric impact.&lt;br&gt;
VP Security cares about attack surfaces, data integrity, threat models.&lt;br&gt;
VP DevOps cares about operability, deployment complexity, monitoring.&lt;/p&gt;

&lt;p&gt;Same problem. Four different entry points. Engineering proposes a trie-based type classifier. Product proposes a feedback flywheel. Security proposes negative constraints that block misclassification. DevOps proposes a parallelized pipeline with cold-storage caching.&lt;/p&gt;

&lt;p&gt;52 ideas from 4 personas. Not 10 variations of the same idea from one.&lt;/p&gt;
&lt;h2&gt;
  
  
  Voting is where it gets interesting
&lt;/h2&gt;

&lt;p&gt;Phase 2 is what separates this from running the same prompt four times. Each persona votes on &lt;em&gt;other personas' ideas only&lt;/em&gt;. You can't vote for your own. Every vote must come with an improvement suggestion.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
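  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;you have exactly 3 votes&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;you cannot vote for your own ideas&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;for every vote, suggest a specific improvement&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;the improvement must make the idea stronger&lt;/span&gt;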
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;VP Security votes for VP Engineering's type classifier and adds: &lt;em&gt;include negative suffix tries to prevent type-spoofing.&lt;/em&gt; VP Engineering votes for VP Product's feedback flywheel and adds: &lt;em&gt;algorithmically calculate ROC-optimized confidence thresholds from the ground truth.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The ideas get better through voting. Not just ranked.&lt;/p&gt;
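
&lt;p&gt;Rules like these can also be checked mechanically before the merge phase. The sketch below is not the template's code: it assumes a made-up ballot format of one line per vote, &lt;code&gt;VOTE: owner/idea-id | IMPROVEMENT: text&lt;/code&gt;, and the name &lt;code&gt;check_ballot&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
# Hypothetical ballot validator. The ballot format is an assumption:
#   VOTE: owner/idea-id | IMPROVEMENT: text
# Usage: check_ballot VOTER BALLOT_FILE
check_ballot() {
    local voter="$1" ballot="$2" votes own bare

    # Count all votes, votes for the voter's own ideas, and votes that
    # carry no improvement text. "|| true" keeps a zero count from
    # tripping "set -e", since grep exits non-zero when nothing matches.
    votes="$(grep -c '^VOTE: ' "$ballot" || true)"
    own="$(grep -c "^VOTE: ${voter}/" "$ballot" || true)"
    bare="$(grep '^VOTE: ' "$ballot" | grep -vc '| IMPROVEMENT: [^ ]' || true)"

    if [ "$votes" -ne 3 ]; then
        echo "rejected: expected exactly 3 votes, got $votes"
        return 1
    fi
    if [ "$own" -ne 0 ]; then
        echo "rejected: $voter voted for their own idea"
        return 1
    fi
    if [ "$bare" -ne 0 ]; then
        echo "rejected: $bare vote(s) missing an improvement"
        return 1
    fi
    echo "ok"
}
```

&lt;p&gt;A ballot that fails the check could simply be regenerated, which would keep malformed votes out of the merge.&lt;/p&gt;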

&lt;h2&gt;
  
  
  Real numbers
&lt;/h2&gt;

&lt;p&gt;I ran this on a knowledge extraction problem: pushing deterministic entity extraction from 47.7% recall toward 60%. Constraints: zero LLM calls, sub-3-second latency, YAML-configurable.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Idea&lt;/th&gt;
&lt;th&gt;Votes&lt;/th&gt;
&lt;th&gt;Merged from&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Deterministic Classification Engine&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 ideas across Eng, Prod, Sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;The Emergent Flywheel&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3 ideas across Prod, Eng, DevOps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Passive Entity Harvester&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4 ideas across Eng, Prod, Sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Cross-Sentence Entity Registry&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1 idea from Prod&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Vectorized Aho-Corasick Engine&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1 idea from Eng&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;52 raw ideas. 15 unique after merge. Top 5 with cross-functional votes. PRD drafts ready to go.&lt;/p&gt;

&lt;p&gt;The top idea was independently proposed by three personas from three angles: a trie-based classifier, a subtype pruner, and type-confusion guardrails. The merge revealed they were all solving the same problem from different entry points. The combined concept was stronger than any individual proposal.&lt;/p&gt;

&lt;p&gt;No single prompt produces that convergence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;

&lt;p&gt;It's a bash script. Heredoc prompts are piped to Gemini CLI (or Claude CLI; the script auto-detects which one is installed). Each phase writes its output to a file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Phase 1: each persona ideates independently&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;PERSONA &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$PERSONAS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;PROMPT&lt;/span&gt;&lt;span class="sh"&gt;
You are adopting the following persona:
&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"docs/personas/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PERSONA&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.md"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;

Generate as many ideas as you can (aim for 8-15).
Think from your persona's unique perspective.
No idea is too crazy. Do NOT evaluate, just generate.

=== GOAL ===
&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$GOAL_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;PROMPT

&lt;/span&gt;    &lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"phase1-ideas-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PERSONA&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.md"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Phase 2 compiles all ideas, builds a document for each persona containing &lt;em&gt;only other personas' ideas&lt;/em&gt;, and asks for votes. Phase 3 merges and tallies. Phase 4 writes PRDs from the winners.&lt;/p&gt;
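
&lt;p&gt;The ballot-building step is what keeps a persona from ever seeing, and voting on, its own ideas. A minimal sketch, assuming Phase 1 left one &lt;code&gt;phase1-ideas-NAME.md&lt;/code&gt; file per persona in a single directory; the helper name &lt;code&gt;build_ballot&lt;/code&gt; and the section headers it prints are placeholders, not what the template ships.&lt;/p&gt;

```shell
# Hypothetical helper, not the template's actual code.
# Assumes Phase 1 left one phase1-ideas-NAME.md file per persona in DIR.
# Usage: build_ballot DIR VOTER PERSONA...
build_ballot() {
    local dir="$1" voter="$2" persona
    shift 2
    for persona in "$@"; do
        # Leave out the voter's own ideas so a self-vote is impossible.
        if [ "$persona" = "$voter" ]; then
            continue
        fi
        printf '=== IDEAS FROM %s ===\n' "$persona"
        cat "$dir/phase1-ideas-${persona}.md"
        printf '\n'
    done
}
```

&lt;p&gt;Calling it once per persona, with that persona as the voter, would yield four ballots that each contain three personas' ideas.&lt;/p&gt;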

&lt;p&gt;Runtime: about 4 minutes. Cost: a few cents, or zero on Gemini's free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's open source
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/apireno/agent-workflow-template" rel="noopener noreferrer"&gt;agent workflow template&lt;/a&gt; includes:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ideo-sprint.sh&lt;/code&gt;: the full IDEO sprint (this article)&lt;br&gt;
&lt;code&gt;vp-review.sh&lt;/code&gt;: automated VP reviews of sprint plans and dev reports&lt;br&gt;
Persona templates with customizable VP definitions&lt;br&gt;
A sprint workflow covering the plan, review, execute, evaluate lifecycle&lt;/p&gt;

&lt;p&gt;Drop &lt;code&gt;scripts/agentic/&lt;/code&gt; and &lt;code&gt;docs/personas/&lt;/code&gt; into any repo. Multi-agent workflow running in your terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger pattern
&lt;/h2&gt;

&lt;p&gt;Honestly the thing I miss most about running teams is the creative friction. Getting smart people in a room who see the same problem differently and letting them go at it. That energy is hard to replicate.&lt;/p&gt;

&lt;p&gt;This was my attempt to bring some of that back into a world where I'm building solo with agents. We don't have the sticky notes or the someone-brought-donuts energy but the core of it still works. Give independent perspectives the same problem, let them collide, and see what survives.&lt;/p&gt;

&lt;p&gt;I've started applying the same idea beyond brainstorming. Code review where an architect persona and a security persona independently review the same PR. Decision making where personas evaluate a proposal before seeing each other's takes. Risk analysis where domain experts independently flag concerns and then merge.&lt;/p&gt;

&lt;p&gt;Always the same pattern. Diverge independently. Converge through structured critique.&lt;/p&gt;

&lt;p&gt;Your AI agent doesn't need to be smarter. It needs to argue with itself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The &lt;a href="https://github.com/apireno/agent-workflow-template" rel="noopener noreferrer"&gt;agent workflow template&lt;/a&gt; is open source. Works with Gemini CLI, Claude CLI, or both. Contributions welcome.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>10 Things That Need a Shell: Where the Filesystem Metaphor Could Fix Agent Interfaces</title>
      <dc:creator>Alessandro Pireno</dc:creator>
      <pubDate>Wed, 04 Mar 2026 12:11:25 +0000</pubDate>
      <link>https://dev.to/apireno/10-things-that-need-a-shell-where-the-filesystem-metaphor-could-fix-agent-interfaces-628</link>
      <guid>https://dev.to/apireno/10-things-that-need-a-shell-where-the-filesystem-metaphor-could-fix-agent-interfaces-628</guid>
      <description>&lt;h2&gt;
  
  
  The Pattern That Worked
&lt;/h2&gt;

&lt;p&gt;I recently shipped &lt;a href="https://github.com/apireno/DOMShell" rel="noopener noreferrer"&gt;DOMShell&lt;/a&gt; — an MCP server that maps Chrome's Accessibility Tree to a virtual filesystem. Instead of feeding agents screenshots or raw HTML, it lets them &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cd&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, and &lt;code&gt;click&lt;/code&gt; their way through web pages.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;2× fewer API calls&lt;/strong&gt; than screenshot-based browsing in controlled testing with Claude (4 tasks, 8 trials). The filesystem metaphor gave the model a spatial map of the page, so it spent less time exploring and more time extracting.&lt;/p&gt;

&lt;p&gt;The insight underneath is simple: &lt;strong&gt;agents waste most of their cycles on orientation, not action.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current playbook — pump screenshots into a vision model, dump 50k tokens of raw HTML into the context window, or chain brittle CSS selectors — treats the model as a brute-force parser. It works until it doesn't, and when it fails, it fails silently. You don't get an error. You get a confident wrong answer and a $4 API bill.&lt;/p&gt;

&lt;p&gt;When you give agents a navigable, scoped, low-entropy interface instead of a high-entropy dump of raw data, they get dramatically more efficient. Not incrementally — &lt;em&gt;structurally&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That got me thinking: where else are agents hitting the same wall?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Filesystem Metaphor, Generalized
&lt;/h2&gt;

&lt;p&gt;The pattern has three primitives that LLMs already deeply understand from training data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope&lt;/strong&gt; (&lt;code&gt;cd&lt;/code&gt;) — Narrow your working context to reduce noise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt; (&lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;) — See what's available within that scope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act&lt;/strong&gt; (&lt;code&gt;cat&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;call&lt;/code&gt;) — Do something with what you found&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every interface below has the same fundamental problem: agents don't have a structured way to scope, discover, and act. They get a flat API or a firehose of data and burn calls trying to orient themselves.&lt;/p&gt;
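
&lt;p&gt;To make the three primitives concrete, here is a small sketch that runs them against a real directory tree. The tree stands in for any structured interface; its layout and file contents are invented for illustration.&lt;/p&gt;

```shell
# Throwaway directory tree standing in for a structured interface.
# The layout and file contents are invented for illustration.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/users/orders" "$demo_root/billing/invoices"
printf 'GET /users/{id}\nreturns: id, name, email\n' > "$demo_root/users/get.txt"
printf 'GET /users/{id}/orders\nreturns: order_id, total\n' > "$demo_root/users/orders/list.txt"
printf 'GET /billing/invoices\nreturns: invoice_id, email\n' > "$demo_root/billing/invoices/list.txt"

# Scope: narrow the working context to one subtree.
cd "$demo_root/users"

# Discover: list what exists here, then search only within this scope.
ls
grep -rl 'email' .

# Act: read the one thing that matched.
cat get.txt
```

&lt;p&gt;The billing file also mentions email, yet the scoped search never sees it. That is the point of &lt;code&gt;cd&lt;/code&gt;: less noise per call.&lt;/p&gt;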

&lt;p&gt;Here are 10 interfaces that could benefit from the shell treatment. I want to build the next one — &lt;strong&gt;help me pick which&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. GraphShell — Knowledge Graph Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Graph databases (Neo4j, Neptune, TigerGraph) are powerful but agents struggle with schema discovery and pathfinding. Cypher and Gremlin are expressive query languages, but agents don't know what nodes or relationships exist until they explore — and exploration is expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Nodes become directories. Edges become navigable links with typed relationships. &lt;code&gt;ls&lt;/code&gt; shows adjacent nodes with relationship types and properties. &lt;code&gt;cd&lt;/code&gt; traverses edges. &lt;code&gt;find --type Person --depth 3&lt;/code&gt; does bounded breadth-first search. &lt;code&gt;path A B&lt;/code&gt; computes shortest paths. &lt;code&gt;schema&lt;/code&gt; shows the full ontology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Graphs are already hierarchical-ish — they just need scoping. An agent exploring a company knowledge graph could &lt;code&gt;cd Company/Acme&lt;/code&gt; → &lt;code&gt;ls --rel EMPLOYS&lt;/code&gt; → &lt;code&gt;cd employees/jane&lt;/code&gt; → &lt;code&gt;ls --rel MANAGES&lt;/code&gt; instead of writing a 6-line Cypher query to get the same traversal. Having spent time building on SurrealDB — a multi-model database with graph capabilities — I watched developers struggle with exactly this: the data was richly connected, but every query required you to already know the shape of the graph. Agents hit the same wall, harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; Knowledge graph queries are the backbone of fraud detection, recommendation engines, and supply chain mapping. An agent that can traverse a graph in 4 calls instead of 12 doesn't just save API cost — it makes real-time graph-powered workflows viable for the first time.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. APIShell — REST/GraphQL Endpoint Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Agents are terrible at API discovery. They hallucinate endpoints, guess parameter names, and burn calls on 404s. Even with an OpenAPI spec in context, they struggle to chain calls correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Ingests OpenAPI specs or GraphQL introspection and presents endpoints as a filesystem: &lt;code&gt;/users/&lt;/code&gt;, &lt;code&gt;/users/{id}/orders/&lt;/code&gt;, &lt;code&gt;/users/{id}/orders/{id}/items/&lt;/code&gt;. &lt;code&gt;ls&lt;/code&gt; shows available operations and parameters. &lt;code&gt;cat&lt;/code&gt; shows schema. &lt;code&gt;call GET /users/123&lt;/code&gt; executes. Agents &lt;code&gt;cd&lt;/code&gt; into nested resources and &lt;code&gt;find&lt;/code&gt; across the API surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; REST APIs are already hierarchical — resources nest inside resources. The shell makes that nesting navigable instead of requiring the agent to memorize the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; Every agent-powered integration — CRM sync, payment processing, data pipeline orchestration — starts with API discovery. Cut the discovery overhead and you cut the cost of every downstream automation.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. K8Shell — Kubernetes Cluster Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; kubectl output is verbose and unstructured, and agents constantly run the wrong command or parse the output incorrectly. Getting the status of a single deployment requires chaining 3-4 commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Namespaces are top-level directories. Resource types are subdirectories. Individual resources are files. &lt;code&gt;cd production/deployments/api-server&lt;/code&gt; then &lt;code&gt;cat&lt;/code&gt; shows status, replicas, and image version. &lt;code&gt;logs&lt;/code&gt; streams container output. &lt;code&gt;find --status CrashLoopBackOff&lt;/code&gt; searches a whole cluster in one call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Kubernetes already has a natural hierarchy (cluster → namespace → resource type → resource → containers). It's just not exposed as navigable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; MTTR is the metric that matters. An agent that can find a crashing pod, pull its logs, and identify the root cause in 3 calls instead of 10 turns a 20-minute incident into a 2-minute one. For a startup running $50k/month in compute, finding zombie resources alone could pay for the tooling.&lt;/p&gt;
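
&lt;p&gt;As a rough sketch of the translation such a shell might do, the function below turns a virtual path into the kubectl call behind &lt;code&gt;cat&lt;/code&gt;. The function name and the path layout (namespace, then resource type, then name) are assumptions; it prints the command rather than running it, so no cluster is needed.&lt;/p&gt;

```shell
# Hypothetical translation layer: map a virtual path such as
# production/deployments/api-server to the kubectl call behind "cat".
# It only prints the command, so it runs without a cluster.
k8s_cat_command() {
    local path="$1" namespace rest kind name
    namespace="${path%%/*}"   # text before the first slash
    rest="${path#*/}"         # everything after the first slash
    kind="${rest%%/*}"        # resource type, e.g. deployments
    name="${rest#*/}"         # resource name, e.g. api-server
    echo "kubectl get $kind $name --namespace $namespace --output yaml"
}

k8s_cat_command production/deployments/api-server
```

&lt;p&gt;A real shell would run the command and trim the YAML down to the few fields an agent needs, which is where most of the savings would come from.&lt;/p&gt;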




&lt;h3&gt;
  
  
  4. CloudShell — AWS/GCP/Azure Resource Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Cloud consoles are the worst agent interface — hundreds of services, thousands of resources, nested in regions and accounts. An agent needs 10+ API calls just to orient itself in an AWS account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; &lt;code&gt;/us-east-1/ec2/instances/&lt;/code&gt;, &lt;code&gt;/us-east-1/rds/databases/&lt;/code&gt;, &lt;code&gt;/global/iam/roles/&lt;/code&gt;. &lt;code&gt;ls&lt;/code&gt; shows resources with key metadata inline. &lt;code&gt;find --type security-group --port 22&lt;/code&gt; finds open SSH across every region. &lt;code&gt;tree&lt;/code&gt; shows the blast radius of a VPC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Cloud resources have natural hierarchy (region → service → resource type → resource) but every cloud provider's API requires you to specify the scope upfront rather than navigate to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; Security audits, cost optimization, and compliance checks all require cross-account, cross-region visibility. An agent that can &lt;code&gt;find --type security-group --port 22&lt;/code&gt; across your entire AWS org in one call replaces a week of manual audit work.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. GitShell — Repository History Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Git's CLI is powerful but output is unstructured text. Agents struggle with log parsing, diff interpretation, and blame navigation. Merge conflict resolution — where agents currently fail catastrophically — is a chain of poorly-structured interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Branches are directories. Commits are navigable nodes. &lt;code&gt;cd main/HEAD~5&lt;/code&gt; puts you at a point in time. &lt;code&gt;diff&lt;/code&gt; shows changes in structured format. &lt;code&gt;blame function_name&lt;/code&gt; traces authorship. &lt;code&gt;find --author=alex --since=2w --path=src/&lt;/code&gt; replaces gnarly git log incantations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Git history is a DAG — it's already a graph structure. The shell linearizes it into something navigable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; AI-assisted code review and automated PR summaries are already shipping. The bottleneck is agent comprehension of &lt;em&gt;change context&lt;/em&gt; — not just what changed, but why and who else touched it. A navigable git history makes those workflows reliable instead of approximate.&lt;/p&gt;




&lt;h3&gt;
  
  
  6. DataShell — Database Schema &amp;amp; Query Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Agents writing SQL against unfamiliar databases waste most of their calls on INFORMATION_SCHEMA queries and DESCRIBE TABLE to understand what they're working with. They build a mental model one table at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Schemas are directories. Tables are subdirectories. Columns are files. &lt;code&gt;cd analytics/fact_orders&lt;/code&gt; then &lt;code&gt;ls&lt;/code&gt; shows columns with types, nullability, foreign keys. &lt;code&gt;sample 10&lt;/code&gt; shows real data. &lt;code&gt;stats revenue&lt;/code&gt; shows distribution. &lt;code&gt;query "SELECT ..."&lt;/code&gt; executes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Databases already have a three-level hierarchy (schema → table → column). The shell makes it navigable and adds introspection primitives that agents currently cobble together from metadata queries. At Snowflake, I watched analysts spend 30% of their time just figuring out which tables and columns existed before they could write a single query. Agents do the same thing, except they burn tokens instead of hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; Text-to-SQL is a $2B+ market growing fast. The accuracy ceiling today is schema comprehension — not query generation. Fix the agent's understanding of the database and the SQL practically writes itself.&lt;/p&gt;
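
&lt;p&gt;For illustration, here is one way an &lt;code&gt;ls&lt;/code&gt; inside &lt;code&gt;analytics/fact_orders&lt;/code&gt; could map onto a standard &lt;code&gt;information_schema&lt;/code&gt; query. The function name is hypothetical and nothing is executed against a database; identifier case rules vary by engine, and a real implementation should bind the values as parameters rather than splice them into the SQL.&lt;/p&gt;

```shell
# Hypothetical translation layer: print the metadata query that an "ls"
# inside SCHEMA/TABLE could run. Nothing is executed against a database.
datashell_ls_query() {
    local path="$1" schema table
    schema="${path%%/*}"   # text before the first slash
    table="${path#*/}"     # text after the first slash
    printf "SELECT column_name, data_type, is_nullable\n"
    printf "FROM information_schema.columns\n"
    printf "WHERE table_schema = '%s' AND table_name = '%s'\n" "$schema" "$table"
    printf "ORDER BY ordinal_position;\n"
}

datashell_ls_query analytics/fact_orders
```

&lt;p&gt;The agent issues one short command and the shell carries the metadata boilerplate, so the schema lookup costs a single call.&lt;/p&gt;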




&lt;h3&gt;
  
  
  7. LogShell — Observability Stack Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Log exploration (Datadog, Splunk, ELK) is one of the hardest agent tasks. The data is temporal and high-volume, and agents don't know which filters to apply until they see what's there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Services are directories. Time ranges are navigable: &lt;code&gt;cd api-server/last-1h&lt;/code&gt;. &lt;code&gt;grep ERROR&lt;/code&gt; searches within scope. &lt;code&gt;find --level=error --service=payments --since=30m&lt;/code&gt; replaces complex query syntax. &lt;code&gt;trace request-id-xyz&lt;/code&gt; follows a distributed trace across services. &lt;code&gt;stats&lt;/code&gt; shows error rate trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Observability data has natural hierarchy (service → time range → severity → individual events) but every platform exposes it as a query builder instead of a navigable space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; On-call engineers spend most of their incident response time on triage, not resolution. An agent that can scope, search, and trace in structured commands turns a 45-minute triage into a 5-minute one — and makes 3am pages survivable.&lt;/p&gt;




&lt;h3&gt;
  
  
  8. DocShell — Large Document Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; PDFs, legal contracts, SEC filings — anything longer than a context window. Agents currently get truncated chunks and lose spatial awareness of where they are in the document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Sections and headings are directories. Paragraphs are files. &lt;code&gt;cd "Article 7/Indemnification"&lt;/code&gt; scopes to a section. &lt;code&gt;find --type definition&lt;/code&gt; locates defined terms. &lt;code&gt;diff v1.docx v2.docx&lt;/code&gt; shows redlines structurally. &lt;code&gt;xref "Force Majeure"&lt;/code&gt; finds every cross-reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Documents already have hierarchical structure (TOC, headings, sections). The shell makes that structure navigable instead of forcing agents to process content linearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; Legal review, due diligence, and regulatory compliance all involve agents processing documents that exceed context windows. A navigable document structure means the agent can answer "what does the indemnification clause say?" without ingesting 200 pages.&lt;/p&gt;




&lt;h3&gt;
  
  
  9. MailShell — Email Thread Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Email is deceptively hard for agents: threading, attachments, reply chains, CC dynamics, and the social graph embedded in headers all carry meaning. IMAP and Gmail APIs return flat lists that lose conversational structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; Inbox is a directory. Threads are subdirectories. Messages are files. &lt;code&gt;cd thread-xyz&lt;/code&gt; enters a conversation. &lt;code&gt;ls --from=nick@connectifi.co --since=1w&lt;/code&gt; filters. &lt;code&gt;attachments&lt;/code&gt; lists files across a thread. &lt;code&gt;participants&lt;/code&gt; shows the social graph. &lt;code&gt;find --has-attachment --unread&lt;/code&gt; replaces complex query syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Email is already hierarchical (account → folder → thread → message → parts) but no API exposes it that way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; AI email assistants are everywhere, but they're all working against flat APIs. A shell that preserves thread structure and social context would make "summarize this thread" and "draft a reply" dramatically more reliable — and stop agents from replying to the wrong person in the chain.&lt;/p&gt;




&lt;h3&gt;
  
  
  10. MeetingShell — Calendar &amp;amp; Transcript Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Meeting transcripts, action items, and calendar context are scattered across Zoom, Google Meet, Notion, and calendar apps. No single interface connects "what was discussed" with "what was decided" and "who committed to what."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shell:&lt;/strong&gt; &lt;code&gt;ls today&lt;/code&gt; shows your schedule. &lt;code&gt;cd standup-2026-03-02&lt;/code&gt; enters a meeting's context. &lt;code&gt;transcript&lt;/code&gt; shows what was said. &lt;code&gt;actions&lt;/code&gt; lists commitments. &lt;code&gt;attendees&lt;/code&gt; shows who was there. &lt;code&gt;find --action-owner=me --status=open&lt;/code&gt; across all meetings finds outstanding commitments. &lt;code&gt;grep "pricing decision" --since=1m&lt;/code&gt; searches all recent transcripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Meetings have natural hierarchy (calendar → event → transcript → segments → action items) but the data is siloed across 4-5 apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The business case:&lt;/strong&gt; The average knowledge worker spends 31 hours per month in meetings. The institutional memory from those meetings evaporates within days. An agent that can search across all your meeting history and surface outstanding commitments turns meetings from a time sink into a queryable knowledge base.&lt;/p&gt;




&lt;h2&gt;
  
  
  Which One Should I Build Next?
&lt;/h2&gt;

&lt;p&gt;I'm going to build one of these. Here's how you can help:&lt;/p&gt;

&lt;p&gt;🗳️ &lt;strong&gt;&lt;a href="https://strawpoll.com/YVyPvbVa2gN" rel="noopener noreferrer"&gt;Vote for your top pick →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Or better yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build one yourself.&lt;/strong&gt; DOMShell is open source and the MCP server pattern is reusable. Fork it, swap the Chrome extension for a different data source, and the shell primitives carry over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tell me I'm wrong.&lt;/strong&gt; Maybe there's an 11th interface I haven't thought of that's even more broken for agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tell me you've already built it.&lt;/strong&gt; I'd love to see prior art.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment, vote, or reach out. The filesystem metaphor is a 50-year-old idea — it's just taken us this long to realize it's the right abstraction for AI agents too.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DOMShell: &lt;a href="https://github.com/apireno/DOMShell" rel="noopener noreferrer"&gt;github.com/apireno/DOMShell&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;code&gt;npx @apireno/domshell&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Full benchmark data: &lt;a href="https://github.com/apireno/DOMShell/tree/main/experiments/claude_domshell_vs_cic" rel="noopener noreferrer"&gt;DOMShell vs CiC experiment&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://pireno.com" rel="noopener noreferrer"&gt;Pireno&lt;/a&gt;. I do fractional CTO/CPO work helping teams ship AI-native products — if your agents are hitting an orientation wall on a specific stack, &lt;a href="https://pireno.com" rel="noopener noreferrer"&gt;let's figure out which shell would unlock the most value&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Why I Built a Filesystem for the Browser</title>
      <dc:creator>Alessandro Pireno</dc:creator>
      <pubDate>Thu, 26 Feb 2026 22:04:52 +0000</pubDate>
      <link>https://dev.to/apireno/why-i-built-a-filesystem-for-the-browser-3kpa</link>
      <guid>https://dev.to/apireno/why-i-built-a-filesystem-for-the-browser-3kpa</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built a Filesystem for the Browser
&lt;/h1&gt;

&lt;p&gt;Browser automation for AI agents has an impedance mismatch problem. We feed agents high-entropy noise — raw HTML, pixel screenshots, brittle CSS selectors — and expect them to produce low-entropy, structured actions. The result is fragile, expensive, and full of silent failure modes.&lt;/p&gt;

&lt;p&gt;DOMShell fixes this by exposing the browser's Accessibility Tree as a virtual filesystem. Agents navigate pages with &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cd&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt; — the same commands they already know from every Unix system in their training data. No new API to learn. No screenshots to parse. No selectors to break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Impedance Mismatch
&lt;/h2&gt;

&lt;p&gt;Three approaches dominate browser automation today. All three are engineering mismatches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screenshots + vision models.&lt;/strong&gt; The agent takes a screenshot, sends it to a vision model, gets back pixel coordinates, clicks, takes another screenshot. This burns vision tokens on every action, adds a full round-trip per interaction, and fails silently when coordinates shift. A button at (450, 320) moves to (450, 380) because a cookie banner loaded. The agent clicks empty space and doesn't know it. You debug ghost interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CSS selectors / XPath.&lt;/strong&gt; Trading pixel fragility for structural fragility. &lt;code&gt;#main &amp;gt; div:nth-child(3) &amp;gt; ul &amp;gt; li:first-child &amp;gt; a&lt;/code&gt; breaks when a wrapper div gets added. Even semantic selectors like &lt;code&gt;[data-testid="submit"]&lt;/code&gt; depend on the site's developers having added test IDs. Most haven't. And the agent still needs to reason over raw HTML to write the query — thousands of tokens of noise for a single interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coordinate-based clicks.&lt;/strong&gt; Resolution-dependent. Viewport-dependent. Zoom-dependent. Responsive-layout-dependent. Every variable the agent can't control becomes a failure mode.&lt;/p&gt;

&lt;p&gt;The common problem: all three approaches force the agent to work with a representation that wasn't designed for programmatic navigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Accessibility Tree Is the Right Abstraction
&lt;/h2&gt;

&lt;p&gt;Browsers already solved "navigate this page without looking at it." The Accessibility Tree — the internal representation that screen readers consume — is deterministic, semantic, and compact. Every button knows it's a button. Every link carries its href. Every input has a label and type. No invisible wrapper divs, no CSS noise, no layout-dependent coordinates.&lt;/p&gt;

&lt;p&gt;The AX tree is the low-entropy, structured signal that agents need. The question was how to expose it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Filesystem Metaphor
&lt;/h2&gt;

&lt;p&gt;The AX tree has a natural hierarchy. Container elements (navigation, main content, sidebars, forms) map to directories. Interactive elements (buttons, links, inputs) map to files. The whole structure maps cleanly to a filesystem — and every LLM already knows how to operate a filesystem.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls&lt;/code&gt; to see what's on a page. &lt;code&gt;cd&lt;/code&gt; to scope into a section. &lt;code&gt;cat&lt;/code&gt; to inspect an element. &lt;code&gt;grep&lt;/code&gt; to search. &lt;code&gt;find&lt;/code&gt; to discover by type. &lt;code&gt;click&lt;/code&gt; to interact. &lt;code&gt;text&lt;/code&gt; to bulk-extract. These commands are in every model's training data. Zero-shot usability.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dom@shell:$ cd %here%
✓ Entered tab 386872589
  Title: Wikipedia
  URL:   https://www.wikipedia.org/

dom@shell:$ ls
[d] main/
[d] contentinfo/

dom@shell:$ cd main
dom@shell:$ tree 2
main/
├── [d] top_languages/
│   ├── [x] english_7141000_articles_link
│   ├── [x] deutsch_3099000_artikel_link
│   ├── [x] français_2740000_articles_link
│   └── ...
├── [d] search/
│   └── [x] search_input
└── [x] read_wikipedia_in_your_language_btn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The page is a directory tree. Running &lt;code&gt;submit search_input "Artificial intelligence"&lt;/code&gt; submits the query and the browser navigates. The tree auto-refreshes. You're now looking at the article's filesystem.&lt;/p&gt;

&lt;p&gt;No screenshots. No coordinates. No selectors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aggressive Flattening
&lt;/h2&gt;

&lt;p&gt;The raw AX tree is noisy. Hundreds of wrapper nodes — &lt;code&gt;role=generic&lt;/code&gt;, &lt;code&gt;role=none&lt;/code&gt;, unnamed divs — exist for CSS layout, not semantics. Without filtering, you get listings of &lt;code&gt;generic_1&lt;/code&gt;, &lt;code&gt;generic_2&lt;/code&gt;, &lt;code&gt;generic_3&lt;/code&gt; with no indication of what anything is.&lt;/p&gt;

&lt;p&gt;DOMShell's VFS mapper (&lt;code&gt;vfs_mapper.ts&lt;/code&gt;) recursively flattens through non-semantic nodes, promoting their children up. If a &lt;code&gt;role=generic&lt;/code&gt; node has one child, the child replaces it. The result is a clean tree where every visible element has a name derived from its accessible name and role: &lt;code&gt;submit_btn&lt;/code&gt;, &lt;code&gt;contact_us_link&lt;/code&gt;, &lt;code&gt;email_input&lt;/code&gt;. Duplicates get disambiguated with &lt;code&gt;_2&lt;/code&gt;, &lt;code&gt;_3&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the core design decision. Minimizing node bloat maximizes agent signal-to-noise. Every flattened wrapper node is a token the model doesn't waste reasoning about.&lt;/p&gt;
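
&lt;p&gt;A minimal TypeScript sketch of the flattening and naming pass described above. It is not the code in &lt;code&gt;vfs_mapper.ts&lt;/code&gt;: the &lt;code&gt;AXNode&lt;/code&gt; shape, the wrapper-role list, the role-to-suffix table, and the ASCII-only slug rule are all assumptions made for illustration.&lt;/p&gt;

```typescript
// Hypothetical node shape; DOMShell's real types may differ.
interface AXNode {
  role: string;
  name: string;
  children: AXNode[];
}

// Roles treated as layout-only wrappers (assumed list, taken from the prose).
const WRAPPER_ROLES = ["generic", "none"];

// Assumed role-to-suffix table, inferred from names like submit_btn.
const SUFFIXES: { [role: string]: string } = {
  button: "btn",
  link: "link",
  textbox: "input",
};

// Return the node with its subtree flattened, or, if the node itself is a
// wrapper, return its flattened children so the caller can splice them in.
function flatten(node: AXNode): AXNode[] {
  let kids: AXNode[] = [];
  for (const child of node.children) {
    kids = kids.concat(flatten(child));
  }
  if (WRAPPER_ROLES.indexOf(node.role) !== -1) {
    return kids; // promote children into the wrapper's slot
  }
  return [{ role: node.role, name: node.name, children: kids }];
}

// Build entry names from accessible name + role, adding _2, _3 for duplicates.
function entryNames(nodes: AXNode[]): string[] {
  const seen: { [base: string]: number } = {};
  const names: string[] = [];
  for (const node of nodes) {
    const slug = node.name
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "_") // assumption: ASCII-only slugs
      .replace(/^_+|_+$/g, "");
    const suffix = SUFFIXES[node.role] || node.role;
    const base = slug === "" ? suffix : slug + "_" + suffix;
    const count = (seen[base] || 0) + 1;
    seen[base] = count;
    names.push(count === 1 ? base : base + "_" + count);
  }
  return names;
}

// What a listing of one directory could look like after flattening.
function listDirectory(dir: AXNode): string[] {
  let entries: AXNode[] = [];
  for (const child of dir.children) {
    entries = entries.concat(flatten(child));
  }
  return entryNames(entries);
}

const page: AXNode = {
  role: "main",
  name: "main",
  children: [
    {
      role: "generic",
      name: "",
      children: [
        { role: "generic", name: "", children: [{ role: "button", name: "Submit", children: [] }] },
        { role: "link", name: "Contact us", children: [] },
        { role: "link", name: "Contact us", children: [] },
      ],
    },
    { role: "textbox", name: "Email", children: [] },
  ],
};

// Logs submit_btn, contact_us_link, contact_us_link_2, email_input
console.log(listDirectory(page));
```

&lt;p&gt;In this sketch, wrappers nested to any depth disappear because each recursive call returns a list that the caller splices in place of the wrapper.&lt;/p&gt;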

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwovbjp172u5p9ngke3xu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwovbjp172u5p9ngke3xu.png" alt="DOMShell architecture diagram" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three components, cleanly separated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chrome Extension (the kernel).&lt;/strong&gt; A background service worker runs the shell: command parsing, AX tree traversal via CDP, filesystem mapping, DOM change detection. The side panel is a thin terminal (React + Xterm.js) — no logic, just I/O. The service worker reads the AX tree through &lt;code&gt;chrome.debugger&lt;/code&gt; (Chrome DevTools Protocol 1.3), including cross-iframe discovery via &lt;code&gt;Page.getFrameTree&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Server (the bridge).&lt;/strong&gt; A standalone Node.js HTTP server on localhost:3001 that any MCP-compatible client connects to (Claude Desktop, Claude Code, Cursor, Windsurf, Gemini CLI). It translates MCP tool calls into shell commands, pipes them to the extension over WebSocket (localhost:9876), and streams results back. Multiple clients can connect simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security tiers.&lt;/strong&gt; Read-only by default — agents can browse but not act. Write commands (&lt;code&gt;click&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;scroll&lt;/code&gt;, &lt;code&gt;js&lt;/code&gt;) require &lt;code&gt;--allow-write&lt;/code&gt;. Cookie access (&lt;code&gt;whoami&lt;/code&gt;) requires &lt;code&gt;--allow-sensitive&lt;/code&gt;. Domain allowlists restrict which sites agents can operate on. Every command is audit-logged with timestamps. Auth tokens gate the WebSocket bridge.&lt;/p&gt;

&lt;p&gt;The separation is deliberate. You can use DOMShell interactively without the MCP server. You can let an agent browse your tabs without it being able to click "Delete Account."&lt;/p&gt;
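
&lt;p&gt;The tier check can be sketched in a few lines of TypeScript. The command lists come from the paragraph above; everything else is a hypothetical illustration rather than DOMShell's actual code, including the &lt;code&gt;Flags&lt;/code&gt; shape, the function names, the rule that an empty allowlist means no restriction, and the choice to treat the write and sensitive tiers as independent.&lt;/p&gt;

```typescript
type Tier = "read" | "write" | "sensitive";

// Tier assignments as listed in the prose.
const WRITE_COMMANDS = ["click", "type", "scroll", "js"];
const SENSITIVE_COMMANDS = ["whoami"];

// Hypothetical runtime view of the server's startup flags.
interface Flags {
  allowWrite: boolean; // stands in for --allow-write
  allowSensitive: boolean; // stands in for --allow-sensitive
  allowedDomains: string[]; // assumption: empty list means no restriction
}

function tierOf(command: string): Tier {
  if (SENSITIVE_COMMANDS.indexOf(command) !== -1) return "sensitive";
  if (WRITE_COMMANDS.indexOf(command) !== -1) return "write";
  return "read"; // everything else is treated as read-only
}

function isAllowed(command: string, hostname: string, flags: Flags): boolean {
  // The domain allowlist is checked first, whatever the tier.
  if (flags.allowedDomains.length > 0) {
    if (flags.allowedDomains.indexOf(hostname) === -1) return false;
  }
  const tier = tierOf(command);
  if (tier === "write") return flags.allowWrite;
  if (tier === "sensitive") return flags.allowSensitive;
  return true;
}

const readOnly: Flags = {
  allowWrite: false,
  allowSensitive: false,
  allowedDomains: ["www.wikipedia.org"],
};

console.log(isAllowed("ls", "www.wikipedia.org", readOnly)); // true
console.log(isAllowed("click", "www.wikipedia.org", readOnly)); // false
console.log(isAllowed("ls", "example.com", readOnly)); // false
```

&lt;p&gt;Defaulting unknown commands to the read tier is a simplification; a stricter design would reject any command that is not explicitly classified.&lt;/p&gt;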

&lt;h2&gt;
  
  
  Benchmark: 50% Fewer Tool Calls
&lt;/h2&gt;

&lt;p&gt;I ran 8 trials across 4 tasks using Claude Opus 4.6 with both DOMShell and Anthropic's built-in browser automation (Claude in Chrome, abbreviated CiC below). The metric was tool call count, a proxy for latency and API cost, since each call is a separate round-trip.&lt;/p&gt;

&lt;p&gt;DOMShell averaged &lt;strong&gt;4.3 calls per task&lt;/strong&gt; vs Claude in Chrome's &lt;strong&gt;8.6&lt;/strong&gt; — a &lt;strong&gt;50% reduction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The biggest win was content extraction: DOMShell completed it in 2 calls (navigate + extract) where CiC needed 5-6. The filesystem metaphor lets the agent scope to the right section (&lt;code&gt;cd main/article&lt;/code&gt;) and bulk-extract (&lt;code&gt;text&lt;/code&gt;) in a single call, rather than navigating through &lt;code&gt;read_page&lt;/code&gt; results iteratively.&lt;/p&gt;

&lt;p&gt;Where CiC holds an edge is raw JavaScript execution — &lt;code&gt;javascript_exec&lt;/code&gt; can batch multiple DOM operations into a single call. DOMShell's counter is the &lt;code&gt;for&lt;/code&gt; + &lt;code&gt;script&lt;/code&gt; + &lt;code&gt;each&lt;/code&gt; pipeline, which collapses multi-page workflows into 1-2 calls by iterating over command output and replaying saved scripts across URLs.&lt;/p&gt;

&lt;p&gt;The 50% reduction translates directly to cost and latency. For agents running in production — where every tool call is an API round-trip — halving the call count is a meaningful operational improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;DOMShell is open source (MIT) and free. On the roadmap: a headless mode — a self-contained Chromium process that agents launch directly for CI pipelines and server-side automation where no visible browser is needed.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;for&lt;/code&gt; + &lt;code&gt;script&lt;/code&gt; + &lt;code&gt;each&lt;/code&gt; pipeline is where the compound efficiency gains live. Save a command sequence as a script, replay it across N URLs in a single call. Roughly 2N tool calls become 2, so the call count drops from O(N) to O(1). For any agent doing research, extraction, or monitoring across multiple pages, that's a step change.&lt;/p&gt;
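
&lt;p&gt;To make the arithmetic concrete, here is a small TypeScript sketch of the two call counts. The cost model is an assumption for illustration (one navigate call plus one extract call per page when working one page at a time; one call to save the script and one to replay it when batched), and the function names are hypothetical.&lt;/p&gt;

```typescript
// Assumed cost model: each page costs one navigate call and one extract call.
function callsOneByOne(pages: number): number {
  return 2 * pages;
}

// Assumed: one call saves the script, one call replays it across every URL.
function callsBatched(pages: number): number {
  return pages > 0 ? 2 : 0;
}

for (const pages of [1, 10, 100]) {
  console.log(
    pages + " pages: " + callsOneByOne(pages) + " calls one by one, " + callsBatched(pages) + " batched"
  );
}
```

&lt;p&gt;Under these assumptions the one-by-one count grows linearly with the number of pages while the batched count stays constant.&lt;/p&gt;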

&lt;p&gt;The browser is your filesystem. &lt;code&gt;ls&lt;/code&gt; it.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/apireno/DOMShell" rel="noopener noreferrer"&gt;github.com/apireno/DOMShell&lt;/a&gt;&lt;br&gt;
Project Page: &lt;a href="https://pireno.com/domshell" rel="noopener noreferrer"&gt;DOMShell&lt;/a&gt;&lt;br&gt;
Built by &lt;a href="https://pireno.com" rel="noopener noreferrer"&gt;Pireno&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Bridging the Valuation Gap: Product Strategy for the AI-Native Era</title>
      <dc:creator>Alessandro Pireno</dc:creator>
      <pubDate>Thu, 26 Feb 2026 21:58:01 +0000</pubDate>
      <link>https://dev.to/apireno/bridging-the-valuation-gap-product-strategy-for-the-ai-native-era-491d</link>
      <guid>https://dev.to/apireno/bridging-the-valuation-gap-product-strategy-for-the-ai-native-era-491d</guid>
      <description>&lt;h2&gt;
  
  
  The Transition from Wrapped to Native
&lt;/h2&gt;

&lt;p&gt;I’ve spent the last 20 years at the intersection of deep-tech engineering and product leadership, ranging from Applied Physics at Caltech to scaling technical organizations at Snowflake and SurrealDB.&lt;/p&gt;

&lt;p&gt;In the current market, we are seeing a clear divide. Most companies are AI-wrapped—placing a thin LLM layer over existing, legacy architectures. The long-term winners will be AI-native: companies that rebuild their core stacks to be fundamentally navigable, deterministic, and actionable by autonomous agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Valuation Gap
&lt;/h2&gt;

&lt;p&gt;I’ve identified a consistent gap in the valuation of technical startups. It’s the friction point between a brilliant engineering breakthrough and a repeatable, scalable go-to-market (GTM) strategy. Often, the technical depth is there, but the product strategy hasn't accounted for the shifting unit economics of AI or the fragility of current agentic workflows.&lt;/p&gt;

&lt;p&gt;As a Fractional CPO and GTM Advisor, I step into this gap to help founders build for stability and scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural Pragmatism
&lt;/h2&gt;

&lt;p&gt;I don’t believe in AI hype. I believe in pattern recognition, latency reduction, and unit economics. My work focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Interfaces:&lt;/strong&gt; Moving beyond brittle, vision-based automation toward structured, semantic representations of data. &lt;a href="https://pireno.com/domshell/" rel="noopener noreferrer"&gt;DOMShell&lt;/a&gt; is a prime example: it turns the browser into a filesystem because that’s the interface agents were built to understand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Orchestration:&lt;/strong&gt; Building the layers that allow AI to manage complex, unstructured data without constant manual code updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GTM for Deep Tech:&lt;/strong&gt; Aligning engineering roadmaps with market realities to ensure that technical excellence translates into market dominance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Execution that Scales
&lt;/h2&gt;

&lt;p&gt;With a background in both Physics and Business (Columbia MBA), I am comfortable navigating both a codebase and a P&amp;amp;L. I help teams move past the demo phase and into production-grade reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Work with Me
&lt;/h2&gt;

&lt;p&gt;I take on a limited number of fractional roles and advisory seats for companies building the next generation of data infrastructure and agentic platforms.&lt;/p&gt;

&lt;p&gt;If you’re a founder looking for an experienced partner to help navigate the architectural and strategic challenges of the AI-native landscape, let’s talk.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://pireno.com" rel="noopener noreferrer"&gt;https://pireno.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LinkedIn: &lt;a href="https://linkedin.com/in/apireno" rel="noopener noreferrer"&gt;https://linkedin.com/in/apireno&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/apireno" rel="noopener noreferrer"&gt;https://github.com/apireno&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
