As developers building AI agents, we’ve all run into the same massive bottleneck: how do you let a Language Model (LLM) browse the web without dest...
Solid approach — AOM over raw DOM is a genuinely good insight, and the benchmarks show real numbers. A couple of questions though:
How does it hold up on SPA-heavy sites (React dashboards, maps, etc.) where AOM coverage is spottier than on Wikipedia/GitHub? Would be interesting to see the same benchmark on something like a Google Sheets or Notion page.
Neo4j for state transitions makes sense for complex workflows, but for simpler "navigate → extract → done" patterns, do you see it as a net win or does the infrastructure overhead outweigh the benefit?
Either way, clean write-up. Curious to see how this evolves.
It isn't as good as it would be with native AOM coverage, but it falls back to a heuristic search for interactive elements, and inspect_node can isolate a node and send commands against its node id. That still gives cleaner interaction and a token reduction versus the traditional raw-DOM approach.
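To make the fallback idea concrete, here is a minimal sketch of a heuristic scan for interactive elements when accessibility data is thin. It is not the project's actual code: the `Node` class, the tag and role sets, `find_interactives`, and this toy `inspect_node` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed heuristics: tags and ARIA roles that usually mean "clickable/typeable".
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
INTERACTIVE_ROLES = {"button", "link", "checkbox", "textbox", "tab", "menuitem"}


@dataclass
class Node:
    """Hypothetical simplified page node; a real tree would carry more data."""
    node_id: int
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def is_interactive(node):
    """Guess interactivity from tag, role, or handler/focus attributes."""
    if node.tag in INTERACTIVE_TAGS:
        return True
    if node.attrs.get("role") in INTERACTIVE_ROLES:
        return True
    return "onclick" in node.attrs or "tabindex" in node.attrs


def find_interactives(root):
    """Depth-first walk in document order, keeping only interactive nodes."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if is_interactive(node):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


def inspect_node(root, node_id):
    """Toy stand-in: return a compact summary of one node, or None if absent."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id == node_id:
            return {"id": node.node_id, "tag": node.tag,
                    "role": node.attrs.get("role"), "text": node.text}
        stack.extend(node.children)
    return None


# A div-heavy SPA-style fragment with little semantic markup.
page = Node(1, "div", children=[
    Node(2, "div", {"role": "button"}, "Save"),
    Node(3, "span", text="Just a label"),
    Node(4, "div", {"onclick": "open()"}, "Open menu"),
    Node(5, "a", {"href": "/docs"}, "Docs"),
])
interactive_ids = [n.node_id for n in find_interactives(page)]
```

Only the compact summaries for the matched ids would need to reach the model, which is where the token savings would come from under these assumptions.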
Neo4j is integrated as a "long-game" solution. Mapping sites at enterprise level would mean a large amount of RAM usage, but practically zero inference cost for any action triggered on that site after the initial mapping. So as the scale gets cranked up, efficiency rises. A little overhead for scalability makes a big difference when the cost is system memory instead of VRAM plus inference.
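The trade-off can be sketched as a cache of state transitions: pay for inference once per (state, action) pair, then replay from memory. This is only an illustration under stated assumptions. A plain dict stands in for Neo4j, and `TransitionGraph`, `resolve`, and `fake_planner` are hypothetical names, not the project's schema or API.

```python
class TransitionGraph:
    """In-memory stand-in for a graph store of known site transitions."""

    def __init__(self):
        self.edges = {}        # (state, action) -> recorded steps
        self.model_calls = 0   # how many times we had to pay for inference

    def resolve(self, state, action, plan_with_model):
        """Return cached steps if mapped; otherwise ask the model once and store."""
        key = (state, action)
        if key not in self.edges:
            self.model_calls += 1
            self.edges[key] = plan_with_model(state, action)
        return self.edges[key]


def fake_planner(state, action):
    """Placeholder for an LLM call that works out how to perform the action."""
    return f"{state}->{action}"


graph = TransitionGraph()
first = graph.resolve("login_page", "submit_form", fake_planner)
second = graph.resolve("login_page", "submit_form", fake_planner)  # served from memory
```

After the first call the counter stays at one however many times the same transition is requested, so memory grows with the number of mapped transitions while inference cost does not.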
Docs updated on the repo for the SPA test
I would need standard benchmark stats to compare against this. It's potentially novel information if you can prove there's no intelligence loss of any kind and no corners skipped.
docs updated.