One of the main problems I've faced while building AI agents over the past two years is:
How do I reliably give my AI agent access to the web?
For instance, if I want to build an AI agent that can send emails, how can I connect it to a service like Gmail or Outlook? Should I use APIs, websites or autonomous web agents? This article covers the methods I've discovered for letting AI agents use the web.
Using APIs and SDKs
Many developers choose to use APIs and SDKs to hook their agent up to the web. This approach has low latency and good authentication options, but there are a few cons:
Some web services don't have APIs
You'll encounter outdated, poorly written documentation
Usually, APIs don't expose all the features that the accompanying website has, so if you want your agent to do something super specific on a web service, that might be impossible!
Luckily, there are a few services that provide 'libraries' of API calls that you can use directly as tools in your agents. Here are a few nice ones:
Composio – Library of tools for AI Agents with nice Auth solutions
LangChain tools – Tools for any LangChain/LangGraph agent.
Apify – Huge library of APIs made by a community of developers
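To make the API route concrete, here's a minimal sketch of an email tool an agent could call. It uses plain SMTP from Python's standard library as a stand-in for a provider-specific API or SDK, so treat it as an illustration rather than the recommended integration. The function names, host, port and credentials are all placeholders I made up, and many providers require an app password or OAuth before this would work.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender, to, subject, body):
    """Create the message object the agent wants to send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg, host, port, username, password, smtp_factory=smtplib.SMTP):
    """Send a message over SMTP with STARTTLS.

    smtp_factory is injectable so the tool can be tested without a network.
    host, port, username and password are placeholders for your provider's values.
    """
    with smtp_factory(host, port) as server:
        server.starttls()                  # upgrade the connection to TLS
        server.login(username, password)   # many providers need an app password here
        server.send_message(msg)
```

An agent framework would typically expose `send_email` as a tool and let the model fill in `to`, `subject` and `body`, while the credentials stay in your own configuration.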
But to use any web service, you'll need to leave the confines of APIs and use web services like humans – via the website!
Using Websites
If we can use any website reliably with our AI agent, then we can automate any process that a human can do on the web.
But how do we do this?
A lot of developers, myself included, initially opt for using a browser testing framework like Selenium or Playwright to open a browser and then use scripts to interact with the page.
There are several problems with this approach:
Scripts break easily if a website changes. Something as simple as an A/B test can render your script useless.
Test browsers are easily detected and blocked by websites.
Once you want to put your agent into production, it's not trivial to host browsers. Adding authentication and rotating proxies to the equation makes things even trickier!
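The first problem is easy to reproduce. Here's a small sketch that uses only Python's standard library instead of a real browser: a script looks up the login button by a hard-coded id, and an A/B variant that renames the id makes the lookup fail. The two HTML snippets and the ids are invented for the example.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Collects the text of the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if self.text is None and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.text = data.strip()
            self._inside = False


def find_text_by_id(html, element_id):
    """Return the element's text, or None if no element has that id."""
    finder = IdFinder(element_id)
    finder.feed(html)
    return finder.text


# Same button, but the B variant ships with a renamed id (hypothetical markup).
VARIANT_A = '<form><button id="login-btn">Log in</button></form>'
VARIANT_B = '<form><button id="signin-cta">Log in</button></form>'

print(find_text_by_id(VARIANT_A, "login-btn"))  # Log in
print(find_text_by_id(VARIANT_B, "login-btn"))  # None
```

Nothing a human would notice changed between the two variants, yet the script that depends on `login-btn` silently stops finding the button.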
For a couple of months, some friends and I experimented with building a new type of browser SDK that solves these problems by:
Adding natural language selectors, where you can write e.g.
get_element("find the log in button")
or
extract("extract the companies like this: {name: str, desc: str}")
instead of using brittle CSS selectors and scripts.
Building in authentication so that AI agents can easily access services.
Including several pre-configured remote hosting providers with rotating proxies built in, so you don't get blocked.
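To give a feel for why a natural language selector survives page changes, here's a toy sketch of the idea. This is not how Dendrite is implemented: `Element`, `pick_element` and `word_overlap` are hypothetical names, and the word-overlap scorer is a crude stand-in for the model call a real SDK would make.

```python
from dataclasses import dataclass


@dataclass
class Element:
    selector: str  # CSS selector that identifies the element on today's page
    label: str     # visible text or aria-label


def pick_element(description, elements, score):
    """Return the element whose label best matches the description."""
    return max(elements, key=lambda el: score(description, el.label))


def word_overlap(description, label):
    """Toy scorer: count shared lowercase words (stand-in for a model call)."""
    return len(set(description.lower().split()) & set(label.lower().split()))


# Hypothetical snapshot of the interactive elements on a page.
page = [
    Element("#signin-cta", "Log in"),
    Element("#signup", "Create account"),
    Element("footer a.help", "Help center"),
]

target = pick_element("find the log in button", page, word_overlap)
print(target.selector)  # #signin-cta
```

Because the lookup keys on what the element says rather than on its id, renaming `#login-btn` to `#signin-cta` doesn't break it.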
We have just made our work 100% free and open source so that anyone can try it and learn from it:
dendrite-systems / dendrite-python-sdk
Tools to build web AI agents that can authenticate, interact with and extract data from any website.
Notice: The Dendrite SDK is not under active development anymore. However, the project will remain fully open source so that you and others can learn from it. Feel free to fork, study, or adapt this code for your own projects as you wish – reach out to us on Discord if you have questions! We love chatting about web AI agents. 🤖
What is Dendrite?
Dendrite is a framework that makes it easy for web AI agents to browse the internet just like humans do. Use Dendrite to:
- 👆🏼 Interact with elements
- 💿 Extract structured data
- 🔓 Authenticate on websites
- ↕️ Download/upload files
- 🚫 Browse without getting blocked
A simple Outlook integration
With Dendrite it's easy to create web interaction tools for your agent.
Here's how you can send an email:
from dendrite import AsyncDendrite
async def send_email(to, subject, message):
    client = AsyncDendrite(auth
…
Unfortunately, we won't keep working on the SDK for several reasons, but if you have any questions about how it works we'd love to help out in our Discord. Other alternatives that provide a similar experience are:
Computer Use – the End Game for Web AI Agents?
If you have not read the Bitter Lesson by Rich Sutton, go and do it right now. It could change the way you view building AI products, and it will be highly relevant for this next part. (TL;DR: Generalisable AI solutions that can scale with increased compute always win.)
A few months back, Anthropic released Computer Use and took the AI agent community by storm. It is a new capability that lets Anthropic's LLMs take control of a computer or browser and complete tasks with the mouse and keyboard, instead of using scripts and API calls. In their release they wrote:
With computer use, we're trying something fundamentally new. Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people.
They are saying that future AI agents won't need API calls or browser scripts since they can just control the browser directly like a human would via the mouse and keyboard. Their thesis aligns perfectly with the Bitter Lesson, which states that if you want to be the leader in any segment of the AI space, you should build a general solution that scales well with an increased abundance of cheap computational power.
And, of course! This lesson should apply to the AI agent space too – the most general way to build an AI agent that can do anything a human can do (which is what most of us AI agent devs are trying to achieve) is to just give it a browser, keyboard and mouse, and then let it get to work!
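Under the hood, this style of agent boils down to a loop: look at the screen, let the model pick the next mouse or keyboard action, execute it, repeat. Here's a minimal sketch of that loop. It does not call Anthropic's API. `FakeScreen` and `scripted_policy` are hypothetical stand-ins for a real desktop and a real model, and the action names are my own.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # "click", "type", or "done"
    payload: dict = field(default_factory=dict)


class FakeScreen:
    """Stand-in for a real desktop or browser; records what was done to it."""

    def __init__(self):
        self.log = []

    def screenshot(self):
        # A real environment would return pixels; here the action log is the "view".
        return list(self.log)

    def apply(self, action):
        self.log.append((action.kind, action.payload))


def scripted_policy(task, observation):
    """Stand-in for the model: picks the next action from what it has seen so far."""
    plan = [
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": task}),
        Action("done"),
    ]
    return plan[min(len(observation), len(plan) - 1)]


def run_agent(task, screen, policy, max_steps=10):
    """Observe, ask the policy for an action, execute it, repeat until done."""
    for _ in range(max_steps):
        action = policy(task, screen.screenshot())
        if action.kind == "done":
            return True
        screen.apply(action)
    return False  # step budget ran out before the policy declared success
```

The loop itself never changes from task to task. All the task-specific knowledge lives in the policy, which is exactly the part that improves as models get better.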
I've already seen developers report 100% reliability on complex tasks with Anthropic Computer Use just by giving it a really good prompt, often augmented by Anthropic's prompt improver.
Conclusion
I don't think all AI agents will use something like Computer Use in the future, but I do think the vast majority will. Sure, there are many instances where it's way easier to query a database through e.g. an API call. However, if an AI agent can just log into a website and use its search feature, and then draw a conclusion from the first 10 results, why would it need the entire database?
So, to my fellow AI agent developers – will you heed this lesson now, or will you wager against Rich Sutton's words and risk tasting the Bitter Lesson yourself?
PS: This is my debut post on dev.to, please let me know if I can improve my approach for future posts. Also, if you have any questions about AI Agents/automating tedious tasks with AI, drop a comment and I'll get back to you asap!