The pace of change in generative AI is incredible, and I’m constantly amazed by what we can do now. But today I discovered something that turned my amazement up to 11.
Jack Wotherspoon had shared his most recent weekly bulletin describing the latest advancements in Gemini CLI. He mentioned “Multi-Modal MCP Tool Responses” as one of the new features, meaning that Gemini CLI can now interpret and act on MCP responses of various modalities, not just text. He also mentioned BrowserMCP, which I’d never looked at before.
15 minutes later I had Gemini CLI performing UI tests of one of my web apps, simply by providing a single natural language instruction to Gemini CLI.
MIND. BLOWN.
Setting Up BrowserMCP
This was a doddle. First, I took the MCP configuration described here and added it to my .gemini/settings.json:
{
  "mcpServers": {
    "browsermcp": {
      "command": "npx",
      "args": ["@browsermcp/mcp@latest"]
    }
  }
}
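One gotcha worth mentioning: if your settings.json is malformed, the MCP server may quietly fail to load. A quick way to sanity-check the file is to parse it before launching Gemini CLI. Here’s a minimal sketch in Python — the snippet is embedded as a string for illustration, but you could just as easily read your actual .gemini/settings.json:

```python
import json

# The BrowserMCP entry from .gemini/settings.json, embedded here for a quick check.
# To validate your real file, replace this with:
#   settings = open(".gemini/settings.json").read()
settings = """
{
  "mcpServers": {
    "browsermcp": {
      "command": "npx",
      "args": ["@browsermcp/mcp@latest"]
    }
  }
}
"""

config = json.loads(settings)  # raises json.JSONDecodeError if the JSON is malformed
server = config["mcpServers"]["browsermcp"]
print(server["command"], *server["args"])  # npx @browsermcp/mcp@latest
```

If this parses cleanly and prints the launch command you expect, the config is at least syntactically sound.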
Then I installed the BrowserMCP extension into Chrome. This took about 5 seconds, following the instructions here. Could it really be this easy?
Prepping to Test my Web App UI
Time to test with Rickbot, of course! For those of you not aware, Rickbot is a multi-personality agentic solution I built recently using Google Gemini and deployed to Google Cloud Run. (Check the article for a walkthrough of how I did it.)
My goal was to create a UI test that would switch to each personality available in the dropdown, interact with it, collect all the responses, and then present them back in a table.
I launched the Rickbot Streamlit web app locally in my development environment, like this:
uv run -- streamlit run app.py --browser.serverAddress=localhost
Rickbot launches in the browser:
Then I connect BrowserMCP by clicking the BrowserMCP button in Chrome:
Next, I launch Gemini CLI. Let’s check that the MCP server is properly loaded with the /mcp command:
Okay, this all looks good.
Running the Test
And now for the amazing part. Here’s my prompt:
Test Rickbot at http://localhost:8502/ by selecting each personality in turn, and for each: say “Hello” and check you get a response. When done, show me the responses for each personality in a table.
In response to my prompt, Gemini CLI goes ahead and uses BrowserMCP to interact with Rickbot in the browser. During this test run I DID NOT TOUCH MY COMPUTER! Everything you see below was orchestrated by Gemini CLI!
OMG! How amazing was that?! At the end of the test run, I ask Gemini to render the results as HTML and save the file locally. Which it does, of course!
Conclusion
And there you have it: UI tests that you can set up and run in minutes with natural language prompts. MINUTES! This could easily be adapted into a repeatable regression test. How’s that for a good Gen AI use case?
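To give a flavour of what that regression test might look like, here’s a rough shell sketch. It assumes Gemini CLI’s non-interactive -p/--prompt flag, that Rickbot is already running on localhost:8502, and that the BrowserMCP Chrome extension is connected; the log filename is just my invention — adapt to taste:

```shell
#!/usr/bin/env bash
# Hypothetical regression wrapper: re-run the same natural-language UI test on demand.
# Assumptions: Rickbot is running on localhost:8502, the BrowserMCP extension is
# connected in Chrome, and Gemini CLI supports non-interactive prompts via -p.
set -euo pipefail

PROMPT='Test Rickbot at http://localhost:8502/ by selecting each personality in turn,
and for each: say "Hello" and check you get a response. When done, show me the
responses for each personality in a table.'

# Capture the run output with a timestamped log so runs can be compared over time.
gemini -p "$PROMPT" | tee "rickbot-ui-test-$(date +%Y%m%d-%H%M%S).log"
```

Wire something like this into a scheduled job or a CI step and you have a natural-language smoke test for your UI.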
Before You Go
- Please share this with anyone you think will be interested. It might help them, and it really helps me!
- Please give me 50 claps! (Just hold down the clap button.)
- Feel free to leave a comment 💬.
- Follow and subscribe, so you don’t miss my content.