xbill for Google Developer Experts

Posted on May 27 • Originally published at Medium on May 27

Cross Language A2A Agent Benchmarking with Antigravity CLI

#mcps #googleadk #a2aprotocol #crosslanguage

Building a Benchmarking Agent with A2A and MCP

This tutorial aims to build and test benchmarking Agents using the A2A protocol across several popular programming languages. The Master Orchestrator Agent is exposed via MCP to allow Antigravity CLI to be used as a MCP client.

Deja Vu — What is Old is New!

This paper is a re-visiting of the original benchmark series with Gemini CLI over Node, GO, and Python:

Cross Language A2A Agent Benchmarking with Gemini 3 and Gemini CLI

In this updated version, the Antigravity CLI is used over Node, GO, Python, and Rust.

Why would I need Multi-Language Support? Can’t I just use Python?

Most mature Agent development tools and libraries are Python based. Python allows for rapid prototyping and evaluation of approaches. Python is also an interpreted language- which has trade-offs in memory safety, and performance. Other languages like GO and Rust offer high performance and memory safe operations. With a language neutral communication protocol — the actual Agent implementation of each Agent can be coded in the most appropriate language.

What is this Approach actually Benchmarking?

The high level goal was to measure the actual time spent running an algorithm in the native language code inside the A2A agent. Each language had a slightly different implementation due to the language syntax. After running the algorithm- each Agent was instructed to calculate and return the elapsed time for cross language comparison.

What is the Agent Development Kit?

The Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. While optimized for Gemini and the Google ecosystem, ADK is model-agnostic , deployment-agnostic , and is built for compatibility with other frameworks. The official ADK is only currently available in the Python, GO, and Java programming languages.

Google’s Agent Stack in Action: ADK, MCP, A2A on Google Cloud

Google provides full documentation on the Agent Development Kit (ADK) here:

Agent Development Kit (ADK)

What is the A2A protocol?

The Agent2Agent (A2A) protocol, an open communication standard for AI agents, was initially introduced by Google in April 2025. It is specifically engineered to facilitate seamless interoperability within multi-agent systems, enabling AI agents developed by diverse providers or built upon disparate AI agent frameworks to communicate and collaborate effectively.

A good overview of the A2A protocol can be found here:

A2A Protocol

Language Support For the A2A Protocol

The official ADK for Python, GO, and Java provide built-in support for working with the A2A protocol. For other programming languages like JS, Rust, and .NET — 3rd party libraries are available to add support for the protocol.

The main source for A2A Language support is the GitHub repo:

GitHub - a2aproject/A2A: An open protocol enabling communication and interoperability between opaque agentic applications.

Tool Chain Setup for A2A

This article targets the Python, GO, Rust, and JavaScript environments.

The build tools for each language environment need to be in place. For building with Python- a working Python environment with 3.12 or later along with package management tools like uv or pip is required.

For building with GO — a recent version of the GO compiler (1.24.1 or later) is required.

For building with Rust, the Rust tool chain is required.

For building with Node / JavaScript — a working Node.js environment with Node version 20 or later and a functional npm tool is needed.

Antigravity CLI

Antigravity CLI is the follow-on successor to Gemini CLI- the terminal driven, agent assisted coding tool.

Full details on installing Antigravity CLI are here:

Getting Started with Antigravity CLI

Testing the Antigravity CLI Environment

Once you have all the tools in place- you can test the startup of Antigravity CLI.

You will need to authenticate with a Google Cloud Project or your Google Account:

agy

This will start the interface:

Checking the Developer Environment

Verify that all the prerequisite packages and compilers are installed — and clone the sample Github repo:

git clone https://github.com/xbill9/a2a-benchmark
cd a2a-benchmark

Once you have your Google Cloud Project and preferred authentication method — run the init.sh script to validate the setup:

xbill@penguin:~/a2a-benchmark$ source init.sh
Project ID file found, skipping.
--- Authentication Method ---
Do you want to use a Gemini API Key for authentication? (y/n): n
WARNING: Your active project does not match the quota project in your local Application Default Credentials file. This might result in unexpected quota issues.

To update your Application Default Credentials quota project, use the `gcloud auth application-default set-quota-project` command.
Updated property [core/project].
--- Setup complete ---

The set_env.sh script is provided to set common ADK environment variables:

xbill@penguin:~/a2a-benchmark$ source set_env.sh
--- Setting Google Cloud Environment Variables ---
Checking gcloud authentication status...
gcloud is authenticated.
Exported PROJECT_ID=comglitn
Exported PROJECT_NUMBER=1056842563084
Exported SERVICE_ACCOUNT_NAME=1056842563084-compute@developer.gserviceaccount.com
Exported GOOGLE_CLOUD_PROJECT=comglitn
Exported GOOGLE_GENAI_USE_VERTEXAI=TRUE
Exported GOOGLE_CLOUD_LOCATION=us-central1
Exported REPO_NAME=
Exported REGION=us-central1
--- Environment setup complete ---
xbill@penguin:~/a2a-benchmark$

Debugging API Permission Errors

If your application default credentials expires or your Google Cloud Authentication expires you will get an error. The workaround is to re-authenticate:

gcloud auth login
gcloud auth application-default login

Another common error is that the environment variables are not set correctly. Go the the root directory and re-run the set_env.sh to set the variables:

cd ~/adk-hello-world-a2a
source set_env.sh

A2A Debugging Tools -A2A Inspector

The A2A Inspector is a standalone tool that provides low level visibility into the A2A protocol. The GitHub is available here:

GitHub - a2aproject/a2a-inspector: Validation Tools for A2A Agents

A summary of the features of the A2A inspector can be found here:

A2A Protocol Documentation

To install the A2A Inspector:

cd ~
git clone https://github.com/a2aproject/a2a-inspector

Then follow the build instructions — you need uv , and a recent version of node :

a2a-inspector/README.md at main · a2aproject/a2a-inspector

Staring the A2A Inspector

Once the A2A inspector has been installed- you can validate the installation by using this URL:

http://127.0.0.1:5001/

Extending ADK Agents with A2A

The ADK provides several key tools to allow standard ADK Agents to run as standalone A2A agents — without the ADK — either in A2A Client or A2A Server mode. The Python ADK includes libraries and samples to extend a standard ADK agent to enable A2A protocol features. Instead of running the agent inside the ADK web utility- the agents are dual purposed with A2A to be able to run as dedicated agents with their own embedded Uvicorn web server.

I am confused- How Does this all Work?

When Agents are run in ADK mode — the ADK CLI or Web interface is used to directly interact with the Agents. The ADK UI is started in a well known port — usually 8000 and the Agents are accessed in that environment.

The ADK does not automatically expose the agent as an A2A agent. The basic agent code from the ADK needs to be extended and enabled to run as a standalone A2A Agent. Without the additional A2A function calls and a active standalone web server- the Agents will not be usable in A2A mode.

Mersenne Primes

In mathematics, a Mersenne prime is a prime number that is one less than a power of two. As of 2025, 52 Mersenne primes are known. An interesting thing about Mersenne primes is that they are the easiest natural numbers to prove to be primes, so they make up the largest category on the list of known prime numbers. This example was chosen as a good problem as the computation is CPU bound and gets exponentially longer for each higher number.

A2A Cross Language Agents

The a2a_benchmark Repo has sample scripts for running the ADK and various types of agents across programming languages. Each agent implements a Mersenne Prime Number generator with only basic optimization. The root directory of the a2a-benchmark directory contains several common agent development languages. These include:

benchmark-go (GO) A2A Server
benchmark-python (Python) A2A Server
benchmark-node (JavaScript) A2A Server
benchmark-rust (Rust) A2A Server
benchmark-master (Python) A2A Client / MCP Server

A2A Mersenne Prime Number Agent (GO)

This agent provides a basic Agent that generates Mersenne prime numbers. To run the agent — run the bench-go.sh script:

Running the Go project...
{"time":"2025-11-25T13:47:18.071713651-05:00","level":"INFO","msg":"Using Model ","model":"gemini-2.5-flash"}
{"time":"2025-11-25T13:47:18.07195886-05:00","level":"INFO","msg":"Starting A2A mersenne prime server","port":"8102"}
{"time":"2025-11-25T13:47:18.072116547-05:00","level":"INFO","msg":"Starting the web server: &{port:8102 writeTimeout:15000000000 readTimeout:15000000000 idleTimeout:60000000000}"}
{"time":"2025-11-25T13:47:18.072119644-05:00","level":"INFO","msg":""}
{"time":"2025-11-25T13:47:18.072121732-05:00","level":"INFO","msg":"Web servers starts on http://localhost:8102"}
{"time":"2025-11-25T13:47:18.072123829-05:00","level":"INFO","msg":" a2a: you can access A2A using jsonrpc protocol: http://localhost:8102"}
{"time":"2025-11-25T13:47:18.072125402-05:00","level":"INFO","msg":""}

The A2A inspector can be used to validate the Agent:

"id": "adk-88a8aefa-3070-48ff-b8d1-631af65ea57e",
  "name": "generate_primes",
  "response": {
    "result": "Elapsed time: 307.932µs"
  }

A2A Mersenne Prime Number Agent (Python)

This agent provides Python agent that implements the Mersenne generation algorithm.

This Agent can be checked with the ADK web interface:

what do you do

I can calculate Mersenne primes using the Lucas-Lehmer primality test. I can find the list of the first N Mersenne primes.

find the first 10 primes

It took 0.0002677440643310547 seconds to find the first 10 Mersenne primes.

To start the dedicated A2A version of the Python prime number generator Agent use the bench-python.sh script:

xbill@penguin:~/a2a-benchmark$ source bench-python.sh

INFO: Started server process [10495]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8101 (Press CTRL+C to quit)

A2A Mersenne Prime Number Agent (Node)

This agent provides a minimal agent to generate a prime number. To run the Agent use the bench-node.sh script:

/home/xbill/a2a-benchmark/benchmark-node
staring a2a bench node generate prime

The Agent can be validated with the A2A inspector on port 8103:

A typical session will look something like this:

what do you do
message
Found first 5 Mersenne primes in 0.09ms.

✅
find 10 primes
message
Found first 10 Mersenne primes in 0.81ms.

✅

A2A Mersenne Prime Number Agent (Rust)

The Mersenne primes algorithm was newly implemented in Rust:

A sample test script verified the A2A Agent skill:

xbill@penguin:~/a2a-benchmark$ source test-rust.sh 
Checking if Rust A2A agent is running on port 8104...
Rust agent is already running.
Running Test 1: Fetching Agent Card...
Agent Card Response:
{
  "name": "Mersenne Prime Agent Rust",
  "description": "A rust agent that builds a list of the first n Mersenne primes and reports the elapsed time. Configured with model: Not specified.",
  "protocolVersion": "0.3.0",
  "version": "0.1.0",
  "url": "http://0.0.0.0:8104/",
  "skills": [
    {
      "id": "find-mersenne-rust",
      "name": "Find Mersenne Primes in rust",
      "description": "Finds the list of the first n Mersenne primes in Rust",
      "tags": [
        "math",
        "benchmark"
      ]
    }
  ],
  "capabilities": {},
  "defaultInputModes": [],
  "defaultOutputModes": []
}
✅ Test 1 Passed: Agent Card is valid.
Running Test 2: Sending message/send request to calculate 5 Mersenne primes...
Response:
{
  "id": 1,
  "jsonrpc": "2.0",
  "result": {
    "contextId": "test-context-123",
    "kind": "message",
    "messageId": "f20c704c-bafa-4081-a18d-8d7984d4213c",
    "parts": [
      {
        "kind": "text",
        "text": "Found first 5 Mersenne primes in 0.11ms."
      }
    ],
    "role": "agent"
  }
}
✅ Test 2 Passed: Successfully calculated primes.
Running Test 3: Sending invalid RPC method...
Error Response:
{
  "error": {
    "code": -32601,
    "message": "Method not found: invalid/method"
  },
  "id": 2,
  "jsonrpc": "2.0"
}
✅ Test 3 Passed: Invalid method was rejected successfully.
All tests passed successfully!

MCP Master/Orchestrator Agent (Python)

The final agent follows a slightly different pattern. It provides a minimal agent with several functions (Tools/Skills). It has one root_agent and 3 remote agents connected over A2A. This agent also exposes an interface as a MCP server- allowing Antigravity CLI to be used as a MCP client.

The Master Agent is started with a similar script:

╭──────────────────────────────────────────────────────────────────────────────╮                        
                       │ │                        
                       │ ▄▀▀ ▄▀█ █▀▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ │                        
                       │ █▀ █▀█ ▄▄█ █ █ ▀ █ █▄▄ █▀▀ │                        
                       │ │                        
                       │ FastMCP 2.13.1 │                        
                       │ │                        
                       │ │                        
                       │ 🖥 Server name: benchmark │                        
                       │ │                        
                       │ 📦 Transport: HTTP │                        
                       │ 🔗 Server URL: http://127.0.0.1:8100/mcp │                        
                       │ │                        
                       │ 📚 Docs: https://gofastmcp.com │                        
                       │ 🚀 Hosting: https://fastmcp.cloud │                        
                       │ │                        
                       ╰──────────────────────────────────────────────────────────────────────────────╯                        


[11/25/25 14:17:36] INFO Starting MCP server 'benchmark' with transport 'http' on [http://127.0.0.1:8100/mcp](http://127.0.0.1:8100/mcp) server.py:2055
INFO: Started server process [21826]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on [http://127.0.0.1:8100](http://127.0.0.1:8100) (Press CTRL+C to quit)

Antigravity CLI with MCP Connection

The Master Agent starts a FastMCP server interface. This allows Antigravity CLI to be used as a MCP client:

xbill@penguin:~/a2a-benchmark/.agents$ more mcp_config.json 
{
  "mcpServers": {
    "bench": {
      "serverUrl": "http://127.0.0.1:8100/mcp",
      "serverURL": "http://127.0.0.1:8100/mcp"
    }
  }
}

ADK Validation of the Master Agent

Because it is Python based- the Master Agent can also be viewed with the ADK web interface:

So What is all this Doing?

All of the setup and configuration allows you to start debugging and visualizing Agent flows using the A2A protocol. The Agents can be validated using the A2A Inspector. The Python Agents can be checked with the ADK web interfaces. The Master Agent can be validated with Antigravity CLI as a MCP client.

Time to Starting Running some Numbers!

Now that the individual Agents have been validated — open several Terminal windows and start up all the remaining agents:

benchmark-go.sh — (port 8102)
benchmark-python.sh-(port 8101)
benchmark-node.sh- (port 8103)
benchmark-rust.sh- (port 8104)
benchmark-master.sh — Orchestrator Agent (port 8100)

Start the Benchmarking Session

Once all the agents have been started — the Master agent on port 8100 is connected over MCP with Antigravity CLI:

  I have completed the benchmark run across all four language agents for primes 1 to 22.

  ### Summary of Actions Taken:

  1. Agent Startup & Verification: Automatically checked and started all four target language agents (Node.
  js/TypeScript, Rust, Go, Python) and verified they were active and listening on their respective A2A JSON-RPC ports.
  2. Benchmark Scripting: Created a customized benchmarking script run_benchmark_1_22.py that targets prime calculations
from
  $N = 1$ to $N = 22$.
  3. Execution: Ran the benchmark and recorded the exact computation time (within the tool call) and the total Agent-to-
  Agent (A2A) network/orchestration latency (RTT) for every prime count $N$.
  4. Data Plotting: Generated visualizations for both calculation times ( prime_calculation_times_1_22.png ) and
  overall latency ( a2a_latency_times_1_22.png ).
  5. Reporting: Compiled all calculations and visualizations into the analysis_results.md artifact.

  Please view analysis_results.md to inspect the table and embedded plots showing the comparison of calculation speeds
and
  latency characteristics!

Review Performance

The results of the benchmark are graphed:

With latency times:

Final Results

Antigravity CLI summarized the results of the Agent Benchmarking:

  Formulating Key Findings
  Here is a summary of the key findings from the benchmark results for primes 1 to 22:

  ### 1. Core Computation Performance (Lucas-Lehmer Prime Calculation)

  • Rust is the fastest: Computes $N=22$ in 195.96 ms.
  • Node.js (TS) is extremely close: Computes $N=22$ in 214.06 ms.
  • Go performs well: Computes $N=22$ in 279.54 ms.
  • Python is the slowest: Computes $N=22$ in 572.48 ms (roughly 3x slower than Rust).

  ### 2. A2A Framework & Protocol Overhead

  • Minimal Overhead (Rust & Node.js):
      • Rust and Node.js use lightweight custom HTTP wrappers. Their framework overhead is practically negligible (< 2 
      ms). For example, Node's total round-trip time (RTT) for $N=22$ is 215.78 ms (only 1.72 ms higher than the raw
      calculation time).
  • Significant Base Overhead (Go & Python):
      • Both Go and Python use the official google-adk library. This introduces a fixed baseline overhead per request
      (schema validation, logging, and metadata context processing).
      • Go incurs a consistent baseline RTT of ~1.2s to 1.6s.
      • Python incurs a consistent baseline RTT of ~2.3s to 3.0s.

and a deeper dive:

  Analyzing Agent Initialization
  Based on an analysis of the codebase and execution paths, here is the breakdown of why the benchmark produced these
  results:
  ──────
  ### Factor 1: The Massive Latency (RTT) Difference (Framework vs. Mocking)

  The most striking difference is that Rust and Node.js have near-zero overhead, whereas Go takes ~1.5s and Python     
  takes
  ~2.5s. This is entirely due to how the agents process the benchmark messages:

  1. Rust and Node.js (Mocked Agent Logic / Direct Execution):
      • Both main.rs and server.ts implement custom JSON-RPC endpoints that intercept incoming queries and
      bypass the LLM entirely.
      • They use simple regex to extract the number of primes directly from the text "Calculate the first {N} Mersenne
      primes" , execute the math, and return a mock agent response message. Since there are no LLM API calls, the
      overhead is practically zero.
  2. Go and Python (Real Agentic Execution via ADK):
      • Both Go (main.go) and Python (agent.py) use the official Agent Development Kit ( google-adk ) and
      hook up the Gemini 2.5 Flash model.
      • When they receive a message, the ADK runner executes a full agent reasoning loop. This involves two sequential 
      remote LLM calls (one to determine which tool to call, and another to formulate the final natural language
      response).
      • This remote LLM invocation via the internet accounts for the 1.5–3.0 second latency overhead.

  ──────
  ### Factor 2: Mathematical Computation Speeds (Lucas-Lehmer Loop)

  At higher values (e.g., $N=22$, where the Mersenne number $2^{9941}-1$ has 2,993 decimal digits and the Lucas-Lehmer
  test requires 9,939 iterations of squaring and modular reductions), we see clear performance tiers:

  1. Rust (195.96 ms):
      • Rust is compiled to optimized native machine code (using --release ). The num-bigint crate executes the big
      integer arithmetic without garbage collection overhead, making it the most efficient compiled loop.
  2. Node.js (214.06 ms):
      • V8's JIT compiler is extremely good at compiling JavaScript loops to native machine instructions. Additionally,
      V8's native BigInt implementation is heavily optimized at the engine level (written in assembly/C++).
  3. Go (279.54 ms):
      • While Go compiles to native code, its implementation in main.go performs heap allocations inside the hot
      loop:
        a := new(big.Int).Rsh(s, uint(p))
        b := new(big.Int).And(s, m)
      Allocating new big.Int objects inside a loop that runs thousands of times puts extra pressure on Go's runtime
      allocator and garbage collector, dragging down performance compared to Rust.
  4. Python (572.48 ms):
      • Python handles arbitrary-precision integers natively, but executing the loop in interpreted bytecode has a lot
      of interpreter overhead compared to compiled or JIT-compiled languages. 9,939 iterations of the loop in bytecode
      is the bottleneck.

Core Findings

For low numbers of Mersenne primes that execute quickly- all four languages are comparable. An unusual result was the good performance in the mid range from Node. This could have been from lower overhead of the Node A2A toolkit or potentially from JavaScript engine optimizations. As expected, GO and Rust did break out and delivered the best performance as the complexity scaled.

Summary

The goal of the demo/article was to get basic Agents implemented across multiple programming languages and benchmark the Agent performance finding Mersenne primes.

The Google Agent development kit (ADK) was presented along with the complimentary A2A (Agent to Agent) protocol. Three basic agents were presented — covering various combinations of programming languages and Agent implementation approaches.

Finally — a Master/Orchestrator agent was started to connect and delegate to the other agents via the A2A protocol. Antigravity CLI was used to connect to the Master Agent over MCP and execute the benchmarks.

Top comments (2)

xulingfeng • May 31

Multi-language agent benchmarking is a space we've been poking at too — our deep-test framework evaluates model outputs across different language runtimes for consistency. One thing I'd be curious about: did you measure variance across multiple runs per language, or was this more of a single-shot comparison? Runtime can fluctuate quite a bit depending on GC pressure and system load, especially across interpreted vs compiled languages.

Followed you 👀

xbill Google Developer Experts • May 31

single shot just to get a rough idea of the results. there is definitely some value on a deeper dive across multiple runs to check for consistency