
xbill for Google Developer Experts

Originally published at xbill999.Medium

Building ADK Agents with Phi Models on Azure Foundry

This article provides a step-by-step deployment guide for using Azure AI Foundry models with ADK agents. The small Phi-4 model hosted on Azure AI Foundry is configured through LiteLLM for use from an ADK agent.

What is this project trying to do?

This project aims to configure an ADK agent to use an Azure Foundry model. The ADK provides agent support in standard Python code.

LiteLLM

LiteLLM is an open-source AI gateway and Python SDK that provides a unified OpenAI-compatible interface to over 100 LLMs (Anthropic, Gemini, Azure, Bedrock, Ollama). It simplifies API management by allowing users to call any model using the same format, providing features like spend tracking, model fallbacks, key management, and load balancing.
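Under the hood, the unified interface means every provider is reached through the same `completion()` call; only the model string changes. Here is a minimal sketch of a direct LiteLLM call to an Azure AI Foundry model. The deployment name `phi-4-mini`, the helper function, and the assumption that `AZURE_AI_API_KEY` and `AZURE_AI_API_BASE` are exported all come from the setup later in this article; treat this as an illustration rather than the project's exact code:

```python
def azure_model_id(deployment: str) -> str:
    # LiteLLM routes on the "provider/" prefix of the model string;
    # "azure_ai/" selects the Azure AI Foundry (serverless) provider.
    return f"azure_ai/{deployment}"

if __name__ == "__main__":
    # Imported here so the helper above can be used without litellm installed.
    # The live call requires AZURE_AI_API_KEY and AZURE_AI_API_BASE in the
    # environment, plus an existing "phi-4-mini" deployment (an assumption).
    import litellm

    response = litellm.completion(
        model=azure_model_id("phi-4-mini"),
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
```

Switching the same code to another provider (say, Gemini or Bedrock) would only change the string returned by `azure_model_id`, which is the point of the gateway design.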

More info is here:

GitHub - BerriAI/litellm: Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

Azure Foundry

Azure AI Foundry (formerly Azure AI Studio) is a unified platform-as-a-service for developing, customizing, and managing AI agents and applications. It serves as an AI “factory” combining model catalogs (including OpenAI and Hugging Face), low-code tools, and enterprise-grade security for building generative AI apps. [1, 2, 3, 4]

More details are available here:

https://azure.microsoft.com/en-us/products/ai-foundry

Why would I want Gemini CLI with Azure? Isn’t that a Google Thing?

Yes, Gemini CLI leverages the Google Cloud console and Gemini models, but it is also open source and platform-agnostic. Many applications are already cross-cloud, so this enables familiar tooling to run natively on Microsoft Azure.

Azure CLI

The Azure Command-Line Interface (CLI) is a cross-platform tool used to connect to Azure and execute administrative commands on Azure resources. It allows you to manage services like virtual machines, databases, and networking through a terminal using interactive prompts or scripts. [1, 2, 3]

More details are available here:

What is the Azure Developer CLI?

Gemini CLI

If it is not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:

npm install -g @google/gemini-cli

Testing the Gemini CLI Environment

Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google account:

xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$ gemini

▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade

Node Version Management

Gemini CLI needs a consistent, up-to-date version of Node.js. The nvm tool can be used to set up a standard Node environment:

GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

Python MCP Documentation

The official GitHub Repo provides samples and documentation for getting started:

GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients

Agent Development Kit

The Google Agent Development Kit (ADK) is an open-source, Python-based framework designed to streamline the creation, deployment, and orchestration of sophisticated, multi-agent AI systems. It treats agent development like software engineering, offering modularity, state management, and built-in tools (like Google Search) to build autonomous agents.
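As a rough sketch of what wiring an ADK agent to an Azure-hosted model through LiteLLM can look like (the names here echo the `azure_agent` that Gemini CLI reviews later in this article, including the `AZURE_MODEL` fallback, but this is an illustration, not the repo's exact `agent.py`):

```python
import os

def resolve_model(env=None) -> str:
    """Return the LiteLLM model string, honoring an AZURE_MODEL override."""
    env = os.environ if env is None else env
    return env.get("AZURE_MODEL", "azure_ai/phi-4-mini")

def build_agent():
    # Imports kept local so the sketch can be read without google-adk installed.
    from google.adk.agents import Agent
    from google.adk.models.lite_llm import LiteLlm

    return Agent(
        name="azure_agent",
        model=LiteLlm(model=resolve_model()),
        description="Answers general questions using a Phi model on Azure.",
        instruction="You are a helpful assistant. Answer concisely.",
    )

# ADK's discovery mechanism looks for a module-level export, so the real
# agents/azure/agent.py would end with:
#   root_agent = build_agent()
```

The `description` field matters once multiple agents exist: routers use it to decide when to delegate to this agent, which is exactly the gap the code review at the end of this article flags.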

The ADK can be installed from here:

Agent Development Kit (ADK)

Agent Skills

Gemini CLI can be customized to work with ADK agents. Both an Agent Development MCP server and specific agent skills are available.

More details are here:

Agent Development Kit (ADK)

The ADK documentation is available as an MCP server:

> /mcp list
Configured MCP servers:
🟢 adk-docs-mcp (from adk-docs-ext) - Ready (2 tools)
  Tools:
  - mcp_adk-docs-mcp_fetch_docs
  - mcp_adk-docs-mcp_list_doc_sources

Where do I start?

The strategy for starting multi-agent development is an incremental, step-by-step approach.

First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.

Then, an ADK agent is built, debugged, and tested locally.

Set Up the Basic Environment

At this point you should have a working Python environment and a working Gemini CLI installation. The next step is to clone the GitHub samples repository with support scripts:

cd ~
git clone https://github.com/xbill9/gemini-cli-azure

Then run init.sh from the cloned directory.

The script will attempt to determine your shell environment and set the correct variables:

cd adk-litellm
source init.sh

If your session times out or you need to re-authenticate, you can run the set_env.sh script to reset your environment variables:

source set_env.sh

Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env.sh script can be used to reset the environment if your session times out.
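A small preflight check makes a stale shell obvious before a build script fails halfway through. This sketch assumes the variable names used in this walkthrough (`PROJECT_ID` here, `AZURE_AI_API_KEY` and `AZURE_AI_API_BASE` in the agent setup later); adjust `REQUIRED_VARS` for your own scripts:

```python
import os

# Variables the build and agent scripts in this walkthrough rely on.
REQUIRED_VARS = ["PROJECT_ID", "AZURE_AI_API_KEY", "AZURE_AI_API_BASE"]

def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    gone = missing_vars()
    if gone:
        raise SystemExit(f"Re-run set_env.sh; missing: {', '.join(gone)}")
    print("Environment looks good.")
```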

Azure Setup

The Azure CLI provides the basic tools for working with the cloud-based services:

Check the installation:

xbill@penguin:~/gemini-cli-azure$ az --version
azure-cli 2.86.0

core 2.86.0
telemetry 1.1.0

Dependencies:
msal 1.35.1
azure-mgmt-resource 24.0.0

Python location '/opt/az/bin/python3'
Config directory '/home/xbill/.azure'
Extensions directory '/home/xbill/.azure/cliextensions'

Python (Linux) 3.13.13 (main, May 1 2026, 00:24:43) [GCC 12.2.0]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Once the tools are installed, log in to the console:

az login

Then run the Azure setup scripts:

./setup-azure-phi.sh

./get-azure-phi-creds.sh

The Azure console should look something like this:

Running the Python Code

First, switch to the directory with the Python version of the sample code, then install the release version on the local system:

cd adk-litellm
make install
Processing ./.

Verify the ADK Installation

First, verify the ADK is installed:

xbill@penguin:~/gemini-cli-aws/adk-litellm$ adk --version
adk, version 1.32.0

To check the ADK setup, run the ADK CLI locally with the hello agent. This is a basic Hello World-style agent; no external tools or MCP calls are used in the agent code, which allows the ADK installation itself to be verified:

xbill@penguin:~/gemini-cli-azure/adk-litellm-azure$ adk run agents/azure
Log setup complete: /tmp/agents_log/agent.20260505_133501.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.

13:35:04 - LiteLLM:INFO: utils.py:4011 - 
LiteLLM completion() model= phi-4-mini; provider = azure_ai

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

13:35:14 - LiteLLM:INFO: utils.py:4011 - 
LiteLLM completion() model= phi-4-mini; provider = azure_ai
[azure_agent]: Hello! How can I assist you today?
[user]: what are your model details
13:35:23 - LiteLLM:INFO: utils.py:4011 - 
LiteLLM completion() model= phi-4-mini; provider = azure_ai
[azure_agent]: Hello! I am Phi, developed by Microsoft, and I'm here to help you with information, answer questions, and assist with a wide range of topics. How may I be of service to you?


Test The ADK Web Interface

This step tests the ADK agent interactions with a browser:

xbill@penguin:~/gemini-cli-azure/adk-litellm-azure/agents$ adk web --host 0.0.0.0
2026-05-05 13:39:05,375 - INFO - service_factory.py:266 - Using in-memory memory service
2026-05-05 13:39:05,375 - INFO - local_storage.py:84 - Using per-agent session storage rooted at /home/xbill/gemini-cli-azure/adk-litellm-azure/agents
2026-05-05 13:39:05,376 - INFO - local_storage.py:110 - Using file artifact service at /home/xbill/gemini-cli-azure/adk-litellm-azure/agents/.adk/artifacts
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/fast_api.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
INFO: Started server process [15609]
INFO: Waiting for application startup.

+-----------------------------------------------------------------------------+
| ADK Web Server started |
| |
| For local testing, access at http://0.0.0.0:8000. |
+-----------------------------------------------------------------------------+

Then use the web interface, either on the local address 127.0.0.1 or the catch-all address 0.0.0.0, depending on your environment:

A special note for Google Cloud Shell deployments: add a CORS allow_origins exemption to allow the ADK web interface to load:

adk web --host 0.0.0.0 --allow_origins 'regex:.*'

Phi Models on Azure

Microsoft Phi models are a family of highly capable, cost-effective Small Language Models (SLMs) designed for low-latency generative AI, available on Azure AI Foundry. They support text, image, and audio inputs (Phi-4-multimodal) and are ideal for memory-constrained environments. Key models include Phi-4, Phi-3.5-MoE, and Phi-3-mini/medium, optimized for high-performance reasoning tasks.

More details are here:

Deep Dive with a Phi Model

Once the connection has been tested, you can drill into the model:

xbill@penguin:~/gemini-cli-azure/adk-litellm-azure$ adk run agents/azure
Log setup complete: /tmp/agents_log/agent.20260505_134754.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.
[user]: what model are you
13:48:00 - LiteLLM:INFO: utils.py:4011 - 
LiteLLM completion() model= phi-4-mini; provider = azure_ai
[azure_agent]: I am Phi, an AI developed by Microsoft, designed to help you with a wide range of queries and tasks. How can I assist you today?



and ask for deeper model details:

10:29:49 - LiteLLM:INFO: utils.py:4011 - 
LiteLLM completion() model= phi-4-mini; provider = azure_ai
[azure_agent]: My model knowledge was last updated in October 2023. Please note that I can't access or retrieve real-time data, so my information may not include events or developments that occurred after this date. For the most current and accurate information, I recommend referring to the latest sources.


One Last Code Review with Gemini CLI

Gemini CLI was used to summarize the project:

> code review adk agent for phi model                                                                                                            

✦ I've reviewed the Azure agent implementation in agents/azure/agent.py. Overall, the implementation correctly utilizes the ADK (Agent Development
  Kit) and integrates with Azure AI Foundry via LiteLLM.

  🔍 Code Review Findings

  1. ADK Best Practices
   * Missing description: While the name is correctly defined, the description parameter is missing. In ADK, the description is critical for
     multi-agent systems, as it helps routers and other agents understand when to delegate tasks to this specific agent.
   * Minimal instruction: The current instruction is very brief. ADK best practices suggest using Markdown for clarity and being more specific
     about the agent's persona and constraints.
   * Short request_timeout: The timeout is set to 10 seconds. For LLM inferences, especially via serverless endpoints which may have occasional
     cold starts or latency, 30 seconds is generally more reliable.

  2. Azure AI Foundry Integration
   * Model String: The model string azure_ai/phi-4-mini is correctly formatted for LiteLLM to recognize the Azure AI Foundry provider.
   * Environment Variables: The code correctly leverages AZURE_MODEL with a fallback. It assumes AZURE_AI_API_KEY and AZURE_AI_API_BASE are set in
     the environment, which matches the setup-azure-phi.sh automation script.

  3. Code Quality
   * Imports: Clean and follows standard ADK patterns.
   * Encapsulation: Using root_agent as the export in agent.py (and re-exporting in __init__.py) is the standard pattern for ADK's discovery
     mechanism.


Summary

The strategy for using the ADK with Azure LLMs was validated with an incremental, step-by-step approach.

First, the basic environment tools were configured. Then the required Python libraries were installed. LiteLLM was used to connect the Google ADK to a model hosted on Azure. Finally, tests were run from the CLI and the web interface. Gemini CLI was used to validate the approach.

