This article is part of Azure Spring Clean 2025 Series
AI Agents on the Fly: AI Agents into Web Apps
The technology industry has witnessed a remarkable progression in service delivery models, from traditional on-premises solutions to increasingly sophisticated cloud-based offerings. As artificial intelligence (AI) matures, we're entering the era of agents on demand: Agent as a Service (AaaS), the next frontier in the as-a-service evolution.
Agents on Demand: A Cultural Perspective
The concept of summoning specialized agents on demand has long captured our imagination in popular culture:
- Mission Impossible's IMF Team: Highly specialized agents assembled for specific missions
- The A-Team: "If you have a problem, if no one else can help, and if you can find them..."
- James Bond's Q Branch: On-demand technical expertise and gadgets
- Men in Black: Specialized agents appearing exactly when needed
- Ghost Protocol: Agents operating independently when official channels are down
These cultural touchstones reflect our enduring fascination with the idea of specialized agents ready to tackle specific challenges at a moment's notice - a concept now becoming reality with AI.
The As-a-Service Evolution
1. Infrastructure as a Service (IaaS)
- Basic building blocks of cloud computing
- Virtual machines, storage, and networking
- Pay-as-you-go infrastructure
2. Platform as a Service (PaaS)
- Development and deployment platforms
- Managed runtime environments
- Simplified application lifecycle management
3. Software as a Service (SaaS)
- Ready-to-use applications
- Subscription-based model
- Minimal setup and maintenance
4. Functions as a Service (FaaS)
- Event-driven compute execution
- Serverless architecture
- Automatic scaling of individual functions
- Pay only for execution time
- Popular implementations include AWS Lambda and Azure Functions
5. Agent as a Service (AaaS)
Agent as a Service is the newest paradigm: autonomous AI agents combined with cloud-based delivery, a significant step forward in service automation and intelligence. Unlike traditional static AI services, these agents are:
- Instantly Available: Spin up exactly when needed
- Purpose-Built: Configured for specific tasks or domains
- Ephemeral: Exist only for the duration of their mission
- Resource Efficient: Consume resources only when active
- Highly Specialized: Optimized for their designated purpose
What Exactly is AaaS?
AaaS leverages AI agents capable of performing tasks independently. These agents use machine learning and natural language processing (NLP) to respond to user requests, automate complex workflows, and improve operational efficiency. For newcomers, machine learning refers to algorithms that learn patterns from data, while NLP enables agents to understand and generate human language. Similar to SaaS, AaaS relies on cloud-based delivery, making it scalable and accessible for businesses of any size.
The Anatomy of an AI Agent
To understand AaaS, it helps to know what makes up an AI agent:
- Reasoning Engine: Powered by large language models (LLMs), this is the brain that processes inputs and makes decisions.
- Knowledge Base: A repository of data (similar to an organized database) that the agent pulls from to complete tasks.
- Memory: Short-term memory that helps agents manage ongoing conversations or tasks (comparable to session state in a web application).
- Tools and Actions: External APIs or software that agents use to execute tasks (much like invoking Azure Function Apps for specific operations).
- Planning Module: Breaks down high-level goals into manageable steps (comparable to task orchestrators like Azure Logic Apps).
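To make the five parts concrete, here is a minimal sketch of how they might fit together. Everything in it is hypothetical: `ToyAgent`, its rule-based `decide` method (standing in for an LLM call), and the "then"-splitting planner are illustrations only, not how Azure AI Agent Service is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyAgent:
    """Hypothetical illustration of the five parts listed above."""
    knowledge: Dict[str, str]                         # knowledge base
    tools: Dict[str, Callable[[str], str]]            # tools and actions
    memory: List[str] = field(default_factory=list)   # short-term memory

    def plan(self, goal: str) -> List[str]:
        # Planning module: split a goal into steps on the word "then".
        return [step.strip() for step in goal.split(" then ") if step.strip()]

    def decide(self, step: str) -> str:
        # Reasoning engine stand-in: a real agent would ask an LLM here.
        verb, _, argument = step.partition(" ")
        if verb in self.tools:
            return self.tools[verb](argument)
        return self.knowledge.get(step, "unknown")

    def run(self, goal: str) -> List[str]:
        results = []
        for step in self.plan(goal):
            result = self.decide(step)
            self.memory.append(f"{step} -> {result}")  # remember each step
            results.append(result)
        return results


agent = ToyAgent(
    knowledge={"capital of France": "Paris"},
    tools={"upper": lambda text: text.upper()},
)
results = agent.run("capital of France then upper hello")
print(results)
```

The first step is answered from the knowledge base and the second is routed to a tool, while memory keeps a record of both.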
AI Agent Considerations
- Knowledge: Providing agents with the right context
- Actions: Giving agents access to the tools needed to complete tasks
- Security: Ensuring agents have access to only the data and services they need
- Evaluation: Ensuring agents complete tasks correctly
Azure AI Agent Service Overview
Azure AI Agent Service is a fully managed service that simplifies the development, deployment, and scaling of AI agents. It eliminates the complexity of managing infrastructure while providing robust tools for building extensible AI agents.
Managed agent micro-services
Used for rapid development and automation, extensive data connections, flexible model selection, and enterprise-grade security.
Key Features
- Automatic Tool Calling
  - Server-side handling of tool calls
  - Simplified integration with external services
  - Streamlined workflow automation
- Secure Data Management
  - Thread-based conversation state management
  - Built-in security features
  - Compliant data handling
- Pre-built Tool Integration
  - Bing integration
  - Azure AI Search compatibility
  - Azure Functions support
  - Code interpreter tools
  - File handling capabilities
Infrastructure Management
- Automated provisioning
- Dynamic scaling
- Load balancing
- Resource optimization
Security and Compliance
- Identity and access management
- Network isolation
- Encryption at rest and in transit
- Compliance certifications
Operational Excellence
- Automated updates
- Performance monitoring
- Disaster recovery
- Backup management
Benefits of Using Azure AI Agent Service
- Serverless Operations
  - No infrastructure management
  - Automatic scaling from zero
  - Pay-per-execution pricing
  - Built-in fault tolerance
  - Instant deployment
  - Global availability
- Simplified Development
  - Reduced code complexity
  - Fast implementation
  - Built-in best practices
- Enterprise-Grade Security
  - Managed authentication
  - Secure data handling
  - Compliance-ready infrastructure
- Scalability
  - Automatic resource management
  - Elastic computing capabilities
  - Cost-effective scaling
- Simplified Operations
  - Zero-touch deployments
  - Automated scaling
  - Built-in monitoring
  - Simplified maintenance
- Enterprise-Grade Architecture
  - Microservices design
  - Container orchestration
  - Service mesh integration
  - Event-driven patterns
- Cost Optimization
  - Pay-per-use pricing
  - Resource optimization
  - Automatic scaling
  - Shared infrastructure
Key Components
Understanding these components is crucial for effective implementation:
- Agent: A custom AI that pairs a large language model with defined instructions and tools
- Thread: A conversation session between an agent and a user; it stores messages and truncates them as needed to fit the model's context
- Message: Content created by either the agent or the user
- Run: The activation of an agent based on the thread's contents and the configured tools
- Tools: Services and functions that extend what the agent can do
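The relationship between these components can be sketched with a few plain data classes. This is a toy model under stated assumptions, not the service's implementation: the class shapes, the `max_messages` count-based truncation, and the echoing `respond` method (a stand-in for a model call) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str       # "user" or "assistant"
    content: str


@dataclass
class Thread:
    max_messages: int = 4   # hypothetical truncation limit
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> Message:
        message = Message(role, content)
        self.messages.append(message)
        # Truncate: keep only the most recent messages.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]
        return message


@dataclass
class Agent:
    instructions: str

    def respond(self, text: str) -> str:
        # Stand-in for a model call guided by the instructions.
        return f"echo: {text}"


@dataclass
class Run:
    agent: Agent
    thread: Thread
    status: str = "queued"

    def process(self) -> None:
        # Activate the agent on the latest user message in the thread.
        last_user = next(m for m in reversed(self.thread.messages) if m.role == "user")
        self.thread.add("assistant", self.agent.respond(last_user.content))
        self.status = "completed"


thread = Thread(max_messages=3)
for i in range(3):
    thread.add("user", f"question {i}")
run = Run(agent=Agent("Echo the user."), thread=thread)
run.process()
print(run.status, [m.content for m in thread.messages])
```

Once the assistant reply is appended, the oldest message falls out of the thread, which is the truncation behavior in miniature.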
AI Agent Service Steps
Step 1: Create an Agent - provide what the agent needs, such as tools and documents to search. Alternatively, reuse an existing agent by its ID.
Step 2: Create a Thread - tracks the session
Step 3: Run the Agent - send the message, including attachments if necessary
Step 4: Check the Run status - is the process complete?
Step 5: Display the Agent's Response - read the response, including any generated images
Step 6: Clean up if necessary - delete the Agent if it's no longer needed
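The six steps can be sketched end to end. Because the real service needs Azure credentials, this sketch runs against `StubAgentsClient`, a hypothetical in-memory stand-in that only borrows the method names used in this article's snippets. It is not the Azure SDK, and its return shapes are assumptions.

```python
import itertools
from types import SimpleNamespace


class StubAgentsClient:
    """Hypothetical in-memory stand-in; not the Azure SDK."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.agents = {}
        self.threads = {}

    def create_agent(self, model, name, instructions):
        agent = SimpleNamespace(id=f"agent_{next(self._ids)}", name=name)
        self.agents[agent.id] = agent
        return agent

    def create_thread(self):
        thread = SimpleNamespace(id=f"thread_{next(self._ids)}")
        self.threads[thread.id] = []
        return thread

    def create_message(self, thread_id, role, content):
        message = SimpleNamespace(role=role, content=content)
        self.threads[thread_id].append(message)
        return message

    def create_and_process_run(self, thread_id, assistant_id):
        if assistant_id not in self.agents:
            return SimpleNamespace(status="failed", last_error="unknown agent")
        prompt = self.threads[thread_id][-1].content
        self.create_message(thread_id, "assistant", f"You said: {prompt}")
        return SimpleNamespace(status="completed", last_error=None)

    def list_messages(self, thread_id):
        return list(self.threads[thread_id])

    def delete_agent(self, agent_id):
        del self.agents[agent_id]


def ask_once(client, prompt):
    # Step 1: create the agent
    agent = client.create_agent(model="stub-model", name="demo", instructions="Be brief.")
    try:
        # Step 2: create a thread to track the session
        thread = client.create_thread()
        # Step 3: send the message and run the agent
        client.create_message(thread_id=thread.id, role="user", content=prompt)
        run = client.create_and_process_run(thread_id=thread.id, assistant_id=agent.id)
        # Step 4: check the run status
        if run.status == "failed":
            return f"Run failed: {run.last_error}"
        # Step 5: read the agent's response (the last message in the thread)
        return client.list_messages(thread_id=thread.id)[-1].content
    finally:
        # Step 6: clean up, whether the run succeeded or not
        client.delete_agent(agent.id)


client = StubAgentsClient()
answer = ask_once(client, "hello")
print(answer)
```

Putting step 6 in a `finally` block is a design choice in this sketch: the agent is deleted on both the success and the failure path.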
Sample Project: Agent On The Fly, a web app with agents
Visit GitHub repository
The file AgentOnTheFly.py demonstrates a practical blueprint for embedding AI agents into a web application that combines retrieval-augmented generation (RAG) with on-the-fly code generation. This pattern works as follows:
- Dynamic Agent Instantiation: Upon uploading a document, the application automatically instantiates an AI agent that creates a vector search index to extract contextual information from the file.
- Session-Based Context: The agent establishes session context that preserves state during interaction, ensuring that relevant data is available throughout the conversation.
- On-the-Fly Code Generation: By processing user instructions, the agent can generate and execute Python code in real time to perform analyses or create visual outputs.
- Automated Resource Cleanup: After completing its task, such as downloading a generated image, the agent terminates, ensuring efficient resource management.
This approach embodies an "agent for hire" model on the fly: you upload a document and the system dynamically creates the necessary backend processes to provide intelligent, context-aware assistance exactly when needed.
1. RAG Implementation
The Retrieval-Augmented Generation capability allows agents to dynamically access and utilize knowledge from uploaded documents, enhancing their responses with contextual information.
# Upload the file
file = project_client.agents.upload_file_and_poll(
    file_path=temp_file_path,
    purpose=FilePurpose.AGENTS
)
print(f"Uploaded file, file ID: {file.id}")
# Create vector store if not exists or if new file is uploaded
if not st.session_state["vector_store_id"]:
    vector_store = project_client.agents.create_vector_store_and_poll(
        file_ids=[file.id],
        name=f"vectorstore_{file_obj.name}"
    )
    st.session_state["vector_store_id"] = vector_store.id
    print(f"Created vector store, vector store ID: {vector_store.id}")
# Create file search tool
file_search_tool = FileSearchTool(vector_store_ids=[st.session_state["vector_store_id"]])
agent = project_client.agents.create_agent(
    model=model,
    name="rag-agent",
    instructions="You are a helpful agent which provides answer ONLY from the search.",
    tools=file_search_tool.definitions,
    tool_resources=file_search_tool.resources,
)
st.session_state["rag_agent_id"] = agent.id
print(f"Created agent, agent ID: {agent.id}")
# Create thread and message
thread = project_client.agents.create_thread()
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=prompt
)
print(f"Created message, message ID: {message.id}")
# Run the agent
run = project_client.agents.create_and_process_run(
    thread_id=thread.id,
    assistant_id=st.session_state["rag_agent_id"]
)
# Check the run status
if run.status == "failed":
    progress_bar.empty()
    return f"Run failed: {run.last_error}"
# Get the last message from the agent
messages = project_client.agents.list_messages(thread_id=thread.id)
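For intuition about what the file search tool contributes, here is a toy sketch of the retrieval step in RAG. It is not how the service's vector store works internally: the word-count `embed` function is a deliberately simple stand-in for learned embeddings, and the sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy "embedding": lowercase word counts. Real vector stores use learned embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks, question, top_k=1):
    # Rank document chunks by similarity to the question and keep the best ones.
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(embed(chunk), query), reverse=True)
    return ranked[:top_k]


# Invented sample chunks, standing in for an uploaded document.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "Support is available on weekdays from 9 to 5.",
    "Refunds are processed within 14 days.",
]
question = "When are invoices due?"
context = retrieve(chunks, question)
# The retrieved chunk is placed in the prompt so the model answers from it.
grounded_prompt = f"Answer ONLY from this context:\n{context[0]}\n\nQuestion: {question}"
print(grounded_prompt)
```

The retrieved text is what grounds the answer, which is why the agent instructions in the excerpt restrict it to answering only from the search.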
2. Code Interpretation Capabilities
The code interpretation feature allows AI agents to generate and execute Python code in real-time, enabling data analysis, visualization, and other programmatic tasks based on user requests.
# Imports assumed for the preview azure-ai-projects SDK used in this sample
import time
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import CodeInterpreterTool, ToolSet
from azure.identity import DefaultAzureCredential
# Initiate AI Project client
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=conn_str
)
# Add Code Interpreter to the Agent's ToolSet
toolset = ToolSet()
code_interpreter_tool = CodeInterpreterTool()
toolset.add(code_interpreter_tool)
# Initiate Agent Service
agent = project_client.agents.create_agent(
    model=model,
    name="code-interpreter-agent",
    instructions="You are a helpful data analyst. You can use Python to perform required calculations.",
    toolset=toolset
)
print(f"Created agent, agent ID: {agent.id}")
# Create a thread
thread = project_client.agents.create_thread()
print(f"Created thread, thread ID: {thread.id}")
# Create a message
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=prompt
)
print(f"Created message, message ID: {message.id}")
# Run the agent
run = project_client.agents.create_and_process_run(
    thread_id=thread.id,
    assistant_id=agent.id
)
# Check the run status; on failure, delete the agent before returning
if run.status == "failed":
    project_client.agents.delete_agent(agent.id)
    print(f"Deleted agent, agent ID: {agent.id}")
    progress_bar.empty()
    return f"Run failed: {run.last_error}"
time.sleep(10)  # Increase delay if needed
# Get the last message from the agent
messages = project_client.agents.list_messages(thread_id=thread.id)
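The fixed `time.sleep(10)` above waits the same amount of time whether the work took one second or thirty. Where waiting is needed, one alternative is to poll with exponential backoff and a timeout. The helper below is a sketch, not part of the sample project: `wait_for_run`, its parameters, and the set of terminal status names are assumptions (only "failed" appears in the article's code), and `get_status` is any callable you supply that returns the current run status.

```python
import time


def wait_for_run(get_status, timeout=60.0, initial_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    # Assumed terminal status names; adjust to whatever your client reports.
    terminal = {"completed", "failed", "cancelled", "expired"}
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"run still '{status}' after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, but cap the wait


# Demo with a scripted sequence of statuses and a recording fake for sleep.
statuses = iter(["queued", "in_progress", "in_progress", "completed"])
delays = []
final = wait_for_run(lambda: next(statuses), sleep=delays.append)
print(final, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in an application you would leave it at its default.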
3. Combined Approach: Code Interpretation + RAG
This combined approach integrates both capabilities:
- Document Knowledge Access: Agents can search and retrieve information from uploaded documents
- Live Code Execution: Python code is generated and run in real-time
- Data-Driven Analysis: Perform analysis on document contents directly
- Interactive Visualizations: Create charts and graphs based on document data
- Automated Problem Solving: Generate solutions that combine document knowledge with computational capabilities
# Imports assumed for the preview azure-ai-projects SDK used in this sample
import os
from pprint import pformat
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import CodeInterpreterTool, FilePurpose, FileSearchTool, ToolSet
from azure.identity import DefaultAzureCredential
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=conn_str
)
# Save and upload file
temp_file_path = f"temp_{file_obj.name}"
with open(temp_file_path, "wb") as f:
    f.write(file_obj.getvalue())
# Upload the file
file = project_client.agents.upload_file_and_poll(
    file_path=temp_file_path,
    purpose=FilePurpose.AGENTS
)
print(f"Uploaded file, file ID: {file.id}")
# Create vector store
vector_store = project_client.agents.create_vector_store_and_poll(
    file_ids=[file.id],
    name=f"vectorstore_{file_obj.name}"
)
print(f"Created vector store, vector store ID: {vector_store.id}")
# Setup tools
toolset = ToolSet()
code_interpreter_tool = CodeInterpreterTool()
toolset.add(code_interpreter_tool)
print("Added Code Interpreter tool")
file_search_tool = FileSearchTool(vector_store_ids=[vector_store.id])
toolset.add(file_search_tool)
print("Added File Search tool")
# Create agent with both capabilities
agent = project_client.agents.create_agent(
    model=model,
    name="rag-code-interpreter-agent",
    instructions="You are a helpful agent that can analyze documents and generate Python code based on the document content. Use the file search to extract relevant information and then generate appropriate Python code for analysis when needed.",
    toolset=toolset
)
print(f"Created agent, agent ID: {agent.id}")
# Create thread and message
thread = project_client.agents.create_thread()
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=prompt
)
print(f"Created message, message ID: {message.id}")
# Run the agent
run = project_client.agents.create_and_process_run(
    thread_id=thread.id,
    assistant_id=agent.id
)
# On failure, release the agent, vector store, and temp file before returning
if run.status == "failed":
    project_client.agents.delete_agent(agent.id)
    project_client.agents.delete_vector_store(vector_store.id)
    os.remove(temp_file_path)
    return f"Run failed: {run.last_error}"
# Get the last message from the agent
messages = project_client.agents.list_messages(thread_id=thread.id)
print(f"Messages: {pformat(messages)}")
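The excerpt above releases the agent, vector store, and temp file only when the run fails. One way to guarantee the same cleanup on every path is `contextlib.ExitStack` from the standard library. The sketch below is not taken from the sample project: `run_with_cleanup` and the create/delete callables are hypothetical stand-ins for the corresponding client calls, and here they only record what happened.

```python
import os
import tempfile
from contextlib import ExitStack


def run_with_cleanup(file_bytes, create_store, delete_store, create_agent, delete_agent, work):
    """Create a temp file, vector store, and agent; always release them in reverse order."""
    with ExitStack() as stack:
        fd, path = tempfile.mkstemp()
        stack.callback(os.remove, path)          # registered first, runs last
        with os.fdopen(fd, "wb") as handle:
            handle.write(file_bytes)

        store_id = create_store(path)
        stack.callback(delete_store, store_id)

        agent_id = create_agent(store_id)
        stack.callback(delete_agent, agent_id)   # registered last, runs first

        # Callbacks fire when the block exits, on return or on exception.
        return work(agent_id)


# Hypothetical stand-ins that only record what happened.
log = []
result = run_with_cleanup(
    file_bytes=b"demo",
    create_store=lambda path: log.append("create store") or "store_1",
    delete_store=lambda store_id: log.append(f"delete {store_id}"),
    create_agent=lambda store_id: log.append("create agent") or "agent_1",
    delete_agent=lambda agent_id: log.append(f"delete {agent_id}"),
    work=lambda agent_id: f"answer from {agent_id}",
)
print(result)
print(log)
```

Each resource registers its own teardown immediately after it is created, so a failure halfway through setup still releases whatever already exists.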
Project Summary
The Agent on the Fly project demonstrates the practical implementation of Agent as a Service (AaaS) principles through a streamlined, user-friendly web application. This project showcases:
Core Capabilities
- Document Intelligence: Upload any document and instantly access an AI agent that can understand, analyze, and respond to questions about its contents
- Dynamic AI Agent Creation: Agents are created on-demand, precisely when needed, and only exist for the duration of their task
- Serverless Architecture: Built on Azure's serverless infrastructure for optimal resource utilization and scalability
- Combined RAG and Code Generation: Unique integration of retrieval capabilities with real-time code execution
- Interactive Data Analysis: Ask questions about your data and receive visual insights without writing a single line of code
Technical Implementation
The project leverages several key technologies:
- Azure OpenAI Service: Powers the underlying language model capabilities
- Azure AI Search: Enables efficient vector search and document retrieval
- Python Runtime Environment: Executes generated code within secure boundaries
- Web-based Interface: Provides an intuitive entry point for users of all technical levels
Top comments (2)
Thank you for this write-up! We were just trying to build a new business model for agentic AI, and the term "Agents as a Service" is a perfect way to sum up the business opportunity. Also, the evolution of MSA to Agent-MSA is a fascinating and modular way of looking at it when building a Rapid-Prototyping factory or capability in your organization. Do you have an example of the "clean up" step, where an Agent goes through a teardown and gives back resources or stops consuming them after the job is complete or the user terminates? A project or use case would illuminate this for me.
Thanks. For the clean-up step, it really depends on the resources the agent is going to connect to. I recommend separating that out, so the agent calls other services such as an API, Azure Functions, or AWS Lambda, and those functions are responsible for releasing the resources they consume. This way there's separation of concerns. The agent is not responsible for connecting to or disconnecting from a database, but rather for identifying steps and executing them by calling other services. An AI agent should not care about database connections; the API should be responsible for maintaining the database connection.