A practical guide to architecture, implementation, and lessons learned from a multi-agent AI system
Posts in this Series
Building FeedbackForge: a Multi-Agent AI System on Azure with MAF, Foundry & AI Gateway (Part 2)
Introduction
FeedbackForge is an AI-powered feedback analysis system designed to transform raw customer feedback into actionable insights.
Modern enterprises collect feedback from multiple sources such as web forms and platforms like Zendesk, but most of it remains underutilized due to the effort required to analyze it effectively.
FeedbackForge addresses this by using a multi-agent architecture to:
- Analyze customer feedback at scale
- Detect anomalies and emerging issues
- Generate actionable recommendations
- Create tickets in systems like Jira
- Provide an executive dashboard for interactive insights
This project is a side initiative aimed at exploring agentic AI systems using Microsoft technologies such as Microsoft Foundry, Microsoft Agent Framework (MAF), and Azure AI services.
Use Case
To put FeedbackForge into context, let’s imagine a scenario based on a fictional enterprise: Contoso Inc.
Contoso is a global company that collects customer feedback through:
- Traditional web forms
- Zendesk support tickets
- Email-based feedback
Despite having access to large amounts of feedback, they face several challenges:
- Feedback is fragmented across systems
- Analysis is manual and time-consuming
- Issues are difficult to prioritize
- No clear connection between feedback and actions
- Insights arrive too late to be useful
As a result, critical issues often go unnoticed, impacting customer satisfaction and retention.
Goals
FeedbackForge aims to solve these problems by enabling:
- Automatic classification and analysis of feedback
- Real-time anomaly detection and issue tracking
- Generation of actionable recommendations
- Automatic creation of Jira tickets
- Dynamic FAQ generation from customer feedback
- An executive dashboard with a conversational interface
- Competitive intelligence and churn risk detection
Architecture
FeedbackForge is designed as a multi-agent system, where multiple specialized agents collaborate to analyze feedback and generate insights.
Each agent focuses on a specific task (e.g., sentiment analysis, anomaly detection, or action generation), allowing the system to be modular, scalable, and easier to maintain.
Agents communicate externally using the Agent-to-Agent (A2A) protocol and interact with systems through Model Context Protocol (MCP) servers, enabling integration with tools such as Zendesk and Jira.
AI Hub Gateway
To ensure governance, security, and observability, FeedbackForge uses a centralized AI Gateway based on Azure API Management.
This gateway acts as a control plane for all AI interactions, enforcing:
- Security and access control
- Usage monitoring and cost management
- Policy enforcement and compliance
- Resiliency and load balancing
The architecture follows a hub-and-spoke model, where a central AI gateway manages traffic while individual application environments operate independently within defined guardrails.
This layer is implemented through the Citadel Governance Hub, an enterprise-grade AI landing zone that provides a unified control plane for all AI workloads.
Why this matters
Instead of allowing each application to directly call AI models, all traffic flows through a centralized gateway. This enables:
- Governance & Security: Consistent access control, identity management, and policy enforcement
- Observability & Compliance: Centralized logging, metrics, and near real-time usage analytics
- Developer Velocity: Standardized onboarding and reusable configuration patterns
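In practice, routing through the gateway just means the model client targets the API Management endpoint instead of the model endpoint directly. A minimal sketch of building such a request; the gateway URL, API version, and subscription-key header are illustrative placeholders, not FeedbackForge's actual configuration:

```python
import json
import urllib.request

# Hypothetical values for illustration only
GATEWAY_BASE = "https://apim-ai-hub.azure-api.net/openai"
SUBSCRIPTION_KEY = "<your-apim-subscription-key>"

def build_gateway_request(deployment: str, payload: dict) -> urllib.request.Request:
    """Build a chat-completions request that targets the AI Gateway
    instead of calling the model endpoint directly."""
    url = f"{GATEWAY_BASE}/deployments/{deployment}/chat/completions?api-version=2024-06-01"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # APIM authenticates callers with a subscription key header
            "api-key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )
```

Because every call carries a per-application subscription key, the hub can attribute usage, enforce quotas, and apply policies centrally.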
Hub-and-Spoke Architecture
The system follows a hub-and-spoke model:
- Hub (Control Plane) → Centralized governance and AI Gateway
- Spokes (Applications) → Independent workloads and agents
This allows teams to innovate independently while remaining within enterprise guardrails.
🎯 Citadel Governance Hub — Central Control Plane
The central governance layer with unified AI Gateway that all AI workloads route through.
Spoke Layer (Applications)
The spoke layer contains the actual applications and agents that implement business logic.
Key Components
- Azure Container Apps: Hosts all frontend and backend services and agents with serverless scaling
- Azure Cosmos DB: Stores feedback data, generated FAQs, alerts, and workflow reports
- Redis: Stores session data and conversational memory
- Azure AI Search: Enables RAG-based FAQ generation and retrieval
- Zendesk Integration: Source of customer support tickets and feedback
- Jira Integration: Destination for automatically generated action items
Project Structure
FeedbackForge is composed of multiple components implemented in Python (backend) and TypeScript with React (frontend). The system is designed to be modular, allowing different execution modes depending on the use case.
Backend
The backend is built around a single Python project (feedbackforge) that can run in multiple modes. Each mode represents a different way of interacting with the system.
Operating Modes
FeedbackForge supports five main operating modes:
1. Chat Mode (DevUI) — Development Interface
An interactive chat interface for testing agents and workflows.
- Best for: development, testing, and exploration
- Default port: 8090
- Command:
python -m feedbackforge chat
DevUI is a lightweight application built on the Microsoft Agent Framework that provides:
- A web-based interface for interacting with agents
- An OpenAI-compatible API backend
- Tools for debugging and iterating on workflows
2. Serve Mode (AG-UI) — Production Server
A production-ready FastAPI server implementing the AG-UI protocol.
- Best for: frontend integration and production deployments
- Default port: 8081
- Command:
python -m feedbackforge serve --port 8081
AG-UI (Agent User Interaction Protocol) is an event-driven protocol that standardizes communication between user-facing applications and AI agents. It enables:
- Bi-directional communication
- Streaming responses
- Decoupling between frontend and backend
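A sketch of what such an event stream can look like on the wire. The event names mirror common AG-UI event types, but this is an illustrative stand-in, not the official AG-UI implementation:

```python
import json
from typing import Iterator

def _sse(event: str, data: dict) -> str:
    # Server-Sent Events frame: event name + JSON payload, blank-line terminated
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def sse_events(chunks: list) -> Iterator[str]:
    """Yield an AG-UI-style event stream for one streamed agent reply.
    Event names are illustrative approximations of AG-UI event types."""
    yield _sse("RUN_STARTED", {})
    for chunk in chunks:
        yield _sse("TEXT_MESSAGE_CONTENT", {"delta": chunk})
    yield _sse("RUN_FINISHED", {})
```

Because events are self-describing frames, the frontend can render partial output as it arrives and stays decoupled from the agent runtime behind the server.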
3. Workflow Mode — Batch Analysis
Executes the full multi-agent pipeline for large-scale feedback analysis.
- Best for: scheduled processing of survey or feedback data
- Command:
python -m feedbackforge workflow --max-surveys 50
4. FAQ Mode — RAG-Based FAQ Generation
Generates FAQs automatically from customer feedback using Azure AI Search.
- Uses hybrid search (keyword + vector + semantic)
- Applies clustering to identify common themes
- Best for: customer support knowledge bases and documentation generation
- Command:
python -m feedbackforge faq --days 7 --max-faqs 20
5. MCP Server Mode — External Integration
Runs a Model Context Protocol (MCP) server to integrate external feedback sources.
- Best for: connecting to real systems like Zendesk (replacing mock data with production data), as well as CI/CD and automation scenarios
- Command:
python -m feedbackforge mcp
Action Planner
The Action Planner is an autonomous agent responsible for converting insights into actionable tasks. It integrates directly with systems like Jira to create and track issues.
Key Features
- A2A Protocol → Seamless communication with other agents
- Multi-platform support → Create tickets in Jira
- Intelligent prioritization → Based on severity and impact
- Auto-assignment → Suggests responsible teams
- Effort estimation → T-shirt sizing (S/M/L/XL)
- Traceability → Links tickets back to original feedback
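The prioritization and T-shirt sizing logic can be sketched roughly like this. Thresholds and field names are assumptions for illustration, not the agent's actual rules:

```python
from dataclasses import dataclass

@dataclass
class ActionItem:
    title: str
    affected_users: int
    severity: str  # "critical" | "high" | "medium" | "low"

def prioritize(item: ActionItem):
    """Map severity and impact to a priority and a T-shirt size.
    Illustrative thresholds only."""
    if item.severity == "critical" or item.affected_users > 40:
        priority = "P0"
    elif item.severity == "high" or item.affected_users > 25:
        priority = "P1"
    else:
        priority = "P2"
    # Rough effort estimate: bigger blast radius, bigger ticket
    if item.affected_users > 100:
        size = "XL"
    elif item.affected_users > 40:
        size = "L"
    elif item.affected_users > 10:
        size = "M"
    else:
        size = "S"
    return priority, size
```

In the real agent this decision is made by the model with tool support, but the output contract (priority plus S/M/L/XL estimate) is the same shape.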
Frontend
The frontend consists of two React applications.
FeedbackForge FAQ Viewer: A web interface for browsing auto-generated FAQs stored in Azure Cosmos DB. Designed for:
- Customer support teams
- End-user self-service
FeedbackForge Dashboard: A React application integrated with AG-UI for real-time interaction with agents. Features:
- Streaming responses from agents
- Interactive exploration of insights
- Executive-level dashboard experience
Microsoft Agent Framework (MAF)
FeedbackForge is built using the Microsoft Agent Framework (MAF), a multi-language framework for building, orchestrating, and deploying AI agents.
MAF provides the foundational building blocks required to create both simple and complex multi-agent systems.
Core Capabilities
- Agent runtime → Execution environment for agents
- Model clients → Chat completions and response handling
- Session management → Stateful conversations
- Context providers → Memory and context injection
- Middleware → Interception and customization of agent behavior
- MCP clients → Integration with external tools and services
- Workflow runtime → Multi-agent workflows with graph-based orchestration
Agents in FeedbackForge
FeedbackForge defines multiple specialized agents.
🤖 Core Agents
- Action Planning Agent: Converts feedback insights into trackable Jira action items.
- Dashboard Agent: Executive Dashboard Assistant for analyzing customer feedback.
- Data Sync Agent: Syncs external feedback sources (Zendesk) into the data store.
🤖 Workflow Agents
- Initial Orchestrator Agent: Validates survey data, assesses quality, plans analysis.
- Data Preprocessor Agent: Cleans and prepares survey data.
- Sentiment Analyzer Agent: Analyzes sentiment.
- Topic Extractor Agent: Extracts topics.
- Anomaly Detector Agent: Detects anomalies.
- Competitive Intelligence Agent: Extracts competitor info.
- Insight Miner Agent: Synthesizes analyses.
- Priority Ranker Agent: Ranks priorities.
- Action Generator Agent: Generates actions.
- Report Generator Agent: Executive report generator which analyzes all the data provided and creates a comprehensive report.
- Final Orchestrator Agent: Reviews results.
Each agent focuses on a specific responsibility, enabling modular and scalable processing.
Tools
The Microsoft Agent Framework supports many different types of tools that extend agent capabilities. Tools allow agents to interact with external systems, execute code, search data, and more. Tools are invoked dynamically by agents during execution, enabling them to move from reasoning to action.
Agent Tools
- Action Planning Agent: Analyze Issue, Create Jira Ticket, Get Available Systems
- Data Sync Agent: Zendesk via MCP, Check Sync Status
- Dashboard Agent: Get Weekly Summary, Get Issue Details, Get Competitor Insights, Get Customer Context, Check for Anomalies, Set Alert, Generate Action Items, Escalate to Team, Create Action Plan, Run Workflow Analysis, Get Latest Workflow Report, Get Workflow Reports History
Example Tool: Get Weekly Summary
@ai_function
def get_weekly_summary() -> str:
    """Get weekly feedback summary with sentiment, top issues, and urgent items."""
    try:
        logger.debug("📊 Calling get_weekly_summary tool")
        summary = feedback_store.get_weekly_summary()
        result = json.dumps({
            "total_responses": summary["total_responses"],
            "sentiment": summary["sentiment"],
            "top_issues": [
                {"issue": i[0], "mentions": i[1], "priority": "P0" if i[1] > 40 else "P1" if i[1] > 25 else "P2"}
                for i in summary["top_issues"]
            ],
            "urgent_items": summary["urgent_count"],
        }, indent=2)
        logger.debug("✅ get_weekly_summary completed")
        return result
    except Exception as e:
        logger.error(f"❌ Error in get_weekly_summary: {e}", exc_info=True)
        return json.dumps({"error": str(e), "message": "Failed to retrieve weekly summary"}, indent=2)
MCP (Model Context Protocol)
MCP enables agents to interact with external systems as tools. In FeedbackForge, an MCP server is used as a tool for the Data Sync Agent to communicate with Zendesk. This MCP server uses SSE transport over HTTP and includes:
- Tools: Fetch Zendesk Tickets, Ingest Feedback to Store, Create Zendesk Ticket
- Resources: Structured data endpoints (e.g., feedbackforge://feedback/recent)
- Prompts: Predefined reasoning templates (e.g., Analyze feedback trends, Generate executive summary)
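For example, a resource URI such as feedbackforge://feedback/recent can be split into its resource type and selector with the standard library. This is a hand-rolled sketch; a real MCP server registers resources through an MCP SDK rather than parsing URIs by hand:

```python
from urllib.parse import urlparse

def parse_resource_uri(uri: str):
    """Split an MCP-style resource URI like 'feedbackforge://feedback/recent'
    into (resource_type, selector). Illustrative sketch only."""
    parsed = urlparse(uri)
    if parsed.scheme != "feedbackforge":
        raise ValueError(f"unexpected scheme: {parsed.scheme}")
    # netloc carries the resource type, the path carries the selector
    return parsed.netloc, parsed.path.lstrip("/")
```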
A2A (Agent-to-Agent Protocol)
FeedbackForge uses the A2A protocol, an open standard designed to enable seamless communication and collaboration between AI agents. It enables the Dashboard Agent to communicate with the Action Planning Agent, which analyzes the issue, determines priority, category, and effort, and generates a Jira issue.
Below we can see how the Dashboard Agent transparently calls the ActionPlanner agent, asking it to create a ticket from an action item.
Agent Cards
The A2A Protocol standardizes the data format shared during the discovery process via Agent Cards. These cards contain standardized descriptions that agents use to advertise their capabilities to other agents.
Example of an Agent Card (JSON format):
{
"capabilities": {
"pushNotifications": false,
"streaming": true
},
"defaultInputModes": [
"text"
],
"defaultOutputModes": [
"text"
],
"description": "Converts customer feedback insights into trackable tickets in Jira",
"name": "ActionPlanningAgent",
"preferredTransport": "JSONRPC",
"protocolVersion": "0.3.0",
"skills": [
{
"description": "Analyzes customer feedback issues and creates actionable tickets with proper prioritization, categorization, and effort estimation. Supports Jira.",
"examples": [
"Create tickets for iOS crash issue affecting 45 users",
"Analyze payment failure feedback and create action plan",
"Generate tickets for login performance complaints"
],
"id": "action_planner_ticket_creation",
"name": "ActionPlanner",
"tags": [
"action-planning",
"ticket-creation",
"jira",
"feedback",
"agent-framework"
]
}
],
"url": "http://0.0.0.0:8084/",
"version": "1.0.0"
}
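A client consuming such a card might validate it and check capabilities before choosing a transport. A minimal sketch based on the fields in the card above:

```python
import json

# Fields taken from the example Agent Card above
REQUIRED_FIELDS = {"name", "description", "url", "version", "capabilities", "skills"}

def load_agent_card(raw: str) -> dict:
    """Parse an A2A Agent Card and check the fields a client needs
    before invoking the agent."""
    card = json.loads(raw)
    missing = REQUIRED_FIELDS - card.keys()
    if missing:
        raise ValueError(f"agent card missing fields: {sorted(missing)}")
    return card

def supports_streaming(card: dict) -> bool:
    # Lets a caller decide between SSE streaming and plain request/response
    return bool(card.get("capabilities", {}).get("streaming", False))
```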
Agent Discovery
For agents to collaborate, they first need to discover each other. A2A supports multiple discovery methods:
- DNS-based Discovery
- Registry-based Discovery
- Private Discovery
In our case, we will use private discovery, which means directly configuring known agent endpoints using the Azure Container Apps (ACA) URLs that are already known.
Task Processing Flow
The A2A protocol defines a standard flow for task processing between agents:
Communication Protocols
A2A is built on established communication protocols:
- HTTP/HTTPS: For basic request/response communication
- JSON-RPC: For structured method calls
- Server-Sent Events (SSE): For streaming responses
- JSON: As the standard data exchange format
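A JSON-RPC 2.0 call is just a small JSON envelope, so a structured method call between agents reduces to building and sending one of these objects (the method name here is illustrative):

```python
import itertools
import json

# Monotonically increasing request ids, as JSON-RPC expects per connection
_ids = itertools.count(1)

def jsonrpc_request(method: str, params: dict) -> str:
    """Build a JSON-RPC 2.0 request envelope, the structured-call
    format A2A uses on the wire."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })
```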
Memory and Sessions
FeedbackForge, through the Microsoft Agent Framework (MAF), stores and manages conversations through sessions in two modes: in-memory and persistent. The persistent mode uses Redis-backed short-term storage with a configurable TTL.
This ensures that when we refresh a page on the Dashboard UI, we do not lose the current conversations.
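The idea behind the persistent mode can be sketched as a store with per-session expiry, analogous to Redis keys with a TTL. This is an illustration, not MAF's actual session implementation:

```python
import time
from typing import Any, Optional

class TTLSessionStore:
    """Minimal sketch of a persistent session store with expiry,
    analogous to Redis keys with a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, messages)

    def set(self, session_id: str, messages: Any) -> None:
        self._data[session_id] = (time.monotonic() + self.ttl, messages)

    def get(self, session_id: str) -> Optional[Any]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, messages = entry
        if time.monotonic() > expires_at:
            # Session outlived its TTL: drop it, like Redis key expiry
            del self._data[session_id]
            return None
        return messages
```

A page refresh simply re-fetches the session by id; as long as the TTL has not elapsed, the conversation history is still there.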
RAG and Azure AI Search
Retrieval Augmented Generation (RAG) is used to create FAQs based on common topics found in customer issues. It leverages Azure AI Search by creating a search index over the fields stored in Cosmos DB, supporting vector and semantic search capabilities.
For semantic search, the platform, sentiment, and topics fields are used. Embeddings are generated for the text field using the text-embedding-3-small model. A hybrid search approach (keyword + vector + semantic) is applied to achieve the best query accuracy.
# Use hybrid search for better accuracy
# Search for question/problem patterns
search_queries = [
    ("how to use feature", "usage questions"),
    ("why not working error", "error reports"),
    ("can I do this", "capability questions"),
    ("problem with issue", "problems"),
    ("confused about unclear", "clarity issues"),
    ("how do I configure", "configuration"),
    ("where is located", "navigation"),
    ("when will available", "availability"),
]
all_feedback = {}       # id -> feedback dict
feedback_to_query = {}  # id -> query that found it
for query, category in search_queries:
    logger.info(f"  Searching: '{query}' ({category})")
    # Use HYBRID search with Azure AI Search (keyword + vector + semantic)
    results = self.rag_client.hybrid_search(
        query=query,
        get_embeddings_func=get_embeddings,
        top=100,
        filters=time_filter
    )
    for result in results:
        feedback_id = result.get('id')
        text = result.get('text', '')
        # Only include question-like or problem feedback
        if self._is_question_like(text):
            if feedback_id not in all_feedback:
                all_feedback[feedback_id] = {
                    'id': feedback_id,
                    'text': text,
                    'customer': result.get('customer_name'),
                    'segment': result.get('customer_segment'),
                    'platform': result.get('platform'),
                    'rating': result.get('rating'),
                    'timestamp': result.get('timestamp'),
                    'search_score': result.get('reranker_score', result.get('search_score', 0)),
                    'text_vector': result.get('text_vector'),  # For clustering
                }
            feedback_to_query[feedback_id] = category
logger.info(f"  Found {len(all_feedback)} question/problem feedback items")
Clustering is performed using cosine similarity on the embeddings of the feedback items. The themes variable in the code below is a list derived from the all_feedback dictionary, containing only question-like or problem-related feedback items that were retrieved through hybrid search.
def _cluster_themes_with_vectors(self, themes: List[Dict]) -> List[Dict[str, Any]]:
    """
    Cluster similar themes using VECTOR SIMILARITY from Azure AI Search.

    This uses the actual embeddings stored in Azure AI Search for semantic clustering.

    Args:
        themes: List of theme dictionaries (with text_vector if available)

    Returns:
        List of clustered themes with counts
    """
    if not themes:
        return []
    logger.info(f"  Clustering {len(themes)} items using vector similarity...")
    # Sort by search score (highest relevance first)
    themes = sorted(themes, key=lambda x: x.get('search_score', 0), reverse=True)
    clusters = []
    used_indices = set()
    similarity_threshold = 0.75  # Cosine similarity threshold
    for i, theme in enumerate(themes):
        if i in used_indices:
            continue
        # Create new cluster
        cluster = {
            'representative_text': theme['text'],
            'count': 1,
            'samples': [theme],
            'platforms': [theme.get('platform')],
            'segments': [theme.get('segment')],
            'avg_rating': theme.get('rating') or 3,
            'avg_search_score': theme.get('search_score', 0)
        }
        # Get embedding for this theme (if available)
        theme_vector = theme.get('text_vector')
        # Find similar themes using vector similarity
        for j, other in enumerate(themes):
            if j <= i or j in used_indices:
                continue
            # Use vector similarity if available
            is_similar = False
            if theme_vector and other.get('text_vector'):
                similarity = self._cosine_similarity(theme_vector, other['text_vector'])
                is_similar = similarity >= similarity_threshold
            else:
                # Fallback to text similarity
                is_similar = self._are_similar(theme['text'], other['text'], threshold=0.6)
            if is_similar:
                cluster['count'] += 1
                cluster['samples'].append(other)
                cluster['platforms'].append(other.get('platform'))
                cluster['segments'].append(other.get('segment'))
                used_indices.add(j)
        used_indices.add(i)
        # Calculate averages
        ratings = [s.get('rating') for s in cluster['samples'] if s.get('rating')]
        if ratings:
            cluster['avg_rating'] = sum(ratings) / len(ratings)
        scores = [s.get('search_score', 0) for s in cluster['samples']]
        if scores:
            cluster['avg_search_score'] = sum(scores) / len(scores)
        clusters.append(cluster)
    logger.info(f"  Created {len(clusters)} clusters")
    return clusters
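The clustering above relies on self._cosine_similarity. The standard formula, dot(a, b) / (||a|| * ||b||), which the 0.75 threshold is compared against, can be implemented with the standard library:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors:
    dot(a, b) / (||a|| * ||b||), in [-1, 1] for nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors rather than dividing by zero
    return dot / norm if norm else 0.0
```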
Workflows
The project uses MAF workflows, which are designed to handle complex business processes that may involve multiple agents, human interactions, and integrations with external systems.
Our survey analysis workflow can be executed in two ways: as a scheduled job on Azure Container Apps, or synchronously on demand from the dashboard.
All agents involved in the workflow and their responsibilities are described in the Agents section. The workflow uses sequential, fan-out, and fan-in patterns, as shown in the following image.
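The sequential, fan-out, and fan-in stages can be sketched with asyncio: the orchestrator fans the same data out to the specialist agents concurrently, then a downstream step fans their outputs back in. Agent bodies are mocked here for illustration; the real workflow runs MAF agents:

```python
import asyncio

async def analyze(name: str, surveys: list) -> dict:
    """Stand-in for one specialist agent (sentiment, topics, anomalies...).
    A real agent would call the model here."""
    await asyncio.sleep(0)
    return {"agent": name, "items": len(surveys)}

async def run_workflow(surveys: list) -> dict:
    # Fan-out: run the specialist agents concurrently on the same data
    results = await asyncio.gather(
        analyze("sentiment_analyzer", surveys),
        analyze("topic_extractor", surveys),
        analyze("anomaly_detector", surveys),
    )
    # Fan-in: a downstream agent (Insight Miner) synthesizes the outputs
    return {"inputs": len(surveys), "analyses": results}
```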
In this workflow, all issues stored in the Cosmos DB database are processed to generate an executive summary. In async mode, the result is returned in JSON format.
{
  "key_metrics": {
    "total_responses": 20,
    "sentiment_breakdown": {
      "executive_summary": "All 20 survey responses are unanimously negative, indicating a severe, platform\u2011wide iOS app failure triggered immediately after the iOS 17 update. Feedback highlights two dominant crash modes\u2014app launch and settings\u2011screen access\u2014supported by high-volume, repetitive reports across all customer segments, including high\u2011value enterprise accounts. Key themes include iOS 17 compatibility breakage, widespread functional instability, repeat crashes, and issues consistently tied to the settings screen. Anomalies such as identical crash patterns, uniformly low ratings, and repeated complaints signal an active crisis with P0-level urgency. No competitive threats or mentions were identified, confirming that the problem is strictly product\u2011related. Overall sentiment reflects total user frustration and high churn risk if remediation is delayed.",
      "positive": 0,
      "neutral": 0,
      "negative": 100
    },
    "top_topics": [
      "ios17_update_breakage: 8",
      "general_crashing_on_iphone: 4",
      "settings_screen_crashes: 6",
      "repeat_crashes_unacceptable: 2"
    ],
    "critical_count": 2,
    "high_priority_count": 2
  },
  "critical_issues": [
    {
      "issue": "App crashes immediately on launch for all iOS users after iOS 17 update",
      "priority": "P0",
      "impact": "Affects all iOS customer segments including enterprise accounts",
      "category": "ios17_update_breakage"
    },
    {
      "issue": "Settings screen consistently triggers crashes due to likely unhandled iOS 17 changes",
      "priority": "P0",
      "impact": "High-volume identical reports across users",
      "category": "settings_screen_crashes"
    },
    {
      "issue": "Repeated identical crash submissions indicate systemic failure and heightened customer frustration",
      "priority": "P1",
      "impact": "Multiple reports per user and wave of 1\u2011star ratings",
      "category": "repeat_crashes_unacceptable"
    },
    {
      "issue": "Lack of pre\u2011release compatibility testing for major iOS updates",
      "priority": "P2",
      "impact": "Increased likelihood of future outages and regressions",
      "category": "process_gap"
    }
  ],
  "recommendations": [
    {
      "action": "Deploy an emergency hotfix for iOS 17 launch and settings\u2011screen crashes",
      "priority": "Immediate",
      "owner": "Mobile Engineering",
      "rationale": "P0 failures blocking basic app access for all iOS users"
    },
    {
      "action": "Run targeted debugging on iOS 17 devices to identify memory, permissions, and deprecated API triggers",
      "priority": "Immediate",
      "owner": "Mobile Engineering",
      "rationale": "Likely compatibility issues introduced with iOS 17"
    },
    {
      "action": "Conduct full compatibility testing across iOS 17 devices and minor versions",
      "priority": "High",
      "owner": "QA",
      "rationale": "Prevent additional hidden crash paths"
    },
    {
      "action": "Publish user\u2011facing communication acknowledging the issue and outlining hotfix progress",
      "priority": "High",
      "owner": "Customer Support / Comms",
      "rationale": "Mitigate churn and reassure customers during platform\u2011wide outage"
    },
    {
      "action": "Implement pre\u2011release OS testing workflows for future major iOS versions",
      "priority": "Medium",
      "owner": "Mobile Engineering / QA",
      "rationale": "Addresses known process gap and prevents recurrence"
    },
    {
      "action": "Conduct proactive outreach to enterprise customers affected by the outage",
      "priority": "High",
      "owner": "Account Management",
      "rationale": "Prevent churn among high\u2011value accounts experiencing total app failure"
    }
  ]
}
When called from the Dashboard UI, the system displays a human-readable response:
Observability
The Microsoft Agent Framework provides built-in support for observability, allowing you to monitor the behavior of your agents.
It integrates with OpenTelemetry, emitting traces, logs, and metrics, and uses the Azure Monitor exporter to send them through Application Insights.
Microsoft Foundry now includes a new (in preview) Agent Monitoring Dashboard:
There is also a brand-new Grafana view:
The following spans and metrics are created automatically:
Spans:
- invoke_agent
- chat
- execute_tool
Metrics:
- gen_ai.client.operation.duration (histogram)
- gen_ai.client.token.usage (histogram)
- feedback_processed (counter)
- sessions_created (counter)
- agent_executions (counter)
- agent_duration (counter)
- agent_framework.function.invocation.duration (histogram)
We also trace several operations by decorating methods with the @trace_operation decorator:
@trace_operation("memory.get_latest_workflow_report")
def get_latest_workflow_report(self) -> Optional[Dict[str, Any]]:
    """Get the most recent workflow analysis report."""
    try:
        reports = self.get_workflow_reports(limit=1)
        return reports[0] if reports else None
    except Exception as e:
        logger.error(f"Failed to retrieve latest workflow report: {e}")
        return None
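A decorator like @trace_operation can be sketched as follows. The real implementation would open an OpenTelemetry span around the call; this illustration simply times it and logs the duration:

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def trace_operation(span_name: str):
    """Sketch of a tracing decorator. A real version would start an
    OpenTelemetry span named span_name around the wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # Stand-in for recording the span duration
                logger.debug("%s took %.1f ms", span_name, elapsed_ms)
        return wrapper
    return decorator
```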
ACA (Azure Container Apps)
Azure Container Apps is used to deploy the entire system, including both backend and frontend components. Key benefits include simplified deployment and application management, built-in load-based scaling, serverless scale-to-zero, and revision management.
The following image shows a list of the deployed container apps:
Jobs
Two of the components run on a cron schedule as Azure Container Apps jobs.
Azure Cosmos DB
Cosmos DB is the primary database. It was chosen because it enables fast, productive application development and offers high availability, high throughput, low latency, and tunable consistency.
The system uses a database called feedbackforge with the following containers:
- feedback: stores all the raw feedback items.
- faqs: stores generated FAQs derived from feedback.
- alerts: stores alerts created by the Dashboard Agent.
- workflow_reports: stores the results of the survey workflows.
Coming Soon
📌In the next post, I’ll dig into Microsoft Foundry and AI Gateway.
👀 Follow me here on Medium to catch Part 2.
💻 Source code: