A practical guide to architecture, implementation, and lessons learned from a multi-agent AI system
Posts in this Series
Building FeedbackForge: a Multi-Agent AI System on Azure with MAF, Foundry & AI Gateway (Part 2)
Introduction
FeedbackForge is an AI-powered feedback analysis system designed to transform raw customer feedback into actionable insights.
Modern enterprises collect feedback from multiple sources such as web forms and platforms like Zendesk, but most of it remains underutilized due to the effort required to analyze it effectively.
FeedbackForge addresses this by using a multi-agent architecture to:
- Analyze customer feedback at scale
- Detect anomalies and emerging issues
- Generate actionable recommendations
- Create tickets in systems like Jira
- Provide an executive dashboard for interactive insights
This project is a side initiative aimed at exploring agentic AI systems using Microsoft technologies such as Microsoft Foundry, Microsoft Agent Framework (MAF), and Azure AI services.
Use Case
To put FeedbackForge into context, let’s imagine a scenario based on a fictional enterprise: Contoso Inc.
Contoso is a global company that collects customer feedback through:
- Traditional web forms
- Zendesk support tickets
- Email-based feedback
Despite having access to large amounts of feedback, they face several challenges:
- Feedback is fragmented across systems
- Analysis is manual and time-consuming
- Issues are difficult to prioritize
- No clear connection between feedback and actions
- Insights arrive too late to be useful
As a result, critical issues often go unnoticed, impacting customer satisfaction and retention.
Goals
FeedbackForge aims to solve these problems by enabling:
- Automatic classification and analysis of feedback
- Real-time anomaly detection and issue tracking
- Generation of actionable recommendations
- Automatic creation of Jira tickets
- Dynamic FAQ generation from customer feedback
- An executive dashboard with a conversational interface
- Competitive intelligence and churn risk detection
Architecture
FeedbackForge is designed as a multi-agent system, where multiple specialized agents collaborate to analyze feedback and generate insights.
Each agent focuses on a specific task (e.g., sentiment analysis, anomaly detection, or action generation), allowing the system to be modular, scalable, and easier to maintain.
Agents communicate externally using the Agent-to-Agent (A2A) protocol and interact with systems through Model Context Protocol (MCP) servers, enabling integration with tools such as Zendesk and Jira.
AI Hub Gateway
To ensure governance, security, and observability, FeedbackForge uses a centralized AI Gateway based on Azure API Management.
This gateway acts as a control plane for all AI interactions, enforcing:
- Security and access control
- Usage monitoring and cost management
- Policy enforcement and compliance
- Resiliency and load balancing
The architecture follows a hub-and-spoke model, where a central AI gateway manages traffic while individual application environments operate independently within defined guardrails.
This layer is implemented through the Citadel Governance Hub, an enterprise-grade AI landing zone that provides a unified control plane for all AI workloads.
Why this matters
Instead of allowing each application to directly call AI models, all traffic flows through a centralized gateway. This enables:
- Governance & Security: Consistent access control, identity management, and policy enforcement
- Observability & Compliance: Centralized logging, metrics, and near real-time usage analytics
- Developer Velocity: Standardized onboarding and reusable configuration patterns
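In practice, routing through the gateway just means the model client targets the API Management endpoint instead of the model endpoint directly. A minimal sketch of building such a request; the gateway URL, API version, and subscription-key header are illustrative placeholders, not FeedbackForge's actual configuration:

```python
import json
import urllib.request

# Hypothetical values for illustration only
GATEWAY_BASE = "https://apim-ai-hub.azure-api.net/openai"
SUBSCRIPTION_KEY = "<your-apim-subscription-key>"

def build_gateway_request(deployment: str, payload: dict) -> urllib.request.Request:
    """Build a chat-completions request that targets the AI Gateway
    instead of calling the model endpoint directly."""
    url = f"{GATEWAY_BASE}/deployments/{deployment}/chat/completions?api-version=2024-06-01"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # APIM authenticates callers with a subscription key header
            "api-key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )
```

Because every call carries a per-application subscription key, the hub can attribute usage, enforce quotas, and apply policies centrally.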
Hub-and-Spoke Architecture
The system follows a hub-and-spoke model:
- Hub (Control Plane) → Centralized governance and AI Gateway
- Spokes (Applications) → Independent workloads and agents
This allows teams to innovate independently while remaining within enterprise guardrails.
🎯 Citadel Governance Hub — Central Control Plane
The central governance layer with unified AI Gateway that all AI workloads route through.
Spoke Layer (Applications)
The spoke layer contains the actual applications and agents that implement business logic.
Key Components
- Azure Container Apps: Hosts all frontend and backend services and agents with serverless scaling
- Azure Cosmos DB: Stores feedback data, generated FAQs, alerts, and workflow reports
- Redis: Stores session data and conversational memory
- Azure AI Search: Enables RAG-based FAQ generation and retrieval
- Zendesk Integration: Source of customer support tickets and feedback
- Jira Integration: Destination for automatically generated action items
Project Structure
FeedbackForge is composed of multiple components implemented in Python (backend) and TypeScript with React (frontend). The system is designed to be modular, allowing different execution modes depending on the use case.
Backend
The backend is built around a single Python project (feedbackforge) that can run in multiple modes. Each mode represents a different way of interacting with the system.
Operating Modes
FeedbackForge supports five main operating modes:
1. Chat Mode (DevUI) — Development Interface
An interactive chat interface for testing agents and workflows.
- Best for: development, testing, and exploration
- Default port: 8090
- Command:
python -m feedbackforge chat
DevUI is a lightweight application built on the Microsoft Agent Framework that provides:
- A web-based interface for interacting with agents
- An OpenAI-compatible API backend
- Tools for debugging and iterating on workflows
2. Serve Mode (AG-UI) — Production Server
A production-ready FastAPI server implementing the AG-UI protocol.
- Best for: frontend integration and production deployments
- Default port: 8081
- Command:
python -m feedbackforge serve --port 8081
AG-UI (Agent User Interaction Protocol) is an event-driven protocol that standardizes communication between user-facing applications and AI agents. It enables:
- Bi-directional communication
- Streaming responses
- Decoupling between frontend and backend
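A sketch of what such an event stream can look like on the wire. The event names mirror common AG-UI event types, but this is an illustrative stand-in, not the official AG-UI implementation:

```python
import json
from typing import Iterator

def _sse(event: str, data: dict) -> str:
    # Server-Sent Events frame: event name + JSON payload, blank-line terminated
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def sse_events(chunks: list) -> Iterator[str]:
    """Yield an AG-UI-style event stream for one streamed agent reply.
    Event names are illustrative approximations of AG-UI event types."""
    yield _sse("RUN_STARTED", {})
    for chunk in chunks:
        yield _sse("TEXT_MESSAGE_CONTENT", {"delta": chunk})
    yield _sse("RUN_FINISHED", {})
```

Because events are self-describing frames, the frontend can render partial output as it arrives and stays decoupled from the agent runtime behind the server.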
3. Workflow Mode — Batch Analysis
Executes the full multi-agent pipeline for large-scale feedback analysis.
- Best for: scheduled processing of survey or feedback data
- Command:
python -m feedbackforge workflow --max-surveys 50
4. FAQ Mode — RAG-Based FAQ Generation
Generates FAQs automatically from customer feedback using Azure AI Search.
- Uses hybrid search (keyword + vector + semantic)
- Applies clustering to identify common themes
- Best for: customer support knowledge bases and documentation generation
- Command:
python -m feedbackforge faq --days 7 --max-faqs 20
5. MCP Server Mode — External Integration
Runs a Model Context Protocol (MCP) server to integrate external feedback sources.
- Best for: connecting to real systems like Zendesk (replacing mock data with production data), as well as CI/CD and automation scenarios
- Command:
python -m feedbackforge mcp
Action Planner
The Action Planner is an autonomous agent responsible for converting insights into actionable tasks. It integrates directly with systems like Jira to create and track issues.
Key Features
- A2A Protocol → Seamless communication with other agents
- Multi-platform support → Create tickets in Jira
- Intelligent prioritization → Based on severity and impact
- Auto-assignment → Suggests responsible teams
- Effort estimation → T-shirt sizing (S/M/L/XL)
- Traceability → Links tickets back to original feedback
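The prioritization and T-shirt sizing logic can be sketched roughly like this. Thresholds and field names are assumptions for illustration, not the agent's actual rules:

```python
from dataclasses import dataclass

@dataclass
class ActionItem:
    title: str
    affected_users: int
    severity: str  # "critical" | "high" | "medium" | "low"

def prioritize(item: ActionItem):
    """Map severity and impact to a priority and a T-shirt size.
    Illustrative thresholds only."""
    if item.severity == "critical" or item.affected_users > 40:
        priority = "P0"
    elif item.severity == "high" or item.affected_users > 25:
        priority = "P1"
    else:
        priority = "P2"
    # Rough effort estimate: bigger blast radius, bigger ticket
    if item.affected_users > 100:
        size = "XL"
    elif item.affected_users > 40:
        size = "L"
    elif item.affected_users > 10:
        size = "M"
    else:
        size = "S"
    return priority, size
```

In the real agent this decision is made by the model with tool support, but the output contract (priority plus S/M/L/XL estimate) is the same shape.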
Frontend
The frontend consists of two React applications.
FeedbackForge FAQ Viewer: A web interface for browsing auto-generated FAQs stored in Azure Cosmos DB. Designed for:
- Customer support teams
- End-user self-service
FeedbackForge Dashboard: A React application integrated with AG-UI for real-time interaction with agents. Features:
- Streaming responses from agents
- Interactive exploration of insights
- Executive-level dashboard experience
Microsoft Agent Framework (MAF)
FeedbackForge is built using the Microsoft Agent Framework (MAF), a multi-language framework for building, orchestrating, and deploying AI agents.
MAF provides the foundational building blocks required to create both simple and complex multi-agent systems.
Core Capabilities
- Agent runtime → Execution environment for agents
- Model clients → Chat completions and response handling
- Session management → Stateful conversations
- Context providers → Memory and context injection
- Middleware → Interception and customization of agent behavior
- MCP clients → Integration with external tools and services
- Workflow runtime → Multi-agent workflows with graph-based orchestration
Agents in FeedbackForge
FeedbackForge defines multiple specialized agents.
🤖 Core Agents
- Action Planning Agent: Converts feedback insights into trackable Jira action items.
- Dashboard Agent: Executive Dashboard Assistant for analyzing customer feedback.
- Data Sync Agent: Syncs external feedback sources (Zendesk) into the data store.
🤖 Workflow Agents
- Initial Orchestrator Agent: Validates survey data, assesses quality, plans analysis.
- Data Preprocessor Agent: Cleans and prepares survey data.
- Sentiment Analyzer Agent: Analyzes sentiment.
- Topic Extractor Agent: Extracts topics.
- Anomaly Detector Agent: Detects anomalies.
- Competitive Intelligence Agent: Extracts competitor info.
- Insight Miner Agent: Synthesizes analyses.
- Priority Ranker Agent: Ranks priorities.
- Action Generator Agent: Generates actions.
- Report Generator Agent: Executive report generator which analyzes all the data provided and creates a comprehensive report.
- Final Orchestrator Agent: Reviews results.
Each agent focuses on a specific responsibility, enabling modular and scalable processing.
Tools
The Microsoft Agent Framework supports many different types of tools that extend agent capabilities. Tools allow agents to interact with external systems, execute code, search data, and more. Tools are invoked dynamically by agents during execution, enabling them to move from reasoning to action.
Agent Tools
- Action Planning Agent: Analyze Issue, Create Jira Ticket, Get Available Systems
- Data Sync Agent: Zendesk via MCP, Check Sync Status
- Dashboard Agent: Get Weekly Summary, Get Issue Details, Get Competitor Insights, Get Customer Context, Check for Anomalies, Set Alert, Generate Action Items, Escalate to Team, Create Action Plan, Run Workflow Analysis, Get Latest Workflow Report, Get Workflow Reports History
Example Tool: Get Weekly Summary
@ai_function
def get_weekly_summary() -> str:
    """Get weekly feedback summary with sentiment, top issues, and urgent items."""
    try:
        logger.debug("📊 Calling get_weekly_summary tool")
        summary = feedback_store.get_weekly_summary()
        result = json.dumps({
            "total_responses": summary["total_responses"],
            "sentiment": summary["sentiment"],
            "top_issues": [
                {"issue": i[0], "mentions": i[1], "priority": "P0" if i[1] > 40 else "P1" if i[1] > 25 else "P2"}
                for i in summary["top_issues"]
            ],
            "urgent_items": summary["urgent_count"],
        }, indent=2)
        logger.debug("✅ get_weekly_summary completed")
        return result
    except Exception as e:
        logger.error(f"❌ Error in get_weekly_summary: {e}", exc_info=True)
        return json.dumps({"error": str(e), "message": "Failed to retrieve weekly summary"}, indent=2)
MCP (Model Context Protocol)
MCP enables agents to interact with external systems as tools. In FeedbackForge, an MCP server is used as a tool for the Data Sync Agent to communicate with Zendesk. This MCP server uses SSE transport over HTTP and includes:
- Tools: Fetch Zendesk Tickets, Ingest Feedback to Store, Create Zendesk Ticket
- Resources: Structured data endpoints (e.g., feedbackforge://feedback/recent)
- Prompts: Predefined reasoning templates (e.g., Analyze feedback trends, Generate executive summary)
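For example, a resource URI such as feedbackforge://feedback/recent can be split into its resource type and selector with the standard library. This is a hand-rolled sketch; a real MCP server registers resources through an MCP SDK rather than parsing URIs by hand:

```python
from urllib.parse import urlparse

def parse_resource_uri(uri: str):
    """Split an MCP-style resource URI like 'feedbackforge://feedback/recent'
    into (resource_type, selector). Illustrative sketch only."""
    parsed = urlparse(uri)
    if parsed.scheme != "feedbackforge":
        raise ValueError(f"unexpected scheme: {parsed.scheme}")
    # netloc carries the resource type, the path carries the selector
    return parsed.netloc, parsed.path.lstrip("/")
```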
A2A (Agent-to-Agent Protocol)
FeedbackForge uses the A2A protocol, an open standard designed to enable seamless communication and collaboration between AI agents. It enables the Dashboard Agent to communicate with the Action Planning Agent, which analyzes the issue, determines priority, category, and effort, and generates a Jira issue.
Below we can see how the Dashboard Agent transparently calls the ActionPlanner agent, asking it to create a ticket from an action item.
Agent Cards
The A2A Protocol standardizes the data format shared during the discovery process via Agent Cards. These cards contain standardized descriptions that agents use to advertise their capabilities to other agents.
Example of an Agent Card (JSON format):
{
"capabilities": {
"pushNotifications": false,
"streaming": true
},
"defaultInputModes": [
"text"
],
"defaultOutputModes": [
"text"
],
"description": "Converts customer feedback insights into trackable tickets in Jira",
"name": "ActionPlanningAgent",
"preferredTransport": "JSONRPC",
"protocolVersion": "0.3.0",
"skills": [
{
"description": "Analyzes customer feedback issues and creates actionable tickets with proper prioritization, categorization, and effort estimation. Supports Jira.",
"examples": [
"Create tickets for iOS crash issue affecting 45 users",
"Analyze payment failure feedback and create action plan",
"Generate tickets for login performance complaints"
],
"id": "action_planner_ticket_creation",
"name": "ActionPlanner",
"tags": [
"action-planning",
"ticket-creation",
"jira",
"feedback",
"agent-framework"
]
}
],
"url": "http://0.0.0.0:8084/",
"version": "1.0.0"
}
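A client consuming such a card might validate it and check capabilities before choosing a transport. A minimal sketch based on the fields in the card above:

```python
import json

# Fields taken from the example Agent Card above
REQUIRED_FIELDS = {"name", "description", "url", "version", "capabilities", "skills"}

def load_agent_card(raw: str) -> dict:
    """Parse an A2A Agent Card and check the fields a client needs
    before invoking the agent."""
    card = json.loads(raw)
    missing = REQUIRED_FIELDS - card.keys()
    if missing:
        raise ValueError(f"agent card missing fields: {sorted(missing)}")
    return card

def supports_streaming(card: dict) -> bool:
    # Lets a caller decide between SSE streaming and plain request/response
    return bool(card.get("capabilities", {}).get("streaming", False))
```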
Agent Discovery
For agents to collaborate, they first need to discover each other. A2A supports multiple discovery methods:
- DNS-based Discovery
- Registry-based Discovery
- Private Discovery
In our case, we will use private discovery, which means directly configuring known agent endpoints using the Azure Container Apps (ACA) URLs that are already known.
Task Processing Flow
The A2A protocol defines a standard flow for task processing between agents:
Communication Protocols
A2A is built on established communication protocols:
- HTTP/HTTPS: For basic request/response communication
- JSON-RPC: For structured method calls
- Server-Sent Events (SSE): For streaming responses
- JSON: As the standard data exchange format
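A JSON-RPC 2.0 call is just a small JSON envelope, so a structured method call between agents reduces to building and sending one of these objects (the method name here is illustrative):

```python
import itertools
import json

# Monotonically increasing request ids, as JSON-RPC expects per connection
_ids = itertools.count(1)

def jsonrpc_request(method: str, params: dict) -> str:
    """Build a JSON-RPC 2.0 request envelope, the structured-call
    format A2A uses on the wire."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })
```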
Memory and Sessions
FeedbackForge, through the Microsoft Agent Framework (MAF), stores and manages conversations through sessions in two modes: in-memory and persistent. The persistent mode uses Redis-backed short-term storage with a configurable TTL.
This ensures that when we refresh a page on the Dashboard UI, we do not lose the current conversations.
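The idea behind the persistent mode can be sketched as a store with per-session expiry, analogous to Redis keys with a TTL. This is an illustration, not MAF's actual session implementation:

```python
import time
from typing import Any, Optional

class TTLSessionStore:
    """Minimal sketch of a persistent session store with expiry,
    analogous to Redis keys with a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, messages)

    def set(self, session_id: str, messages: Any) -> None:
        self._data[session_id] = (time.monotonic() + self.ttl, messages)

    def get(self, session_id: str) -> Optional[Any]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, messages = entry
        if time.monotonic() > expires_at:
            # Session outlived its TTL: drop it, like Redis key expiry
            del self._data[session_id]
            return None
        return messages
```

A page refresh simply re-fetches the session by id; as long as the TTL has not elapsed, the conversation history is still there.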
RAG and Azure AI Search
Retrieval Augmented Generation (RAG) is used to create FAQs based on common topics found in customer issues. It leverages Azure AI Search by creating a search index over the fields stored in Cosmos DB, supporting vector and semantic search capabilities.
For semantic search, the platform, sentiment, and topics fields are used. Embeddings are generated for the text field using the text-embedding-3-small model. A hybrid search approach (keyword + vector + semantic) is applied to achieve the best query accuracy.
# Use hybrid search for better accuracy
# Search for question/problem patterns
search_queries = [
    ("how to use feature", "usage questions"),
    ("why not working error", "error reports"),
    ("can I do this", "capability questions"),
    ("problem with issue", "problems"),
    ("confused about unclear", "clarity issues"),
    ("how do I configure", "configuration"),
    ("where is located", "navigation"),
    ("when will available", "availability"),
]
all_feedback = {}       # id -> feedback dict
feedback_to_query = {}  # id -> query that found it
for query, category in search_queries:
    logger.info(f"  Searching: '{query}' ({category})")
    # Use HYBRID search with Azure AI Search (keyword + vector + semantic)
    results = self.rag_client.hybrid_search(
        query=query,
        get_embeddings_func=get_embeddings,
        top=100,
        filters=time_filter
    )
    for result in results:
        feedback_id = result.get('id')
        text = result.get('text', '')
        # Only include question-like or problem feedback
        if self._is_question_like(text):
            if feedback_id not in all_feedback:
                all_feedback[feedback_id] = {
                    'id': feedback_id,
                    'text': text,
                    'customer': result.get('customer_name'),
                    'segment': result.get('customer_segment'),
                    'platform': result.get('platform'),
                    'rating': result.get('rating'),
                    'timestamp': result.get('timestamp'),
                    'search_score': result.get('reranker_score', result.get('search_score', 0)),
                    'text_vector': result.get('text_vector'),  # For clustering
                }
            feedback_to_query[feedback_id] = category
logger.info(f"  Found {len(all_feedback)} question/problem feedback items")
Clustering is performed using cosine similarity on the embeddings of the feedback items. The themes variable in the code below is a list derived from the all_feedback dictionary, containing only question-like or problem-related feedback items that were retrieved through hybrid search.
def _cluster_themes_with_vectors(self, themes: List[Dict]) -> List[Dict[str, Any]]:
    """
    Cluster similar themes using VECTOR SIMILARITY from Azure AI Search.

    This uses the actual embeddings stored in Azure AI Search for semantic clustering.

    Args:
        themes: List of theme dictionaries (with text_vector if available)

    Returns:
        List of clustered themes with counts
    """
    if not themes:
        return []
    logger.info(f"  Clustering {len(themes)} items using vector similarity...")
    # Sort by search score (highest relevance first)
    themes = sorted(themes, key=lambda x: x.get('search_score', 0), reverse=True)
    clusters = []
    used_indices = set()
    similarity_threshold = 0.75  # Cosine similarity threshold
    for i, theme in enumerate(themes):
        if i in used_indices:
            continue
        # Create new cluster
        cluster = {
            'representative_text': theme['text'],
            'count': 1,
            'samples': [theme],
            'platforms': [theme.get('platform')],
            'segments': [theme.get('segment')],
            'avg_rating': theme.get('rating') or 3,
            'avg_search_score': theme.get('search_score', 0)
        }
        # Get embedding for this theme (if available)
        theme_vector = theme.get('text_vector')
        # Find similar themes using vector similarity
        for j, other in enumerate(themes):
            if j <= i or j in used_indices:
                continue
            # Use vector similarity if available
            is_similar = False
            if theme_vector and other.get('text_vector'):
                similarity = self._cosine_similarity(theme_vector, other['text_vector'])
                is_similar = similarity >= similarity_threshold
            else:
                # Fallback to text similarity
                is_similar = self._are_similar(theme['text'], other['text'], threshold=0.6)
            if is_similar:
                cluster['count'] += 1
                cluster['samples'].append(other)
                cluster['platforms'].append(other.get('platform'))
                cluster['segments'].append(other.get('segment'))
                used_indices.add(j)
        used_indices.add(i)
        # Calculate averages
        ratings = [s.get('rating') for s in cluster['samples'] if s.get('rating')]
        if ratings:
            cluster['avg_rating'] = sum(ratings) / len(ratings)
        scores = [s.get('search_score', 0) for s in cluster['samples']]
        if scores:
            cluster['avg_search_score'] = sum(scores) / len(scores)
        clusters.append(cluster)
    logger.info(f"  Created {len(clusters)} clusters")
    return clusters
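The clustering above relies on self._cosine_similarity. The standard formula, dot(a, b) / (||a|| * ||b||), which the 0.75 threshold is compared against, can be implemented with the standard library:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors:
    dot(a, b) / (||a|| * ||b||), in [-1, 1] for nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors rather than dividing by zero
    return dot / norm if norm else 0.0
```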
Workflows
The project uses MAF workflows, which are designed to handle complex business processes that may involve multiple agents, human interactions, and integrations with external systems.
Our survey analysis workflow can be executed in two ways: as a scheduled job on Azure Container Apps, or synchronously on demand from the dashboard.
All agents involved in the workflow and their responsibilities are described in the Agents section. The workflow uses sequential, fan-out, and fan-in patterns, as shown in the following image.
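The sequential, fan-out, and fan-in stages can be sketched with asyncio: the orchestrator fans the same data out to the specialist agents concurrently, then a downstream step fans their outputs back in. Agent bodies are mocked here for illustration; the real workflow runs MAF agents:

```python
import asyncio

async def analyze(name: str, surveys: list) -> dict:
    """Stand-in for one specialist agent (sentiment, topics, anomalies...).
    A real agent would call the model here."""
    await asyncio.sleep(0)
    return {"agent": name, "items": len(surveys)}

async def run_workflow(surveys: list) -> dict:
    # Fan-out: run the specialist agents concurrently on the same data
    results = await asyncio.gather(
        analyze("sentiment_analyzer", surveys),
        analyze("topic_extractor", surveys),
        analyze("anomaly_detector", surveys),
    )
    # Fan-in: a downstream agent (Insight Miner) synthesizes the outputs
    return {"inputs": len(surveys), "analyses": results}
```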
In this workflow, all issues stored in the Cosmos DB database are processed to generate an executive summary. In async mode, the result is returned in JSON format.
{
  "key_metrics": {
    "total_responses": 20,
    "sentiment_breakdown": {
      "executive_summary": "All 20 survey responses are unanimously negative, indicating a severe, platform\u2011wide iOS app failure triggered immediately after the iOS 17 update. Feedback highlights two dominant crash modes\u2014app launch and settings\u2011screen access\u2014supported by high-volume, repetitive reports across all customer segments, including high\u2011value enterprise accounts. Key themes include iOS 17 compatibility breakage, widespread functional instability, repeat crashes, and issues consistently tied to the settings screen. Anomalies such as identical crash patterns, uniformly low ratings, and repeated complaints signal an active crisis with P0-level urgency. No competitive threats or mentions were identified, confirming that the problem is strictly product\u2011related. Overall sentiment reflects total user frustration and high churn risk if remediation is delayed.",
      "positive": 0,
      "neutral": 0,
      "negative": 100
    },
    "top_topics": [
      "ios17_update_breakage: 8",
      "general_crashing_on_iphone: 4",
      "settings_screen_crashes: 6",
      "repeat_crashes_unacceptable: 2"
    ],
    "critical_count": 2,
    "high_priority_count": 2
  },
  "critical_issues": [
    {
      "issue": "App crashes immediately on launch for all iOS users after iOS 17 update",
      "priority": "P0",
      "impact": "Affects all iOS customer segments including enterprise accounts",
      "category": "ios17_update_breakage"
    },
    {
      "issue": "Settings screen consistently triggers crashes due to likely unhandled iOS 17 changes",
      "priority": "P0",
      "impact": "High-volume identical reports across users",
      "category": "settings_screen_crashes"
    },
    {
      "issue": "Repeated identical crash submissions indicate systemic failure and heightened customer frustration",
      "priority": "P1",
      "impact": "Multiple reports per user and wave of 1\u2011star ratings",
      "category": "repeat_crashes_unacceptable"
    },
    {
      "issue": "Lack of pre\u2011release compatibility testing for major iOS updates",
      "priority": "P2",
      "impact": "Increased likelihood of future outages and regressions",
      "category": "process_gap"
    }
  ],
  "recommendations": [
    {
      "action": "Deploy an emergency hotfix for iOS 17 launch and settings\u2011screen crashes",
      "priority": "Immediate",
      "owner": "Mobile Engineering",
      "rationale": "P0 failures blocking basic app access for all iOS users"
    },
    {
      "action": "Run targeted debugging on iOS 17 devices to identify memory, permissions, and deprecated API triggers",
      "priority": "Immediate",
      "owner": "Mobile Engineering",
      "rationale": "Likely compatibility issues introduced with iOS 17"
    },
    {
      "action": "Conduct full compatibility testing across iOS 17 devices and minor versions",
      "priority": "High",
      "owner": "QA",
      "rationale": "Prevent additional hidden crash paths"
    },
    {
      "action": "Publish user\u2011facing communication acknowledging the issue and outlining hotfix progress",
      "priority": "High",
      "owner": "Customer Support / Comms",
      "rationale": "Mitigate churn and reassure customers during platform\u2011wide outage"
    },
    {
      "action": "Implement pre\u2011release OS testing workflows for future major iOS versions",
      "priority": "Medium",
      "owner": "Mobile Engineering / QA",
      "rationale": "Addresses known process gap and prevents recurrence"
    },
    {
      "action": "Conduct proactive outreach to enterprise customers affected by the outage",
      "priority": "High",
      "owner": "Account Management",
      "rationale": "Prevent churn among high\u2011value accounts experiencing total app failure"
    }
  ]
}
When called from the Dashboard UI, the system displays a human-readable response:
Observability
The Microsoft Agent Framework provides built-in support for observability, allowing you to monitor the behavior of your agents.
It integrates with OpenTelemetry, emitting traces, logs, and metrics, and uses the Azure Monitor exporter to send them through Application Insights.
Microsoft Foundry now includes a new (in preview) Agent Monitoring Dashboard:
There is also a brand-new Grafana view:
The following spans and metrics are created automatically:
Spans:
- invoke_agent
- chat
- execute_tool
Metrics:
- gen_ai.client.operation.duration (histogram)
- gen_ai.client.token.usage (histogram)
- feedback_processed (counter)
- sessions_created (counter)
- agent_executions (counter)
- agent_duration (counter)
- agent_framework.function.invocation.duration (histogram)
We also trace several operations by decorating methods with the @trace_operation decorator:
@trace_operation("memory.get_latest_workflow_report")
def get_latest_workflow_report(self) -> Optional[Dict[str, Any]]:
    """Get the most recent workflow analysis report."""
    try:
        reports = self.get_workflow_reports(limit=1)
        return reports[0] if reports else None
    except Exception as e:
        logger.error(f"Failed to retrieve latest workflow report: {e}")
        return None
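A decorator like @trace_operation can be sketched as follows. The real implementation would open an OpenTelemetry span around the call; this illustration simply times it and logs the duration:

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def trace_operation(span_name: str):
    """Sketch of a tracing decorator. A real version would start an
    OpenTelemetry span named span_name around the wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # Stand-in for recording the span duration
                logger.debug("%s took %.1f ms", span_name, elapsed_ms)
        return wrapper
    return decorator
```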
ACA (Azure Container Apps)
Azure Container Apps is used to deploy the entire system, including both backend and frontend components. Key benefits include simplified deployment and application management, built-in load-based scaling, serverless scale-to-zero, and revision management.
The following image shows a list of the deployed container apps:
Jobs
Two of the components run on a cron schedule as Azure Container Apps jobs.
Azure Cosmos DB
Cosmos DB is the primary database. It was chosen because it enables fast, productive application development and offers high availability, high throughput, low latency, and tunable consistency.
The system uses a database called feedbackforge with the following containers:
- feedback: stores all the raw feedback items.
- faqs: stores generated FAQs derived from feedback.
- alerts: stores alerts created by the Dashboard Agent.
- workflow_reports: stores the results of the survey workflows.
Coming Soon
📌In the next post, I’ll dig into Microsoft Foundry and AI Gateway.
👀 Follow me here on Medium to catch Part 2.
💻 Source code: