Seena Khan

Build an Enterprise AI Document Summarizer with Azure Blob Storage + Copilot Studio

A Complete Hands-On Guide to Creating a Secure AI Agent that Reads, Lists, and Summarizes Corporate Documents

Modern organizations generate thousands of documents every month — reports, contracts, policies, meeting notes, technical manuals, onboarding guides, and operational procedures. Most of that information sits buried inside cloud storage, difficult to search and even harder to summarize quickly.

What if you could build an AI assistant that instantly retrieves and summarizes those documents directly from Azure Blob Storage?

In this hands-on guide, you’ll learn how to build a working AI document summarization agent, ready to be hardened for production, using:

  • Microsoft Copilot Studio
  • Azure Blob Storage
  • Azure Functions (Python)
  • REST APIs with OpenAPI
  • AI-powered summarization workflows

By the end, you’ll have a fully functional enterprise AI assistant capable of:

✅ Listing documents from Azure Blob Storage
✅ Extracting text from PDF, DOCX, TXT, MD, and CSV files
✅ Summarizing documents using structured AI responses
✅ Handling missing files gracefully
✅ Operating through a secure REST API architecture

What You Will Build

You will create a Copilot Studio agent called:

Azure Blob Document Summarizer

This intelligent assistant connects to corporate documents stored in Azure Blob Storage through a custom REST API.

The agent can:

  • Retrieve all available documents
  • Extract text content from documents
  • Generate concise structured summaries
  • Return action items and decisions from reports
  • Help employees quickly understand large files

Final Solution Architecture

Here’s the complete request flow:

User
   ↓
Copilot Studio Agent
   ↓
Custom REST API Tool
   ↓
Azure Function App (Python)
   ↓
Azure Blob Storage
   ↓
Document Text Extraction
   ↓
AI Summary Response

Why This Architecture Works So Well

This solution separates responsibilities cleanly:

Component                Responsibility
Copilot Studio           Conversational AI orchestration
Azure Function           Secure API and document processing
Blob Storage             Central document repository
OpenAPI Spec             API contract for Copilot
Python Extraction Layer  Reads document contents

Because each layer has a single job, you can scale, secure, or replace any one of them without touching the others.

Supported File Types

The extraction layer supports these document formats out of the box:

File Type  Supported
PDF        ✅
DOCX       ✅
TXT        ✅
Markdown   ✅
CSV        ✅

Important Note About Scanned PDFs

Scanned image-based PDFs contain no embedded text layer.

That means:

  • Text extraction returns empty content
  • OCR is required
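
One way to catch this case early is to check how much text the extractor actually returned before passing it to the agent. The sketch below is a simple heuristic; the `needs_ocr` name and the 20-character threshold are illustrative assumptions, not part of the original project.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True when extraction produced too little text to summarize.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    # Drop all whitespace so a page of blank lines does not count as content.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

When this returns True for a PDF, the API could return a clear message such as "no text layer found" instead of an empty summary.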

For production workloads, integrate:

  • Azure Document Intelligence (formerly Form Recognizer)
  • OCR pipelines

Phase 0 — Prerequisites

Before building the solution, ensure you have the following.

Required Accounts and Licenses

Requirement             Purpose
Microsoft 365 Tenant    Hosts Copilot Studio
Azure Subscription      Hosts Function App + Blob Storage
Copilot Studio License  Create and publish the agent
Power Apps License      Needed only for Custom Connector workflows

Required Local Development Tools

Install the following tools locally.

1. Python 3.11

Required for Azure Functions Python runtime.

Official website:

Python Downloads

2. Visual Studio Code

Install VS Code with the Azure Functions extension.

Official website:

Visual Studio Code

3. Azure Functions Core Tools v4

Install globally:

npm i -g azure-functions-core-tools@4 --unsafe-perm true

4. Azure CLI

Verify installation:

az --version

Expected version:

2.86.0 or later

Official documentation:

Azure CLI Documentation

Fixing Azure CLI Permission Errors on Windows

If you encounter:

PermissionError on C:\Users\<you>\.azure

Run PowerShell as Administrator:

icacls "C:\Users\<YourUser>\.azure" /grant "<YourUser>:(OI)(CI)F" /T

Alternatively, point the CLI at a configuration directory you own. Open a new terminal afterward, because setx only affects new sessions:

setx AZURE_CONFIG_DIR "C:\AzureCLI"

Phase 1 — Create Azure Blob Storage

Step 1 — Create the Storage Account

Inside the Azure Portal:

  1. Go to Storage Accounts
  2. Click Create
  3. Performance → Standard
  4. Redundancy → LRS

Create the Blob Container

After deployment:

  1. Open the storage account
  2. Go to Containers
  3. Create a container named:
agent-docs
  4. Set access level to:
Private

Upload Sample Documents

Upload a few test files:

  • PDF
  • DOCX
  • TXT

Save the Connection String

Navigate to:

Storage Account → Access Keys

Copy:

Connection String

You’ll use this later inside the Function App.

Enterprise Best Practice

Use separate containers for:

  • Departments
  • Business units
  • Agents
  • Security boundaries

This improves:

  • DLP enforcement
  • Auditability
  • Governance
  • Access isolation

Phase 2 — Build the REST API

Now you’ll create the backend API layer.

Project Structure

Create the following folder structure:

doc-summary-api/
├── function_app.py
├── requirements.txt
├── host.json
├── local.settings.json
├── doc-api-openapi.json
├── .gitignore
└── README.md

Understanding the API Design

The API exposes two endpoints:

Endpoint               Purpose
GET /documents         Lists available files
GET /documents/{name}  Returns extracted text

Phase 2.1 — Build the Azure Function

Create:

function_app.py

This file contains:

  • HTTP triggers
  • Blob Storage access
  • File extraction logic
  • JSON responses

The application uses:

  • Azure Functions
  • BlobServiceClient
  • pypdf
  • python-docx

Core Design Principle

The Function App acts as:

A Translation Layer

It converts:

Blob Storage Files
        ↓
Extracted Plain Text
        ↓
AI-Ready Content

This is critical because Copilot Studio works best with plain text.

Why Use ANONYMOUS Authentication Initially?

The guide uses:

func.AuthLevel.ANONYMOUS

Benefits:

  • Simplifies Copilot integration
  • No function key required
  • Faster prototyping

Production environments should later switch to:

  • OAuth
  • Entra ID
  • Managed Identity

Supported Text Extraction Logic

The extraction helper automatically handles:

Extension  Extraction Method
.pdf       PdfReader
.docx      python-docx
.txt       UTF-8 decode
.csv       UTF-8 decode
.md        UTF-8 decode

Unsupported files return:

[Unsupported file type]
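
The dispatch described in the table could be sketched roughly as follows. This is a minimal illustration rather than the guide's exact helper: the `extract_text` name, the lazy imports, and the choice to replace undecodable bytes are all assumptions.

```python
import io
from pathlib import PurePosixPath

UNSUPPORTED = "[Unsupported file type]"
TEXT_EXTENSIONS = {".txt", ".csv", ".md"}


def extract_text(name: str, data: bytes) -> str:
    """Return plain text for a blob, chosen by its file extension."""
    ext = PurePosixPath(name).suffix.lower()

    if ext == ".pdf":
        from pypdf import PdfReader  # imported lazily; needs the pypdf package

        reader = PdfReader(io.BytesIO(data))
        # extract_text() can yield nothing for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if ext == ".docx":
        from docx import Document  # provided by the python-docx package

        document = Document(io.BytesIO(data))
        return "\n".join(p.text for p in document.paragraphs)

    if ext in TEXT_EXTENSIONS:
        # Assumption: undecodable bytes are replaced instead of raising.
        return data.decode("utf-8", errors="replace")

    return UNSUPPORTED
```

Keeping the extension check in one place makes it easy to add a new format later without touching the HTTP handlers.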

Python Dependencies

Create:

requirements.txt

Contents:

azure-functions
azure-storage-blob
azure-identity
pypdf
python-docx

Configure host.json

This file controls:

  • Runtime behavior
  • Extension bundles
  • Logging

It also enables Application Insights integration.
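
For orientation, a typical host.json for a v4 Python project has roughly the shape below, written here as a Python dict that serializes to the file. The extension bundle version range is an assumption; prefer whatever your project template generated.

```python
import json

# Typical host.json contents for a v4 Python project. The bundle version
# range is an assumption; keep the one your tooling generated.
HOST_JSON = {
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {"isEnabled": True, "excludedTypes": "Request"}
        }
    },
    "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[4.*, 5.0.0)",
    },
}


def render_host_json() -> str:
    """Serialize the settings exactly as they would appear in host.json."""
    return json.dumps(HOST_JSON, indent=2)


print(render_host_json())
```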

Configure local.settings.json

This file stores:

  • Local environment variables
  • Storage connection strings
  • Runtime settings

⚠️ Never commit this file to GitHub.

Add it to:

.gitignore
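
As a rough guide, the file could be generated like this. The `build_local_settings` helper is hypothetical, and the `AzureWebJobsStorage` value assumes a local storage emulator; the two `BLOB_` names match the app settings used later in this guide.

```python
import json


def build_local_settings(blob_connection_string: str,
                         container: str = "agent-docs") -> dict:
    """Return the structure expected in local.settings.json."""
    return {
        "IsEncrypted": False,
        "Values": {
            "FUNCTIONS_WORKER_RUNTIME": "python",
            # Assumption: a local storage emulator is running.
            "AzureWebJobsStorage": "UseDevelopmentStorage=true",
            "BLOB_CONNECTION_STRING": blob_connection_string,
            "BLOB_CONTAINER": container,
        },
    }


# Placeholder value; paste your real connection string locally only.
settings = build_local_settings("<your-connection-string>")
print(json.dumps(settings, indent=2))
```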

Test the API Locally

Install dependencies:

pip install -r requirements.txt

Start the Functions runtime:

func start

Test Endpoint 1 — List Documents

Open:

http://localhost:7071/api/documents

Expected result:

{
  "documents": [...]
}

Test Endpoint 2 — Retrieve Document Content

Example:

http://localhost:7071/api/documents/sample.pdf

Expected response:

{
  "name": "sample.pdf",
  "content": "Extracted text..."
}
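
The two response shapes above, plus a graceful not-found case, could be produced by small pure functions like these. The function names, the 404 error body, and the assumption that `documents` is a list of file names are illustrative; in function_app.py the HTTP-triggered handlers would wrap these results in an HTTP response.

```python
import json
from typing import Optional


def list_documents_response(blob_names: list[str]) -> tuple[int, str]:
    """Build the status code and JSON body for GET /documents."""
    return 200, json.dumps({"documents": sorted(blob_names)})


def get_document_response(name: str, content: Optional[str]) -> tuple[int, str]:
    """Build the status code and JSON body for GET /documents/{name}.

    `content` is None when the blob was not found (an assumed convention).
    """
    if content is None:
        # Hypothetical error shape; the agent only needs a clear message.
        return 404, json.dumps({"error": f"Document '{name}' not found"})
    return 200, json.dumps({"name": name, "content": content})
```

Returning a JSON error body rather than an unhandled exception gives the agent something it can explain to the user.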

Phase 3 — Deploy to Azure

Now it’s time to move from local development to the cloud.

Critical Azure Functions Rule

Python Azure Functions require:

Linux Hosting

If Python does not appear in the runtime list:

❌ You selected Windows
✅ Switch to Linux

Recommended Hosting Plan

Use:

Flex Consumption

Benefits:

  • Serverless
  • Scales to zero
  • Pay-per-use billing
  • Auto-scaling

Configure Environment Variables

Inside the Function App:

Settings → Environment Variables

Add:

Name                    Value
BLOB_CONNECTION_STRING  Your storage connection string
BLOB_CONTAINER          agent-docs

Restart the Function App afterward.
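
Inside the function code, these two settings arrive as environment variables. A minimal sketch of reading them and creating a container client might look like this; the helper names are hypothetical, and the SDK import is deferred so the settings check can run on its own.

```python
import os
from typing import Mapping

REQUIRED_SETTINGS = ("BLOB_CONNECTION_STRING", "BLOB_CONTAINER")


def load_blob_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Read the two app settings, failing fast with a clear message."""
    missing = [key for key in REQUIRED_SETTINGS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing app settings: {', '.join(missing)}")
    return env["BLOB_CONNECTION_STRING"], env["BLOB_CONTAINER"]


def get_container_client(env: Mapping[str, str] = os.environ):
    """Create a container client from the configured settings."""
    # Imported lazily so load_blob_settings works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    connection_string, container = load_blob_settings(env)
    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_container_client(container)
```

Failing fast on a missing setting turns a vague runtime error into a message that points straight at the Environment Variables page.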

Deploy from VS Code

Using the Azure extension:

  1. Sign in
  2. Right-click project
  3. Deploy to Function App
  4. Confirm overwrite

Validate Deployment

Verify:

Functions → list_documents
Functions → get_document

Then test the public URL.
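
A small smoke test can do this from the command line. The sketch below uses only the standard library; `documents_url` and `fetch_json` are hypothetical helper names, and the base URL placeholder must be replaced with your own Function App address.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen


def documents_url(base_url: str, name: Optional[str] = None) -> str:
    """Build the list URL, or the single-document URL when `name` is given."""
    url = base_url.rstrip("/") + "/api/documents"
    if name is not None:
        # Percent-encode spaces and other reserved characters in the file name.
        url += "/" + quote(name, safe="")
    return url


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example (requires network access to your deployed app):
# base = "https://<your-function-app>.azurewebsites.net"
# print(fetch_json(documents_url(base)))
# print(fetch_json(documents_url(base, "sample.pdf")))
```

The same helpers work against http://localhost:7071 while the local runtime is running.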

Phase 4 — Create the OpenAPI Specification

This step is extremely important.

Copilot Studio uses the OpenAPI document to understand:

  • Endpoints
  • Parameters
  • Outputs
  • Schemas

Why OpenAPI Matters

Without OpenAPI:

❌ Copilot cannot discover your actions
❌ Tool orchestration breaks
❌ Parameters become unreliable

With OpenAPI:

✅ Actions become AI callable
✅ Schemas are validated
✅ Responses are structured

Critical OpenAPI Formatting Rules

The host field must contain:

✅ Hostname only

Correct:

contoso-api.azurewebsites.net

Incorrect:

https://contoso-api.azurewebsites.net

Common Swagger Error

If you see:

Swagger contains base path:/api but backend Url doesn't end on same path...

This usually means the host field contains more than the bare hostname. Remove the https:// prefix and any path from it.
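
To make the rule concrete, here is a sketch that assembles a minimal OpenAPI 2.0 (Swagger) document for the two endpoints and strips any scheme from the host. It assumes the Swagger 2.0 layout, where `host`, `basePath`, and `schemes` are separate fields; the helper names, summaries, and the 404 response are illustrative, while the operation names match the actions used later in this guide.

```python
import json
from urllib.parse import urlsplit


def host_only(url_or_host: str) -> str:
    """Strip any scheme and path, leaving just hostname[:port]."""
    if "://" not in url_or_host:
        url_or_host = "//" + url_or_host  # lets urlsplit parse it as a netloc
    return urlsplit(url_or_host).netloc


def build_spec(function_app_url: str) -> dict:
    """Assemble a minimal OpenAPI 2.0 document for the two endpoints."""
    return {
        "swagger": "2.0",
        "info": {"title": "AzureDocAPI", "version": "1.0.0"},
        "host": host_only(function_app_url),  # hostname only, no scheme
        "basePath": "/api",
        "schemes": ["https"],
        "produces": ["application/json"],
        "paths": {
            "/documents": {
                "get": {
                    "operationId": "ListDocuments",
                    "summary": "List available documents",
                    "responses": {"200": {"description": "Document list"}},
                }
            },
            "/documents/{name}": {
                "get": {
                    "operationId": "GetDocument",
                    "summary": "Return extracted text for one document",
                    "parameters": [
                        {"name": "name", "in": "path",
                         "required": True, "type": "string"}
                    ],
                    "responses": {
                        "200": {"description": "Extracted text"},
                        "404": {"description": "Document not found"},
                    },
                }
            },
        },
    }


print(json.dumps(build_spec("https://contoso-api.azurewebsites.net"), indent=2))
```

The printed JSON can be saved as doc-api-openapi.json and adjusted to match your real response schemas.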

Phase 5 — Configure Copilot Studio

Now the exciting part begins.

Create the AI Agent

Inside Microsoft Copilot Studio:

  1. Create a new agent
  2. Name it:
Azure Blob Document Summarizer
  3. Enable:
Generative Orchestration

Why Generative Orchestration Matters

This feature allows the agent to:

  • Decide which action to call
  • Chain tool executions
  • Interpret user intent dynamically
  • Recover from failures

Without it, the agent becomes rigid.

Agent Instruction Design

Your instructions teach the agent:

  • When to call ListDocuments
  • When to call GetDocument
  • How to summarize
  • How to handle failures

This prompt engineering layer is critical.

Recommended Summary Structure

Use a consistent response format:

Section       Purpose
Purpose       High-level objective
Key Points    Main insights
Decisions     Important approvals
Action Items  Next steps

This improves readability dramatically.

Add the REST API Tool

Inside the Tools tab:

Create:

AzureDocAPI

Avoid:

  • Spaces
  • Hyphens
  • Duplicate registrations

Configure the Actions

Action 1 — ListDocuments

GET /documents

Action 2 — GetDocument

GET /documents/{name}

Input parameter:

name

Attach the Tool

Finally:

Agent → Tools → Add Tool

Select:

AzureDocAPI

Phase 6 — Test the Agent

Example Prompt 1

What documents do you have?

Expected behavior:

  • Agent calls ListDocuments
  • Returns available filenames

Example Prompt 2

Summarize Q1-report.pdf

Expected behavior:

  • Calls GetDocument
  • Retrieves extracted text
  • Generates structured summary

Example Prompt 3

Summarize a missing file

Expected behavior:

  • Calls ListDocuments
  • Suggests closest matches
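
In this guide the agent itself picks the closest names from the ListDocuments result. If you would rather compute suggestions on the API side, one possible approach uses the standard library's fuzzy matcher; the `suggest_documents` helper and the 0.6 cutoff are assumptions for illustration.

```python
from difflib import get_close_matches


def suggest_documents(requested: str, available: list[str],
                      limit: int = 3) -> list[str]:
    """Return up to `limit` available names that resemble the requested one."""
    # Compare case-insensitively, then map matches back to the original names.
    by_lower = {name.lower(): name for name in available}
    matches = get_close_matches(requested.lower(), list(by_lower),
                                n=limit, cutoff=0.6)
    return [by_lower[match] for match in matches]
```

For example, a request for "q1-reprt.pdf" against a container holding "Q1-report.pdf" would surface that file as a suggestion.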

Troubleshooting Guide

1. Azure CLI Permission Errors

Fix with:

icacls "C:\Users\<you>\.azure" /grant "<you>:(OI)(CI)F" /T

2. Python Missing in Runtime Dropdown

Cause:

Windows OS selected

Fix:

Use Linux

3. Swagger Import Failure

Cause:

Incorrect host formatting.

4. 401 Unauthorized

Possible causes:

  • Missing function key (only applies if the auth level is no longer ANONYMOUS)
  • Cached APIM responses
  • Incorrect authentication mode

5. Hyphenated Tool Name Errors

Avoid names like:

Azure-Doc-API

Use:

AzureDocAPI

instead.

Production Hardening Checklist

Before rolling this out enterprise-wide, implement:

Security Improvement   Recommended
OAuth Authentication   ✅
Entra ID Protection    ✅
Managed Identity       ✅
DLP Policies           ✅
Blob Segmentation      ✅
File Size Limits       ✅
Rate Limiting          ✅
Application Insights   ✅

Recommended Enterprise Enhancements

Here are powerful next steps.

1. Add OCR Support

Integrate:

Azure Document Intelligence

to support:

  • Scanned PDFs
  • Images
  • Handwritten documents

2. Add Semantic Search

Create:

SearchDocuments

This enables:

  • Keyword filtering
  • Metadata filtering
  • Natural language search

3. Add Vector Embeddings

Store embeddings using:

  • Azure AI Search
  • Vector databases
  • Retrieval-Augmented Generation (RAG)

This transforms your assistant into a true enterprise knowledge system.

4. Add Role-Based Access Control

Limit document visibility by:

  • Department
  • Job role
  • Security group
  • Entra ID claims

5. Publish to Teams

Deploy directly into:

Microsoft Teams

Final Thoughts

This architecture is far more than a simple document summarizer.

It is the foundation for:

  • Enterprise knowledge assistants
  • AI-powered search systems
  • RAG copilots
  • Internal support bots
  • Intelligent compliance assistants

By combining:

  • Azure Blob Storage
  • Azure Functions
  • REST APIs
  • Copilot Studio
  • AI summarization

you get a scalable, enterprise-ready AI platform that can change how employees access and consume information.

Where to Go Next

After completing this project, explore:

  • Multi-container routing
  • Metadata indexing
  • Vector search
  • OCR pipelines
  • SharePoint integration
  • Azure AI Search
  • Semantic Kernel
  • LangChain orchestration
  • Entra ID-secured APIs

Conclusion

You now have a complete blueprint for building a cloud-native AI document summarization platform powered by:

  • Microsoft Copilot Studio
  • Azure Blob Storage
  • Python Azure Functions
  • REST APIs
  • OpenAPI integration

This solution is scalable, extensible, and production-ready with the right security hardening steps.

Most importantly, it demonstrates how modern AI copilots can move beyond chat and become true enterprise productivity platforms.

Hope you enjoy the session.

Please leave a comment below if you have any further questions.

Happy Sharing !!!
Keep Learning | Spread Knowledge | Stay blessed |
