A Complete Hands-On Guide to Creating a Secure AI Agent that Reads, Lists, and Summarizes Corporate Documents
Modern organizations generate thousands of documents every month — reports, contracts, policies, meeting notes, technical manuals, onboarding guides, and operational procedures. Most of that information sits buried inside cloud storage, difficult to search and even harder to summarize quickly.
What if you could build an AI assistant that instantly retrieves and summarizes those documents directly from Azure Blob Storage?
In this comprehensive hands-on guide, you’ll learn how to create a production-ready AI document summarization agent using:
- Microsoft Copilot Studio
- Microsoft Azure Blob Storage
- Azure Functions (Python)
- REST APIs with OpenAPI
- AI-powered summarization workflows
By the end, you’ll have a fully functional enterprise AI assistant capable of:
✅ Listing documents from Azure Blob Storage
✅ Extracting text from PDF, DOCX, TXT, MD, and CSV files
✅ Summarizing documents using structured AI responses
✅ Handling missing files gracefully
✅ Operating through a secure REST API architecture
What You Will Build
You will create a Copilot Studio agent called:
Azure Blob Document Summarizer
This intelligent assistant connects to corporate documents stored in Azure Blob Storage through a custom REST API.
The agent can:
- Retrieve all available documents
- Extract text content from documents
- Generate concise structured summaries
- Return action items and decisions from reports
- Help employees quickly understand large files
Final Solution Architecture
Here’s the complete request flow:
User
↓
Copilot Studio Agent
↓
Custom REST API Tool
↓
Azure Function App (Python)
↓
Azure Blob Storage
↓
Document Text Extraction
↓
AI Summary Response
Why This Architecture Works So Well
This solution separates responsibilities cleanly:
| Component | Responsibility |
|---|---|
| Copilot Studio | Conversational AI orchestration |
| Azure Function | Secure API and document processing |
| Blob Storage | Central document repository |
| OpenAPI Spec | API contract for Copilot |
| Python Extraction Layer | Reads document contents |
Because each component has a single responsibility, you can scale, secure, or replace one layer without rewriting the others.
Supported File Types
The extraction layer supports these document formats out of the box:
| File Type | Supported |
|---|---|
| PDF | ✅ |
| DOCX | ✅ |
| TXT | ✅ |
| Markdown | ✅ |
| CSV | ✅ |
Important Note About Scanned PDFs
Scanned image-based PDFs contain no embedded text layer.
That means:
- Text extraction returns empty content
- OCR is required
For production workloads, integrate:
- Microsoft Azure Document Intelligence
- OCR pipelines
- Form Recognizer services
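The empty-extraction symptom described above can be detected in code, so the API can flag a scanned file instead of handing the agent a blank document. This is a minimal sketch: the helper name `needs_ocr` and the 20-character threshold are illustrative assumptions, not part of the original project.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True when extraction produced too little text to summarize.

    min_chars is an arbitrary illustrative threshold (an assumption).
    """
    # Ignore whitespace: an image-only PDF typically yields blank or empty output.
    meaningful = "".join(extracted_text.split())
    return len(meaningful) < min_chars
```

A document endpoint could call this after extraction and return a clear message such as "this file appears to be scanned and needs OCR" rather than empty content.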
Phase 0 — Prerequisites
Before building the solution, ensure you have the following.
Required Accounts and Licenses
| Requirement | Purpose |
|---|---|
| Microsoft 365 Tenant | Hosts Copilot Studio |
| Azure Subscription | Hosts Function App + Blob Storage |
| Copilot Studio License | Create and publish the agent |
| Power Apps License | Needed only for Custom Connector workflows |
Required Local Development Tools
Install the following tools locally.
1. Python 3.11
Required for Azure Functions Python runtime.
2. Visual Studio Code
Install VS Code with the Azure Functions extension.
3. Azure Functions Core Tools v4
Install globally:
npm i -g azure-functions-core-tools@4 --unsafe-perm true
4. Azure CLI
Verify installation:
az --version
Expected version:
2.86.0 or later
Fixing Azure CLI Permission Errors on Windows
If you encounter:
PermissionError on C:\Users\<you>\.azure
Run PowerShell as Administrator:
icacls "C:\Users\<YourUser>\.azure" /grant "<YourUser>:(OI)(CI)F" /T
Alternative approach:
setx AZURE_CONFIG_DIR "C:\AzureCLI"
Phase 1 — Create Azure Blob Storage
Step 1 — Create the Storage Account
Inside the Azure Portal:
- Go to Storage Accounts
- Click Create
- Performance → Standard
- Redundancy → LRS
Create the Blob Container
After deployment:
- Open the storage account
- Go to Containers
- Create a container named:
agent-docs
- Set access level to:
Private
Upload Sample Documents
Upload a few test files:
- PDF
- DOCX
- TXT
Save the Connection String
Navigate to:
Storage Account → Access Keys
Copy:
Connection String
You’ll use this later inside the Function App.
Enterprise Best Practice
Use separate containers for:
- Departments
- Business units
- Agents
- Security boundaries
This improves:
- DLP enforcement
- Auditability
- Governance
- Access isolation
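If you adopt one container per department, it helps to derive container names consistently and validate them against Azure's container naming rules (3 to 63 characters, lowercase letters, digits, and single hyphens, starting and ending with a letter or digit). The sketch below assumes a hypothetical `container_for` helper and an `agent-docs` prefix; both are illustrative choices.

```python
import re

# 3-63 chars, lowercase alphanumerics and hyphens, no leading, trailing,
# or consecutive hyphens.
_CONTAINER_RE = re.compile(r"^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}$")


def container_for(department: str, prefix: str = "agent-docs") -> str:
    """Build a container name such as 'agent-docs-finance' (hypothetical scheme)."""
    # Collapse anything that is not a lowercase letter or digit into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", department.lower()).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    if not _CONTAINER_RE.match(name):
        raise ValueError(f"Invalid container name: {name!r}")
    return name
```

For example, `container_for("Human Resources")` yields `agent-docs-human-resources`, while an overly long department name raises `ValueError` instead of failing later at the storage call.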
Phase 2 — Build the REST API
Now you’ll create the backend API layer.
Project Structure
Create the following folder structure:
doc-summary-api/
├── function_app.py
├── requirements.txt
├── host.json
├── local.settings.json
├── doc-api-openapi.json
├── .gitignore
└── README.md
Understanding the API Design
The API exposes two endpoints:
| Endpoint | Purpose |
|---|---|
| GET /documents | Lists available files |
| GET /documents/{name} | Returns extracted text |
Phase 2.1 — Build the Azure Function
Create:
function_app.py
This file contains:
- HTTP triggers
- Blob Storage access
- File extraction logic
- JSON responses
The application uses:
- Azure Functions
- BlobServiceClient
- pypdf
- python-docx
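To make the pieces listed above concrete, here is a hedged sketch of what `function_app.py` could look like using the Azure Functions Python v2 programming model. The handler names and environment variable names follow this guide; the `build_app` wrapper, the payload helpers, and the error body shape are assumptions made for illustration. In a real project you would more commonly define `app` and the handlers at module level; they are wrapped in a function here only so the pure helpers can run without the Azure packages installed.

```python
import json
import os


def list_payload(blob_names):
    """JSON body for GET /documents."""
    return json.dumps({"documents": sorted(blob_names)})


def document_payload(name, text):
    """JSON body for GET /documents/{name}."""
    return json.dumps({"name": name, "content": text})


def build_app():
    # Azure packages are imported lazily so the helpers above work without them.
    import azure.functions as func
    from azure.core.exceptions import ResourceNotFoundError
    from azure.storage.blob import BlobServiceClient

    app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

    def container():
        service = BlobServiceClient.from_connection_string(
            os.environ["BLOB_CONNECTION_STRING"]
        )
        return service.get_container_client(os.environ["BLOB_CONTAINER"])

    @app.route(route="documents", methods=["GET"])
    def list_documents(req: func.HttpRequest) -> func.HttpResponse:
        names = [blob.name for blob in container().list_blobs()]
        return func.HttpResponse(list_payload(names), mimetype="application/json")

    @app.route(route="documents/{name}", methods=["GET"])
    def get_document(req: func.HttpRequest) -> func.HttpResponse:
        name = req.route_params.get("name", "")
        try:
            data = container().download_blob(name).readall()
        except ResourceNotFoundError:
            # Error body shape is an assumption, not defined by the guide.
            body = json.dumps({"error": f"Document not found: {name}"})
            return func.HttpResponse(body, status_code=404, mimetype="application/json")
        # Placeholder: a real handler would dispatch on the file extension here.
        text = data.decode("utf-8", errors="replace")
        return func.HttpResponse(document_payload(name, text), mimetype="application/json")

    return app
```

With this layout, the module would end with `app = build_app()` so the Functions runtime can discover the two HTTP triggers.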
Core Design Principle
The Function App acts as:
A Translation Layer
It converts:
Blob Storage Files
↓
Extracted Plain Text
↓
AI-Ready Content
This is critical because Copilot Studio works best with plain text.
Why Use ANONYMOUS Authentication Initially?
The guide uses:
func.AuthLevel.ANONYMOUS
Benefits:
- Simplifies Copilot integration
- No function key required
- Faster prototyping
An anonymous endpoint can be called by anyone who knows its URL, so production environments should later switch to:
- OAuth
- Entra ID
- Managed Identity
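For the storage side of that hardening, a managed identity can replace the connection string. The sketch below uses `DefaultAzureCredential` from `azure-identity` (already listed in this guide's dependencies); the helper names are hypothetical, and it assumes the Function App's identity has been granted a data-plane role such as Storage Blob Data Reader on the account. It covers access from the function to Blob Storage only, not authentication of callers to the HTTP endpoints.

```python
def account_url(account_name: str) -> str:
    """Blob endpoint for a storage account in the public Azure cloud."""
    return f"https://{account_name}.blob.core.windows.net"


def container_client(account_name: str, container: str):
    """Return a ContainerClient that authenticates without a connection string."""
    # Lazy imports keep account_url usable without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url(account_name), credential=DefaultAzureCredential()
    )
    return service.get_container_client(container)
```

Locally, `DefaultAzureCredential` can fall back to your `az login` session, so the same code path works on a developer machine and in Azure.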
Supported Text Extraction Logic
The extraction helper automatically handles:
| Extension | Extraction Method |
|---|---|
| .pdf | pypdf PdfReader |
| .docx | python-docx |
| .txt | UTF-8 decode |
| .csv | UTF-8 decode |
| .md | UTF-8 decode |
Unsupported files return:
[Unsupported file type]
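The dispatch table above could be implemented roughly as follows. This is a sketch rather than the original helper: the function name `extract_text` is an assumption, and `pypdf` and `python-docx` are imported lazily so plain-text files work even when those packages are absent.

```python
import io
import os

TEXT_EXTENSIONS = {".txt", ".csv", ".md"}


def extract_text(name: str, data: bytes) -> str:
    """Convert raw blob bytes into plain text based on the file extension."""
    ext = os.path.splitext(name)[1].lower()
    if ext in TEXT_EXTENSIONS:
        # Replace undecodable bytes instead of failing the whole request.
        return data.decode("utf-8", errors="replace")
    if ext == ".pdf":
        from pypdf import PdfReader

        reader = PdfReader(io.BytesIO(data))
        # extract_text() can return an empty result for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ext == ".docx":
        from docx import Document

        document = Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    return "[Unsupported file type]"
```

Note that iterating `document.paragraphs` skips text inside DOCX tables, so table-heavy reports may need extra handling.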
Python Dependencies
Create:
requirements.txt
Contents:
azure-functions
azure-storage-blob
azure-identity
pypdf
python-docx
Configure host.json
This file controls:
- Runtime behavior
- Extension bundles
- Logging
It also enables Application Insights integration.
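For reference, the snippet below prints a typical `host.json` as generated by the Functions tooling for a v4 project. Treat the values as an assumption about your project: in particular, the extension bundle version range should match whatever your template produced.

```python
import json

# Mirrors the default host.json template; adjust to match your own project.
HOST_CONFIG = {
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {"isEnabled": True, "excludedTypes": "Request"}
        }
    },
    "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[4.*, 5.0.0)",
    },
}


def render_host_json() -> str:
    """Return the host.json content as formatted JSON text."""
    return json.dumps(HOST_CONFIG, indent=2)
```

Calling `print(render_host_json())` and saving the output as `host.json` gives you a starting point with sampling enabled for Application Insights.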
Configure local.settings.json
This file stores:
- Local environment variables
- Storage connection strings
- Runtime settings
⚠️ Never commit this file to GitHub.
Add it to:
.gitignore
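A `local.settings.json` for this project might contain the values rendered below. The `BLOB_CONNECTION_STRING` and `BLOB_CONTAINER` names come from this guide; pointing `AzureWebJobsStorage` at the local storage emulator is an assumption, so substitute a real account if you do not run one. The helper name `render_local_settings` is hypothetical.

```python
import json


def render_local_settings(connection_string: str, container: str = "agent-docs") -> str:
    """Return local.settings.json content; never commit the result."""
    settings = {
        "IsEncrypted": False,
        "Values": {
            "FUNCTIONS_WORKER_RUNTIME": "python",
            # Assumes a local storage emulator is running.
            "AzureWebJobsStorage": "UseDevelopmentStorage=true",
            "BLOB_CONNECTION_STRING": connection_string,
            "BLOB_CONTAINER": container,
        },
    }
    return json.dumps(settings, indent=2)
```

Pass the connection string you copied from the storage account's Access Keys page and save the output next to `host.json`.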
Test the API Locally
Install dependencies:
pip install -r requirements.txt
Start the Functions runtime:
func start
Test Endpoint 1 — List Documents
Open:
http://localhost:7071/api/documents
Expected result:
{
"documents": [...]
}
Test Endpoint 2 — Retrieve Document Content
Example:
http://localhost:7071/api/documents/sample.pdf
Expected response:
{
"name": "sample.pdf",
"content": "Extracted text..."
}
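The two manual checks above can also be scripted. The sketch below uses only the standard library; the helper names are illustrative, and it assumes the local runtime is listening on the default port shown earlier. File names are URL-encoded so documents with spaces still resolve.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:7071/api"


def document_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the GET /documents/{name} URL, encoding unsafe characters."""
    return f"{base_url}/documents/{quote(name, safe='')}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and parse the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `func start` running, `fetch_json(f"{BASE_URL}/documents")` should return the listing and `fetch_json(document_url("sample.pdf"))` the extracted text.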
Phase 3 — Deploy to Azure
Now it’s time to move from local development to the cloud.
Critical Azure Functions Rule
Python Azure Functions require:
Linux Hosting
If Python does not appear in the runtime list:
❌ You selected Windows
✅ Switch to Linux
Recommended Hosting Plan
Use:
Flex Consumption
Benefits:
- Serverless
- Scales to zero
- Lowest cost
- Auto-scaling
Configure Environment Variables
Inside the Function App:
Settings → Environment Variables
Add:
| Name | Value |
|---|---|
| BLOB_CONNECTION_STRING | Your storage connection string |
| BLOB_CONTAINER | agent-docs |
Restart the Function App afterward.
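A missing app setting otherwise surfaces as an opaque 500 error on the first request, so it can help to validate the two settings up front. This is a small sketch with a hypothetical `load_settings` helper; only the two setting names come from the table above.

```python
import os
from typing import Mapping, NamedTuple


class BlobSettings(NamedTuple):
    connection_string: str
    container: str


def load_settings(env: Mapping[str, str] = os.environ) -> BlobSettings:
    """Read the required app settings and fail with a clear message if absent."""
    required = ("BLOB_CONNECTION_STRING", "BLOB_CONTAINER")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing app settings: " + ", ".join(missing))
    return BlobSettings(env["BLOB_CONNECTION_STRING"], env["BLOB_CONTAINER"])
```

The error message names exactly which setting is missing, which shortens the debugging loop after a fresh deployment.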
Deploy from VS Code
Using the Azure extension:
- Sign in
- Right-click project
- Deploy to Function App
- Confirm overwrite
Validate Deployment
Verify:
Functions → list_documents
Functions → get_document
Then test the public URL.
Phase 4 — Create the OpenAPI Specification
This step is extremely important.
Copilot Studio uses the OpenAPI document to understand:
- Endpoints
- Parameters
- Outputs
- Schemas
Why OpenAPI Matters
Without OpenAPI:
❌ Copilot cannot discover your actions
❌ Tool orchestration breaks
❌ Parameters become unreliable
With OpenAPI:
✅ Actions become AI callable
✅ Schemas are validated
✅ Responses are structured
Critical OpenAPI Formatting Rules
The host field must contain:
✅ Hostname only
Correct:
contoso-api.azurewebsites.net
Incorrect:
https://contoso-api.azurewebsites.net
Common Swagger Error
If you see:
Swagger contains base path:/api but backend Url doesn't end on same path...
The host format is wrong.
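One way to avoid the host mistake is to generate the specification from code and normalize the host value automatically. The sketch below assumes the Swagger 2.0 layout implied by the `host` and base path wording above; the operation IDs match the action names used later in this guide, while the helper names, descriptions, and version string are illustrative.

```python
import json
from urllib.parse import urlsplit


def normalize_host(value: str) -> str:
    """Strip any scheme and path so only the hostname remains."""
    candidate = value.strip()
    parts = urlsplit(candidate if "://" in candidate else f"//{candidate}")
    return parts.netloc


def build_spec(host: str) -> dict:
    """Return a minimal Swagger 2.0 document for the two endpoints."""
    return {
        "swagger": "2.0",
        "info": {"title": "AzureDocAPI", "version": "1.0.0"},
        "host": normalize_host(host),
        "basePath": "/api",
        "schemes": ["https"],
        "produces": ["application/json"],
        "paths": {
            "/documents": {
                "get": {
                    "operationId": "ListDocuments",
                    "summary": "List available documents",
                    "responses": {"200": {"description": "Document names"}},
                }
            },
            "/documents/{name}": {
                "get": {
                    "operationId": "GetDocument",
                    "summary": "Return extracted text for one document",
                    "parameters": [
                        {"name": "name", "in": "path", "required": True, "type": "string"}
                    ],
                    "responses": {"200": {"description": "Extracted text"}},
                }
            },
        },
    }
```

Writing `json.dumps(build_spec("https://contoso-api.azurewebsites.net"), indent=2)` to `doc-api-openapi.json` produces a file whose `host` is the bare hostname even though a full URL was passed in.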
Phase 5 — Configure Copilot Studio
Now the exciting part begins.
Create the AI Agent
Inside Microsoft Copilot Studio:
- Create a new agent
- Name it:
Azure Blob Document Summarizer
- Enable:
Generative Orchestration
Why Generative Orchestration Matters
This feature allows the agent to:
- Decide which action to call
- Chain tool executions
- Interpret user intent dynamically
- Recover from failures
Without it, the agent becomes rigid.
Agent Instruction Design
Your instructions teach the agent:
- When to call ListDocuments
- When to call GetDocument
- How to summarize
- How to handle failures
This prompt engineering layer is critical.
Recommended Summary Structure
Use a consistent response format:
| Section | Purpose |
|---|---|
| Purpose | High-level objective |
| Key Points | Main insights |
| Decisions | Important approvals |
| Action Items | Next steps |
This improves readability dramatically.
Add the REST API Tool
Inside the Tools tab:
Create:
AzureDocAPI
Avoid:
- Spaces
- Hyphens
- Duplicate registrations
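If tool names are generated from other identifiers, a small normalizer helps enforce the no-spaces, no-hyphens rule. Stripping everything except letters and digits is a conservative assumption, not a documented Copilot Studio requirement, and `safe_tool_name` is a hypothetical helper.

```python
import re


def safe_tool_name(raw: str) -> str:
    """Drop spaces, hyphens, and other punctuation from a tool name."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw)
    if not cleaned:
        raise ValueError("Tool name must contain at least one letter or digit")
    return cleaned
```

For example, both `Azure-Doc-API` and `Azure Doc API` normalize to `AzureDocAPI`.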
Configure the Actions
Action 1 — ListDocuments
GET /documents
Action 2 — GetDocument
GET /documents/{name}
Input parameter:
name
Attach the Tool
Finally:
Agent → Tools → Add Tool
Select:
AzureDocAPI
Phase 6 — Test the Agent
Example Prompt 1
What documents do you have?
Expected behavior:
- Agent calls ListDocuments
- Returns available filenames
Example Prompt 2
Summarize Q1-report.pdf
Expected behavior:
- Calls GetDocument
- Retrieves extracted text
- Generates structured summary
Example Prompt 3
Summarize a missing file
Expected behavior:
- Calls ListDocuments
- Suggests closest matches
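In this guide the agent itself proposes close matches from the ListDocuments result. If you would rather have the API assist, a not-found response could include suggestions computed with the standard library's `difflib`. This is an optional sketch; the helper name, the 0.6 cutoff, and the idea of returning suggestions from the API are all assumptions.

```python
from difflib import get_close_matches


def suggest_documents(requested: str, available: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` available names that resemble the requested one."""
    # Compare case-insensitively but return names with their original casing.
    by_lower = {name.lower(): name for name in available}
    matches = get_close_matches(requested.lower(), list(by_lower), n=limit, cutoff=0.6)
    return [by_lower[match] for match in matches]
```

A request for a misspelled name such as `q1-reprot.pdf` would then surface `Q1-report.pdf` as the leading suggestion, while an unrelated name returns an empty list.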
Troubleshooting Guide
1. Azure CLI Permission Errors
Fix with:
icacls "C:\Users\<you>\.azure" /grant "<you>:(OI)(CI)F" /T
2. Python Missing in Runtime Dropdown
Cause:
Windows OS selected
Fix:
Use Linux
3. Swagger Import Failure
Cause:
Incorrect host formatting.
4. 401 Unauthorized
Possible causes:
- Missing function key
- Cached APIM responses
- Incorrect authentication mode
5. Hyphenated Tool Name Errors
Avoid names like:
Azure-Doc-API
Use:
AzureDocAPI
instead.
Production Hardening Checklist
Before rolling this out enterprise-wide, implement:
| Security Improvement | Recommended |
|---|---|
| OAuth Authentication | ✅ |
| Entra ID Protection | ✅ |
| Managed Identity | ✅ |
| DLP Policies | ✅ |
| Blob Segmentation | ✅ |
| File Size Limits | ✅ |
| Rate Limiting | ✅ |
| Application Insights | ✅ |
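Of the items above, file size limits are the simplest to enforce inside the function itself. The sketch below shows one possible approach; the 10 MB and 50,000-character limits and the helper names are illustrative assumptions you should tune to your own documents and model limits.

```python
MAX_BYTES = 10 * 1024 * 1024  # illustrative download cap
MAX_CHARS = 50_000  # illustrative cap on text returned to the agent


def check_size(size_bytes: int, max_bytes: int = MAX_BYTES) -> None:
    """Reject blobs that are too large to download and parse safely."""
    if size_bytes > max_bytes:
        raise ValueError(f"File is {size_bytes} bytes; limit is {max_bytes}")


def truncate_text(text: str, max_chars: int = MAX_CHARS) -> tuple[str, bool]:
    """Return the text capped at max_chars and whether it was truncated."""
    if len(text) <= max_chars:
        return text, False
    return text[:max_chars], True
```

Returning the truncation flag lets the API tell the agent that a summary covers only the first part of a long document.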
Recommended Enterprise Enhancements
Here are powerful next steps.
1. Add OCR Support
Integrate:
Microsoft Azure Document Intelligence
to support:
- Scanned PDFs
- Images
- Handwritten documents
2. Add Semantic Search
Create:
SearchDocuments
This enables:
- Keyword filtering
- Metadata filtering
- Natural language search
3. Add Vector Embeddings
Store embeddings using:
- Azure AI Search
- Vector databases
- Retrieval-Augmented Generation (RAG)
This transforms your assistant into a true enterprise knowledge system.
4. Add Role-Based Access Control
Limit document visibility by:
- Department
- Job role
- Security group
- Entra ID claims
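One simple way to model this is to map security groups to the containers they may read, building on the one-container-per-department practice from earlier. The sketch below is illustrative only: the group names and the mapping are hypothetical, and validating the caller's token to obtain trusted group claims is out of scope here.

```python
from typing import Iterable, Mapping


def allowed_containers(
    user_groups: Iterable[str], container_by_group: Mapping[str, str]
) -> list[str]:
    """Return the sorted containers a user may read, based on group membership."""
    allowed = {
        container_by_group[group]
        for group in user_groups
        if group in container_by_group
    }
    return sorted(allowed)


def can_read(container: str, user_groups: Iterable[str],
             container_by_group: Mapping[str, str]) -> bool:
    """Check a single container against the user's groups."""
    return container in allowed_containers(user_groups, container_by_group)
```

Unknown groups grant nothing, so a user with no mapped group gets an empty list and every read check fails closed.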
5. Publish to Teams
Deploy directly into:
Microsoft Teams
Final Thoughts
This architecture is far more than a simple document summarizer.
It is the foundation for:
- Enterprise knowledge assistants
- AI-powered search systems
- RAG copilots
- Internal support bots
- Intelligent compliance assistants
By combining:
- Azure Blob Storage
- Azure Functions
- REST APIs
- Copilot Studio
- AI summarization
You create a scalable, enterprise-ready AI platform capable of transforming how employees access and consume information.
Where to Go Next
After completing this project, explore:
- Multi-container routing
- Metadata indexing
- Vector search
- OCR pipelines
- SharePoint integration
- Azure AI Search
- Semantic Kernel
- LangChain orchestration
- Entra ID-secured APIs
Conclusion
You now have a complete blueprint for building a cloud-native AI document summarization platform powered by:
- Microsoft Copilot Studio
- Microsoft Azure Blob Storage
- Python Azure Functions
- REST APIs
- OpenAPI integration
This solution is scalable, extensible, and production-ready with the right security hardening steps.
Most importantly, it demonstrates how modern AI copilots can move beyond chat and become true enterprise productivity platforms.
Hope you enjoy the session.
Please leave a comment below if you have any further questions.
Happy Sharing !!!
Keep Learning | Spread Knowledge | Stay blessed |