Automating document processing can remove hours of manual data entry. Whether the input is invoices, receipts, or forms, the goal is the same: extract structured data from unstructured content and make sense of it. In this post, we'll walk through how to build an ASP.NET Core MVC web application that uses AWS Textract, an OCR (Optical Character Recognition) service, and AWS Comprehend, a Natural Language Processing (NLP) service, to extract and analyze text from documents.
📊 Why AWS Textract & AWS Comprehend?
AWS Textract goes beyond simple OCR. It can:
- Detect printed text, handwriting, tables, and forms.
- Extract structured data from documents.
- Work with PDFs and images.
AWS Comprehend allows you to:
- Analyze text for sentiment, key phrases, and named entities.
- Detect the dominant language.
- Classify and organize textual content.
By combining these services, you can both extract and interpret content for smarter automation.
🔧 Tech Stack & Prerequisites
- .NET 6 (or later)
- AWS SDK for .NET
- Visual Studio / VS Code
- AWS account in a region where Textract and Comprehend are available
- IAM user with the AmazonTextractFullAccess and ComprehendFullAccess managed policies (fine for a demo; scope these down for production)
⚙️ Step 1: Project Setup
Open your terminal and scaffold a new MVC project:
dotnet new mvc -n TextractComprehendApp
cd TextractComprehendApp
dotnet add package AWSSDK.Textract
dotnet add package AWSSDK.Comprehend
dotnet add package AWSSDK.Extensions.NETCore.Setup
Configure AWS credentials:
aws configure
# Enter Access Key, Secret, Region (e.g., us-east-1)
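The controller in Step 3 receives IAmazonTextract and IAmazonComprehend through its constructor, so both clients have to be registered with the dependency injection container. Here is a minimal sketch of Program.cs, assuming the AWSSDK.Extensions.NETCore.Setup package is installed (it provides AddDefaultAWSOptions and AddAWSService). The routing lines are the usual MVC template defaults, not something specific to this app.

```csharp
// Program.cs
// Assumes: dotnet add package AWSSDK.Extensions.NETCore.Setup
using Amazon.Comprehend;
using Amazon.Textract;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllersWithViews();

// Reads profile/region from the "AWS" configuration section when present;
// otherwise the SDK falls back to its default credential and region
// resolution (for example, what `aws configure` stored).
builder.Services.AddDefaultAWSOptions(builder.Configuration.GetAWSOptions());

// Register the service clients so they can be constructor-injected.
builder.Services.AddAWSService<IAmazonTextract>();
builder.Services.AddAWSService<IAmazonComprehend>();

var app = builder.Build();

app.UseStaticFiles();
app.UseRouting();

app.MapControllerRoute(
    name: "default",
    pattern: "{controller=Home}/{action=Index}/{id?}");

app.Run();
```

Without a registration like this, ASP.NET Core cannot resolve the constructor parameters and the first request to HomeController fails.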
📁 Step 2: Create File Upload UI
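Views/Home/Index.cshtml
The view needs a form that posts to the Upload action with enctype="multipart/form-data", a file input named document (matching the action's parameter name), and a submit button labeled Upload & Analyze.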
📑 Step 3: Add Textract & Comprehend Integration in Controller
Controllers/HomeController.cs
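using Amazon.Textract;
using Amazon.Textract.Model;
using Amazon.Comprehend;
using Amazon.Comprehend.Model;
using Microsoft.AspNetCore.Mvc;

public class HomeController : Controller
{
    private readonly IAmazonTextract _textractClient;
    private readonly IAmazonComprehend _comprehendClient;

    public HomeController(IAmazonTextract textractClient, IAmazonComprehend comprehendClient)
    {
        _textractClient = textractClient;
        _comprehendClient = comprehendClient;
    }

    // Serves the upload form (Views/Home/Index.cshtml).
    public IActionResult Index() => View();

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile document)
    {
        // Nothing was posted: send the user back to the form.
        if (document == null || document.Length == 0)
        {
            return RedirectToAction(nameof(Index));
        }

        // Copy the upload into memory; Textract reads the bytes from this stream.
        using var buffer = new MemoryStream();
        await document.CopyToAsync(buffer);
        buffer.Position = 0;

        var request = new DetectDocumentTextRequest
        {
            Document = new Document { Bytes = buffer }
        };
        var response = await _textractClient.DetectDocumentTextAsync(request);

        // Keep only LINE blocks; WORD blocks would repeat the same text.
        // BlockType is fully qualified to avoid a clash with the Comprehend namespace.
        var lines = response.Blocks
            .Where(b => b.BlockType == Amazon.Textract.BlockType.LINE)
            .Select(b => b.Text)
            .ToList();
        var text = string.Join(" ", lines);

        // Comprehend rejects empty input, so skip the call when no text was found.
        var entities = new List<Amazon.Comprehend.Model.Entity>();
        if (!string.IsNullOrWhiteSpace(text))
        {
            var comprehendRequest = new DetectEntitiesRequest
            {
                Text = text,
                LanguageCode = "en" // assumes English documents
            };
            var comprehendResponse = await _comprehendClient.DetectEntitiesAsync(comprehendRequest);
            entities = comprehendResponse.Entities;
        }

        ViewBag.ExtractedText = string.Join("\n", lines);
        ViewBag.Entities = entities;
        return View("Result");
    }
}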
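The Upload action trusts whatever file is posted. As a rough sketch of the kind of check you might run before calling Textract, here is a small validator. UploadValidator, its 5 MB cap, and the extension list are all assumptions made for illustration; check the current Textract quotas for the operation you call, and note that checking the extension says nothing about the actual file content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the AWS SDK or ASP.NET Core.
public static class UploadValidator
{
    // Assumed limit for this sketch; verify against the current Textract quotas.
    public const long MaxBytes = 5 * 1024 * 1024;

    // Extensions accepted by this sketch (name-based check only).
    private static readonly HashSet<string> AllowedExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".png", ".jpg", ".jpeg", ".pdf", ".tif", ".tiff" };

    // Returns null when the upload looks acceptable, otherwise a message for the user.
    public static string? Validate(string? fileName, long length)
    {
        if (string.IsNullOrWhiteSpace(fileName) || length <= 0)
            return "Please choose a non-empty file.";

        if (length > MaxBytes)
            return $"File is larger than {MaxBytes / (1024 * 1024)} MB.";

        var extension = Path.GetExtension(fileName);
        if (!AllowedExtensions.Contains(extension))
            return $"Unsupported file type '{extension}'.";

        return null;
    }
}
```

In the controller this could be called as UploadValidator.Validate(document?.FileName, document?.Length ?? 0) at the top of Upload, returning to the form with the message when the result is not null.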
📋 Step 4: Display Extracted Text and NLP Insights
Views/Home/Result.cshtml
<h2>Extracted Text</h2>
<pre>@ViewBag.ExtractedText</pre>
<h2>Identified Entities</h2>
<ul>
    @foreach (var entity in ViewBag.Entities)
    {
        <li><strong>@entity.Type:</strong> @entity.Text (Score: @entity.Score)</li>
    }
</ul>
<a href="/">Upload Another Document</a>
📊 Pro Tips for Production
✅ Use S3 for large file uploads
✅ Implement StartDocumentTextDetection for async processing of multipage PDFs
✅ Validate file type, size, and sanitize input
✅ Use Comprehend’s sentiment or key phrase detection for deeper analysis
✅ Add error handling and logging
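To make the S3 and StartDocumentTextDetection tips more concrete, here is a hedged sketch of the asynchronous flow: start a job for a document that is already in S3, poll until it finishes, then page through the results. AsyncTextDetector is a hypothetical helper, the bucket and key come from the caller, and the 2-second polling delay is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Amazon.Textract;
using Amazon.Textract.Model;

// Hypothetical helper class for this sketch.
public class AsyncTextDetector
{
    private readonly IAmazonTextract _textract;

    public AsyncTextDetector(IAmazonTextract textract) => _textract = textract;

    public async Task<List<string>> DetectLinesAsync(
        string bucket, string key, CancellationToken ct = default)
    {
        // Start the job against a document that was uploaded to S3 beforehand.
        var start = await _textract.StartDocumentTextDetectionAsync(
            new StartDocumentTextDetectionRequest
            {
                DocumentLocation = new DocumentLocation
                {
                    S3Object = new S3Object { Bucket = bucket, Name = key }
                }
            }, ct);

        // Poll until the job leaves IN_PROGRESS.
        GetDocumentTextDetectionResponse page;
        do
        {
            await Task.Delay(TimeSpan.FromSeconds(2), ct);
            page = await _textract.GetDocumentTextDetectionAsync(
                new GetDocumentTextDetectionRequest { JobId = start.JobId }, ct);
        } while (page.JobStatus == JobStatus.IN_PROGRESS);

        // This sketch treats anything other than SUCCEEDED as a failure.
        if (page.JobStatus != JobStatus.SUCCEEDED)
            throw new InvalidOperationException(
                $"Textract job ended with status {page.JobStatus}.");

        // Results are paginated; follow NextToken until it is empty.
        var lines = new List<string>();
        while (true)
        {
            lines.AddRange(page.Blocks
                .Where(b => b.BlockType == BlockType.LINE)
                .Select(b => b.Text));

            if (string.IsNullOrEmpty(page.NextToken)) break;

            page = await _textract.GetDocumentTextDetectionAsync(
                new GetDocumentTextDetectionRequest
                {
                    JobId = start.JobId,
                    NextToken = page.NextToken
                }, ct);
        }

        return lines;
    }
}
```

Polling inside a web request keeps the sketch short, but it ties up the request for as long as the job runs. In a real app you would more likely run this in a background worker, or have Textract publish a completion message through the request's NotificationChannel instead of polling.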
📈 Future Enhancements
- Extract tables and forms using AnalyzeDocumentAsync
- Add support for sentiment and key phrase detection from Comprehend
- Store extracted and analyzed data in DynamoDB or SQL Server
- Add Cognito for user authentication
- Visualize entity relationships using JS libraries
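As a taste of the sentiment and key phrase item, here is a rough sketch that runs both Comprehend calls on the text Textract extracted. TextInsights and TextInsightsService are hypothetical names introduced for this example, and the default language code "en" is an assumption carried over from the controller.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.Comprehend;
using Amazon.Comprehend.Model;

// Hypothetical result type for this sketch.
public record TextInsights(string Sentiment, List<string> KeyPhrases);

// Hypothetical service wrapping two Comprehend calls.
public class TextInsightsService
{
    private readonly IAmazonComprehend _comprehend;

    public TextInsightsService(IAmazonComprehend comprehend) => _comprehend = comprehend;

    public async Task<TextInsights> AnalyzeAsync(string text, string languageCode = "en")
    {
        // Overall sentiment label for the whole text
        // (POSITIVE, NEGATIVE, NEUTRAL, or MIXED).
        var sentiment = await _comprehend.DetectSentimentAsync(
            new DetectSentimentRequest { Text = text, LanguageCode = languageCode });

        // Noun phrases that Comprehend considers significant in the text.
        var phrases = await _comprehend.DetectKeyPhrasesAsync(
            new DetectKeyPhrasesRequest { Text = text, LanguageCode = languageCode });

        return new TextInsights(
            sentiment.Sentiment.Value,
            phrases.KeyPhrases.Select(p => p.Text).Distinct().ToList());
    }
}
```

Comprehend enforces per-call text size limits that differ by operation, so long documents may need to be split into smaller pieces before calling AnalyzeAsync.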
📚 Conclusion
AWS Textract + AWS Comprehend + .NET Core is a powerful stack to build smart, cloud-native document processing and analysis apps. Whether you're automating invoice entry or building intelligent insights, these services provide a flexible, scalable foundation.
📅 Stay Tuned
In the next post, we'll explore how to extract tables and form fields and enrich results using AWS Comprehend’s key phrases and sentiment analysis.
If you found this helpful, tap ❤️ and follow me for more .NET and AWS how-to guides.