Santosh Kumar Tripathy

🚀 Building a .NET Web Application with AWS Textract and Comprehend for Intelligent Document Processing

Automating document processing can save hours of manual data entry, whether the input is invoices, receipts, or forms. In this post, we'll walk through how to build an ASP.NET Core MVC web application that uses AWS Textract, an OCR (Optical Character Recognition) service, to extract text from uploaded documents, and AWS Comprehend, a Natural Language Processing (NLP) service, to analyze that text.

📊 Why AWS Textract & AWS Comprehend?

AWS Textract goes beyond simple OCR. It can:

  • Detect printed text, handwriting, tables, and forms.
  • Extract structured data from documents.
  • Work with PDFs and images.

AWS Comprehend allows you to:

  • Analyze text for sentiment, key phrases, and named entities.
  • Detect the dominant language.
  • Classify and organize textual content.

By combining these services, you can both extract and interpret content for smarter automation.

🔧 Tech Stack & Prerequisites

  • .NET 6 (or later)
  • AWS SDK for .NET
  • Visual Studio / VS Code
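  • AWS account, using a region where Textract and Comprehend are available
  • IAM user with the AmazonTextractFullAccess and ComprehendFullAccess managed policies (fine for a demo; use narrower permissions in production)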

⚙️ Step 1: Project Setup

Open your terminal and scaffold a new MVC project:

dotnet new mvc -n TextractComprehendApp
cd TextractComprehendApp
dotnet add package AWSSDK.Textract
dotnet add package AWSSDK.Comprehend
dotnet add package AWSSDK.Extensions.NETCore.Setup

Configure AWS credentials:

aws configure
# Enter Access Key, Secret, Region (e.g., us-east-1)
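
The controller in Step 3 receives IAmazonTextract and IAmazonComprehend through its constructor, so both clients need to be registered with the dependency injection container. Here is a minimal sketch of Program.cs. It assumes the AWSSDK.Extensions.NETCore.Setup package is installed (it provides AddAWSService and GetAWSOptions) and the default MVC template layout. The template's error-page and HTTPS middleware is left out for brevity:

```csharp
// Program.cs (sketch). Assumes: dotnet add package AWSSDK.Extensions.NETCore.Setup
using Amazon.Comprehend;
using Amazon.Textract;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllersWithViews();

// Uses the "AWS" section of appsettings.json (profile, region) if present;
// otherwise the SDK falls back to its default credential and region lookup
builder.Services.AddDefaultAWSOptions(builder.Configuration.GetAWSOptions());

// Register the service clients that HomeController asks for
builder.Services.AddAWSService<IAmazonTextract>();
builder.Services.AddAWSService<IAmazonComprehend>();

var app = builder.Build();

app.UseStaticFiles();
app.UseRouting();

app.MapControllerRoute(
    name: "default",
    pattern: "{controller=Home}/{action=Index}/{id?}");

app.Run();
```

With this in place, the credentials and region entered through aws configure should be picked up automatically when the app runs locally.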

📁 Step 2: Create File Upload UI

Views/Home/Index.cshtml

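Add a form that posts to the Upload action with enctype="multipart/form-data". It needs a file input named document (matching the action's IFormFile parameter) and a submit button labeled "Upload & Analyze".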

📑 Step 3: Add Textract & Comprehend Integration in Controller

Controllers/HomeController.cs

using Amazon.Textract;
using Amazon.Textract.Model;
using Amazon.Comprehend;
using Amazon.Comprehend.Model;
using Microsoft.AspNetCore.Mvc;

public class HomeController : Controller
{
    private readonly IAmazonTextract _textractClient;
    private readonly IAmazonComprehend _comprehendClient;

    public HomeController(IAmazonTextract textractClient, IAmazonComprehend comprehendClient)
    {
        _textractClient = textractClient;
        _comprehendClient = comprehendClient;
    }

    // Serves the upload form (Views/Home/Index.cshtml)
    [HttpGet]
    public IActionResult Index() => View();

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile document)
    {
        // Reject a missing or empty upload before calling AWS
        if (document == null || document.Length == 0)
        {
            return BadRequest("Please select a document to upload.");
        }

        // Textract takes the document bytes as a MemoryStream positioned at the start
        using var buffer = new MemoryStream();
        await document.CopyToAsync(buffer);
        buffer.Position = 0;

        var request = new DetectDocumentTextRequest
        {
            Document = new Document { Bytes = buffer }
        };

        var response = await _textractClient.DetectDocumentTextAsync(request);

        // Keep only LINE blocks; WORD blocks would repeat the same text.
        // BlockType is fully qualified to avoid a possible name clash with
        // Amazon.Comprehend.BlockType in recent SDK versions.
        var lines = response.Blocks
            .Where(b => b.BlockType == Amazon.Textract.BlockType.LINE)
            .Select(b => b.Text)
            .ToList();

        var text = string.Join(" ", lines);

        var comprehendRequest = new DetectEntitiesRequest
        {
            Text = text,
            LanguageCode = "en"
        };

        var comprehendResponse = await _comprehendClient.DetectEntitiesAsync(comprehendRequest);

        ViewBag.ExtractedText = string.Join("\n", lines);
        ViewBag.Entities = comprehendResponse.Entities;
        return View("Result");
    }
}
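
The action above sends all of the extracted text to Comprehend in a single DetectEntities call. Comprehend caps the input size per request (100 KB of UTF-8 text for DetectEntities as far as I know, so please verify against the current quotas), which means long documents may need to be split first. One possible approach is sketched below with a hypothetical TextChunker helper. It is not part of the AWS SDK, and the byte budget is passed in by the caller:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextChunker
{
    // Groups lines into space-joined chunks whose UTF-8 size stays at or below maxBytes.
    // A single line larger than maxBytes is emitted as its own chunk (it is not split).
    public static List<string> ChunkByUtf8Bytes(IEnumerable<string> lines, int maxBytes)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();
        var currentBytes = 0;

        foreach (var line in lines)
        {
            var lineBytes = Encoding.UTF8.GetByteCount(line);

            // +1 accounts for the space separator added between lines
            var needed = current.Length == 0 ? lineBytes : lineBytes + 1;

            // Start a new chunk when this line would push the current one over budget
            if (current.Length > 0 && currentBytes + needed > maxBytes)
            {
                chunks.Add(current.ToString());
                current.Clear();
                currentBytes = 0;
                needed = lineBytes;
            }

            if (current.Length > 0) current.Append(' ');
            current.Append(line);
            currentBytes += needed;
        }

        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }
}
```

Each chunk could then be passed to DetectEntitiesAsync in turn and the returned entity lists concatenated. Splitting on line boundaries keeps the code simple, at the cost of possibly missing an entity that spans two chunks.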

📋 Step 4: Display Extracted Text and NLP Insights

Views/Home/Result.cshtml

<h2>Extracted Text</h2>
<pre>@ViewBag.ExtractedText</pre>

<h2>Identified Entities</h2>
<ul>
@foreach (var entity in ViewBag.Entities)
{
    <li><strong>@entity.Type:</strong> @entity.Text (Score: @entity.Score)</li>
}
</ul>

<a href="/">Upload Another Document</a>

📊 Pro Tips for Production

✅ Use S3 for large file uploads

✅ Use StartDocumentTextDetection (with GetDocumentTextDetection to fetch the results) for async processing of multipage PDFs stored in S3

✅ Validate file type, size, and sanitize input

✅ Use Comprehend’s sentiment or key phrase detection for deeper analysis

✅ Add error handling and logging
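
For the validation tip, one option is to check the upload before it ever reaches Textract. The sketch below uses a hypothetical UploadValidator helper. The 10 MB cap and the list of formats (PNG, JPEG, PDF, TIFF) are assumptions based on the limits for synchronous Textract calls as I understand them, so confirm them against the current quotas:

```csharp
using System;
using System.IO;
using System.Linq;

public static class UploadValidator
{
    // Assumed cap for synchronous Textract calls; confirm against current quotas
    public const long MaxBytes = 10L * 1024 * 1024;

    private static readonly string[] AllowedExtensions =
        { ".png", ".jpg", ".jpeg", ".pdf", ".tif", ".tiff" };

    // Known file signatures (magic numbers) for the allowed formats
    private static readonly byte[][] Signatures =
    {
        new byte[] { 0x89, 0x50, 0x4E, 0x47 }, // PNG
        new byte[] { 0xFF, 0xD8, 0xFF },       // JPEG
        new byte[] { 0x25, 0x50, 0x44, 0x46 }, // PDF ("%PDF")
        new byte[] { 0x49, 0x49, 0x2A, 0x00 }, // TIFF, little-endian
        new byte[] { 0x4D, 0x4D, 0x00, 0x2A }, // TIFF, big-endian
    };

    // fileName and length come from the upload; header holds the first bytes of the file
    public static bool IsAcceptable(string fileName, long length, byte[] header)
    {
        if (length <= 0 || length > MaxBytes) return false;

        var extension = Path.GetExtension(fileName ?? string.Empty).ToLowerInvariant();
        if (!AllowedExtensions.Contains(extension)) return false;

        // The extension is client-supplied, so also check the leading bytes
        return header != null && Signatures.Any(sig =>
            header.Length >= sig.Length && header.Take(sig.Length).SequenceEqual(sig));
    }
}
```

In the Upload action this could be called with document.FileName, document.Length, and the first few bytes read from the uploaded stream, returning a 400 response when the check fails.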

📈 Future Enhancements

  • Extract tables and forms using AnalyzeDocumentAsync
  • Add support for sentiment and key phrase detection from Comprehend
  • Store extracted and analyzed data in DynamoDB or SQL Server
  • Add Cognito for user authentication
  • Visualize entity relationships using JS libraries
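
As a rough idea of what the sentiment and key phrase enhancement might look like, here is a sketch built on the same IAmazonComprehend client the controller already uses. ComprehendInsights and AnalyzeAsync are hypothetical names chosen for illustration. DetectSentimentAsync and DetectKeyPhrasesAsync are the SDK calls, and each has its own input size limit that should be checked in the Comprehend documentation:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Amazon.Comprehend;
using Amazon.Comprehend.Model;

public static class ComprehendInsights
{
    // Hypothetical helper: returns the overall sentiment label and the key phrases for one text
    public static async Task<(string Sentiment, string[] KeyPhrases)> AnalyzeAsync(
        IAmazonComprehend client, string text, string languageCode = "en")
    {
        var sentiment = await client.DetectSentimentAsync(new DetectSentimentRequest
        {
            Text = text,
            LanguageCode = languageCode
        });

        var phrases = await client.DetectKeyPhrasesAsync(new DetectKeyPhrasesRequest
        {
            Text = text,
            LanguageCode = languageCode
        });

        // KeyPhrases may be null when nothing is found, depending on the SDK version
        var keyPhrases = phrases.KeyPhrases?.Select(p => p.Text).ToArray()
                         ?? Array.Empty<string>();

        // Sentiment is a label such as POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        return (sentiment.Sentiment.Value, keyPhrases);
    }
}
```

The result could be placed in ViewBag alongside the entities and rendered in Result.cshtml in the same way.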

📚 Conclusion

AWS Textract + AWS Comprehend + .NET Core is a powerful stack to build smart, cloud-native document processing and analysis apps. Whether you're automating invoice entry or building intelligent insights, these services provide a flexible, scalable foundation.

📅 Stay Tuned

In the next post, we'll explore how to extract tables and form fields and enrich results using AWS Comprehend’s key phrases and sentiment analysis.

If you found this helpful, tap ❤️ and follow me for more .NET and AWS how-to guides.

Keywords: .NET Textract Tutorial, AWS Textract OCR, AWS Comprehend NLP, Build OCR App in .NET, Intelligent Document Processing .NET, AWS NLP with MVC
