DEV Community: ABBYY Developer

DevCon 2025 Workshop: Creating a Document Processing MCP Server

Matt Netkow — Fri, 08 Aug 2025 15:09:01 +0000

I recently led an engaging hands-on workshop at ABBYY DevCon 2025. The session, "Creating a Document Processing MCP Server," was crafted for developers eager to explore advanced workflow automation, focused on constructing a Model Context Protocol (MCP) server tailored for intelligent document processing (IDP) tasks. MCP is one of the hottest technologies in the AI ecosystem, so I was thrilled to give a presentation on its fundamentals. Here’s an overview of the workshop, its objectives, and the interactive exercises participants tackled.

👉 Want to try some hands-on MCP Server exercises? Access the workshop materials.

A deep dive into MCP servers

The workshop was centered on building an MCP server designed to streamline a typical bank account onboarding process. By integrating a range of cutting-edge tools and techniques, participants learned how to manage tasks including document uploads, data extraction, validation, and final submission.

Why MCP servers matter

Model Context Protocol, or MCP, is an open standard that bridges large language models (LLMs) with various data sources and tools, enabling developers to design dynamic workflows. The server built during this workshop demonstrated how MCP serves as both an engine and an orchestrator, bringing both determinism and scalability to complex IDP scenarios. Key features of the MCP server covered in the workshop included:

Document processing automation: Eliminate manual data entry through structured workflows.
Real-time insights and validation: Verify identity and residency documents on the fly.
Extensibility: Seamlessly integrate additional tools, prompts, and external resources as needed.

Hands-on learning experience

My main focus of the workshop was ensuring an interactive experience. I broke the material into manageable sections with practical coding exercises after each one. Participants were able to immediately apply their knowledge, receive feedback, and enhance their understanding incrementally. Here’s a high-level overview of what was covered:

1. MCP fundamentals

The session started with an introduction to MCP architecture and its benefits within document processing workflows. Attendees learned how MCP implements transports, clients, and servers to handle requests and orchestrate operations effectively.

2. Setting up and testing an MCP server

Participants began hands-on by setting up a basic MCP server, verifying its functionality using the MCP Inspector debugging tool. This phase focused on understanding the foundational structure and testing server responses to ensure accuracy.

3. Document processing with ABBYY Document AI API

The workshop progressed into integrating the ABBYY Document AI API for optical character recognition (OCR). This step highlighted the importance of leveraging robust, domain-specific OCR tools to avoid issues like hallucinated data, common with general-purpose LLMs. Attendees worked on converting utility bill uploads into structured data, demonstrating real-world applications of IDP.

4. Tool and resource management

Exercises included defining and implementing server tools to process and validate user data. Participants learned about incorporating a “human-in-the-loop” approach, ensuring secure and effective operations when handling sensitive data.

5. Extending MCP with Claude integration

Expanding the server's capabilities, participants explored integration with Claude Desktop for creating prompts and guiding workflows. This step allowed for experimenting with adaptive responses, making workflows more dynamic and user focused.

Final workflow: Bank account onboarding

At the end of the workshop, we had built a complete bank account opening workflow:

From Claude Desktop, the user enters a prompt:
"Help me open a bank account with ABBYY Bank."
The LLM, connecting to the MCP server, recognizes that the user needs to upload a utility bill.
The user uploads a utility bill and the MCP server extracts its data using the Document AI API.
If the info looks correct, the user confirms that the Server can submit the application to the bank.
The LLM responds:
"Excellent! Your bank account application has been successfully submitted to ABBYY Bank. They'll be in touch with next steps."

Explore the workshop at your own pace

While the live session provided a vibrant and collaborative atmosphere, anyone interested can take the workshop independently. To facilitate this, all code and step-by-step instructions are available in a dedicated GitHub repository:

👉 Access the workshop materials

The repository, available in Python or TypeScript, includes everything needed to build and test MCP-based workflows, from environment setup to live debugging tools and reference exercises. Developers can also experiment with the next steps suggested in the workshop, such as incorporating new validation types or extending the server for additional document workflows.

Share your journey

If you’re ready to learn MCP fundamentals and how to incorporate intelligent document processing, the "Creating a Document Processing MCP Server" workshop is an excellent place to start. Explore the repository, complete the exercises, and bring your knowledge into real-world projects.

We'd love to hear your feedback and see the innovations you build with MCP! Let us know when you've completed the workshop by tagging me @dotNetkow or @ABBYY on LinkedIn or X.

Choosing OCR Technology: Key Considerations for Software Developers

Matt Netkow — Mon, 24 Mar 2025 16:30:11 +0000

When it comes to choosing OCR (Optical Character Recognition) technology, developers have a lot to consider. Since OCR solutions have been around for decades, it’s tempting to think that they are standardized and thus, any of them will do. That couldn’t be farther from the truth: not all OCRs are created equally, so choosing the right one can still be a headache. From the type of models to AI offerings to pricing and community support, many factors play a crucial role in determining the best fit for your project. This article covers key points to keep in mind, including considerations for open source models, limitations of LLMs, and pricing.

Join the waitlist: new OCR API for AI developers coming soon

Open-source models: Cost effective, but less accurate

Open-source OCR models like Tesseract and PaddleOCR are popular choices among developers due to their accessibility and cost-effectiveness. However, they come with certain limitations:
Accuracy: Open-source models often have lower accuracy compared to commercial engines. They struggle with handwriting, rotated text, and low-quality images.
Support for complex documents: These models may not handle complex documents, tables, and charts effectively.
Continuous optimization: Enhancements to OSS models are at the whim of the community. Maintainers come and go, and their priorities often differ from your project’s needs. Proprietary companies maintain an edge through continuous optimization, leveraging years of practical experience and refined technologies.

Open-source OCR models may work for POCs or processing simple documents, but if high-quality, reliable accuracy is a must, they are a no-go.

Can LLMs replace OCR? Not so fast

LLMs like GPT-4.5 and other general-purpose AI models are increasingly being used for document processing. The ability to quickly test their OCR abilities by uploading a document through a web UI or chatbot is compelling. However, they also have their challenges:

Hallucinations: LLMs often omit significant portions of text, hallucinate content, and fail to output text coordinates.
Inconsistencies: They display inconsistent formatting and table extraction, making them less reliable for robust OCR tasks. Results themselves are inconsistent too, meaning you could process the same document ten times and get ten different results.
Speed and cost: LLM-based extraction can be slow and expensive due to high compute costs.

Due to the unpredictability of inaccuracies in large language models (LLMs), the automation of business processes is hindered. This puts significant burden on the developer to capture errors and code exceptions, feeling like a game of “LLM whack-a-mole.” Downstream, any issues missed would require users to resort to manual corrections. This defeats the purpose of introducing OCR solutions in the first place.

Pricing: Cheap may cost you more

Pricing is a critical factor when choosing an OCR solution, but it's not just about the cost.

Support and reliability: A significant benefit of paying for a solution, especially when business-critical processes depend on it, is ready access to the support, advisory, and SLAs are included.
Cost-effectiveness: Look for solutions that offer a low-cost, pay-as-you-go model, ensuring scalable solutions without unexpected expenses.
Free trials and freemium tiers: Many commercial OCR solutions offer free trials or freemium tiers, allowing developers to test capabilities before committing.
Capability comparisons: Many solutions, especially those from hyperscalers like Microsoft or AWS, appear cheap up front because they price their OCR capabilities a la carte. When compared to an all-inclusive pricing model, of course it’ll seem cheaper! Review all pricing pages carefully.

When assessing OCR solutions, seek those that provide adequate trial periods, sufficient document processing capacity, and a pay-as-you-go pricing model.

Developer support and community

A great product is not enough; comprehensive support and an active community are essential.

Documentation and SDKs: Ensure the OCR solution provides detailed documentation, SDKs, and sandbox environments to streamline integration and optimize solutions.
Community engagement: The OCR solution should have an active and friendly developer community to turn to if needed. The best encourage you to exchange ideas, get expert guidance, and enhance your OCR implementations.

The OCR world is more complex than it looks on the surface. It’s a solved problem, until you need real-world accuracy, reliability, and robust capabilities. To ensure project success, look for a strong company and community-backed solution.

Introducing ABBYY’s purpose-built Document OCR API for Developers (Coming soon)

Choosing the right OCR solution involves balancing the above factors to meet your specific needs. If your project is business critical, then ABBYY’s new Document AI platform warrants a look.

ABBYY’s upcoming Document AI API is a developer-friendly, purpose-built OCR service designed for seamless integration into AI-powered business process automation workflows. It efficiently converts unstructured business documents into structured JSON with exceptional accuracy and reliability, equipping your business solutions and application for success.

Join the Waitlist