Building a Document Reader AI Tool During My Internship at Agno

#ai #agents #development

During my internship at Agno, I worked on a suite of agentic AI tools for the company's lightweight library, 'agno agi'. One of the key projects I contributed to was a Document Reader capable of reading and processing PDF, DOCX, CSV, and PNG files.

What is Agno
Agno is a powerful framework for building high-performance, multi-modal AI agents. It’s model-agnostic, supports text, image, audio, and video inputs/outputs, and enables advanced reasoning, memory, and multi-agent architectures. With features like built-in search, structured outputs, FastAPI integration, and real-time monitoring, Agno helps you create sophisticated agentic systems quickly and efficiently.

Project Overview
The goal of the project was to create a tool that could automatically read different document formats and extract meaningful information. This has practical applications in educational and administrative workflows, such as:

Answer sheet evaluation
Marksheet processing
Question paper review

Technologies & Tools

Agno AGI – Core AI/agentic engine for document understanding
Google Generative AI – For reading and interpreting images (PNG)
Python – Programming language
PyPDF2 / pdfplumber – PDF extraction
python-docx – DOCX file processing
Pandas – CSV handling
OpenCV + Tesseract OCR – Image preprocessing for better recognition

Implementation Highlights

PDF & DOCX Reading: Extracted text and tables from PDF and DOCX documents.
CSV Handling: Fast processing of structured data for marksheets and reports.
PNG/Image Reading: OCR and image understanding using Google Generative AI, integrated with OpenCV and Tesseract for high accuracy.
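
To make the CSV step concrete, here is a minimal sketch of how a marksheet could be summarized with Pandas. The column names (`student`, `maths`, `science`, `english`), the sample rows, the 100-mark maximum, and the `summarize_marksheet` helper are all assumptions made for illustration; they are not taken from the project's actual data or code.

```python
import io

import pandas as pd

# Hypothetical marksheet; real files would be read from disk instead.
SAMPLE_CSV = """student,maths,science,english
Asha,78,85,91
Ravi,64,59,72
"""


def summarize_marksheet(csv_source, max_per_subject=100):
    """Add per-student total and percentage columns to a marksheet."""
    df = pd.read_csv(csv_source)
    # Every column except the (assumed) 'student' column is treated as a subject.
    subjects = [col for col in df.columns if col != "student"]
    df["total"] = df[subjects].sum(axis=1)
    max_total = max_per_subject * len(subjects)
    df["percentage"] = (df["total"] / max_total * 100).round(2)
    return df


summary = summarize_marksheet(io.StringIO(SAMPLE_CSV))
print(summary[["student", "total", "percentage"]])
```

`pd.read_csv` accepts a file path as well as a file-like object, so the same function would work on a marksheet saved to disk.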

Here’s a sample of the code for reading images and generating responses with the AI agent.

import os
import PIL.Image
import google.generativeai as genai
from textwrap import dedent
from agno.agent import Agent
from agno.models.google import Gemini

def _config_genai():
    # Read the key from the environment instead of hard-coding it.
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY env var.")
    genai.configure(api_key=api_key)
    return genai.GenerativeModel('gemini-2.0-flash')

def get_text_from_image(file_path):
    """Extract text/description from an image using Gemini."""
    try:
        model = _config_genai()
        img = PIL.Image.open(file_path)
        resp = model.generate_content(img)
        # The Gemini SDK returns a response object; fall back to str() if it has no .text.
        return {"text": getattr(resp, "text", str(resp))}
    except Exception as e:
        return {"error": str(e)}

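Smart_Advise_student = Agent(
    # Use the Gemini class imported above, with the same model id as _config_genai().
    model=Gemini(id="gemini-2.0-flash", api_key=os.getenv("GEMINI_API_KEY")),
    instructions=dedent("""
    You analyze messy text extracted from PDFs/DOCX/images. Parse marks/topics,
    summarize performance, give topic explanations, classify, and recommend next steps.
    """),
)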
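
The two pieces above can be connected by turning the extraction result into a prompt for the agent. The sketch below is one possible way to do that; `build_agent_prompt` and the file name `marksheet.png` are hypothetical and not part of the original project. It relies only on the `{"text": ...}` / `{"error": ...}` dictionary shape returned by `get_text_from_image`.

```python
def build_agent_prompt(extraction, source_name):
    """Turn an extraction result dict into a prompt string for the agent.

    `extraction` is expected to look like {"text": "..."} on success
    or {"error": "..."} on failure.
    """
    if "error" in extraction:
        raise ValueError(f"Could not read {source_name}: {extraction['error']}")
    text = (extraction.get("text") or "").strip()
    if not text:
        raise ValueError(f"No text extracted from {source_name}")
    return f"Source file: {source_name}\n\nExtracted text:\n{text}"


# Example with a hand-written result, so no API call is needed:
prompt = build_agent_prompt({"text": "  Maths: 78/100 "}, "marksheet.png")
print(prompt)

# With a valid API key, the real flow would look roughly like this:
# result = get_text_from_image("marksheet.png")
# Smart_Advise_student.print_response(build_agent_prompt(result, "marksheet.png"))
```

Raising on an empty or failed extraction keeps the agent from being asked to analyze an error message as if it were document text.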

Challenges & Learnings

Multi-format processing: Ensuring all file types (PDF, DOCX, CSV, PNG) could be processed in one workflow.
Image recognition accuracy: Google Generative AI improved reading of low-quality scans and handwritten content.
Agentic AI integration: Learned to leverage Agno AGI and generative AI together for practical document automation.
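
One way to handle several formats in a single workflow is to route each file to a reader based on its extension. The following is a minimal sketch under stated assumptions, not the project's actual implementation: the `READERS` table and the function names are hypothetical, the CSV reader uses the standard-library `csv` module in place of Pandas to keep the example dependency-light, and `pdfplumber` and `python-docx` are imported lazily so they are only required when a PDF or DOCX file is actually read.

```python
import csv
from pathlib import Path


def read_csv(path):
    """Read a CSV file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return {"rows": list(csv.DictReader(f))}


def read_pdf(path):
    """Extract text from every page of a PDF."""
    import pdfplumber  # third-party; imported lazily

    with pdfplumber.open(path) as pdf:
        pages = [page.extract_text() or "" for page in pdf.pages]
    return {"text": "\n".join(pages)}


def read_docx(path):
    """Extract paragraph text from a DOCX file."""
    import docx  # provided by the python-docx package; imported lazily

    document = docx.Document(str(path))
    return {"text": "\n".join(p.text for p in document.paragraphs)}


# Hypothetical routing table. An image reader such as get_text_from_image
# could be registered under ".png" in the same way.
READERS = {".csv": read_csv, ".pdf": read_pdf, ".docx": read_docx}


def read_document(path):
    """Dispatch a file to the matching reader based on its extension."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        return {"error": f"Unsupported file type: {suffix or '(none)'}"}
    try:
        return reader(path)
    except Exception as exc:
        # Same convention as get_text_from_image: report errors, don't raise.
        return {"error": str(exc)}
```

Keeping the readers in a dictionary means a new format can be supported by adding one entry, without touching `read_document` itself.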

This project enhanced my skills in AI, OCR, document processing, and agentic tool development, providing hands-on experience with cutting-edge AI technologies.
