During my internship, I had the opportunity to work on a suite of agentic AI tools for the lightweight agent library Agno (Agno AGI). One of the key projects I contributed to was a Document Reader that reads and processes PDF, DOCX, CSV, and PNG files.
What is Agno
Agno is a framework for building high-performance, multimodal AI agents. It is model-agnostic, accepts and produces text, image, audio, and video, and supports reasoning, memory, and multi-agent architectures. Features such as built-in search, structured outputs, FastAPI integration, and real-time monitoring help you build sophisticated agentic systems quickly.
Project Overview
The goal of the project was to create a tool that could automatically read different document formats and extract meaningful information. This has practical applications in educational and administrative workflows, such as:
- Answer sheet evaluation
- Marksheet processing
- Question paper review
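Marksheet processing is the most structured of these workflows, since the input is already tabular. As a minimal sketch, assuming a marksheet CSV with hypothetical `student`, `subject`, and `marks` columns (the project's real column layout is not described here), pandas can compute per-student totals and flag weak subjects before anything is handed to the agent:

```python
import io

import pandas as pd


def summarize_marksheet(csv_source, pass_mark=40):
    """Return per-student totals/averages and the rows below a pass mark.

    Assumes hypothetical columns: student, subject, marks.
    """
    df = pd.read_csv(csv_source)
    # Coerce marks to numbers; unreadable cells (e.g. "abs") become NaN and are dropped.
    df["marks"] = pd.to_numeric(df["marks"], errors="coerce")
    df = df.dropna(subset=["marks"])

    # Named aggregation gives readable column names in the summary.
    summary = (
        df.groupby("student")["marks"]
        .agg(total="sum", average="mean")
        .reset_index()
    )
    weak = df[df["marks"] < pass_mark][["student", "subject", "marks"]]
    return summary, weak


# Illustrative data only.
sample = io.StringIO(
    "student,subject,marks\n"
    "Asha,Math,78\n"
    "Asha,Physics,35\n"
    "Ravi,Math,52\n"
    "Ravi,Physics,abs\n"
)
summary, weak = summarize_marksheet(sample)
print(summary)
print(weak)
```

The numeric summary and the list of weak subjects are compact enough to pass to an LLM agent as context, which is cheaper and more reliable than asking the model to do the arithmetic itself.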
Technologies & Tools
- Agno AGI – Core AI/agentic engine for document understanding
- Google Generative AI – Gemini models for reading and interpreting images (PNG)
- Python – Programming language
- PyPDF2 / pdfplumber – PDF extraction
- python-docx – DOCX file processing
- Pandas – CSV handling
- OpenCV + Tesseract OCR – Image preprocessing (OpenCV) and text recognition (Tesseract)
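The OpenCV + Tesseract step usually means cleaning up a scan before running OCR. Below is a minimal sketch, assuming `opencv-python` and `pytesseract` are installed and the Tesseract binary is on the PATH. The grayscale, median blur, and Otsu threshold pipeline is a common default and not necessarily the exact preprocessing used in the project; `preprocess_and_ocr` and `clean_ocr_text` are hypothetical helper names:

```python
import re


def preprocess_and_ocr(image_path):
    """Binarize a scanned page with OpenCV, then run Tesseract on it."""
    # Imported here so the text-cleaning helper below works without OpenCV installed.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # A small median blur removes salt-and-pepper noise from scans.
    gray = cv2.medianBlur(gray, 3)
    # Otsu's method picks the threshold automatically from the histogram.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return clean_ocr_text(pytesseract.image_to_string(binary))


def clean_ocr_text(raw):
    """Collapse runs of spaces/tabs and drop blank lines from OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Binarizing first gives Tesseract high-contrast input, which tends to help with faint or unevenly lit scans; the cleaned text can then go to Gemini or the agent.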
Implementation Highlights
- PDF & DOCX Reading: Extracted text and tables from PDF and DOCX documents.
- CSV Handling: Processed structured marksheet and report data with Pandas.
- PNG/Image Reading: Combined OpenCV preprocessing and Tesseract OCR with Google Generative AI image understanding to improve recognition accuracy on scanned pages.
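A minimal sketch of the PDF and DOCX readers, assuming `pdfplumber` and `python-docx` (the project also lists PyPDF2 as an option for PDFs). Tables are flattened into pipe-separated lines so they survive as plain text for the agent. That formatting choice and the function names `table_to_lines`, `read_pdf`, and `read_docx` are illustrative, not the project's actual code:

```python
def table_to_lines(table):
    """Flatten a table (list of rows of cells, cells may be None) into text lines."""
    return [" | ".join((cell or "").strip() for cell in row) for row in table]


def read_pdf(path):
    """Extract page text and tables from a PDF with pdfplumber."""
    import pdfplumber  # third-party; imported lazily so table_to_lines has no dependency

    parts = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Fall back to "" for pages without a text layer.
            parts.append(page.extract_text() or "")
            # extract_tables() returns a list of tables; empty cells come back as None.
            for table in page.extract_tables():
                parts.extend(table_to_lines(table))
    return "\n".join(parts)


def read_docx(path):
    """Extract paragraph text and tables from a DOCX file with python-docx."""
    from docx import Document  # python-docx is imported as `docx`

    doc = Document(path)
    parts = [p.text for p in doc.paragraphs if p.text.strip()]
    for table in doc.tables:
        rows = [[cell.text for cell in row.cells] for row in table.rows]
        parts.extend(table_to_lines(rows))
    return "\n".join(parts)
```

Both readers return a single string, so downstream code does not need to care which format the text came from.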
Here’s sample code demonstrating image reading and response generation with the AI agent.
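```python
import os
from textwrap import dedent

import PIL.Image
import google.generativeai as genai
from agno.agent import Agent
from agno.models.google import Gemini


def _config_genai():
    """Configure the Gemini SDK and return a multimodal model."""
    api_key = os.getenv("GEMINI_API_KEY")  # read the key from the environment
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY env var.")
    genai.configure(api_key=api_key)
    return genai.GenerativeModel("gemini-2.0-flash")


def get_text_from_image(file_path):
    """Extract text/description from an image using Gemini."""
    try:
        model = _config_genai()
        img = PIL.Image.open(file_path)
        # Send an instruction together with the image so the model transcribes it.
        resp = model.generate_content(["Extract all text from this image.", img])
        # The Gemini SDK returns a response object; .text holds the generated text.
        return {"text": getattr(resp, "text", str(resp))}
    except Exception as e:
        return {"error": str(e)}


# Agno agent that turns raw extracted text into feedback for the student.
smart_advise_student = Agent(
    model=Gemini(id="gemini-2.0-flash", api_key=os.getenv("GEMINI_API_KEY")),
    instructions=dedent("""
        You analyze messy text extracted from PDFs/DOCX/images. Parse marks/topics,
        summarize performance, give topic explanations, classify, and recommend next steps.
    """),
)

# Example: read an answer sheet image, then pass the text to the agent.
result = get_text_from_image("answer_sheet.png")  # hypothetical file name
if "text" in result:
    smart_advise_student.print_response(result["text"])
else:
    print(result["error"])
```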
Challenges & Learnings
- Multi-format processing: Ensuring all file types (PDF, DOCX, CSV, PNG) could be processed in one workflow.
- Image recognition accuracy: Google Generative AI improved reading of low-quality scans and handwritten content.
- Agentic AI integration: Learned to leverage Agno AGI and generative AI together for practical document automation.
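Keeping every format in one workflow mostly comes down to routing each file to the right reader. A minimal sketch of one way to do that is a registry keyed by file extension. The names `register`, `READERS`, and `read_document` are hypothetical, and only a stdlib CSV reader is registered here so the example runs on its own; PDF, DOCX, and image readers would be registered the same way:

```python
import csv
from pathlib import Path

# Maps a lowercase extension such as ".csv" to a reader function.
READERS = {}


def register(*extensions):
    """Decorator that maps one or more file extensions to a reader function."""
    def decorator(func):
        for ext in extensions:
            READERS[ext.lower()] = func
        return func
    return decorator


@register(".csv")
def read_csv_text(path):
    """Render a CSV file as pipe-separated lines of text."""
    with open(path, newline="", encoding="utf-8") as f:
        return "\n".join(" | ".join(row) for row in csv.reader(f))


def read_document(path):
    """Dispatch a file to the reader registered for its extension."""
    ext = Path(path).suffix.lower()
    reader = READERS.get(ext)
    if reader is None:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    return reader(path)
```

With readers registered for `.pdf`, `.docx`, and `.png`, a single `read_document(path)` call can feed any supported file to the agent, and adding a format means adding one decorated function.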
This project enhanced my skills in AI, OCR, document processing, and agentic tool development, providing hands-on experience with cutting-edge AI technologies.