<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hriday Patel</title>
    <description>The latest articles on DEV Community by Hriday Patel (@hriday_patel_fc9018170287).</description>
    <link>https://dev.to/hriday_patel_fc9018170287</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3465057%2Feb56534b-8a83-4791-8324-3febbb67bacf.png</url>
      <title>DEV Community: Hriday Patel</title>
      <link>https://dev.to/hriday_patel_fc9018170287</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hriday_patel_fc9018170287"/>
    <language>en</language>
    <item>
      <title>Building a Document Reader AI Tool During My Internship at Agno</title>
      <dc:creator>Hriday Patel</dc:creator>
      <pubDate>Thu, 28 Aug 2025 12:36:40 +0000</pubDate>
      <link>https://dev.to/hriday_patel_fc9018170287/building-a-document-reader-ai-tool-during-my-internship-at-agno-25g0</link>
      <guid>https://dev.to/hriday_patel_fc9018170287/building-a-document-reader-ai-tool-during-my-internship-at-agno-25g0</guid>
      <description>&lt;p&gt;During my internship at Agno, I worked on a suite of agentic AI tools for Agno's lightweight library, 'agno agi'. One of the key projects I contributed to was a Document Reader that reads and processes PDF, DOCX, CSV, and PNG files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Agno&lt;/strong&gt;&lt;br&gt;
Agno is a powerful framework for building high-performance, multi-modal AI agents. It’s model-agnostic, supports text, image, audio, and video inputs/outputs, and enables advanced reasoning, memory, and multi-agent architectures. With features like built-in search, structured outputs, FastAPI integration, and real-time monitoring, Agno helps you create sophisticated agentic systems quickly and efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Overview&lt;/strong&gt;&lt;br&gt;
The goal of the project was to create a tool that could automatically read different document formats and extract meaningful information. This has practical applications in educational and administrative workflows, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer sheet evaluation&lt;/li&gt;
&lt;li&gt;Marksheet processing&lt;/li&gt;
&lt;li&gt;Question paper review
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyq3882roxtiv06qakcfz.png" alt=" " width="800" height="369"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technologies &amp;amp; Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agno AGI – Core AI/agentic engine for document understanding&lt;/li&gt;
&lt;li&gt;Google Generative AI – For reading and interpreting images (PNG)&lt;/li&gt;
&lt;li&gt;Python – Programming language&lt;/li&gt;
&lt;li&gt;PyPDF2 / pdfplumber – PDF extraction&lt;/li&gt;
&lt;li&gt;python-docx – DOCX file processing&lt;/li&gt;
&lt;li&gt;Pandas – CSV handling&lt;/li&gt;
&lt;li&gt;OpenCV + Tesseract OCR – Image preprocessing for better recognition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation Highlights&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF &amp;amp; DOCX Reading: Extracted text and tables from PDF and DOCX documents.&lt;/li&gt;
&lt;li&gt;CSV Handling: Fast processing of structured data for marksheets and reports.&lt;/li&gt;
&lt;li&gt;PNG/Image Reading: OCR and image understanding using Google Generative AI, combined with OpenCV preprocessing and Tesseract OCR to improve recognition accuracy.&lt;/li&gt;
&lt;/ul&gt;
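
&lt;p&gt;To make the CSV step concrete, here is a minimal sketch of marksheet processing. It is not the code from the project: it uses Python's standard-library csv module instead of Pandas to stay dependency-free, and the column layout (a 'student' column followed by one numeric column per subject) and the function name summarize_marksheet are assumptions made for illustration.&lt;/p&gt;

```python
import csv
import io
from statistics import mean


def summarize_marksheet(csv_text):
    """Return each student's average mark from marksheet CSV text.

    Assumes a hypothetical layout: a 'student' column followed by
    one numeric column per subject.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    averages = {}
    for row in reader:
        name = row.pop("student")
        # Skip blank cells so a missing mark does not break the average.
        marks = [float(value) for value in row.values() if value and value.strip()]
        averages[name] = round(mean(marks), 2) if marks else None
    return averages


# Illustrative data only; Ravi has no physics mark recorded.
sample = "student,math,physics\nAsha,80,90\nRavi,65,\n"
print(summarize_marksheet(sample))
```

&lt;p&gt;A student with no marks at all gets None rather than an error, so the agent can flag the gap instead of failing on it.&lt;/p&gt;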

&lt;p&gt;Here’s sample code that demonstrates reading an image with Gemini and setting up the AI agent that generates responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import PIL.Image
import google.generativeai as genai
from textwrap import dedent
from agno.agent import Agent
from agno.models.google import Gemini

def _config_genai():
    api_key = os.getenv("GEMINI_API_KEY")  # read the key from the environment
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY env var.")
    genai.configure(api_key=api_key)
    return genai.GenerativeModel('gemini-2.0-flash')

def get_text_from_image(file_path):
    """Extract text/description from an image using Gemini."""
    try:
        model = _config_genai()
        img = PIL.Image.open(file_path)
        resp = model.generate_content(img)
        # The Gemini SDK returns a response object; use its text when available.
        return {"text": getattr(resp, "text", str(resp))}
    except Exception as e:
        return {"error": str(e)}

Smart_Advise_student = Agent(
    # Use the Gemini model class imported above (the original called an undefined helper).
    model=Gemini(id="gemini-2.0-flash"),
    instructions=dedent("""
    You analyze messy text extracted from PDFs/DOCX/Images. Parse marks/topics,
    summarize performance, give topic explanations, classify, and recommend next steps.
    """),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Challenges &amp;amp; Learnings&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-format processing: Ensuring all file types (PDF, DOCX, CSV, PNG) could be processed in one workflow.&lt;/li&gt;
&lt;li&gt;Image recognition accuracy: Google Generative AI improved reading of low-quality scans and handwritten content.&lt;/li&gt;
&lt;li&gt;Agentic AI integration: Learned to leverage Agno AGI and generative AI together for practical document automation.&lt;/li&gt;
&lt;/ul&gt;
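
&lt;p&gt;One way to handle several file types in a single workflow is to route each file to a reader based on its extension. The sketch below illustrates that idea and is not the project's actual code: make_stub_reader, READERS, and read_document are hypothetical names, and the stub readers only stand in for the real PDF, DOCX, CSV, and image extraction functions.&lt;/p&gt;

```python
from pathlib import Path


def make_stub_reader(kind):
    """Build a hypothetical stand-in reader that only reports what it was given."""
    def reader(path):
        # A real reader would extract text or tables from the file here.
        return {"kind": kind, "source": str(path)}
    return reader


# Map each supported extension to its reader function.
READERS = {
    ".pdf": make_stub_reader("pdf"),
    ".docx": make_stub_reader("docx"),
    ".csv": make_stub_reader("csv"),
    ".png": make_stub_reader("image"),
}


def read_document(file_path):
    """Pick a reader based on the file extension (case-insensitive)."""
    path = Path(file_path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or 'none'}")
    return reader(path)


print(read_document("term1.CSV"))
```

&lt;p&gt;Keeping the mapping in one dictionary means a new format only needs one extra entry, and an unsupported file fails early with a clear error instead of reaching the agent as empty text.&lt;/p&gt;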

&lt;p&gt;This project enhanced my skills in AI, OCR, document processing, and agentic tool development, providing hands-on experience with cutting-edge AI technologies.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>development</category>
    </item>
  </channel>
</rss>
