<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jad Tounsi</title>
    <description>The latest articles on DEV Community by Jad Tounsi (@jadouse5).</description>
    <link>https://dev.to/jadouse5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1925770%2F0e9ee3bd-2ca7-4240-ae35-da796764afc7.png</url>
      <title>DEV Community: Jad Tounsi</title>
      <link>https://dev.to/jadouse5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jadouse5"/>
    <language>en</language>
    <item>
      <title>📄 OCR Reader, 🔍 Analyzer, and 💬 Chat Assistant using 🔎 Zerox, 🧠 GPT-4o, powered by 🚀 AI/ML API</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Tue, 22 Oct 2024 00:39:51 +0000</pubDate>
      <link>https://dev.to/jadouse5/ocr-reader-analyzer-and-chat-assistant-using-zerox-gpt-4o-powered-by-aiml-api-4g41</link>
      <guid>https://dev.to/jadouse5/ocr-reader-analyzer-and-chat-assistant-using-zerox-gpt-4o-powered-by-aiml-api-4g41</guid>
      <description>&lt;p&gt;&lt;strong&gt;What I Built&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built an &lt;strong&gt;OCR Document Reader&lt;/strong&gt; that allows users to upload and extract text from various document types such as PDFs, Word documents, and images. The app utilizes the &lt;a href="https://github.com/getomni-ai/zerox" rel="noopener noreferrer"&gt;&lt;strong&gt;Zerox&lt;/strong&gt;&lt;/a&gt; library for Optical Character Recognition (OCR) and integrates the &lt;strong&gt;&lt;a href="https://aimlapi.com/?via=jad" rel="noopener noreferrer"&gt;AI/ML API&lt;/a&gt;'s GPT-4o&lt;/strong&gt; model for advanced text analysis. With features like &lt;strong&gt;support for multiple document formats&lt;/strong&gt;, &lt;strong&gt;text analysis&lt;/strong&gt;, and an interactive interface built with &lt;strong&gt;Gradio 5.0&lt;/strong&gt;, this app simplifies extracting and analyzing text from complex documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Processing Time&lt;/strong&gt;: Enabling the &lt;code&gt;maintain_format&lt;/code&gt; option can slow down processing due to sequential requests needed to preserve formatting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Constraints&lt;/strong&gt;: The app's capabilities depend on the limitations of the AI/ML API plan, such as request quotas and document size restrictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Dependencies&lt;/strong&gt;: Requires installation of system packages like &lt;code&gt;poppler-utils&lt;/code&gt;, which may not be straightforward on all platforms.&lt;/li&gt;
&lt;/ul&gt;
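Conceptually, `maintain_format` slows processing because each page's request needs the previous page's output as context, so pages cannot be OCR'd in parallel. A minimal sketch of that control flow, with `process_page` as a hypothetical stand-in for the real per-page OCR call:

```python
# Sketch: why maintain_format forces sequential requests.
# process_page is a hypothetical stand-in for the real per-page OCR call;
# it receives whatever formatting context has been produced so far.

def process_page(page_text, prior_format):
    # Pretend the model returns the page prefixed with inherited formatting.
    return f"{prior_format}{page_text}\n"

def ocr_document(pages, maintain_format=False):
    results = []
    prior = ""
    for page in pages:
        out = process_page(page, prior if maintain_format else "")
        results.append(out)
        if maintain_format:
            # Each call must wait for the previous one to finish,
            # so pages cannot be dispatched concurrently.
            prior = "| "  # formatting carried forward (illustrative)
    return "".join(results)
```

With `maintain_format` off, every `process_page` call is independent and could run concurrently; with it on, the loop is inherently serial.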

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here are some key features of the app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtp52gnbiwpmq96q2yo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtp52gnbiwpmq96q2yo4.png" alt="Image description" width="800" height="855"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload Documents&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Users can upload PDFs, Word documents, or images for OCR processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extracted Text Display&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
The extracted text is displayed within the app, with options to copy or download it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintain Formatting&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Optionally preserve the original document's formatting in the extracted text.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Code
&lt;/h2&gt;

&lt;p&gt;Find the source code for the project on &lt;a href="https://github.com/jadouse5/ocr-gradio-aimlapi" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: Core programming language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradio 5.0&lt;/strong&gt;: For building the user-friendly interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zerox&lt;/strong&gt;: Library used for OCR processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML API&lt;/strong&gt;: Provides the GPT-4o model for text analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM&lt;/strong&gt;: Used under the hood for model interactions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  More Details
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zerox Library&lt;/strong&gt;: Transforms uploaded documents into images and performs OCR to extract text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML API's GPT-4o&lt;/strong&gt;: Analyzes the extracted text, enabling advanced features like summarization or content analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradio Interface&lt;/strong&gt;: Offers an intuitive web-based UI for users to interact with the app seamlessly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Batch Processing&lt;/strong&gt;: Enable users to upload and process multiple documents at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Formatting Preservation&lt;/strong&gt;: Improve the ability to retain complex layouts, tables, and graphics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Accounts&lt;/strong&gt;: Implement authentication to allow users to save and manage their processed documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Integration&lt;/strong&gt;: Add options to upload documents from and save results to cloud storage services.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Running the Repository
&lt;/h2&gt;

&lt;p&gt;To run this project locally, follow these steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the repository&lt;/span&gt;
git clone https://github.com/jadouse5/ocr-gradio-aimlapi.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ocr-document-reader

&lt;span class="c"&gt;# 2. Install Python dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# 3. Install system dependencies&lt;/span&gt;
&lt;span class="c"&gt;# On Ubuntu/Linux&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; poppler-utils

&lt;span class="c"&gt;# On macOS (using Homebrew)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;poppler

&lt;span class="c"&gt;# 4. Set up environment variables&lt;/span&gt;
&lt;span class="c"&gt;# Create a .env file in the root directory and add:&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_api_key
&lt;span class="nv"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.aimlapi.com/v1  &lt;span class="c"&gt;# Adjust if necessary&lt;/span&gt;

&lt;span class="c"&gt;# 5. Run the application&lt;/span&gt;
python ocr_app.py

&lt;span class="c"&gt;# 6. Open your browser and navigate to&lt;/span&gt;
http://localhost:7860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Replace &lt;code&gt;your_api_key&lt;/code&gt; with your actual API key for the AI/ML API.&lt;/p&gt;
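A minimal sketch of how the app might read these variables at startup (the variable names come from the `.env` snippet above; the `load_config` helper itself is hypothetical):

```python
import os

def load_config():
    """Read API settings from the environment, failing fast if the key is missing."""
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    # Fall back to the AI/ML API endpoint when no base URL is provided.
    base_url = os.getenv("OPENAI_API_BASE", "https://api.aimlapi.com/v1")
    return {"api_key": api_key, "base_url": base_url}
```

Failing fast here gives a clearer error than letting an API call fail later with an authentication message.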

&lt;h2&gt;
  
  
  Hashtags
&lt;/h2&gt;

&lt;p&gt;#OCR #AI #Gradio #Python #GPT4o #Zerox #TextAnalysis #MachineLearning&lt;/p&gt;





</description>
      <category>ocr</category>
      <category>aimlapi</category>
      <category>openai</category>
      <category>gradio</category>
    </item>
    <item>
      <title>Multi-Agent System for 🚀 ANY AI/ML Model: 🌐 Web Scraping &amp; 📝 Content Analysis Powered by the 🔗 AI/ML API</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Sun, 20 Oct 2024 21:13:23 +0000</pubDate>
      <link>https://dev.to/jadouse5/multi-agent-system-for-any-aiml-model-web-scraping-content-analysis-powered-by-the-aiml-api-5ek8</link>
      <guid>https://dev.to/jadouse5/multi-agent-system-for-any-aiml-model-web-scraping-content-analysis-powered-by-the-aiml-api-5ek8</guid>
      <description>&lt;h1&gt;
  
  
  🐝 Multi-Agent System for 🚀 ANY AI/ML Model: 🌐 Web Scraping &amp;amp; 📝 Content Analysis Powered by the 🔗 AI/ML API
&lt;/h1&gt;

&lt;p&gt;This project demonstrates a multi-agent system that automates web scraping, content analysis, and summary generation using the AI/ML API. It is built using &lt;strong&gt;Streamlit&lt;/strong&gt; for the user interface, &lt;strong&gt;BeautifulSoup&lt;/strong&gt; for web scraping, and the &lt;strong&gt;AI/ML API&lt;/strong&gt; for text generation and analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkyky6mdfqj3datun6v70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkyky6mdfqj3datun6v70.png" alt="Image description" width="800" height="888"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The app enables you to dynamically change the model and modify any agent in the workflow to suit different use cases. Simply provide your AI/ML API key, and you can use any model supported by the AI/ML API.&lt;/p&gt;

&lt;h1&gt;
  
  
  Get Your AI/ML API Key
&lt;/h1&gt;

&lt;p&gt;You can obtain your AI/ML API key by visiting the following link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aimlapi.com/?via=jad" rel="noopener noreferrer"&gt;AI/ML API&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Web Scraping&lt;/strong&gt;: Scrapes the content of a given website URL using BeautifulSoup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Analysis&lt;/strong&gt;: Analyzes the scraped content to extract key insights using the AI/ML API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary Generation&lt;/strong&gt;: Generates a detailed summary of the analyzed content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit UI&lt;/strong&gt;: Interactive user interface that allows users to enter the website URL and view the generated report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible AI Models&lt;/strong&gt;: Supports any model from the AI/ML API. You can change the model used for content analysis and summary generation dynamically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Customization&lt;/strong&gt;: Modify the behavior of each agent (scraping, analyzing, summarizing) by changing the instructions, functions, or models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AI/ML API Key Input&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The app dynamically sets the API key using an input field. The key is stored in the environment and used for making API calls to the AI/ML API.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Web Scraping&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The app scrapes the provided website URL using BeautifulSoup and extracts the text content from the website's HTML.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Content Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The scraped content is analyzed by the AI/ML API using a chat completion model to extract key insights.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Summary Generation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A detailed summary is generated using the AI/ML API based on the content analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Download Report&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The final summary can be downloaded as a text file directly from the Streamlit interface.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
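The project performs step 2 with BeautifulSoup; the core idea of that step, stripping markup and keeping only visible text, can be sketched with the standard library's `html.parser` alone:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

BeautifulSoup handles malformed markup and encoding detection far more robustly, which is why the project uses it; this sketch only illustrates the extraction concept.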

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;Streamlit for the interactive web interface.&lt;/li&gt;
&lt;li&gt;BeautifulSoup for web scraping.&lt;/li&gt;
&lt;li&gt;Requests for handling HTTP requests.&lt;/li&gt;
&lt;li&gt;AI/ML API Key for making API calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clone the Repository:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jadouse5/aimlapi-webscraper-agents.git
&lt;span class="nb"&gt;cd &lt;/span&gt;aimlapi-webscraper-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Up a Virtual Environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv myenv
&lt;span class="nb"&gt;source &lt;/span&gt;myenv/bin/activate  &lt;span class="c"&gt;# On macOS/Linux&lt;/span&gt;
myenv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate  &lt;span class="c"&gt;# On Windows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Required Packages:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Up API Keys&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Create a &lt;code&gt;.env&lt;/code&gt; file in the project root and add your AI/ML API key:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AIMLAPI_API_KEY=your-api-key-here"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run the Application:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open the Web Interface&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Once the application is running, it will open in your default browser. If not, go to &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt; manually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set Your AI/ML API Key&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Input your AI/ML API Key in the text box to authenticate and allow the app to access the API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Input Website URL&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Enter the URL of the website you want to scrape in the provided input box.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run Workflow&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Click the "Run Workflow" button to start scraping the website, analyzing its content, and generating a summary report.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modify Models or Agents&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
You can modify the AI models used in each agent by adjusting the code, allowing you to experiment with different models for scraping, analysis, or summarizing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Download Report&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Once the workflow completes, you can download the generated report by clicking the "Download Report" button.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Components
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Web Scraping&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Scrapes the text content from the provided website URL using BeautifulSoup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content Analysis&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
The scraped content is analyzed using the AI/ML API, extracting key insights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summary Generation&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
A detailed summary is generated based on the analysis using another AI model call.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Code Example
&lt;/h2&gt;

&lt;p&gt;Here’s an example of how the system orchestrates the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;orchestrate_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Scrape the website
&lt;/span&gt;    &lt;span class="n"&gt;scraped_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scrape_website&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Analyze the scraped content
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an agent that analyzes content and extracts key insights.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the following content: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scraped_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini-2024-07-18&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;analysis_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Write the summary based on the analysis
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an agent that writes summaries of research.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a summary based on this analysis: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;analysis_summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini-2024-07-18&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_summary&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Customization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using Different Models
&lt;/h3&gt;

&lt;p&gt;You can change the models used in the agents by modifying the &lt;code&gt;model&lt;/code&gt; parameter in the &lt;code&gt;orchestrate_workflow&lt;/code&gt; function. The AI/ML API supports multiple models, allowing you to experiment with different models for each task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scraping Agent&lt;/strong&gt;: Modify the scraping agent to handle different types of content or preprocess the data differently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis Agent&lt;/strong&gt;: Choose a model that best suits your analysis needs, such as summarization or topic extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary Agent&lt;/strong&gt;: Use a model that generates detailed, concise, or creative summaries depending on your goal.&lt;/li&gt;
&lt;/ul&gt;
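One lightweight way to make the model swappable per agent is a small configuration mapping consulted before each API call. The agent names and default model IDs below are illustrative assumptions, not part of the project's actual code:

```python
# Hypothetical per-agent model configuration; the agent names and
# default model IDs are illustrative assumptions.
DEFAULT_MODELS = {
    "analysis": "gpt-4o-mini-2024-07-18",
    "summary": "gpt-4o-mini-2024-07-18",
}

def model_for(agent, overrides=None):
    """Resolve the model for an agent, preferring any user-supplied override."""
    overrides = overrides or {}
    return overrides.get(agent, DEFAULT_MODELS[agent])
```

Passing `model=model_for("analysis", user_overrides)` into each `client.chat.completions.create` call would then let users swap models without touching the workflow logic.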

&lt;h3&gt;
  
  
  Modify Agents
&lt;/h3&gt;

&lt;p&gt;Each agent is highly customizable. Adjust the instructions or add new functions for more advanced workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Scraping&lt;/strong&gt;: Improve the scraper to handle dynamic content (e.g., JavaScript-heavy sites).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More Detailed Analysis&lt;/strong&gt;: Expand the analysis to include sentiment analysis or categorization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual Support&lt;/strong&gt;: Extend the app to support scraping, analyzing, and summarizing content in multiple languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CAPTCHA Handling&lt;/strong&gt;: Add support for bypassing or manually entering CAPTCHAs when scraping protected websites.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;This project is licensed under the MIT License. See the LICENSE file in the &lt;a href="https://github.com/jadouse5/aimlapi-webscraper-agents" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; for details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contact
&lt;/h2&gt;

&lt;p&gt;Developed by: &lt;strong&gt;Jad Tounsi El Azzoiani&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/jadouse5" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;&lt;br&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://linkedin.com/in/jad-tounsi-el-azzoiani" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>aimlapi</category>
      <category>openai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>🦙💬 Building a Next.js Chatbot with NVIDIA Llama 3.1 Nemotron-70B Integration</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Thu, 17 Oct 2024 01:50:06 +0000</pubDate>
      <link>https://dev.to/jadouse5/building-a-nextjs-chatbot-with-nvidia-llama-31-nemotron-70b-integration-9h2</link>
      <guid>https://dev.to/jadouse5/building-a-nextjs-chatbot-with-nvidia-llama-31-nemotron-70b-integration-9h2</guid>
      <description>&lt;p&gt;This project implements an AI chatbot using &lt;strong&gt;Next.js&lt;/strong&gt;, &lt;strong&gt;React&lt;/strong&gt;, and integrates with the &lt;strong&gt;NVIDIA Llama 3.1 Nemotron-70B&lt;/strong&gt; model for generating AI-powered responses. The frontend is built using Tailwind CSS, and the chatbot includes a real-time chat interface and supports customization for different applications.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g0xd7xyknbz2m9pbt0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g0xd7xyknbz2m9pbt0a.png" alt="Image description" width="800" height="685"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;🦙💬 Llama 3.1 Nemotron 70B Chatbot&lt;br&gt;
🧠 AI-powered conversational interface&lt;br&gt;
🌓 Dark/light mode toggle&lt;br&gt;
⚛️ Built with React and Next.js&lt;br&gt;
🎨 Styled with Tailwind CSS&lt;br&gt;
🔄 Real-time chat interactions&lt;br&gt;
📱 Responsive design&lt;br&gt;
🚀 Fast and efficient&lt;br&gt;
🔒 Secure API integration&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nemotron.vercel.app/" rel="noopener noreferrer"&gt;DEMO&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghjtlezw52u6prm2tx15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghjtlezw52u6prm2tx15.png" alt="Image description" width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the repository&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/jadouse5/llama3.1-nemotron-chatbot.git
   &lt;span class="nb"&gt;cd &lt;/span&gt;llama3.1-nemotron-chatbot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install dependencies&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Set up environment variables&lt;/strong&gt;:
Create a &lt;code&gt;.env.local&lt;/code&gt; file in the root directory and add your NVIDIA API key:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;   &lt;span class="py"&gt;NVIDIA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your_nvidia_api_key_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the development server&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Open &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; in your browser to interact with the chatbot.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Type your message in the input field at the bottom of the chat interface.&lt;/li&gt;
&lt;li&gt;Press Enter or click the &lt;strong&gt;Send&lt;/strong&gt; button to submit your message.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Customization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Modify the gradient background by editing the file &lt;code&gt;components/ui/background-gradient.tsx&lt;/code&gt; to adjust the colors and animation.&lt;/li&gt;
&lt;li&gt;Adjust the chatbot interface styling in &lt;code&gt;components/ChatbotComponent.tsx&lt;/code&gt; to fit your design preferences.&lt;/li&gt;
&lt;li&gt;You can also tweak the behavior of the AI model by adjusting the parameters (such as &lt;code&gt;temperature&lt;/code&gt; and &lt;code&gt;max_tokens&lt;/code&gt;) in the API route file.&lt;/li&gt;
&lt;/ul&gt;
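As an illustration, an OpenAI-style chat-completions request body exposes those knobs roughly like this (the parameter names follow the OpenAI-compatible convention; the exact model ID and request shape used by this project's API route may differ):

```json
{
  "model": "nvidia/llama-3.1-nemotron-70b-instruct",
  "messages": [{ "role": "user", "content": "Hello" }],
  "temperature": 0.7,
  "max_tokens": 1024
}
```

Lower `temperature` makes replies more deterministic, while `max_tokens` caps the length of each response.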




&lt;h2&gt;
  
  
  Contributing
&lt;/h2&gt;

&lt;p&gt;Contributions are welcome! If you'd like to improve the project or add new features, please feel free to submit a Pull Request.&lt;/p&gt;




&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;This project is licensed under the MIT License - see the LICENSE file in the repository for details.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connect with Me
&lt;/h2&gt;

&lt;p&gt;Feel free to reach out for discussions, collaborations, or questions about AI development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/jadouse5" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/jad-tounsi-el-azzoiani-87499a21a/" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nemotron</category>
      <category>nvidia</category>
      <category>nextjs</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building an 🐝 OpenAI SWARM 🔍 Web Scraping and Content Analysis Streamlit Web App with 👥 Multi-Agent Systems</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Mon, 14 Oct 2024 01:05:00 +0000</pubDate>
      <link>https://dev.to/jadouse5/building-an-openai-swarm-web-scraping-and-content-analysis-streamlit-web-app-with-multi-agent-systems-bl0</link>
      <guid>https://dev.to/jadouse5/building-an-openai-swarm-web-scraping-and-content-analysis-streamlit-web-app-with-multi-agent-systems-bl0</guid>
      <description>&lt;h1&gt;
  
  
  🔍 Building an OpenAI SWARM Web Scraping and Content Analysis Application with Multi-Agent Systems
&lt;/h1&gt;

&lt;p&gt;Web scraping and content analysis are critical in today's data-driven world. In this article, we explore how to implement a multi-agent system that automates these tasks using OpenAI's &lt;strong&gt;Swarm framework&lt;/strong&gt;. This project demonstrates how a system can scrape websites, process the content, and generate summaries automatically. The system is ideal for applications like content aggregation, market analysis, and research automation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc26y9df2m946k0xf0w4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc26y9df2m946k0xf0w4c.png" alt="Image description" width="800" height="936"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;About the Author&lt;/li&gt;
&lt;li&gt;Introduction to the Project&lt;/li&gt;
&lt;li&gt;What You'll Need&lt;/li&gt;
&lt;li&gt;
Setting Up the Project

&lt;ul&gt;
&lt;li&gt;Step 1: Install Python&lt;/li&gt;
&lt;li&gt;Step 2: Create a Virtual Environment&lt;/li&gt;
&lt;li&gt;Step 3: Install Jupyter (Optional)&lt;/li&gt;
&lt;li&gt;Step 4: Install Required Packages&lt;/li&gt;
&lt;li&gt;Step 5: Set Up the OpenAI API Key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Running the Web App&lt;/li&gt;
&lt;li&gt;Credits&lt;/li&gt;
&lt;li&gt;Wrapping Up&lt;/li&gt;
&lt;li&gt;License&lt;/li&gt;
&lt;li&gt;Connect with Me&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Hi there! I'm &lt;strong&gt;Jad Tounsi El Azzoiani&lt;/strong&gt;, a passionate machine learning and AI enthusiast who loves exploring efficient computing techniques, AI-driven automation, and web scraping. My goal is to stay on the cutting edge of AI technology and contribute to the open-source community by sharing my knowledge and solutions with fellow developers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/jadouse5" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/jad-tounsi-el-azzoiani-87499a21a/" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Introduction to the Project
&lt;/h2&gt;

&lt;p&gt;In this project, I explore how &lt;strong&gt;OpenAI's Swarm framework&lt;/strong&gt; can be used to build a multi-agent system that scrapes and analyzes content from websites. The system is designed to automatically retrieve data, analyze it, and provide concise summaries—perfect for anyone needing real-time content extraction and analysis.&lt;/p&gt;

&lt;p&gt;Some potential use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content Aggregation&lt;/strong&gt;: Automatically gather and summarize content from multiple sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market Research&lt;/strong&gt;: Analyze data from multiple websites for industry trends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research Automation&lt;/strong&gt;: Automatically collect and process research data for easy access and analysis.&lt;/li&gt;
&lt;/ul&gt;
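&lt;p&gt;The scrape &amp;rarr; analyze &amp;rarr; summarize handoff at the heart of these use cases can be sketched without the Swarm library itself. The &lt;code&gt;Agent&lt;/code&gt; class below is an illustrative stand-in for a Swarm-style agent, not Swarm's real API:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    # Illustrative stand-in for a Swarm-style agent (not the real Swarm API)
    name: str
    instructions: str
    work: Callable[[str], str]

def run_pipeline(agents: List[Agent], payload: str) -> str:
    # Each agent transforms the payload and hands the result to the next one
    for agent in agents:
        payload = agent.work(payload)
    return payload

scraper = Agent("Scraper", "Fetch page text.", lambda url: f"text from {url}")
analyzer = Agent("Analyzer", "Clean the text.", str.strip)
summarizer = Agent("Summarizer", "Summarize.", lambda text: text[:40])

result = run_pipeline([scraper, analyzer, summarizer], "https://example.com")
```

&lt;p&gt;In the real project, each agent's &lt;code&gt;work&lt;/code&gt; step is backed by an LLM call through Swarm; the chain-of-handoffs structure is the same.&lt;/p&gt;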




&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;p&gt;Before you get started with this project, ensure that the following tools and libraries are installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt;: A Python library for building web apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI API Key&lt;/strong&gt;: Required for the Swarm framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BeautifulSoup&lt;/strong&gt;: A popular Python library for web scraping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requests&lt;/strong&gt;: For handling HTTP requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dotenv&lt;/strong&gt;: For managing environment variables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools form the backbone of this project and will help you build and run the multi-agent web scraping and content analysis system.&lt;/p&gt;
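&lt;p&gt;As a taste of the scraping step, here is a dependency-free stand-in for what BeautifulSoup does in this project, built only on the standard library's &lt;code&gt;html.parser&lt;/code&gt; (the class and function names are mine; the real app uses BeautifulSoup):&lt;/p&gt;

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    # Collects the text found inside <p> tags while the parser walks the HTML
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and data.strip():
            self.chunks.append(data.strip())

def scrape_paragraphs(html: str) -> str:
    """Return the paragraph text of an HTML document as one string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

&lt;p&gt;BeautifulSoup adds tolerant parsing of malformed markup and a much richer query API, which is why the project uses it instead.&lt;/p&gt;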




&lt;h2&gt;
  
  
  Setting Up the Project
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Python
&lt;/h3&gt;

&lt;p&gt;Make sure you have &lt;strong&gt;Python 3.10+&lt;/strong&gt; installed. You can download the latest version from the official &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python website&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a Virtual Environment
&lt;/h3&gt;

&lt;p&gt;It's always a good practice to isolate your project dependencies in a virtual environment. Here’s how to do that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a terminal and navigate to your project directory.&lt;/li&gt;
&lt;li&gt;Create a virtual environment called &lt;code&gt;myenv&lt;/code&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   python &lt;span class="nt"&gt;-m&lt;/span&gt; venv myenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Activate the virtual environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On &lt;strong&gt;macOS/Linux&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; &lt;span class="nb"&gt;source &lt;/span&gt;myenv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On &lt;strong&gt;Windows&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; myenv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Install Jupyter (Optional)
&lt;/h3&gt;

&lt;p&gt;If you plan to develop or run the project using Jupyter notebooks, install JupyterLab inside the virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;jupyterlab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Install Required Packages
&lt;/h3&gt;

&lt;p&gt;Once your virtual environment is activated, install the necessary Python packages for this project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit beautifulsoup4 requests python-dotenv
pip &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/openai/swarm.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Set Up the OpenAI API Key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In the project directory, create a &lt;code&gt;.env&lt;/code&gt; file to store your environment variables.&lt;/li&gt;
&lt;li&gt;Add the following line to the &lt;code&gt;.env&lt;/code&gt; file, replacing &lt;code&gt;your-api-key-here&lt;/code&gt; with your actual OpenAI API key:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your-api-key-here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
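&lt;p&gt;The &lt;code&gt;python-dotenv&lt;/code&gt; package reads this file for you at startup. As a sketch of what it does under the hood, a minimal loader (assuming plain &lt;code&gt;KEY=value&lt;/code&gt; lines, with no quoting or interpolation) looks like this; the function name is mine:&lt;/p&gt;

```python
import os

def load_env_file(path: str = ".env") -> None:
    # Minimal .env reader: KEY=value per line; skips blanks and # comments.
    # The real python-dotenv also handles quoting, export prefixes,
    # and variable interpolation.
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: values already in the environment win
            os.environ.setdefault(key.strip(), value.strip())
```

&lt;p&gt;In the app itself, calling &lt;code&gt;load_dotenv()&lt;/code&gt; from &lt;code&gt;python-dotenv&lt;/code&gt; achieves the same effect, after which &lt;code&gt;os.getenv("OPENAI_API_KEY")&lt;/code&gt; returns your key.&lt;/p&gt;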






&lt;h2&gt;
  
  
  Running the Web App
&lt;/h2&gt;

&lt;p&gt;Now that everything is set up, follow these steps to run the web app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Activate the virtual environment&lt;/strong&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On &lt;strong&gt;macOS/Linux&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; &lt;span class="nb"&gt;source &lt;/span&gt;myenv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On &lt;strong&gt;Windows&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; myenv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start the Streamlit app&lt;/strong&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Run the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open the app in your browser&lt;/strong&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the app starts, Streamlit will provide a local URL (usually &lt;code&gt;http://localhost:8501&lt;/code&gt;). Open this URL in your browser.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the workflow&lt;/strong&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Enter the URL of the website you want to scrape.&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Run Workflow&lt;/strong&gt; button to start the scraping and content analysis process.&lt;/li&gt;
&lt;li&gt;View the summary generated by the system directly in the browser.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;This project leverages the &lt;strong&gt;Swarm framework&lt;/strong&gt; from OpenAI, which allows for efficient multi-agent orchestration. You can explore the Swarm repository on GitHub to learn more about how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swarm GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/openai/swarm" rel="noopener noreferrer"&gt;OpenAI Swarm&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The OpenAI Swarm Web Scraping project demonstrates how multi-agent systems can automate web scraping and content analysis. By dividing the work among specialized agents coordinated through the Swarm framework, the project extracts concise summaries from websites and shows how AI-driven systems can reduce the manual effort of collecting and analyzing data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connect with Me
&lt;/h2&gt;

&lt;p&gt;I’m always open to discussions, collaborations, or just a chat about AI and machine learning. Feel free to reach out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/jadouse5" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/jad-tounsi-el-azzoiani-87499a21a/" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>openai</category>
      <category>swarm</category>
      <category>streamlit</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Binarized Neural Network (BNN) on MNIST Dataset</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Sun, 13 Oct 2024 23:40:17 +0000</pubDate>
      <link>https://dev.to/jadouse5/binarized-neural-network-bnn-on-mnist-dataset-efb</link>
      <guid>https://dev.to/jadouse5/binarized-neural-network-bnn-on-mnist-dataset-efb</guid>
      <description>&lt;h1&gt;
  
  
  Binarized Neural Network (BNN) on MNIST Dataset
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkanpnw6fijhk72m13z2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkanpnw6fijhk72m13z2.png" alt="Image description" width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Author
&lt;/h2&gt;

&lt;p&gt;I am a passionate machine learning and artificial intelligence enthusiast, with a focus on efficient computing and neural network optimization. I aim to explore state-of-the-art AI technologies and contribute to the open-source community by sharing knowledge and innovative solutions.&lt;/p&gt;

&lt;p&gt;You can follow my work on GitHub: &lt;a href="https://github.com/jadouse5" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/jad-tounsi-el-azzoiani-87499a21a/" rel="noopener noreferrer"&gt;Jad Tounsi El Azzoiani&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This project demonstrates the implementation and performance of a &lt;strong&gt;Binarized Neural Network (BNN)&lt;/strong&gt; on the popular &lt;strong&gt;MNIST dataset&lt;/strong&gt;, which contains a collection of handwritten digits. BNNs use binary weights and, in some cases, binary activations, offering computational efficiency and suitability for &lt;strong&gt;resource-constrained environments&lt;/strong&gt; such as embedded systems and edge devices.&lt;/p&gt;
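&lt;p&gt;The core operation is simple: every real-valued weight is collapsed to -1 or +1 by its sign. A minimal NumPy illustration (the function name is mine; Larq provides this as its &lt;code&gt;ste_sign&lt;/code&gt; quantizer):&lt;/p&gt;

```python
import numpy as np

def binarize(weights: np.ndarray) -> np.ndarray:
    # Deterministic binarization: each weight becomes +1.0 or -1.0 by sign.
    # During training, Larq pairs this with a straight-through estimator
    # so gradients can flow through the non-differentiable sign step.
    return np.where(weights >= 0, 1.0, -1.0)

w = np.array([0.7, -0.2, 0.0, -1.5])
binary_w = binarize(w)  # the sign pattern of the weights
```

&lt;p&gt;Since each weight now fits in a single bit, both the model's memory footprint and its multiply operations shrink dramatically compared to 32-bit floats.&lt;/p&gt;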




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before running the project, ensure you have the following installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.x&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jupyter Notebook&lt;/strong&gt; or &lt;strong&gt;JupyterLab&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TensorFlow&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Numpy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Matplotlib&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Larq&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These libraries will be essential for building and training the BNN model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;To set up the environment for running this project, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;strong&gt;Python 3.x&lt;/strong&gt; from the official &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python website&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Install &lt;strong&gt;Jupyter&lt;/strong&gt; using pip:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;jupyterlab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Install the required libraries:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;tensorflow numpy matplotlib larq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Running the Notebook
&lt;/h2&gt;

&lt;p&gt;Once you have set up the environment, follow these steps to run the project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a terminal or command prompt and navigate to the directory containing the &lt;code&gt;.ipynb&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Run the following command to launch Jupyter Notebook:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   jupyter notebook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;From the Jupyter interface, open the &lt;code&gt;binarized-neural-network-mnist.ipynb&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Follow the instructions in the notebook to train the BNN on the MNIST dataset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gf7pio8xjl3ojndmlcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gf7pio8xjl3ojndmlcp.png" alt="Image description" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr68a1zgu0enleelyfgx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr68a1zgu0enleelyfgx3.png" alt="Image description" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Notebook Contents
&lt;/h2&gt;

&lt;p&gt;The notebook is organized into the following sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Introduction to BNNs&lt;/strong&gt;: A brief overview of Binarized Neural Networks and their advantages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loading the MNIST Dataset&lt;/strong&gt;: Instructions on loading and preprocessing the MNIST dataset for training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building the BNN Model&lt;/strong&gt;: Steps to define and compile the BNN using TensorFlow and Larq.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training the Model&lt;/strong&gt;: Training the BNN on the MNIST dataset and visualizing the process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation and Results&lt;/strong&gt;: Evaluating the model's performance and observing the accuracy and efficiency of the BNN.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conclusion&lt;/strong&gt;: A summary of the project's findings and potential areas for future work.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Expected Outcomes
&lt;/h2&gt;

&lt;p&gt;After running the notebook, you should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand the core concepts behind Binarized Neural Networks.&lt;/li&gt;
&lt;li&gt;See how BNNs can be applied to image recognition tasks like digit classification on the MNIST dataset.&lt;/li&gt;
&lt;li&gt;Explore the benefits of using binary weights and activations for efficient model execution.&lt;/li&gt;
&lt;/ul&gt;
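&lt;p&gt;The efficiency claim has a concrete arithmetic basis: with weights and activations restricted to {-1, +1}, a dot product reduces to an XNOR followed by a popcount. A small illustration (encoding +1 as bit 1 and -1 as bit 0; the function names are mine):&lt;/p&gt;

```python
def encode(vec):
    # Pack a {-1, +1} vector into an int: bit i is 1 when vec[i] == +1
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a, b):
    # dot(a, b) over {-1, +1} vectors via XNOR + popcount:
    # each agreeing position contributes +1, each disagreement -1,
    # so dot = 2 * (number of agreements) - n
    n = len(a)
    mask = (1 << n) - 1
    agree = ~(encode(a) ^ encode(b)) & mask  # XNOR, truncated to n bits
    return 2 * bin(agree).count("1") - n

a = [1, -1, 1]
b = [1, 1, -1]
assert binary_dot(a, b) == sum(x * y for x, y in zip(a, b))
```

&lt;p&gt;On hardware, one XNOR-popcount over a machine word replaces dozens of floating-point multiply-accumulates, which is why BNNs are attractive on embedded and edge devices.&lt;/p&gt;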




&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;This project leverages the &lt;strong&gt;Larq&lt;/strong&gt; library, an open-source deep learning library for training neural networks with low-precision weights and activations, such as Binarized Neural Networks. Learn more about &lt;strong&gt;Larq&lt;/strong&gt; by visiting their &lt;a href="https://larq.dev/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt; or &lt;a href="https://github.com/larq/larq" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Binarized Neural Network&lt;/strong&gt; project demonstrates how BNNs can offer significant computational efficiency for machine learning tasks. By working with the MNIST dataset, we showcase the practical application of BNNs in a real-world scenario. The project also serves as a foundation for further exploration into low-precision neural networks and their potential for deployment in resource-constrained environments.&lt;/p&gt;

&lt;p&gt;This work highlights the importance of optimizing neural networks for faster and more efficient inference while maintaining accuracy, especially in scenarios where resources are limited, such as IoT devices and mobile platforms.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>neuralnetwork</category>
      <category>tensorflow</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>🦙 📹 PinataShot: Multimodal LLaMA 3.2 Screenshot Categorization on Pinata IPFS</title>
      <dc:creator>Jad Tounsi</dc:creator>
      <pubDate>Sun, 13 Oct 2024 00:03:25 +0000</pubDate>
      <link>https://dev.to/jadouse5/pinatashot-multimodal-llama-32-screenshot-categorization-on-pinata-ipfs-2ip2</link>
      <guid>https://dev.to/jadouse5/pinatashot-multimodal-llama-32-screenshot-categorization-on-pinata-ipfs-2ip2</guid>
      <description>&lt;h1&gt;
  
  
  PinataShot
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcgbahp0fopnfukqwbva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcgbahp0fopnfukqwbva.png" alt="Image description" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Built&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built a &lt;strong&gt;SaaS Screenshot Organizer&lt;/strong&gt; that helps users upload, categorize, and search through their screenshots with ease. The app leverages &lt;strong&gt;Pinata’s Files API&lt;/strong&gt; for decentralized storage of images and integrates &lt;strong&gt;GROQ API's LLaMA 3.2 11B&lt;/strong&gt; for AI-powered analysis of screenshots. With features like &lt;strong&gt;OCR (optical character recognition)&lt;/strong&gt; for text extraction, automatic categorization, and a searchable screenshot gallery, this app streamlines organizing large collections of images and screenshots. It is deployed using &lt;strong&gt;Next.js&lt;/strong&gt; on Vercel to ensure scalability and speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;One limitation of this version of the app is that it processes &lt;strong&gt;one image at a time&lt;/strong&gt; when using the &lt;strong&gt;GROQ API&lt;/strong&gt;: GROQ's LLaMA 3.2 11B model currently accepts a single image per request. While this allows precise categorization and naming for each screenshot, it rules out bulk processing. &lt;/p&gt;

&lt;p&gt;However, &lt;strong&gt;Pinata&lt;/strong&gt; shines in this setup, as it seamlessly handles the decentralized storage of multiple images. Thanks to Pinata's robust and reliable &lt;strong&gt;IPFS-backed storage&lt;/strong&gt;, users can upload several screenshots at once, which are securely stored and easily retrievable, even when waiting for their turn in the AI analysis queue.&lt;/p&gt;
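&lt;p&gt;The resulting pattern - upload many files at once, analyze them one by one - is a plain work queue. Sketched in Python for brevity (the app itself is Next.js, and &lt;code&gt;analyze&lt;/code&gt; here is a placeholder for the GROQ call):&lt;/p&gt;

```python
from collections import deque
import time

def process_queue(cids, analyze, delay: float = 0.0):
    # Drain uploaded files one at a time, respecting the one-image-per-request
    # limit; `cids` would be Pinata CIDs and `analyze` the GROQ API call.
    queue = deque(cids)
    results = {}
    while queue:
        cid = queue.popleft()
        results[cid] = analyze(cid)
        if delay:
            time.sleep(delay)  # simple spacing between API requests
    return results

labels = process_queue(["Qm1", "Qm2"], lambda cid: f"label-for-{cid}")
```

&lt;p&gt;Because Pinata holds every upload durably on IPFS, items can wait in this queue as long as needed without risk of loss.&lt;/p&gt;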

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Check out the &lt;a href="https://ai-screenshot.vercel.app/" rel="noopener noreferrer"&gt;Demo&lt;/a&gt;.&lt;br&gt;
Below are a few key features of the app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhqcibd1ecixb2sj6vzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhqcibd1ecixb2sj6vzi.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload Interface&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Drag-and-drop feature for uploading screenshots with instant AI analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speak to Images&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Ask questions about your screenshots and images in natural language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Screenshot Gallery&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Screenshots are automatically renamed using the AI's description of their content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text Search&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
Use the OCR feature to search through the text found in screenshots (e.g., receipts, documents).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  My Code
&lt;/h2&gt;

&lt;p&gt;Find the source code for the project on &lt;a href="https://github.com/jadouse5/ai-screenshot-organiser" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js&lt;/strong&gt;: Frontend and backend (serverless API routes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinata’s Files API&lt;/strong&gt;: For decentralized file storage and retrieval on IPFS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GROQ API’s LLaMA 3.2 11B&lt;/strong&gt;: For vision capabilities and text extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel&lt;/strong&gt;: Deployment platform ensuring scalability and speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt;: For styling and responsive UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shadcn/ui &amp;amp; Aceternity UI&lt;/strong&gt;: UI components library.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  More Details
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pinata’s Files API&lt;/strong&gt; is used to securely store screenshots and retrieve them from IPFS, ensuring decentralized storage and reliability. Pinata excels at handling multiple files, enabling users to store and access their screenshots quickly, even when dealing with large collections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;AI analysis&lt;/strong&gt; uses &lt;strong&gt;GROQ’s LLaMA 3.2 11B&lt;/strong&gt; model to automatically assign screenshots descriptive names based on their content and to extract text via OCR for easy searching. Although each image needs to be processed one at a time, Pinata’s decentralized storage makes this manageable by allowing users to upload many images at once, which can then be queued for AI processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This powerful combination of Pinata’s decentralized storage and GROQ’s AI capabilities makes this tool incredibly useful for a wide range of users—whether it’s for work, personal organization, or creative projects.&lt;/p&gt;
&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bulk image processing&lt;/strong&gt;: Overcoming the single image limitation by exploring options for batch image analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced categorization algorithms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced search functionality&lt;/strong&gt; using more refined OCR text extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User authentication&lt;/strong&gt; and personal galleries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time collaboration&lt;/strong&gt; for sharing and organizing screenshots.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Running the Repository
&lt;/h2&gt;

&lt;p&gt;To run this project locally, follow these steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the repository&lt;/span&gt;
git clone https://github.com/jadouse5/ai-screenshot-organiser.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-screenshot-organiser

&lt;span class="c"&gt;# 2. Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# 3. Set up environment variables&lt;/span&gt;
&lt;span class="c"&gt;# Create a .env.local file in the root directory and add:&lt;/span&gt;
&lt;span class="nv"&gt;PINATA_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_pinata_api_key
&lt;span class="nv"&gt;PINATA_SECRET_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_pinata_secret_key
&lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_groq_api_key

&lt;span class="c"&gt;# 4. Run the development server&lt;/span&gt;
npm run dev

&lt;span class="c"&gt;# 5. Open your browser and navigate to&lt;/span&gt;
http://localhost:3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;#devchallenge #pinatachallenge #webdev #api #decentralizedstorage #AIanalysis #moroccoaisolutions&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>pinatachallenge</category>
      <category>webdev</category>
      <category>api</category>
    </item>
  </channel>
</rss>
