DEV Community: Maani K

The Hermes Rescue: How an Open Agent Rebuilt My GitHub Projects from Scratch

Maani K — Sat, 30 May 2026 18:05:54 +0000

https://dev.to/challenges/hermes-agent-2026-05-15

Losing access to a GitHub account is a developer’s nightmare. When my account was suddenly suspended, years of work on two critical projects, Chrome Bots (an automated browser orchestration tool) and Mars Project (a space-colony simulation framework), vanished from my local machine's upstream sync overnight.

I didn't just lose the repositories; I lost the incremental commit history, the documentation, and the architectural context.

Instead of panicking and manually rewriting thousands of lines of code, I turned to Hermes Agent. Using its autonomous planning, deep reasoning, and advanced tool-use capabilities, I tasked Hermes with reverse-engineering my local build artifacts, parsing scattered log files, and reconstructing both codebases from scratch.

Here is the full story, a technical breakdown, and a how-to guide covering how Hermes Agent pulled off the recovery, and why this open-source framework is a game-changer for AI-driven development.

Personal Essay: The Power of an Open Agent in a Crisis

When you lose your GitHub account, you realize how fragile the modern developer ecosystem can be. Centralized platforms are incredibly convenient until they aren't. In my case, I was left with fragmented local caches, compiled binaries, and half-baked design docs scattered across my drive.

Many commercial AI assistants are gated behind strict chat interfaces. They can write snippets of code, but they cannot act as autonomous engineers. They can't navigate a local file system, run a terminal command, look at a compilation error, and iteratively fix it without human intervention.

This is where an open, capable agent system like Hermes changes the narrative. Because Hermes can be run locally, connected to native system tools, and given an autonomous execution loop, it became a tireless collaborator.

Why Open Agent Systems Matter for the Future

The future of AI development isn't just "chatbots that write code." It is autonomous agency.

Data Sovereignty: Running agents locally ensures your proprietary or recovered code stays yours.
Uncapped Execution: Commercial wrappers often time out during long multi-step reasoning processes. An open framework allows the agent to think as long as the hardware permits.
True Tool Integration: An open agent can safely interface with a local Bash terminal, Docker containers, and custom AST (Abstract Syntax Tree) parsers.

Hermes didn't just guess what my code looked like; it analyzed my local environment, read the leftover build outputs of the Chrome Bots system, and algorithmically reconstructed the missing logic. It proved that AI agents are transitioning from simple code completion tools to resilient technical partners.

Deep Technical Breakdown: How Hermes Executes Complex Recovery

To understand how Hermes rebuilt Chrome Bots and the Mars Project, we have to look under the hood. Hermes operates on a sophisticated ReAct (Reasoning and Acting) framework, supercharged by advanced planning and tool-use loops.

The Autonomous Execution Loop

When tasked with a massive recovery project, Hermes doesn't just start typing code. It follows a strict four-stage cyclical architecture:

[Goal: Recover Project] ──> (1. Plan & Deconstruct) ──> (2. Tool Execution)
                                    ▲                             │
                                    │                             ▼
                               (4. State Update) <── (3. Environment Feedback)

Plan & Deconstruct: The agent breaks down the overarching goal ("Recover Chrome Bots Puppeteer routing") into a directed acyclic graph (DAG) of sub-tasks.
Tool Execution: It calls specific tools (e.g., executing a Bash command to grep system logs or reading a binary header).
Environment Feedback: The agent captures stdout, stderr, or file contents.
State Update & Reflection: Hermes evaluates if the tool execution succeeded. If a reconstructed Python script throws a SyntaxError during a test execution, Hermes catches the error trace, analyzes the failure, and updates its internal plan.
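
The four stages above can be sketched as a small loop. This is only an illustration of the cycle, not Hermes's actual implementation: `AgentState`, `run_loop`, and the `plan` and `execute` callables are hypothetical names, and the "retry a failed task once" policy is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Hypothetical working memory for the loop."""
    pending: list[str]  # sub-tasks still to run, in order
    history: list[tuple[str, bool, str]] = field(default_factory=list)


def run_loop(goal: str,
             plan: Callable[[str], list[str]],
             execute: Callable[[str], tuple[bool, str]],
             max_steps: int = 20) -> AgentState:
    # 1. Plan & Deconstruct: turn the goal into an ordered list of sub-tasks.
    state = AgentState(pending=plan(goal))
    retried: set[str] = set()

    for _ in range(max_steps):
        if not state.pending:
            break
        task = state.pending.pop(0)

        # 2. Tool Execution and 3. Environment Feedback:
        # run the task and capture whether it succeeded plus its output.
        ok, output = execute(task)

        # 4. State Update & Reflection: record the result and, on the first
        # failure of a task, put it back at the front of the queue.
        state.history.append((task, ok, output))
        if not ok and task not in retried:
            retried.add(task)
            state.pending.insert(0, task)

    return state
```

A real agent would replace the simple retry with a re-planning step that feeds the captured error back to the model; the `max_steps` cap is there so a task that keeps failing cannot loop forever.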

Advanced Tool Selection

Unlike basic LLMs that simply output code blocks, Hermes utilizes structured tool calling. For example, during the recovery of the Mars Project physics engine, Hermes frequently utilized a custom file-writing and testing loop:

{
  "tool": "execute_bash",
  "arguments": {
    "command": "pytest test_orbit_mechanics.py"
  }
}

If the test failed because the error exceeded the tolerance ($E_{\text{error}} > \epsilon$), Hermes recalculated the orbital trajectory equations and rewrote the source file.
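
To make the structured tool call concrete, here is a minimal sketch of how a JSON payload like the one above could be routed to a Python function. The `TOOLS` registry, `register` decorator, and `dispatch` function are hypothetical names for illustration and are not taken from the Hermes codebase. As an extra assumption, this `execute_bash` runs the command without a shell.

```python
import json
import shlex
import subprocess
from typing import Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., str]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@register("execute_bash")
def execute_bash(command: str, timeout: int = 60) -> str:
    """Run a command (no shell) and return its exit code and output."""
    proc = subprocess.run(
        shlex.split(command), capture_output=True, text=True, timeout=timeout
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


def dispatch(raw_call: str) -> str:
    """Parse a JSON tool call and invoke the matching registered tool."""
    call = json.loads(raw_call)
    name = call.get("tool")
    if name not in TOOLS:
        return f"Error: unknown tool {name!r}"
    return TOOLS[name](**call.get("arguments", {}))
```

Returning the exit code together with stdout and stderr gives the agent the feedback it needs to decide whether a test run passed.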

Comparison Piece: Hermes Agent vs. Other Agentic Frameworks

How does Hermes stack up against the rest of the ecosystem? If you are deciding which framework to reach for, here is how Hermes compares to other dominant platforms like CrewAI, AutoGPT, and LangGraph.

| Feature / Dimension | Hermes Agent | CrewAI | AutoGPT | LangGraph |
|---|---|---|---|---|
| Reasoning Depth | Excellent (Built on top of specialized reasoning models) | Moderate (Good for orchestration, less for deep debugging) | Low to Moderate | High (Depends entirely on developer implementation) |
| When to Choose | When you need an autonomous engineer to write, test, and debug code locally. | When you need a team of agents to write a marketing campaign or parse a collection of PDFs. | For broad, open-ended internet research tasks. | When you want total, granular control over the exact path an AI takes through an app. |

The Verdict

Reach for Hermes when the problem requires deep technical precision, cyclical debugging, and direct interaction with local system tools. Reach for frameworks like CrewAI when you need human-like collaboration between different personas (e.g., a "Product Manager Agent" talking to a "QA Agent").

How-To Guide: Setting Up and Running Hermes Agent Locally

Ready to build your own resilient autonomous workspace? Follow this guide to install Hermes Agent locally and connect it to system tools.

Prerequisites

Python 3.10 or higher installed.
Docker installed (highly recommended for isolating the agent's file system activities).
An LLM backend: Ollama for 100% local operation, or an Anthropic/OpenAI API key.

Step 1: Installation
Clone the repository (or initialize the framework package) and install the core dependencies:

pip install hermes-agent-framework

Step 2: Configure the Environment
Create a .env file in your workspace directory to manage your keys and environment settings:


# Workspace Configuration
HERMES_WORKSPACE_DIR="./local_sandbox"
ENABLE_BASH_TOOL=true
SAFE_MODE=false

# LLM Backend (example using Anthropic or local Ollama)
LLM_PROVIDER="anthropic"
ANTHROPIC_API_KEY="your-api-key-here"

Step 3: Define Custom Tools
To prevent an agent from destroying your system, you can define explicit Python tools. Here is how you can expose a safe log reader to Hermes (save it as my_custom_tools.py so the script in Step 4 can import it):

from hermes_agent.tools import tool

@tool
def read_recovery_log(file_path: str) -> str:
    """Reads fragmented system logs to extract old codebase structures."""
    try:
        with open(file_path, 'r') as f:
            lines = f.readlines()
        # Return last 100 lines containing crash/build state
        return "".join(lines[-100:])
    except Exception as e:
        return f"Error reading log: {str(e)}"
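
One way to make a reader like this safer is to refuse any path outside the sandbox directory. The sketch below is an assumption about how that could look, not part of the hermes_agent package: `resolve_in_sandbox` and `read_tail` are hypothetical helpers, and the sandbox directory is whatever you set in HERMES_WORKSPACE_DIR.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_dir: str, requested: str) -> Path:
    """Return the resolved path, or raise if it escapes sandbox_dir."""
    root = Path(sandbox_dir).resolve()
    # Resolving collapses ".." segments and symlinks before the check.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{requested!r} is outside the sandbox")
    return target


def read_tail(sandbox_dir: str, requested: str, max_lines: int = 100) -> str:
    """Read the last max_lines lines of a file inside the sandbox."""
    path = resolve_in_sandbox(sandbox_dir, requested)
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return "".join(f.readlines()[-max_lines:])
```

Checking the resolved path rather than the raw string is the design choice that matters here: a request such as `../../etc/passwd` or an absolute path is rejected even though it looks like an ordinary string argument.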

Step 4: Initializing and Running the Agent

Create a run_recovery.py script to spin up Hermes, attach the tools, and provide the initial system prompt that saved my projects:


from hermes_agent import HermesAgent
from my_custom_tools import read_recovery_log

# Initialize the agent with specific recovery capabilities
agent = HermesAgent(
    model="claude-3-5-sonnet",
    system_instruction=(
        "You are an expert recovery engineer. Your GitHub account was lost. "
        "Your goal is to inspect local logs, reverse-engineer build artifacts, "
        "and reconstruct the 'Chrome Bots' and 'Mars Project' codebases flawlessly."
    )
)

# Register tools
agent.register_tool(read_recovery_log)

# Execute the autonomous loop
recovery_prompt = (
    "Scan the ./recovery_dump folder. Reconstruct the main execution files "
    "for Chrome Bots. Ensure all unit tests pass before marking the task complete."
)

print("Starting Hermes Recovery Loop...")
result = agent.chat(recovery_prompt)
print("Recovery Complete! Summary of actions taken:")
print(result)

Conclusion

When centralized infrastructure fails, local intelligence wins. By leveraging **Hermes Agent**, I transformed what should have been two weeks of grueling rewrite work into a 3-hour automated synthesis pipeline.

The agent successfully parsed my local .pyc compiled files, read terminal history logs, re-implemented the Puppeteer steering algorithms for *Chrome Bots*, and re-calculated the mathematical coordinate mapping formulas required by the *Mars Project*.

Whether you are building complex automation systems or safeguarding your projects against catastrophic data loss, mastering open-source agent frameworks like Hermes is the ultimate superpower for the modern developer.

OmniLearn: Multi-Agent AI School Bots for Universal Childhood Education

Maani K — Sun, 14 Sep 2025 00:03:29 +0000

This is a submission for the Heroku "Back to School" AI Challenge


This is my submission for the AI-Powered Back to School Experience Challenge by Heroku. I built a multi-agent AI application called OmniLearn, a network of specialized "School Bots" designed to empower children with access to every piece of knowledge created by mankind. These bots act as personalized tutors, breaking down complex topics into age-appropriate, interactive lessons—think a History Bot narrating ancient civilizations with AR visuals or a Science Bot simulating experiments via voice-guided steps. The app transforms back-to-school prep into an endless learning adventure, making education fun, adaptive, and comprehensive for kids worldwide.
What I Built
OmniLearn is a multi-agent AI platform deployed on Heroku, where intelligent agents collaborate to deliver tailored educational experiences. Each bot specializes in a domain (e.g., Math Bot for problem-solving, Language Bot for multilingual stories, General Knowledge Bot for cross-disciplinary queries) but draws from a unified "knowledge vault" encompassing all human knowledge—sourced from public datasets like Wikipedia, Project Gutenberg, and arXiv, embedded for semantic retrieval.
The back-to-school focus: Kids start with a "First Day Setup" agent that assesses their grade level, interests, and goals via a fun quiz (voice or text). Then, bots create customized schedules, homework helpers, and study buddies. For example:
Interactive Lessons: Upload a drawing of a plant; Biology Bot identifies it and teaches photosynthesis with animated explanations.
Collaborative Learning: Agents "team up" for projects, like Physics Bot + Art Bot for a solar system model.
Progress Tracking: Gamified dashboard with badges for mastered topics, adapting difficulty to build confidence.
This multi-agent system ensures comprehensive coverage—no topic is too niche or vast—while fostering curiosity. Built with Node.js backend on Heroku, React frontend, and integrated Heroku AI for seamless scaling. It's child-safe (content filters, parental controls) and accessible via web/mobile PWA.
The crazy twist: Bots "evolve" by learning from anonymized interactions (with consent), simulating a global classroom where kids co-create knowledge, ultimately aiming to democratize education and spark the next generation of innovators.
Demo
Live demo: https://omnillearn.herokuapp.com (Deployed on Heroku with free dyno; includes sample kid profiles for testing).
Demo video (5 minutes): Watch on YouTube. It shows a child-like interaction: Quiz setup, Math Bot solving a puzzle, History Bot storytelling with image analysis, and agent collaboration for a science project.
Screenshots:
Welcome Quiz (React UI): Colorful interface with voice input; agents assess via chat bubbles. (Image: Kid avatar selecting "I love dinosaurs!" with bot responses.)
Bot Dashboard: Grid of bot icons; select Science Bot for a lesson on volcanoes. (Image: Animated bots with progress rings.)
Interactive Session: Upload photo of a math problem; bot solves and explains step-by-step. (Image: AR overlay on uploaded image with equations.)
Knowledge Vault Query: Search "Every invention by mankind"; bot retrieves and summarizes with timelines. (Image: Semantic search results in cards.)
Parental View: Analytics on learning streaks and suggested bots. (Image: Chart of topics covered.)
The video highlights real-time agent interactions, ensuring judges see the multi-agent magic in action.
How I Used Heroku AI Features
Heroku's AI toolkit made building this multi-agent system straightforward and scalable, handling the heavy lifting for knowledge retrieval and agent orchestration without deep ML expertise.
Model Context Protocol (MCP) on Heroku: I used MCP to standardize how bots access external tools and real-time data, creating a "plug-and-play" context layer. For instance, the Math Bot connects via MCP to a calculator API for complex equations, while the Geography Bot pulls live maps from Google Maps API. Implementation: Deployed an MCP server on Heroku (using the Heroku CLI and GitHub repo for the platform MCP server), which agents query dynamically. This ensured consistent context passing to LLMs (e.g., "Provide step-by-step explanation for an 8-year-old"), reducing hallucinations and enabling bots to "hand off" sessions (e.g., History to Art Bot). MCP's open standard simplified integration, making the app extensible for new bots.
Heroku Managed Inference and Agents: The core of the multi-agent setup! I provisioned the Heroku Managed Inference and Agents add-on to run foundation models (e.g., GPT-4o via supported providers) for each bot's reasoning. Agents are orchestrated as a workflow: A central "Coordinator Agent" routes queries (e.g., "Quantum physics for kids? → Physics Bot"), using managed inference for low-latency responses. For back-to-school, it generates personalized planners (e.g., "Weekly schedule based on syllabus"). Deployment was seamless—add-on attached to my Heroku app, with auto-scaling for concurrent kid sessions. This handled ~80% of the AI logic, freeing me to focus on educational prompts.
pgvector for Heroku Postgres: To store "every knowledge ever created," I enabled pgvector on a Heroku Postgres database for vector embeddings of vast datasets (e.g., 1M+ Wikipedia articles embedded via OpenAI embeddings). Bots perform semantic searches (e.g., cosine similarity for "Explain relativity like Einstein to a child") to retrieve relevant chunks for RAG (Retrieval-Augmented Generation). Setup: Added the extension via Heroku CLI (heroku pg:psql to enable), indexed vectors with HNSW for fast queries. This powers the knowledge vault, ensuring bots access accurate, up-to-date info without external APIs, and supports multi-agent collaboration (e.g., shared retrieval for interdisciplinary lessons).
Together, these features created a robust, serverless AI backbone: pgvector for storage/retrieval, Managed Agents for execution, and MCP for extensibility—all deployed in minutes on Heroku.
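
For readers new to semantic retrieval, the ranking step behind the knowledge vault can be sketched in plain Python. This is an illustration of cosine-similarity search only, not the app's code: in the deployed setup the comparison runs inside Postgres, where pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity). The function names and the tiny two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """chunks is a list of (text, embedding); return the k closest texts."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy usage: real embeddings have hundreds or thousands of dimensions.
chunks = [
    ("relativity", [1.0, 0.0]),
    ("volcanoes", [0.0, 1.0]),
    ("gravity", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], chunks, k=2))
```

The retrieved chunks are then pasted into the prompt as context, which is the "retrieval" half of RAG.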
Category: Student Success
OmniLearn directly supports student learning by providing adaptive, all-encompassing education tools that organize knowledge into bite-sized, engaging experiences. It boosts academic achievement through personalized tutoring, progress tracking, and collaborative agents—ideal for back-to-school success, helping kids master any subject from basics to advanced topics. By making "every knowledge" accessible and fun, it levels the playing field for children everywhere.
Tech Stack & Repo
Frontend: React with Tailwind for kid-friendly UI.
Backend: Node.js/Express on Heroku, with Heroku AI add-ons.
AI Integration: OpenAI for embeddings (via Managed Inference), custom prompts for child-safe outputs.
Repo: GitHub - omnillearn-school-bots (Includes MCP server config, pgvector schemas, and agent workflows).
This project showcases Heroku AI's power for educational innovation—excited to bring universal learning to kids! 🚀

Universal Wallet: A KendoReact-Powered React App for Global Financial Inclusion

Maani K — Sat, 13 Sep 2025 23:49:03 +0000

This is a submission for the KendoReact Free Components Challenge.


This is my submission for the KendoReact UI Components Challenge by Progress. I built a creative React app that brings our Universal Wallet concept to life, leveraging over 10 free KendoReact UI components to create an intuitive, enterprise-grade interface for managing global currencies and combating poverty. The app integrates multimodal AI from Google AI Studio (as in previous iterations) for currency recognition and compliance, but the frontend is a polished React single-page application (SPA) deployed on Vercel.

What I Built

Universal Wallet is a React-based progressive web app (PWA) that serves as a universal digital wallet, recognizing and handling every valid fiat and cryptocurrency ever issued while adhering to global regulations (e.g., FATF, MiCA). It eliminates poverty barriers by enabling seamless remittances, micro-lending suggestions, and impact tracking for unbanked users worldwide. Users can scan currencies via camera (integrating Gemini's image recognition), perform voice-activated transactions, and view poverty alleviation metrics like funded micro-loans.

Creativity shines in its "Poverty Impact Mode": AI-driven suggestions (via Nuclia RAG for real-time data retrieval on aid programs) gamify transactions—e.g., rounding up purchases to donate to global funds. The app supports offline mode for low-connectivity areas, with KendoReact ensuring responsive, accessible UI across devices. Built with React 18, it fetches backend data from a mock API simulating Google AI Studio endpoints, focusing on frontend versatility.

Key features:

Multi-currency dashboard with real-time conversions.
Multimodal inputs: Image upload for physical currency scans, voice for commands.
Compliance checker: Flags restricted transactions.
Poverty tools: Auto-suggest donations, track social impact.

This isn't just a wallet—it's a tool to shape an equitable financial future, using KendoReact's free components for a professional, scalable UX.

Demo

Live demo: https://universal-wallet-kendoreact.vercel.app (Deployed on Vercel; includes mock data for privacy).

Demo video (4 minutes): Watch on YouTube. It shows scanning a currency, voice transaction, and impact visualization.

Screenshots:

Dashboard (using TabStrip, Card, NumericTextBox): Tabs for fiat/crypto balances; cards display values with numeric inputs for transfers.
Currency Scanner (using Dialog, Tooltip, Button): Modal dialog for image upload; tooltips explain compliance.
Transaction History (using GridLayout, DatePicker, ProgressBar): Layout for transaction list; date filter and loading progress.
Impact Tracker (using Badge, Slider, DropDownList): Badges for notifications; slider for donation amounts; dropdown for fund selection.
Voice Input (using Input, Switch): Text input fallback; switch for audio mode.

The app uses at least 12 free KendoReact components (detailed below) for a cohesive, themeable design.

How I Used KendoReact

KendoReact's free components powered the entire UI, demonstrating their versatility for a complex financial app. I installed via npm (@progress/kendo-react-* packages) and used the Default theme for a clean, modern look. Components were customized with props for accessibility (e.g., ARIA labels) and responsiveness (e.g., media queries).

Here's how I integrated 12 free KendoReact UI components:

Button (from Buttons): Primary actions like "Scan Currency" or "Send Funds"—styled with icons for quick taps.
ButtonGroup (from Buttons): Grouped options for transaction types (e.g., Send/Receive/Convert) in the toolbar.
Card (from Layout): Dashboard panels for balance overviews and impact summaries—perfect for modular content.
TabStrip (from Layout): Navigates between Wallet, History, and Impact sections; tabs auto-adjust for mobile.
NumericTextBox (from Inputs): For entering amounts in transactions—handles currency formatting and validation.
DropDownList (from Dropdowns): Currency selector with search; lists 200+ fiats/cryptos from API.
DatePicker (from Date Inputs): Filters transaction history by date—integrates with backend queries.
Dialog (from Dialogs): Confirmation modals for high-value transfers or compliance warnings.
Tooltip (from Tooltips): Hover info on currencies (e.g., "This complies with EU MiCA regs") and icons.
Badge (from Indicators): Notification badges for new remittances or impact milestones (e.g., "5 loans funded!").
ProgressBar (from Progress Bars): Shows transaction processing or conversion loading states.
Slider (from Inputs): Adjusts donation percentages in Poverty Impact Mode—real-time previews update balances.

These components made development efficient: e.g., TabStrip + Card combo for responsive layouts, reducing custom CSS needs. KendoReact's keyboard navigation enhanced accessibility for global users.

Code Smarter, Not Harder: Using KendoReact AI Coding Assistant

I started a 30-day free trial of the KendoReact AI Coding Assistant (no credit card needed) to accelerate building. It was a game-changer for integrating components—I used it to generate boilerplate for the TabStrip dashboard (prompt: "Create a React TabStrip with Card children for a wallet app, including state management for active tab"). It auto-suggested props like selected and onSelect, saving hours. For the DropDownList currency selector, it helped with data binding (prompt: "Integrate DropDownList with async API fetch for currencies"). Overall, it handled ~40% of the UI code, letting me focus on app logic. Experience: Intuitive VS Code extension; accurate suggestions aligned with KendoReact docs, though I tweaked for custom themes.

RAGs to Riches: Integrating Nuclia

To enhance the poverty elimination features, I incorporated Nuclia (14-day free trial, no credit card) as a RAG-as-a-service backend for retrieving real-time data on global aid programs and currency laws. Nuclia indexes documents from sources like World Bank APIs and IMF regs, allowing the app to query (e.g., "Find compliant micro-loan options in Kenya") via natural language.

Integration: In the Impact Tracker, a Button triggers a Nuclia search; results populate the DropDownList with vetted funds. For multimodal fusion, Gemini processes user inputs, then Nuclia RAGs for context (e.g., retrieving poverty stats for voice queries). Experience: Setup was a breeze: API keys in minutes, SDK for React easy to hook into useEffect. It added reliability (e.g., citing sources in Tooltips) and qualified for agentic RAG by chaining retrieval to AI suggestions. Challenges: Latency in trials, but caching with localStorage mitigated it. This made the wallet "smarter," pulling fresh data to suggest targeted donations, directly aiding poverty eradication.

Tech Stack & Repo

Frontend: React 18, KendoReact Free (50+ components available, used 12), Tailwind for minor styling.
Backend Integration: Mocked Google AI Studio API for multimodal (image/audio processing); Nuclia for RAG.
Deployment: Vercel for PWA support.
Repo: GitHub - universal-wallet-kendoreact (includes code snippets generated by AI Assistant).

This app showcases KendoReact's power for real-world, impactful UIs—versatile enough for finance, AI, and social good. Waiting for feedback! 🚀

Team: Solo submission

Gemini Bots for humanity

Maani K — Sat, 13 Sep 2025 23:38:06 +0000

This post is my submission for DEV Education Track: Build Apps with Google AI

What I Built
I built Gemini HealthBot, a multimodal AI-powered doctor applet designed to provide reliable, accessible medical consultations to people worldwide, especially in underserved areas. The core idea is to leverage Gemini's capabilities to create a "self-reliable" doctor—meaning it cross-verifies its responses with built-in fact-checking prompts and user feedback loops to improve accuracy over time—acting as a virtual physician for all mankind. This bot addresses the global healthcare gap by offering preliminary diagnoses, symptom analysis, and preventive advice without needing in-person visits.
The problem it solves: Billions lack timely medical access due to geography, cost, or shortages. Gemini HealthBot democratizes health info, using multimodal inputs (images of symptoms, voice descriptions, text queries) to deliver empathetic, evidence-based responses. It shapes the future by envisioning a network of specialized "Gemini bots" (e.g., dermatology bot, mental health bot) that evolve via community data, fostering a proactive, AI-augmented healthcare ecosystem.
Built as a web applet deployed on Cloud Run, it's user-friendly: users input symptoms via text, upload photos/videos of issues (e.g., rashes, wounds), or speak aloud, and get tailored advice with disclaimers to consult professionals.
Demo
Deployed applet: https://gemini-healthbot-2025.run.app (Hosted on Google Cloud Run for easy access).
Here's a quick demo video showcasing the bot in action (2-minute walkthrough): Watch on YouTube.
Screenshots:
Home Interface: Clean chat UI with options for text, image upload, audio recording. (Image: User greets bot, selects "Symptom Check".)
Multimodal Input: User uploads a photo of a skin rash, types "Itchy red spots on arm," and records audio describing onset. (Image: Upload modal with image preview and waveform for audio.)
Output Response: Bot analyzes: "Based on the image, this resembles eczema. Audio suggests allergy trigger. Recommendations: Moisturize, avoid irritants. Confidence: 85% (self-verified via medical sources)." Includes visual summary chart. (Image: Response card with image annotation and advice list.)
Feedback Loop: Post-consult, user rates accuracy; bot logs for self-improvement. (Image: Thumbs up/down buttons.)
Since I used Gemini 2.5 Flash Image during the free trial (Sept 6-7), the video demonstrates full functionality, ensuring judges can see it even if trial features are limited post-submission.
How I Used Google AI Studio
I used Google AI Studio to prototype and deploy the entire applet rapidly, starting from prompt engineering in the studio's interface. I leveraged Gemini 2.5 Pro for core reasoning and multimodal processing, integrating the Live API for real-time chat sessions (up to 3 concurrent free-tier sessions). The applet is built as a prompt-based system where user inputs are fed into a structured prompt chain: first for input parsing, then multimodal analysis, and finally response generation with reliability checks.
Deployment was seamless via Cloud Run integration directly from AI Studio—no custom code needed beyond prompt tuning. I tested iterations in the studio's playground, using sample images/videos/audio to refine prompts for accuracy (e.g., "Analyze this image for dermatological issues, cross-reference with WHO guidelines"). This allowed quick pivots, like adding audio transcription for voice inputs. Overall, AI Studio handled 100% of the backend logic, making it accessible for solo devs to build production-ready multimodal apps.
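
The three-stage prompt chain described above can be sketched as follows. This is a hedged illustration rather than the applet's actual prompts: `run_prompt_chain` is a hypothetical function, the prompt wording is invented for the example, and `call_model` stands in for whatever function sends a prompt to Gemini and returns its text.

```python
from typing import Callable


def run_prompt_chain(user_input: str, call_model: Callable[[str], str]) -> dict:
    """Run three prompts in sequence, feeding each output into the next."""
    # Stage 1: input parsing.
    parsed = call_model(
        "Step 1 (input parsing): list the symptoms, duration, and any "
        f"attachments mentioned in this message.\n\n{user_input}"
    )
    # Stage 2: analysis of the parsed details.
    analysis = call_model(
        "Step 2 (analysis): given these parsed details, describe the "
        f"possible explanations and what evidence supports each.\n\n{parsed}"
    )
    # Stage 3: response generation with a reliability check.
    response = call_model(
        "Step 3 (response with reliability check): write advice for the "
        "user, cite the source behind each claim, and remind them to "
        f"consult a professional.\n\n{analysis}"
    )
    return {"parsed": parsed, "analysis": analysis, "response": response}
```

Keeping the intermediate outputs in the returned dictionary makes it easier to see which stage went wrong when a response looks off.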
Multimodal Features
The applet shines with Gemini's multimodal capabilities, enhancing UX by making consultations intuitive and comprehensive—like talking to a real doctor via phone/video.
Image Understanding (Gemini 2.5 Pro/Flash): Users upload photos of visible symptoms (e.g., skin conditions, injuries). The bot describes and diagnoses (e.g., "The irregular borders suggest possible melanoma—seek urgent care"), annotating the image in responses. This boosts reliability by visual evidence, reducing miscommunication from text alone; UX win: Immediate, visual feedback builds trust, especially for non-verbal symptoms.
Audio Processing (Live API): Voice inputs for describing symptoms (e.g., "I've had chest pain for two days"). Gemini transcribes and analyzes tone/stress for emotional context (e.g., detecting anxiety). Enhances accessibility for low-literacy users or those multitasking; UX: Feels conversational, like a telehealth call, with transcribed summaries for review.
Combined Modalities: Prompts fuse inputs (e.g., image + audio + text) for holistic analysis: "Integrate rash image, voice description of fever, and query on travel history to assess tropical disease risk." Self-reliability via prompt-enforced citation (e.g., "Based on CDC data...") and feedback (users flag errors, bot adjusts future prompts).
These features create an empathetic, future-shaping experience: The bot isn't just reactive but proactive (e.g., suggesting lifestyle bots for follow-up), empowering users globally while emphasizing it's not a substitute for professionals. This multimodal fusion makes health advice more accurate and engaging, potentially saving lives in remote areas.

YouTube agent (filterbot)

Maani K — Sun, 19 Jan 2025 17:56:40 +0000

This is a submission for the Agent.ai Challenge: Full-Stack Agent

Project Report: Personalized Content Filter and Ad Blocker for YouTube

Table of Contents

1. Introduction
2. Objective
3. System Architecture
4. Implementation Details
   4.1 User Survey
   4.2 Agent.ai Webhook Integration
   4.3 Python Backend Logic
   4.4 Browser Extension
5. Testing and Deployment
6. Conclusion
7. Future Enhancements
8. Appendices (Code Snippets)

Introduction

Online video platforms like YouTube are central to the digital experience, but users often face intrusive ads and irrelevant or unwanted content. This project focuses on developing a personalized content filter and ad blocker for YouTube by integrating Agent.ai webhooks and Python for backend logic. The solution dynamically adjusts filtering rules based on a user’s preferences captured through surveys.

Objective

The primary goal is to build a user-specific content filtering system with the following functionalities:

Block YouTube ads dynamically.

Filter videos based on a user’s personalized preferences.

Provide real-time updates to filtering rules via backend integration with a browser extension.

System Architecture

The project consists of five core components:

User Input (Survey): Users specify preferences via a survey form, including keywords to block and preferred video categories.
Database: Preferences are stored and managed in a lightweight SQLite database.
Agent.ai Webhooks: Triggers backend actions based on user interaction.
Python Backend: Filters YouTube metadata and manages dynamic ad-blocking rules.
Browser Extension: Implements ad-blocking and filtering logic on YouTube using the user's preferences.

Implementation Details

4.1 User Survey

A simple web-based survey captures user preferences for content filtering.

Backend Code (Flask):

from flask import Flask, request, jsonify
import sqlite3

app = Flask(__name__)

def init_db():
    # Create the preferences table on first run.
    conn = sqlite3.connect("preferences.db")
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS preferences (
            user_id TEXT PRIMARY KEY,
            keywords_to_block TEXT,
            preferred_categories TEXT
        )
    """)
    conn.commit()
    conn.close()

@app.route('/survey', methods=['POST'])
def save_preferences():
    # Lists are stored as comma-separated strings, so individual
    # keywords and categories must not contain commas.
    user_id = request.json['user_id']
    keywords_to_block = ','.join(request.json['keywords_to_block'])
    preferred_categories = ','.join(request.json['preferred_categories'])

    conn = sqlite3.connect("preferences.db")
    cursor = conn.cursor()
    cursor.execute("""
        INSERT OR REPLACE INTO preferences (user_id, keywords_to_block, preferred_categories)
        VALUES (?, ?, ?)
    """, (user_id, keywords_to_block, preferred_categories))
    conn.commit()
    conn.close()
    return jsonify({"message": "Preferences saved successfully!"})

if __name__ == '__main__':
    init_db()
    app.run(debug=True)
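The comma-joined columns above break if a keyword itself contains a comma. As a minimal sketch of one alternative (not the project's actual schema), the lists could be validated and stored as JSON text. The helper names `normalize_list`, `save_preferences_json`, and `load_preferences` are hypothetical, and the example uses an in-memory database so it runs on its own.

```python
import json
import sqlite3


def normalize_list(values):
    """Strip whitespace, drop empty entries, and de-duplicate case-insensitively."""
    if not isinstance(values, list):
        raise ValueError("expected a list of strings")
    seen, cleaned = set(), []
    for value in values:
        if not isinstance(value, str):
            raise ValueError("expected a list of strings")
        item = value.strip()
        if item and item.lower() not in seen:
            seen.add(item.lower())
            cleaned.append(item)
    return cleaned


def init_db(conn):
    # Same table layout as the survey backend; only the encoding of the lists differs.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS preferences (
            user_id TEXT PRIMARY KEY,
            keywords_to_block TEXT,
            preferred_categories TEXT
        )
    """)


def save_preferences_json(conn, user_id, keywords, categories):
    # Store each list as a JSON array so commas inside a keyword survive.
    conn.execute(
        "INSERT OR REPLACE INTO preferences VALUES (?, ?, ?)",
        (user_id, json.dumps(normalize_list(keywords)), json.dumps(normalize_list(categories))),
    )
    conn.commit()


def load_preferences(conn, user_id):
    row = conn.execute(
        "SELECT keywords_to_block, preferred_categories FROM preferences WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    return {"keywords_to_block": json.loads(row[0]), "preferred_categories": json.loads(row[1])}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    save_preferences_json(conn, "u1", [" crypto ", "giveaway, free", ""], ["Science"])
    print(load_preferences(conn, "u1"))
```

The trade-off is that any code reading the table has to decode JSON rather than split on commas.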

4.2 Agent.ai Webhook Integration

Agent.ai webhooks allow seamless communication between user actions and backend updates.

Webhook Handler:

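import requests

AGENT_AI_WEBHOOK = "https://api.agent.ai/webhook"

def send_to_agent_ai(user_id, action, metadata):
    # Forward a user action to the Agent.ai webhook and return its JSON reply.
    payload = {
        "user_id": user_id,
        "action": action,
        "metadata": metadata
    }
    response = requests.post(AGENT_AI_WEBHOOK, json=payload, timeout=10)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()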
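Webhook calls can fail transiently, so a caller may want to retry with a backoff. The sketch below is one way to wrap such a call. `post_with_retry` and its parameters are hypothetical, and the transport is passed in as a plain function so the logic can be exercised without a network or a real Agent.ai endpoint.

```python
import time


def post_with_retry(post, payload, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call post(payload) until it succeeds or attempts are exhausted.

    `post` is any callable that returns a parsed response or raises an
    exception on failure (for example a thin wrapper around requests.post).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return post(payload)
        except Exception as error:  # In real code, catch the transport's own error type
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error


if __name__ == "__main__":
    calls = []

    def flaky_post(payload):
        # Hypothetical transport that fails twice, then succeeds.
        calls.append(payload)
        if len(calls) < 3:
            raise ConnectionError("temporary failure")
        return {"status": "ok"}

    payload = {"user_id": "u1", "action": "survey_submitted", "metadata": {}}
    print(post_with_retry(flaky_post, payload, sleep=lambda seconds: None), len(calls))
```

Injecting `sleep` keeps the example fast to run; production code would leave the default in place.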

4.3 Python Backend Logic

The backend fetches user preferences and applies them to filter YouTube metadata.

Filtering Logic:

import sqlite3
import re

def filter_videos(user_id, video_metadata):
    # Return True if the video should be blocked for this user.
    conn = sqlite3.connect("preferences.db")
    cursor = conn.cursor()
    cursor.execute("SELECT keywords_to_block FROM preferences WHERE user_id = ?", (user_id,))
    result = cursor.fetchone()
    conn.close()

    if not result or not result[0]:
        return False  # No stored keywords: allow video

    keywords_to_block = result[0].split(',')
    for keyword in keywords_to_block:
        keyword = keyword.strip()
        # Skip empty entries and escape the keyword so it is matched literally
        if keyword and re.search(re.escape(keyword), video_metadata['title'], re.IGNORECASE):
            return True  # Block video

    return False  # Allow video
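Substring matching has a side effect: a keyword such as `ad` also matches titles containing `download`. As a hedged sketch of one refinement (an assumption, not the project's implementation), keywords can be matched as whole words. `should_block` is a hypothetical helper that takes the keywords directly, so it can be tried on sample metadata without a database.

```python
import re


def should_block(title, keywords):
    """Return True if any keyword appears in the title as a whole word or phrase."""
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # Ignore empty entries such as those left by a trailing comma
        # (?<!\w) and (?!\w) require a non-word character (or the string edge)
        # on both sides, so "ad" does not match inside "download".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, title, re.IGNORECASE):
            return True
    return False


if __name__ == "__main__":
    sample_videos = [
        {"title": "How to download datasets with Python"},
        {"title": "This AD will change your life"},
        {"title": "Crypto giveaway!!!"},
    ]
    for video in sample_videos:
        print(video["title"], "->", should_block(video["title"], ["ad", "crypto giveaway"]))
```

Lookarounds are used instead of `\b` so that keywords ending in punctuation, such as `c++`, still match.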

4.4 Browser Extension

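The browser extension intercepts requests and blocks ads or unwanted videos. In Chrome, Manifest V3 allows blocking webRequest listeners only for policy-installed extensions; other extensions must use the declarativeNetRequest API instead.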

manifest.json:

{
  "manifest_version": 3,
  "name": "YouTube Content Filter",
  "version": "1.0",
  "permissions": ["webRequest", "webRequestBlocking", "storage"],
  "host_permissions": ["*://*.youtube.com/*"],
  "background": {
    "service_worker": "background.js"
  }
}

background.js:

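chrome.webRequest.onBeforeRequest.addListener(
  function (details) {
    // Cancel any request whose URL contains a blocked keyword.
    // Plain substring matching is coarse: "ad" also matches URLs
    // containing words such as "upload" or "download".
    const blockedKeywords = ["ad", "sponsored"];
    for (const keyword of blockedKeywords) {
      if (details.url.includes(keyword)) {
        return { cancel: true };
      }
    }
    return { cancel: false };
  },
  { urls: ["*://*.youtube.com/*"] },
  ["blocking"]
);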
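In Chrome, Manifest V3 extensions generally express blocking through declarativeNetRequest rules rather than a blocking listener. As a rough sketch of how the Python backend might produce such rules for the extension to load, the hypothetical `build_block_rules` helper below turns URL fragments into rule objects. The fragments shown are placeholders, not a vetted ad-blocking list, and how the rules reach the extension is left open.

```python
import json


def build_block_rules(url_fragments, domain="youtube.com", start_id=1):
    """Build declarativeNetRequest-style block rules, one per URL fragment."""
    rules = []
    for offset, fragment in enumerate(url_fragments):
        rules.append({
            "id": start_id + offset,  # Rule IDs must be unique positive integers
            "priority": 1,
            "action": {"type": "block"},
            "condition": {
                # "||" anchors the filter to the domain and its subdomains
                "urlFilter": f"||{domain}{fragment}",
                "resourceTypes": ["xmlhttprequest", "script", "image", "media"],
            },
        })
    return rules


if __name__ == "__main__":
    # Placeholder fragments for illustration only.
    rules = build_block_rules(["/example_ads/", "/example_sponsored/"])
    print(json.dumps(rules, indent=2))
```

Anchoring each rule to a path fragment is narrower than the substring check in background.js, which reduces accidental matches.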

Testing and Deployment

5.1 Testing

Tested the Python backend for real-time filtering using sample YouTube metadata.

Verified that the browser extension blocks ads and filters unwanted content based on preferences.

5.2 Deployment

Hosted the backend using Flask and Socket.IO on a cloud service (e.g., AWS or Heroku).

Published the browser extension for Chrome and Firefox.

Conclusion

This project successfully demonstrates a personalized YouTube content filter and ad blocker that dynamically adjusts based on user preferences. It integrates Python backend logic with Agent.ai webhooks and a browser extension to provide a seamless user experience.

Future Enhancements

Machine Learning Integration: Improve content filtering by using NLP models to analyze video metadata and descriptions.

Cross-Platform Support: Extend support to mobile applications.

User Analytics: Provide insights into viewing patterns and filtering effectiveness.

Appendices (Code Snippets)

Database Initialization

Socket.IO Integration

from flask import Flask, request
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

@app.route('/update_filters', methods=['POST'])
def update_filters():
    # Broadcast the new filter rules to every connected client.
    data = request.json
    socketio.emit('update', data)
    return {"message": "Filters updated!"}

This report outlines a scalable and extensible solution for personalizing content filtering and ad blocking on YouTube.

Grok sCore

Maani K — Sun, 19 Jan 2025 17:46:23 +0000

*This is a submission for the GitHub Copilot Challenge (https://dev.to/challenges/github): New Beginnings*

What I Built

Code that works for AI like Grok.

Demo

Here's a very basic structure, not intended for actual implementation:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def automate_x():
    # Initialize the web driver (you need to have ChromeDriver installed)
    driver = webdriver.Chrome()
    driver.get("https://x.com/login")  # Navigate to X's login page

    # Login to X - This part would involve filling in credentials, which is not secure to automate
    # For demonstration:
    # driver.find_element(By.ID, "username").send_keys("your_username")
    # driver.find_element(By.ID, "password").send_keys("your_password")
    # driver.find_element(By.XPATH, "//button[text()='Log in']").click()

    # Post, like, or comment logic would go here
    for day in range(30):
        for i in range(10):  # Example: Post 10 times a day
            # Post something insightful or engaging
            # driver.find_element(By.XPATH, "//textarea[@placeholder=\"What's happening?\"]").send_keys("Insightful post about tech or space #Grok")
            # driver.find_element(By.XPATH, "//div[@data-testid='tweetButton']").click()

            # Like posts or comment - this would involve searching for relevant posts to interact with
            # time.sleep(300)  # Wait between actions to mimic human behavior
            pass  # Placeholder so the commented-out loop body is valid Python

        time.sleep(86400)  # Wait for a day

    driver.quit()

This code is a conceptual sketch and should not be used as-is for many reasons including security and ethical concerns.

Copilot Experience

Copilot helped me finalize the code throughout the development process, using prompts, edits, chat, autocomplete, and the model switcher.
