Asmae
Argus

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

Information overload is real. As someone who follows tech markets and industry news, I found myself drowning in browser tabs, losing track of sources, and spending hours trying to synthesize scattered data into something actionable. I wanted a system that could watch the web for me, and that's how the Web Intelligence Platform was born.

The platform is a full-stack, production-ready web monitoring and market intelligence system built with Python (FastAPI), React + TypeScript, MongoDB, Redis, and Docker Compose. Here's what it does:

  • Project Management: Organize monitoring targets into isolated workspaces (e.g., "Renewable Energy", "AI Startups", "Competitor Watch")
  • Autonomous Crawling: Discover and scrape web sources with anti-blocking mechanisms (Playwright fallback for JS-heavy pages, User-Agent rotation, adaptive rate limiting, and retry logic)
  • NLP Pipeline: Every scraped document goes through relevance scoring (TF-IDF / Cosine Similarity), Sentiment Analysis, and Named Entity Recognition (companies, people, locations)
  • Interactive Dashboard: Real-time crawling status, activity timelines, and aggregate metrics
  • AI Chatbot (RAG): Ask natural language questions about your scraped data and get cited, sourced answers
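
The relevance-scoring step in the NLP pipeline can be illustrated with a minimal, self-contained sketch. The real pipeline uses scikit-learn; the function names and smoothing here are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Compute simple TF-IDF weight vectors for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevance_score(query: str, docs: list[str]) -> list[float]:
    """Score each document against the query topic, 0.0 (unrelated) and up."""
    vectors = tfidf_vectors(docs + [query])
    query_vec, doc_vecs = vectors[-1], vectors[:-1]
    return [cosine(query_vec, v) for v in doc_vecs]
```

Documents scoring below a threshold are dropped before they ever reach storage, which keeps the downstream RAG context clean.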

Google Gemini powered two critical layers of this project:

1. Antigravity IDE (Gemini-powered development assistant)

I used Antigravity throughout the entire build, not just for autocomplete but for real architectural decisions. Its multi-file context awareness helped me design the RAG pipeline, debug complex async issues in FastAPI, write the Docker Compose orchestration for five interconnected services, and refine the NLP scoring logic. Having an AI that understands your whole codebase, not just the open file, was a genuine superpower.

2. Gemini 1.5 Pro API as the brain of the chatbot

The AI Assistant tab is powered by the Gemini API. When a user asks something like "What are the main risks mentioned across all sources this week?", the system retrieves the most relevant document chunks from MongoDB and feeds them to Gemini with strict instructions to cite sources:

```python
import os

import google.generativeai as genai

# Read the API key from the environment rather than hard-coding it
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

# Pull the top-5 most relevant document chunks from MongoDB
context = retrieve_relevant_chunks(user_query, top_k=5)

prompt = f"""
You are an analyst assistant. Answer the user's question using ONLY the context below.
Always cite your sources by document ID.

Context:
{context}

Question: {user_query}
"""

response = model.generate_content(prompt)
```

The key architectural decision was keeping Gemini focused on reasoning over retrieved context rather than raw web browsing. This keeps responses grounded and auditable.

Demo

🔗 GitHub Repository: https://github.com/AsamaeS/Web-Analytics_projet-AS

What I Learned

Async is unforgiving. The FastAPI + Motor (async MongoDB) stack is incredibly fast, but debugging race conditions in background workers was humbling. I learned to design task queues defensively: idempotent jobs, dead-letter queues, and explicit retry limits.
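
A minimal sketch of that defensive pattern (names are hypothetical; the real system runs this against Redis-backed queues):

```python
import json

MAX_RETRIES = 3

def process_job(job: dict, handler, dead_letter: list) -> bool:
    """Run a job idempotently, with an explicit retry limit and a dead-letter sink.

    The job dict carries its own retry count, so state survives worker restarts.
    Returns True if the job succeeded (or already had), False otherwise.
    """
    if job.get("done"):  # idempotency guard: never re-run finished work
        return True
    try:
        handler(job["payload"])
        job["done"] = True
        return True
    except Exception as exc:
        job["retries"] = job.get("retries", 0) + 1
        job["last_error"] = str(exc)
        if job["retries"] >= MAX_RETRIES:
            # Exhausted: park the job for inspection instead of retrying forever
            dead_letter.append(json.dumps(job))
        return False
```

The key point is that retry state lives in the job itself, not in worker memory, so a crashed worker can't silently reset the count.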

RAG quality is a retrieval problem, not a model problem. My first instinct was to upgrade the model whenever answers felt off. The real fix was almost always upstream: better chunking, cleaner text extraction, or smarter relevance filtering before anything reached Gemini. The model is only as good as what you send it.
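
Of those upstream fixes, better chunking had the biggest effect. A sketch of the overlapping-window approach (chunk sizes are illustrative):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    The overlap keeps sentences that straddle a boundary visible in both
    neighbouring chunks, so retrieval doesn't miss them.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Tuning `size` and `overlap` against real queries improved answer quality far more than any model change did.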

Playwright is your friend for modern web scraping. A surprising number of sites that look static are actually JS-rendered. Having Playwright as an automatic fallback, triggered when the lightweight scraper returns thin content, increased data yield significantly.
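
The fallback policy itself is simple. A sketch with the two fetchers injected as callables, so the heuristic stays testable without a browser (in the real system, `fast_fetch` is an httpx-based scraper and `browser_fetch` drives Playwright; both names are illustrative):

```python
def looks_thin(page_text: str, min_chars: int = 200) -> bool:
    """Heuristic: a page whose extracted text is very short was probably JS-rendered."""
    return len(page_text.strip()) < min_chars

def fetch_with_fallback(url: str, fast_fetch, browser_fetch) -> str:
    """Try the lightweight scraper first; fall back to a headless browser on thin content."""
    text = fast_fetch(url)
    if looks_thin(text):
        text = browser_fetch(url)  # render JS, then extract again
    return text
```

The threshold is crude, but it cheaply routes the expensive browser only to pages that need it.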

Scope creep is seductive. I had to cut several features mid-build (a graph visualization layer, Slack notifications, an auto-scheduling daemon) to ship something coherent. Knowing when to cut is as important as knowing what to build.

Documentation-first is faster, not slower. Writing the README and architecture docs before finishing the code forced me to confront design inconsistencies early, before they were baked in.

Google Gemini Feedback

What worked exceptionally well ✅

Long-context reasoning. Gemini 1.5 Pro's large context window was genuinely useful for RAG. I could feed it several long documents simultaneously and it synthesized across them coherently, something that required chunking tricks and multiple calls with smaller-context models.

Instruction-following fidelity. When I told the model to only answer from provided context and always cite document IDs, it respected that consistently. For a production RAG system where hallucinations destroy user trust, this reliability is critical and honestly underrated.

Antigravity IDE. The multi-file context awareness was the standout experience of the whole build. It didn't just help me write code; it helped me think through the system. When I described what I was building, it pushed back on architectural choices in ways that saved me from real design mistakes. It felt less like a smarter autocomplete and more like pairing with a senior engineer who had read all my code.

Where I ran into friction ⚠️

Rate limits during rapid iteration. In hackathon mode, where you're making dozens of test calls in quick succession, I hit limits more than expected. Better quota visibility and a more generous prototyping tier would make a real difference for builders under time pressure.

Latency on complex RAG prompts. With large contexts and multi-document prompts, response times were occasionally 3–5 seconds. For an interactive chatbot, this is noticeable. Streaming responses helped, but out-of-the-box latency could be improved.

Embeddings documentation. Setting up the retrieval side of RAG using Gemini's native embedding API felt less documented than the generative side. I had to piece together the workflow from multiple sources. A unified RAG quickstart covering embedding, retrieval, and generation end-to-end would save new users a couple of hours.
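
For reference, the retrieval side I eventually pieced together looks roughly like this. The embedding call is shown only as a comment (it needs a live API key, and the exact shape is my reading of the google-generativeai docs); the ranking step is plain cosine similarity:

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Indices of the k stored chunks whose embeddings are closest to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_sim(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Embedding side (assumed API shape, per the google-generativeai docs):
# import google.generativeai as genai
# vec = genai.embed_content(model="models/text-embedding-004",
#                           content=chunk,
#                           task_type="retrieval_document")["embedding"]
```

In production the document vectors live alongside the chunks in MongoDB, so retrieval is one query plus this ranking pass.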

The surprise 💡

I expected Gemini to be a better version of what I already knew. What I didn't expect was how much Antigravity changed my development workflow. Reasoning about the whole project, not just a file, is qualitatively different from any AI tool I'd used before. That shift in how I collaborated with AI during the build was the biggest takeaway of this project.


What's next: Scheduled crawling, sentiment-shift alerts, and Gemini multimodal ingestion to extract data from PDFs and charts, because most market intelligence doesn't live in clean HTML.

Stack: Python 3.11 · FastAPI · MongoDB · Redis · React 18 · TypeScript · TailwindCSS · Docker Compose · Google Gemini 1.5 Pro · spaCy · scikit-learn · Playwright
