DEV Community

Kaushikcoderpy

🧠 NeuroDoc: From Broken Prototype to Production-Ready Async AI Documentation Engine

GitHub “Finish-Up-A-Thon” Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge

I abandoned this project. Then I resurrected it. Here's how a fragile CLI script became a full-stack async web dashboard with RAG capabilities.

System Execution Timeline

| Time Interval | Activity Description |
| --- | --- |
| 00:00 - 00:15 | Code Runner demo execution |
| 00:15 - 00:35 | System command execution sequence |
| 00:35 - 00:45 | Dependency analysis performed on aiohttp |
| 00:45 - 00:55 | RAG (Retrieval-Augmented Generation) task initiated |
| 00:55 | System telemetry dashboard display |
| 00:55 - 01:20 | Waiting for RAG results |
| 01:21 - 01:28 | Backend log visualization |
| 01:29 - 01:41 | RAG query results display |

🧨 The Problem: Why It Was Abandoned

NeuroDoc started as an ambitious idea: a single tool to fetch, scrape, process, and summarize documentation across Python, scikit-learn, PyTorch, and TensorFlow, powered by NLP and multi-core processing.

But it hit a wall fast.

# The villain: a blocking synchronous loop that froze everything
while True:
    query = input("Enter query: ")  # 🚫 BLOCKS the main thread
    result = fetch_docs(query)      # 🚫 BLOCKS background workers
    print(result)

The original prototype had three fatal flaws:

| Problem | Impact |
| --- | --- |
| input() loop on main thread | Blocked all background scraping workers |
| In-memory task queue | All pending jobs vanished on crash |
| Brittle core resolver | Failed silently on dynamic imports |

Long-running doc crawls would stall. A single crash wiped the entire task queue. It was a house of cards: impressive from a distance, terrifying up close.

So I shelved it.


💡 The Comeback: What Changed

Months later, I came back with a clear head and a plan. The rewrite wasn't incremental; it was architectural. Three shifts made everything click:

1. 🔄 Full Async Rewrite with asyncio + aiohttp

Out went the blocking loop. In came a proper async event loop that lets scraping, processing, and serving happen concurrently without stepping on each other.

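import asyncio
import aiohttp

# DocResult and process_content are defined elsewhere in NeuroDoc
async def fetch_documentation(url: str, session: aiohttp.ClientSession) -> DocResult:
    # A per-request timeout keeps one slow page from stalling the whole batch
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
        content = await response.text()
        return await process_content(content)

async def run_pipeline(urls: list[str]) -> list[DocResult | BaseException]:
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_documentation(url, session) for url in urls]
        # return_exceptions=True: a failed fetch comes back as an exception object in the list
        return await asyncio.gather(*tasks, return_exceptions=True)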

No more frozen terminals. No more stalled workers.
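
Because `asyncio.gather(..., return_exceptions=True)` hands failures back as exception objects mixed in with the successes, the caller still has to separate the two. A minimal sketch of that step, where `fake_fetch` and `split_results` are hypothetical names and `fake_fetch` stands in for `fetch_documentation` so the example runs without network access:

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for fetch_documentation; fails for one URL."""
    await asyncio.sleep(0.01)  # simulate network latency
    if "missing" in url:
        raise ValueError(f"could not fetch {url}")
    return f"content of {url}"


def split_results(urls, results):
    """Separate successful results from exceptions, keyed by URL."""
    ok, failed = {}, {}
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            failed[url] = result
        else:
            ok[url] = result
    return ok, failed


async def run_all(urls):
    # gather preserves input order, so results line up with urls
    results = await asyncio.gather(
        *(fake_fetch(u) for u in urls), return_exceptions=True
    )
    return split_results(urls, results)


urls = ["docs/os", "docs/missing", "docs/json"]
ok, failed = asyncio.run(run_all(urls))
```

Keeping the failures keyed by URL makes it straightforward to log them or re-enqueue them without discarding the pages that did load.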


2. 🗄️ Database-Backed Task Queue (Goodbye, Volatile Memory)

The in-memory queue was replaced with a persistent, database-backed task queue using PostgreSQL and asyncpg. Now if the server crashes at 3 AM while crawling PyTorch docs, no work is lost. Tasks resume exactly where they left off.

import uuid
from datetime import datetime

class TaskQueue:
    # self.db is assumed to be an asyncpg pool or connection
    async def enqueue(self, task: DocumentationTask) -> str:
        task_id = str(uuid.uuid4())
        # asyncpg takes query arguments positionally, matching $1..$4
        await self.db.execute(
            "INSERT INTO tasks (id, status, payload, created_at) VALUES ($1, $2, $3, $4)",
            task_id, TaskStatus.PENDING, task.to_json(), datetime.utcnow()
        )
        return task_id

    async def get_next(self) -> DocumentationTask | None:
        # fetchrow returns the first matching row, or None when nothing is pending
        row = await self.db.fetchrow(
            "SELECT * FROM tasks WHERE status = 'pending' ORDER BY created_at LIMIT 1"
        )
        return DocumentationTask.from_row(row) if row else None
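
The crash-recovery idea can be sketched without a running PostgreSQL server. The example below uses the standard-library `sqlite3` module purely as a stand-in so it runs anywhere. The `claim_next` and `recover` helpers and the `running` status are assumptions for illustration, not NeuroDoc's actual schema. A task is marked `running` when a worker claims it, and on restart anything still marked `running` is put back to `pending`:

```python
import sqlite3
import uuid


def init(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id TEXT PRIMARY KEY, status TEXT, payload TEXT)"
    )


def enqueue(db: sqlite3.Connection, payload: str) -> str:
    task_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO tasks (id, status, payload) VALUES (?, 'pending', ?)",
            (task_id, payload),
        )
    return task_id


def claim_next(db: sqlite3.Connection):
    """Mark the oldest pending task as running and return its payload."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM tasks WHERE status = 'pending' "
            "ORDER BY rowid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
        return row[1]


def recover(db: sqlite3.Connection) -> int:
    """After a restart, requeue tasks that were mid-flight when the process died."""
    with db:
        cur = db.execute(
            "UPDATE tasks SET status = 'pending' WHERE status = 'running'"
        )
        return cur.rowcount


db = sqlite3.connect(":memory:")
init(db)
enqueue(db, "crawl pytorch docs")
enqueue(db, "crawl sklearn docs")
first = claim_next(db)    # a worker picks up the oldest task...
requeued = recover(db)    # ...then the process "crashes" and restarts
again = claim_next(db)    # the interrupted task is handed out again
```

With several workers on PostgreSQL, the select-then-update in `claim_next` would need to be atomic; `SELECT ... FOR UPDATE SKIP LOCKED` is one common way to do that.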

3. 🔍 RAG: Retrieval-Augmented Generation Layer

This is where NeuroDoc levels up from "scraper" to "intelligent documentation assistant."

Instead of returning raw docs, it:

  1. Chunks scraped content into semantic segments
  2. Embeds them into a vector store
  3. Retrieves the most relevant chunks for a query
  4. Generates a focused, context-aware summary

class RAGPipeline:
    async def query(self, user_query: str) -> RAGResponse:
        # Step 1: Embed the query
        query_embedding = await self.embedder.embed(user_query)

        # Step 2: Retrieve top-k relevant chunks
        relevant_chunks = await self.vector_store.similarity_search(
            query_embedding, top_k=5
        )

        # Step 3: Generate grounded summary
        context = "\n\n".join(chunk.text for chunk in relevant_chunks)
        summary = await self.llm.generate(
            prompt=f"Answer based on this documentation:\n{context}\n\nQuery: {user_query}"
        )

        return RAGResponse(summary=summary, sources=relevant_chunks)
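
The chunk, embed, and retrieve steps can be illustrated with a toy example. Everything here is a stand-in chosen so the sketch runs on the standard library alone: `embed` is a word-count vector rather than a learned embedding, and `chunk` and `top_k_chunks` are hypothetical helpers, not NeuroDoc's `vector_store.similarity_search`.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Split text into fixed-size word windows (a crude form of chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "os.path.join combines path components using the platform separator",
    "asyncio.gather runs awaitables concurrently and collects their results",
    "json.dumps serializes a Python object to a JSON formatted string",
]
best = top_k_chunks("how do I join path components", chunks, k=1)
```

In the real pipeline the selected chunks would then be joined into the prompt context, as in `RAGPipeline.query`.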

🏗️ Architecture Overview

┌──────────────────────────────────────────────────────┐
│                   Web Dashboard (FastAPI)            │
│              ┌──────────┬──────────────┐             │
│              │  Submit  │   Results    │             │
│              │  Query   │   Viewer     │             │
│              └────┬─────┴──────┬───────┘             │
└───────────────────┼────────────┼─────────────────────┘
                    │            │
          ┌─────────▼────────────▼───────────┐
          │       Async Task Dispatcher      │
          │    (asyncio + DB task queue)     │
          └──────┬──────────────────┬────────┘
                 │                  │
    ┌────────────▼────┐    ┌────────▼─────────────┐
    │  Multi-core     │    │   RAG Pipeline       │
    │  Doc Scraper    │    │  (Embed → Retrieve   │
    │  (aiohttp)      │    │   → Generate)        │
    └────────┬────────┘    └────────┬─────────────┘
             │                      │
    ┌────────▼──────────────────────▼─────────────┐
    │                PostgreSQL DB                │
    │   (tasks · chunks · embeddings · results)   │
    └─────────────────────────────────────────────┘

📦 Supported Documentation Sources

| Library | Sections Scraped | NLP Processing |
| --- | --- | --- |
| 🐍 Python | stdlib, builtins, language ref | Code extraction, summaries |
| 🤖 scikit-learn | API reference, user guide | Table parsing, param docs |
| 🔥 PyTorch | Tensor ops, nn, autograd | Code snippets, examples |
| 🌊 TensorFlow | Keras, tf.data, layers | API signatures, guides |

🚀 Getting Started

# Clone the repo
git clone https://github.com/kaushikcoderpy1/neurodoc
cd neurodoc

# Install dependencies
pip install -r requirements.txt

# Initialize the database
python -m neurodoc.db init

# Start the async dashboard
uvicorn neurodoc.app:app --reload --port 8000

Then open http://localhost:8000 and start querying.


🧪 Key Technical Decisions, and Why

Why asyncio over threading?
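Doc scraping is I/O-bound (waiting on HTTP). asyncio handles thousands of concurrent requests on a single thread, so there is no GIL contention, and task switches happen only at await points, which removes most of the data races that threaded code has to guard against.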
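
A small sketch of the single-thread claim, with `asyncio.sleep` standing in for the wait on an HTTP response. `simulated_request` is a hypothetical helper; the point is only that a thousand overlapping waits are all serviced by one thread:

```python
import asyncio
import threading


async def simulated_request(i: int, seen_threads: set) -> int:
    seen_threads.add(threading.get_ident())  # record which thread ran this task
    await asyncio.sleep(0.01)                # stand-in for waiting on an HTTP response
    return i


async def main(n: int = 1000):
    seen_threads: set = set()
    # All n coroutines are in flight at once; gather returns results in input order
    results = await asyncio.gather(
        *(simulated_request(i, seen_threads) for i in range(n))
    )
    return results, seen_threads


results, seen_threads = asyncio.run(main())
```

Real scraping would usually cap the number of in-flight requests as well, for example with an `asyncio.Semaphore`, to stay polite to the documentation servers.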

Why PostgreSQL for the task queue instead of Redis?
No extra infrastructure. NeuroDoc is a dev tool, and it already keeps chunks, embeddings, and results in PostgreSQL, so adding a Redis dependency just to persist a queue adds friction. PostgreSQL's row-level locking handles concurrent reads and writes cleanly for this use case.

Why RAG over fine-tuning?
Documentation changes constantly. RAG retrieves from live-scraped content. A fine-tuned model would be stale in weeks.



🤖 How GitHub Copilot Saved NeuroDoc: 4 Critical Bugs It Helped Crush

This section is the heart of the comeback story. NeuroDoc didn't just get rewritten; it got debugged at a deep architectural level with Copilot as a true pair programmer. Here are four real, production-blocking bugs it helped resolve.


🐛 Bug 1: Async Database Connection Pool Leaks Under Multi-Core Batches

The failure: Under high-concurrency loads via asyncio.gather, edge-case exceptions inside sub-coroutines bypassed connection release hooks, leaving asyncpg pool sockets exhausted and the app hanging silently.

Standard try/finally cleanup blocks failed because they referenced stale async contexts. The pool hit max capacity and froze.

How Copilot helped:

Copilot introduced a strict connection acquisition pattern bound directly to local transaction lifecycles, with absolute timeout guards:

# Copilot-suggested acquisition pattern
async with pool.acquire() as connection:
    async with connection.transaction():
        result = await asyncio.wait_for(
            connection.fetch(query, *args),
            timeout=5.0  # Hard boundary: no silent hangs
        )

It also added global exception wrappers that translate raw driver errors into clean structured responses, guaranteeing connection cleanup even if the downstream scraping pipeline crashed.
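
To show why binding the connection to an `async with` block prevents leaks, here is a toy model that needs no database. `ToyPool` is a hypothetical stand-in for a real pool, not asyncpg itself. Its `acquire` context manager releases the slot in a `finally` clause, so the slot comes back whether the body succeeds, raises, or is cancelled by `asyncio.wait_for`:

```python
import asyncio
from contextlib import asynccontextmanager


class ToyPool:
    """Hypothetical stand-in for a fixed-size connection pool."""

    def __init__(self, size: int) -> None:
        self._slots = asyncio.Semaphore(size)
        self.in_use = 0

    @asynccontextmanager
    async def acquire(self):
        await self._slots.acquire()
        self.in_use += 1
        try:
            yield self
        finally:
            # Runs on success, on exceptions, and on cancellation from wait_for
            self.in_use -= 1
            self._slots.release()


async def query(pool: ToyPool, delay: float, fail: bool = False) -> str:
    async with pool.acquire():
        await asyncio.sleep(delay)  # stand-in for connection.fetch(...)
        if fail:
            raise RuntimeError("driver error")
        return "ok"


async def main():
    pool = ToyPool(size=2)
    results = await asyncio.gather(
        query(pool, 0.01),                                 # succeeds
        query(pool, 0.01, fail=True),                      # raises inside the block
        asyncio.wait_for(query(pool, 1.0), timeout=0.05),  # times out mid-query
        return_exceptions=True,
    )
    return results, pool.in_use


results, in_use = asyncio.run(main())
```

After all three outcomes, `in_use` is back to zero, which is the property the leaking version lacked.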


🐛 Bug 2: SpecifierSet .contains() AttributeError Across Packaging Versions

The failure: formatter.py runs dependency diagnostics via DependencyAnalyzer. On environments with older packaging library versions, calling .contains() on a SpecifierSet threw:

AttributeError: 'SpecifierSet' object has no attribute 'contains'

This crashed the entire diagnostic panel before it could render, silently breaking environment validation for a large chunk of users.

How Copilot helped:

Copilot identified that .contains() is version-specific, but the native in operator is universally backward-compatible across all historical releases of packaging:

# ❌ Old failing code
elif not raw_spec.contains(local):

# ✅ Copilot's robust fix: works on every packaging version
elif local not in raw_spec:

One operator swap. Zero crashes across all environments.


🐛 Bug 3: Implicit String Mappings Breaking Single-Core CLI Dispatch

The failure: In neurodoc.py, CLI input like neurodoc fetch os passed the core ID "1" as a raw string into isinstance(core, Core1PythonBasics) checks. Since "1" is a string, every check silently fell through with:

Unknown core type for str

Worse, the topic "os" was passed into the batch resolver without list wrapping, so it iterated over the characters 'o' and 's' separately instead of treating "os" as a unified module name.

How Copilot helped:

Copilot introduced dynamic string dereferencing that maps string IDs back to their live handler instances, plus list-wrapping for topic encapsulation:

# Dynamic dereference: string → live core handler
if isinstance(core, str):
    core = self.command_handler.available_cores.get(core)

# Topic wrapped as list: no more character iteration
return await self.call_backend("core1", topics=[topic_f], flags=flags)
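
Both halves of this bug come from Python treating a string as an iterable of characters and as a perfectly valid argument to `isinstance`. A standalone sketch of the two guards, where `normalize_topics`, `resolve_core`, and the one-entry `registry` are hypothetical names rather than NeuroDoc's real API:

```python
from typing import Iterable, Union


def normalize_topics(topics: Union[str, Iterable[str]]) -> list[str]:
    """Treat a bare string as one topic instead of iterating its characters."""
    if isinstance(topics, str):
        return [topics]
    return list(topics)


def resolve_core(core, registry: dict):
    """Map a string ID such as "1" to its handler; pass handler objects through."""
    if isinstance(core, str):
        try:
            return registry[core]
        except KeyError:
            raise ValueError(f"Unknown core ID: {core!r}") from None
    return core


class Core1PythonBasics:  # placeholder handler for the demo
    pass


registry = {"1": Core1PythonBasics()}
handler = resolve_core("1", registry)
topics = normalize_topics("os")
```

The sketch raises on an unknown ID instead of returning `None` the way `.get()` does, so a typo in the CLI fails at the lookup rather than later in a type check.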

🐛 Bug 4: NLP Tensor Dimension Mismatch in Cross-Encoder Similarity

The failure: nlp_with_cos.py calculates semantic similarity across documentation topics using PyTorch/TensorFlow models. Queries of varying lengths produced tensors with mismatched dimensions, throwing:

RuntimeError: Tensors must be of the same shape

This crashed deep multi-core fetches completely, and those are the most expensive operation in the entire pipeline.

How Copilot helped:

Copilot suggested a preprocessing step using dynamic zero-padding and truncation to align all input vectors before the cosine similarity matrix calculation:

# Copilot's shape-alignment fix
inputs = tokenizer(
    text,
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

All tensors now enter the similarity layer at identical dimensions: no shape mismatches, no crashes.
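
The effect of padding and truncation can be seen without a tokenizer or a tensor library. In this sketch plain Python lists stand in for tensors, and `pad_or_truncate` and `cosine_similarity` are hypothetical helpers; it shows only the shape alignment, not what a real model does with the padded positions:

```python
import math


def pad_or_truncate(vec: list[float], length: int, pad_value: float = 0.0) -> list[float]:
    """Return a copy of vec with exactly `length` entries."""
    return vec[:length] + [pad_value] * max(0, length - len(vec))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


short = [1.0, 2.0]
long = [1.0, 2.0, 0.0, 0.0, 5.0]

a = pad_or_truncate(short, 4)  # zero-padded up to length 4
b = pad_or_truncate(long, 4)   # truncated down to length 4
similarity = cosine_similarity(a, b)
```

Calling `cosine_similarity(short, long)` directly raises, which mirrors the original shape error; aligning both inputs first makes the comparison well defined.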


💬 What Copilot Actually Felt Like as a Pair Programmer

These weren't simple autocomplete suggestions. Copilot reasoned about async lifecycle boundaries, cross-version API compatibility, type system edge cases, and linear algebra constraints: the kind of bugs that take hours of debugging to even locate, let alone fix.

The biggest unlock: it didn't just fix the symptom. For each bug, it explained why the original approach was fragile and offered a pattern that would hold up under production conditions.

That's the difference between a tool and a collaborator.


🔭 What's Next

  • [ ] Browser extension for one-click doc lookup
  • [ ] Streaming responses via WebSockets
  • [ ] Support for Hugging Face, Pandas, NumPy docs
  • [ ] Self-hosted embedding model (no API key required)
  • [ ] Export summaries as Jupyter notebooks

🔗 Links


Built for the DEV.to hackathon. Powered by stubbornness, async Python, and too much coffee.
