DEV Community: manoj mallick

From 93 to 400 GitHub stars in 8 days — what the numbers actually mean

manoj mallick — Wed, 13 May 2026 11:30:49 +0000

I built SigMap in April. By May 5 it had 93 stars — growing slowly through organic
discovery. Then last week happened.

61K Reddit views. 90.5% upvote ratio. 307 new stars in 8 days.

SigMap is now in the top 1.5% of all coding AI repositories on GitHub.

Here's what it does and why the numbers moved.

The problem it solves

When you use Claude Code, Cursor, or Copilot on a large codebase, the tool
reads your open files or sends the whole project. On real repos, that's
80,000+ tokens per session. 13 of 18 repos I benchmarked overflow GPT-4o's
context window entirely.

The AI is working with an incomplete, random picture of your codebase.

How SigMap fixes it

Instead of sending full source files, SigMap:

Extracts function signatures + import graph from your entire codebase
Ranks them by TF-IDF against your specific query
Sends 200–4,000 query-ranked tokens instead of 80,000 random ones

Different question = different context. The AI sees what's relevant, not what's open.

The benchmark numbers

Tested across 18 repos, 90 tasks, no LLM API calls:

80.0% hit@5 vs 13.6% random baseline — 5.9× lift
96.8% average token reduction
52.2% task success vs 10% without SigMap
41.0% fewer prompts per task

The 405-repo large-scale token benchmark is on Zenodo (DOI: 10.5281/zenodo.19898842).
Peer-reviewed. Reproducible.

Why zero dependencies matters

Most context tools require a vector database, an embedding model, or cloud
infrastructure. SigMap uses Node.js stdlib only — crypto, fs, path, child_process.
Zero npm dependencies. You can audit every line.

This is a deliberate constraint. A tool that reduces your AI's dependencies
shouldn't add its own.

How to try it

npx sigmap
sigmap ask "how does the auth flow work"

That's it. Works offline. Works with any model.

MCP integration for Claude Code, Cursor, Copilot, Windsurf, Codex, Gemini CLI.

→ GitHub: https://github.com/manojmallick/sigmap
→ Benchmark: https://github.com/manojmallick/sigmap-benchmark-suite
→ Docs: https://manojmallick.github.io/sigmap

Built in Amsterdam. Open source. MIT licensed.

Complete SigMap Ecosystem

manoj mallick — Thu, 30 Apr 2026 01:17:26 +0000

This benchmark is part of the larger SigMap project:

SigMap Tool

Repository: github.com/manojmallick/sigmap

The production-ready context extraction tool that powers this benchmark.

Features:

Multi-language analysis (30+ languages)
5 benchmark modes
Real-time operation
Integration-ready API

SigMap Documentation

Website: manojmallick.github.io/sigmap

Complete guide to installation, usage, and integration.

Covers:

Setup and installation
API reference
Configuration options
Integration examples
Best practices

SigMap Benchmark Suite

Repository: github.com/manojmallick/sigmap-benchmark-suite

This project — large-scale empirical evaluation and datasets.

Includes:

405 repository analysis
Research papers
Complete datasets
Reproducibility scripts

Getting Started

Want to use SigMap?
→ Start with Documentation
Want to evaluate SigMap?
→ Download Benchmark Suite
Want to extend SigMap?
→ Check out Tool repository
Want to do research?
→ Use Benchmark Suite dataset

How I got 80% code retrieval accuracy without vectors, embeddings, or any ML

manoj mallick — Fri, 17 Apr 2026 15:57:48 +0000

I wanted to answer one question: how far can pure heuristic
retrieval go before you actually need embeddings?

The answer surprised me.

The problem with AI coding tools today

When you paste your codebase into an LLM, you're typically
sending 60,000–100,000 tokens of raw source code. Most of
that is noise — loop bodies, imports, boilerplate — that
never shows up in the answer.

The model reads the wrong file. Guesses the rest.
You retry 2–3 times.

The insight

Code identifiers are already the compressed representation.

parseToken(src: string, opts?: ParseOpts) → Token[]

That signature tells a retrieval system everything it needs
to decide "is this file relevant to my query?" The body adds
nothing for retrieval purposes.

Embedding that signature loses information — you're projecting
a precise vocabulary into a dense vector. Exact token match
keeps it.

How SigMap works

Walk the codebase, extract signatures per file using language-specific regex extractors (21 languages)
Build a signature index: Map
At query time, tokenize the query (camelCase/snake_case split, stop-word removal)
Score every file using stacked heuristics:

exact token match    +1.0
symbol name hit      +0.5
path token match     +0.8  
prefix match         +0.3
recency boost        ×1.5 multiplier

Return top-K files. Done.

No model inference. No API calls. Runs in ~200ms on a
50K file repo.

Benchmark results

Tested across 18 real open-source repos, 90 hand-labeled
(query → expected_file) tasks:

Metric	Result
Hit@5	80.0%
Random baseline	13.6%
Lift	5.8x
Token reduction	98.1%
Prompts per task	1.69 vs 2.84

The random baseline is 1 / avg_files_in_repo — it's what
you get by picking files at random. SigMap hits 5.8x that
with zero ML.

Why path match is underrated

Queries like "python extractor" or "retrieval ranker"
self-select by path before a single signature is checked.

src/extractors/python.js scores +0.8 just from
path match. Most well-structured repos are
src/feature/thing.js — this is essentially free recall.

Where it fails (~20% of tasks)

Implicit intent: "how does auth work" when auth functions are named validateSession, checkPermissions etc. — no keyword overlap
Synonyms: "authenticate" ≠ "login" unless both appear as identifiers
Multi-hop: "find where input gets validated before the DB" needs graph traversal, not single-file scoring

This is where embeddings earn their cost — on the hard 20%,
not the easy 80%.

Try it

npx sigmap

Generates a context file your IDE or LLM already knows
how to read. Works with Claude, Cursor, Copilot, Gemini,
Windsurf, OpenAI.

GitHub: https://github.com/manojmallick/sigmap

Curious what retrieval approaches others are using —
especially around multi-hop and the hybrid TF-IDF +
embeddings tradeoff.

I Built an AI Agent That Sits in Your Incident War Room and Writes DORA Compliance Reports in Real Time

manoj mallick — Mon, 16 Mar 2026 22:48:30 +0000

The 5 AM Problem

It's 3 AM. Your payment gateway is down. 73,000 customers can't transact. Your engineers are in a war room, screaming service names, error codes, and blast radius numbers at each other over a Zoom call.

You fix it by 4 AM. Good.

Now comes the other clock. Under EU DORA Article 11.1(a), you have 4 hours from incident classification to notify your competent authority. That means someone — usually the most senior compliance person — has to reconstruct everything that happened from memory, channel logs, and a half-dozen Grafana screenshots, and turn it into a structured regulatory report. By 5 AM. While still exhausted.

That report currently takes 4+ hours.

I built ARIA (Automated Regulatory Incident Analyst) to do it in under 8 minutes, live, while the incident is still happening.

What ARIA Does

ARIA joins the incident call as a silent agent. It:

Listens to every spoken word via the Gemini Live API — hearing service names (payment-gateway-v2), error codes (503, EXHAUSTED), and impact numbers (73,000 users, 7.3% failure rate)
Watches engineers' screens every 5 seconds — reading Grafana dashboards, kubectl output, alert panels
Builds the DORA Article 11 report in real time, section by section, as evidence comes in
Switches persona based on who's speaking — gives technical commands to engineers, cites exact regulatory clauses to compliance officers, and speaks plain business language to executives
Triggers a 4-hour countdown clock the moment the DORA threshold is crossed (>5% transaction failure rate)

By the time the incident is resolved, the compliance report is already written.

The Hard Part: Gemini Live API on AI Studio

This challenge had one brutal technical constraint: the Gemini Live API (bidiGenerateContent) is only available on native-audio models on AI Studio keys. Specifically: gemini-2.5-flash-native-audio-latest.

These models support real-time bidirectional audio streaming and produce inputTranscription of what participants say — but they cannot emit structured JSON text output directly. They're designed for voice-to-voice applications, not voice-to-JSON pipelines.

My first 8 attempts at the architecture failed:

Version	Model	Config	Result
v1	`gemini-2.0-flash`	`startChat()`	No Live API in old SDK
v2	`gemini-2.0-flash-live-001`	correct config	1008 — not found on AI Studio
v3–v4	various flash models	bidiGenerateContent	1008 — model not available
v5	native-audio	TEXT modality	1007 — "Cannot extract voices"
v6	native-audio	AUDIO+TEXT + systemInstruction	1007 — "Invalid argument"
v7	native-audio	AUDIO only, no systemInstruction	✅ Session stays open

The breakthrough was understanding that gemini-2.5-flash-native-audio-latest is a voice-to-voice model. It will reject TEXT modality and reject systemInstruction in the live config. You must give it responseModalities: ['AUDIO'] and nothing else.

The Hybrid Architecture

Gemini Live  (gemini-2.5-flash-native-audio-latest)
  responseModalities: ['AUDIO']   ← stays open
  → inputTranscription fires per speech turn
  → on turnComplete → transcript string

generateContent (gemini-2.5-flash)
  systemInstruction: ARIA_ANALYST_PROMPT
  responseMimeType: 'application/json'
  contents: [{ role: 'user', parts: [{ text: transcript }] }]
  → structured IncidentEvent JSON
  → Zod validation → Pub/Sub → 3 ADK agents → SSE → browser

Two models working together: the Live session handles the real-time audio stream and transcription, and a separate generateContent call handles the structured reasoning. Each does what it's actually good at.

// listenerAgent.js — the key insight
session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-latest',
  config: {
    responseModalities: ['AUDIO'],
    // No systemInstruction here — live model = transcription only
  },
  callbacks: {
    onmessage: (message) => {
      const sc = message.serverContent
      if (sc?.inputTranscription?.text) {
        transcriptBuffer += sc.inputTranscription.text
      }
      if (sc?.turnComplete && transcriptBuffer.trim()) {
        generateIncidentEvent(transcriptBuffer.trim(), incidentId)
        transcriptBuffer = ''
      }
    }
  }
})

The 5-Agent ADK Pipeline

Once a structured IncidentEvent JSON lands in Pub/Sub, three Google ADK agents process it sequentially:

Pub/Sub: incident-events
    │
    ▼
[Analyst Agent]       ← root cause, blast radius, severity classification
    │
    ▼  Pub/Sub: incident-analysis
    │
[Compliance Agent]    ← DORA Art. 11.1(a/b/c), SOX 404, notification deadlines
    │
    ▼  Pub/Sub: compliance-mappings
    │
[Reporter Agent]      ← generates 6 report sections, writes to Firestore, broadcasts via SSE

The Reporter Agent generates each section with a targeted prompt — Timeline, Blast Radius, Root Cause, Regulatory Obligations, Remediation, and Executive Summary — and broadcasts them live via SSE. Each section appears in the browser as it's generated, creating the "report building before your eyes" effect.

The Infrastructure

Everything is Terraform-provisioned on GCP:

Cloud Run — containerised Node.js 20, min-instances=1, CPU always-on (critical for WebSocket longevity)
Cloud Pub/Sub — 4 topics + 4 DLQ topics for the agent chain
Firestore — incident state and report sections
Cloud Build — CI/CD: git push main → build → deploy → new revision
Artifact Registry — Docker image store
Secret Manager — GEMINI_API_KEY never in plaintext env vars

One command to provision everything:

cd terraform
terraform apply -var="project_id=YOUR_PROJECT" -var="gemini_api_key=YOUR_KEY"

What a Real Incident Looks Like

You open ARIA, type the incident title, and click Start Incident. Then click Start Listening — the browser requests microphone permission and immediately begins streaming PCM audio at 16kHz to the server via WebSocket.

You say into your microphone:

"payment-gateway-v2 is throwing 503 errors, postgres connection pool is exhausted, 73,000 users affected, 7.3 percent transaction failure rate"

Seconds later:

The transcript card appears in the Live Transcript panel — raw quote + ARIA's spoken response
The DORA clock switches to orange and starts counting down from 4:00:00
The DORA Article 11 Report panel begins building — Timeline... Blast Radius... Root Cause... Regulatory Obligations (citing exact DORA Article 11.1(a) clause + notification deadline)... Remediation... Executive Summary
The persona badge in the top-right switches based on vocabulary — Engineer → Compliance → Executive

When a compliance officer asks "which DORA clause does this trigger?", ARIA responds with exact clause citations. When the CEO asks "what do I tell the board?", ARIA gives a business-impact summary with no jargon.

The Numbers

Metric	Before ARIA	With ARIA
Compliance report completion	4+ hours post-mortem	Under 8 minutes live
DORA notification prep	Manual, from memory	Automated, from real-time evidence
Multi-stakeholder communication	One message for all	Persona-adapted per audience
Audit trail	Channel logs + memory	Timestamped, structured, Firestore-persisted

Try It

Live demo: https://regguardian-908307939543.us-central1.run.app

Source: https://github.com/manojmallick/regguardian

The repo includes full Terraform IaC, a multi-stage Dockerfile, Zod schemas, and the complete ADK agent pipeline. The README has step-by-step deployment instructions.

What I Learned

The biggest lesson: the Gemini Live API is genuinely different from a text model with audio input. It's a voice-to-voice model designed for conversational agents. Trying to use it like a text model (adding TEXT modality, systemInstruction, structured output) breaks the session within milliseconds with a 1007.

The hybrid architecture — using the Live model purely for transcription and a separate generateContent call for reasoning — is the correct pattern for building agentic pipelines on top of the Live API. The Live session stays permanently open for as long as the incident lasts. The reasoning model gets invoked once per speech turn.

Built for the Gemini Live Agent Challenge 2026.

#GeminiLiveAgentChallenge #GoogleCloud #GeminiAPI #DORA #IncidentResponse #AI

From Red CI to Green PR — Automatically, Safely, and with Evidence

manoj mallick — Fri, 06 Feb 2026 21:22:38 +0000

This is a submission for the GitHub Copilot CLI Challenge.

What I Built

I built copilot-ci-doctor, a CLI tool that diagnoses and fixes GitHub Actions CI failures using GitHub Copilot CLI as its core reasoning engine.

Instead of manually digging through noisy logs and guessing fixes, the tool turns a failed CI run into a structured, evidence-based workflow:

failure → evidence → reasoning → safe fix → green CI → Pull Request

Given a failed workflow, copilot-ci-doctor:

Collects a tagged Evidence Bundle (repo metadata, failed jobs, logs, workflow YAML)
Uses GitHub Copilot CLI to reason about the failure
Explains why the CI failed in plain English
Generates minimal, safe patch diffs with confidence scores
Iteratively applies fixes until CI passes
Automatically opens a Pull Request against main

This is not log summarization or autocomplete.
Copilot is used as a reasoning engine that must justify its conclusions using evidence.

Demo

40-second end-to-end demo (recommended viewing):

https://www.youtube.com/watch?v=6w3kjiRh8as

👉 https://github.com/manojmallick/copilot-ci-doctor#-40-second-demo-end-to-end

One command → failing CI → Copilot reasoning → safe fixes → green CI → PR

npx copilot-ci-doctor demo

What the demo shows:

A demo repository is created with a deliberately broken GitHub Actions workflow
CI fails ❌
copilot-ci-doctor enters an automated loop:

analyzes the failure
explains the root cause
proposes a minimal patch
applies and pushes the fix
waits for CI to re-run
1. The process repeats (multiple iterations if needed)
2. CI turns green ✅
3. A Pull Request is automatically opened with the fix

The demo handles real GitHub latency and shows the full lifecycle, including:

multiple CI failures
diff previews
iteration scoreboard
final PR link

Source code and demo assets:
https://github.com/manojmallick/copilot-ci-doctor

npm package:
https://www.npmjs.com/package/copilot-ci-doctor

My Experience with GitHub Copilot CLI

This project fundamentally changed how I think about GitHub Copilot.

Instead of using Copilot to write code, I used GitHub Copilot CLI to reason about systems.

Copilot CLI is used to:

analyze CI evidence and form ranked hypotheses
explain failures in plain English (including why CI fails but local passes)
generate minimal unified diffs, not full rewrites
attach confidence scores and risk levels to each fix

To make this reliable:

Every Copilot response must follow a strict JSON contract
Every conclusion must reference evidence IDs (E1, E2, …)
Patch diffs are validated and normalized before being applied
A single-call mode combines analysis + explanation + patch to reduce token usage by ~60%

The result is a workflow where Copilot behaves less like an assistant and more like a careful, explainable CI engineer.

This challenge pushed me to think beyond autocomplete and explore how Copilot CLI can safely automate complex, real-world developer workflows.

From Prompts to Autonomous Ecosystems: My Learning Journey in the 5-Day Google x Kaggle AI Agents Intensive

manoj mallick — Wed, 03 Dec 2025 23:47:14 +0000

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections.

Over the last five days, I took the Google x Kaggle AI Agents Intensive Course — and what started as “learning how to prompt better” quickly expanded into a complete understanding of how real AI agents think, act, store memory, collaborate, and evaluate themselves.

What surprised me most is how each day built naturally on the previous one, almost like watching a simple idea grow into a full intelligent ecosystem.

Below is my journey — day by day — with real-life analogies that helped me internalize the concepts.

🌱 Day 1 — From Prompt to Action (1A) & Agent Architecture (1B)

“A prompt is not the end. It is the ignition.”

On Day 1, I realized something fundamental:

A prompt is not a request — it is an instruction chain starter.

The first lesson showed how:

prompts → goals
goals → decisions
decisions → actions

In real life, it felt like giving instructions to a personal assistant:

“Can you plan a birthday party for me?”

You don’t want a single answer —

you want:

venue suggestions
budget organisation
guest list management
timeline planning

This is what agents do.

They interpret the prompt as a multi-step workflow, not a single response.

🧠 1B — Agent Architecture (Expanded & Highlighted)

“If prompts are the spark, the architecture is the engine that makes an agent move.”

Agent Architecture was the first moment where I understood that an AI agent is not a chatbot.

It is a system composed of multiple interacting components — like a small intelligent organization.

🔹 The 4 Core Components of Modern Agent Architecture

1️⃣ Planner (the “brain”)

Interprets the request and converts it into structured steps.

A planner transforms vague human language → actionable plan.

2️⃣ Tools (the “hands and legs”)

Tools enable the agent to do things:

search
run code
query APIs
manipulate files
analyze data

Intelligence becomes action only when tools exist.

3️⃣ Memory (the “long-term knowledge”)

Stores:

user preferences
prior steps
facts
context

This is what separates an agent from a chatbot.

4️⃣ Evaluator (the “quality inspector”)

Checks for:

accuracy
safety
hallucinations
correctness of tool usage

An evaluator makes the agent self-aware and self-correcting.

🔸 The 3 Major Types of Agent Architectures

One thing I appreciated was understanding that there isn’t just one architecture.

Different designs fit different needs.

1️⃣ Reactive Agents (simple responders)

No planning
No long-term memory
Respond instantly Good for quick, rule-based answers.

2️⃣ Deliberative Agents (think → plan → act)

Multi-step reasoning
Tool usage
Self-correction These feel closest to intelligent assistants.

3️⃣ Hybrid Agents (the best of both worlds)

They can:

react quickly
plan deeply
remember patterns
use tools This is what most advanced production systems use today.

🧩 The Agent Loop

The architecture works through a continuous cycle:

Input → Plan → Use Tools → Observe → Update Memory → Evaluate → Repeat

This loop makes agents feel alive — adjusting strategies dynamically until the task is complete.

By the end of Day 1, I found myself thinking less about “better prompts” and more about

how to architect intelligent systems with components that think, act, remember, and evaluate.

🧰 Day 2 — Agent Tools (2A) & Best Practices (2B)

“An agent without tools is a smart person with no hands.”

Tools turn agents into doers.

Examples:

search APIs
code execution
file operations
data extraction

Real-life analogy:

If Day 1 built the “brain,”

Day 2 gave the assistant a laptop, a phone, and the internet.

Best Practices

Key insights I took away:

Give tools only when needed
Define strict input/output formats
Test tools independently
Sandbox anything that could cause errors

Tools aren’t features—they are responsibilities.

🧭 Day 3 — Sessions (3A) & Memory (3B)

“Memory is the difference between a chatbot and a companion.”

Sessions

Sessions allow agents to:

stay aware of the conversation
continue tasks
maintain context

Like telling a human:

“Let’s pick up where we left off.”

Memory

Memory was a breakthrough concept.

Agents can store:

your preferences
your style
your earlier decisions
the history of the workflow

Real-life analogy:

A personal trainer remembering your injuries, goals, and routines.

Memory transforms agents into something that can grow with you.

🔍 Day 4 — Observability (4A) & Evaluation (4B)

“If you cannot observe it, you cannot improve it.”

Observability

Agents need to expose:

logs
metrics
errors
internal reasoning
tool usage

Just like monitoring production software, observability helps answer:

Why did the agent behave this way?
Where did a mistake happen?
What step caused a failure?

Evaluation

Agents evaluate:

correctness
safety
reliability
latency
cost

This is where agents become measurable, tunable, and improvable.

Like reviewing your work and improving your workflow.

🤝 Day 5 — Agent-to-Agent Communication (5A)

“One agent is powerful. Two agents are a team.”

On the final day, everything came together.

Agents can:

delegate
cross-check each other
collaborate
negotiate
co-plan tasks

Real-life example:

Imagine multiple assistants:

one finds hotels
one checks reviews
one books transport
one optimizes budget Together → a flawless travel plan.

This showed me the future isn’t one super-agent.

It's ecosystems of specialized agents working together.

🌟 My Biggest Takeaways

✔ Prompts are not messages — they are architectures in disguise

✔ Tools turn agents into action-takers

✔ Memory creates personalization and continuity

✔ Observability brings reliability

✔ Evaluation ensures improvement

✔ Multi-agent systems unlock scalability

The course trained me to think like an AI systems architect, not just an AI user.

💡 Final Reflection

I entered the course thinking:

“I want to learn how AI agents work.”

I finished the course thinking:

“I want to build AI agent ecosystems that mirror real-world teamwork.”

The progression from

prompt → architecture → tools → memory → evaluation → agent-to-agent orchestration

changed how I view AI completely.

Agents aren’t just chat interfaces.

They are self-improving collaborators that can scale workflows, automate complexity, and amplify human capability.

This course didn’t only teach me concepts —

it reshaped how I view the future of intelligent systems.

Thanks to Google, Kaggle, and the Dev community for this opportunity to grow, learn, and build.