<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammad Ali Nasir</title>
    <description>The latest articles on DEV Community by Muhammad Ali Nasir (@al1nasir).</description>
    <link>https://dev.to/al1nasir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817260%2F1fe7d997-e7ca-4b37-b3d3-4def622b0758.jpg</url>
      <title>DEV Community: Muhammad Ali Nasir</title>
      <link>https://dev.to/al1nasir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/al1nasir"/>
    <language>en</language>
    <item>
      <title>LocalForge: I built a self-hosted LLM control plane with intelligent routing and LoRA finetuning</title>
      <dc:creator>Muhammad Ali Nasir</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:33:00 +0000</pubDate>
      <link>https://dev.to/al1nasir/localforge-i-built-a-self-hosted-llm-control-plane-with-intelligent-routing-and-lora-finetuning-2149</link>
      <guid>https://dev.to/al1nasir/localforge-i-built-a-self-hosted-llm-control-plane-with-intelligent-routing-and-lora-finetuning-2149</guid>
      <description>&lt;p&gt;Running local LLMs is easy. Running them &lt;em&gt;well&lt;/em&gt; in a real application is not.&lt;/p&gt;

&lt;p&gt;You end up with fragile inference scripts, no idea which model fits which task, manual VRAM calculations, and zero observability into what's actually happening. I got tired of it, so I built &lt;strong&gt;LocalForge&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;LocalForge is a self-hosted AI control plane. It exposes a single OpenAI-compatible endpoint and handles everything else: model lifecycle, intelligent routing, memory, and finetuning.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c"&gt;# Your app stays the same. Just change base_url.&lt;/span&gt;
import openai

client &lt;span class="o"&gt;=&lt;/span&gt; openai.OpenAI&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8010/v1"&lt;/span&gt;, &lt;span class="nv"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"lf-xxx"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
response &lt;span class="o"&gt;=&lt;/span&gt; client.chat.completions.create&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;, &lt;span class="nv"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;...]&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  How the router works
&lt;/h2&gt;

&lt;p&gt;When you send &lt;code&gt;model: "auto"&lt;/code&gt;, the routing engine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classifies the query&lt;/strong&gt; — TF-IDF + Logistic Regression, under 5ms, into coding / math / reasoning / instruction / general&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scores each model&lt;/strong&gt; using:

&lt;ul&gt;
&lt;li&gt;Benchmark scores from HuggingFace (MMLU-Pro, HumanEval, GSM8K) — 40%&lt;/li&gt;
&lt;li&gt;Vector memory of past query→outcome pairs stored in Qdrant — 30%&lt;/li&gt;
&lt;li&gt;Measured latency on your hardware — 15%&lt;/li&gt;
&lt;li&gt;Thumbs up/down feedback — 15%&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Falls back to cloud&lt;/strong&gt; (OpenAI/Gemini) if confidence &amp;lt; 0.3&lt;/li&gt;
&lt;/ol&gt;
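&lt;p&gt;The scoring step can be sketched as a weighted sum. This is a minimal illustration, not LocalForge's actual code: the weights and the 0.3 threshold come from the list above, while &lt;code&gt;ModelSignals&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt;, and &lt;code&gt;route&lt;/code&gt; are hypothetical names, each signal is assumed to be normalized to the range 0 to 1, and the top score is assumed to double as the routing confidence.&lt;/p&gt;

```python
from dataclasses import dataclass

# Weights as listed in the post; the signal names are illustrative.
WEIGHTS = {"benchmark": 0.40, "memory": 0.30, "latency": 0.15, "feedback": 0.15}
CLOUD_FALLBACK_THRESHOLD = 0.3  # fallback threshold from the post


@dataclass
class ModelSignals:
    """Hypothetical per-model signals, each assumed normalized to [0, 1]."""
    name: str
    benchmark: float
    memory: float
    latency: float   # assumption: higher means faster
    feedback: float


def score(signals: ModelSignals) -> float:
    """Weighted sum of the four signals."""
    return sum(weight * getattr(signals, key) for key, weight in WEIGHTS.items())


def route(candidates: list[ModelSignals]) -> str:
    """Pick the best local model, or fall back to cloud when the best score is low."""
    best = max(candidates, key=score)
    # Assumption: the top score is used as the routing confidence.
    if CLOUD_FALLBACK_THRESHOLD > score(best):
        return "cloud"
    return best.name
```

&lt;p&gt;With these weights, a model strong on benchmarks and memory wins over a fast but weak one, and a lone weak candidate triggers the cloud fallback.&lt;/p&gt;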

&lt;p&gt;The memory layer uses &lt;code&gt;nomic-embed-text-v1.5&lt;/code&gt; to embed every query locally. Similar past queries are retrieved at routing time, and scores decay exponentially (λ = 0.95) so fresh failures hurt more than old ones.&lt;/p&gt;
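&lt;p&gt;To make the decay concrete, here is a small sketch of an exponentially weighted outcome score. It assumes each remembered outcome is 1.0 for success or 0.0 for failure and that age is counted in discrete steps; the post does not specify the unit, and &lt;code&gt;decayed_score&lt;/code&gt; is a hypothetical helper rather than LocalForge's implementation.&lt;/p&gt;

```python
DECAY = 0.95  # lambda from the post


def decayed_score(outcomes: list[tuple[float, int]], decay: float = DECAY) -> float:
    """Weighted average of past outcomes, down-weighting older ones.

    Each item is (outcome, age): outcome is 1.0 for success or 0.0 for failure,
    and age counts steps since it was recorded (the unit is an assumption).
    """
    weights = [decay ** age for _, age in outcomes]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * outcome for w, (outcome, _) in zip(weights, outcomes)) / total
```

&lt;p&gt;A failure at age 0 paired with a success at age 10 scores lower than the reverse ordering, which is the "fresh failures hurt more" behavior described above.&lt;/p&gt;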
&lt;h2&gt;
  
  
  VRAM lifecycle
&lt;/h2&gt;

&lt;p&gt;Consumer GPUs can typically hold only 1–2 models at a time. LocalForge manages atomic state transitions:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UNLOADED → LOADING → HOT → UNLOADING → UNLOADED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Requests queue during model swaps. The "Resident Model" (most-used in the past 24h) is prioritized to stay loaded.&lt;/p&gt;
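&lt;p&gt;The lifecycle above maps naturally onto a small state machine. The sketch below is an assumption about how such transitions could be guarded, not LocalForge's code: &lt;code&gt;ModelSlot&lt;/code&gt; and &lt;code&gt;advance&lt;/code&gt; are hypothetical names, and a lock stands in for whatever mechanism makes the real transitions atomic.&lt;/p&gt;

```python
import threading
from enum import Enum


class ModelState(Enum):
    UNLOADED = "UNLOADED"
    LOADING = "LOADING"
    HOT = "HOT"
    UNLOADING = "UNLOADING"


# Each state has exactly one legal successor, following the cycle above.
TRANSITIONS = {
    ModelState.UNLOADED: ModelState.LOADING,
    ModelState.LOADING: ModelState.HOT,
    ModelState.HOT: ModelState.UNLOADING,
    ModelState.UNLOADING: ModelState.UNLOADED,
}


class ModelSlot:
    """Hypothetical holder for one model's lifecycle state."""

    def __init__(self) -> None:
        self.state = ModelState.UNLOADED
        self._lock = threading.Lock()

    def advance(self, target: ModelState) -> None:
        """Move to target only if it is the next state in the cycle."""
        with self._lock:  # check and update happen as one step
            if TRANSITIONS[self.state] is not target:
                raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
            self.state = target
```

&lt;p&gt;Rejecting any jump that skips a state (for example HOT straight to UNLOADED) keeps a half-loaded model from ever being treated as ready.&lt;/p&gt;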
&lt;h2&gt;
  
  
  Finetuning pipeline
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upload CSV or JSONL dataset via the dashboard&lt;/li&gt;
&lt;li&gt;Pick base model + hyperparameters&lt;/li&gt;
&lt;li&gt;Training runs in an isolated subprocess via Unsloth (2× faster, 60% less VRAM)&lt;/li&gt;
&lt;li&gt;Live loss curves stream to the browser via SSE&lt;/li&gt;
&lt;li&gt;On completion: LoRA adapters merged → GGUF exported → model auto-registered in the router&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI + aiosqlite (WAL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 16 + React 19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;llama-cpp-python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector store&lt;/td&gt;
&lt;td&gt;Qdrant (disk, no Docker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;nomic-embed-text-v1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finetuning&lt;/td&gt;
&lt;td&gt;Unsloth / PEFT + TRL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Classifier&lt;/td&gt;
&lt;td&gt;scikit-learn TF-IDF + LogReg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/al1-nasir" rel="noopener noreferrer"&gt;
        al1-nasir
      &lt;/a&gt; / &lt;a href="https://github.com/al1-nasir/LocalForge" rel="noopener noreferrer"&gt;
        LocalForge
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Self-hosted AI control plane for intelligent local LLM orchestration. OpenAI-compatible API · ML-powered multi-model routing · LoRA finetuning · vector memory · RAG 
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/al1-nasir/LocalForge/banner.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fal1-nasir%2FLocalForge%2FHEAD%2Fbanner.png" alt="LocalForge Header"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/f5641ef7113d69151aa77264d83c1d3a9b919fe5c6161524e26ff77fcfa48eee/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31302b2d626c75653f7374796c653d666f722d7468652d6261646765266c6f676f3d707974686f6e266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/f5641ef7113d69151aa77264d83c1d3a9b919fe5c6161524e26ff77fcfa48eee/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f507974686f6e2d332e31302b2d626c75653f7374796c653d666f722d7468652d6261646765266c6f676f3d707974686f6e266c6f676f436f6c6f723d7768697465" alt="Python 3.10+"&gt;&lt;/a&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/efe83f27bff0d95abbf8e4149a475beb6c588020de491ece9207566d1e2a69c3/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4e6578742e6a732d31362d626c61636b3f7374796c653d666f722d7468652d6261646765266c6f676f3d6e6578742e6a73266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/efe83f27bff0d95abbf8e4149a475beb6c588020de491ece9207566d1e2a69c3/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4e6578742e6a732d31362d626c61636b3f7374796c653d666f722d7468652d6261646765266c6f676f3d6e6578742e6a73266c6f676f436f6c6f723d7768697465" alt="Next.js 16"&gt;&lt;/a&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/8bfd929409d63aabfc3eabd444dc85ac04ce2e29da7c6b62992d2f0757d992aa/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f466173744150492d302e3131352d3030393638383f7374796c653d666f722d7468652d6261646765266c6f676f3d66617374617069266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/8bfd929409d63aabfc3eabd444dc85ac04ce2e29da7c6b62992d2f0757d992aa/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f466173744150492d302e3131352d3030393638383f7374796c653d666f722d7468652d6261646765266c6f676f3d66617374617069266c6f676f436f6c6f723d7768697465" alt="FastAPI"&gt;&lt;/a&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/153acf9dff19deb8abfc598c53bac50a4ceae0f5c83a552711060d3d78d2c057/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d677265656e3f7374796c653d666f722d7468652d6261646765"&gt;&lt;img src="https://camo.githubusercontent.com/153acf9dff19deb8abfc598c53bac50a4ceae0f5c83a552711060d3d78d2c057/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d677265656e3f7374796c653d666f722d7468652d6261646765" alt="MIT License"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;⚡ LocalForge&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;
  &lt;strong&gt;Self-Hosted AI Control Plane for Intelligent Local LLM Orchestration&lt;/strong&gt;
&lt;/p&gt;

&lt;p&gt;
  A production-grade platform for running, routing, benchmarking, and finetuning local LLMs.&lt;br&gt;
  Drop-in OpenAI-compatible API · Intelligent multi-model routing · LoRA finetuning with live monitoring
&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Overview&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;LocalForge is a self-hosted AI control plane that transforms your GPU workstation into an intelligent LLM serving infrastructure. Instead of manually managing model files, writing inference scripts, and guessing which model fits which task, you let LocalForge automate the entire lifecycle:&lt;/p&gt;


&lt;ol&gt;

&lt;li&gt;

&lt;strong&gt;Browse &amp;amp; Download&lt;/strong&gt; GGUF models from HuggingFace with automatic VRAM compatibility filtering&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Serve&lt;/strong&gt; models via a fully OpenAI-compatible &lt;code&gt;/v1/chat/completions&lt;/code&gt; endpoint&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Route&lt;/strong&gt; queries to the optimal model using ML-powered task classification + multi-signal scoring&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Learn&lt;/strong&gt; from usage patterns via a vector-based memory layer that improves routing over time&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Benchmark&lt;/strong&gt; models against standard evaluations (MMLU-Pro, HumanEval, GSM8K, GPQA, MT-Bench)&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Finetune&lt;/strong&gt; models with LoRA/QLoRA via a managed subprocess pipeline with live loss streaming&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Augment&lt;/strong&gt; responses with a…&lt;/li&gt;

&lt;/ol&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/al1-nasir/LocalForge" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;p&gt;Built by Ali Nasir — &lt;a href="https://alinasir.me" rel="noopener noreferrer"&gt;alinasir.me&lt;/a&gt; · &lt;a href="https://www.linkedin.com/in/al1-nasir/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love feedback on the routing architecture in particular!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>I Built an AI System That Makes 4 Agents Debate Scientific Papers, And Then Tells You Where They Disagree</title>
      <dc:creator>Muhammad Ali Nasir</dc:creator>
      <pubDate>Tue, 10 Mar 2026 16:38:34 +0000</pubDate>
      <link>https://dev.to/al1nasir/i-built-an-ai-system-that-makes-4-agents-debate-scientific-papers-and-then-tells-you-where-they-102n</link>
      <guid>https://dev.to/al1nasir/i-built-an-ai-system-that-makes-4-agents-debate-scientific-papers-and-then-tells-you-where-they-102n</guid>
      <description>&lt;p&gt;&lt;em&gt;How GraphRAG + a multi-LLM council produces more trustworthy answers than any single AI model&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw7flkllbuqa9yb5a87f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw7flkllbuqa9yb5a87f.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is a quiet crisis in AI-assisted research that nobody talks about.&lt;/p&gt;

&lt;p&gt;Every tool you've used — ChatGPT, Perplexity, Copilot, Elicit — does the same thing: it reads papers and gives you &lt;strong&gt;one confident answer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfgjqqpd2nuyicicmmhz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfgjqqpd2nuyicicmmhz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem is that science doesn't work that way.&lt;/p&gt;

&lt;p&gt;Take BRCA1's role in triple-negative breast cancer. Ask any AI tool and you'll get a confident, well-written paragraph. What you won't get is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Study A says BRCA1 mutations are associated with &lt;strong&gt;increased aggressiveness&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Study B says the same patients show &lt;strong&gt;better response rates to chemotherapy&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Study C shows &lt;strong&gt;shorter progression-free survival&lt;/strong&gt; despite better initial response&lt;/li&gt;
&lt;li&gt;Three studies report conflicting BRCA1 expression levels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't edge cases. These are the real contradictions buried across 200 papers that a researcher needs to know before designing an experiment, filing an IND, or trusting a conclusion.&lt;/p&gt;

&lt;p&gt;A single AI model smooths these over. It synthesizes them into a confident answer. And in doing so, it hides exactly the information a scientist needs most.&lt;/p&gt;

&lt;p&gt;This is why I built &lt;strong&gt;Research Council&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea: Deliberation Over Confidence
&lt;/h2&gt;

&lt;p&gt;The insight that drove this project came from an unlikely place: Andrej Karpathy's &lt;a href="https://github.com/karpathy/llm-council" rel="noopener noreferrer"&gt;llm-council&lt;/a&gt; repo, a simple Saturday hack that, instead of asking one LLM a question, routes it to multiple LLMs and has them review each other's answers.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;cross-model review catches things a single model misses.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wanted to take this further. What if, instead of generic LLMs reviewing each other, you had &lt;em&gt;specialized agents&lt;/em&gt;, each prompted to look at the evidence from a fundamentally different angle, deliberating over a &lt;em&gt;structured knowledge graph&lt;/em&gt; of papers?&lt;/p&gt;

&lt;p&gt;That's Research Council.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Actually Does
&lt;/h2&gt;

&lt;p&gt;When you ask Research Council a research question, here's what happens:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The Knowledge Graph
&lt;/h3&gt;

&lt;p&gt;Before you ask anything, papers are ingested from PubMed, arXiv, Semantic Scholar, or uploaded as PDFs. Research Council doesn't just chunk them into vectors. It builds a &lt;strong&gt;Neo4j knowledge graph&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nodes&lt;/strong&gt;: Paper, Gene, Drug, Disease, Protein, Pathway, Author, Conclusion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationships&lt;/strong&gt;: CONTRADICTS, SUPPORTS, CITES, MENTIONS, STUDIES, TARGETS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the critical difference from standard RAG. Traditional RAG asks: &lt;em&gt;"which chunks are semantically similar to my query?"&lt;/em&gt; GraphRAG asks: &lt;em&gt;"what is the structural relationship between the entities in my query?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you ask about BRCA1 and TNBC, the graph doesn't just return the most similar text chunks — it returns the &lt;strong&gt;neighborhood of BRCA1&lt;/strong&gt;: every paper that mentions it, every drug that targets related proteins, and critically, every paper that &lt;strong&gt;contradicts&lt;/strong&gt; another paper on the topic.&lt;/p&gt;
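&lt;p&gt;The idea of a neighborhood query can be shown without a database. The sketch below uses a plain Python edge list as a stand-in for Neo4j; the node IDs, the &lt;code&gt;EDGES&lt;/code&gt; data, and both helper functions are made up for illustration and are not taken from the Research Council codebase.&lt;/p&gt;

```python
from collections import defaultdict

# Toy edge list of (source, relationship, target); all IDs are invented.
EDGES = [
    ("paper:A", "MENTIONS", "gene:BRCA1"),
    ("paper:B", "MENTIONS", "gene:BRCA1"),
    ("paper:C", "MENTIONS", "gene:TP53"),
    ("paper:A", "CONTRADICTS", "paper:B"),
    ("drug:X", "TARGETS", "gene:BRCA1"),
]


def neighborhood(entity: str, edges: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Group every node directly connected to entity by relationship type."""
    found = defaultdict(set)
    for source, rel, target in edges:
        if source == entity:
            found[rel].add(target)
        elif target == entity:
            found[rel].add(source)
    return dict(found)


def contradictions_among(papers: set[str], edges: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Return CONTRADICTS edges whose endpoints are both in the given paper set."""
    return [(s, t) for s, rel, t in edges if rel == "CONTRADICTS" and s in papers and t in papers]
```

&lt;p&gt;Calling &lt;code&gt;neighborhood("gene:BRCA1", EDGES)&lt;/code&gt; collects the papers and drugs attached to the gene, and passing the resulting paper set to &lt;code&gt;contradictions_among&lt;/code&gt; surfaces the conflicting pair, something a pure similarity search over chunks has no direct way to express.&lt;/p&gt;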

&lt;h3&gt;
  
  
  Step 2: Token-Efficient Context Assembly
&lt;/h3&gt;

&lt;p&gt;Here's a technical detail that matters a lot at scale.&lt;/p&gt;

&lt;p&gt;Naive multi-agent systems load every available tool into the prompt upfront. With 50+ tools, that's 25,000+ tokens before the agent does anything useful.&lt;/p&gt;

&lt;p&gt;Research Council uses &lt;a href="https://github.com/langchain-ai/langgraph-bigtool" rel="noopener noreferrer"&gt;langgraph-bigtool&lt;/a&gt;: tools are embedded with SentenceTransformers at startup and retrieved semantically at query time. Only 2-4 relevant tools are loaded per query.&lt;/p&gt;
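&lt;p&gt;The retrieval idea, stripped of the library, looks roughly like this. The sketch deliberately swaps the sentence-embedding model for a toy bag-of-words vector so it runs with the standard library only; it does not use the langgraph-bigtool API, and the tool names and descriptions in &lt;code&gt;TOOLS&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical tool registry: name mapped to a short description.
TOOLS = {
    "pubmed_search": "search pubmed for biomedical papers by keyword",
    "graph_neighbors": "fetch neighbors of an entity in the knowledge graph",
    "pdf_parser": "extract text from an uploaded pdf file",
    "citation_lookup": "resolve a citation to a paper id",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(query: str, k: int = 2) -> list[str]:
    """Return the k tool names whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])), reverse=True)
    return ranked[:k]
```

&lt;p&gt;Only the names returned by &lt;code&gt;select_tools&lt;/code&gt; would have their full schemas placed in the prompt, which is where the token savings come from.&lt;/p&gt;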

&lt;p&gt;The result: a full 4-agent deliberation on a complex biomedical question uses &lt;strong&gt;3,118 tokens total&lt;/strong&gt;. About $0.002.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Council Deliberates
&lt;/h3&gt;

&lt;p&gt;Four specialized agents receive the same knowledge graph subgraph and analyze it in parallel via &lt;code&gt;asyncio.gather()&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔬 Evidence Agent&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What does the data actually show? Be precise about sample sizes, study types, and effect sizes. Never speculate beyond what the data shows."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;⚔️ Skeptic Agent&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Find the weaknesses: biased study designs, underpowered samples, conflicting results, publication bias. Be constructively critical."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;🔗 Connector Agent&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Find non-obvious links — drug repurposing opportunities, analogous mechanisms from other diseases, techniques from adjacent fields."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;📋 Methodology Agent&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Evaluate whether experimental designs are appropriate, controls are adequate, statistical methods are sound, and whether conclusions are justified by the methods used."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
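&lt;p&gt;A minimal sketch of the parallel fan-out, assuming each agent is an async function: &lt;code&gt;run_agent&lt;/code&gt; here is a stub that returns a placeholder string instead of calling an LLM, and the role names and &lt;code&gt;deliberate&lt;/code&gt; are illustrative rather than the project's real identifiers.&lt;/p&gt;

```python
import asyncio

AGENT_ROLES = ["evidence", "skeptic", "connector", "methodology"]


async def run_agent(role: str, context: str) -> dict[str, str]:
    """Stand-in for one LLM call; a real agent would await an API request here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"role": role, "response": f"[{role}] analysis of: {context}"}


async def deliberate(context: str) -> list[dict[str, str]]:
    """Run all four agents concurrently over the same context."""
    tasks = [run_agent(role, context) for role in AGENT_ROLES]
    # gather returns results in the order the awaitables were passed in,
    # so they line up with AGENT_ROLES.
    return await asyncio.gather(*tasks)
```

&lt;p&gt;Running &lt;code&gt;asyncio.run(deliberate("BRCA1 subgraph"))&lt;/code&gt; returns four results, one per role, in a stable order that later review stages can rely on.&lt;/p&gt;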

&lt;p&gt;Each agent produces an independent response. Then the real work begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: 12 Cross-Reviews
&lt;/h3&gt;

&lt;p&gt;Every agent reviews every other agent's response — anonymized, to prevent model bias. That's 4 × 3 = 12 peer evaluations.&lt;/p&gt;

&lt;p&gt;Each review produces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agreement score (0.0 to 1.0)&lt;/li&gt;
&lt;li&gt;Specific points of disagreement&lt;/li&gt;
&lt;li&gt;Constructive critique&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;aggregate agreement score&lt;/strong&gt; becomes a signal for confidence. High agreement → higher confidence. Persistent disagreement → lower confidence, and the Chairman must explain &lt;em&gt;why&lt;/em&gt; the agents disagreed.&lt;/p&gt;
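&lt;p&gt;The pairing and aggregation can be sketched in a few lines. The post does not say how the individual scores are combined, so the plain average below is an assumption, and &lt;code&gt;review_pairs&lt;/code&gt; and &lt;code&gt;aggregate_agreement&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```python
from itertools import permutations
from statistics import mean

AGENTS = ["evidence", "skeptic", "connector", "methodology"]


def review_pairs(agents: list[str]) -> list[tuple[str, str]]:
    """Every ordered (reviewer, author) pair with reviewer != author: n * (n - 1) pairs."""
    return list(permutations(agents, 2))


def aggregate_agreement(scores: dict[tuple[str, str], float]) -> float:
    """Mean of the per-review agreement scores (each in 0.0 to 1.0).

    Assumption: a plain average; the post does not specify the combination rule.
    """
    return mean(scores.values())
```

&lt;p&gt;For four agents, &lt;code&gt;review_pairs(AGENTS)&lt;/code&gt; yields the 12 reviews mentioned above; the same function makes the quadratic growth easy to check for larger councils.&lt;/p&gt;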

&lt;h3&gt;
  
  
  Step 5: The Chairman Synthesizes
&lt;/h3&gt;

&lt;p&gt;A Chairman agent (running on OpenRouter with the best available model) receives all four original responses plus all twelve cross-reviews. It produces:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key_findings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contradictions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"citations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"paper_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PMID:..."&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"methodology_notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_agreement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.80&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the confidence score is &lt;strong&gt;0.65, not 0.95&lt;/strong&gt;. That's intentional. The system doesn't inflate confidence. If the evidence is contested, the score reflects that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: The Write-Back Loop
&lt;/h3&gt;

&lt;p&gt;Every conclusion the Chairman produces gets written back to Neo4j as a new &lt;code&gt;Conclusion&lt;/code&gt; node — linked to every paper it references. The graph compounds over time. Each query makes future queries more informed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;Here's the actual output on &lt;em&gt;"Are there contradictions in BRCA1's role in triple-negative breast cancer?"&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confidence: 65%&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not 95%. Not "based on multiple sources." A calibrated 65% because the agents genuinely disagreed on two points and the methodology agent flagged three studies as underpowered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4 Contradictions surfaced:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BRCA1 mutations associated with both increased tumor aggressiveness AND better prognosis&lt;/li&gt;
&lt;li&gt;Higher treatment response rates but shorter progression-free survival&lt;/li&gt;
&lt;li&gt;Conflicting reports on BRCA1 expression levels across studies&lt;/li&gt;
&lt;li&gt;Variable associations between BRCA1 mutations and TNBC significance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6 Key findings&lt;/strong&gt;, each cited to a specific PubMed ID.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8 Methodology concerns&lt;/strong&gt; — variable TNBC definitions, selection bias, small sample sizes, retrospective designs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent agreement: 80%&lt;/strong&gt; — two agents disagreed on whether the survival paradox was explained by tumor heterogeneity or methodological inconsistency.&lt;/p&gt;

&lt;p&gt;Compare this to what ChatGPT gives you: a confident, well-written paragraph that mentions none of the contradictions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Researcher Query
      │
      ▼
GraphRAG Layer
  Neo4j: entities + relationships
  ChromaDB: vector embeddings (CPU-only, MiniLM)
  Hybrid retrieval: ~2,000 token context
      │
      ▼
LangGraph Orchestrator
  BigTool: 2-4 tools loaded dynamically
  Hybrid retrieval node
  Context assembly node
      │
      ▼
LLM Council (Groq + OpenRouter)
  Stage 1: 4 parallel agents
  Stage 2: 12 cross-reviews
  Stage 3: Chairman synthesis
      │
      ▼
Answer + Neo4j Writeback
  Confidence · Citations · Contradictions · Provenance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph + langgraph-bigtool (orchestration)&lt;/li&gt;
&lt;li&gt;Neo4j 5 Community (knowledge graph, local)&lt;/li&gt;
&lt;li&gt;ChromaDB (vector store, local, CPU)&lt;/li&gt;
&lt;li&gt;all-MiniLM-L6-v2 (embeddings, 80MB, CPU-only)&lt;/li&gt;
&lt;li&gt;Groq llama-3.3-70b + llama-3.1-8b (council agents, fast)&lt;/li&gt;
&lt;li&gt;OpenRouter claude-sonnet (Chairman, best synthesis quality)&lt;/li&gt;
&lt;li&gt;FastAPI + React + Vite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hardware requirements:&lt;/strong&gt; 16GB RAM, 4GB VRAM. No beefy GPU needed. Embeddings run entirely on CPU.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The write-back loop is the most underappreciated part&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most RAG systems are stateless. Query in, answer out. Research Council writes every conclusion back to the graph as a new node linked to source papers. After 50 queries, the graph has 50 validated conclusions that inform future answers. This is the difference between a tool and a system that compounds knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Confidence calibration is harder than it sounds&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Getting agents to express &lt;em&gt;genuine&lt;/em&gt; uncertainty rather than inflated confidence required careful prompt engineering. The current approach — deriving confidence from agent agreement scores — works but isn't theoretically principled. There's real research to be done here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. 12 cross-reviews might be overkill&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The number of cross-reviews, n × (n-1), grows quadratically. With 4 agents that's 12, which is manageable. With 8 agents that's 56, which is too slow. A smarter aggregation strategy (maybe pairwise disagreement sampling) would make larger councils viable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The skeptic agent is the most valuable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After dozens of test queries, the Skeptic Agent consistently surfaces the most useful information — not because skepticism is inherently valuable, but because existing AI tools have a strong bias toward presenting positive findings. The explicit adversarial role corrects for this.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Research Council V1 is live and open source (MIT licensed).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V2 plans:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community detection on the knowledge graph (Louvain clustering)&lt;/li&gt;
&lt;li&gt;Temporal analysis — track how understanding of a topic evolves year by year&lt;/li&gt;
&lt;li&gt;MCP server integrations — Zotero, PubMed MCP, Neo4j MCP&lt;/li&gt;
&lt;li&gt;HuggingFace Space for live demos&lt;/li&gt;
&lt;li&gt;Export council output as formatted PDF for lab notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I'd love help with:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better confidence calibration methodology&lt;/li&gt;
&lt;li&gt;Async optimization of the cross-review loop&lt;/li&gt;
&lt;li&gt;Additional paper sources (bioRxiv, ChemRxiv, Europe PMC)&lt;/li&gt;
&lt;li&gt;Domain-specific agent specializations beyond biomedicine&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/al1-nasir/Research_council" rel="noopener noreferrer"&gt;https://github.com/al1-nasir/Research_council&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. Runs on a laptop. Groq has a free tier. OpenRouter costs fractions of a cent per query.&lt;/p&gt;

&lt;p&gt;If you work in research, drug discovery, or AI for science — I'd genuinely love to know what question you'd throw at it first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Muhammad Ali Nasir. Inspired by karpathy/llm-council, extended with domain-specific GraphRAG and biomedical agent specialization.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>graphrag</category>
      <category>llm</category>
      <category>multiagent</category>
    </item>
  </channel>
</rss>
