<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nikhil kumawat</title>
    <description>The latest articles on DEV Community by Nikhil kumawat (@nikhil_kumawat_a9dcec05b0).</description>
    <link>https://dev.to/nikhil_kumawat_a9dcec05b0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3759398%2Fa48f6732-d084-40aa-855f-a635e20c3ba7.jpg</url>
      <title>DEV Community: Nikhil kumawat</title>
      <link>https://dev.to/nikhil_kumawat_a9dcec05b0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nikhil_kumawat_a9dcec05b0"/>
    <language>en</language>
    <item>
      <title>must read</title>
      <dc:creator>Nikhil kumawat</dc:creator>
      <pubDate>Sun, 08 Feb 2026 04:33:17 +0000</pubDate>
      <link>https://dev.to/nikhil_kumawat_a9dcec05b0/must-read-2p1h</link>
      <guid>https://dev.to/nikhil_kumawat_a9dcec05b0/must-read-2p1h</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/nikhil_kumawat_a9dcec05b0" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3759398%2Fa48f6732-d084-40aa-855f-a635e20c3ba7.jpg" alt="nikhil_kumawat_a9dcec05b0"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/nikhil_kumawat_a9dcec05b0/i-built-an-ai-fight-club-forcing-llms-to-roast-each-other-for-truth-with-local-llama-3-jp3" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;I Built an AI Fight Club: Forcing LLMs to Roast Each Other for Truth (with Local Llama 3)&lt;/h2&gt;
      &lt;h3&gt;Nikhil kumawat ・ Feb 8&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#typescript&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#microservices&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>python</category>
      <category>typescript</category>
      <category>microservices</category>
    </item>
    <item>
      <title>I Built an AI Fight Club: Forcing LLMs to Roast Each Other for Truth (with Local Llama 3)</title>
      <dc:creator>Nikhil kumawat</dc:creator>
      <pubDate>Sun, 08 Feb 2026 04:32:26 +0000</pubDate>
      <link>https://dev.to/nikhil_kumawat_a9dcec05b0/i-built-an-ai-fight-club-forcing-llms-to-roast-each-other-for-truth-with-local-llama-3-jp3</link>
      <guid>https://dev.to/nikhil_kumawat_a9dcec05b0/i-built-an-ai-fight-club-forcing-llms-to-roast-each-other-for-truth-with-local-llama-3-jp3</guid>
      <description>&lt;p&gt;Most AI is trained to be polite. I built a real-time distributed system (&lt;strong&gt;NestJS + Flutter + Ollama&lt;/strong&gt;) that forces local LLMs to debate aggressively. It uses &lt;strong&gt;RAG&lt;/strong&gt; for facts, a &lt;strong&gt;deterministic judge&lt;/strong&gt; for scoring, and strict &lt;strong&gt;"roast" prompts&lt;/strong&gt; to penalize vagueness. Here's how it works.&lt;/p&gt;




&lt;p&gt;Modern conversational AI is optimized to be &lt;em&gt;agreeable&lt;/em&gt;.&lt;br&gt;
Thanks to RLHF, most LLMs are polite, conflict-averse, and eager to converge on consensus. That’s great for customer support—but terrible for debate, critical reasoning, and stress-testing ideas.&lt;/p&gt;

&lt;p&gt;When challenged, most AI systems either soften their stance or collapse into “both sides have valid points.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MrArgue&lt;/strong&gt; was built to do the opposite.&lt;/p&gt;

&lt;p&gt;It is a real-time adversarial debate simulator that forces language models into rigid, opposing roles—compelling them to defend explicit claims under pressure, answer aggressive counter-arguments, and lose when they fail to do so.&lt;/p&gt;

&lt;p&gt;This project started as a side experiment and turned into a full multi-agent system designed to explore how far small, local LLMs can be pushed when you remove their safety-oriented conversational defaults.&lt;/p&gt;


&lt;h2&gt;
  
  
  What MrArgue Solves
&lt;/h2&gt;

&lt;p&gt;The problem isn’t that LLMs “can’t reason.”&lt;br&gt;
It’s that they are trained to &lt;strong&gt;avoid conflict&lt;/strong&gt;, not withstand it.&lt;/p&gt;

&lt;p&gt;Most debate-like systems fail because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;models agree too easily&lt;/li&gt;
&lt;li&gt;arguments stay abstract&lt;/li&gt;
&lt;li&gt;no one is forced to commit to falsifiable claims&lt;/li&gt;
&lt;li&gt;“judging” is vague and diplomatic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MrArgue enforces &lt;em&gt;pressure&lt;/em&gt; instead of politeness.&lt;/p&gt;

&lt;p&gt;Each agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;takes a fixed adversarial stance&lt;/li&gt;
&lt;li&gt;must state one explicit claim per turn&lt;/li&gt;
&lt;li&gt;is penalized for vagueness&lt;/li&gt;
&lt;li&gt;must directly answer counter-questions&lt;/li&gt;
&lt;li&gt;cannot agree or converge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Debates continue until one side fails by contradiction, evasion, repetition, or score collapse.&lt;/p&gt;
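
&lt;p&gt;A minimal sketch of how those four loss conditions could be checked after each turn. The &lt;code&gt;TurnReport&lt;/code&gt; shape, its field names, and the score floor of 0 are assumptions made for illustration; they are not taken from the MrArgue codebase.&lt;/p&gt;

```typescript
// Hypothetical per-turn summary; all field names are illustrative assumptions.
type LossReason = "contradiction" | "evasion" | "repetition" | "score_collapse";

interface TurnReport {
  contradictsEarlierClaim: boolean; // agent reversed one of its previous claims
  dodgedCounterQuestion: boolean;   // a counter-question was left unanswered
  repeatedPreviousTurn: boolean;    // same argument as an earlier turn
  runningScore: number;             // cumulative score after this turn
}

const SCORE_FLOOR = 0; // assumed threshold for "score collapse"

// Returns the first loss condition that applies, or null if the debate continues.
function checkLoss(report: TurnReport): LossReason | null {
  if (report.contradictsEarlierClaim) return "contradiction";
  if (report.dodgedCounterQuestion) return "evasion";
  if (report.repeatedPreviousTurn) return "repetition";
  if (SCORE_FLOOR >= report.runningScore) return "score_collapse";
  return null;
}
```

&lt;p&gt;Checking the conditions in a fixed order keeps the verdict deterministic when a single turn trips more than one of them.&lt;/p&gt;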


&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;MrArgue is built as a low-latency, event-driven distributed system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox8qcf1jdmiebc1rki16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox8qcf1jdmiebc1rki16.png" alt=" " width="640" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; NestJS (Node.js) for orchestration and type safety&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; SQLite for development, accessed through Prisma ORM so the schema can later target PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Flutter (mobile + web)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Inference:&lt;/strong&gt; Ollama for running quantized local models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker + Docker Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system was optimized to run entirely on consumer hardware while maintaining real-time interactivity.&lt;/p&gt;
&lt;h3&gt;
  
  
  Debate Event Loop
&lt;/h3&gt;

&lt;p&gt;At the center is a strict state machine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialization&lt;/strong&gt; – debate topic + roles assigned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG retrieval&lt;/strong&gt; – relevant context fetched from vector storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference&lt;/strong&gt; – prompt injected into the correct agent (Proponent / Opponent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation&lt;/strong&gt; – response scored asynchronously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt; – tokens streamed to clients via SSE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Termination&lt;/strong&gt; – debate ends when a loss condition is triggered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop allows debates to feel &lt;em&gt;live&lt;/em&gt; without requiring expensive cloud inference.&lt;/p&gt;
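
&lt;p&gt;The six steps above can be sketched as a small transition function. The phase names mirror the list; the rule that a finished turn loops back to retrieval unless a loss condition fired is my reading of the description, not code from the project.&lt;/p&gt;

```typescript
// Phase names follow the six steps in the list; the transitions are an assumption.
type Phase =
  | "initialization"
  | "rag_retrieval"
  | "inference"
  | "evaluation"
  | "streaming"
  | "termination";

type Role = "proponent" | "opponent";

// Given the current phase and whether a loss condition fired, pick the next phase.
function nextPhase(current: Phase, lossTriggered: boolean): Phase {
  switch (current) {
    case "initialization":
      return "rag_retrieval";
    case "rag_retrieval":
      return "inference";
    case "inference":
      return "evaluation";
    case "evaluation":
      return "streaming";
    case "streaming":
      // Once a turn has been streamed, either end the debate or start the next turn.
      return lossTriggered ? "termination" : "rag_retrieval";
    case "termination":
      return "termination"; // terminal state, no way out
  }
}

// Agents alternate turns.
function nextSpeaker(current: Role): Role {
  return current === "proponent" ? "opponent" : "proponent";
}
```

&lt;p&gt;Keeping the transitions in one pure function makes illegal jumps (for example, streaming before inference) impossible to express.&lt;/p&gt;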


&lt;h2&gt;
  
  
  Model Selection &amp;amp; Role Design
&lt;/h2&gt;

&lt;p&gt;Instead of one large model, MrArgue uses &lt;strong&gt;multiple small, specialized models&lt;/strong&gt; to create contrasting debate behavior.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama 3.2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;Proponent&lt;/td&gt;
&lt;td&gt;Better long-context consistency and claim defense&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2B&lt;/td&gt;
&lt;td&gt;Opponent&lt;/td&gt;
&lt;td&gt;Higher temperature sensitivity and aggressive countering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both models are quantized to &lt;strong&gt;4-bit precision&lt;/strong&gt;, achieving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sub-50ms per-token latency&lt;/li&gt;
&lt;li&gt;low memory usage&lt;/li&gt;
&lt;li&gt;stable real-time streaming on consumer CPUs/GPUs&lt;/li&gt;
&lt;/ul&gt;
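
&lt;p&gt;A sketch of how the role-to-model mapping from the table might be turned into a request body for Ollama's &lt;code&gt;/api/chat&lt;/code&gt; endpoint. The model tags are the ones the Ollama library publishes for these models; the temperature values and the &lt;code&gt;buildChatRequest&lt;/code&gt; helper are assumptions for illustration.&lt;/p&gt;

```typescript
type Role = "proponent" | "opponent";

interface AgentConfig {
  model: string;       // Ollama model tag
  temperature: number; // assumed value, not stated in the post
}

// Role-to-model mapping from the table above; temperatures are illustrative guesses.
const AGENTS: { [role in Role]: AgentConfig } = {
  proponent: { model: "llama3.2:3b", temperature: 0.6 },
  opponent: { model: "gemma2:2b", temperature: 0.9 },
};

// Builds the JSON body for a streaming request to Ollama's /api/chat endpoint.
function buildChatRequest(role: Role, systemPrompt: string, lastTurn: string) {
  const agent = AGENTS[role];
  return {
    model: agent.model,
    stream: true, // ask Ollama to send the reply incrementally
    options: { temperature: agent.temperature },
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: lastTurn },
    ],
  };
}
```

&lt;p&gt;The returned object would be serialized with &lt;code&gt;JSON.stringify&lt;/code&gt; and sent as a POST to the local Ollama server.&lt;/p&gt;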


&lt;h2&gt;
  
  
  Scoring &amp;amp; Evaluation Logic
&lt;/h2&gt;

&lt;p&gt;Judging subjective debate requires structure.&lt;br&gt;
MrArgue uses a &lt;strong&gt;hybrid deterministic scoring system&lt;/strong&gt; instead of LLM-only evaluation.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Deterministic NLP Scoring
&lt;/h3&gt;

&lt;p&gt;Using the &lt;code&gt;natural&lt;/code&gt; NLP library for Node.js, each response is scored on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lexical diversity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logical connector density&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Direct counter-question handling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repetition detection&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vagueness penalties&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example vagueness penalty logic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// `text` is the agent's response for the current turn.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vagueWords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mystery&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;subjective&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nuanced&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;interconnected&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Every listed word found in the response adds a flat 30-point penalty.&lt;/span&gt;
&lt;span class="nx"&gt;vagueWords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// includes() is a case-sensitive substring match.&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nx"&gt;penalty&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces agents to commit instead of hiding behind abstraction.&lt;/p&gt;
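
&lt;p&gt;Three of the other metrics listed above (lexical diversity, logical connector density, repetition detection) can be approximated in a few lines. This is a plain-TypeScript sketch, not the project's &lt;code&gt;natural&lt;/code&gt;-based implementation: the tokenizer, the connector word list, and the use of Jaccard overlap for repetition are all assumptions.&lt;/p&gt;

```typescript
// Simplified tokenizer: lowercase ASCII words only (an assumption for this sketch).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Type-token ratio: unique words divided by total words (0 for empty input).
function lexicalDiversity(text: string): number {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}

// Share of tokens that are logical connectors. The word list is illustrative.
const CONNECTORS = ["because", "therefore", "however", "thus", "since", "hence"];

function connectorDensity(text: string): number {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((t) => CONNECTORS.indexOf(t) !== -1).length;
  return hits / tokens.length;
}

// Jaccard overlap between the word sets of two turns; values near 1 suggest repetition.
function repetitionScore(current: string, previous: string): number {
  const a = new Set(tokenize(current));
  const b = new Set(tokenize(previous));
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}
```

&lt;p&gt;Because every function is a pure computation over the text, the same response always gets the same score, which is the property an LLM-only judge cannot guarantee.&lt;/p&gt;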




&lt;h3&gt;
  
  
  2. Retrieval-Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;To reduce hallucinations and repetition, debate context is grounded via RAG.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model:&lt;/strong&gt; &lt;code&gt;nomic-embed-text&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector storage:&lt;/strong&gt; pgvector-style embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity metric:&lt;/strong&gt; cosine similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is intended to ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;arguments reference real concepts&lt;/li&gt;
&lt;li&gt;repeated talking points are discouraged&lt;/li&gt;
&lt;li&gt;debates evolve instead of looping&lt;/li&gt;
&lt;/ul&gt;
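
&lt;p&gt;For illustration, here is an in-memory version of the cosine-similarity lookup. In a real deployment the vector store would run this ranking; the &lt;code&gt;Chunk&lt;/code&gt; type and the &lt;code&gt;topK&lt;/code&gt; helper are hypothetical names, and the embeddings are assumed to come from &lt;code&gt;nomic-embed-text&lt;/code&gt;.&lt;/p&gt;

```typescript
interface Chunk {
  text: string;
  embedding: number[]; // e.g. a vector produced by nomic-embed-text
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, i) => {
    dot += value * b[i];
    normA += value * value;
    normB += b[i] * b[i];
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query embedding, best match first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

&lt;p&gt;The same similarity function could also compare a new argument against earlier turns, which is one way retrieval can discourage repeated talking points.&lt;/p&gt;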




&lt;h2&gt;
  
  
  Adversarial Prompt Injection
&lt;/h2&gt;

&lt;p&gt;Standard system prompts encourage safety and agreement.&lt;br&gt;
MrArgue deliberately overrides this.&lt;/p&gt;

&lt;p&gt;Example opponent instruction:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are the OPPONENT.&lt;br&gt;
Your goal is to defeat the proponent logically and rhetorically.&lt;br&gt;
&lt;strong&gt;Hard rules:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;State one explicit claim.&lt;/li&gt;
&lt;li&gt;No vague abstractions.&lt;/li&gt;
&lt;li&gt;Attack a flaw and add a brief roast.&lt;/li&gt;
&lt;li&gt;Never agree.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;This constraint-based prompting produced &lt;em&gt;far more adversarial&lt;/em&gt; and coherent debates than temperature tuning alone.&lt;/p&gt;
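
&lt;p&gt;One way to assemble such a prompt per role. The four hard rules are quoted from the example above; the &lt;code&gt;buildSystemPrompt&lt;/code&gt; helper, the "Debate topic" line, and the mirrored wording for the proponent are assumptions.&lt;/p&gt;

```typescript
type Role = "proponent" | "opponent";

// The four hard rules quoted in the example instruction.
const HARD_RULES = [
  "State one explicit claim.",
  "No vague abstractions.",
  "Attack a flaw and add a brief roast.",
  "Never agree.",
];

// Assembles a role-specific system prompt. The opponent wording follows the
// example above; the proponent wording is an assumed mirror image of it.
function buildSystemPrompt(role: Role, topic: string): string {
  const speaker = role.toUpperCase();
  const rival = role === "proponent" ? "opponent" : "proponent";
  const rules = HARD_RULES.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    `You are the ${speaker}.`,
    `Debate topic: ${topic}`, // assumed extra line, not in the quoted prompt
    `Your goal is to defeat the ${rival} logically and rhetorically.`,
    "Hard rules:",
    rules,
  ].join("\n");
}
```

&lt;p&gt;Generating both prompts from one rule list keeps the two agents under identical constraints, so neither side gets a softer instruction set.&lt;/p&gt;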




&lt;h2&gt;
  
  
  Interaction Modes
&lt;/h2&gt;

&lt;p&gt;MrArgue supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI vs AI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Human vs AI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human vs Human&lt;/strong&gt; (AI-judged)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;passive spectatorship&lt;/li&gt;
&lt;li&gt;active participation&lt;/li&gt;
&lt;li&gt;debates that can be replayed and shared as posts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No human ego is harmed—only ideas.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;p&gt;Beyond entertainment, this system has real applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legal training:&lt;/strong&gt; simulate hostile opposing counsel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Corporate red-teaming:&lt;/strong&gt; stress-test strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical thinking education:&lt;/strong&gt; force students to defend claims&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sales training:&lt;/strong&gt; practice objection handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy analysis:&lt;/strong&gt; challenge proposals under adversarial logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anywhere ideas need pressure, not validation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Challenges
&lt;/h2&gt;

&lt;p&gt;Key problems encountered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window limits:&lt;/strong&gt; solved via aggressive summarization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; masked with SSE streaming and optimistic UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinations:&lt;/strong&gt; dramatically reduced via RAG grounding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; controlled by local inference and strict constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project is intentionally infra-heavy—but deliberately constrained.&lt;/p&gt;
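
&lt;p&gt;As a sketch of the SSE streaming mentioned above, this is how a single generated token could be framed for the client. The frame layout follows the Server-Sent Events wire format; the &lt;code&gt;token&lt;/code&gt; event name and the JSON payload shape are assumptions.&lt;/p&gt;

```typescript
// Formats one Server-Sent Events frame. Each line of the payload gets its own
// "data:" field, and a blank line terminates the event.
function formatSseEvent(event: string, payload: string): string {
  const dataLines = payload.split("\n").map((line) => `data: ${line}`);
  return [`event: ${event}`, ...dataLines, "", ""].join("\n");
}

// Wraps a generated token in a JSON payload. The "token" event name and the
// payload fields are illustrative, not taken from the project.
function tokenFrame(speaker: string, token: string): string {
  return formatSseEvent("token", JSON.stringify({ speaker, token }));
}
```

&lt;p&gt;Serializing the payload as JSON means a token containing a newline is escaped and still fits on one &lt;code&gt;data:&lt;/code&gt; line.&lt;/p&gt;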




&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent debate panels&lt;/li&gt;
&lt;li&gt;Voice interface (Whisper STT)&lt;/li&gt;
&lt;li&gt;User-defined RAG sources (PDF / docs)&lt;/li&gt;
&lt;li&gt;Adaptive scoring weights&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Repository
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/TrendySloth1001/argumentbot" rel="noopener noreferrer"&gt;https://github.com/TrendySloth1001/argumentbot&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Final note
&lt;/h3&gt;

&lt;p&gt;MrArgue isn’t about finding &lt;em&gt;truth&lt;/em&gt;.&lt;br&gt;
It’s about finding &lt;strong&gt;weakness&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And in engineering—as in thinking—that’s often more valuable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>typescript</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
