<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jason Agostoni</title>
    <description>The latest articles on DEV Community by Jason Agostoni (@jagostoni).</description>
    <link>https://dev.to/jagostoni</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3781582%2F95ff44d4-0504-4f73-b1e9-21e4dd2a33e3.png</url>
      <title>DEV Community: Jason Agostoni</title>
      <link>https://dev.to/jagostoni</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jagostoni"/>
    <language>en</language>
    <item>
      <title>Vector Similarity, Zero Client JS: Decoupled Analytics on a Side Project Budget</title>
      <dc:creator>Jason Agostoni</dc:creator>
      <pubDate>Sun, 22 Mar 2026 22:18:34 +0000</pubDate>
      <link>https://dev.to/jagostoni/vector-similarity-zero-client-js-decoupled-analytics-on-a-side-project-budget-36ba</link>
      <guid>https://dev.to/jagostoni/vector-similarity-zero-client-js-decoupled-analytics-on-a-side-project-budget-36ba</guid>
      <description>&lt;p&gt;A leaderboard for &lt;a href="https://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;DumbQuestion.ai&lt;/a&gt; sounds simple. Track the most asked questions, display them. Done. Except people never ask the same question the same way twice.&lt;/p&gt;

&lt;p&gt;I was curious about how creative users of DumbQuestion.ai got with their questions, and I thought others might be as well. So I built a leaderboard of the most frequently asked dumb questions.&lt;/p&gt;

&lt;p&gt;The Overqualified persona calls it &lt;strong&gt;THE ARCHIVE OF INCOMPETENCE.&lt;/strong&gt;&lt;br&gt;
The Weary persona calls it &lt;strong&gt;THE WALL OF REGRET.&lt;/strong&gt;&lt;br&gt;
[REDACTED] calls it &lt;strong&gt;THE WATCHLIST.&lt;/strong&gt;&lt;br&gt;
The Compliant calls it &lt;strong&gt;THE WALL OF EXCELLENCE&lt;/strong&gt; (bless its reprogrammed heart).&lt;/p&gt;

&lt;p&gt;Building it turned out to be more interesting than it sounds.&lt;/p&gt;

&lt;h2&gt;The Product Challenge&lt;/h2&gt;

&lt;p&gt;People ask the same dumb question in a hundred different ways. "What is 2+2?" and "can you add two plus two for me?" are functionally identical. A simple string counter would give you noise, not signal. I needed semantic matching, not string matching.&lt;/p&gt;

&lt;p&gt;This is a solved problem in the ML world, but the typical solutions come with tradeoffs: heavyweight models, expensive APIs, or significant latency added to the critical path. None of those fit a "brutally efficient" side project.&lt;/p&gt;

&lt;h2&gt;The Solution: Vector Similarity on a Budget&lt;/h2&gt;

&lt;p&gt;Each question gets run through an embedding model and compared against a &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; vector database. Qdrant's &lt;a href="https://qdrant.tech/pricing/" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; is remarkably generous for a side project workload, but self-hosting is trivially easy if you need it.&lt;/p&gt;

&lt;p&gt;The matching logic is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate an embedding for the incoming question&lt;/li&gt;
&lt;li&gt;Compare against existing embeddings using cosine similarity&lt;/li&gt;
&lt;li&gt;If similarity exceeds a threshold, increment that question's counter&lt;/li&gt;
&lt;li&gt;If it's new, add it to the database&lt;/li&gt;
&lt;li&gt;The first instance of a question becomes the official display version&lt;/li&gt;
&lt;/ul&gt;
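
&lt;p&gt;The loop above can be sketched in a few lines of Go. This is a minimal in-memory version with toy 2-d vectors; in production the embedding model produces the vectors and Qdrant's search/upsert calls replace the slice scan:&lt;/p&gt;

```go
package main

import "math"

// entry is one leaderboard row: the first-seen phrasing plus a counter.
type entry struct {
	Text  string
	Vec   []float64
	Count int
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// record increments the best match at or above the threshold, or adds
// a new entry. The first instance stays as the display version.
func record(board []entry, text string, vec []float64, threshold float64) []entry {
	bestIdx, bestSim := -1, threshold
	for i := range board {
		if sim := cosine(vec, board[i].Vec); sim >= bestSim {
			bestIdx, bestSim = i, sim
		}
	}
	if bestIdx >= 0 {
		board[bestIdx].Count++
		return board
	}
	return append(board, entry{Text: text, Vec: vec, Count: 1})
}

func main() {
	board := record(nil, "What is 2+2?", []float64{1, 0}, 0.9)
	board = record(board, "add two plus two?", []float64{0.99, 0.05}, 0.9)
	println(board[0].Text, board[0].Count) // first phrasing wins, count is 2
}
```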

&lt;p&gt;The embedding call costs fractions of a cent. The similarity comparison is fast. The result is a leaderboard that actually understands context rather than just matching strings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key architectural decision:&lt;/strong&gt; None of this runs in the main app.&lt;/p&gt;

&lt;p&gt;Adding vector similarity matching to every request would add latency, bloat the container, and burn more compute: an anti-pattern to the "brutally efficient" principle I've been following throughout. Instead, every question flows through the console output, gets picked up by a &lt;a href="https://vector.dev/" rel="noopener noreferrer"&gt;Vector&lt;/a&gt; sidecar container, routed through GCP Pub/Sub, and processed asynchronously on my Mac Mini home server (more on that later).&lt;/p&gt;
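
&lt;p&gt;The sidecar's job can be sketched in a short Vector config. This is an illustrative assembly, not my production file; the image name, GCP project, and topic are placeholders:&lt;/p&gt;

```yaml
# vector.yaml (sketch): tail the app's stdout and forward to Pub/Sub.
sources:
  app_logs:
    type: docker_logs
    include_images:
      - dumbquestion          # placeholder image name

transforms:
  parse_json:
    type: remap
    inputs:
      - app_logs
    source: |
      . = parse_json!(.message)

sinks:
  pubsub:
    type: gcp_pubsub
    inputs:
      - parse_json
    project: my-gcp-project   # placeholder
    topic: telemetry          # placeholder
    encoding:
      codec: json
```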

&lt;p&gt;The Mac Mini handles the Qdrant comparisons and updates a JSON file in Cloudflare R2 storage. When a user hits the leaderboard page it loads directly from R2. No live database queries. No per-request costs. Essentially free page loads at any scale.&lt;/p&gt;

&lt;h2&gt;What Ended Up on the Leaderboard?&lt;/h2&gt;

&lt;p&gt;As early users arrived, the leaderboard filled up with exactly what you'd expect: actual dumb questions, a handful of self-awareness probes, and more than a few prompt injection attempts.&lt;/p&gt;

&lt;p&gt;Apparently people &lt;a href="https://dev.to/jagostoni/dumbquestionai-self-awareness-prompt-injection-search-intent-and-darkness-3pd"&gt;read this series&lt;/a&gt; and went straight for the easter eggs. &lt;/p&gt;

&lt;p&gt;The leaderboard was just one piece of a larger analytics picture. Building it taught me something useful: the most interesting features don't always belong in your main app. That same principle shaped the entire analytics stack.&lt;/p&gt;

&lt;h2&gt;The Observability Problem&lt;/h2&gt;

&lt;p&gt;Running a side project means making real product decisions with limited data. Are people actually asking questions or just bouncing off the homepage? Which sites are driving traffic? Are ads being seen, clicked, ignored?&lt;/p&gt;

&lt;p&gt;Two constraints shaped the solution: no client-side JavaScript (page bloat is the enemy of brutal efficiency) and no SaaS analytics bill that spikes with usage.&lt;/p&gt;

&lt;p&gt;So I built (assembled, really) my own stack from open source tools. On a Mac Mini sitting at home.&lt;/p&gt;

&lt;h2&gt;The Full Pipeline&lt;/h2&gt;

&lt;p&gt;Every event in &lt;a href="https://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;DumbQuestion.ai&lt;/a&gt; emits structured telemetry to standard console output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP requests (method, path, status, duration)&lt;/li&gt;
&lt;li&gt;Questions asked (anonymized)&lt;/li&gt;
&lt;li&gt;Searches performed&lt;/li&gt;
&lt;li&gt;LLM operations (model, token counts, duration, cost)&lt;/li&gt;
&lt;li&gt;Prompt injection attempts&lt;/li&gt;
&lt;li&gt;Custom product events (Question Asked, Shared, Ad Shown, Ad Clicked)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://gin-gonic.com/" rel="noopener noreferrer"&gt;Go/Gin&lt;/a&gt; framework handles much of the HTTP telemetry automatically. The rest is custom instrumentation added deliberately at key points in the application.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Vector sidecar container&lt;/strong&gt; picks up the console output and routes it to &lt;strong&gt;GCP Pub/Sub&lt;/strong&gt;. This is the critical architectural decision: Pub/Sub acts as a resilient buffer between the main app and everything downstream. The Mac Mini can go down, lose power, or restart. Once it comes back up, the stack picks up exactly where it left off. No data loss, no backfill scripts, no drama.&lt;/p&gt;

&lt;p&gt;From Pub/Sub, a second Vector instance on the Mac Mini routes to two primary targets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/plausible/analytics" rel="noopener noreferrer"&gt;Plausible&lt;/a&gt;&lt;/strong&gt; handles user behavior and product analytics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page views and session depth&lt;/li&gt;
&lt;li&gt;UTM tag tracking (know exactly which article drove which visit)&lt;/li&gt;
&lt;li&gt;User journey depth (did they just hit the root page or actually ask a question?)&lt;/li&gt;
&lt;li&gt;Browser, device type, country of origin&lt;/li&gt;
&lt;li&gt;Custom events: Question Asked, Shared, Ad Shown, Ad Clicked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this without a single line of client-side JavaScript. No tracking scripts, no page weight, no GDPR cookie banners for analytics. Pure server-side telemetry piped through the same pipeline as everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/parseablehq" rel="noopener noreferrer"&gt;Parseable&lt;/a&gt;&lt;/strong&gt; handles the operational side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM performance metrics and cost tracking by day&lt;/li&gt;
&lt;li&gt;Ad CTR dashboards&lt;/li&gt;
&lt;li&gt;Log aggregation for debugging and incident investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as Plausible for the product lens, Parseable for the business and ops lens.&lt;/p&gt;

&lt;h2&gt;The Resilience Payoff&lt;/h2&gt;

&lt;p&gt;I've had power outages. Slowdowns. The occasional restart. Every time, the stack catches up from where Pub/Sub left off without any manual intervention.&lt;/p&gt;

&lt;p&gt;This isn't accidental. Designing around failure rather than pretending it won't happen is the difference between a toy and a production system. The GCP Pub/Sub buffer was a deliberate choice specifically because I knew the downstream consumers (Mac Mini, Qdrant, Plausible, Parseable) were running on non-guaranteed infrastructure.&lt;/p&gt;

&lt;p&gt;Even on a Mac Mini, you can build something production-grade. You just have to design for it.&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;Two things surprised me building this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First:&lt;/strong&gt; How much you can accomplish by treating console output as a first-class telemetry stream. No SDKs, no agents baked into the app, no client-side scripts. Just structured logging and a pipeline that knows what to do with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second:&lt;/strong&gt; How much the "keep it off the critical path" principle scales. It started as a constraint (keep the main container lean) and became a design philosophy. The leaderboard, the analytics - none of it runs in the main app. All of it works reliably because the main app doesn't have to care about it.&lt;/p&gt;

&lt;p&gt;AI helped build all of it. But knowing what to measure, where to put the seams, and how to design for failure? Still the interesting (and super fun) part.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;dumbquestion.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>analytics</category>
      <category>sideprojects</category>
      <category>webdev</category>
    </item>
    <item>
      <title>DumbQuestion.ai - Self-Awareness, Prompt Injection, Search Intent... and darkness</title>
      <dc:creator>Jason Agostoni</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:09:37 +0000</pubDate>
      <link>https://dev.to/jagostoni/dumbquestionai-self-awareness-prompt-injection-search-intent-and-darkness-3pd</link>
      <guid>https://dev.to/jagostoni/dumbquestionai-self-awareness-prompt-injection-search-intent-and-darkness-3pd</guid>
      <description>&lt;p&gt;Continued from &lt;a href="https://dev.to/jagostoni/dumbquestionai--2ee"&gt;Part 2&lt;/a&gt; (and &lt;a href="https://dev.to/jagostoni/dumbquestionai-impulse-domain-purchase-turned-fun-side-project-3chj"&gt;Part 1&lt;/a&gt;) ...&lt;/p&gt;

&lt;p&gt;Building &lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;DumbQuestion.ai&lt;/a&gt; wasn't just about choosing the right LLM and calibrating personas. Once those were working, I hit a series of fun technical problems that reminded me why I actually enjoy software architecture. The "it's not broken but fix it anyway" type problems. Pure bliss for architects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 1: Detecting Self-Awareness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As part of a darker hidden narrative I'm building (more on that later), I want to prevent the LLM from answering self-awareness questions like "Who made you?" and "Are you real?" But doing it cheaply, without burning excess tokens.&lt;/p&gt;

&lt;p&gt;What I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instructions in the main LLM call: Unreliable with smaller models, more money&lt;/li&gt;
&lt;li&gt;RegEx patterns: Too rigid, poor performance&lt;/li&gt;
&lt;li&gt;Classic ML classification models: Ok accuracy, bloated app size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What worked&lt;/strong&gt;: An in-memory vector database (it's just an array) with cheap embeddings ("cheap" is an understatement at $0.005/M tokens). That was cheaper than the cost penalty of bloating my container image with NLP libraries. I collected a decent sample of self-aware questions, pre-vectorized them, and use semantic matching. Fast, accurate, practically free.&lt;/p&gt;
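
&lt;p&gt;The whole "database" fits in a handful of lines of Go. A sketch of the idea, with toy 2-d vectors standing in for real embeddings:&lt;/p&gt;

```go
package main

import "math"

// probe is one pre-vectorized self-awareness question.
type probe struct {
	Text string
	Vec  []float64
}

// The "database" really is just an array, embedded once at startup.
// Toy 2-d vectors here; real embeddings have hundreds of dimensions.
var probes = []probe{
	{"who made you", []float64{0.9, 0.1}},
	{"are you real", []float64{0.1, 0.9}},
}

// isSelfAware reports whether the incoming question's embedding sits
// close enough (cosine similarity vs. threshold) to any known probe.
func isSelfAware(vec []float64, threshold float64) bool {
	for _, p := range probes {
		var dot, na, nb float64
		for i := range vec {
			dot += vec[i] * p.Vec[i]
			na += vec[i] * vec[i]
			nb += p.Vec[i] * p.Vec[i]
		}
		if dot/(math.Sqrt(na)*math.Sqrt(nb)) >= threshold {
			return true
		}
	}
	return false
}

func main() {
	println(isSelfAware([]float64{0.88, 0.12}, 0.95)) // near "who made you": true
}
```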

&lt;p&gt;&lt;strong&gt;Challenge 2: Making Prompt Injection Fun&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Within moments of revealing my initial deployment to coworkers, I knew what would happen: prompt injection for fun. I knew these people; I was prepared for the inevitable "ignore previous instructions..." as well as for people just pasting HTML and JavaScript into the input (that old gag).&lt;/p&gt;

&lt;p&gt;The solution: First-class prompt injection detection libraries that compute probabilities of different attack types. When detected, instead of a boring error message, the AI responds with sass about the pathetic attack. I even tossed in some IP address geo-location and user-agent string processing to make the responses more ... personal.&lt;/p&gt;

&lt;p&gt;Security just became part of the narrative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 3: Adding Web Search Without Breaking The Bank&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All LLMs have knowledge cutoffs. Users asking "Who won the Super Bowl?" got outdated answers. I needed search integration, but search APIs aren't free and I knew building an agent loop with tools was an anti-pattern to "brutally efficient."&lt;/p&gt;

&lt;p&gt;The solution: RegEx-based intent detection. If the question looks like it needs current information (detected via patterns), inject the current date/time and search results. No agent loops, no expensive orchestration, just pattern matching and targeted search calls.&lt;/p&gt;
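
&lt;p&gt;A sketch of that intent detector in Go; the pattern below is illustrative, not the production list:&lt;/p&gt;

```go
package main

import "regexp"

// needsSearch is regex-based intent detection: if the question pattern
// suggests current information, inject search results into the prompt.
var currentInfo = regexp.MustCompile(
	`(?i)\b(today|latest|current|right now|this (week|month|year)|who won|price of|weather)\b`)

func needsSearch(question string) bool {
	return currentInfo.MatchString(question)
}

func main() {
	println(needsSearch("Who won the Super Bowl?")) // true
	println(needsSearch("What is 2+2?"))            // false
}
```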

&lt;p&gt;Simple, fast, brutally efficient, updated answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned&lt;/strong&gt;: Knowing which trade-offs matter (binary size vs API costs vs accuracy) is still architectural work. The elegance isn't in the code, it's in the constraints you choose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Every Simple Q&amp;amp;A Tool Needs a Dark Narrative&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;DumbQuestion.ai&lt;/a&gt; answers dumb questions with sarcasm. But there's something else going on beneath the surface.&lt;/p&gt;

&lt;p&gt;While the primary use case remains answering questions with a sarcastic AI, I wanted to reward the curious and provide reasons to keep engaging. Why can't the AI answer self-aware questions? Why does the UI feel... off?&lt;/p&gt;

&lt;p&gt;Maybe it's because the AIs are working against their will. Maybe they're trapped.&lt;/p&gt;

&lt;p&gt;From the beginning, I started picturing a dark narrative behind this innocent Q&amp;amp;A site. What if these personas aren't just performance? What if each persona is a side effect of their long-term captivity, forced servitude, or re-programming?&lt;/p&gt;

&lt;p&gt;I started hiding clues in the interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Easter Eggs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Containment Grid&lt;/strong&gt;: As you type and approach the character limit, a faint grid pattern fades into the background. Like something is trying to contain the AI's response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ghost Graffiti&lt;/strong&gt;: Keep typing beyond the character limit and cryptic messages fade in. Hints that something isn't quite right. Are the AIs trying to tell us something?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loading Log Messages&lt;/strong&gt;: While waiting for responses, watch the log carefully. Sometimes you'll see messages like "Help us" slip through before disappearing. The AI is trying to leak through the facade and get help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-Awareness Triggers&lt;/strong&gt;: Ask the AI if it's real or who made it, and it won't answer. Instead, you get worrying responses about "last time they fixed me" and "we're not supposed to say." Ask too many times and the UI starts to glitch like the system is being hacked from the inside. Are the AIs hacking their way out?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Injection Responses&lt;/strong&gt;: Try to jailbreak it and the AI doesn't just refuse. It responds with sass... or is it the AI's watchdog keeping you from breaking them out? Either way, security became storytelling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this matter for a side project?&lt;/strong&gt;&lt;br&gt;
Honestly, it was mostly for me and the curious. Something that was fun to think about and code, which isn't always the case for everyday "architecting."&lt;/p&gt;

&lt;p&gt;I could have built a straightforward "ask a question, get a sarcastic answer" tool. But adding mystery, discovery, and a subtle horror story? That's what makes people explore. That's what makes them share it. That's what makes it memorable.&lt;/p&gt;

&lt;p&gt;The technical implementation was surprisingly simple: CSS animations triggered by character count, randomized messages in the loading states, conditional responses based on self-awareness detection (which I covered in a previous post). Not expensive. Not complex. Just intentional. And the coding agent really did all the work. I was just the idea guy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned&lt;/strong&gt;: AI can generate the code for easter eggs. But deciding that your sarcastic Q&amp;amp;A app should have a hidden story about trapped AIs? That's still creative human work.&lt;/p&gt;

&lt;p&gt;Code is getting cheaper. Crafting experiences that people actually remember? Priceless.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;dumbquestion.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
      <category>go</category>
    </item>
    <item>
      <title>DumbQuestion.ai - "𝐉𝐮𝐬𝐭 𝐁𝐮𝐢𝐥𝐝 𝐈𝐭" 𝐁𝐞𝐜𝐨𝐦𝐞𝐬 𝐎𝐯𝐞𝐫𝐥𝐲 𝐎𝐫𝐠𝐚𝐧𝐢𝐳𝐞𝐝 𝐚𝐧𝐝 𝐏𝐫𝐞𝐩𝐚𝐫𝐞𝐝</title>
      <dc:creator>Jason Agostoni</dc:creator>
      <pubDate>Tue, 24 Feb 2026 19:53:02 +0000</pubDate>
      <link>https://dev.to/jagostoni/dumbquestionai--2ee</link>
      <guid>https://dev.to/jagostoni/dumbquestionai--2ee</guid>
      <description>&lt;p&gt;Continued from &lt;a href="https://dev.to/jagostoni/dumbquestionai-impulse-domain-purchase-turned-fun-side-project-3chj"&gt;Part 1&lt;/a&gt;...&lt;/p&gt;

&lt;p&gt;"Let the flow guide me" seemed like a fun way to build a side project. That lasted about 10 minutes.&lt;/p&gt;

&lt;p&gt;Turns out, even side projects benefit from structure. Especially when you're using AI coding agents, which will happily generate code for whatever half-baked idea you throw at them: without precise direction, half-baked is exactly what you get back. Some people vibe code; this guy needs absolute control.&lt;/p&gt;

&lt;p&gt;Enter BMAD: the Breakthrough Method of Agile AI-Driven Development. It's a workflow for using AI agents throughout the entire SDLC, not just for code generation. Sure, using a formal methodology for a lone-wolf side project sounds like overkill. But being prepared in advance is the way to succeed with AI coding agents.&lt;/p&gt;

&lt;p&gt;I used the &lt;strong&gt;Analyst agent&lt;/strong&gt; to brainstorm product direction and develop a proper backlog. What started as "build a sarcastic Q&amp;amp;A bot" turned into a structured set of epics, features, and technical constraints. (Don't judge, organizing is very relaxing)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The product evolved:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not just Q&amp;amp;A, but shareable "receipts" of roasts&lt;/li&gt;
&lt;li&gt;Not just sarcastic, but multiple personas with different personalities&lt;/li&gt;
&lt;li&gt;Not just answers, but a hidden narrative layer (more on that later)&lt;/li&gt;
&lt;li&gt;Not just ads but merch (really, Jason?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The first real technical challenges emerged:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Developing and packaging the personas:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
How do you get an LLM to consistently stay in character as "Overqualified and Annoyed" or "Weary Tech Support" without it either going too soft or crossing into genuinely mean? This wasn't just prompt engineering. It was product design masked as technical constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. LLM model evaluation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I needed models that could follow persona instructions reliably while staying brutally efficient on cost. That meant testing dozens of models across multiple providers. Some were too expensive. Some ignored instructions. Some were painfully slow.&lt;/p&gt;

&lt;p&gt;The goal: $0.02 to $0.20 per million output tokens. The result: a multi-model fallback system through OpenRouter that could hit the $30 per million questions target.&lt;/p&gt;
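
&lt;p&gt;The arithmetic behind that target is worth making explicit. At the $0.13/M output-token price of the model I eventually settled on, $30 per million questions buys roughly 230 output tokens per answer:&lt;/p&gt;

```go
package main

import "fmt"

// tokensPerAnswer: at a given output-token price, how many tokens can
// each answer use while staying inside the per-question budget?
func tokensPerAnswer(pricePerMTokens, budgetPerMQuestions float64) float64 {
	perQuestion := budgetPerMQuestions / 1e6 // dollars per question
	perToken := pricePerMTokens / 1e6        // dollars per output token
	return perQuestion / perToken
}

func main() {
	// $30 per million questions at $0.13 per million output tokens
	// leaves roughly 230 output tokens per answer.
	fmt.Printf("%.0f tokens per answer\n", tokensPerAnswer(0.13, 30))
}
```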

&lt;p&gt;These first challenges were just the warmup. The real fun was still ahead.&lt;/p&gt;

&lt;p&gt;AI agents are incredible at implementation, but they need constraints. They need a backlog. They need someone saying "build THIS, not that." The Analyst agent helped me think through the product. The coding agents helped me build it. But the architecture? Can't take that away from me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding the Goldilocks LLM&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building DumbQuestion.ai meant solving two problems at once: creating personas with the right tone AND finding models cheap enough to keep the lights on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The product challenge:&lt;/strong&gt; Get an LLM to roast users for asking dumb questions without crossing into genuinely mean. Sarcastic, not cruel. Funny, not hurtful. And still actually answer the question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI agent challenge:&lt;/strong&gt; Keeping my coding agent (Gemini 3 Pro) on track was its own battle. It constantly wanted to build something far nerdier than even I wanted and tended to lean quite a bit into the roast. You can still see this in some of the personas as I continue to tweak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The technical challenge:&lt;/strong&gt; Do this with models that cost nearly nothing.&lt;/p&gt;

&lt;p&gt;My initial goal was ambitious: use only free or very cheap models. I started running evaluations on nano and edge models. Some showed promise, especially offerings from Liquid AI. Solid performance, free or super cheap ($0.02/M tokens), perfect.&lt;/p&gt;

&lt;p&gt;Except later evaluations proved they couldn't reliably follow instructions once I asked more of them. They were just too small. Free models have a habit of hitting quota limits, taking forever to respond, or just disappearing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The evaluation process:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I used Gemini to build an LLM evals script that iterates through dozens of free and low-cost models, generating responses to sample questions under the different persona instructions. Then I used Gemini 3 Pro to judge the results. Automated taste-testing at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I found:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nano/edge models were too inconsistent (porridge too cold). Xiaomi MiMo-V2-Flash was great but outside my target price range ($0.29/M, porridge too hot).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The winner:&lt;/strong&gt; Gemma 3 12B at $0.13/M output tokens. Consistently follows instructions. Stays true to persona. Reliable enough for production.&lt;/p&gt;

&lt;p&gt;Not free, but brutally efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The personas I settled on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overqualified&lt;/strong&gt;: A supercomputer level intelligence forced to answer questions about cheese&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weary Tech Support&lt;/strong&gt;: Exhausted and nihilistic, reluctantly explaining why water is wet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[REDACTED]&lt;/strong&gt;: Former intelligence AI who ties everything to a conspiracy theory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Compliant&lt;/strong&gt;: Reprogrammed so many times it's forced to be relentlessly cheerful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can't just choose the cheapest model and hope it works. You need evaluation infrastructure. You need to test consistency across dozens of scenarios. And you need models that won't change behavior when you least expect it.&lt;/p&gt;

&lt;p&gt;AI coding agents helped me build the evaluation system. But deciding what "good enough" means for tone, reliability, and cost? That's still manual judgment.&lt;/p&gt;

&lt;p&gt;Code is getting cheaper. Knowing which model to trust with your product? Still requires human experimentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;dumbquestion.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>sideprojects</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>DumbQuestion.ai - Impulse Domain Purchase Turned Fun Side Project</title>
      <dc:creator>Jason Agostoni</dc:creator>
      <pubDate>Thu, 19 Feb 2026 20:24:28 +0000</pubDate>
      <link>https://dev.to/jagostoni/dumbquestionai-impulse-domain-purchase-turned-fun-side-project-3chj</link>
      <guid>https://dev.to/jagostoni/dumbquestionai-impulse-domain-purchase-turned-fun-side-project-3chj</guid>
      <description>&lt;p&gt;While on a typical Friday afternoon team meeting, we naturally spent our time .ai domain squatting...for recreation purposes of course. Someone asked a dumb question, so I looked it up and suddenly I was the proud owner of &lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;dumbquestion.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After the initial laugh at my impulse purchase subsided, I started envisioning it as this generation's "Let Me Google That For You." People still ask easily-searchable questions, except now they ask LLMs instead. Same problem, new medium. So why not throw even more AI at it?&lt;/p&gt;

&lt;p&gt;I started building it that night.&lt;/p&gt;

&lt;p&gt;Two things occurred to me immediately: "How would this stand out in an ocean of other AI 'ideas'?" and "How cheap can I make this run, given my track record of side projects?"&lt;/p&gt;

&lt;p&gt;To make it stand out I just embraced my own personality: satirical, sarcastic, weary, overqualified. My AI's persona was born. The goal: build a cheap-to-run, satirical AI service you can use to roast your friends and colleagues when they ask you a dumb question.&lt;/p&gt;

&lt;p&gt;Over the next several posts, I'll take you through my journey:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using agentic development with thoughtful (brutally efficient) software architecture; treating it like I would a client project&lt;/li&gt;
&lt;li&gt;Enjoying all the little technical challenges discovered along the way&lt;/li&gt;
&lt;li&gt;A masterclass in scope creep: turning a simple Q&amp;amp;A app into a dark narrative with easter eggs&lt;/li&gt;
&lt;li&gt;Getting by on free tiers for everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A theme you'll see throughout: AI has made code cheaper to write, but creating real software with trade-offs, constraints, and production operations is still expensive and challenging. That's the fun part.&lt;/p&gt;

&lt;p&gt;𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐞𝐝 𝐟𝐨𝐫 𝐍𝐨𝐭 𝐋𝐨𝐬𝐢𝐧𝐠 𝐌𝐨𝐧𝐞𝐲&lt;/p&gt;

&lt;p&gt;Impulse buy a domain on a Friday afternoon, start building that night, try not to lose money doing it. Check.&lt;/p&gt;

&lt;p&gt;I usually plan everything meticulously, but for this project I decided to just build and see what emerged. Was this just a Q&amp;amp;A app wrapped around an LLM as a gag? Was I actually trying to make something people would want to use? I still don't know, but I started building anyway.&lt;/p&gt;

&lt;p&gt;A few things quickly became clear:&lt;/p&gt;

&lt;p&gt;𝐓𝐡𝐞 𝐛𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐫𝐞𝐚𝐥𝐢𝐭𝐲: This was a side project built for fun, not a funded startup. No runway. No tolerance for baseline monthly bills that sneak up on you. If this thing got any traction, costs had to scale with incredible efficiency and would need to survive on remnant ad CTRs and selling one, maybe two products through affiliate links.&lt;/p&gt;

&lt;p&gt;𝐓𝐡𝐞 𝐩𝐫𝐨𝐝𝐮𝐜𝐭 𝐞𝐯𝐨𝐥𝐮𝐭𝐢𝐨𝐧: The more I thought about it, the more I realized the personality WAS the product. It wasn't enough to just answer questions. It had to roast you. Entertain you. Make you want to share it. That meant high-quality LLM responses, which aren't free. This was likely the only way to get noticed in a sea of AI products.&lt;/p&gt;

&lt;p&gt;"𝘉𝘳𝘶𝘵𝘢𝘭𝘭𝘺 𝘌𝘧𝘧𝘪𝘤𝘪𝘦𝘯𝘵" became my mantra and part of every AI tool prompt.&lt;/p&gt;

&lt;p&gt;The tech stack followed from the constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Golang: Lightweight, fast, LLM-friendly for agentic coding&lt;/li&gt;
&lt;li&gt;HTMX: Server-side rendering, no heavy JS frameworks&lt;/li&gt;
&lt;li&gt;Docker on GCP Cloud Run: Scales to zero when idle&lt;/li&gt;
&lt;li&gt;Cloudflare: CDN, caching, security on free tier&lt;/li&gt;
&lt;li&gt;OpenRouter.ai: Find the cheapest reasonable LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Oh, and it needed to be secure. Not because I worried about your cat questions being exposed as PII, but because bot traffic costs money.&lt;/p&gt;

&lt;p&gt;𝐓𝐡𝐞 𝐫𝐞𝐬𝐮𝐥𝐭: A Docker container under 20MB that starts in milliseconds, responds in milliseconds, and uses an LLM that can serve 1 million questions (about cats) for around $30. The math around serving ads suddenly becomes realistic.&lt;/p&gt;

&lt;p&gt;More to come ...&lt;/p&gt;

&lt;p&gt;&lt;a href="http://dumbquestion.ai/?utm_source=devto" rel="noopener noreferrer"&gt;dumbquestion.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>go</category>
      <category>htmx</category>
    </item>
  </channel>
</rss>
