<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dan Gurgui</title>
    <description>The latest articles on DEV Community by Dan Gurgui (@arch4g).</description>
    <link>https://dev.to/arch4g</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3682999%2F713b5538-40fb-4770-8150-52466ebf82bb.png</url>
      <title>DEV Community: Dan Gurgui</title>
      <link>https://dev.to/arch4g</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arch4g"/>
    <language>en</language>
    <item>
      <title>I Deployed Gemma 4 32B on a Rented H100 for $1.50/Hour. The Hard Part Wasn't What I Expected.</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Sun, 05 Apr 2026 17:11:38 +0000</pubDate>
      <link>https://dev.to/arch4g/i-deployed-gemma-4-32b-on-a-rented-h100-for-150hour-the-hard-part-wasnt-what-i-expected-3og9</link>
      <guid>https://dev.to/arch4g/i-deployed-gemma-4-32b-on-a-rented-h100-for-150hour-the-hard-part-wasnt-what-i-expected-3og9</guid>
      <description>&lt;h2&gt;
  
  
  I Deployed Gemma 4 32B on a Rented H100 for $1.50/Hour. The Hard Part Wasn't What I Expected.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The surprising part: H100 access felt almost trivial
&lt;/h2&gt;

&lt;p&gt;This week I experimented with &lt;a href="https://vast.ai" rel="noopener noreferrer"&gt;vast.ai&lt;/a&gt;, a marketplace where you can rent GPU hardware on demand for AI workloads. I walked in expecting friction. Provisioning an NVIDIA H100, deploying a brand-new model, configuring networking — all of it sounded like a weekend project at minimum. Instead, I had a freshly released Gemma 4 32B model running and responding to prompts in about an hour. The cost? Roughly &lt;strong&gt;$1.50 per hour&lt;/strong&gt; for an H100.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I tried vast.ai (and what I needed)
&lt;/h2&gt;

&lt;p&gt;I've been wanting to test self-hosted LLMs for coding assistance. The goal was simple: deploy a capable model on remote hardware, connect to it from my local development environment, and use it as a coding agent through Cline. No API rate limits, no per-token billing that spirals, just a flat hourly rate for raw compute.&lt;/p&gt;

&lt;p&gt;Vast.ai gives you a catalog of available machines from individual GPU providers. You pick an NVIDIA card (anything from consumer RTX series up to H100s), configure storage, CPU cores, and RAM, then spin it up. Like an Airbnb for GPUs. The platform handles the matchmaking; you handle the workload. With the &lt;a href="https://aimatch.pro/stats" rel="noopener noreferrer"&gt;AI tools ecosystem now tracking over 4,000 tools&lt;/a&gt; and growing, self-hosted infrastructure like this is becoming a practical alternative to managed API services, especially when you want full control over your model and data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment walkthrough: Gemma 4 32B in about one hour
&lt;/h2&gt;

&lt;p&gt;Google had just released Gemma 4, and I wanted to test it while it was still fresh. The deployment process on vast.ai was more straightforward than I expected.&lt;/p&gt;

&lt;p&gt;I selected an H100 instance with enough VRAM to fit the 32B parameter model comfortably. The platform lets you filter by GPU type, VRAM, and price, so finding the right machine took a few minutes. Once provisioned, I SSH'd into the instance and set up the serving stack. For a model like Gemma 4 32B, you need a serving framework (vLLM or text-generation-inference work well here) that exposes an OpenAI-compatible API endpoint.&lt;/p&gt;

&lt;p&gt;The model download and loading took the bulk of that hour. Once the server was up, I could hit the endpoint from my local machine. The deployment side of this experiment was the easy part.&lt;/p&gt;
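
&lt;p&gt;To make that concrete, here's a rough TypeScript sketch of hitting a vLLM-style OpenAI-compatible endpoint from a local machine. The host address and model id are placeholders, not my actual instance details:&lt;/p&gt;

```typescript
// Rough sketch: calling a vLLM-style OpenAI-compatible endpoint.
// HOST and MODEL are placeholders, not the actual instance details.
const HOST = "http://203.0.113.10:8000"; // hypothetical instance address
const MODEL = "google/gemma-4-32b";      // hypothetical model id

// Build an OpenAI-compatible chat-completions payload.
function buildChatRequest(prompt: string, maxTokens = 512) {
  return {
    model: MODEL,
    messages: [{ role: "user", content: prompt }],
    max_tokens: maxTokens,
  };
}

// POST the prompt and return the first completion's text.
async function complete(prompt: string) {
  const res = await fetch(`${HOST}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(prompt)),
  });
  if (!res.ok) throw new Error(`server returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content as string;
}
```

&lt;p&gt;Anything that speaks the OpenAI wire format (curl, an SDK, or a client tool like Cline) can point at the same endpoint.&lt;/p&gt;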

&lt;h2&gt;
  
  
  Cost and speed reality check: what $1.50/hour buys
&lt;/h2&gt;

&lt;p&gt;For context, an H100 on AWS (p5 instances) runs roughly $30 to $40 per hour depending on region and commitment. Even spot pricing on major clouds rarely drops below $10/hour. Lambda Labs and RunPod sit somewhere in the $2 to $4/hour range for comparable hardware. At $1.50/hour, vast.ai is at the aggressive end of that spectrum.&lt;/p&gt;

&lt;p&gt;The inference speed I observed was around 20 tokens per second. Not blazing fast, but comparable to what you experience with Claude or other hosted coding agents through tools like Cline. For interactive coding workflows, 20 tokens/sec is workable. You're not waiting 30 seconds for a response. It feels conversational enough.&lt;/p&gt;

&lt;p&gt;The tradeoff is clear: you lose the managed experience and reliability of a first-party API. You gain cost control and model flexibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real challenge: using the remote LLM from my local machine
&lt;/h2&gt;

&lt;p&gt;Everything I described so far went smoothly. The friction started the moment I tried to connect Cline (a VS Code extension for AI-assisted coding) to my remotely deployed model.&lt;/p&gt;

&lt;p&gt;Cline expects an OpenAI-compatible endpoint, which my serving stack provided. But the integration was rough. I hit bugs I didn't anticipate: errors reported as connection timeouts that weren't actually timeouts, malformed request headers, and response parsing failures with cryptic error messages. Each problem required a different workaround. Some were Cline configuration issues. Others seemed to be edge cases in how Cline handles non-OpenAI endpoints.&lt;/p&gt;

&lt;p&gt;I did manage to get a small feature implemented and a PR submitted. But the ratio of "time debugging the toolchain" to "time actually coding with the model" was painful. For every productive 15 minutes, I spent 15 to 30 minutes troubleshooting the connection layer. Getting Cline to behave was, by far, the hardest part of this entire experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure mode postmortem: context overflow killed the machine
&lt;/h2&gt;

&lt;p&gt;The most frustrating failure was a context window overflow. Gemma 4 32B on the H100 had a context window of around 32,000 tokens in my setup. During a longer coding session, Cline pushed the conversation past that limit, to roughly 32,500 tokens. Instead of gracefully truncating or compacting the conversation, Cline sent the full context and the server tried to process it.&lt;/p&gt;

&lt;p&gt;Those extra 500 tokens were enough to overflow the GPU's VRAM, since the KV cache grows with context length. The process didn't crash cleanly. It hung. The machine became unresponsive, SSH sessions froze, and there was no way to recover. I had to terminate the instance entirely and provision a new one, losing all session state.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model didn't fail loudly. It failed silently, which is worse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a real operational risk when you're self-hosting. Managed APIs handle context truncation for you. When you own the stack, you own every failure mode too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned: guardrails you'll want from the start
&lt;/h2&gt;

&lt;p&gt;If you're planning a similar setup, a few mitigations would save you hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget your context aggressively.&lt;/strong&gt; Set a hard limit at 80% of the model's context window (around 25,600 tokens for a 32K model). Don't let your client tool manage this on its own. Monitor token counts on the server side if possible.&lt;/p&gt;
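
&lt;p&gt;As a rough sketch of that budgeting rule (using a crude 4-characters-per-token heuristic rather than a real tokenizer):&lt;/p&gt;

```typescript
// Client-side context budgeting sketch. The 4-chars-per-token estimate
// is a crude heuristic, not a real tokenizer.
const CONTEXT_WINDOW = 32000;
const BUDGET = Math.floor(CONTEXT_WINDOW * 0.8); // 25,600 tokens

// Rough token estimate for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// True when the conversation has blown past the 80% budget and should
// be truncated or summarized before the next request is sent.
function exceedsBudget(conversation: string[]): boolean {
  const total = conversation.reduce((sum, msg) => sum + estimateTokens(msg), 0);
  return total > BUDGET;
}
```

&lt;p&gt;A check like this in front of every request is cheap insurance against the hang described above.&lt;/p&gt;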

&lt;p&gt;Break complex coding tasks into smaller requests rather than letting the conversation accumulate. Shorter, focused prompts keep you well within the context budget and reduce the chance of a catastrophic hang.&lt;/p&gt;

&lt;p&gt;Vast.ai supports stopping and restarting instances, so snapshot your instance before long sessions. If you're about to start a heavy run, make sure you can recover without re-provisioning from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm looking for next
&lt;/h2&gt;

&lt;p&gt;The experiment proved the concept. Self-hosted LLMs on rented hardware are viable for coding workflows, and the cost is genuinely competitive. The weak link wasn't the model or the infrastructure. It was the local client tooling.&lt;/p&gt;

&lt;p&gt;I'm actively looking for alternatives to Cline that handle remote OpenAI-compatible endpoints more gracefully, especially around context management and error recovery. If you've had success with other tools (Continue, Aider, or something else entirely), I'd genuinely like to hear about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The infrastructure problem is solved. The developer experience problem is not.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dan Gurgui&lt;/strong&gt; | A4G&lt;br&gt;
&lt;em&gt;AI Architect&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Weekly Architecture Insights: &lt;a href="https://architectureforgrowth.com/newsletter" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>How I Passed the AWS Generative AI Developer Professional Certification (and Earned the Early Adopter Badge)</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Sun, 18 Jan 2026 09:17:00 +0000</pubDate>
      <link>https://dev.to/arch4g/how-i-passed-the-aws-generative-ai-developer-professional-certification-and-earned-the-early-4kmh</link>
      <guid>https://dev.to/arch4g/how-i-passed-the-aws-generative-ai-developer-professional-certification-and-earned-the-early-4kmh</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time invested:&lt;/strong&gt; ~4 weeks of focused preparation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources used:&lt;/strong&gt; Frank Kane's Udemy course, Stephane Maarek's AI Practitioner tests, Tutorials Dojo practice exams, AWS documentation, hands-on Bedrock projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difficulty level:&lt;/strong&gt; Hardest AWS exam I've taken—questions require 2-3 layers of mental assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top 3 tips:&lt;/strong&gt; (1) Take AI Practitioner first for exam structure familiarity, (2) Focus on Bedrock integrations with other AWS services, (3) Budget the full 4 hours and prepare physically for endurance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I Pivoted from Solutions Architect Professional to GenAI (and What Surprised Me)
&lt;/h2&gt;

&lt;p&gt;I recently got my AWS Generative AI Developer Professional certification, along with the early adopter badge. Here's the thing—this wasn't my original plan at all.&lt;/p&gt;

&lt;p&gt;I'd been preparing for the AWS Solutions Architect Professional for some time, with the exam scheduled for sometime in December. Then AWS launched the Generative AI Developer certification in mid-November, and I found out about it at the beginning of December.&lt;/p&gt;

&lt;p&gt;It caught me off guard. AWS putting this much emphasis on Generative AI specifically? That was a signal worth paying attention to. The &lt;a href="https://www.researchandmarkets.com/report/global-generative-ai-in-software-development-market" rel="noopener noreferrer"&gt;generative AI in software development market is experiencing explosive growth&lt;/a&gt;, and AWS clearly wants certified professionals ready to build on their platform.&lt;/p&gt;

&lt;p&gt;I decided to dig into it. With significant AWS experience and training already under my belt, I wanted to understand what this certification actually meant—and whether pivoting made strategic sense.&lt;/p&gt;

&lt;p&gt;Turns out, it did. But the path wasn't straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Thought the Exam Would Be vs. What It Really Targets
&lt;/h2&gt;

&lt;p&gt;I understood very quickly that this certification is Bedrock and AI heavy. What I wasn't sure about was how deep it went into machine learning territory.&lt;/p&gt;

&lt;p&gt;I tried my best to find information online. No luck. With a brand-new certification, the community hadn't built up the usual knowledge base of "here's what to expect" posts. That's the early adopter tax—you're trading uncertainty for the badge before the market gets flooded.&lt;/p&gt;

&lt;p&gt;What I eventually discovered through preparation and the exam itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock is the core.&lt;/strong&gt; If you don't know Bedrock inside and out, you're not passing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration is everything.&lt;/strong&gt; More than 50% of questions require knowledge of how Bedrock works with other AWS services—Lambda for Agents, OpenSearch for RAG, IAM for permissions, CloudWatch for monitoring, Comprehend for PII detection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker shows up, but it's not the focus.&lt;/strong&gt; There are ML questions, but they're not as dominant as some practice tests suggest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This is really a GenAI Architect exam in disguise.&lt;/strong&gt; Despite the "Developer" title, the focus is on integrating services and designing solutions, not just writing Python code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exam aligns with where the industry is heading. According to &lt;a href="https://www.techtarget.com/searchenterpriseai/feature/The-future-of-generative-AI-Trends-to-follow" rel="noopener noreferrer"&gt;TechTarget's analysis of 2026 generative AI trends&lt;/a&gt;, agentic AI orchestration and plug-and-play LLMs are becoming key focus areas—exactly what Bedrock enables. AWS is making this certification hard because the market value for these skills is projected to skyrocket.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Preparation Path: What I Used and Why
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Starting with Frank Kane's Course (The Skim Phase)
&lt;/h3&gt;

&lt;p&gt;Knowing the exam was Bedrock-heavy, I decided to take Frank Kane's Udemy course to get a grasp of it. The course is substantial—22 hours of learning. Initially, I just skimmed it.&lt;/p&gt;

&lt;p&gt;What I realized: this is all about Bedrock and how Bedrock works together with other AWS services, plus the AI solutions AWS provides. It gave me a mental map of what I needed to know, even if I wasn't ready to go deep yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Practitioner Detour (Strategic, Not a Distraction)
&lt;/h3&gt;

&lt;p&gt;I remembered that the AI Practitioner certification also touches on many AWS AI solutions and Bedrock. So I made a decision that ended up being crucial: &lt;strong&gt;take AI Practitioner first.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because I needed the knowledge—the requirements are somewhat light, more general understanding of ML/AI terminology than hands-on skills. But because I needed to understand the exam structure and feeling before jumping into a Professional-level certification with zero practice materials available.&lt;/p&gt;

&lt;p&gt;I used Stephane Maarek's practice test. Did it twice before scheduling the exam. The exam itself was very straightforward, and I got results immediately.&lt;/p&gt;

&lt;p&gt;This gave me exactly what I needed: training for the AWS Generative AI Professional format without the high stakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Deep Dive (Where Real Learning Happened)
&lt;/h3&gt;

&lt;p&gt;After AI Practitioner, I went back to Frank Kane's course—this time properly. I started playing around with Bedrock and foundation models hands-on.&lt;/p&gt;

&lt;p&gt;After making sure I had a solid understanding, I did the course's practice test.&lt;/p&gt;

&lt;p&gt;It felt very, very easy. Too easy for a Professional certification.&lt;/p&gt;

&lt;p&gt;That worried me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practice Tests Reality Check: Too Easy, Off-Target, or ML-Heavy
&lt;/h2&gt;

&lt;p&gt;I did some digging around, and many Redditors shared the same experience: the exam is Professional-level hard, but there aren't many accurate practice tests available. The questions are reportedly brutal.&lt;/p&gt;

&lt;p&gt;So I practiced even more with Bedrock directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deployed guardrails&lt;/strong&gt; for a test project (critical for enterprise adoption—hallucination prevention and PII masking are the biggest blockers right now)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated AWS Comprehend&lt;/strong&gt; for PII detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployed AgentCore&lt;/strong&gt; (directly relevant to the &lt;a href="https://www.techtarget.com/searchenterpriseai/feature/The-future-of-generative-AI-Trends-to-follow" rel="noopener noreferrer"&gt;agentic orchestration trend&lt;/a&gt; predicted for 2025/2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-read the entire AWS documentation&lt;/strong&gt; at least twice&lt;/li&gt;
&lt;/ul&gt;
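
&lt;p&gt;For flavor, here's a sketch of the client-side half of that Comprehend exercise: masking text using the PII spans the service returns. The span shape mirrors what DetectPiiEntities gives back (BeginOffset, EndOffset, Type); the helper itself is my own illustration, not AWS code:&lt;/p&gt;

```typescript
// Illustrative client-side piece of the Comprehend PII exercise: mask
// detected spans in the original text. Span shape mirrors what
// DetectPiiEntities returns; this helper is a sketch, not AWS code.
interface PiiSpan {
  BeginOffset: number;
  EndOffset: number;
  Type: string;
}

function maskPii(text: string, spans: PiiSpan[]): string {
  // Apply masks right-to-left so earlier offsets stay valid.
  const sorted = [...spans].sort((a, b) => b.BeginOffset - a.BeginOffset);
  let out = text;
  for (const s of sorted) {
    out = out.slice(0, s.BeginOffset) + `[${s.Type}]` + out.slice(s.EndOffset);
  }
  return out;
}
```

&lt;p&gt;The exam cares less about this code and more about knowing that Comprehend is the service you reach for when a question mentions PII detection alongside Bedrock.&lt;/p&gt;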

&lt;p&gt;I tried Gemini's quiz features, but it kept drifting into ML Specialty territory rather than AWS Bedrock Generative AI specifics. Not helpful.&lt;/p&gt;

&lt;p&gt;Then I found people recommending a new practice test on &lt;strong&gt;Tutorials Dojo&lt;/strong&gt;. I took it.&lt;/p&gt;

&lt;p&gt;Reality check: I was barely getting 65%. And I understood exactly why—I had very little experience with Machine Learning, and many questions were SageMaker ML-heavy.&lt;/p&gt;

&lt;p&gt;I worked my way up to 75% on the two question sets there. But it felt like I'd need another month or two of deep ML study to score higher.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SageMaker Dilemma: How Much ML You Actually Need
&lt;/h2&gt;

&lt;p&gt;I was confronted with a big dilemma.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A:&lt;/strong&gt; Take the Generative AI exam without deep SageMaker knowledge.&lt;br&gt;
&lt;strong&gt;Option B:&lt;/strong&gt; Spend another month or two mastering ML concepts first.&lt;/p&gt;

&lt;p&gt;Many people in the community recommended sticking to Bedrock. The logic: this is a Generative AI certification, not ML Specialty. SageMaker matters, but it's not the core.&lt;/p&gt;

&lt;p&gt;I decided to take my chances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; It went really well.&lt;/p&gt;

&lt;p&gt;While the exam did have a couple of SageMaker questions, the majority were Bedrock-heavy. All the hands-on Bedrock practice paid off.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Here's the lesson: &lt;strong&gt;knowing what NOT to study deeply can be as important as knowing what to study.&lt;/strong&gt; I could have spent two months on SageMaker and still faced the same Bedrock-integration questions. The risk calculation was worth it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What the Exam Actually Felt Like: Question Style, Time Pressure, and Mental Endurance
&lt;/h2&gt;

&lt;p&gt;The questions were some of the toughest I had ever seen on any AWS exam.&lt;/p&gt;

&lt;p&gt;Not because of technical complexity alone—but because of how they were formulated. Each question required building a mental model with two or three layers of assumptions before arriving at an answer.&lt;/p&gt;

&lt;p&gt;Here's what I mean. Don't expect questions like "What does Guardrails do?" Expect something closer to: &lt;em&gt;"If a Guardrail filters PII, but the Agent is configured to retry on failure, and the Lambda timeout is set to X seconds, what is the user experience when the content policy triggers?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You're not just recalling facts. You're simulating system behavior in your head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some questions took me 10-15 minutes each&lt;/li&gt;
&lt;li&gt;I used 3 hours and 30 minutes of the 4-hour exam&lt;/li&gt;
&lt;li&gt;That's an average of about 3 minutes per question&lt;/li&gt;
&lt;li&gt;I sped up on the last questions because I physically couldn't sit still any longer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 4-hour allocation for non-native English speakers isn't generous—it's necessary. And even native speakers should expect to use most of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Physical reality:&lt;/strong&gt; By hour three, I desperately needed to move around. I couldn't resist the urge to finish quickly just to stand up. Plan for this. The exam is a mental marathon, but it's also a physical endurance test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways: How to Prepare Efficiently (Without Overstudying)
&lt;/h2&gt;

&lt;p&gt;Here's what I'd tell anyone preparing for this certification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The exam is genuinely difficult.&lt;/strong&gt; The tough questions aren't a rumor. Accept this going in and prepare accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The 4-hour time limit is real.&lt;/strong&gt; Non-native speakers get this by default, but everyone needs it. Don't rush through practice tests—simulate the actual pacing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prepare physically.&lt;/strong&gt; No water breaks, no bathroom breaks, no interruptions for the full duration. Eat well before. Hydrate earlier in the day, not right before.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Practice tests are imperfect.&lt;/strong&gt; They'll either train you for things not on the exam or be too soft. Use them for structure familiarity, not content accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS documentation + hands-on practice is the real preparation.&lt;/strong&gt; Deploy guardrails. Build agents. Integrate Comprehend. Read the docs twice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know Bedrock integrations cold.&lt;/strong&gt; More than half the questions require understanding how Bedrock works with Lambda, OpenSearch, IAM, CloudWatch, S3, and other services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Take another certification first.&lt;/strong&gt; AI Practitioner or AWS Solutions Architect gives you exam format experience and foundational knowledge. Prior certification experience is incredibly valuable here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who Should Take This Next (and What's Coming)
&lt;/h2&gt;

&lt;p&gt;If you have solid AWS experience and want to position yourself for the generative AI wave, this certification is worth the effort. The early adopter badge won't be available forever, and the market demand for these skills is only growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My recommendation:&lt;/strong&gt; Don't rush it, but don't over-prepare either. Focus on Bedrock, integrations, and hands-on practice. Accept that some ML questions will show up, but don't let SageMaker anxiety derail your timeline.&lt;/p&gt;

&lt;p&gt;In a follow-up post, I'll share what I actually learned through this process—the technical knowledge that stuck, and what prior experience helped me the most. Stay tuned.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.researchandmarkets.com/report/global-generative-ai-in-software-development-market" rel="noopener noreferrer"&gt;Generative AI in Software Development Lifecycle Market Size&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.techtarget.com/searchenterpriseai/feature/The-future-of-generative-AI-Trends-to-follow" rel="noopener noreferrer"&gt;The future of generative AI: 10 trends to follow in 2026 | TechTarget&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Dan Gurgui&lt;/strong&gt; | A4G&lt;br&gt;
&lt;em&gt;AI Architect&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Weekly Architecture Insights: &lt;a href="https://architectureforgrowth.com/newsletter" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
    </item>
    <item>
      <title>LangSearch Inside Claude: The Fastest “Search Tool” Setup I’ve Used Lately</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Thu, 01 Jan 2026 15:13:56 +0000</pubDate>
      <link>https://dev.to/arch4g/langsearch-inside-claude-the-fastest-search-tool-setup-ive-used-lately-5381</link>
      <guid>https://dev.to/arch4g/langsearch-inside-claude-the-fastest-search-tool-setup-ive-used-lately-5381</guid>
      <description>&lt;h2&gt;
  
  
  Hook: When Claude’s web search feels nerfed, add a turbocharger
&lt;/h2&gt;

&lt;p&gt;I’ve been playing with a bunch of “AI + web” setups lately, and I keep running into the same vibe: the model is smart, but the search layer feels… constrained.&lt;/p&gt;

&lt;p&gt;You ask for sources, you ask for breadth, you ask for “show me five different angles,” and you get a couple of thin results, slow turnaround, or citations that feel like they were picked by a cautious librarian with a strict budget. I’m not even mad about it, I get why default web search has guardrails. But in practice, it can feel nerfed.&lt;/p&gt;

&lt;p&gt;Then I tried &lt;strong&gt;LangSearch inside Claude&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Man. That shit is amazing.&lt;/p&gt;

&lt;p&gt;The difference isn’t subtle. With the default experience, I’m nudging and waiting. With LangSearch wired in, it’s like flying. &lt;strong&gt;Blazing fast queries, lots of results, and tight iteration loops&lt;/strong&gt;. I haven’t felt that kind of “search responsiveness” in other assistants lately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What LangSearch-in-Claude actually is (in plain terms)
&lt;/h2&gt;

&lt;p&gt;At a high level, you’re doing something simple: you’re giving Claude a better search engine to call.&lt;/p&gt;

&lt;p&gt;Claude supports &lt;strong&gt;tool use&lt;/strong&gt; (Anthropic calls it tool use / function calling). You register a tool with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;name&lt;/strong&gt; (what Claude will call)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;description&lt;/strong&gt; (when it should use it)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;schema&lt;/strong&gt; (what inputs it accepts)&lt;/li&gt;
&lt;li&gt;And you provide the actual execution (you run the API call in your app, or via an agent runner)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you hand it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your &lt;strong&gt;LangSearch API key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A bit of &lt;strong&gt;LangSearch documentation&lt;/strong&gt; (or at least the endpoint + parameters you want Claude to use)&lt;/li&gt;
&lt;li&gt;And you let Claude decide when to execute search queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, it looks like: Claude generates a structured tool call, your code calls LangSearch, and then Claude reads the results and synthesizes an answer with citations.&lt;/p&gt;

&lt;p&gt;Here’s a minimal sketch of what “tool registration” looks like conceptually (exact wiring depends on your runtime and SDK):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"langsearch_query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search the web for up-to-date information and return top results with snippets and URLs."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"num_results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then your executor does something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// TypeScript-ish pseudo-code&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;langsearch_query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;num_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.langsearch.com/v1/search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LANGSEARCH_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;num_results&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. You’re not “making Claude smarter.” You’re giving it a &lt;strong&gt;higher-throughput retrieval layer&lt;/strong&gt;.&lt;/p&gt;
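&lt;p&gt;If you want to see the wiring, here’s a minimal sketch of registering that function as a tool. The field names follow Anthropic’s documented tool-use schema; the description text and the &lt;code&gt;num_results&lt;/code&gt; default are illustrative, not an official spec.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch: exposing langsearch_query to Claude via the
// Messages API "tools" parameter. Description text is illustrative.
const tools = [{
  name: "langsearch_query",
  description: "Search the web via LangSearch and return structured results (title, snippet, url).",
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: "The search query" },
      num_results: { type: "integer", default: 5 }
    },
    required: ["query"]
  }
}];
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When Claude emits a &lt;code&gt;tool_use&lt;/code&gt; block with that name, you run the function and hand the JSON back as the tool result.&lt;/p&gt;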

&lt;h2&gt;
  
  
  Why it feels so fast: the practical differences you notice
&lt;/h2&gt;

&lt;p&gt;Speed is a mushy word, so here’s what I actually mean when I say it feels faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Latency: time-to-first-usable result drops
&lt;/h3&gt;

&lt;p&gt;With a good search API, you get results back quickly, consistently. That matters because most of us don’t do one search. We do a research loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask question
&lt;/li&gt;
&lt;li&gt;Skim results
&lt;/li&gt;
&lt;li&gt;Refine query
&lt;/li&gt;
&lt;li&gt;Pull a second source to confirm
&lt;/li&gt;
&lt;li&gt;Summarize, compare, decide&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If each loop costs 20–40 seconds, you stop iterating. If each loop costs 3–8 seconds, you keep going. &lt;strong&gt;Iteration speed is the real productivity unlock.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Throughput: you can ask for breadth without punishment
&lt;/h3&gt;

&lt;p&gt;A common failure mode with built-in search tools is that they return a tiny handful of results, or the model “chooses” to search less often than you’d like.&lt;/p&gt;

&lt;p&gt;With LangSearch, you can comfortably ask for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10–20 results&lt;/li&gt;
&lt;li&gt;multiple query variants&lt;/li&gt;
&lt;li&gt;separate searches per subtopic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…and it doesn’t feel like you’re paying a tax in waiting time.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Relevance: fewer “why is this result here?” moments
&lt;/h3&gt;

&lt;p&gt;This is subjective, but I noticed fewer irrelevant links and fewer “SEO sludge” pages in the top set. That means less time spent telling the model, “no, not that, the other thing.”&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Tool calling reliability is good enough to trust in the loop
&lt;/h3&gt;

&lt;p&gt;Tool use has gotten materially better. Anthropic’s tool use is generally strong, and independent evaluations like Berkeley’s function calling leaderboards show modern models are much more consistent about producing valid tool calls than they were a year ago (BFCL: &lt;a href="https://gorilla.cs.berkeley.edu/leaderboard.html" rel="noopener noreferrer"&gt;https://gorilla.cs.berkeley.edu/leaderboard.html&lt;/a&gt;). That reliability matters because flaky tool calls destroy the “flying” feeling fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it shines: 4 research workflows that benefit immediately
&lt;/h2&gt;

&lt;p&gt;This is where it stopped being a neat trick and started being a daily driver for me.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Competitive scans without the pain
&lt;/h3&gt;

&lt;p&gt;If you’ve ever tried to map a market quickly, you know the drill: a dozen tabs, half of them garbage, and you still miss two important players.&lt;/p&gt;

&lt;p&gt;With LangSearch inside Claude, I’ll do something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Search for the top 15 vendors in X”&lt;/li&gt;
&lt;li&gt;“Now search for ‘X vs Y’ comparison posts”&lt;/li&gt;
&lt;li&gt;“Now search for pricing pages and extract tiers”&lt;/li&gt;
&lt;li&gt;“Now summarize positioning in a table”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What changes isn’t that Claude can summarize; it always could. What changes is &lt;strong&gt;how quickly you can gather enough raw material&lt;/strong&gt; to make the summary credible.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Troubleshooting with real-world context
&lt;/h3&gt;

&lt;p&gt;This is my favorite use case.&lt;/p&gt;

&lt;p&gt;When you hit a weird production issue, the docs are often not enough. You want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub issues&lt;/li&gt;
&lt;li&gt;changelogs&lt;/li&gt;
&lt;li&gt;forum posts&lt;/li&gt;
&lt;li&gt;“someone hit this in Kubernetes 1.29 with Cilium” type threads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangSearch is great for that “needle in a haystack” search pattern, especially when you chain it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for the exact error string&lt;/li&gt;
&lt;li&gt;Search again with the library version&lt;/li&gt;
&lt;li&gt;Search for “workaround” / “regression” / “breaking change”&lt;/li&gt;
&lt;li&gt;Pull 3–5 sources and ask Claude to reconcile them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output gets better because the input set is better.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Sourcing: pulling multiple perspectives fast
&lt;/h3&gt;

&lt;p&gt;Engineers often need to answer questions that look simple but aren’t:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Is this API stable?”&lt;/li&gt;
&lt;li&gt;“What are the known footguns?”&lt;/li&gt;
&lt;li&gt;“Is the community alive?”&lt;/li&gt;
&lt;li&gt;“Does anyone regret adopting this?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those answers don’t come from one official page. They come from triangulation.&lt;/p&gt;

&lt;p&gt;LangSearch makes it cheap to pull:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;official docs&lt;/li&gt;
&lt;li&gt;blog posts&lt;/li&gt;
&lt;li&gt;issue trackers&lt;/li&gt;
&lt;li&gt;community threads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then Claude can do what it’s good at: &lt;strong&gt;pattern matching across sources&lt;/strong&gt; and telling you what’s consistent vs what’s anecdotal.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Summarizing multiple pages (without pretending)
&lt;/h3&gt;

&lt;p&gt;A lot of assistants will “summarize the web” while actually summarizing a couple of snippets.&lt;/p&gt;

&lt;p&gt;With a fast search tool, you can push a more honest workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull 10–15 relevant URLs&lt;/li&gt;
&lt;li&gt;ask Claude to summarize with citations&lt;/li&gt;
&lt;li&gt;ask it to call out disagreements between sources&lt;/li&gt;
&lt;li&gt;ask it what’s missing and run another search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful for writing technical docs, internal RFCs, or even blog posts where you want breadth without spending half a day collecting links.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to evaluate it yourself (a simple benchmark you can run)
&lt;/h2&gt;

&lt;p&gt;If you’re considering wiring this into your own setup, don’t trust vibes. Run a repeatable test.&lt;/p&gt;

&lt;h3&gt;
  
  
  A lightweight benchmark
&lt;/h3&gt;

&lt;p&gt;Pick &lt;strong&gt;three research tasks&lt;/strong&gt; you actually do at work. For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Troubleshoot a specific error message from your logs
&lt;/li&gt;
&lt;li&gt;Compare two competing tools (feature + pricing + tradeoffs)
&lt;/li&gt;
&lt;li&gt;Find the latest docs / changelog for a dependency and summarize what changed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then run the same workflow across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude + built-in web search (if you have it enabled)&lt;/li&gt;
&lt;li&gt;Claude + LangSearch tool&lt;/li&gt;
&lt;li&gt;(Optional) another assistant you use day-to-day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Track three metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-first-good-answer&lt;/strong&gt; (not first answer, first useful one)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citation quality&lt;/strong&gt; (are links relevant, diverse, and not duplicated?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iteration count&lt;/strong&gt; (how many follow-ups did you need to get to “done”?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If LangSearch is doing what I’m seeing, you’ll notice the biggest win in iteration count and time-to-first-good-answer, not in raw “model intelligence.”&lt;/p&gt;
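&lt;p&gt;To keep the bookkeeping honest, a tiny tally helps. This is a sketch with made-up field names, not a real harness: log one entry per task per setup, then compare the averages across setups.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch: tracking the three metrics per setup. All names are
// illustrative; citationQuality here is just relevant / total links.
function makeBenchmark() {
  const runs = [];
  return {
    record(entry) {
      runs.push({
        setup: entry.setup,
        secondsToGoodAnswer: entry.secondsToGoodAnswer,
        citationQuality: entry.relevantLinks / entry.totalLinks,
        iterations: entry.iterations
      });
    },
    summary(setup) {
      const mine = runs.filter((r) =&gt; r.setup === setup);
      const avg = (key) =&gt; mine.reduce((sum, r) =&gt; sum + r[key], 0) / mine.length;
      return {
        setup,
        avgSecondsToGoodAnswer: avg("secondsToGoodAnswer"),
        avgCitationQuality: avg("citationQuality"),
        avgIterations: avg("iterations")
      };
    }
  };
}
&lt;/code&gt;&lt;/pre&gt;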

&lt;h2&gt;
  
  
  Caveats and gotchas before you wire it into everything
&lt;/h2&gt;

&lt;p&gt;This is the part people skip, and then they get burned.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost and rate limits:&lt;/strong&gt; Fast search encourages more searching. That’s good, until you hit per-minute limits or your bill spikes. Put basic throttling and caching in place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key security:&lt;/strong&gt; Treat the LangSearch API key like any other production credential. Don’t paste it into random clients. Use server-side execution, env vars, secret managers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinated citations:&lt;/strong&gt; Even with real search results, the model can still misattribute a claim to a URL. You want your tool to return structured fields (title, snippet, url), and you want prompts that force quoting or explicit referencing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-trusting “top results”:&lt;/strong&gt; Search ranking is not truth ranking. For sensitive decisions, you still need to sanity check primary sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in search is improving:&lt;/strong&gt; Anthropic has been investing in web search (they announced a Web Search API in 2025: &lt;a href="https://www.anthropic.com/news/web-search-api" rel="noopener noreferrer"&gt;https://www.anthropic.com/news/web-search-api&lt;/a&gt;). The gap may narrow over time. But today, alternatives can still be worth it if research speed matters to you.&lt;/li&gt;
&lt;/ul&gt;
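&lt;p&gt;For the first caveat, a thin wrapper goes a long way. This is a sketch of in-memory caching plus a minimum-interval throttle around the search call; &lt;code&gt;searchFn&lt;/code&gt; stands in for whatever fetch wrapper you use, and the defaults are arbitrary.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch: cache results per query for ttlMs, and space calls
// out by minIntervalMs. searchFn is your own fetch wrapper.
function makeCachedSearch(searchFn, opts = {}) {
  const ttlMs = opts.ttlMs ?? 60000;
  const minIntervalMs = opts.minIntervalMs ?? 250;
  const cache = new Map(); // query -&gt; { at, result }
  let lastCallAt = 0;

  return async function cachedSearch(query) {
    const hit = cache.get(query);
    if (hit) {
      // Serve from cache while the entry is still fresh
      if (ttlMs &gt; Date.now() - hit.at) return hit.result;
    }

    // Simple throttle: wait until minIntervalMs has passed since the last call
    const wait = lastCallAt + minIntervalMs - Date.now();
    if (wait &gt; 0) await new Promise((resolve) =&gt; setTimeout(resolve, wait));
    lastCallAt = Date.now();

    const result = await searchFn(query);
    cache.set(query, { at: Date.now(), result });
    return result;
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The cache also helps with the hallucinated-citations problem indirectly: repeated queries return identical structured results, so you can spot when the model’s claims drift from the sources it was actually given.&lt;/p&gt;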

&lt;h2&gt;
  
  
  Add LangSearch, then compare against ChatGPT/Gemini
&lt;/h2&gt;

&lt;p&gt;If you’re using Claude for real engineering work and you keep bouncing off the built-in search experience, &lt;strong&gt;try adding LangSearch as a tool&lt;/strong&gt;. The setup is straightforward, and the payoff is immediate if you do any serious research loops.&lt;/p&gt;

&lt;p&gt;I haven’t wired the same setup into ChatGPT yet, but I probably will, mostly because I want a fair comparison under the same benchmark.&lt;/p&gt;

&lt;p&gt;If you run this test, I’d love to hear your numbers: time-to-first-good-answer, citation quality, and where it helped (or didn’t). What workflows are you trying to speed up?&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Dan Gurgui&lt;/strong&gt; | A4G&lt;br&gt;
&lt;em&gt;AI Architect&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Weekly Architecture Insights: &lt;a href="https://architectureforgrowth.com/newsletter" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>engineering</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>My Experience Using the BMAD Framework on a Personal Project (Patience Required)</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Tue, 30 Dec 2025 20:45:20 +0000</pubDate>
      <link>https://dev.to/arch4g/my-experience-using-the-bmad-framework-on-a-personal-project-patience-required-28aa</link>
      <guid>https://dev.to/arch4g/my-experience-using-the-bmad-framework-on-a-personal-project-patience-required-28aa</guid>
      <description>&lt;h2&gt;
  
  
  Getting Started: “I’ll just use BMAD to move faster”
&lt;/h2&gt;

&lt;p&gt;Over the last couple of weeks I’ve been working with the &lt;strong&gt;BMAD framework&lt;/strong&gt; on a personal project, and I wanted to write this up while it’s still fresh.&lt;/p&gt;

&lt;p&gt;Going in, my expectation was pretty simple: I’d plug in my idea, let the workflow guide me, and I’d be writing code quickly, with better direction and fewer dead ends.&lt;/p&gt;

&lt;p&gt;That’s… partially true. But there’s a big caveat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BMAD is not a “start coding in 20 minutes” setup.&lt;/strong&gt; It’s closer to “do the work up front so the coding part stops being the hardest part.”&lt;/p&gt;

&lt;p&gt;And if you’re used to hacking a prototype together first and figuring out the product later, this is going to feel slow. Sometimes painfully slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Reality Check: it takes a lot of time before you write anything
&lt;/h2&gt;

&lt;p&gt;The first thing you notice with BMAD is that it pushes you into an extensive workflow before you’re allowed to feel productive in the way engineers usually define productivity (shipping code).&lt;/p&gt;

&lt;p&gt;It takes you through a bunch of steps like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defining the problem&lt;/strong&gt; (and not just “I want to build X”, but “what pain exists and for who?”)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defining user personas&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Brainstorming approaches&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Researching the space&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clarifying constraints&lt;/strong&gt; (time, money, infra, team, target platform)&lt;/li&gt;
&lt;li&gt;Turning that into &lt;strong&gt;epics&lt;/strong&gt;, &lt;strong&gt;stories&lt;/strong&gt;, and execution plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that is useful. But it’s not free.&lt;/p&gt;

&lt;p&gt;For me, it took roughly &lt;strong&gt;12 to 16 hours before the first line of code was written&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That number sounds ridiculous if you’re thinking in “weekend project” mode. But the more I sat with it, the more it made sense: BMAD forces you to do the thinking you usually avoid until the project is already messy.&lt;/p&gt;

&lt;p&gt;And to be fair, I’ve done the opposite too many times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build something fast&lt;/li&gt;
&lt;li&gt;Realize I built the wrong thing&lt;/li&gt;
&lt;li&gt;Rewrite it&lt;/li&gt;
&lt;li&gt;Lose motivation&lt;/li&gt;
&lt;li&gt;Abandon it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes, this up-front investment is real. It’s also kind of the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Frameworks Are Actually Good (especially for business thinking)
&lt;/h2&gt;

&lt;p&gt;One of the things I genuinely liked is that the frameworks presented in BMAD give you a different perspective, especially around the &lt;strong&gt;business side&lt;/strong&gt; of what you’re building.&lt;/p&gt;

&lt;p&gt;If you’re an engineer building a personal project, you usually start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What stack do I want to use?”&lt;/li&gt;
&lt;li&gt;“What architecture seems clean?”&lt;/li&gt;
&lt;li&gt;“What cloud services are cheapest?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BMAD drags you back to questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this for, specifically?&lt;/li&gt;
&lt;li&gt;What are they trying to accomplish?&lt;/li&gt;
&lt;li&gt;What do they do today instead?&lt;/li&gt;
&lt;li&gt;Why would they switch?&lt;/li&gt;
&lt;li&gt;What’s the smallest thing that proves value?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you think you already know those answers, writing them down forces clarity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The value here isn’t that it tells you something magical. The value is that it makes you commit to decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But again, you pay for that clarity with time. You’re not coding, you’re thinking and documenting.&lt;/p&gt;

&lt;h2&gt;
  
  
  “Party Mode” and how I burned through context and credits
&lt;/h2&gt;

&lt;p&gt;Then I hit the fun (and painful) part: &lt;strong&gt;party mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you haven’t used it, party mode is basically the “get multiple perspectives and generate a lot of material quickly” mode. It can be super useful when you want breadth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different solutions&lt;/li&gt;
&lt;li&gt;different tradeoffs&lt;/li&gt;
&lt;li&gt;different product angles&lt;/li&gt;
&lt;li&gt;risk lists&lt;/li&gt;
&lt;li&gt;architecture options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I made the mistake of telling it to run party mode with &lt;strong&gt;LangSearch&lt;/strong&gt; and also run party mode with &lt;strong&gt;Gemini&lt;/strong&gt;, and that combo absolutely &lt;strong&gt;exhausted my context window and usage credits&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What happened was predictable in hindsight: party mode wants to read, pull in sources, synthesize, then generate. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lots of tokens in&lt;/li&gt;
&lt;li&gt;lots of tokens out&lt;/li&gt;
&lt;li&gt;and depending on the tools, lots of paid calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tried to be clever and tell it something like: “don’t read everything, just put stuff into files and summarize.”&lt;/p&gt;

&lt;p&gt;In practice, that didn’t really work the way I expected. Once you’ve instructed the workflow to do deep research, it tends to follow through. It wants to gather the material so it can justify conclusions. That’s good for quality, but bad for cost control if you’re not careful.&lt;/p&gt;

&lt;p&gt;Still, I’ll say this: &lt;strong&gt;it was very useful&lt;/strong&gt;. The output was genuinely better when it had multiple angles to compare. It just came at a price.&lt;/p&gt;

&lt;p&gt;If you’re going to use party mode, my advice is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;use it intentionally&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;set boundaries (scope, sources, max depth)&lt;/li&gt;
&lt;li&gt;and assume it will be expensive if you let it run wild&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  12–16 hours later: the first line of code… and then I hit an architecture wall
&lt;/h2&gt;

&lt;p&gt;After all the setup and the workflow, I finally got to the point where code started getting written.&lt;/p&gt;

&lt;p&gt;And almost immediately I realized I had made an architecture mistake.&lt;/p&gt;

&lt;p&gt;This part is important because it’s the kind of mistake that’s easy to make when you’re letting an assistant drive, and you’re “supervising” instead of actively building.&lt;/p&gt;

&lt;p&gt;I had told the architect to focus on &lt;strong&gt;low cost&lt;/strong&gt;, so it leaned into a serverless setup, specifically AWS Lambda-style compute. Then I told it to use &lt;strong&gt;NestJS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On paper, that sounds fine. In reality, it’s tricky.&lt;/p&gt;

&lt;p&gt;NestJS can run in a serverless environment, but it’s not “drop in NestJS and deploy to Lambda” unless you set it up correctly. You typically need an adapter layer (for example, using &lt;code&gt;@vendia/serverless-express&lt;/code&gt; or similar patterns) or you use a framework that’s more directly aligned with serverless request handling.&lt;/p&gt;

&lt;p&gt;Without that, you get a mess of mismatched assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-lived server patterns vs cold starts&lt;/li&gt;
&lt;li&gt;framework bootstrapping time vs latency expectations&lt;/li&gt;
&lt;li&gt;request lifecycle differences&lt;/li&gt;
&lt;li&gt;deployment packaging and handler wiring&lt;/li&gt;
&lt;/ul&gt;
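&lt;p&gt;For context, the adapter pattern mentioned above looks roughly like this. It’s a sketch, assuming &lt;code&gt;@vendia/serverless-express&lt;/code&gt; and a standard Nest &lt;code&gt;AppModule&lt;/code&gt;; the point is that Nest bootstraps once per Lambda container and the handler gets reused across invocations.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// lambda.js -- sketch of the adapter pattern, assuming @vendia/serverless-express.
// Bootstrap NestJS once per container, then reuse the handler across invocations.
const { NestFactory } = require("@nestjs/core");
const serverlessExpress = require("@vendia/serverless-express");
const { AppModule } = require("./app.module");

let cachedHandler;

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  await app.init();
  // Hand the underlying Express instance to the serverless adapter
  return serverlessExpress({ app: app.getHttpAdapter().getInstance() });
}

exports.handler = async (event, context) =&gt; {
  if (!cachedHandler) cachedHandler = await bootstrap();
  return cachedHandler(event, context);
};
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;None of this is hard once you know it’s needed. The problem was that the generated code assumed a long-lived server, and nothing in the error messages said so directly.&lt;/p&gt;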

&lt;p&gt;So what happened next is exactly what you’d expect: &lt;strong&gt;errors all over the place&lt;/strong&gt;, and a system that kept trying to fix itself in a loop, without making real progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 6-hour debugging spiral (and why it was so confusing)
&lt;/h2&gt;

&lt;p&gt;I spent a huge amount of time trying to fix it, around &lt;strong&gt;six hours&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The frustrating part was that in the moment, I didn’t immediately know what was wrong. It wasn’t one clean error like “you used the wrong import.”&lt;/p&gt;

&lt;p&gt;It was more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;something fails&lt;/li&gt;
&lt;li&gt;you fix the symptom&lt;/li&gt;
&lt;li&gt;something else fails&lt;/li&gt;
&lt;li&gt;the fix introduces another issue&lt;/li&gt;
&lt;li&gt;you end up in a loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve ever dealt with a misaligned architecture decision early in a project, you know the feeling. The code is “correct” in isolation, but the environment and assumptions are wrong.&lt;/p&gt;

&lt;p&gt;This is also where AI-assisted workflows can get weird. If the system is trying to be helpful, it can keep proposing changes that look plausible locally, but don’t address the root mismatch. You can burn a lot of time approving “reasonable” edits that never converge.&lt;/p&gt;

&lt;p&gt;And that’s exactly what happened. It kept spinning, and I kept thinking, “why is this stuck?”&lt;/p&gt;

&lt;h2&gt;
  
  
  The turning point: I didn’t figure it out, the retrospective did
&lt;/h2&gt;

&lt;p&gt;Here’s the interesting part: I wasn’t the one who realized the core issue first.&lt;/p&gt;

&lt;p&gt;I noticed it was spending too much time without converging, so I kicked off the BMAD workflow for running a &lt;strong&gt;retrospective&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That retrospective step ended up being the breakthrough.&lt;/p&gt;

&lt;p&gt;Because instead of continuing forward motion (which was fake progress), it forced a pause and asked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what are we trying to do?&lt;/li&gt;
&lt;li&gt;what’s blocking us?&lt;/li&gt;
&lt;li&gt;what assumptions did we make?&lt;/li&gt;
&lt;li&gt;what changed?&lt;/li&gt;
&lt;li&gt;what decision is causing repeated failure?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s when it became clear that the setup was not right. The architecture needed adjustment to match the runtime model.&lt;/p&gt;

&lt;p&gt;Once that was identified, the next steps were obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;either adjust the NestJS setup to run properly in a serverless handler model&lt;/li&gt;
&lt;li&gt;or change the compute model (for example, containerized service on something like ECS/Fargate, or a simple VM), depending on goals&lt;/li&gt;
&lt;li&gt;or pick a framework more naturally aligned with serverless&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main point is that &lt;strong&gt;the retrospective forced the system to stop patching and start diagnosing&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And honestly, this is one of the strongest arguments for structured workflows like BMAD. Most engineers don’t run retrospectives on a personal project when things go wrong. We just grind harder.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’ve done that grind plenty of times. It rarely helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  After the fix: everything went smoothly (and the “stories” became the superpower)
&lt;/h2&gt;

&lt;p&gt;Once everything was set up correctly, the experience changed completely.&lt;/p&gt;

&lt;p&gt;The biggest win for me was that I had &lt;strong&gt;stories&lt;/strong&gt;. Real stories. Not vague tasks like “build backend.”&lt;/p&gt;

&lt;p&gt;With stories, I could tell it exactly what to implement, in a way that was scoped and testable. That meant I wasn’t doing a bunch of extra work translating ideas into engineering tasks. The translation was already done.&lt;/p&gt;

&lt;p&gt;At that point my role became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supervise&lt;/li&gt;
&lt;li&gt;review decisions&lt;/li&gt;
&lt;li&gt;sanity check the code&lt;/li&gt;
&lt;li&gt;occasionally click yes/no for requests and changes&lt;/li&gt;
&lt;li&gt;keep it aligned with the goal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a very different feeling than “I’m the one doing everything.”&lt;/p&gt;

&lt;p&gt;And it’s genuinely cool when it works because it shifts the bottleneck. Instead of “how fast can I type,” it becomes “how well can I review and steer.”&lt;/p&gt;

&lt;p&gt;If you’ve ever led a team, you’ll recognize that mode. You’re not writing every line. You’re making sure the work being done is the right work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BMAD gets right: patience in exchange for momentum
&lt;/h2&gt;

&lt;p&gt;Overall, I think BMAD is really cool.&lt;/p&gt;

&lt;p&gt;But I don’t want to oversell it. The trade is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;You need patience to set it up&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need to give good answers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need to review everything&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;and you need to accept that the early phase feels slow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you treat it like a magic code generator, you’re going to be annoyed.&lt;/p&gt;

&lt;p&gt;If you treat it like a process that front-loads thinking, documentation, and execution structure, it starts to make sense.&lt;/p&gt;

&lt;p&gt;And once you’re past that initial slope, it becomes pretty straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  The underrated feature: you can resume anytime because everything is in documents
&lt;/h2&gt;

&lt;p&gt;Another thing I didn’t appreciate until I was in it is how nice it is that you can &lt;strong&gt;resume at any time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Because everything is written down, you’re not relying on your memory or on some fragile chat context. You have artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;personas&lt;/li&gt;
&lt;li&gt;problem statements&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;epics&lt;/li&gt;
&lt;li&gt;stories&lt;/li&gt;
&lt;li&gt;decisions and tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you can come back after a day or a week and say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“execute this epic”&lt;/li&gt;
&lt;li&gt;“continue this story”&lt;/li&gt;
&lt;li&gt;“implement the next task”&lt;/li&gt;
&lt;li&gt;“run a retrospective on the last change”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it doesn’t feel like starting over.&lt;/p&gt;

&lt;p&gt;For personal projects, that’s huge. Most of us lose momentum not because we can’t code, but because we return after a break and spend an hour reconstructing context.&lt;/p&gt;

&lt;p&gt;BMAD reduces that tax.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d tell another engineer before they try it
&lt;/h2&gt;

&lt;p&gt;I’m not going to pretend this is the answer for every project. If you’re hacking a quick script or testing an API idea, BMAD is probably too heavy.&lt;/p&gt;

&lt;p&gt;But if you’re building something that you actually want to ship, even as a solo developer, it’s worth considering.&lt;/p&gt;

&lt;p&gt;A few practical lessons from my run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget time for setup.&lt;/strong&gt; If you expect to write code in the first hour, you’ll fight the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be careful with party mode.&lt;/strong&gt; It’s useful, but it can burn context and credits fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don’t treat architecture prompts casually.&lt;/strong&gt; “Low cost” pushes you toward serverless patterns, which can be great, but it constrains framework choices and deployment shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the retrospective when you’re stuck.&lt;/strong&gt; The instinct is to push forward. The smarter move is to stop and diagnose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stories are where the payoff happens.&lt;/strong&gt; Once you have good stories, execution becomes much more mechanical.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;BMAD ended up being one of those experiences where the first phase feels like friction, and then later you realize the friction was the whole point.&lt;/p&gt;

&lt;p&gt;It forced me to slow down, define what I was doing, and make decisions explicit. I burned time (and credits) in a couple places, especially with party mode. I also lost six hours to an architecture mismatch that I should have caught earlier.&lt;/p&gt;

&lt;p&gt;But once the workflow and docs were in place, it got surprisingly smooth. Being able to resume from epics and stories, and to steer implementation without constantly rewriting requirements, is a real productivity shift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you try BMAD, bring patience. Bring discipline. And assume you’ll spend more time thinking before you spend time coding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If this resonates, what’s your experience been with structured AI-assisted workflows? I’m curious.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>architecture</category>
      <category>engineering</category>
      <category>testing</category>
    </item>
    <item>
      <title>AWS in the AI era: Bedrock, SageMaker, and the enterprise-first tradeoff</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Tue, 30 Dec 2025 20:27:05 +0000</pubDate>
      <link>https://dev.to/arch4g/aws-in-the-ai-era-bedrock-sagemaker-and-the-enterprise-first-tradeoff-3dpk</link>
      <guid>https://dev.to/arch4g/aws-in-the-ai-era-bedrock-sagemaker-and-the-enterprise-first-tradeoff-3dpk</guid>
      <description>&lt;h2&gt;
  
  
  1. The enterprise AI bet: what AWS is actually optimizing for
&lt;/h2&gt;

&lt;p&gt;Here’s the uncomfortable truth about AWS in AI: &lt;strong&gt;they’re not trying to “win the model leaderboard.”&lt;/strong&gt; They’re trying to win regulated, enterprise AI workloads where the boring stuff matters more than the demos.&lt;/p&gt;

&lt;p&gt;If you’re building AI in a bank, healthcare company, or a Fortune 500 with a security team that says “no” by default, the biggest risk isn’t that your model is 2% worse on a benchmark. It’s that you can’t answer basic questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where did the data go?&lt;/li&gt;
&lt;li&gt;Who accessed it?&lt;/li&gt;
&lt;li&gt;Can we keep traffic private?&lt;/li&gt;
&lt;li&gt;Can we prove compliance later?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS’s AI story (Bedrock + SageMaker + the prebuilt services like Comprehend/Textract/Transcribe) is basically: &lt;strong&gt;control, governance, deployment flexibility, and integration with the rest of AWS&lt;/strong&gt;—even if that means they move slower on “shiny new capability” than innovation-first competitors.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. AWS’s AI stack, mapped to real enterprise jobs-to-be-done
&lt;/h2&gt;

&lt;p&gt;When people say “AWS AI,” they often mash everything together. In practice, AWS has multiple layers, and each maps to a different “job” inside an enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bedrock: “Give me foundation models, but keep it enterprise-safe”
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; is the managed “foundation model” layer. You use it when you want access to large models (text/image, etc.) without owning the training pipeline.&lt;/p&gt;

&lt;p&gt;The enterprise job-to-be-done here is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build internal copilots (support, ops, engineering enablement)&lt;/li&gt;
&lt;li&gt;Do RAG (retrieval-augmented generation) over company docs&lt;/li&gt;
&lt;li&gt;Add summarization/classification into workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bedrock’s pitch is less “best model” and more &lt;strong&gt;choice + governance + integration&lt;/strong&gt;. You can swap models, apply guardrails, and wire it into IAM/VPC patterns you already use.&lt;/p&gt;
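&lt;p&gt;The “swap models” part shows up right in the API shape. Here’s a sketch using the AWS SDK for JavaScript v3; the model ID and the request body format vary by provider, so treat both as placeholders you’d swap per model.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hedged sketch: invoking a model through Bedrock's runtime API.
// Model ID and body shape are provider-specific placeholders.
const { BedrockRuntimeClient, InvokeModelCommand } = require("@aws-sdk/client-bedrock-runtime");

const client = new BedrockRuntimeClient({ region: "us-east-1" });

async function summarize(text) {
  const res = await client.send(new InvokeModelCommand({
    modelId: "anthropic.claude-3-haiku-20240307-v1:0", // swappable: that's Bedrock's pitch
    contentType: "application/json",
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 512,
      messages: [{ role: "user", content: `Summarize:\n${text}` }]
    })
  }));
  return JSON.parse(new TextDecoder().decode(res.body));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the call goes through the normal AWS SDK, it inherits IAM policies, VPC endpoints, and CloudTrail logging for free, which is exactly the governance story enterprises care about.&lt;/p&gt;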

&lt;h3&gt;
  
  
  SageMaker: “We’re building, not just consuming”
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SageMaker&lt;/strong&gt; is for teams that want control: training, fine-tuning, hosting endpoints, MLOps workflows, model registry, monitoring, and pipelines.&lt;/p&gt;

&lt;p&gt;The job-to-be-done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train or fine-tune models on proprietary data&lt;/li&gt;
&lt;li&gt;Run repeatable ML pipelines with approvals and audit trails&lt;/li&gt;
&lt;li&gt;Own deployment patterns (multi-account, multi-region, blue/green)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If Bedrock is “buy,” SageMaker is “build.” It’s also where AWS shines for organizations that already have a platform mindset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comprehend: “We need NLP features, not a whole LLM app”
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Comprehend&lt;/strong&gt; is classic managed NLP: entity extraction, sentiment, classification, PII detection, etc.&lt;/p&gt;

&lt;p&gt;The job-to-be-done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract meaning from support tickets, reviews, claims, emails&lt;/li&gt;
&lt;li&gt;Detect PII for compliance workflows&lt;/li&gt;
&lt;li&gt;Standardize analytics without building a custom model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not sexy, but it fits enterprises that want predictable outputs and a managed service contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Textract: “Turn PDFs and scans into data we can use”
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Textract&lt;/strong&gt; does OCR + structured extraction from forms and tables.&lt;/p&gt;

&lt;p&gt;The job-to-be-done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invoice processing&lt;/li&gt;
&lt;li&gt;Insurance claim ingestion&lt;/li&gt;
&lt;li&gt;KYC document parsing&lt;/li&gt;
&lt;li&gt;Any “we’re drowning in PDFs” workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of those services you don’t brag about, but it pays for itself when it works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transcribe: “Convert audio to text at scale”
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Transcribe&lt;/strong&gt; is speech-to-text.&lt;/p&gt;

&lt;p&gt;The job-to-be-done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call center transcription&lt;/li&gt;
&lt;li&gt;Meeting notes&lt;/li&gt;
&lt;li&gt;Compliance archiving&lt;/li&gt;
&lt;li&gt;Searchable audio libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yes, this is where the quality/cost conversation gets real (we’ll get there).&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Differentiators that matter in regulated environments
&lt;/h2&gt;

&lt;p&gt;If you’ve only built AI prototypes, AWS can feel “too heavy.” If you’ve built AI in a regulated org, a lot of AWS’s choices make more sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data boundaries (and why enterprises obsess over them)
&lt;/h3&gt;

&lt;p&gt;A big part of AWS’s positioning is reducing the fear that your data becomes someone else’s training set.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For Bedrock specifically, AWS states that &lt;strong&gt;customer inputs and outputs are not used to train the underlying foundation models&lt;/strong&gt; by default. That’s the kind of sentence that procurement teams love, because it maps to a risk they can actually articulate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice, what matters isn’t marketing—it’s whether you can put the right contractual and technical boundaries around data flows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Private networking: VPC, PrivateLink, and “keep it off the public internet”
&lt;/h3&gt;

&lt;p&gt;A lot of AI competitors assume public endpoints and “trust us” security. AWS’s default enterprise move is: &lt;strong&gt;put services behind private connectivity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Patterns you’ll see in real deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bedrock access via &lt;strong&gt;VPC endpoints / AWS PrivateLink&lt;/strong&gt; (where supported)&lt;/li&gt;
&lt;li&gt;SageMaker endpoints in private subnets&lt;/li&gt;
&lt;li&gt;Tight egress controls + centralized logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t about paranoia. It’s about making your AI system fit the same threat model as everything else you run.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM, auditability, and “who did what, when”
&lt;/h3&gt;

&lt;p&gt;AWS’s identity and governance tooling is a differentiator when you actually need it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IAM&lt;/strong&gt; policies for fine-grained access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudTrail&lt;/strong&gt; for audit logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KMS&lt;/strong&gt; for encryption and key control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Organizations / SCPs&lt;/strong&gt; for guardrails at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve ever been asked to produce an audit trail for an AI system, you know why this matters. It’s not just security—it’s operational credibility.&lt;/p&gt;
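&lt;p&gt;What “fine-grained access” looks like in practice: a policy that allows invoking exactly one Bedrock model and nothing else. A minimal sketch (the model ID in the ARN is a placeholder, not a real model; scope it to whatever you actually deploy):&lt;/p&gt;

```python
import json

# Least-privilege IAM policy for Bedrock inference: one action, one model.
# The model ID in the ARN below is a placeholder for illustration.
def bedrock_invoke_policy(model_arn):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowModelInvocation",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [model_arn],
            }
        ],
    }

policy = bedrock_invoke_policy(
    "arn:aws:bedrock:eu-west-1::foundation-model/example-model-id"
)
print(json.dumps(policy, indent=2))
```

&lt;p&gt;Pair a policy like this with CloudTrail and you get the “who invoked what, when” answer auditors actually ask for.&lt;/p&gt;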

&lt;h3&gt;
  
  
  Residency and multi-region controls
&lt;/h3&gt;

&lt;p&gt;Enterprises care about data residency, disaster recovery, and “what happens if a region is down.”&lt;/p&gt;

&lt;p&gt;AWS’s global footprint and mature multi-region patterns make it easier to design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Region-pinned workloads&lt;/li&gt;
&lt;li&gt;Cross-region failover&lt;/li&gt;
&lt;li&gt;Separate prod/test accounts with clear boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Guardrails and governance as product features
&lt;/h3&gt;

&lt;p&gt;AWS is leaning into &lt;strong&gt;guardrails&lt;/strong&gt; (policy controls, content filters, safety boundaries) because enterprises want enforceable rules, not “please behave” prompts.&lt;/p&gt;

&lt;p&gt;This is the enterprise-first vs innovation-first trade: guardrails slow you down a bit, but they also keep you from getting fired.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Where AWS falls short: product quality and developer experience
&lt;/h2&gt;

&lt;p&gt;Now the part people don’t say out loud: &lt;strong&gt;AWS’s AI portfolio is uneven.&lt;/strong&gt; Some services are rock-solid. Others feel like they shipped because the roadmap demanded it, not because the UX was done.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transcribe quality: “good enough” isn’t always good enough
&lt;/h3&gt;

&lt;p&gt;There are plenty of teams who report that Transcribe can struggle depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accents and multilingual audio&lt;/li&gt;
&lt;li&gt;Crosstalk in meetings&lt;/li&gt;
&lt;li&gt;Domain-specific vocabulary (medical, legal, internal acronyms)&lt;/li&gt;
&lt;li&gt;Noisy environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speech-to-text is brutally sensitive to audio quality and domain mismatch. If you’re building anything user-facing, “mostly accurate” can translate into “constant complaints.”&lt;/p&gt;

&lt;p&gt;The practical issue isn’t whether Transcribe is bad. It’s that &lt;strong&gt;you may need to run bake-offs&lt;/strong&gt; and measure WER (word error rate) on &lt;em&gt;your&lt;/em&gt; audio—not a vendor demo.&lt;/p&gt;
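&lt;p&gt;Running that bake-off doesn’t require much tooling: WER is just word-level edit distance against the reference transcript, divided by the reference length. A minimal reference implementation (production evaluations usually normalize casing and punctuation first):&lt;/p&gt;

```python
# Word error rate: word-level edit distance between the reference and the
# hypothesis transcript, divided by the number of reference words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in four reference words: WER = 0.25.
print(wer("the claim was approved", "the claim was denied"))
```

&lt;p&gt;Run it over a few hundred of &lt;em&gt;your&lt;/em&gt; calls per vendor and the bake-off decides itself.&lt;/p&gt;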

&lt;h3&gt;
  
  
  Q Developer: useful, but AWS-shaped
&lt;/h3&gt;

&lt;p&gt;Amazon Q Developer is clearly designed to make AWS developers faster. That’s not inherently wrong.&lt;/p&gt;

&lt;p&gt;But if your stack is multi-cloud or Kubernetes-heavy, or you’re not all-in on AWS services, Q Developer can feel narrow. It’s less “universal coding copilot” and more “AWS acceleration tool.”&lt;/p&gt;

&lt;p&gt;That’s fine if you want exactly that. It’s frustrating if your expectation is parity with general-purpose coding assistants.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenSearch as a knowledge base: operational pain is real
&lt;/h3&gt;

&lt;p&gt;AWS pushing &lt;strong&gt;OpenSearch&lt;/strong&gt; (their Elasticsearch fork) is a classic example of enterprise tradeoffs: you get control, hosting options, and integration—but you also inherit operational complexity.&lt;/p&gt;

&lt;p&gt;Teams using OpenSearch for RAG knowledge bases often run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debugging relevance issues (tokenization, analyzers, mappings)&lt;/li&gt;
&lt;li&gt;Cluster sizing and shard management&lt;/li&gt;
&lt;li&gt;Upgrades and version quirks&lt;/li&gt;
&lt;li&gt;“It works until it doesn’t” operational incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, you can use managed OpenSearch. You still need people who understand it. If you don’t have that expertise, “cheap and flexible” becomes “slow and fragile.”&lt;/p&gt;

&lt;p&gt;This is where many teams end up hybrid: a managed vector DB elsewhere, or a simpler managed retrieval layer—because &lt;strong&gt;DX matters when you’re iterating weekly&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Where AWS falls short: cost, pricing complexity, and surprise bills
&lt;/h2&gt;

&lt;p&gt;AWS has a cost story that’s both true and annoying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure can be cost-effective at scale.&lt;/li&gt;
&lt;li&gt;Managed AI services can get expensive fast.&lt;/li&gt;
&lt;li&gt;Pricing is rarely simple enough to estimate confidently.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transcribe pricing vs alternatives (a concrete example)
&lt;/h3&gt;

&lt;p&gt;AWS Transcribe’s standard batch transcription is &lt;strong&gt;$0.024 per minute&lt;/strong&gt; in the first pricing tier, according to AWS’s pricing page: &lt;a href="https://aws.amazon.com/transcribe/pricing/" rel="noopener noreferrer"&gt;https://aws.amazon.com/transcribe/pricing/&lt;/a&gt;&lt;/p&gt;
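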

&lt;p&gt;Let’s do back-of-the-napkin math:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10,000 minutes/month&lt;/strong&gt; (~167 hours) → 10,000 × $0.024 = &lt;strong&gt;$240/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100,000 minutes/month&lt;/strong&gt; (~1,667 hours) → &lt;strong&gt;$2,400/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,000,000 minutes/month&lt;/strong&gt; (~16,667 hours) → &lt;strong&gt;$24,000/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At enterprise scale, that’s real money—especially if you’re also paying for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage (S3)&lt;/li&gt;
&lt;li&gt;Processing pipelines (Lambda/ECS)&lt;/li&gt;
&lt;li&gt;Search/indexing (OpenSearch)&lt;/li&gt;
&lt;li&gt;Observability (CloudWatch costs add up)&lt;/li&gt;
&lt;/ul&gt;
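&lt;p&gt;A tiny cost model keeps this honest. The sketch below reproduces the transcription math above and leaves a slot for the fixed platform costs (S3, pipelines, search, observability), which you’d have to estimate for your own stack; the rate used is the first-tier price only:&lt;/p&gt;

```python
# Back-of-the-napkin monthly cost for a transcription pipeline.
# transcribe_per_min matches the first-tier price quoted above ($0.024/min);
# fixed_platform is a placeholder for S3, pipelines, search, observability.
def monthly_cost(minutes, transcribe_per_min=0.024, fixed_platform=0.0):
    return minutes * transcribe_per_min + fixed_platform

for minutes in (10_000, 100_000, 1_000_000):
    print(f"{minutes:,} min/month = ${monthly_cost(minutes):,.0f} transcription only")
```

&lt;p&gt;That prints $240, $2,400, and $24,000 for the three volumes above; adding even a modest &lt;code&gt;fixed_platform&lt;/code&gt; figure shifts the break-even against flat-priced alternatives.&lt;/p&gt;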

&lt;p&gt;Research and market comparisons often show that alternatives can be &lt;strong&gt;dramatically cheaper—up to ~89% cheaper in some scenarios&lt;/strong&gt; (depending on model/provider and quality targets). The exact number varies, but the point stands: &lt;strong&gt;AWS’s managed convenience is not always the low-cost option.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The real cost killer: “pricing complexity tax”
&lt;/h3&gt;

&lt;p&gt;Even when the per-unit price is reasonable, teams get hit by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hard-to-predict request patterns&lt;/li&gt;
&lt;li&gt;Multiple services each with their own meters&lt;/li&gt;
&lt;li&gt;Network egress surprises in hybrid setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don’t model the full system cost, you’re not budgeting—you’re guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Industry direction: open weights + portability, and how AWS fits
&lt;/h2&gt;

&lt;p&gt;The long-term industry gravity is toward &lt;strong&gt;more model choice and more portability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not just “open source code,” but increasingly &lt;strong&gt;open weights&lt;/strong&gt; and ecosystems where you can run the same model across clouds—or on-prem—depending on security, cost, or latency constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why open weights are winning mindshare
&lt;/h3&gt;

&lt;p&gt;Open-weight models give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment control (where it runs, how it scales)&lt;/li&gt;
&lt;li&gt;Vendor optionality (swap infra without rewriting everything)&lt;/li&gt;
&lt;li&gt;Better customization paths (fine-tune, distill, quantize)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enterprises like this because it reduces lock-in risk. Engineers like it because it’s closer to how we build everything else: composable components, measurable performance, replaceable parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS’s quiet advantage: data gravity and the “boring” platform
&lt;/h3&gt;

&lt;p&gt;Here’s where AWS is better positioned than people think: &lt;strong&gt;data management&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your organization already lives in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 as the data lake&lt;/li&gt;
&lt;li&gt;Glue / Lake Formation for catalog and governance&lt;/li&gt;
&lt;li&gt;Redshift for warehousing&lt;/li&gt;
&lt;li&gt;Kinesis/MSK for streaming&lt;/li&gt;
&lt;li&gt;IAM/KMS/CloudTrail for security and audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…then AWS is a natural place to operationalize open-weight models, because the hardest part of enterprise AI is usually &lt;strong&gt;data access + governance&lt;/strong&gt;, not model APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure competitiveness: Trainium/Inferentia vs the world
&lt;/h3&gt;

&lt;p&gt;AWS also has a strong infra story with &lt;strong&gt;Trainium&lt;/strong&gt; (training) and &lt;strong&gt;Inferentia&lt;/strong&gt; (inference). Performance-per-dollar varies by workload, but independent analyses comparing AWS Trainium against Google TPU v5e and Azure ND H100 instances have found meaningful tradeoffs in cost and throughput depending on model shape and batch size (see: &lt;a href="https://www.cloudexpat.com/blog/comparison-aws-trainium-google-tpu-v5e-azure-nd-h100-nvidia/" rel="noopener noreferrer"&gt;https://www.cloudexpat.com/blog/comparison-aws-trainium-google-tpu-v5e-azure-nd-h100-nvidia/&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The point isn’t “AWS is always cheapest.” It’s that AWS is investing in custom silicon plus the surrounding platform. If you’re doing sustained training/inference at scale, that matters.&lt;/p&gt;

&lt;p&gt;So the industry trend (open models, portability) doesn’t necessarily threaten AWS. It can actually &lt;strong&gt;strengthen AWS’s platform moat&lt;/strong&gt;—as long as AWS keeps the developer experience and managed service quality competitive.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Decision guide: when AWS is the right AI platform (and when it isn’t)
&lt;/h2&gt;

&lt;p&gt;I think about this as &lt;strong&gt;build vs buy vs hybrid&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS is the right choice when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You’re in a regulated environment and need &lt;strong&gt;IAM, audit logs, encryption, residency controls&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Your data is already in AWS and moving it out would be slow/expensive&lt;/li&gt;
&lt;li&gt;You want &lt;strong&gt;hybrid flexibility&lt;/strong&gt;: mix Bedrock (buy) with SageMaker (build)&lt;/li&gt;
&lt;li&gt;You have platform engineers who can operate the surrounding stack (networking, security, observability)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS is &lt;em&gt;not&lt;/em&gt; the right choice when…
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need the absolute bleeding edge model capability &lt;em&gt;this quarter&lt;/em&gt; and don’t want to wait for AWS integrations&lt;/li&gt;
&lt;li&gt;Your team is small and you can’t afford the &lt;strong&gt;operational overhead&lt;/strong&gt; (OpenSearch clusters, multi-service pipelines, cost modeling)&lt;/li&gt;
&lt;li&gt;You’re mostly non-AWS and would be fighting the ecosystem instead of benefiting from it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick selection checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What’s your acceptable error rate (WER, hallucination rate, extraction accuracy)?&lt;/li&gt;
&lt;li&gt;What’s your cost target per 1K requests / per hour of audio / per document?&lt;/li&gt;
&lt;li&gt;Do you need private networking and audit trails, or is this a public SaaS feature?&lt;/li&gt;
&lt;li&gt;What’s your exit plan if pricing or quality disappoints?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS wins when you value control and integration. It loses when you value speed and simplicity above all else.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Closing: a pragmatic way to evaluate AWS AI in 30 days
&lt;/h2&gt;

&lt;p&gt;If you’re evaluating AWS for AI, don’t start with architecture diagrams. Start with a 30-day pilot that forces reality to show up.&lt;/p&gt;

&lt;p&gt;Pick one real workflow (transcription, doc extraction, RAG over internal docs) and measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt;: WER / extraction accuracy / human-rated usefulness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: full system cost, not just API calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency &amp;amp; reliability&lt;/strong&gt;: p95 response times, error rates, retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational load&lt;/strong&gt;: how many “platform chores” show up weekly&lt;/li&gt;
&lt;/ul&gt;
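&lt;p&gt;The latency metric is worth being precise about, because averages hide the tail. One common convention is nearest-rank p95 (others exist; pick one and keep it fixed for the whole pilot so week-over-week numbers are comparable):&lt;/p&gt;

```python
import math

# Nearest-rank p95: sort the samples, take the value at the 95th percentile
# rank. The mean of these samples would badly understate the tail.
def p95(latencies_ms):
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

samples = [120, 95, 110, 400, 130, 105, 90, 115, 100, 3200]
print(p95(samples))  # 10 samples, rank ceil(9.5) = 10: the worst value, 3200
```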

&lt;p&gt;And write down an exit strategy on day one: what you’d swap first (model, vector store, hosting) if AWS isn’t the fit.&lt;/p&gt;

&lt;p&gt;What would your 30-day bake-off reveal about your actual constraints?&lt;/p&gt;

</description>
      <category>aws</category>
      <category>architecture</category>
      <category>engineering</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>My Experience Using the BMAD Framework on a Personal Project (Patience Required)</title>
      <dc:creator>Dan Gurgui</dc:creator>
      <pubDate>Sun, 28 Dec 2025 19:46:08 +0000</pubDate>
      <link>https://dev.to/arch4g/my-experience-using-the-bmad-framework-on-a-personal-project-patience-required-10cd</link>
      <guid>https://dev.to/arch4g/my-experience-using-the-bmad-framework-on-a-personal-project-patience-required-10cd</guid>
      <description>&lt;h2&gt;
  
  
  Getting Started: “I’ll just use BMAD to move faster”
&lt;/h2&gt;

&lt;p&gt;Over the last couple of weeks I’ve been working with the &lt;strong&gt;BMAD framework&lt;/strong&gt; on a personal project, and I wanted to write this up while it’s still fresh.&lt;/p&gt;

&lt;p&gt;Going in, my expectation was pretty simple: I’d plug in my idea, let the workflow guide me, and I’d be writing code quickly, with better direction and fewer dead ends.&lt;/p&gt;

&lt;p&gt;That’s… partially true. But there’s a big caveat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BMAD is not a “start coding in 20 minutes” setup.&lt;/strong&gt; It’s closer to “do the work up front so the coding part stops being the hardest part.”&lt;/p&gt;

&lt;p&gt;And if you’re used to hacking a prototype together first and figuring out the product later, this is going to feel slow. Sometimes painfully slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Reality Check: it takes a lot of time before you write anything
&lt;/h2&gt;

&lt;p&gt;The first thing you notice with BMAD is that it pushes you into an extensive workflow before you’re allowed to feel productive in the way engineers usually define productivity (shipping code).&lt;/p&gt;

&lt;p&gt;It takes you through a bunch of steps like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defining the problem&lt;/strong&gt; (and not just “I want to build X”, but “what pain exists and for whom?”)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defining user personas&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Brainstorming approaches&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Researching the space&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clarifying constraints&lt;/strong&gt; (time, money, infra, team, target platform)&lt;/li&gt;
&lt;li&gt;Turning that into &lt;strong&gt;epics&lt;/strong&gt;, &lt;strong&gt;stories&lt;/strong&gt;, and execution plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that is useful. But it’s not free.&lt;/p&gt;

&lt;p&gt;For me, it took roughly &lt;strong&gt;12 to 16 hours before the first line of code was written&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That number sounds ridiculous if you’re thinking in “weekend project” mode. But the more I sat with it, the more it made sense: BMAD forces you to do the thinking you usually avoid until the project is already messy.&lt;/p&gt;

&lt;p&gt;And to be fair, I’ve done the opposite too many times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build something fast&lt;/li&gt;
&lt;li&gt;Realize I built the wrong thing&lt;/li&gt;
&lt;li&gt;Rewrite it&lt;/li&gt;
&lt;li&gt;Lose motivation&lt;/li&gt;
&lt;li&gt;Abandon it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes, this up-front investment is real. It’s also kind of the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Frameworks Are Actually Good (especially for business thinking)
&lt;/h2&gt;

&lt;p&gt;One of the things I genuinely liked is that the frameworks presented in BMAD give you a different perspective, especially around the &lt;strong&gt;business side&lt;/strong&gt; of what you’re building.&lt;/p&gt;

&lt;p&gt;If you’re an engineer building a personal project, you usually start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What stack do I want to use?”&lt;/li&gt;
&lt;li&gt;“What architecture seems clean?”&lt;/li&gt;
&lt;li&gt;“What cloud services are cheapest?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BMAD drags you back to questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this for, specifically?&lt;/li&gt;
&lt;li&gt;What are they trying to accomplish?&lt;/li&gt;
&lt;li&gt;What do they do today instead?&lt;/li&gt;
&lt;li&gt;Why would they switch?&lt;/li&gt;
&lt;li&gt;What’s the smallest thing that proves value?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you think you already know those answers, writing them down forces clarity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The value here isn’t that it tells you something magical. The value is that it makes you commit to decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But again, you pay for that clarity with time. You’re not coding, you’re thinking and documenting.&lt;/p&gt;

&lt;h2&gt;
  
  
  “Party Mode” and how I burned through context and credits
&lt;/h2&gt;

&lt;p&gt;Then I hit the fun (and painful) part: &lt;strong&gt;party mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you haven’t used it, party mode is basically the “get multiple perspectives and generate a lot of material quickly” mode. It can be super useful when you want breadth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different solutions&lt;/li&gt;
&lt;li&gt;different tradeoffs&lt;/li&gt;
&lt;li&gt;different product angles&lt;/li&gt;
&lt;li&gt;risk lists&lt;/li&gt;
&lt;li&gt;architecture options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I made the mistake of telling it to run party mode with &lt;strong&gt;LangSearch&lt;/strong&gt; and also run party mode with &lt;strong&gt;Gemini&lt;/strong&gt;, and that combo absolutely &lt;strong&gt;exhausted my context window and usage credits&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What happened was predictable in hindsight: party mode wants to read, pull in sources, synthesize, then generate. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lots of tokens in&lt;/li&gt;
&lt;li&gt;lots of tokens out&lt;/li&gt;
&lt;li&gt;and depending on the tools, lots of paid calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tried to be clever and tell it something like: “don’t read everything, just put stuff into files and summarize.”&lt;/p&gt;

&lt;p&gt;In practice, that didn’t really work the way I expected. Once you’ve instructed the workflow to do deep research, it tends to follow through. It wants to gather the material so it can justify conclusions. That’s good for quality, but bad for cost control if you’re not careful.&lt;/p&gt;

&lt;p&gt;Still, I’ll say this: &lt;strong&gt;it was very useful&lt;/strong&gt;. The output was genuinely better when it had multiple angles to compare. It just came at a price.&lt;/p&gt;

&lt;p&gt;If you’re going to use party mode, my advice is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;use it intentionally&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;set boundaries (scope, sources, max depth)&lt;/li&gt;
&lt;li&gt;and assume it will be expensive if you let it run wild&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  12–16 hours later: the first line of code… and then I hit an architecture wall
&lt;/h2&gt;

&lt;p&gt;After all the setup and the workflow, I finally got to the point where code started getting written.&lt;/p&gt;

&lt;p&gt;And almost immediately I realized I had made an architecture mistake.&lt;/p&gt;

&lt;p&gt;This part is important because it’s the kind of mistake that’s easy to make when you’re letting an assistant drive, and you’re “supervising” instead of actively building.&lt;/p&gt;

&lt;p&gt;I had told the architect to focus on &lt;strong&gt;low cost&lt;/strong&gt;, so it leaned into a serverless setup, specifically AWS Lambda-style compute. Then I told it to use &lt;strong&gt;NestJS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On paper, that sounds fine. In reality, it’s tricky.&lt;/p&gt;

&lt;p&gt;NestJS can run in a serverless environment, but it’s not “drop in NestJS and deploy to Lambda” unless you set it up correctly. You typically need an adapter layer (for example, using &lt;code&gt;@vendia/serverless-express&lt;/code&gt; or similar patterns) or you use a framework that’s more directly aligned with serverless request handling.&lt;/p&gt;

&lt;p&gt;Without that, you get a mess of mismatched assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-lived server patterns vs cold starts&lt;/li&gt;
&lt;li&gt;framework bootstrapping time vs latency expectations&lt;/li&gt;
&lt;li&gt;request lifecycle differences&lt;/li&gt;
&lt;li&gt;deployment packaging and handler wiring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what happened next is exactly what you’d expect: &lt;strong&gt;errors all over the place&lt;/strong&gt;, and a system that kept trying to fix itself in a loop, without making real progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 6-hour debugging spiral (and why it was so confusing)
&lt;/h2&gt;

&lt;p&gt;I spent a huge amount of time trying to fix it, around &lt;strong&gt;six hours&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The frustrating part was that in the moment, I didn’t immediately know what was wrong. It wasn’t one clean error like “you used the wrong import.”&lt;/p&gt;

&lt;p&gt;It was more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;something fails&lt;/li&gt;
&lt;li&gt;you fix the symptom&lt;/li&gt;
&lt;li&gt;something else fails&lt;/li&gt;
&lt;li&gt;the fix introduces another issue&lt;/li&gt;
&lt;li&gt;you end up in a loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve ever dealt with a misaligned architecture decision early in a project, you know the feeling. The code is “correct” in isolation, but the environment and assumptions are wrong.&lt;/p&gt;

&lt;p&gt;This is also where AI-assisted workflows can get weird. If the system is trying to be helpful, it can keep proposing changes that look plausible locally, but don’t address the root mismatch. You can burn a lot of time approving “reasonable” edits that never converge.&lt;/p&gt;

&lt;p&gt;And that’s exactly what happened. It kept spinning, and I kept thinking, “why is this stuck?”&lt;/p&gt;

&lt;h2&gt;
  
  
  The turning point: I didn’t figure it out, the retrospective did
&lt;/h2&gt;

&lt;p&gt;Here’s the interesting part: it wasn’t me that realized the core issue first.&lt;/p&gt;

&lt;p&gt;What happened is I noticed it was spending too much time and not converging, and I decided to initiate the BMAD workflow for running a &lt;strong&gt;retrospective&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That retrospective step ended up being the breakthrough.&lt;/p&gt;

&lt;p&gt;Because instead of continuing forward motion (which was fake progress), it forced a pause and asked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what are we trying to do?&lt;/li&gt;
&lt;li&gt;what’s blocking us?&lt;/li&gt;
&lt;li&gt;what assumptions did we make?&lt;/li&gt;
&lt;li&gt;what changed?&lt;/li&gt;
&lt;li&gt;what decision is causing repeated failure?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s when it became clear that the setup was not right. The architecture needed adjustment to match the runtime model.&lt;/p&gt;

&lt;p&gt;Once that was identified, the next steps were obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;either adjust the NestJS setup to run properly in a serverless handler model&lt;/li&gt;
&lt;li&gt;or change the compute model (for example, containerized service on something like ECS/Fargate, or a simple VM), depending on goals&lt;/li&gt;
&lt;li&gt;or pick a framework more naturally aligned with serverless&lt;/li&gt;
&lt;/ul&gt;
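&lt;p&gt;The first option boils down to a standard serverless pattern: do the expensive bootstrap once, outside the per-request handler, so warm invocations reuse it. A language-agnostic sketch in Python (the NestJS adapter libraries apply the same idea under the hood; every name here is illustrative):&lt;/p&gt;

```python
import time

# Simulate the serverless pattern: expensive bootstrap happens once at
# module load (the cold start), and each handler invocation reuses it.
def expensive_bootstrap():
    time.sleep(0.01)  # stand-in for framework init, DI wiring, etc.
    return {"app": "ready"}

APP = expensive_bootstrap()  # runs once per container, not per request

def handler(event, context=None):
    # Per-request work only; no framework bootstrap in here.
    return {"status": 200, "echo": event.get("path", "/"), "app": APP["app"]}

print(handler({"path": "/health"}))
```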

&lt;blockquote&gt;
&lt;p&gt;The main point is that &lt;strong&gt;the retrospective forced the system to stop patching and start diagnosing&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And honestly, this is one of the strongest arguments for structured workflows like BMAD. Most engineers don’t run retrospectives on a personal project when things go wrong. We just grind harder.&lt;/p&gt;

&lt;p&gt;I’ve done that grind plenty of times. It rarely helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  After the fix: everything went smoothly (and the “stories” became the superpower)
&lt;/h2&gt;

&lt;p&gt;Once everything was set up correctly, the experience changed completely.&lt;/p&gt;

&lt;p&gt;The biggest win for me was the fact that I had &lt;strong&gt;stories&lt;/strong&gt;. Real stories. Not vague tasks like “build backend.”&lt;/p&gt;

&lt;p&gt;With stories, I could tell it exactly what to implement, in a way that was scoped and testable. That meant I wasn’t doing a bunch of extra work translating ideas into engineering tasks. The translation was already done.&lt;/p&gt;

&lt;p&gt;At that point my role became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supervise&lt;/li&gt;
&lt;li&gt;review decisions&lt;/li&gt;
&lt;li&gt;sanity check the code&lt;/li&gt;
&lt;li&gt;occasionally click yes/no for requests and changes&lt;/li&gt;
&lt;li&gt;keep it aligned with the goal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a very different feeling than “I’m the one doing everything.”&lt;/p&gt;

&lt;p&gt;And it’s genuinely cool when it works because it shifts the bottleneck. Instead of “how fast can I type,” it becomes “how well can I review and steer.”&lt;/p&gt;

&lt;p&gt;If you’ve ever led a team, you’ll recognize that mode. You’re not writing every line. You’re making sure the work being done is the right work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BMAD gets right: patience in exchange for momentum
&lt;/h2&gt;

&lt;p&gt;Overall, I think BMAD is really cool.&lt;/p&gt;

&lt;p&gt;But I don’t want to oversell it. The trade is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;You need patience to set it up&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need to give good answers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need to review everything&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;and you need to accept that the early phase feels slow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you treat it like a magic code generator, you’re going to be annoyed.&lt;/p&gt;

&lt;p&gt;If you treat it like a process that front-loads thinking, documentation, and execution structure, it starts to make sense.&lt;/p&gt;

&lt;p&gt;And once you’re past that initial slope, it becomes pretty straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  The underrated feature: you can resume anytime because everything is in documents
&lt;/h2&gt;

&lt;p&gt;Another thing I didn’t appreciate until I was in it is how nice it is that you can &lt;strong&gt;resume at any time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Because everything is written down, you’re not relying on your memory or on some fragile chat context. You have artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;personas&lt;/li&gt;
&lt;li&gt;problem statements&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;epics&lt;/li&gt;
&lt;li&gt;stories&lt;/li&gt;
&lt;li&gt;decisions and tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you can come back after a day or a week and say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“execute this epic”&lt;/li&gt;
&lt;li&gt;“continue this story”&lt;/li&gt;
&lt;li&gt;“implement the next task”&lt;/li&gt;
&lt;li&gt;“run a retrospective on the last change”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it doesn’t feel like starting over.&lt;/p&gt;

&lt;p&gt;For personal projects, that’s huge. Most of us lose momentum not because we can’t code, but because we return after a break and spend an hour reconstructing context.&lt;/p&gt;

&lt;p&gt;BMAD reduces that tax.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d tell another engineer before they try it
&lt;/h2&gt;

&lt;p&gt;I’m not going to pretend this is the answer for every project. If you’re hacking a quick script or testing an API idea, BMAD is probably too heavy.&lt;/p&gt;

&lt;p&gt;But if you’re building something that you actually want to ship, even as a solo developer, it’s worth considering.&lt;/p&gt;

&lt;p&gt;A few practical lessons from my run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget time for setup.&lt;/strong&gt; If you expect to write code in the first hour, you’ll fight the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be careful with party mode.&lt;/strong&gt; It’s useful, but it can burn context and credits fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don’t treat architecture prompts casually.&lt;/strong&gt; “Low cost” pushes you toward serverless patterns, which can be great, but it constrains framework choices and deployment shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the retrospective when you’re stuck.&lt;/strong&gt; The instinct is to push forward. The smarter move is to stop and diagnose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stories are where the payoff happens.&lt;/strong&gt; Once you have good stories, execution becomes much more mechanical.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;BMAD ended up being one of those experiences where the first phase feels like friction, and then later you realize the friction was the whole point.&lt;/p&gt;

&lt;p&gt;It forced me to slow down, define what I was doing, and make decisions explicit. I burned time (and credits) in a couple of places, especially with party mode. I also lost six hours to an architecture mismatch that I should have caught earlier.&lt;/p&gt;

&lt;p&gt;But once the workflow and docs were in place, it got surprisingly smooth. Being able to resume from epics and stories, and to steer implementation without constantly rewriting requirements, is a real productivity shift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you try BMAD, bring patience. Bring discipline. And assume you’ll spend more time thinking before you spend time coding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If this resonates, what’s your experience been with structured AI-assisted workflows? I’m curious.&lt;/p&gt;


</description>
      <category>aws</category>
      <category>architecture</category>
      <category>engineering</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
