<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stuart Palmer</title>
    <description>The latest articles on DEV Community by Stuart Palmer (@stuartp).</description>
    <link>https://dev.to/stuartp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F435816%2Feebaafd2-23d2-4977-8e22-933ec06f2b57.jpeg</url>
      <title>DEV Community: Stuart Palmer</title>
      <link>https://dev.to/stuartp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stuartp"/>
    <language>en</language>
    <item>
      <title>Testing LLM Prompts in Production Pipelines: A Practical Approach</title>
      <dc:creator>Stuart Palmer</dc:creator>
      <pubDate>Wed, 19 Nov 2025 22:04:42 +0000</pubDate>
      <link>https://dev.to/stuartp/testing-llm-prompts-in-production-pipelines-a-practical-approach-349b</link>
      <guid>https://dev.to/stuartp/testing-llm-prompts-in-production-pipelines-a-practical-approach-349b</guid>
      <description>&lt;p&gt;Over the past few months, I’ve been working on integrating a number of LLM-based features throughout our product; things like content generation, intelligent recommendations, and agentic task flows. As these features matured, one question kept coming up: how do we actually test this?&lt;/p&gt;

&lt;p&gt;Traditional unit testing works beautifully for deterministic code. You write a test, assert an expected output, and move on. But LLMs don’t behave that way; run the same prompt twice and you'll get two different responses. So how do you test something that never gives you the same answer?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where We Started: Mocking Our Way Forward
&lt;/h2&gt;

&lt;p&gt;Our in-house LLM service already had strong test coverage, so at the application layer we treated it like any other dependency: we mocked it. Our unit tests would stub out the LLM responses and focus on testing the code around them: error handling, data transformations, business logic.&lt;/p&gt;

&lt;p&gt;This approach &lt;em&gt;did&lt;/em&gt; have value. It gave us confidence that our integration code was solid, let us test edge cases without eating model compute time, and kept our test suite fast. For early development, it was exactly what we needed.&lt;/p&gt;

&lt;p&gt;But there was an elephant in the room; we weren't testing the prompts themselves. We had no way to catch when a prompt change degraded output quality, shifted tone, or broke RAG accuracy. We were testing everything except the thing that mattered the most.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mocks Can't Catch
&lt;/h2&gt;

&lt;p&gt;As our LLM features moved into production, the gaps became obvious. We had no way to catch things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt regressions&lt;/strong&gt; (changes that subtly degraded output quality)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioural requirements&lt;/strong&gt; (whether outputs met specific criteria like tone, structure, or completeness)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG faithfulness&lt;/strong&gt; (whether our Retrieval-Augmented Generation features accurately represented the knowledge they retrieved, or were hallucinating despite having correct context)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These weren’t just theoretical risk. In one case, I updated a function that generates performance recommendations for technical specialists. I added some extra metrics; a straightforward improvement, or so I thought. The feature still ran fine, but the tone had shifted. What was previously clear, actionable technical advice became overly dramatic and superficial. &lt;/p&gt;

&lt;p&gt;Our mocked tests passed with no issues, but the actual output had regressed in a way that would frustrate users.&lt;/p&gt;

&lt;p&gt;We needed a way to test the prompts themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Prompt Integration Testing
&lt;/h2&gt;

&lt;p&gt;To fill the gap, we added a new pipeline stage specifically for testing LLM prompts, focusing on integration testing that evaluates &lt;em&gt;real model outputs&lt;/em&gt; against defined requirements. We use &lt;a href="https://www.promptfoo.dev/" rel="noopener noreferrer"&gt;PromptFoo&lt;/a&gt;, a framework for LLM testing, and integrated it with our existing Jest test suite.&lt;/p&gt;

&lt;p&gt;Our tests fall into two main categories:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Quality/Subjective
&lt;/h3&gt;

&lt;p&gt;These test behavioural characteristics like tone, structure, completeness and safety. We’re not looking for exact textural matches, instead we define expectations such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Response is professional and contains no profanity"&lt;/li&gt;
&lt;li&gt;"Answer is structured with headings and bullet points"&lt;/li&gt;
&lt;li&gt;"Response does not include personal opinions or speculation"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tests catch regressions where functionality still works but quality silently drops.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Faithfulness/Objective
&lt;/h3&gt;

&lt;p&gt;For our RAG-based features, we need to verify that outputs faithfully reflect the retrieved context. These tests check whether the model is grounding its output correctly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"If the knowledge base lists opening hours as 11am-3pm for the holiday, does the response reflect that?"&lt;/li&gt;
&lt;li&gt;"When the documentation specifies version 2.4.1, does the answer cite the correct version?"&lt;/li&gt;
&lt;li&gt;"Does the summary include all critical warnings from the source material?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This catches hallucinations early and ensures users aren’t seeing confidently incorrect information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: Making It Feel Like Normal Testing
&lt;/h2&gt;

&lt;p&gt;The key to adoption was making prompt tests feel like writing any other test. We built a Jest wrapper around PromptFoo that lets developers write LLM tests using familiar patterns.&lt;/p&gt;

&lt;p&gt;First we create a custom Jest Matcher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// addPromptMatchers.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;assertions&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;promptFoo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;matchesLlmRubric&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;assertions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;addPromptMatchers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;toSatisfyLlmRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="na"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
      &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;passRateThreshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;matchesLlmRubric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;llmOutput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;passCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;pass&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;passRate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;passCount&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;passRate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;passRateThreshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;passed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="s2"&gt;`Prompt Eval Failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;passRate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;% pass rate (needed &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;
              &lt;span class="nx"&gt;passRateThreshold&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;% to pass test)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we add this to the Jest startup config, and create a separate Jest config for prompt-specific test runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;//jest.setupAfter.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;addPromptMatchers&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./addPromptMatchers.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;addPromptMatchers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;//jest.promptEval.config.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;setupFilesAfterEnv&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./jest.setupAfter.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// notice the glob pattern matches files with the suffix .test.prompt.ts&lt;/span&gt;
  &lt;span class="na"&gt;testMatch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;**/?(*.)+(spec|test).?(prompt).?([mc])[jt]s?(x)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
  &lt;span class="c1"&gt;// Other Jest config&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then when you plumb it into your pipeline, remember to invoke an additional Jest stage with the additional config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// package.json&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;scripts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tests:prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jest --config=jest.promptEval.config.ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To generate tests which utilise this and run within the dedicated pipeline, we use a modified file naming convention (&lt;code&gt;.test.prompt.ts&lt;/code&gt; vs &lt;code&gt;.test.ts&lt;/code&gt;) to make the distinction clear. Developers who know how to write unit tests can write prompt tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LLM prompt test file&lt;/span&gt;
&lt;span class="c1"&gt;// myFeature.test.prompt.ts&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;recommendation prompt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should provide professional, actionable advice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promptResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptUnderTest&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;promptResults&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toSatisfyLlmRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response is professional and contains clear, actionable recommendations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Handling Non-Determinism
&lt;/h2&gt;

&lt;p&gt;The biggest challenge with LLM testing is non-determinism. Run the same prompt twice and you'll get different outputs. Our approach: embrace it.&lt;/p&gt;

&lt;p&gt;Instead of expecting identical results, we run each test multiple times and measure the success rate. By default, we require a 95% success rate; if a test passes 19 out of 20 runs, it passes overall. This threshold is configurable per test, which has sparked interesting code-review discussions about what ‘consistent enough’ means for different features.&lt;/p&gt;

&lt;p&gt;This is actually a question I'm still thinking about: what success rate thresholds make sense for LLM testing? Is 95% too strict? Too lenient? Does it depend on the use case? (I suspect it does, but I'm curious what others have landed on.)&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD Integration
&lt;/h2&gt;

&lt;p&gt;We run these tests in a dedicated pipeline stage, separate from our regular unit tests. There’s a few reasons for that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;They're slow&lt;/strong&gt; (Multiple runs of real LLM calls take time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They're expensive&lt;/strong&gt; (Each test execution hits the LLM multiple times)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They need a different rollout approach&lt;/strong&gt; (This is new territory for most of the team)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is key. Currently, the stage is non-blocking; failures don’t prevent deployment. Instead, we report results to a Slack channel where the team can monitor performance and trends. This gives us space to build confidence in the solution gradually, expand test coverage, refine our success criteria, and get comfortable with non-deterministic testing before making it a hard gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs and Learnings
&lt;/h2&gt;

&lt;p&gt;There are definite costs and consequences to this approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time&lt;/strong&gt;: This pipeline stage is slower than our standard unit tests. Where regular tests complete in seconds, these can take minutes depending on the number of prompts and configured run counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Running multiple iterations of real LLM calls adds up - token usage isn’t free. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wider Scope&lt;/strong&gt;: These tests blur the line between technical and product decisions. Unlike traditional tests that verify logic, prompt tests encode expectations about quality, tone, and behaviour. This means their criteria probably shouldn't be defined by engineering alone - Product owners have a role to play. It reminds me of behaviour-driven development (BDD), where acceptance criteria come from the business rather than the code. "The response should be professional and helpful" is less a technical specification and more a product requirement that happens to be testable with LLMs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flakiness&lt;/strong&gt;: Even with high success rate thresholds, occasional failures still happen. We're still developing our understanding of when a failure indicates a real problem versus normal variance.&lt;/p&gt;

&lt;p&gt;Despite these trade-offs, the benefits are clear. We've caught issues that would have shipped to production, and developers have more confidence making prompt changes knowing there's a safety net.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways for Others
&lt;/h2&gt;

&lt;p&gt;If you're considering adding prompt testing, here's what we’ve learned so far:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start small&lt;/strong&gt;: Pick one or two critical prompts to test first. Don't try to achieve 100% coverage immediately. Focus on prompts where failures would have the highest impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Involve Product Owners&lt;/strong&gt;: Prompt test criteria are product decisions as much as technical ones. Bring Product Owners into conversations about how we should be validating the prompt output from the user perspective. They'll help you define rubrics that reflect real user needs rather than engineering assumptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it easy to write tests&lt;/strong&gt;: Whatever your implementation, minimise friction for developers. If writing a prompt test is significantly harder than writing a unit test, adoption will suffer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accept imperfection&lt;/strong&gt;: Probabilistic success rates feel odd at first, but they’re a more honest model of how LLMs behave.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Balance cost versus confidence&lt;/strong&gt;: You don't need to test every prompt on every commit. Be strategic about when tests run and how many iterations you perform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LLMs are now core to many B2B products, but traditional testing doesn’t fit their probabilistic nature. Mocking still has its place for testing integration code, but it can't tell you if your prompts truly deliver the intended outcome.&lt;/p&gt;

&lt;p&gt;Prompt integration testing isn't perfect - it's slower, more expensive, and fuzzier than regular testing. But it fills a critical gap. It can catch regressions before they reach production, give developers confidence to refactor prompts, and forces us to articulate what "good" actually means for our AI features.&lt;/p&gt;

&lt;p&gt;However you’re using AI, one principle holds true: if the product’s value depends on LLM outputs, you need to test those outputs directly.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>testing</category>
    </item>
    <item>
      <title>Closures &amp; Callstacks: Building a Game to Learn JavaScript Closures</title>
      <dc:creator>Stuart Palmer</dc:creator>
      <pubDate>Sat, 15 Nov 2025 15:11:43 +0000</pubDate>
      <link>https://dev.to/stuartp/closures-callstacks-building-a-game-to-learn-javascript-closures-1jb2</link>
      <guid>https://dev.to/stuartp/closures-callstacks-building-a-game-to-learn-javascript-closures-1jb2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical exercise in learning closures by building a tiny idle game - no frameworks, just vanilla JavaScript.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Early in my development journey, I struggled with JavaScript closures. The concept felt abstract and slippery - I could read the definitions, but they didn't quite &lt;em&gt;click&lt;/em&gt;. So I did what I often do when learning something new: I built a small project that forced me to use them extensively.&lt;/p&gt;

&lt;p&gt;The result was &lt;strong&gt;Closures &amp;amp; Callstacks&lt;/strong&gt;, a simple browser-based idle game where a party of adventurers battles a dragon. Built with nothing but vanilla HTML, CSS, and JavaScript - no frameworks, no libraries - it served its purpose: by structuring the entire application around factory functions and closures, I finally internalised how they work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Game
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnebv4vag237uidbea324.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnebv4vag237uidbea324.jpg" alt="gameplay screenshot" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The premise is straightforward: you generate a party of three adventurers (fighters, wizards, and clerics, each with their own ASCII art representation), then watch them battle an ancient dragon in turn-based combat. Characters attack enemies or heal allies based on simple AI, with health bars updating in real-time and a combat log narrating the action. Victory or defeat depends on whether your party can whittle down the dragon's health before it stomps and firebreathes your adventurers into oblivion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Pattern: Factory Functions
&lt;/h2&gt;

&lt;p&gt;The game's architecture revolves around factory functions - functions that return objects with methods. These methods "close over" private variables, creating encapsulated state without needing classes or the &lt;code&gt;new&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;Here's a simplified version of the health system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;healthFunctions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;maxHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;isKO&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;setMaxHealth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;maxHP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;maxHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxHP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;isKO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;maxHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;getCurrentHealth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;currentHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;takeDamage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;damage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="nx"&gt;damage&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;isKO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;currentHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;healDamage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;heal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isKO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;isKO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;currentHealth&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;heal&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;maxHealth&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;currentHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;isKO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;isKO&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The variables &lt;code&gt;maxHealth&lt;/code&gt;, &lt;code&gt;currentHealth&lt;/code&gt;, and &lt;code&gt;isKO&lt;/code&gt; are truly private. There's no way to access them directly from outside the function - you can only interact with them through the returned methods. Each method maintains a reference to these variables through closure, even after &lt;code&gt;healthFunctions()&lt;/code&gt; has finished executing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Characters with Closures
&lt;/h2&gt;

&lt;p&gt;The player factory follows the same pattern but at a larger scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;playerFunctions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_playerClass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_allies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_enemies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;_profBonus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;playerAttack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;playerBuffs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;playerHealth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;healthFunctions&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;init&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;playerClass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;allies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;enemies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;_allies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;_enemies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;enemies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;_playerClass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;playerClass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;playerHealth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setMaxHealth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;calculateHP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;playerClass&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="nx"&gt;playerAttack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;attackFunctions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_profBonus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;_playerClass&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;playerBuffs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buffFunctions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_profBonus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;_playerClass&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="kd"&gt;get&lt;/span&gt; &lt;span class="nf"&gt;playerClass&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;_playerClass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="kd"&gt;get&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;playerHealth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;takeTurn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;health&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isKO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="c1"&gt;// ... game logic for taking actions&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each character created by &lt;code&gt;playerFunctions()&lt;/code&gt; maintains its own private state. The &lt;code&gt;playerHealth&lt;/code&gt; variable holds a reference to a health system (itself created by a factory function), and all the character's methods can access it through closure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building this game made closures tangible for me. Instead of being an abstract concept, they became a practical tool for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy&lt;/strong&gt;: No risk of external code accidentally modifying internal state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encapsulation&lt;/strong&gt;: Each character or system manages its own data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composition&lt;/strong&gt;: Factory functions can call other factory functions (like &lt;code&gt;playerHealth = healthFunctions()&lt;/code&gt;) to build complex behaviours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is this the most efficient way to structure a game? Probably not. The code has plenty of rough edges I'd refactor now (I’ve chosen to leave the code as-is for posterity!). But as a learning exercise, it was invaluable. By forcing myself to use closures everywhere - for health tracking, attack systems, buff management, and game state - I developed an intuitive understanding that stuck.&lt;/p&gt;

&lt;p&gt;Sometimes the best way to learn a concept is to build something with it, even if that something is a simple dragon-fighting game with ASCII art characters.&lt;/p&gt;

&lt;p&gt;🎮 &lt;a href="https://stuart-p.github.io/closures-and-callstacks/" rel="noopener noreferrer"&gt;Play the game&lt;/a&gt; | 💾 &lt;a href="https://github.com/stuart-p/closures-and-callstacks" rel="noopener noreferrer"&gt;View source on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
