<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Henning Reckey</title>
    <description>The latest articles on DEV Community by Henning Reckey (@jupyterps).</description>
    <link>https://dev.to/jupyterps</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3815000%2F21ab5943-50cf-4244-bad9-e46fbd36e9dc.png</url>
      <title>DEV Community: Henning Reckey</title>
      <link>https://dev.to/jupyterps</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jupyterps"/>
    <language>en</language>
    <item>
      <title>"VBAF Learning Trail -- From Zero to AI Developer in PowerShell 5.1"</title>
      <dc:creator>Henning Reckey</dc:creator>
      <pubDate>Wed, 24 Jun 2026 19:33:14 +0000</pubDate>
      <link>https://dev.to/jupyterps/vbaf-learning-trail-from-zero-to-ai-developer-in-powershell-51-4673</link>
      <guid>https://dev.to/jupyterps/vbaf-learning-trail-from-zero-to-ai-developer-in-powershell-51-4673</guid>
      <description>&lt;h1&gt;
  
  
  VBAF -- Getting Started
&lt;/h1&gt;

&lt;h2&gt;
  
  
  A Guided Trail from Zero to AI Developer
&lt;/h2&gt;

&lt;p&gt;Welcome. You are about to learn how artificial intelligence actually works --&lt;br&gt;
not by reading about it, but by running it, watching it, and breaking it.&lt;/p&gt;

&lt;p&gt;VBAF implements neural networks, reinforcement learning and multi-agent&lt;br&gt;
systems from scratch in PowerShell 5.1. Every algorithm is readable.&lt;br&gt;
Every concept is explained in the code comments.&lt;/p&gt;

&lt;p&gt;This guide takes you from installation to building your own AI agent.&lt;br&gt;
Follow the camps in order. Do not skip ahead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time required:&lt;/strong&gt; 2-4 hours for Camps 0-3. Camps 4-5 are open-ended.&lt;/p&gt;


&lt;h1&gt;
  
  
  CAMP 0 -- BASECAMP
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Get VBAF installed and your first output on screen
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: see "VBAF Framework ready!" on your screen
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 1 of 5 -- Check your PowerShell version
&lt;/h2&gt;

&lt;p&gt;Open Windows PowerShell (not ISE, not VS Code -- just the plain console for now) and run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="bp"&gt;$PSVersionTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PSVersion&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Major&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;Minor&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;Build&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;Revision&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;-----&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;-----&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;-----&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;--------&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
VBAF requires PowerShell 5.1. This version ships with every modern&lt;br&gt;
Windows PC. If you see 5.1 -- you are ready. If you see 7.x -- switch&lt;br&gt;
to Windows PowerShell (search "Windows PowerShell" in the Start menu).&lt;/p&gt;

&lt;p&gt;If something goes wrong:&lt;br&gt;
On Windows 10 or 11, PowerShell 5.1 is always present.&lt;br&gt;
Search "Windows PowerShell" in the Start menu -- not "PowerShell".&lt;/p&gt;
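&lt;p&gt;The same check can be scripted. The sketch below is illustrative only:&lt;br&gt;
&lt;code&gt;Test-VbafCompatibleVersion&lt;/code&gt; is a hypothetical helper name, not part of VBAF,&lt;br&gt;
and it simply encodes the 5.1 requirement described above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Hypothetical helper (not part of VBAF): returns $true only for PowerShell 5.1.
function Test-VbafCompatibleVersion {
    param($Version = $PSVersionTable.PSVersion)
    return ($Version.Major -eq 5 -and $Version.Minor -eq 1)
}

if (Test-VbafCompatibleVersion) {
    Write-Host "PowerShell 5.1 detected -- ready for VBAF."
} else {
    Write-Host "Found $($PSVersionTable.PSVersion). Start Windows PowerShell 5.1 instead."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing a version explicitly, for example &lt;code&gt;Test-VbafCompatibleVersion -Version ([version]'7.4.0')&lt;/code&gt;,&lt;br&gt;
lets you see how the check behaves without switching consoles.&lt;/p&gt;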


&lt;h2&gt;
  
  
  Step 2 of 5 -- Install VBAF from PSGallery
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Install-Module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VBAF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scope&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CurrentUser&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Untrusted&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;repository&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;want&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;modules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'PSGallery'&lt;/span&gt;&lt;span class="nf"&gt;?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;No&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type Y and press Enter.&lt;/p&gt;

&lt;p&gt;What it means:&lt;br&gt;
PSGallery is the official PowerShell module repository -- the same place&lt;br&gt;
Microsoft publishes its own modules. VBAF is downloaded and installed&lt;br&gt;
in your user profile. Nothing is changed system-wide.&lt;/p&gt;

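&lt;p&gt;If something goes wrong:&lt;br&gt;
If the download fails with a network or TLS error, enable TLS 1.2 for the session and retry:&lt;br&gt;
  &lt;code&gt;[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12&lt;/code&gt;&lt;br&gt;
If you are behind a proxy, &lt;code&gt;Install-Module&lt;/code&gt; also accepts a &lt;code&gt;-Proxy&lt;/code&gt; parameter.&lt;br&gt;
Adding &lt;code&gt;-Force&lt;/code&gt; does not fix connectivity, but it does reinstall over a broken copy:&lt;br&gt;
  &lt;code&gt;Install-Module VBAF -Scope CurrentUser -Force&lt;/code&gt;&lt;/p&gt;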


&lt;h2&gt;
  
  
  Step 3 of 5 -- Navigate to the VBAF folder
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;USERPROFILE&lt;/span&gt;&lt;span class="s2"&gt;\OneDrive\WindowsPowerShell"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
Your prompt changes to show the new folder.&lt;/p&gt;

&lt;p&gt;What it means:&lt;br&gt;
VBAF lives in your OneDrive\WindowsPowerShell folder.&lt;br&gt;
All examples and files are here.&lt;/p&gt;

&lt;p&gt;If something goes wrong:&lt;br&gt;
If OneDrive is not set up, try:&lt;br&gt;
  &lt;code&gt;cd "$env:USERPROFILE\Documents\WindowsPowerShell"&lt;/code&gt;&lt;br&gt;
Or go to wherever you cloned the VBAF repository.&lt;/p&gt;
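&lt;p&gt;If neither folder exists, PowerShell can report where &lt;code&gt;Install-Module&lt;/code&gt; put the files.&lt;br&gt;
This is a minimal sketch using the standard &lt;code&gt;Get-Module -ListAvailable&lt;/code&gt; cmdlet.&lt;br&gt;
Whether &lt;code&gt;VBAF.LoadAll.ps1&lt;/code&gt; sits in that folder on your machine is an assumption to check,&lt;br&gt;
not something this guide confirms.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Ask PowerShell where the VBAF module was installed (newest version first).
$module = Get-Module -ListAvailable -Name VBAF |
    Sort-Object Version -Descending |
    Select-Object -First 1

if ($null -eq $module) {
    Write-Host "VBAF was not found. Repeat Step 2."
} else {
    Write-Host "VBAF $($module.Version) is installed in: $($module.ModuleBase)"
    # Assumption: the loader script ships inside the module folder.
    $loader = Join-Path $module.ModuleBase 'VBAF.LoadAll.ps1'
    Write-Host "Loader script present: $(Test-Path $loader)"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;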


&lt;h2&gt;
  
  
  Step 4 of 5 -- Load the VBAF framework
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\VBAF.LoadAll.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Loading&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VBAF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Framework...&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Core&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;neural&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;network...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Reinforcement&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;learning...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Phase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Business&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;multi-agent...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;VBAF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Framework&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ready&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="n"&gt;LEARNING&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PATH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\VBAF.Core.Example-XOR.ps1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nx"&gt;2.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\VBAF.RL.Example-CastleLearning.ps1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
All VBAF classes and functions are now loaded into your session.&lt;br&gt;
NeuralNetwork, QLearningAgent, DQNAgent, PPOAgent, A3CAgent --&lt;br&gt;
all available. You need to run this once per PowerShell session.&lt;/p&gt;

&lt;p&gt;If something goes wrong:&lt;br&gt;
Make sure you are in the right folder (Step 3) before running this.&lt;br&gt;
The dot-space-dot at the start is important: &lt;code&gt;. .\VBAF.LoadAll.ps1&lt;/code&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 5 of 5 -- Confirm everything loaded
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;NeuralNetwork&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(@(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layers       : {Layer, Layer}
LearningRate : 0.1
Architecture : {2, 3, 1}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
You just created a neural network with 2 inputs, 3 hidden neurons&lt;br&gt;
and 1 output. It exists in memory. It has random weights.&lt;br&gt;
It knows nothing yet. That is about to change.&lt;/p&gt;
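&lt;p&gt;To make "random weights" concrete, here is a minimal forward pass for a 2-3-1 network&lt;br&gt;
written in plain PowerShell. It does not use VBAF's classes. The function names, the sigmoid&lt;br&gt;
activation and the -0.5 to 0.5 starting range are assumptions chosen for illustration.&lt;br&gt;
Such a network has (2 x 3 + 3) + (3 x 1 + 1) = 13 adjustable numbers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Standalone sketch (not VBAF code): forward pass through a 2-3-1 network.
function Get-Sigmoid([double]$x) { return 1.0 / (1.0 + [math]::Exp(-$x)) }

function New-RandomLayer([int]$inputCount, [int]$neuronCount, [System.Random]$rng) {
    # Each neuron holds one weight per input plus a bias, all in [-0.5, 0.5).
    $layer = @()
    for ($n = 0; $n -lt $neuronCount; $n++) {
        $weights = @()
        for ($i = 0; $i -lt $inputCount; $i++) { $weights += ($rng.NextDouble() - 0.5) }
        $layer += @{ Weights = $weights; Bias = ($rng.NextDouble() - 0.5) }
    }
    return ,$layer
}

function Invoke-Layer($layer, [double[]]$values) {
    # Weighted sum plus bias, squashed by the sigmoid, for every neuron.
    $outputs = @()
    foreach ($neuron in $layer) {
        $sum = $neuron.Bias
        for ($i = 0; $i -lt $values.Count; $i++) { $sum += $neuron.Weights[$i] * $values[$i] }
        $outputs += (Get-Sigmoid $sum)
    }
    return ,$outputs
}

$rng    = [System.Random]::new(42)   # fixed seed so runs are repeatable
$hidden = New-RandomLayer 2 3 $rng
$output = New-RandomLayer 3 1 $rng

$hiddenOut  = Invoke-Layer $hidden @(0.0, 1.0)
$prediction = (Invoke-Layer $output $hiddenOut)[0]
Write-Host ("Untrained output for input (0,1): {0:N3}" -f $prediction)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The printed value lies somewhere between 0 and 1 and has no relation to the right answer yet.&lt;/p&gt;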


&lt;h1&gt;
  
  
  CAMP 0 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have:&lt;/strong&gt; PowerShell 5.1, VBAF installed, framework loaded.&lt;br&gt;
&lt;strong&gt;You can:&lt;/strong&gt; create neural networks and RL agents from scratch.&lt;br&gt;
&lt;strong&gt;Next:&lt;/strong&gt; watch one learn.&lt;/p&gt;




&lt;h1&gt;
  
  
  CAMP 1 -- FIRST FIRE
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Watch a neural network learn something for the first time
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: see "SUCCESS! Network learned XOR!"
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 6 of 8 -- Understand the problem first
&lt;/h2&gt;

&lt;p&gt;XOR is the classic small problem that a single neuron CANNOT solve.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 XOR 0 = 0   (both same   -&amp;gt; 0)
0 XOR 1 = 1   (different   -&amp;gt; 1)
1 XOR 0 = 1   (different   -&amp;gt; 1)
1 XOR 1 = 0   (both same   -&amp;gt; 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Draw these four points on paper:&lt;br&gt;
  (0,0) -&amp;gt; label 0&lt;br&gt;
  (0,1) -&amp;gt; label 1&lt;br&gt;
  (1,0) -&amp;gt; label 1&lt;br&gt;
  (1,1) -&amp;gt; label 0&lt;/p&gt;

&lt;p&gt;Try to draw ONE straight line that separates the 0-labelled points&lt;br&gt;
from the 1-labelled points. You cannot. That is the problem.&lt;/p&gt;

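&lt;p&gt;In 1969, Minsky and Papert showed in their book Perceptrons that a&lt;br&gt;
single-layer perceptron cannot represent XOR. The result is widely credited&lt;br&gt;
with cooling interest in, and funding for, neural-network research in the 1970s.&lt;/p&gt;

&lt;p&gt;The solution: add a hidden layer. Two lines can separate the points&lt;br&gt;
even when one cannot. This is a small case of what the Universal&lt;br&gt;
Approximation Theorem states in general: a network with one hidden layer&lt;br&gt;
and enough neurons can approximate any continuous function on a bounded&lt;br&gt;
input range as closely as you like.&lt;/p&gt;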
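&lt;p&gt;The "two lines" idea can be checked by hand before any training. The sketch below wires&lt;br&gt;
a tiny network with fixed weights: one hidden unit acts as OR, one as NAND, and the output&lt;br&gt;
unit ANDs them together. The step activation and the specific weights are illustrative&lt;br&gt;
choices, not values VBAF would learn.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Hand-wired XOR: two hidden threshold units (two lines) plus one output unit.
function Get-Step([double]$x) { if ($x -gt 0) { return 1 } else { return 0 } }

function Get-Xor([int]$a, [int]$b) {
    $h1 = Get-Step ($a + $b - 0.5)        # OR:   fires unless both inputs are 0
    $h2 = Get-Step (1.5 - $a - $b)        # NAND: fires unless both inputs are 1
    return (Get-Step ($h1 + $h2 - 1.5))   # AND of the two hidden units
}

# Print the full truth table
foreach ($pair in @(@(0,0), @(0,1), @(1,0), @(1,1))) {
    "{0} XOR {1} = {2}" -f $pair[0], $pair[1], (Get-Xor $pair[0] $pair[1])
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each hidden unit draws one straight line through the plane. The output unit fires only for&lt;br&gt;
points that lie between the two lines, which are exactly the 1-labelled points.&lt;/p&gt;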


&lt;h2&gt;
  
  
  Step 7 of 8 -- Run the XOR example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;examples\01-XOR-Network&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\Run-Example-01.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Training Neural Network...
Architecture : 2 -&amp;gt; 3 -&amp;gt; 1

Epoch     1 / 5000 (  0.0%) -- Error: 0.269
Epoch   500 / 5000 ( 10.0%) -- Error: 0.268
Epoch  1500 / 5000 ( 30.0%) -- Error: 0.082
Epoch  2000 / 5000 ( 40.0%) -- Error: 0.006
Epoch  5000 / 5000 (100.0%) -- Error: 0.000697

Accuracy   : 100.00%
Correct    : 4 / 4

SUCCESS! Network learned XOR!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
The network started with random weights and knew nothing.&lt;br&gt;
After 5000 passes through 4 training examples, it learned XOR.&lt;br&gt;
Error dropped from 0.269 to 0.0007 -- near perfect.&lt;/p&gt;

&lt;p&gt;Watch the error curve:&lt;br&gt;
  Epoch 1-1000:  barely moves (stuck in a plateau)&lt;br&gt;
  Epoch 1500:    suddenly drops (escaped the plateau)&lt;br&gt;
  Epoch 2000+:   smooth descent to near zero&lt;/p&gt;

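&lt;p&gt;This plateau-then-breakthrough is normal. A common explanation: while the&lt;br&gt;
hidden neurons still respond almost identically, the gradients are tiny and the&lt;br&gt;
weights drift slowly. Once the neurons start to specialise, the error falls quickly.&lt;/p&gt;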

&lt;p&gt;If something goes wrong:&lt;br&gt;
If accuracy is below 100% -- run it again. With only four examples, 75% means&lt;br&gt;
one case is still wrong. Some random starting weights get stuck. This is&lt;br&gt;
normal and expected.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 8 of 8 -- Look inside what just happened
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a network and inspect it&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;NeuralNetwork&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(@(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# See the random starting weights of the first hidden neuron&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Neurons&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Weights&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Neurons&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Bias&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
Three random numbers between -0.5 and 0.5: two weights (one per input)&lt;br&gt;
and one bias. These are the starting values -- completely random.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Train it&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;@{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;@{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;@{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;@{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Expected&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# See the weights AFTER training&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Neurons&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Weights&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$nn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Neurons&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Bias&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
The weights changed. Backpropagation moved them from random values&lt;br&gt;
to values that encode the XOR pattern. This is learning.&lt;/p&gt;
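&lt;p&gt;Backpropagation applies the same small correction over and over: nudge each weight&lt;br&gt;
against the gradient of the error. The sketch below shows that rule for a single sigmoid&lt;br&gt;
neuron learning OR, which is linearly separable. It is a generic textbook delta rule under&lt;br&gt;
stated assumptions (squared error, sigmoid activation, learning rate 0.5, fixed starting&lt;br&gt;
weights), not a copy of VBAF's UpdateWeights method.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Generic delta-rule sketch for one sigmoid neuron (not VBAF's implementation).
function Get-Sigmoid([double]$x) { return 1.0 / (1.0 + [math]::Exp(-$x)) }

$samples = @(
    @{ Input = @(0.0, 0.0); Expected = 0.0 },
    @{ Input = @(0.0, 1.0); Expected = 1.0 },
    @{ Input = @(1.0, 0.0); Expected = 1.0 },
    @{ Input = @(1.0, 1.0); Expected = 1.0 }
)
$weights = @(0.1, -0.1)   # fixed starting point so runs are repeatable
$bias    = 0.0
$rate    = 0.5

function Get-MeanSquaredError {
    # Reads $samples, $weights and $bias from the calling scope.
    $total = 0.0
    foreach ($s in $samples) {
        $out = Get-Sigmoid ($weights[0] * $s.Input[0] + $weights[1] * $s.Input[1] + $bias)
        $total += [math]::Pow($s.Expected - $out, 2)
    }
    return $total / $samples.Count
}

$before = Get-MeanSquaredError
for ($epoch = 0; $epoch -lt 2000; $epoch++) {
    foreach ($s in $samples) {
        $out = Get-Sigmoid ($weights[0] * $s.Input[0] + $weights[1] * $s.Input[1] + $bias)
        # delta = output error times the slope of the sigmoid at this output
        $delta = ($s.Expected - $out) * $out * (1.0 - $out)
        for ($i = 0; $i -lt $weights.Count; $i++) { $weights[$i] += $rate * $delta * $s.Input[$i] }
        $bias += $rate * $delta
    }
}
$after = Get-MeanSquaredError
"Error before: {0:N4}  after: {1:N4}" -f $before, $after
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A multi-layer network does the same thing per weight, but first passes the error backwards&lt;br&gt;
through the layers so that each hidden neuron gets its own delta.&lt;/p&gt;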


&lt;h2&gt;
  
  
  What to try before moving on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run the XOR example 3 times -- does it always converge?&lt;/li&gt;
&lt;li&gt;Change the architecture to [2, 2, 1] -- can 2 hidden neurons solve it?&lt;/li&gt;
&lt;li&gt;Change learning rate to 0.1 -- slower but more stable&lt;/li&gt;
&lt;li&gt;Change epochs to 500 -- does it converge fast enough?&lt;/li&gt;
&lt;li&gt;Open VBAF.Core.AllClasses.ps1 and find the UpdateWeights method.
Read the formula. Match it to what you read in the comments.&lt;/li&gt;
&lt;/ol&gt;


&lt;h1&gt;
  
  
  CAMP 1 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have seen:&lt;/strong&gt; a neural network learn from random weights.&lt;br&gt;
&lt;strong&gt;You understand:&lt;/strong&gt; XOR, hidden layers, backpropagation, error curves.&lt;br&gt;
&lt;strong&gt;Next:&lt;/strong&gt; make an agent learn WITHOUT being told the correct answer.&lt;/p&gt;




&lt;h1&gt;
  
  
  CAMP 2 -- LEARNING TO HUNT
&lt;/h1&gt;
&lt;h2&gt;
  
  
  An agent discovers strategy through trial and error
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: watch reward increase as the agent learns
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 9 of 10 -- Understand the difference
&lt;/h2&gt;

&lt;p&gt;In Camp 1, we gave the network the CORRECT ANSWER for every input.&lt;br&gt;
  Input: [0,1] -&amp;gt; Correct answer: 1&lt;br&gt;
This is called SUPERVISED learning.&lt;/p&gt;

&lt;p&gt;In Camp 2, we give the agent a REWARD for good outcomes.&lt;br&gt;
  Action: choose castle type -&amp;gt; Reward: +2 if varied, -1 if repeated&lt;br&gt;
The agent must figure out what "good" means by itself.&lt;br&gt;
This is called REINFORCEMENT learning.&lt;/p&gt;

&lt;p&gt;The difference:&lt;br&gt;
  Supervised: "here is the right answer"&lt;br&gt;
  Reinforcement: "here is how good that was"&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 10 of 10 -- Run the castle learning example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nx"&gt;\02-Castle-Learning&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\Run-Example-02.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Episode   1 | Reward:  12.45 | Epsilon: 1.000 | Q-Table:   0 entries
Episode  10 | Reward:  15.32 | Epsilon: 0.951 | Q-Table:  14 entries
Episode  50 | Reward:  18.67 | Epsilon: 0.779 | Q-Table:  38 entries
Episode 100 | Reward:  21.43 | Epsilon: 0.607 | Q-Table:  52 entries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to watch:&lt;br&gt;
  Epsilon:  starts at 1.0 (100% random), decays toward 0.01&lt;br&gt;
  Q-Table:  grows as the agent visits new states&lt;br&gt;
  Reward:   should trend upward as the agent learns&lt;/p&gt;

&lt;p&gt;What it means:&lt;br&gt;
  Episode 1:   agent picks randomly -- no knowledge&lt;br&gt;
  Episode 10:  agent has seen some states -- Q-table forming&lt;br&gt;
  Episode 100: agent exploiting learned knowledge -- reward rising&lt;/p&gt;
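&lt;p&gt;Behind those columns sit two small rules: the tabular Q-learning update and epsilon decay.&lt;br&gt;
The sketch below is a generic version of both, not VBAF's QLearningAgent. The state and&lt;br&gt;
action names, the learning rate 0.1, the discount 0.9 and the decay factor 0.995 are all&lt;br&gt;
assumptions. The factor 0.995 is chosen only because it roughly matches the epsilon&lt;br&gt;
values in the sample output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Generic tabular Q-learning sketch (assumed parameters, not VBAF's code).
$qTable  = @{}      # key "state|action", value = estimated long-term reward
$alpha   = 0.1      # learning rate
$gamma   = 0.9      # discount factor for future reward
$actions = @('Tower', 'Moat', 'Gate')   # hypothetical castle actions

function Get-Q([string]$state, [string]$action) {
    $key = "$state|$action"
    if ($qTable.ContainsKey($key)) { return $qTable[$key] } else { return 0.0 }
}

function Update-Q([string]$state, [string]$action, [double]$reward, [string]$nextState) {
    # Best value the agent currently expects from the next state
    $bestNext = ($actions | ForEach-Object { Get-Q $nextState $_ } | Measure-Object -Maximum).Maximum
    $old = Get-Q $state $action
    # Q(s,a) = Q(s,a) + alpha * (reward + gamma * max Q(s',a') - Q(s,a))
    $qTable["$state|$action"] = $old + $alpha * ($reward + $gamma * $bestNext - $old)
}

Update-Q 'start' 'Tower' 2.0 'afterTower'   # a varied choice earned +2
"Q(start, Tower) = {0}" -f (Get-Q 'start' 'Tower')

# Epsilon decay: explore less each episode, but never below the floor of 0.01
$epsilon = 1.0
for ($episode = 1; $episode -le 10; $episode++) {
    $epsilon = [math]::Max(0.01, $epsilon * 0.995)
}
"Epsilon after 10 episodes = {0:N3}" -f $epsilon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every new "state|action" key is one more Q-table entry, which is why the entry count grows&lt;br&gt;
while the agent is still exploring.&lt;/p&gt;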


&lt;h2&gt;
  
  
  Step 11 -- Train a DQN on CartPole
&lt;/h2&gt;

&lt;p&gt;Now a neural network approximates Q-values instead of a table.&lt;br&gt;
This works for problems too large for a table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;USERPROFILE&lt;/span&gt;&lt;span class="s2"&gt;\OneDrive\WindowsPowerShell"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\VBAF.LoadAll.ps1&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-DQNTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-PrintEvery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ep    10  Reward:    12  Best:    18  e: 0.951  Loss: 0.04521
Ep    20  Reward:    23  Best:    31  e: 0.905  Loss: 0.03891
Ep    50  Reward:    67  Best:    89  e: 0.779  Loss: 0.02341
Ep   100  Reward:   134  Best:   178  e: 0.607  Loss: 0.01123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to watch:&lt;br&gt;
  Reward: random agent gets 10-20. Trained agent gets 100-200.&lt;br&gt;
  Epsilon: decays from 1.0 -- less exploration over time.&lt;br&gt;
  Loss: often trends downward, though it can be noisy -- Q-values becoming more accurate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# See the full stats&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PrintStats&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Get Q-values for a specific state&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetQValues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
GetQValues shows what the neural network thinks each action is worth.&lt;br&gt;
Higher value = agent thinks this action leads to better outcomes.&lt;br&gt;
After training, the values should make intuitive sense:&lt;br&gt;
if the pole is tilting right, pushing right (which moves the cart back under the pole) should have the higher value.&lt;/p&gt;
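&lt;p&gt;How an agent turns Q-values into a decision can be sketched in a few lines. The example below is a stand-alone illustration, not VBAF code: the Q-values are made-up numbers, and Get-GreedyAction and Get-EpsilonGreedyAction are hypothetical helper names.&lt;/p&gt;

```powershell
# Hypothetical Q-values for the two CartPole actions (0 = push left, 1 = push right).
# These numbers are invented for illustration; they are not VBAF output.
$qValues = @(0.42, 0.87)

function Get-GreedyAction {
    param([double[]] $QValues)
    # Return the index of the largest Q-value (argmax).
    $best = 0
    for ($i = 1; $i -lt $QValues.Length; $i++) {
        if ($QValues[$i] -gt $QValues[$best]) { $best = $i }
    }
    return $best
}

function Get-EpsilonGreedyAction {
    param([double[]] $QValues, [double] $Epsilon, [System.Random] $Rng)
    # With probability epsilon explore (random action), otherwise exploit (argmax).
    if ($Rng.NextDouble() -lt $Epsilon) {
        return $Rng.Next(0, $QValues.Length)
    }
    return Get-GreedyAction -QValues $QValues
}

$rng = [System.Random]::new(42)
Write-Host "Greedy action: $(Get-GreedyAction -QValues $qValues)"
Write-Host "Epsilon-greedy action: $(Get-EpsilonGreedyAction -QValues $qValues -Epsilon 0.1 -Rng $rng)"
```

&lt;p&gt;Early in training epsilon is near 1.0, so almost every action is random. As epsilon decays, the argmax branch takes over and the Q-values drive behaviour.&lt;/p&gt;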


&lt;h2&gt;
  
  
  What to try before moving on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Run DQN with FastMode and compare speed:&lt;br&gt;
$agent = (Invoke-DQNTraining -Episodes 50 -FastMode)[-1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run it twice -- does it always converge to the same reward?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Look at the Q-values before and after training:&lt;br&gt;
Create a new agent, get Q-values, train, get Q-values again.&lt;br&gt;
See how they changed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open VBAF.RL.DQN.ps1 and find the Replay() method.&lt;br&gt;
Read the Bellman equation comment. Trace through one update.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
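&lt;p&gt;Before reading the Replay() method, it may help to trace one Bellman update by hand. The sketch below is an assumption about the general shape of a DQN target, not a copy of the VBAF implementation. Get-BellmanTarget is a hypothetical name and all numbers are illustrative.&lt;/p&gt;

```powershell
# One DQN-style target: reward now, plus the discounted best Q-value of the next state.
function Get-BellmanTarget {
    param(
        [double]   $Reward,
        [double[]] $NextQValues,
        [bool]     $Done,
        [double]   $Gamma = 0.95
    )
    # A terminal transition has no future, so the target is just the reward.
    if ($Done) { return $Reward }
    $maxNext = ($NextQValues | Measure-Object -Maximum).Maximum
    return $Reward + $Gamma * $maxNext
}

# Illustrative transition: reward 1, next-state Q-values 2 and 3, episode continues.
# Arithmetic: 1 + 0.95 * 3 = 3.85
$target   = Get-BellmanTarget -Reward 1.0 -NextQValues @(2.0, 3.0) -Done $false
$currentQ = 3.0                      # what the network currently predicts (made up)
$tdError  = $target - $currentQ      # training nudges the prediction toward the target
Write-Host "Target: $target  TD error: $tdError"
```

&lt;p&gt;The loss printed during training is built from errors like $tdError, which is why a falling loss suggests the Q-values are becoming self-consistent.&lt;/p&gt;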


&lt;h1&gt;
  
  
  CAMP 2 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have seen:&lt;/strong&gt; Q-learning and DQN in action.&lt;br&gt;
&lt;strong&gt;You understand:&lt;/strong&gt; rewards, Q-values, epsilon-greedy, experience replay.&lt;br&gt;
&lt;strong&gt;Next:&lt;/strong&gt; compare three different algorithms head to head.&lt;/p&gt;




&lt;h1&gt;
  
  
  CAMP 3 -- THE ARENA
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Three algorithms compete -- you judge the winner
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: benchmark DQN vs PPO vs A3C and explain the difference
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 12 -- Understand the three algorithms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q-Learning / DQN:&lt;/strong&gt;&lt;br&gt;
  Learns: Q(state, action) = expected discounted future reward&lt;br&gt;
  Decides: take the action with the highest Q-value&lt;br&gt;
  Memory: experience replay buffer (random batches)&lt;br&gt;
  Key innovation: target network for stable training&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PPO (Proximal Policy Optimization):&lt;/strong&gt;&lt;br&gt;
  Learns: pi(action|state) = probability of each action&lt;br&gt;
  Decides: sample from the probability distribution&lt;br&gt;
  Memory: rollout buffer (recent experiences, then discard)&lt;br&gt;
  Key innovation: clipped update -- no catastrophic policy changes&lt;/p&gt;
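&lt;p&gt;The clipped update can be written down for a single sample. This is a minimal sketch of the standard PPO clipped objective, not the VBAF code. Get-ClippedObjective is a hypothetical name, and the clip range of 0.2 is an assumed, commonly used value.&lt;/p&gt;

```powershell
# Clipped surrogate objective for one (state, action) sample.
function Get-ClippedObjective {
    param(
        [double] $Ratio,              # new policy probability / old policy probability
        [double] $Advantage,          # how much better the action was than expected
        [double] $ClipEpsilon = 0.2   # assumed clip range
    )
    # Keep the ratio inside [1 - eps, 1 + eps].
    $clipped = [math]::Max(1.0 - $ClipEpsilon, [math]::Min(1.0 + $ClipEpsilon, $Ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return [math]::Min($Ratio * $Advantage, $clipped * $Advantage)
}

foreach ($ratio in 0.5, 1.0, 1.5) {
    $value = Get-ClippedObjective -Ratio $ratio -Advantage 1.0
    Write-Host ("ratio = {0:F1}  objective = {1:F2}" -f $ratio, $value)
}
```

&lt;p&gt;With a positive advantage, pushing the ratio beyond 1.2 earns nothing extra, so the optimiser has no incentive to move the policy far from the one that collected the data.&lt;/p&gt;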

&lt;p&gt;&lt;strong&gt;A3C (Asynchronous Advantage Actor-Critic):&lt;/strong&gt;&lt;br&gt;
  Learns: policy AND value function in ONE shared network&lt;br&gt;
  Decides: sample from policy head&lt;br&gt;
  Memory: n-step rollout per worker (no buffer)&lt;br&gt;
  Key innovation: parallel workers, shared global network&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 13 -- Train all three
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Training DQN..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ForegroundColor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Cyan&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$dqn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-DQNTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Quiet&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Training PPO..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ForegroundColor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Cyan&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$ppo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-PPOTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Quiet&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Training A3C..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ForegroundColor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Cyan&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$a3c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-A3CTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Quiet&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"All three trained!"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ForegroundColor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Green&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This takes 2-5 minutes. Watch the output -- each algorithm&lt;br&gt;
prints different statistics because they work differently.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 14 -- Benchmark them head to head
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CartPole"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-MaxSteps&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="bp"&gt;$null&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Random baseline"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$dqn&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DQN"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$ppo&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PPO"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$a3c&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A3C"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Random baseline
  Avg Reward :  14.3
  Max Reward :  28.0

  DQN
  Avg Reward : 143.7
  Max Reward : 200.0

  PPO
  Avg Reward : 167.2
  Max Reward : 200.0

  A3C
  Avg Reward : 112.4
  Max Reward : 200.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it means:&lt;br&gt;
Random agent: 14 reward -- barely balances&lt;br&gt;
Trained agents: 100-200 reward -- learned to balance&lt;/p&gt;

&lt;p&gt;The winner varies from run to run because weight initialisation and exploration are both random.&lt;br&gt;
Run the benchmark 3 times. Which algorithm wins most often?&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 15 -- Watch four companies compete
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;examples\03-Market-Simulation&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\Run-Example-03.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
Four companies learning business strategy simultaneously.&lt;br&gt;
After 10 simulated years, you see who won and WHY.&lt;/p&gt;

&lt;p&gt;Watch for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tacit collusion: companies avoid price wars without communicating&lt;/li&gt;
&lt;li&gt;Innovation races: R&amp;amp;D emerges as the dominant strategy&lt;/li&gt;
&lt;li&gt;Herfindahl index: measures market concentration&lt;/li&gt;
&lt;/ul&gt;
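&lt;p&gt;The Herfindahl index is simple enough to compute by hand: square each company's market share and add the squares. The sketch below assumes shares expressed as fractions of 1 (some sources use percentages, which scales the result to 0-10000). Get-HerfindahlIndex is a hypothetical helper and the revenue figures are invented, not taken from the simulation.&lt;/p&gt;

```powershell
function Get-HerfindahlIndex {
    param([double[]] $Revenues)
    $total = ($Revenues | Measure-Object -Sum).Sum
    if ($total -le 0) { throw "Total revenue must be positive." }
    $hhi = 0.0
    foreach ($revenue in $Revenues) {
        $share = $revenue / $total     # market share as a fraction of 1
        $hhi  += $share * $share       # sum of squared shares
    }
    return $hhi
}

# Invented revenues for four companies, purely for illustration.
$revenues = @(40.0, 30.0, 20.0, 10.0)
Write-Host ("Herfindahl index: {0:F3}" -f (Get-HerfindahlIndex -Revenues $revenues))
```

&lt;p&gt;Four equally sized companies give 0.25, and a monopoly gives 1.0, so a rising index during the simulation means the market is concentrating.&lt;/p&gt;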

&lt;p&gt;What it means:&lt;br&gt;
Nobody programmed these behaviours.&lt;br&gt;
They emerged from four Q-learning agents optimising their own rewards.&lt;br&gt;
This is multi-agent reinforcement learning in action.&lt;/p&gt;


&lt;h2&gt;
  
  
  What to try before moving on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Run the benchmark 3 times -- which algorithm wins most often?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Try GridWorld:&lt;br&gt;
$env = New-VBAFEnvironment -Name "GridWorld" -GridSize 5&lt;br&gt;
Invoke-VBAFBenchmark -Agent $dqn -Environment $env -Episodes 20 -Label "DQN on GridWorld"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read the stats from each agent:&lt;br&gt;
$dqn.PrintStats()&lt;br&gt;
$ppo.PrintStats()&lt;br&gt;
$a3c.PrintStats()&lt;br&gt;
What is different? What does Entropy mean for PPO and A3C?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open VBAF.RL.PPO.ps1 and find the ComputeGAE method.&lt;br&gt;
Read the comment about lambda. What happens when lambda = 1.0?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
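&lt;p&gt;To experiment with lambda outside the VBAF source, here is a stand-alone sketch of generalised advantage estimation. It follows the usual textbook recursion and is not a copy of ComputeGAE. Get-GaeAdvantages is a hypothetical name, and the default gamma and lambda are assumed values.&lt;/p&gt;

```powershell
function Get-GaeAdvantages {
    param(
        [double[]] $Rewards,
        [double[]] $Values,          # V(s_0) .. V(s_T): one more entry than $Rewards
        [double]   $Gamma  = 0.99,   # assumed default
        [double]   $Lambda = 0.95    # assumed default
    )
    $n = $Rewards.Length
    $advantages = [double[]]::new($n)
    $running = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for ($t = $n - 1; $t -ge 0; $t--) {
        $delta   = $Rewards[$t] + $Gamma * $Values[$t + 1] - $Values[$t]   # one-step TD error
        $running = $delta + $Gamma * $Lambda * $running
        $advantages[$t] = $running
    }
    return ,$advantages   # leading comma keeps the array from being unrolled
}

# Illustrative rollout: three steps with reward 1 and a flat value estimate of 0.5.
$rewards = @(1.0, 1.0, 1.0)
$values  = @(0.5, 0.5, 0.5, 0.0)
foreach ($lambda in 0.0, 0.95, 1.0) {
    $adv = Get-GaeAdvantages -Rewards $rewards -Values $values -Lambda $lambda
    Write-Host ("lambda = {0:F2}  advantages = {1}" -f $lambda, ($adv -join ', '))
}
```

&lt;p&gt;At lambda = 0 each advantage is just the one-step TD error. As lambda grows, later TD errors are blended in, trading lower bias for higher variance.&lt;/p&gt;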


&lt;h1&gt;
  
  
  CAMP 3 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have seen:&lt;/strong&gt; three RL algorithms, head-to-head benchmarking, multi-agent competition.&lt;br&gt;
&lt;strong&gt;You understand:&lt;/strong&gt; the difference between value-based, policy-gradient, and actor-critic methods.&lt;br&gt;
&lt;strong&gt;Next:&lt;/strong&gt; build something yourself.&lt;/p&gt;




&lt;h1&gt;
  
  
  CAMP 4 -- YOUR OWN FIRE
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Design your own environment and train your own agent
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: an agent learns something YOU designed
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 16 -- Understand what an environment needs
&lt;/h2&gt;

&lt;p&gt;Every VBAF environment needs three methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c"&gt;# start new episode, return initial state&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c"&gt;# apply action, return @{NextState; Reward; Done}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;GetState&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c"&gt;# return current state as double array&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is all. Any problem that fits this shape can be learned by&lt;br&gt;
any VBAF agent -- DQN, PPO, A3C or Q-learning.&lt;/p&gt;
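&lt;p&gt;To make the three-method shape concrete, here is a toy environment written as a plain PowerShell 5.1 class. It is a hypothetical sketch: ToyLineEnvironment does not inherit from any VBAF class, and whether a VBAF agent accepts it as-is is an assumption you would need to check against VBAF.RL.Environment.ps1.&lt;/p&gt;

```powershell
# Hypothetical toy environment: walk along a number line and try to reach 0.
class ToyLineEnvironment {
    [int] $Position
    [int] $StepCount
    [int] $MaxSteps = 20

    # Return the current state as a double array.
    [double[]] GetState() {
        return [double[]] @($this.Position)
    }

    # Start a new episode and return the initial state.
    [double[]] Reset() {
        $this.Position  = 3
        $this.StepCount = 0
        return $this.GetState()
    }

    # Apply an action (0 = left, 1 = right) and report what happened.
    [hashtable] Step([int] $action) {
        if ($action -eq 1) { $this.Position++ } else { $this.Position-- }
        $this.StepCount++
        $atCenter = ($this.Position -eq 0)
        $reward = -0.1 * [math]::Abs($this.Position)   # small penalty for distance
        if ($atCenter) { $reward = 10.0 }              # bonus for reaching the center
        $done = $atCenter -or ($this.StepCount -ge $this.MaxSteps)
        return @{ NextState = $this.GetState(); Reward = $reward; Done = $done }
    }
}

$toy   = [ToyLineEnvironment]::new()
$state = $toy.Reset()
do {
    $result = $toy.Step(0)    # always move left, toward 0
    Write-Host "State: $($result.NextState)  Reward: $($result.Reward)  Done: $($result.Done)"
} until ($result.Done)
```

&lt;p&gt;The episode ends either when the walker reaches 0 or when MaxSteps is hit, which mirrors how CartPole stops at a step limit.&lt;/p&gt;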


&lt;h2&gt;
  
  
  Step 17 -- Study a simple example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# RandomWalk is the simplest possible environment&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RandomWalk"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PrintInfo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Run one episode manually&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Reset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Start state: &lt;/span&gt;&lt;span class="nv"&gt;$state&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="kr"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-lt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$step&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Get-Random&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Minimum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Maximum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;2&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c"&gt;# 0=left, 1=right&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;Write-Host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Action: &lt;/span&gt;&lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="s2"&gt;  State: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NextState&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;  Reward: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Reward&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;  Done: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Done&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="kr"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;break&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What it means:&lt;br&gt;
You are manually controlling the agent -- choosing random actions.&lt;br&gt;
Watch how the state changes and reward is assigned.&lt;br&gt;
A trained agent would learn to always move toward 0 (center).&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 18 -- Run the custom agent example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;examples\06-Custom-Agent&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\Run-Example-06.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This example shows how to build your own environment from scratch&lt;br&gt;
and train a DQN agent on it. Read the code carefully.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 19 -- Modify something
&lt;/h2&gt;

&lt;p&gt;Pick ONE thing to change and observe the effect:&lt;/p&gt;

&lt;p&gt;Option A -- Change the reward function in RandomWalk:&lt;br&gt;
  Currently: +10 for reaching center, else -(distance * 0.1)&lt;br&gt;
  Try: +1 for reaching center, else 0 (no distance penalty)&lt;br&gt;
  Question: does the agent still learn? Is it faster or slower?&lt;/p&gt;

&lt;p&gt;Option B -- Change DQN hyperparameters:&lt;br&gt;
  Currently: Gamma=0.95, LearningRate=0.001, BatchSize=32&lt;br&gt;
  Try: Gamma=0.50 (agent ignores future rewards)&lt;br&gt;
  Question: does short-sighted learning work for CartPole?&lt;/p&gt;

&lt;p&gt;Option C -- Change the architecture:&lt;br&gt;
  Currently: [4, 64, 64, 2]&lt;br&gt;
  Try: [4, 8, 2] (much smaller network)&lt;br&gt;
  Question: can a tiny network still solve CartPole?&lt;/p&gt;
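&lt;p&gt;For Option B it helps to see how strongly Gamma discounts the future before you run anything. The sketch below only does arithmetic: a reward k steps ahead is weighted by Gamma raised to the power k. Get-DiscountWeight is a hypothetical helper, not a VBAF function.&lt;/p&gt;

```powershell
# Weight given to a reward that arrives a number of steps in the future: gamma^k.
function Get-DiscountWeight {
    param([double] $Gamma, [int] $StepsAhead)
    return [math]::Pow($Gamma, $StepsAhead)
}

foreach ($gamma in 0.95, 0.50) {
    foreach ($k in 1, 5, 10, 20) {
        $weight = Get-DiscountWeight -Gamma $gamma -StepsAhead $k
        Write-Host ("gamma = {0:F2}  steps ahead = {1,2}  weight = {2:F6}" -f $gamma, $k, $weight)
    }
}
```

&lt;p&gt;With Gamma=0.95 a reward ten steps away still counts for roughly 60% of its value. With Gamma=0.50 it counts for less than 0.1%, so the agent is effectively blind beyond a few steps.&lt;/p&gt;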


&lt;h2&gt;
  
  
  What to try before moving on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run your modified version 3 times. Is the result consistent?&lt;/li&gt;
&lt;li&gt;Write down your hypothesis BEFORE running -- then check it.&lt;/li&gt;
&lt;li&gt;Open VBAF.RL.Environment.ps1 and read the GridWorld class.
Could you adapt it for a different grid-based problem?&lt;/li&gt;
&lt;/ol&gt;


&lt;h1&gt;
  
  
  CAMP 4 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have done:&lt;/strong&gt; run manual episodes, modified a reward function, changed hyperparameters.&lt;br&gt;
&lt;strong&gt;You understand:&lt;/strong&gt; the environment interface, reward shaping, hyperparameter sensitivity.&lt;br&gt;
&lt;strong&gt;Next:&lt;/strong&gt; see where all this leads at enterprise scale.&lt;/p&gt;




&lt;h1&gt;
  
  
  CAMP 5 -- THE SUMMIT
&lt;/h1&gt;
&lt;h2&gt;
  
  
  From foundation to enterprise -- trace the learning ladder
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Goal: understand how Phase 1-9 becomes Phase 10-27
&lt;/h3&gt;


&lt;h2&gt;
  
  
  Step 20 -- The learning ladder
&lt;/h2&gt;

&lt;p&gt;VBAF has two layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foundation (Phases 1-9):&lt;/strong&gt;&lt;br&gt;
Neural networks, Q-learning, DQN, PPO, A3C, multi-agent.&lt;br&gt;
You have learned all of this in Camps 1-4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise (Phases 10-27):&lt;/strong&gt;&lt;br&gt;
14 production-grade automation agents built on the SAME foundation.&lt;br&gt;
These are not teaching examples -- they solve real IT problems.&lt;/p&gt;

&lt;p&gt;The ladder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1-9:  learn HOW agents learn
Phase 10-27: see WHAT agents can do when they learn well
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 21 -- Run one enterprise agent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;USERPROFILE&lt;/span&gt;&lt;span class="s2"&gt;\OneDrive\WindowsPowerShell"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\VBAF.LoadAll.ps1&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Self-Healing Infrastructure -- Phase 14&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFSelfHealingTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SimMode&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you will see:&lt;br&gt;
A DQN agent learning to detect and fix system problems.&lt;br&gt;
State: CPU load, memory, disk, error rate, response time.&lt;br&gt;
Actions: Observe, Adjust, Restart, Rebuild.&lt;/p&gt;

&lt;p&gt;What it means:&lt;br&gt;
This is the same DQN you trained on CartPole in Camp 2.&lt;br&gt;
Same algorithm. Same Bellman equation. Same experience replay.&lt;br&gt;
Different environment. Different reward function.&lt;br&gt;
The learning mechanism is identical.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 22 -- Trace it back to the foundation
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Open the enterprise file and find where NeuralNetwork is used&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Select-String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NeuralNetwork"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".\VBAF.Enterprise.SelfHealing.ps1"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Compare with DQN&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Select-String&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NeuralNetwork"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".\VBAF.RL.DQN.ps1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
Both files reference NeuralNetwork.&lt;br&gt;
The enterprise agent uses the SAME class you built in Camp 1.&lt;/p&gt;

&lt;p&gt;Trace the chain:&lt;br&gt;
  VBAF.Core.AllClasses.ps1    -- defines NeuralNetwork&lt;br&gt;
  VBAF.RL.DQN.ps1             -- uses NeuralNetwork for Q-learning&lt;br&gt;
  VBAF.Enterprise.SelfHealing.ps1 -- uses DQN for IT automation&lt;/p&gt;

&lt;p&gt;Three files. One chain. Same foundation.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 23 -- Run the AutoPilot (the crown jewel)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFAutoPilotTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SimMode&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What you will see:&lt;br&gt;
AutoPilot orchestrates ALL 13 enterprise pillars simultaneously.&lt;br&gt;
It is an agent that coordinates other agents.&lt;br&gt;
Meta-learning -- an agent that decides which agents to activate.&lt;/p&gt;

&lt;p&gt;This is Phase 27 -- the furthest point VBAF reaches.&lt;br&gt;
But it is built entirely from the same concepts you learned in Camp 1.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 24 -- Read one enterprise file properly
&lt;/h2&gt;

&lt;p&gt;Choose any enterprise file that interests you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VBAF.Enterprise.AnomalyDetector.ps1 -- spots unusual patterns&lt;/li&gt;
&lt;li&gt;VBAF.Enterprise.EnergyOptimizer.ps1 -- reduces power consumption&lt;/li&gt;
&lt;li&gt;VBAF.Enterprise.PatchIntelligence.ps1 -- risk-aware patch scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open it and find:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the STATE? (what does the agent observe?)&lt;/li&gt;
&lt;li&gt;What are the ACTIONS? (what can the agent do?)&lt;/li&gt;
&lt;li&gt;What is the REWARD? (what is it optimising for?)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Answer those three questions for any environment and you understand&lt;br&gt;
what the agent will learn to do.&lt;/p&gt;


&lt;h1&gt;
  
  
  CAMP 5 COMPLETE
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;You have seen:&lt;/strong&gt; the full learning ladder from XOR to enterprise AutoPilot.&lt;br&gt;
&lt;strong&gt;You understand:&lt;/strong&gt; how foundation concepts scale to production systems.&lt;br&gt;
&lt;strong&gt;You can:&lt;/strong&gt; read any VBAF file and understand what the agent is learning.&lt;/p&gt;




&lt;h1&gt;
  
  
  THE VIEW FROM THE TOP
&lt;/h1&gt;

&lt;p&gt;You started at Camp 0 with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Install-Module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;VBAF&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are now at Camp 5 with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Neural networks trained from scratch&lt;/li&gt;
&lt;li&gt;Three RL algorithms benchmarked&lt;/li&gt;
&lt;li&gt;Multi-agent competition observed&lt;/li&gt;
&lt;li&gt;Your own hyperparameters tested&lt;/li&gt;
&lt;li&gt;Enterprise agents running&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you can do now:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Train any algorithm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-DQNTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-PPOTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-A3CTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# On any environment&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CartPole"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GridWorld"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-GridSize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RandomWalk"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Benchmark anything&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"My Agent"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Run any enterprise pillar&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFSelfHealingTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SimMode&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFAutoPilotTraining&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SimMode&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where to go from here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the theory: docs/Theory.md&lt;/li&gt;
&lt;li&gt;Build your own environment: examples/06-Custom-Agent/&lt;/li&gt;
&lt;li&gt;Study the papers referenced in each file&lt;/li&gt;
&lt;li&gt;Contribute an example or a new environment&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  QUICK REFERENCE
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The 5 most important commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Load everything (run once per session)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\VBAF.LoadAll.ps1&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 2. Run the learning path in order&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\examples\01-XOR-Network\Run-Example-01.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\examples\02-Castle-Learning\Run-Example-02.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\examples\03-Market-Simulation\Run-Example-03.ps1&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 3. Train an agent&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-DQNTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-PrintEvery&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 4. Benchmark it&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CartPole"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"My DQN"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 5. See what it learned&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PrintStats&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetQValues&lt;/span&gt;&lt;span class="p"&gt;(@(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  If something breaks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Unable to find type"     -&amp;gt; run . .\VBAF.LoadAll.ps1 first
"Cannot find path"        -&amp;gt; check you are in the right folder
"accuracy below 75%"      -&amp;gt; run XOR again (random init sometimes fails)
"reward not increasing"   -&amp;gt; train for more episodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;VBAF -- Visual AI &amp;amp; Reinforcement Learning Framework&lt;/em&gt;&lt;br&gt;
&lt;em&gt;github.com/JupyterPS/VBAF&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"The best way to understand AI is to build it yourself -- line by line."&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>VBAF v5.0 -- Now a Full Educational AI Framework (DQN, PPO, A3C in PowerShell)</title>
      <dc:creator>Henning Reckey</dc:creator>
      <pubDate>Mon, 22 Jun 2026 17:10:20 +0000</pubDate>
      <link>https://dev.to/jupyterps/title-vbaf-v50-now-a-full-educational-ai-framework-dqn-ppo-a3c-in-powershell-58kd</link>
      <guid>https://dev.to/jupyterps/title-vbaf-v50-now-a-full-educational-ai-framework-dqn-ppo-a3c-in-powershell-58kd</guid>
      <description>&lt;p&gt;&lt;em&gt;Follow-up to: "I Implemented Deep Q-Networks in Pure PowerShell 5.1"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I wrote the original article, VBAF was primarily a commercial automation engine. A lot has changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changed:&lt;/strong&gt;&lt;br&gt;
VBAF has been repositioned as a purely educational framework. The commercial layer is gone. The focus is now on making AI and reinforcement learning concepts accessible to anyone with a Windows PC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is new in v5.0:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full educational comments across 16 core files. They explain not just what the code does but WHY: the Bellman equation with every symbol explained, the intuition behind the PPO clip trick, the GAE lambda trade-off, and n-step returns with bootstrapping.&lt;/p&gt;
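
&lt;p&gt;To give a feel for two of those ideas, here is a minimal standalone sketch of the Bellman target and the PPO clipped objective for a single sample. It does not use VBAF's classes; every variable name and number below is an illustrative assumption, and the real implementation in the module may be structured differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Bellman (TD) target for one transition: r + gamma * max Q(s', a')
$reward   = 1.0
$gamma    = 0.99                 # discount factor
$nextQ    = @(0.5, 0.8)          # hypothetical Q-values for the next state
$done     = $false               # terminal states have no future value
$maxNextQ = ($nextQ | Measure-Object -Maximum).Maximum
$target   = if ($done) { $reward } else { $reward + $gamma * $maxNextQ }

# PPO clipped surrogate for one sample:
# min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
$ratio     = 1.4                 # new policy probability / old policy probability
$advantage = 2.0                 # how much better the action was than expected
$epsilon   = 0.2                 # clip range
$clipped   = [math]::Max(1 - $epsilon, [math]::Min(1 + $epsilon, $ratio))
$objective = [math]::Min($ratio * $advantage, $clipped * $advantage)

"TD target: $target"
"PPO objective: $objective"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these numbers the ratio of 1.4 is clipped to 1.2, so the objective stops rewarding the policy for moving further away from the old one.&lt;/p&gt;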

&lt;p&gt;Three complete RL algorithms implemented and benchmarkable head-to-head:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$dqn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-DQNTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$ppo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-PPOTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$a3c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-A3CTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FastMode&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-VBAFEnvironment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CartPole"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$dqn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DQN"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$ppo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PPO"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFBenchmark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$a3c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A3C"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multi-agent market simulation with 4 competing companies. Emergent behaviours -- price wars, tacit collusion, innovation races -- appear without being programmed. The Herfindahl index is computed at the end to measure market concentration.&lt;/p&gt;
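
&lt;p&gt;For readers who have not met the Herfindahl index: it is the sum of the squared market shares. The sketch below computes it for four hypothetical companies. The revenue figures are made up for illustration and are not VBAF output, and the simulation's own calculation may differ in scaling (some sources use percentages, giving values up to 10,000).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Hypothetical revenue per company (illustrative numbers only)
$revenue = @{ A = 40.0; B = 30.0; C = 20.0; D = 10.0 }

$total = ($revenue.Values | Measure-Object -Sum).Sum
$hhi   = 0.0
foreach ($r in $revenue.Values) {
    $share = $r / $total         # market share as a fraction between 0 and 1
    $hhi  += $share * $share     # add the squared share
}

# About 0.3 here; four equal firms would give 0.25, a monopoly 1.0
[math]::Round($hhi, 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;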

&lt;p&gt;Teaching materials in docs/teaching/: a 4-week course outline, lab exercises, and exam questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why PowerShell?&lt;/strong&gt;&lt;br&gt;
Because the code is readable. You can open any .ps1 file and see exactly what the algorithm is doing. No black boxes. No library abstractions. The code IS the textbook.&lt;/p&gt;

&lt;p&gt;Install-Module VBAF | GitHub: &lt;a href="https://github.com/JupyterPS/VBAF" rel="noopener noreferrer"&gt;https://github.com/JupyterPS/VBAF&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"The best way to understand AI is to build it yourself -- line by line."&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>UPDATE: VBAF v4.0.0 is complete</title>
      <dc:creator>Henning Reckey</dc:creator>
      <pubDate>Sun, 15 Mar 2026 18:38:44 +0000</pubDate>
      <link>https://dev.to/jupyterps/update-vbaf-v400-is-complete-28l</link>
      <guid>https://dev.to/jupyterps/update-vbaf-v400-is-complete-28l</guid>
      <description>&lt;p&gt;VBAF v4.0.0 — From Phase 9 to Phase 27: &lt;br&gt;
       Building an Autonomous Enterprise AI Engine in PowerShell 5.1&lt;/p&gt;

&lt;p&gt;One sentence: I started with 5 enterprise pillars and &lt;br&gt;
ended up with 14 DQN agents, 31,000 lines of PS 5.1, &lt;br&gt;
and a master AutoPilot agent orchestrating them all.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>UPDATE: VBAF v4.0.0 is complete!</title>
      <dc:creator>Henning Reckey</dc:creator>
      <pubDate>Sun, 15 Mar 2026 15:16:49 +0000</pubDate>
      <link>https://dev.to/jupyterps/update-vbaf-v400-is-comupdate-vbaf-v400-is-27pj</link>
      <guid>https://dev.to/jupyterps/update-vbaf-v400-is-comupdate-vbaf-v400-is-27pj</guid>
      <description>&lt;p&gt;VBAF v4.0.0 — From Phase 9 to Phase 27: &lt;br&gt;
       Building an Autonomous Enterprise AI Engine in PowerShell 5.1&lt;/p&gt;

&lt;p&gt;One sentence: I started with 5 enterprise pillars and &lt;br&gt;
ended up with 14 DQN agents, 31,000 lines of PS 5.1, &lt;br&gt;
and a master AutoPilot agent orchestrating them all.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>I implemented Deep Q-Networks in pure PowerShell 5.1 — and connected them to real Windows enterprise data</title>
      <dc:creator>Henning Reckey</dc:creator>
      <pubDate>Mon, 09 Mar 2026 15:09:21 +0000</pubDate>
      <link>https://dev.to/jupyterps/i-implemented-deep-q-networks-in-pure-powershell-51-and-connected-them-to-real-windows-5hak</link>
      <guid>https://dev.to/jupyterps/i-implemented-deep-q-networks-in-pure-powershell-51-and-connected-them-to-real-windows-5hak</guid>
      <description>&lt;p&gt;Yes, really. No Python. No TensorFlow. No cloud.&lt;/p&gt;

&lt;p&gt;VBAF is a full reinforcement learning framework written in PowerShell 5.1 classes. It includes DQN, PPO, A3C, Q-Learning, CNNs, RNNs, AutoML, MLOps — all from scratch.&lt;/p&gt;

&lt;p&gt;The latest release (v3.0.0) adds Enterprise Automation agents that read real Windows data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Agent watching live CPU and learning to optimize&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFResourceOptimizerTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Agent reading Event Logs and learning alert routing&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFAlertRouterTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Agent learning job scheduling from Task Scheduler patterns&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-VBAFJobSchedulerTraining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results: +292% improvement on scheduling, +230% on alert routing vs random baselines.&lt;/p&gt;

&lt;p&gt;The PS 5.1 constraints made this genuinely hard (no operator overloading on typed arrays, no closures, single-threaded class methods) — but that made solving it more interesting.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Install-Module VBAF&lt;/code&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/JupyterPS/VBAF" rel="noopener noreferrer"&gt;https://github.com/JupyterPS/VBAF&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to write a deep-dive on the DQN implementation in PS 5.1 if there’s interest!&lt;/p&gt;

</description>
      <category>automation</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
