<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hassan Naeem </title>
    <description>The latest articles on DEV Community by Hassan Naeem  (@hassannaeem).</description>
    <link>https://dev.to/hassannaeem</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3889903%2F85153afb-ad7c-42a8-a1e9-595b434d0fe3.jpeg</url>
      <title>DEV Community: Hassan Naeem </title>
      <link>https://dev.to/hassannaeem</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hassannaeem"/>
    <language>en</language>
    <item>
      <title>I built a browser agent that plays a game by looking at pixels</title>
      <dc:creator>Hassan Naeem </dc:creator>
      <pubDate>Tue, 21 Apr 2026 15:53:03 +0000</pubDate>
      <link>https://dev.to/hassannaeem/i-built-a-browser-agent-that-plays-a-game-by-looking-at-pixels-3l86</link>
      <guid>https://dev.to/hassannaeem/i-built-a-browser-agent-that-plays-a-game-by-looking-at-pixels-3l86</guid>
      <description>&lt;p&gt;A lot of "AI agents" are just an LLM with a REST API stitched to it. I wanted to build something where the agent can't cheat, no DOM shortcuts, no hidden game state. Just a browser, a screen, and a mouse.&lt;/p&gt;

&lt;p&gt;So I built a small agent that plays &lt;strong&gt;Block Champ&lt;/strong&gt; on CrazyGames.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Hassan-Naeem-code/Browser-Operating-Agent" rel="noopener noreferrer"&gt;https://github.com/Hassan-Naeem-code/Browser-Operating-Agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what was interesting about it, and the techniques all transfer to any canvas-based web app (whiteboards, in-browser IDEs, design tools, games).&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with "browser agents."
&lt;/h2&gt;

&lt;p&gt;Most browser automation tutorials assume the app cooperates: meaningful HTML, readable labels, queryable buttons. Games don't. The whole board lives inside a &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt; element, which from the DOM's point of view is just a black box.&lt;/p&gt;

&lt;p&gt;On top of that, this game lives inside &lt;strong&gt;nested iframes&lt;/strong&gt;, a CrazyGames wrapper frame, and inside it, the actual game frame. So the agent needs to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Land on the page&lt;/li&gt;
&lt;li&gt;Find the outer iframe&lt;/li&gt;
&lt;li&gt;Switch context into it&lt;/li&gt;
&lt;li&gt;Find the &lt;em&gt;inner&lt;/em&gt; iframe&lt;/li&gt;
&lt;li&gt;Switch context again&lt;/li&gt;
&lt;li&gt;Find the canvas&lt;/li&gt;
&lt;li&gt;Figure out its absolute position on the screen&lt;/li&gt;
&lt;li&gt;Drag the mouse in global coordinates that map back to the canvas&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every step is a place where a naive script breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — Traversing nested iframes
&lt;/h2&gt;

&lt;p&gt;Playwright's &lt;code&gt;content_frame()&lt;/code&gt; is the key. You query for the iframe element as usual, then ask Playwright to give you the frame's context so you can query inside it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iframe#game-iframe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;iframe_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iframe#game-iframe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;iframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;iframe_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# And again, one level deeper
&lt;/span&gt;&lt;span class="n"&gt;nested_iframe_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;iframe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iframe[src*=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;block-champ/2/index.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nested_game_frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nested_iframe_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content_frame&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small gotcha: you have to &lt;code&gt;wait_for_timeout&lt;/code&gt; between switches. IFrames load asynchronously, and if you query too early, you get &lt;code&gt;None&lt;/code&gt; back with no obvious error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 — Canvas positioning
&lt;/h2&gt;

&lt;p&gt;Once you've got the canvas, &lt;code&gt;bounding_box()&lt;/code&gt; gives you its position &lt;em&gt;inside&lt;/em&gt; its frame. But the mouse API operates on the &lt;em&gt;page's&lt;/em&gt; global coordinates. So you have to add them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;canvas_box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bounding_box&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;iframe_box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;nested_iframe_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bounding_box&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;global_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iframe_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;canvas_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;canvas_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;global_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iframe_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;canvas_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;canvas_box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the thing most people miss: &lt;em&gt;every iframe has its own coordinate system&lt;/em&gt;. Nested iframes stack those offsets. Forget to add them, and your clicks land on nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 — Humanlike mouse movement
&lt;/h2&gt;

&lt;p&gt;Block Champ only registers drugs that look like real drugs. If you teleport the mouse from A to B, the game ignores it. You need intermediate moves:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;intermediate_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;source_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;intermediate_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;source_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intermediate_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intermediate_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Forty steps with a 10ms sleep between each gives you a ~400ms drag — smooth enough to register. This same technique works for bot-detection bypass in legitimate testing contexts, drag-and-drop in design tools, slider inputs, anywhere mouse dynamics matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — Reading a canvas with pixels
&lt;/h2&gt;

&lt;p&gt;Here's the fun part. The canvas is opaque to the DOM, but Playwright can screenshot &lt;em&gt;just the canvas element&lt;/em&gt; and hand you raw bytes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;img_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_bytes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, it's just PIL. For Block Champ, I divided the canvas into a 10×10 grid, sampled one pixel at the center of each cell, and classified it as empty or filled based on brightness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;grid_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;cell_w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;grid_size&lt;/span&gt;
&lt;span class="n"&gt;cell_h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;grid_size&lt;/span&gt;
&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;220&lt;/span&gt;  &lt;span class="c1"&gt;# empty cells are near-white in this game
&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;gx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;gy&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grid_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;px&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;cell_w&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cell_w&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gy&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;cell_h&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cell_h&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;empty_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;filled_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crude, but it works. And it's the pattern you'd use to let an LLM "see" any canvas-based UI: screenshot, grid, classify, pass the grid as text to the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's still missing
&lt;/h2&gt;

&lt;p&gt;The current version makes &lt;em&gt;random&lt;/em&gt; moves. The structure is there for smarter play; &lt;code&gt;choose_best_move(board_state)&lt;/code&gt; is a placeholder waiting for real logic.&lt;/p&gt;

&lt;p&gt;Next steps I'm exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send the board state to an LLM as structured text and let the model pick moves&lt;/li&gt;
&lt;li&gt;Use a small local vision model to classify cell shapes instead of just "empty/filled."&lt;/li&gt;
&lt;li&gt;Run it headless on a schedule and track scores over time&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters beyond a game
&lt;/h2&gt;

&lt;p&gt;The exact same pattern — &lt;strong&gt;traverse → locate → screenshot → classify → act&lt;/strong&gt; ,is how you'd build agents for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figma/Miro automation&lt;/li&gt;
&lt;li&gt;In-browser CAD or music tools&lt;/li&gt;
&lt;li&gt;Any legacy internal tool that renders to canvas&lt;/li&gt;
&lt;li&gt;Visual testing of rich web apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The DOM is no longer the only API for the web. If you can take a screenshot and send a mouse event, you can build an agent.&lt;/p&gt;

&lt;p&gt;Repo with full code: &lt;a href="https://github.com/Hassan-Naeem-code/Browser-Operating-Agent" rel="noopener noreferrer"&gt;https://github.com/Hassan-Naeem-code/Browser-Operating-Agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building agents or doing browser automation, I'd love to hear what you're working on. Drop a comment or find me on GitHub.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>playwright</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
