<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhana Abhiraj</title>
    <description>The latest articles on DEV Community by Dhana Abhiraj (@dhanaabhirajk).</description>
    <link>https://dev.to/dhanaabhirajk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3438582%2F52b7272d-8226-494f-a325-9d0a5ff4fac2.png</url>
      <title>DEV Community: Dhana Abhiraj</title>
      <link>https://dev.to/dhanaabhirajk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhanaabhirajk"/>
    <language>en</language>
    <item>
      <title>ARC AGI 3 Preview Competition — My Journey</title>
      <dc:creator>Dhana Abhiraj</dc:creator>
      <pubDate>Sat, 16 Aug 2025 08:01:02 +0000</pubDate>
      <link>https://dev.to/dhanaabhirajk/arc-agi-3-preview-competition-my-journey-3nmb</link>
      <guid>https://dev.to/dhanaabhirajk/arc-agi-3-preview-competition-my-journey-3nmb</guid>
      <description>&lt;p&gt;For the last 2 weeks, I took part in the &lt;strong&gt;ARC AGI 3 preview competition&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I tried out different techniques to solve the problem. The challenge is to build an &lt;strong&gt;agent that wins an unknown game&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most of the things that I tried didn’t work well.&lt;/p&gt;

&lt;p&gt;My solution uses a &lt;strong&gt;text LLM, an image LLM, and a video LLM&lt;/strong&gt;. Still, it doesn’t perform well enough to win the full game.&lt;/p&gt;

&lt;p&gt;Throughout development, I used &lt;strong&gt;Gemini&lt;/strong&gt; and ran into &lt;strong&gt;rate limit errors&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Week 1
&lt;/h2&gt;

&lt;p&gt;After a few manual experiments and runs with a random agent, I created the initial flow below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Play a random trial game, then reset the game. Only the frames in which an action had an effect are used for analysis.&lt;/li&gt;
&lt;li&gt;Pass the gameplay video to the LLM and generate &lt;strong&gt;10 hypotheses&lt;/strong&gt; from it (Retrieval). [Analysis]&lt;/li&gt;
&lt;li&gt;Create &lt;strong&gt;hints&lt;/strong&gt; from the gameplay to help the goal achiever reach the goal. [Analysis]&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;1 multi-stage goal&lt;/strong&gt; out of all hypotheses. [Analysis]&lt;/li&gt;
&lt;li&gt;Pass the goal to the &lt;strong&gt;goal achiever loop&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Image LLM&lt;/strong&gt; with the last game frame to predict the next action.&lt;/li&gt;
&lt;li&gt;Once the action is taken, use the last 2 frames to check whether the goal has been achieved. If it has, steps 1 to 6 repeat.&lt;/li&gt;
&lt;/ol&gt;
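
&lt;p&gt;The steps above can be sketched roughly as one loop. This is only an illustration, not the code from my repository: &lt;code&gt;CounterGame&lt;/code&gt;, &lt;code&gt;run_iteration&lt;/code&gt;, and the &lt;code&gt;analyze&lt;/code&gt;, &lt;code&gt;pick_action&lt;/code&gt;, and &lt;code&gt;goal_reached&lt;/code&gt; callables are hypothetical names, and the callables stand in for the LLM calls.&lt;/p&gt;

```python
import random


class CounterGame:
    """Toy stand-in for a game (an assumption): the frame is just an integer."""

    actions = ["inc", "dec", "noop"]

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += {"inc": 1, "dec": -1, "noop": 0}[action]
        return self.pos


def run_iteration(env, analyze, pick_action, goal_reached,
                  explore_steps=10, max_steps=20):
    """Run one pass of the flow and report whether the goal was reached."""
    # Step 1: random trial gameplay, keeping only frames that changed.
    last = env.reset()
    frames = [last]
    for _ in range(explore_steps):
        frame = env.step(random.choice(env.actions))
        if frame != last:
            frames.append(frame)
        last = frame
    previous = env.reset()

    # Steps 2 to 4: hypotheses, hints, and one selected goal.
    goal, hints = analyze(frames)

    # Steps 5 to 7: predict an action from the last frame, then
    # compare the last 2 frames to check the goal.
    for _ in range(max_steps):
        current = env.step(pick_action(previous, goal, hints))
        if goal_reached(previous, current, goal):
            return True
        previous = current
    return False
```

&lt;p&gt;In the real agent each callable would wrap a model call, and a &lt;code&gt;True&lt;/code&gt; result would trigger the next iteration.&lt;/p&gt;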

&lt;p&gt;👉 I tried to solve it using a &lt;strong&gt;top-down approach&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I focused on steps &lt;strong&gt;6 and 7&lt;/strong&gt; with fixed goals and hints. With those in place, the agent performed as expected.&lt;/p&gt;

&lt;p&gt;But until halfway to the deadline, I only optimized &lt;strong&gt;1 iteration of the loop&lt;/strong&gt;, manually evaluating responses on just 1 game (the &lt;strong&gt;LOCKSMITH game&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;Then I spent some time &lt;strong&gt;re-architecting the flow&lt;/strong&gt; so that it works iteratively and improves from one iteration to the next.&lt;/p&gt;




&lt;h2&gt;
  
  
  Week 2 — First Half
&lt;/h2&gt;

&lt;p&gt;I re-architected the flow with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random gameplay only at the start; each later iteration reuses the gameplay from the previous goal attempt.&lt;/li&gt;
&lt;li&gt;Modified the &lt;strong&gt;goal generation&lt;/strong&gt; with a maximum action limit to avoid wrong goals.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmwd9nk7syu2bkferr3q.png" alt="Architecture 1" width="800" height="480"&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Week 2 — Second Half
&lt;/h2&gt;

&lt;p&gt;Later, I realized I needed &lt;strong&gt;evaluation&lt;/strong&gt; to speed up development, so I used an &lt;strong&gt;LLM as a judge&lt;/strong&gt; to evaluate whether steps 1–4 correctly generate the fixed goal and hints.&lt;/p&gt;

&lt;p&gt;I ran evaluations for all &lt;strong&gt;3 public games&lt;/strong&gt; and found a few issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random play is not good at exploring click-based games.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;goal achievement checker&lt;/strong&gt; step is not working well. (Most LLMs couldn’t identify the difference between two image frames well.)&lt;/li&gt;
&lt;li&gt;Goal generation is weak.&lt;/li&gt;
&lt;li&gt;Hypothesis generation is weak.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I tried to improve:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Introduced &lt;strong&gt;probability-based random play&lt;/strong&gt; to handle both clickable and non-clickable games.&lt;/li&gt;
&lt;li&gt;Added &lt;strong&gt;color change descriptions&lt;/strong&gt; in the grid for both the game analysis step and the goal checker step.&lt;/li&gt;
&lt;li&gt;Made the &lt;strong&gt;goal generation shorter&lt;/strong&gt;, with fewer moves.&lt;/li&gt;
&lt;li&gt;Improved &lt;strong&gt;frame analysis&lt;/strong&gt; (not all frames were being analyzed earlier).&lt;/li&gt;
&lt;/ul&gt;
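
&lt;p&gt;As an illustration of the color change descriptions, here is a minimal sketch. It assumes each frame is a 2D grid of integer color codes, and the function name &lt;code&gt;describe_color_changes&lt;/code&gt; and the wording of each line are hypothetical, not taken from my repository. The idea is to hand the LLM a text list of changed cells instead of asking it to spot the difference between two images.&lt;/p&gt;

```python
def describe_color_changes(before, after):
    """Return one text line per cell whose color differs between two grids.

    Assumes both grids are lists of rows with the same shape, where each
    cell holds an integer color code.
    """
    changes = []
    for y, (row_before, row_after) in enumerate(zip(before, after)):
        for x, (old, new) in enumerate(zip(row_before, row_after)):
            if old != new:
                changes.append(
                    f"cell (row {y}, col {x}): color {old} to {new}"
                )
    return changes
```

&lt;p&gt;The returned lines can be joined and appended to the prompt for both the game analysis step and the goal checker step.&lt;/p&gt;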




&lt;h2&gt;
  
  
  Last 3 Days
&lt;/h2&gt;

&lt;h3&gt;
  
  
  New Issues Detected:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Some goals end in just 1–2 steps, so &lt;strong&gt;very few frames&lt;/strong&gt; exist for analysis.&lt;/li&gt;
&lt;li&gt;Sometimes LLM calls fail even after retries, with &lt;strong&gt;empty responses and errors&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solutions:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Passed only the frames of the &lt;strong&gt;current level play&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If something fails, generate a &lt;strong&gt;random action&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Converted &lt;strong&gt;actions and gameplay into a human-readable event chain&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
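
&lt;p&gt;The random-action fallback can be sketched like this. It is a simplified illustration under assumptions: &lt;code&gt;choose_action&lt;/code&gt; and &lt;code&gt;ask_llm&lt;/code&gt; are hypothetical names, &lt;code&gt;ask_llm&lt;/code&gt; stands in for the real model call, and the retry count is arbitrary.&lt;/p&gt;

```python
import random


def choose_action(ask_llm, frame, actions, retries=2):
    """Ask the model for an action; fall back to a random one on failure.

    ask_llm is a hypothetical callable that takes a frame and returns an
    action name. It may raise or return an empty or unknown reply.
    """
    for _ in range(retries + 1):
        try:
            reply = ask_llm(frame)
        except Exception:
            continue  # treat an API error like an empty reply and retry
        if reply in actions:
            return reply
    # Every attempt failed, so keep the game moving with a random action.
    return random.choice(actions)
```

&lt;p&gt;This keeps the agent from stalling on a failed call, at the cost of occasionally taking an unplanned move.&lt;/p&gt;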

&lt;h3&gt;
  
  
  Remaining Issues:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;full flow&lt;/strong&gt; still needs optimization, even after many iterations.&lt;/li&gt;
&lt;li&gt;Prompts and inputs passed to the LLM need improvements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model choice matters a lot&lt;/strong&gt;: most steps used &lt;code&gt;gemini-2.5-flash&lt;/code&gt;, but in some cases &lt;code&gt;gemini-2.5-pro&lt;/code&gt; worked better than flash.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the competition deadline arrived.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|                                                                                 |
|  Random Play --------&amp;gt; Game Analysis --------&amp;gt; Goal Achiever                    | ---&amp;gt; Level Cleared
|   (explore)             (perceive &amp;amp; set goal)      (navigate)                   |        |
|       ^                                                |                        |        |
|       |                                                |                        |        |
|       --------------------------------------------------                        |        |
-----------------------------------------------------------------------------------        |
     ^                                                                                     |
     |                                                                                     |
     ---------------------------- Hints (memory) -------------------------------------------


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What I Could Have Done Better
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Started &lt;strong&gt;evaluation and observability&lt;/strong&gt; earlier.&lt;/li&gt;
&lt;li&gt;Chose a &lt;strong&gt;dedicated model&lt;/strong&gt; for inference.&lt;/li&gt;
&lt;li&gt;Developed the &lt;strong&gt;event chain&lt;/strong&gt; in a better way.&lt;/li&gt;
&lt;li&gt;Considered the &lt;strong&gt;game title impact&lt;/strong&gt; earlier (it has a huge positive/negative effect on all steps).&lt;/li&gt;
&lt;li&gt;Turned parts of the workflow into &lt;strong&gt;autonomous tools&lt;/strong&gt; instead of keeping it as a strict, fixed pipeline.&lt;/li&gt;
&lt;li&gt;Improved the &lt;strong&gt;memory mechanism across levels&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Went Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Learned a lot about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multimodal models&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Building agents&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workflow design&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;p&gt;The full project code is available on GitHub: &lt;a href="https://github.com/dhanaabhirajk/ARC-AGI-3-Agents" rel="noopener noreferrer"&gt;https://github.com/dhanaabhirajk/ARC-AGI-3-Agents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear your thoughts or ideas, and I’m always open to discussions and collaborations. Reach me on &lt;a href="https://www.linkedin.com/in/dhanaabhirajk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
