<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carson Wu</title>
    <description>The latest articles on DEV Community by Carson Wu (@carsonwu).</description>
    <link>https://dev.to/carsonwu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1027101%2F6ef0d1ac-9390-4434-b3d3-53aa606e5cf9.jpeg</url>
      <title>DEV Community: Carson Wu</title>
      <link>https://dev.to/carsonwu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carsonwu"/>
    <language>en</language>
    <item>
      <title>Beyond Clicking and Shell Commands: API-Native Computer Control</title>
      <dc:creator>Carson Wu</dc:creator>
      <pubDate>Mon, 22 Jun 2026 08:09:05 +0000</pubDate>
      <link>https://dev.to/carsonwu/beyond-clicking-and-shell-commands-api-native-computer-control-5g82</link>
      <guid>https://dev.to/carsonwu/beyond-clicking-and-shell-commands-api-native-computer-control-5g82</guid>
      <description>&lt;p&gt;An AI agent can draft an email, summarize a repository, or propose edits. The more difficult question is what happens next: how should it operate an application?&lt;/p&gt;

&lt;p&gt;GUI automation and shell access are two practical answers. I use both, but I have also been experimenting with another option: give the agent a small application API and let it write a short JavaScript program for each task.&lt;/p&gt;

&lt;p&gt;I call this &lt;strong&gt;API-native computer control&lt;/strong&gt;. The name is more ambitious than the current implementation. I do not know whether it is the best general interface for agents, but it seems promising when an application already has structured data, domain rules, and a meaningful API.&lt;/p&gt;

&lt;h2&gt;
  
  
  When tool calls become the control loop
&lt;/h2&gt;

&lt;p&gt;Tool calling often exposes actions such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create_rectangle(...)
move_object(...)
set_fill_color(...)
delete_object(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for a few independent actions. It becomes less convenient when a task needs iteration or branching. Imagine a slide editor, canvas, or UI board where the user asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Find every text object smaller than 12 px, increase it to 12 px, and move any overflow into a new text box below the original.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the model calls one tool for every object, every observation and action may require another inference step. A short program keeps the ordinary computation local:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listObjects&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fontSize&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;updateText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;overflowText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;overflowText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model still chooses the operation, but loops, conditions, and intermediate values do not each need another model turn. Generated code is not a replacement for tools; it is a way to compose approved tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  GUI control recovers semantics from pixels
&lt;/h2&gt;

&lt;p&gt;Graphical interfaces are excellent for people. We can scan a canvas, recognize an icon, and point at an object without naming every part of the scene.&lt;/p&gt;

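&lt;p&gt;For an agent, the same action often takes a longer translation path: capture a screenshot, recognize what is on screen, map the target to coordinates, and only then issue a click or keystroke.&lt;/p&gt;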

&lt;p&gt;GUI control is indispensable when no structured interface exists. Vision also remains important when appearance is the result, such as editing slides, graphics, or a web page. My concern is using pixels and coordinates as the primary control plane when the application already knows the content, bounds, transform, and identity of each object.&lt;/p&gt;

&lt;p&gt;A useful split is to use vision to understand and judge the output, while using semantic operations to change application state when available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the shell works so well
&lt;/h2&gt;

&lt;p&gt;The shell removes much of the visual interpretation work. It is textual, scriptable, composable, and supported by decades of public examples. Models have seen large amounts of Bash, &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;ffmpeg&lt;/code&gt;, and similar tools during training. Some of that operational experience is encoded in the model's learned parameters, so common commands can feel almost native.&lt;/p&gt;

&lt;p&gt;That is a substantial advantage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg &lt;span class="nt"&gt;-i&lt;/span&gt; input.mov &lt;span class="nt"&gt;-vf&lt;/span&gt; &lt;span class="s2"&gt;"scale=1920:-2,fps=30"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt;:v libx264 &lt;span class="nt"&gt;-crf&lt;/span&gt; 20 output.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A model familiar with &lt;code&gt;ffmpeg&lt;/code&gt; may produce this without first studying a manual. The shell is therefore hard to beat for developer environments and established utilities.&lt;/p&gt;

&lt;p&gt;The tradeoff is authority and precision. A process launcher plus filesystem access is broader than a narrowly scoped application operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;assetId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;frameRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;approvedOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CLI syntax also depends on conventions that are not fully captured by a machine-checkable schema.&lt;/p&gt;

&lt;p&gt;Containers and operating-system sandboxes still matter. A narrow API does not replace them, but it can reduce the authority placed inside the boundary. For application-level automation, that may also allow a lighter runtime than a general shell environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  From fixed handlers to generated programs
&lt;/h2&gt;

&lt;p&gt;In a React application, a button usually invokes code that a developer prepared in advance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;AlignButton&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;selectedIds&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onClick&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;selectedIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;editor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;editor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;updateTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;editor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commitHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Align left&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;onClick&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Align left&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The button is a human-friendly handle for a predefined code path. This is reliable, but fixed: the developer must anticipate the action and prepare the handler beforehand.&lt;/p&gt;

&lt;p&gt;An agent can instead generate a short program on the fly from a finite set of lower-level application APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSelection&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Frequent actions should probably remain buttons. The more flexible case is an operation that was not anticipated as one fixed handler. The application developer defines the finite API vocabulary and its limits; the agent combines those primitives for the current request. The developer does not need to enumerate every useful sequence beforehand.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the API-native architecture fits together
&lt;/h2&gt;

&lt;p&gt;Generating a program for each request raises two immediate questions: how does the agent learn the application's API, and how can that generated code run without receiving unrestricted access?&lt;/p&gt;

&lt;p&gt;The architecture I am exploring in &lt;a href="https://github.com/carsonDB/CogCore" rel="noopener noreferrer"&gt;&lt;code&gt;CogCore&lt;/code&gt;&lt;/a&gt; answers those questions with an API retrieval path and a guarded execution path.&lt;/p&gt;

&lt;p&gt;The application team owns the TypeScript API, product permissions, and final state changes shown in orange. &lt;a href="https://github.com/carsonDB/CogCore" rel="noopener noreferrer"&gt;&lt;code&gt;CogCore&lt;/code&gt;&lt;/a&gt; provides the blue agent and runtime components. A chat agent delegates the task to a code agent. The code agent retrieves only the relevant API manual, writes a short JavaScript program, and sends it to the guarded runtime.&lt;/p&gt;

&lt;p&gt;The application developer implements the API in TypeScript. A build step uses &lt;a href="https://ts-morph.com/" rel="noopener noreferrer"&gt;&lt;code&gt;ts-morph&lt;/code&gt;&lt;/a&gt; to extract signatures and type relationships from selected &lt;code&gt;*.api.ts&lt;/code&gt; entry points into a searchable graph. TypeScript remains the source of truth; the graph becomes the agent's manual.&lt;/p&gt;
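
&lt;p&gt;To make the retrieval step more concrete, here is a minimal sketch of looking up only the relevant manual entries for a task. It is a toy keyword match over a hand-written list, not the ts-morph graph described above; &lt;code&gt;manual&lt;/code&gt;, &lt;code&gt;searchManual&lt;/code&gt;, and the entry shape are assumptions made for this example.&lt;/p&gt;

```javascript
// Toy manual. In the design described above this data would come from a
// build step over the TypeScript API; here it is hand-written (assumption).
const manual = [
  {
    name: "listObjects",
    signature: "listObjects(filter?: ObjectFilter): ObjectSummary[]",
    doc: "List objects on the canvas, optionally filtered by type.",
  },
  {
    name: "createText",
    signature: "createText(input: TextInput): CanvasObject",
    doc: "Create a text object with content, position, and font size.",
  },
  {
    name: "removeObjects",
    signature: "removeObjects(ids: string[]): void",
    doc: "Remove objects by id.",
  },
];

// Hypothetical retrieval: score each entry by how many query words appear
// in its name or description, then return the best-matching signatures.
function searchManual(entries, query, limit) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return entries
    .map(function (entry) {
      const haystack = (entry.name + " " + entry.doc).toLowerCase();
      const score = words.filter(function (word) {
        return haystack.includes(word);
      }).length;
      return { entry: entry, score: score };
    })
    .filter(function (hit) {
      return hit.score !== 0; // drop entries that match no query word
    })
    .sort(function (a, b) {
      return b.score - a.score; // highest score first
    })
    .slice(0, limit)
    .map(function (hit) {
      return hit.entry.signature;
    });
}

const relevant = searchManual(manual, "create text font", 2);
console.log(relevant);
```

&lt;p&gt;Only the matching signatures would be placed in the code agent's context, which keeps the prompt small even when the full API is large.&lt;/p&gt;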

&lt;p&gt;Generated JavaScript receives named globals chosen by the host application, rather than automatic access to the DOM, shell, network, or application internals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CanvasForAI&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;listObjects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;ObjectFilter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;ObjectSummary&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nf"&gt;getObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CanvasObject&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;createShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ShapeInput&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CanvasObject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;createText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TextInput&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CanvasObject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;updateGeometry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GeometryUpdate&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CanvasObject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;updateAppearance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AppearanceUpdate&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CanvasObject&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;removeObjects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a generic example, not an announcement of a particular editor project.&lt;/p&gt;

&lt;p&gt;The finite surface is the important constraint. Generated code can combine these APIs in ways the developer did not predefine, but it should not automatically receive the DOM, shell, network, or internal application state.&lt;/p&gt;
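
&lt;p&gt;As a rough illustration of the named-globals idea, the sketch below hands a generated program only the objects the host chooses, as function parameters. The helper name &lt;code&gt;runWithGlobals&lt;/code&gt; and the toy &lt;code&gt;canvas&lt;/code&gt; object are assumptions made for this example, not CogCore's implementation. Note that &lt;code&gt;new Function&lt;/code&gt; alone is not a sandbox: the code can still reach whatever the surrounding JavaScript realm exposes, so a real host would still need the guards discussed in the next section.&lt;/p&gt;

```javascript
// Hypothetical helper: run generated source with only the named globals
// the host chooses to pass in. This is NOT a security boundary by itself.
function runWithGlobals(source, globals) {
  const names = Object.keys(globals);
  const values = names.map(function (name) {
    return globals[name];
  });
  // Each name becomes a parameter of the generated function body.
  const program = new Function(...names, '"use strict";\n' + source);
  return program(...values);
}

// Toy stand-in for an application API (assumed for this sketch).
const objects = [
  { id: "a", x: 40 },
  { id: "b", x: 12 },
];
const canvas = {
  getSelection() {
    return objects;
  },
  move(id, position) {
    const target = objects.find(function (object) {
      return object.id === id;
    });
    target.x = position.x;
  },
};

// A "generated" task program: align the selection to the leftmost x.
const generated = [
  "const selected = canvas.getSelection();",
  "const left = Math.min(...selected.map(function (o) { return o.x; }));",
  "for (const o of selected) canvas.move(o.id, { x: left });",
  "return left;",
].join("\n");

const left = runWithGlobals(generated, { canvas });
console.log(left, objects);
```

&lt;p&gt;The generated program sees &lt;code&gt;canvas&lt;/code&gt; because the host passed it by name; anything the host leaves out is simply not a parameter.&lt;/p&gt;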

&lt;h2&gt;
  
  
  Why JavaScript needs runtime validation
&lt;/h2&gt;

&lt;p&gt;Why generate JavaScript rather than TypeScript? A task program has a short lifecycle: write it, run it, verify it, and discard it. Requiring imports and type annotations adds syntax, while the actual application API is already defined in TypeScript.&lt;/p&gt;

&lt;p&gt;More importantly, TypeScript checking would still not be enough. Its types are erased when code becomes JavaScript, and broad static types do not express many runtime rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;UpdateInput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;svgPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three fields are statically valid, but the application still needs to ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;color   -&amp;gt; Is it a valid color accepted by this renderer?
svgPath -&amp;gt; Is it valid SVG path data within a complexity limit?
opacity -&amp;gt; Is it finite and between 0 and 1?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this reason, &lt;a href="https://github.com/carsonDB/CogCore" rel="noopener noreferrer"&gt;&lt;code&gt;CogCore&lt;/code&gt;&lt;/a&gt; starts with plain JavaScript and adds checks at the boundary. JavaScript &lt;code&gt;Proxy&lt;/code&gt; wrappers restrict access to the declared object surface and reject unknown properties. Zod schemas validate arguments and results against application rules. A worker adds separation and timeouts, although it is not equivalent to a VM, container, or operating-system sandbox.&lt;/p&gt;
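
&lt;p&gt;The sketch below shows the general shape of such boundary checks without any dependencies. It is an illustration under stated assumptions rather than CogCore's code: &lt;code&gt;guardSurface&lt;/code&gt;, &lt;code&gt;isValidOpacity&lt;/code&gt;, &lt;code&gt;attempt&lt;/code&gt;, and the toy &lt;code&gt;host&lt;/code&gt; object are hypothetical names, and a hand-written check stands in for a Zod schema so that the example runs on its own.&lt;/p&gt;

```javascript
// Hypothetical guard: expose only allow-listed members of a host object.
function guardSurface(target, allowedNames) {
  const allowed = new Set(allowedNames);
  return new Proxy(target, {
    get(object, property) {
      if (typeof property !== "string" || !allowed.has(property)) {
        throw new TypeError("Unknown API member: " + String(property));
      }
      const value = object[property];
      return typeof value === "function" ? value.bind(object) : value;
    },
    set(object, property) {
      throw new TypeError("Cannot assign API member: " + String(property));
    },
  });
}

// Runtime rule that a static `number` type does not express: finite and
// within [0, 1]. Clamping and comparing rejects NaN, Infinity, and
// out-of-range values in one step.
function isValidOpacity(value) {
  if (typeof value !== "number") return false;
  return Math.min(1, Math.max(0, value)) === value;
}

// Toy host object (assumed). `internalState` is deliberately not exposed.
const appearance = { opacity: 1 };
const host = {
  internalState: appearance,
  setOpacity(value) {
    if (!isValidOpacity(value)) {
      throw new RangeError("opacity must be a finite number between 0 and 1");
    }
    appearance.opacity = value;
    return appearance.opacity;
  },
};

const canvasApi = guardSurface(host, ["setOpacity"]);

// Turn failures into messages that a repair step could hand back to the model.
function attempt(action) {
  try {
    return { ok: true, value: action() };
  } catch (error) {
    return { ok: false, message: error.message };
  }
}

const results = [
  attempt(function () { return canvasApi.setOpacity(0.5); }),
  attempt(function () { return canvasApi.setOpacity(1.5); }),
  attempt(function () { return canvasApi.internalState; }),
  attempt(function () { return canvasApi.setFillColour("red"); }),
];
console.log(results);
```

&lt;p&gt;Only the first call succeeds. The other three come back as short error messages instead of silently changing state, which is the kind of signal a repair step can act on.&lt;/p&gt;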

&lt;p&gt;The goal is not to make generated code automatically correct. It is to make failures constrained and repairable.&lt;/p&gt;

&lt;p&gt;API design carries much of the safety model. Compare a broad shell capability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with a narrow application capability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteGeneratedPreview&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;previewId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first delegates meaning to a shell with ambient authority. The second can validate an opaque identifier, check ownership, record history, and support undo. The narrower operation is safer and easier for the model to use correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The central limitation of API-native control
&lt;/h2&gt;

&lt;p&gt;API-native control offers semantic operations, narrow permissions, strong multi-step composition, and the possibility of a lightweight application sandbox. Its central weakness is not execution power but unfamiliarity.&lt;/p&gt;

&lt;p&gt;The same training advantage that helps shell agents exposes this weakness. Models have seen enormous amounts of shell code, so common shell skills are already encoded in their learned parameters. After training, those parameters remain fixed until the model is updated or fine-tuned. A larger model may reason better from context, but size alone does not give it native experience with a newly created application API.&lt;/p&gt;

&lt;p&gt;A manual can supply current information at inference time, yet models do not always use it reliably. They may ignore part of it, invent a familiar-sounding method, or fall back to stale patterns learned during training. Retrieval and repair help, but they also add latency.&lt;/p&gt;

&lt;p&gt;Compared with API-native control, GUI automation has greater reach because it can operate software that exposes no structured API, but it pays for that reach by recovering semantics from pixels. Shell control has much stronger model familiarity and an excellent ecosystem, but it often exposes broader authority and fits structured application state less precisely. Individual tool calls retain narrow schemas, but long tasks can turn the model itself into the control loop.&lt;/p&gt;

&lt;p&gt;API-native control is trying to keep the useful properties together: semantic access, finite authority, local program composition, and adaptation through an updated manual. The unresolved part is that an updated manual is not the same as an internalized skill.&lt;/p&gt;

&lt;p&gt;This is why external skills matter. Instead of placing every operational skill permanently in model weights, an agent system could retain concise lessons from accepted runs and revise them as the application changes. &lt;a href="https://github.com/carsonDB/CogCore" rel="noopener noreferrer"&gt;&lt;code&gt;CogCore&lt;/code&gt;&lt;/a&gt; includes an early experiment with reusable skill tips, but this is not a solved learning system.&lt;/p&gt;

&lt;p&gt;The open problem is not only how to expose a safer interface. It is how to make an agent reliably adapt to an interface it has never seen before, without letting old habits override the application's current rules. Until that works consistently, API-native control remains a useful direction with a very real limit.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Libav (FFmpeg) C API understanding from Object-Oriented view</title>
      <dc:creator>Carson Wu</dc:creator>
      <pubDate>Mon, 27 Feb 2023 03:31:31 +0000</pubDate>
      <link>https://dev.to/carsonwu/libav-ffmpeg-c-api-understanding-from-object-oriented-view-1h9i</link>
      <guid>https://dev.to/carsonwu/libav-ffmpeg-c-api-understanding-from-object-oriented-view-1h9i</guid>
      <description>&lt;p&gt;Recently, I was developing WebAssembly based FFmpeg library, &lt;a href="https://github.com/carsonDB/frameflow" rel="noopener noreferrer"&gt;FrameFlow&lt;/a&gt;. It directly uses low-level C API of libav* folders from FFmpeg, to give more power to web browser. I want to share some development experience of using those C APIs.&lt;/p&gt;

&lt;p&gt;There are two main ways to use FFmpeg: the command-line program or the C API. The command-line program is itself built on the C API. When you first learn these APIs, it can be confusing that creating one thing takes several steps. C only has functions, so why not initialize everything with a single function call?&lt;br&gt;
Here is an example (C++) from &lt;a href="https://github.com/carsonDB/frameflow/blob/6681b44073a65e5ab612e0bf6f24f71742095d5d/src/cpp/encode.cpp#L14" rel="noopener noreferrer"&gt;encode.cpp&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;codec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;avcodec_find_encoder_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;codec_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_str&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;codec_ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;avcodec_alloc_context3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;set_avcodec_context_from_streamInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codec_ctx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;avcodec_open2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codec_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the minimum steps required to create an encoder. Let me explain them one by one.&lt;br&gt;
First, &lt;code&gt;avcodec_find_encoder_by_name&lt;/code&gt; finds the &lt;code&gt;Codec&lt;/code&gt; by its name. Its type is &lt;code&gt;AVCodec&lt;/code&gt;, and it behaves like a class: you cannot change any value in it. It provides metadata about the codec (such as &lt;code&gt;libx264&lt;/code&gt;) and holds pointers to the functions that do the work, for example encoding.&lt;br&gt;
Second, &lt;code&gt;avcodec_alloc_context3&lt;/code&gt; allocates a block of memory, much like &lt;code&gt;malloc&lt;/code&gt;, and sets every field in the struct to its default value. The result is named &lt;code&gt;codec_ctx&lt;/code&gt; (codec context), following FFmpeg convention, because its type is &lt;code&gt;AVCodecContext&lt;/code&gt;. This is like using &lt;code&gt;new&lt;/code&gt; to create an object (instance).&lt;br&gt;
The third line copies the values from &lt;code&gt;info&lt;/code&gt;, which I defined earlier, into the context. It is my own helper function, so its details do not matter here. This step is like passing parameters to the &lt;code&gt;constructor&lt;/code&gt; of the class.&lt;br&gt;
The last line, &lt;code&gt;avcodec_open2&lt;/code&gt;, initializes the object (instance), like running the constructor of the class.&lt;/p&gt;

&lt;p&gt;So although FFmpeg is written in pure C, it organizes its codebase in an object-oriented style. You can find similar examples for the &lt;code&gt;demuxer&lt;/code&gt;, &lt;code&gt;muxer&lt;/code&gt;, and &lt;code&gt;decoder&lt;/code&gt; in my project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Changes after init
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Decoder: Time_base
&lt;/h3&gt;

&lt;p&gt;During development I ran into some annoying bugs that looked strange at first glance. Once you understand the init process described above, the step to watch is the last one, &lt;code&gt;avcodec_open2&lt;/code&gt;. Because it runs the constructor-like initialization, it may change fields that you set in the previous step.&lt;br&gt;
For example, when you call &lt;code&gt;avcodec_open2&lt;/code&gt;, it initializes using the specified codec algorithm, and &lt;code&gt;time_base&lt;/code&gt; is often changed to another value. That can be surprising: the &lt;code&gt;time_base&lt;/code&gt; of any output frame follows the new value, not the one you set. So after calling &lt;code&gt;avcodec_open2&lt;/code&gt;, you may need to read the current &lt;code&gt;time_base&lt;/code&gt; back from &lt;code&gt;codec_ctx&lt;/code&gt; before doing further work.&lt;br&gt;
By the way, what is &lt;code&gt;time_base&lt;/code&gt;? It probably deserves a post of its own. In short, it is a unit of time, like a second or a microsecond.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encoder: format (pixel format / sample format)...
&lt;/h3&gt;

&lt;p&gt;Here is another example. For an encoder, the pixel format (video) or sample format (audio) may be changed by the specified codec algorithm that the encoder uses. After init, the encoder may accept only frames in that format. So before encoding, you need to &lt;code&gt;rescale&lt;/code&gt; video frames to the required pixel format, or &lt;code&gt;resample&lt;/code&gt; audio frames to the required sample format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Overall, an object-oriented view makes these C APIs easier to understand. You can see all the C++ code in the &lt;a href="https://github.com/carsonDB/frameflow/blob/6681b44073a65e5ab612e0bf6f24f71742095d5d/src/cpp/" rel="noopener noreferrer"&gt;FrameFlow-0.1.1 release&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>coding</category>
    </item>
  </channel>
</rss>
