<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Captain Jack Smith</title>
    <description>The latest articles on DEV Community by Captain Jack Smith (@jacob_is_surfing).</description>
    <link>https://dev.to/jacob_is_surfing</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904627%2F842eff18-5bca-46a4-82fd-d78985a2cd2a.png</url>
      <title>DEV Community: Captain Jack Smith</title>
      <link>https://dev.to/jacob_is_surfing</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacob_is_surfing"/>
    <language>en</language>
    <item>
      <title>Claude Code Harness and the Cost of Longer Agent Work</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Tue, 09 Jun 2026 03:44:55 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/claude-code-harness-and-the-cost-of-longer-agent-work-5fp4</link>
      <guid>https://dev.to/jacob_is_surfing/claude-code-harness-and-the-cost-of-longer-agent-work-5fp4</guid>
      <description>&lt;p&gt;Karpathy sharing a long piece about Claude Code Harness felt like a small signal with a large implication. The center of gravity in AI coding is moving from clever prompts to execution systems. A prompt asks a model to help. A harness gives the model a workplace, a memory trail, tools, checkpoints, and a rhythm for continuing when the task becomes larger than one clean conversation.&lt;/p&gt;

&lt;p&gt;That shift explains why the harness method is becoming so attractive, and also why it can look like another token hungry machine. The more responsibility we hand to agents, the more context they need to read, preserve, compare, verify, and clean up. The dream is autonomous progress. The bill arrives through planning tokens, tool output tokens, handoff tokens, verification tokens, and cleanup tokens.&lt;/p&gt;

&lt;h2&gt;Why the repost mattered&lt;/h2&gt;

&lt;p&gt;Karpathy has become a useful filter for ideas that change how builders behave. His attention to the Claude Code Harness discussion mattered because it pointed at a practical truth. The next jump in agent performance may come as much from the frame around the model as from the model itself.&lt;/p&gt;

&lt;p&gt;Claude Code already shows why this frame matters. Anthropic describes it as a system that can read a codebase, edit files, run tests, and deliver committed code. That is a very different experience from a chat answer. The model is still central, but the surrounding workflow decides what the model sees, which tools it can touch, when it must pause, how it records progress, and how it proves that work is complete.&lt;/p&gt;

&lt;p&gt;The long harness essays sharpen the same point. Long running agents fail in familiar ways. They start before gathering enough context. They drift from the plan. They grow anxious as the context window fills. They avoid complex work by shrinking the task. They write weak checks and declare success too early. They leave stale documentation and contradictory state behind. A harness exists to make these failures harder to ignore.&lt;/p&gt;

&lt;h2&gt;The task generated execution frame&lt;/h2&gt;

&lt;p&gt;The most interesting idea is that the harness should be generated around the task. A small bug fix, a research synthesis, a full stack app, and a scientific workflow should not share the same operating pattern. Each task deserves its own execution frame.&lt;/p&gt;

&lt;p&gt;For a coding task, that frame might create a feature list, a progress file, an init script, and a rule that each session works on one feature at a time. For a design task, it might create a planner, a generator, and an evaluator. For a research task, it might create a source map, a claims table, and a final contradiction check. The user describes the goal. The agent first builds the scaffolding that will keep the work honest.&lt;/p&gt;

&lt;p&gt;This is why the method feels powerful. It turns a vague request into a concrete operating environment. The task is decomposed. Unknowns are named. Stop conditions are written down. Verification is separated from generation. A fresh context can review the result with less attachment to the earlier path. The agent becomes easier to supervise because its work leaves artifacts that humans can inspect.&lt;/p&gt;

&lt;p&gt;The cost is equally clear. Every artifact consumes tokens. Every review pass consumes tokens. Every handoff summary consumes tokens. A weak harness wastes tokens by adding ceremony. A good harness spends tokens to prevent expensive failure.&lt;/p&gt;

&lt;h2&gt;The token economics&lt;/h2&gt;

&lt;p&gt;The real question is not whether harnesses consume many tokens. They do. The real question is whether the extra tokens buy reliability, speed, and fewer human interruptions.&lt;/p&gt;

&lt;p&gt;A bare model can answer quickly and cheaply, especially when the task is small. But as tasks stretch across many files, many sessions, and many decisions, cheap interaction often becomes expensive rework. The harness spends more at the beginning so the project does not pay later through hidden mistakes.&lt;/p&gt;

&lt;p&gt;This is already visible in agent workflows. Reading the repository costs tokens, but skipping context creates wrong plans. Writing a progress file costs tokens, but losing state forces the next session to rediscover the project. Running a separate verifier costs tokens, but letting the same agent grade its own work encourages soft tests. Cleanup costs tokens, but entropy makes the next task harder.&lt;/p&gt;

&lt;p&gt;The phrase token guzzler is fair when a harness expands without discipline. It is less fair when the harness is replacing human coordination, project management, test design, and code review. The practical measure is outcome per token. If a harness spends ten times more context and prevents one serious false completion, it may be cheap. If it produces beautiful process notes while the final result remains fragile, it is noise with a meter attached.&lt;/p&gt;
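
&lt;p&gt;The outcome per token idea can be made concrete with a rough expected cost comparison. This is a minimal sketch under stated assumptions: the function name, the failure rates, and every token count are hypothetical and chosen only to illustrate the arithmetic.&lt;/p&gt;

```python
def expected_cost(run_tokens, failure_percent, rework_tokens):
    """Tokens for one run plus the average rework bill (integer percent)."""
    return run_tokens + rework_tokens * failure_percent // 100


# Illustrative numbers only; none of them come from a real measurement.
bare = expected_cost(run_tokens=10_000, failure_percent=60, rework_tokens=200_000)
harness = expected_cost(run_tokens=100_000, failure_percent=5, rework_tokens=200_000)

print(bare, harness)
```

&lt;p&gt;With these made up inputs the harness run spends ten times more up front and still comes out cheaper on average, 110,000 tokens against 130,000. Change the failure rates and the conclusion can flip, which is the point of measuring outcome per token instead of raw spend.&lt;/p&gt;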

&lt;h2&gt;A useful pattern for builders&lt;/h2&gt;

&lt;p&gt;The best harness pattern is compact and task aware. First, force context intake. The agent should identify the files, sources, constraints, and unknowns that matter before it plans. Second, create a visible task ledger. The ledger should show what has been attempted, what passed, what failed, and what remains. Third, keep verification independent. The checker should evaluate the requested behavior, not the easiest behavior to test. Fourth, clean the workspace after progress. Documentation, dead code, and stale assumptions are part of the task surface. Fifth, set a token budget with stop rules. Autonomy works better when it knows when to continue and when to ask.&lt;/p&gt;
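
&lt;p&gt;The ledger and the stop rule from the pattern above could look something like the sketch below. It is one possible shape, not how Claude Code or any existing harness stores its state. The class names, fields, and budget figure are all assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    task: str
    status: str = "pending"  # one of: pending, passed, failed
    tokens_spent: int = 0


@dataclass
class TaskLedger:
    """Hypothetical ledger: records attempts and enforces a token budget."""
    token_budget: int
    entries: list = field(default_factory=list)

    def add(self, task):
        entry = LedgerEntry(task)
        self.entries.append(entry)
        return entry

    def record(self, entry, status, tokens):
        # Each attempt updates the visible status and the running token bill.
        entry.status = status
        entry.tokens_spent += tokens

    def tokens_used(self):
        return sum(e.tokens_spent for e in self.entries)

    def remaining(self):
        # Anything that has not passed still counts as open work.
        return [e.task for e in self.entries if e.status != "passed"]

    def should_stop(self):
        # Stop rule: pause and ask a human once the budget is used up.
        return self.tokens_used() >= self.token_budget


ledger = TaskLedger(token_budget=10_000)
login = ledger.add("fix login redirect")
docs = ledger.add("update README")
ledger.record(login, "passed", 6_500)
ledger.record(docs, "failed", 4_000)

print(ledger.remaining(), ledger.should_stop())
```

&lt;p&gt;The design choice worth noting is that the stop rule reads from the same ledger a human would inspect, so the agent and its supervisor are looking at one source of truth.&lt;/p&gt;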

&lt;p&gt;This pattern also matters outside code. A researcher can use &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; to convert a formula image into usable mathematical notation, ask &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; or &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; to compare interpretations, then use &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; to turn an AI generated paper figure into an editable vector format. The same harness logic applies. Capture the input, preserve the claim trail, verify the output, and keep the final artifact editable.&lt;/p&gt;

&lt;h2&gt;The bigger meaning&lt;/h2&gt;

&lt;p&gt;The harness conversation is really about trust. People do not want agents that merely sound confident. They want agents that can stay oriented, respect constraints, expose their state, and recover from mistakes. A task generated execution frame is one answer to that demand.&lt;/p&gt;

&lt;p&gt;It will consume tokens. It should consume tokens. Long work needs memory, checks, and coordination. The important thing is to spend those tokens where they create leverage. Karpathy sharing the Claude Code Harness discussion brought attention to a simple lesson. The future of AI work will be shaped by the model, the tools, and the disciplined operating system that connects them.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>When AI Starts Building AI, The Pause Debate Becomes Real</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Mon, 08 Jun 2026 04:12:27 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/when-ai-starts-building-ai-the-pause-debate-becomes-real-1n6f</link>
      <guid>https://dev.to/jacob_is_surfing/when-ai-starts-building-ai-the-pause-debate-becomes-real-1n6f</guid>
      <description>&lt;p&gt;Anthropic published one of the most important AI governance posts of 2026 because it came from inside the race. A frontier lab described how its own models are already accelerating its own work, then asked what happens when that loop becomes much tighter.&lt;/p&gt;

&lt;p&gt;The central idea is recursive self improvement. In plain terms, it is the moment when an AI system can help design, build, test, and improve the next system with little human labor in the loop. Anthropic says that point remains ahead and uncertain. The uncomfortable part is the evidence that the slope is already bending toward it.&lt;/p&gt;

&lt;p&gt;The strongest signal is code. As of May 2026, Anthropic says more than 80 percent of the code merged into its production codebase was authored by Claude. Before Claude Code entered research preview in February 2025, the share was in the low single digits. The company also says the typical Anthropic engineer now ships about 8 times as much code per quarter as engineers did across 2021 to 2025. Lines of code are a rough measure, yet the direction is hard to ignore. The bottleneck has moved from typing to directing, reviewing, and deciding what should be built.&lt;/p&gt;

&lt;p&gt;That shift matters because model development is full of loops. Write code, run experiments, inspect failures, adjust infrastructure, compare results, rewrite the plan, and repeat. If a model can compress each loop, progress compounds. Anthropic reports that Claude has become much better at open ended coding tasks, reaching a 76 percent success rate in May 2026 on its hardest internal category. In a small research style optimization task, performance rose from about 3 times faster code in May 2025 to about 52 times faster by April 2026 with Mythos Preview. Those numbers should be treated as company reported evidence, yet they still reveal what frontier labs are watching from the inside.&lt;/p&gt;

&lt;p&gt;The real question is judgment. Writing code and running tests are now the easy part of many technical workflows. Choosing the problem, knowing which result matters, deciding when a measurement is misleading, and recognizing a dead end remain more human. Anthropic frames this as the remaining gap between powerful AI assistance and full recursive self improvement. If that gap narrows, the human role in frontier development becomes less like builder and more like reviewer, auditor, and governor of a virtual research lab.&lt;/p&gt;

&lt;p&gt;This is why Anthropic called for the option of a coordinated slowdown or temporary pause in frontier development. The wording matters. A single company stopping by itself would mainly hand advantage to competitors. A meaningful pause would need several well funded labs in several countries to agree on the same conditions, verify that others are complying, define what triggers the pause, define what ends it, and prevent a hidden actor from racing ahead. Reuters emphasized this as a coordinated and verifiable plan. Scientific American highlighted the political difficulty and noted that critics see the proposal as unrealistic, or even as a way for a leading lab to shape regulation while keeping its own advantage.&lt;/p&gt;

&lt;p&gt;Both reactions can be true at once. The risk can be serious, and the proposed governance path can still be very hard. Training runs are easier to hide than many older strategic technologies. Compute, talent, model weights, data pipelines, and private infrastructure are spread across companies and countries. The incentive to defect during a pause would be enormous because the remaining runner could inherit the frontier. A pause that cannot be verified becomes theater. A race with no brake becomes a wager with public consequences.&lt;/p&gt;

&lt;p&gt;So the practical meaning of AI self improvement sits between science fiction and ordinary software progress, with immediate operational stakes. It means every organization using frontier AI needs stronger review loops. It means audit trails for model generated work, evaluation suites that test long tasks, provenance for research claims, controls for autonomous agents, and people whose job is to ask whether speed has outgrown understanding. The human bottleneck should move upward while staying visible.&lt;/p&gt;

&lt;p&gt;For researchers and technical writers, this new workflow also changes the tools around knowledge production. &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; can help turn scattered source notes into structured arguments and expose weak assumptions before publication. &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can convert formula images into usable formulas when AI research material moves into a draft. &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can turn AI generated paper figures into editable vector graphics, which matters when diagrams need revision, translation, or careful peer review. These tools are small examples of the larger pattern. AI accelerates the work, and humans need better ways to inspect the artifacts it leaves behind.&lt;/p&gt;

&lt;p&gt;The hardest part of Anthropic's position is that it asks society to build coordination faster than labs build capability. That may sound almost impossible, but the alternative is to discover the governance problem after the technical loop has already closed. A better response pairs urgency with discipline. It treats recursive self improvement as a near term management problem before it becomes a frontier science problem. The world needs measurements that outsiders can trust, institutions that can act before headlines force them to, and AI labs willing to expose enough of their internal acceleration for everyone else to understand the stakes.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>After Harness, The Next Agent Buzzword Will Be Persistence</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Fri, 05 Jun 2026 04:13:05 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/after-harness-the-next-agent-buzzword-will-be-persistence-c2l</link>
      <guid>https://dev.to/jacob_is_surfing/after-harness-the-next-agent-buzzword-will-be-persistence-c2l</guid>
      <description>&lt;p&gt;The agent world loves a new word when old language stops carrying the weight of new behavior. Harness became useful because it named the layer around a model. Tools, memory, permissions, sandboxes, retries, evaluations, context assembly, and observability suddenly belonged to one mental object. The word helped teams see that a model with a prompt is only the reasoning core. A useful agent needs a working environment.&lt;/p&gt;

&lt;p&gt;The next word will likely circle around persistence. It may arrive as agent memory, durable context, continuous workspace, lifespan engineering, task ledger, or stateful runtime. The packaging will change by vendor. The underlying question stays simple. Can the agent keep doing useful work across time, failure, people, devices, approvals, and changing information?&lt;/p&gt;

&lt;p&gt;Harness answered what surrounds the model. Persistence asks what survives after the first impressive demo. A serious agent needs to remember goals, decisions, constraints, artifacts, file locations, user preferences, tool results, cost history, approval status, and the current shape of a task. It also needs to resume after a server restart, a user interruption, a failed API call, or a week of silence.&lt;/p&gt;

&lt;p&gt;That is why the market is already moving in this direction. LangGraph makes checkpoints central to graph state. OpenAI Agents SDK sessions keep conversation history across agent runs. Google Agent Platform combines sessions with Memory Bank for continuous conversations and long term memories. Temporal frames durable execution as the backbone for workflows that must recover and continue after failure. Different product names point to the same pressure. Agents are becoming systems that need state management as much as reasoning.&lt;/p&gt;

&lt;p&gt;This is also why persistence will be repackaged many times. Memory sounds personal. Checkpoints sound technical. Durable execution sounds infrastructural. Context durability sounds enterprise friendly. Agent workspace sounds collaborative. Each term highlights a different buyer and a different anxiety. The builder worries about crashes. The manager worries about auditability. The user worries that the agent will forget the thing that was already explained. The security team worries that it will remember the wrong thing forever.&lt;/p&gt;

&lt;p&gt;The hard part is controlled persistence. A naive memory layer can preserve stale facts, private details, bad instructions, and accidental correlations. Research on long horizon agents already points to drift, noisy recall, and aging effects when memory grows without discipline. The valuable version of persistence needs boundaries. It needs memory review, expiry, permissions, provenance, checkpoints, rollback, compact summaries, and evaluations that measure reliability across weeks instead of a single fresh run.&lt;/p&gt;
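
&lt;p&gt;A small sketch can show what checkpoints, provenance, and expiry mean in practice. It uses only the Python standard library and does not reflect the API of LangGraph, the OpenAI Agents SDK, Temporal, or any other product named here. The function names, the state layout, and the one week default are assumptions for illustration.&lt;/p&gt;

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    # Write to a temp file, then rename, so a crash cannot leave a partial file.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, now=None, max_age_seconds=7 * 24 * 3600):
    # Resume from disk and drop memories older than the expiry window.
    if not os.path.exists(path):
        return {"step": 0, "memories": []}
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    now = time.time() if now is None else now
    state["memories"] = [
        m for m in state["memories"] if max_age_seconds >= now - m["saved_at"]
    ]
    return state


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "agent_state.json")
    save_checkpoint(path, {
        "step": 3,
        "memories": [
            # Each memory carries its source so a reviewer can trace it.
            {"fact": "figure v2 approved", "source": "review notes", "saved_at": 1000.0},
            {"fact": "old draft outline", "source": "chat", "saved_at": 0.0},
        ],
    })
    resumed = load_checkpoint(path, now=1500.0, max_age_seconds=600)

print(resumed["step"], [m["fact"] for m in resumed["memories"]])
```

&lt;p&gt;The resumed state keeps its step counter and the recent memory, while the stale entry is filtered out on load. A production system would also want review, permissions, and rollback, which this sketch leaves out.&lt;/p&gt;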

&lt;p&gt;For creators and researchers, the practical workflow is easy to imagine. &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; can help frame the research question and turn scattered notes into a plan. &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; can add a second reasoning angle during source review. &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can turn formula screenshots into usable formulas when technical material moves into a draft. &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can convert AI generated paper figures into editable vector graphics for revision. A persistent agent should remember which equation came from which source, which figure version was approved, and which claim still needs checking.&lt;/p&gt;

&lt;p&gt;After Harness, vendors will sell continuity. The winning agent stack will present itself as a durable workspace with memory profiles, event logs, permission gates, resumable execution, artifact history, and recovery paths. Autonomy will remain attractive, yet continuity will decide whether agents become daily infrastructure. The agent that matters will remember enough to continue, forget enough to stay safe, and leave enough trace for humans to trust the work.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Why OpenAI Is Connecting Codex And ChatGPT</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Thu, 04 Jun 2026 02:23:42 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/why-openai-is-connecting-codex-and-chatgpt-hn8</link>
      <guid>https://dev.to/jacob_is_surfing/why-openai-is-connecting-codex-and-chatgpt-hn8</guid>
      <description>&lt;p&gt;OpenAI is turning Codex from a specialist coding tool into a continuous work layer inside the broader ChatGPT ecosystem. The move looks simple from the outside. A developer starts a thread in Codex, checks progress from ChatGPT on mobile, approves a command, reviews a diff, then returns to the desktop host where the real project context lives. Under that simple flow sits a deeper product strategy.&lt;/p&gt;

&lt;p&gt;The first reason is continuity. Coding agents now do far more than answer short prompts. They can run long tasks, compare approaches, inspect terminals, work in isolated worktrees, and keep state across hours. A standalone coding app is useful for focused sessions, yet many decisions happen away from the desk. Bringing Codex into ChatGPT mobile lets the user keep the thread alive when a task needs approval, context, or a quick change of direction.&lt;/p&gt;

&lt;p&gt;The second reason is distribution. ChatGPT already has the audience, the identity layer, the subscription system, and the habit loop. Codex has the local project context, shell access, diffs, tests, plugins, and workflow depth. Connecting them gives OpenAI a path to make agentic software work feel native to everyday ChatGPT usage.&lt;/p&gt;

&lt;p&gt;The third reason is trust. OpenAI has designed Codex around local hosts, explicit approvals, sandboxing, and reviewable changes. When ChatGPT mobile connects to a Codex session, the project files, credentials, permissions, and local setup remain on the machine where Codex is operating. The phone becomes a control surface. This keeps convenience from swallowing the security model, which matters for individuals and even more for teams.&lt;/p&gt;

&lt;p&gt;This also explains why the merger is gradual. Codex desktop, CLI, IDE, cloud tasks, Windows support, mobile remote access, Computer Use, skills, and automations are surfaces around the same core idea. The user should be able to assign work wherever the idea appears, then supervise it wherever attention is available.&lt;/p&gt;

&lt;p&gt;For creators, researchers, and builders, this shift changes the meaning of productivity tools. &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; can become the conversational hub where ideas are framed. &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can help turn visual math into usable formulas when technical content moves from screenshot to editable work. &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; can provide another reasoning angle during research. &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; becomes useful when AI generated paper figures need to become editable vector graphics for publication or revision.&lt;/p&gt;

&lt;p&gt;The strategic point is that OpenAI is collapsing the distance between conversation and execution. ChatGPT captures intent. Codex performs work in real environments. Mobile access preserves momentum. Enterprise controls make adoption less risky. Together, these pieces make a coding agent feel like an operating layer for knowledge work.&lt;/p&gt;

&lt;p&gt;The long term bet is bigger than coding. Code is the place where agent output can be inspected, tested, reverted, and shipped. If OpenAI can make that loop reliable, the same pattern can expand into analysis, design, documentation, operations, and scientific workflows. Codex enters ChatGPT because the future agent interface will span many surfaces. It will be a persistent work system that follows the user across devices, contexts, and decisions.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Why GPT Image 2 Infographics Often Look AI Generated</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Wed, 03 Jun 2026 03:17:04 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/why-gpt-image-2-infographics-often-look-ai-generated-2eeh</link>
      <guid>https://dev.to/jacob_is_surfing/why-gpt-image-2-infographics-often-look-ai-generated-2eeh</guid>
      <description>&lt;p&gt;Many people now have a strange new visual instinct. They open an infographic, glance at the background, the icons, the labels, and the tiny decorative textures, then feel that the image came from an AI model before they can explain why. The suspicion usually comes from a cluster of signals. The image may be sharp, polished, and readable, yet it carries a thin layer of noise, small speckles, repeated texture, and overly busy micro detail. GPT Image 2 has made this feeling especially noticeable because it is strong at layout and text, which means the remaining flaws become easier to isolate.&lt;/p&gt;

&lt;p&gt;The first reason is mechanical. Modern image generators learn to create images by moving from noisy visual states toward coherent pictures. Even when the final result looks clean, traces of that denoising process can survive as grain, stippling, soft dirt, or patterned texture. With a photograph, noise usually follows the logic of a camera sensor, lens, lighting condition, or compression pipeline. With an AI image, the noise often follows the logic of generation. It appears across surfaces that should behave differently, such as paper, glass, skin, metal, and flat UI panels. That mismatch is one reason our eyes catch it so quickly.&lt;/p&gt;

&lt;p&gt;Infographics expose the issue more than portraits or landscapes. A good infographic needs clean regions, stable typography, simple icon geometry, and clear visual hierarchy. It also has many boundaries where errors become obvious. When GPT Image 2 tries to make every small label, connector, shadow, background panel, and diagram element feel visually rich, it can overfill the image with detail. The result is a surface that looks impressive at first and artificial on the second look. The model has solved the broad composition, yet the local texture feels too evenly generated.&lt;/p&gt;

&lt;p&gt;Another reason is context. Reports from users suggest that repeated edits in the same chat can sometimes amplify artifacts. A first image may contain only faint grain, then a revision can preserve and reinterpret that grain as part of the visual content. After several turns, the texture becomes more visible. This is why some creators see better results when they restart the image workflow, use a clean prompt, save lossless source files, and avoid using a noisy prior output as the next reference.&lt;/p&gt;

&lt;p&gt;There is also a provenance question. OpenAI has described ChatGPT Images 2.0 as a major step forward in realism, instruction following, world knowledge, and dense text generation, while also emphasizing safeguards for synthetic media. Public discussion has connected some recurring texture patterns with possible watermarking or provenance signals, although the exact causes of specific visible artifacts have not been fully published. For practical users, the safest conclusion is simple. Treat visible grain as a quality control issue, and treat provenance as a separate policy and trust issue unless the model provider gives a precise technical explanation.&lt;/p&gt;

&lt;p&gt;The most useful way to work with GPT Image 2 is to separate ideation from final production. Use &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; to explore the argument, structure the diagram, and test whether the visual story makes sense. Use &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; as a second reader for hierarchy, missing labels, and confusing flows. When formulas, equations, or technical notation appear in the image, &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can help recover the math into usable formula form. When an AI generated paper figure looks promising but the pixels carry noise, &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can turn that figure into an editable vector format so the final version can be cleaned, aligned, and prepared for publication.&lt;/p&gt;

&lt;p&gt;Prompting still matters. Ask for clean flat regions, restrained background detail, consistent lighting, minimal texture, large readable labels, and simple geometric icons. Avoid piling up style words that demand grit, paper grain, dust, cinematic detail, and microscopic texture in the same request. Inspect the original output before platform compression changes it. If the image starts to develop a dirty pattern, regenerate from a fresh prompt instead of asking for endless repairs inside the same conversation.&lt;/p&gt;

&lt;p&gt;The deeper lesson is that AI image quality is no longer judged only by whether the model can draw hands or spell labels. GPT Image 2 can produce complex, useful, and surprisingly legible information graphics. Its weakness appears in the material feeling of the image. A human designer often removes unnecessary texture to protect the idea. A generative model may add texture because it has learned that visual richness often correlates with finished work.&lt;/p&gt;

&lt;p&gt;That is why people can spot AI infographics so quickly. The problem is a mismatch between information design and generative aesthetics. Information design wants silence around the message. Generative aesthetics often wants every pixel to participate. Until image models become better at respecting empty space, stable flat color, and the quiet discipline of diagrams, the grain will keep giving them away.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>When GSAP Became a Skill for AI Agents</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Tue, 02 Jun 2026 11:35:06 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/when-gsap-became-a-skill-for-ai-agents-1b8e</link>
      <guid>https://dev.to/jacob_is_surfing/when-gsap-became-a-skill-for-ai-agents-1b8e</guid>
      <description>&lt;p&gt;GSAP has always occupied a special place in front end development. It is the library people reach for when CSS transitions begin to feel too thin, when a timeline needs rhythm, when scroll needs choreography, and when a product team wants motion that feels deliberate instead of accidental.&lt;/p&gt;

&lt;p&gt;The new official GSAP Skill matters because it gives AI coding agents a practical memory for animation craft. The point is simple. A general coding assistant can write JavaScript, yet animation has a different grammar. It needs timing, easing, sequencing, cleanup, accessibility, and a sense of how motion supports attention. Without that context, generated animation often works in a demo and then becomes fragile in a real interface.&lt;/p&gt;

&lt;p&gt;The GreenSock team has now packaged GSAP knowledge into structured Skill files. These files cover core methods such as gsap.to, gsap.from, gsap.fromTo, timelines, ScrollTrigger, framework patterns, plugins, and performance habits. That turns animation from a vague prompt into a guided workflow. The AI agent can choose transforms instead of layout heavy properties, use matchMedia for responsive motion, respect reduced motion preferences, and compose timelines with cleaner intent.&lt;/p&gt;

&lt;p&gt;This is important for designers and developers who are strong in product thinking but less fluent in motion design. They can describe a desired behavior in natural language, then let an agent build a first version with GSAP patterns that are closer to production quality. The result still needs human judgment. Motion has taste. It can clarify hierarchy or create noise. It can make a landing page feel alive or make a dashboard feel exhausting. The Skill raises the floor, while the human still decides what the interface should feel like.&lt;/p&gt;

&lt;p&gt;The bigger signal is that AI coding is moving from generic code generation toward domain guided execution. A Skill is useful when a tool has hidden rules that are difficult to rediscover from every prompt. GSAP has many of those rules. ScrollTrigger setup, timeline composition, React cleanup, plugin registration, and performance tuning are all small details that separate polished motion from code that merely moves pixels.&lt;/p&gt;

&lt;p&gt;That same pattern is spreading across creative and research workflows. &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; can help a team reason through interaction concepts and rewrite implementation prompts. &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can turn mathematical images into editable formula text when technical content needs to move from screenshots into documents. &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can convert AI generated paper figures into editable vector graphics, which is valuable when a research illustration must be cleaned up before publication.&lt;/p&gt;

&lt;p&gt;For web animation, the practical change is easy to imagine. A product manager asks for a scroll based reveal. An engineer asks for it to work in React and respect reduced motion. The AI agent reads the GSAP Skill, writes a timeline, scopes the context, registers the needed plugin, and explains where design judgment still matters. That workflow does not make everyone a motion designer overnight, but it does make competent animation less dependent on memorizing a library by hand.&lt;/p&gt;

&lt;p&gt;The release also changes how teams should think about AI assistance. The best agents will not only know more APIs. They will know when a domain has conventions, failure modes, and taste constraints. GSAP Skill is a small example with large implications. It suggests a future where AI tools carry craft knowledge in portable packages, and creators spend more energy deciding what should happen on screen.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Claude Code As A Working System</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Mon, 01 Jun 2026 03:02:48 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/claude-code-as-a-working-system-17gd</link>
      <guid>https://dev.to/jacob_is_surfing/claude-code-as-a-working-system-17gd</guid>
      <description>&lt;p&gt;Matt Van Horn recently shared a dense and unusually practical field report on Claude Code. The piece became interesting because it framed Claude Code as an operating rhythm for building, researching, writing, reviewing, and moving ideas through a real workflow.&lt;/p&gt;

&lt;p&gt;The center of his method is simple. Every meaningful idea becomes a plan file before it becomes code. A product thought, a bug report, a screenshot, a meeting transcript, a confusing terminal error, even a strategy question can be fed into Claude Code as material for a structured plan. The value comes from memory and visibility. A plan file gives the agent a stable object to inspect, revise, execute, and hand off between sessions. It also gives the human a checkpoint that is easier to review than a stream of chat.&lt;/p&gt;

&lt;p&gt;That is the first lesson. Claude Code becomes more useful when it is given concrete artifacts. A vague prompt produces a vague collaboration. A plan file turns the collaboration into something inspectable.&lt;/p&gt;

&lt;p&gt;Van Horn also describes a workflow built around voice. This sounds small until you imagine the friction it removes. When speech is routed into Claude Code, imperfect transcription matters less because the model has surrounding context. You can think aloud, restart a thought, describe a bug in messy human language, and still end up with a plan that can be edited. This changes the input bandwidth. The developer is no longer limited to typing perfect instructions while sitting at a desk.&lt;/p&gt;

&lt;p&gt;The second lesson is that agentic coding rewards rich capture. Screenshots, transcripts, spoken notes, repository context, and old plans all become raw material. Claude Code is strongest when it can compare the new request with the existing shape of the work. Anthropic describes Claude Code as an agentic coding tool that reads a codebase, edits files, runs commands, and integrates with development tools. Van Horn pushes that description into a broader practice. The terminal becomes a place where current context, project memory, and execution meet.&lt;/p&gt;

&lt;p&gt;His most provocative habit is running several Claude Code sessions at the same time. One session researches. Another drafts a plan. Another executes a previous plan. Another fixes a bug found during testing. This is a shift from single thread productivity to orchestration. The human stops acting like the only worker in the loop and starts acting like an editor, reviewer, and scheduler.&lt;/p&gt;

&lt;p&gt;That shift brings risk. Parallel agent work can create confusion, duplicated effort, or unsafe changes if the project has weak boundaries. Better operational discipline keeps the system healthy. Keep plans small enough to review. Use version control aggressively. Give each session a clear lane. Keep tests close to the work. Treat permissions as a security decision with real consequences. Claude Code can execute command line work, so its freedom should match the sensitivity of the repository.&lt;/p&gt;

&lt;p&gt;The most transferable idea in Van Horn's article is research before planning. Before deciding how to build, he runs fresh research through his last30days tool, then feeds the results into the planning stage. That matters because modern developer choices age quickly. Libraries change. Community opinions shift. Product APIs move. A plan grounded in current evidence is much better than a plan built only from memory.&lt;/p&gt;

&lt;p&gt;This idea applies outside software engineering too. A marketer can turn customer calls into campaign plans. A founder can turn investor feedback into a product memo. A researcher can turn a rough sketch into an experiment plan. A student can turn messy notes into a study path. The common pattern is capture, research, plan, execute, review.&lt;/p&gt;

&lt;p&gt;This is also where adjacent AI tools fit naturally. When a workflow includes equations, screenshots, tables, or research figures, &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can help recover formulas from images so they can be reused instead of retyped. When Claude Code produces a diagram concept for a paper or technical note, &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can convert an AI generated academic figure into an editable vector figure format. For broader ideation, synthesis, and second pass critique, &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; can sit beside Claude Code as a reviewer or drafting partner.&lt;/p&gt;

&lt;p&gt;The deeper point is that Claude Code changes the shape of work around code. Planning becomes a file. Meetings become structured proposals. Research becomes input to implementation. Multiple sessions become a small production system. The human role becomes more deliberate, because the human has to decide what deserves automation, what needs review, and what should remain slow.&lt;/p&gt;

&lt;p&gt;The best version of this workflow feels less like asking a chatbot for help and more like running a compact studio. There is a research desk, a planning desk, an implementation desk, and a review desk. Claude Code can move through all of them, but the quality of the output still depends on taste, constraints, and judgment.&lt;/p&gt;

&lt;p&gt;Matt Van Horn's article opens the imagination because it shows Claude Code as a medium for operational design. The breakthrough comes from arranging the work so that ideas have a path from capture to plan to shipped result.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Microsoft 2026 Future of Work Report: The AI Dividend Is Uneven and the Early Mover Window Is Closing</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Fri, 29 May 2026 03:55:53 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/microsoft-2026-future-of-work-report-the-ai-dividend-is-uneven-and-the-early-mover-window-is-36fn</link>
      <guid>https://dev.to/jacob_is_surfing/microsoft-2026-future-of-work-report-the-ai-dividend-is-uneven-and-the-early-mover-window-is-36fn</guid>
      <description>&lt;p&gt;In 2026, the most important question about AI at work is no longer whether people can use it. They already do. The sharper question is whether organizations are able to turn scattered individual gains into durable institutional advantage. Microsofts 2026 Work Trend Index and New Future of Work research point to the same uncomfortable answer. AI is creating real value, but that value is flowing first to people and companies with the culture, incentives, data habits, and managerial muscle to absorb it.&lt;/p&gt;

&lt;p&gt;That is why the AI dividend feels so uneven. Some teams are using agents to draft, analyze, test, summarize, design, and coordinate work with a speed that changes what a normal week can contain. Other teams have access to similar tools, yet still measure success by old activity metrics, old approval chains, and old job boundaries. The tool is new, but the operating system of the company remains unchanged.&lt;/p&gt;

&lt;p&gt;Microsoft's report calls attention to a transformation paradox. Many AI users fear falling behind if they do not adapt quickly, yet a large share also feel safer focusing on current goals than redesigning work. The gap is not mainly a personal motivation problem. It is an organizational design problem. Workers can experiment, but if promotion, budget, compliance, and performance reviews reward the old workflow, the experiment stays local and fragile.&lt;/p&gt;

&lt;p&gt;The biggest signal in the 2026 research is that organizational factors matter more than individual enthusiasm. Culture, manager support, and talent practices explain far more reported AI impact than mindset alone. This changes the playbook for leaders. Buying seats is adoption. Capturing learning is absorption. The firms pulling ahead are building feedback loops where useful prompts, agent workflows, review standards, data assets, and process changes are shared across the company.&lt;/p&gt;

&lt;p&gt;The global picture is just as uneven. Microsoft's AI diffusion data shows usage rising around the world, with stronger adoption in some high income economies and faster growth in parts of Asia. It also shows a widening gap between the Global North and Global South. Language support, infrastructure, device access, and trusted local use cases are now part of the productivity map. AI advantage is becoming a question of distribution as much as capability.&lt;/p&gt;

&lt;p&gt;For knowledge workers, the practical lesson is direct. The early advantage window is closing because basic tool access is becoming common. A year ago, simply knowing how to use a model could separate a team from its peers. In 2026, the edge comes from workflow depth. Can a research team move from a rough idea to a reviewed figure, a reproducible analysis, and a publication ready explanation faster than before? Can a marketing team convert customer signals into campaign tests without losing judgment? Can a product team let agents handle execution while humans raise the quality of decisions?&lt;/p&gt;

&lt;p&gt;This is where the tool stack matters. A researcher might use &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; to explore competing hypotheses, &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; to convert mathematical screenshots into editable formulas, and &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; to turn AI generated paper figures into editable vector graphics. The point is not to collect shiny apps. The point is to remove friction from the whole path between thinking, evidence, expression, and review.&lt;/p&gt;

&lt;p&gt;The same logic applies to companies. If AI is treated as a private productivity trick, value leaks away. If it is treated as a learning system, every useful interaction can improve the next one. Teams need shared libraries of proven workflows, clear standards for human review, permission to redesign roles, and managers who model responsible AI use in the open. Without that, the people most willing to experiment carry the risk while the organization captures only a fraction of the upside.&lt;/p&gt;

&lt;p&gt;The uncomfortable conclusion is that the AI dividend will not be evenly handed out. It will be earned by organizations that change the conditions around work. Access to models is becoming table stakes. Advantage now belongs to firms that can learn faster than their own bureaucracy, and to workers who can turn AI from a shortcut into a disciplined way of expanding judgment.&lt;/p&gt;

&lt;p&gt;The window is not closed yet, but it is narrower than it was. The next phase of AI at work will reward less talk about transformation and more evidence that work itself has been rebuilt.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>How to Build a Self Improving Company With AI</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Thu, 28 May 2026 04:09:59 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/how-to-build-a-self-improving-company-with-ai-11fi</link>
      <guid>https://dev.to/jacob_is_surfing/how-to-build-a-self-improving-company-with-ai-11fi</guid>
      <description>&lt;p&gt;Tom Blomfield, a general partner at Y Combinator, recently framed one of the most important questions for founders in the AI era: what happens when a company can observe its own work, reason about what is happening, change its tools, and learn from the result?&lt;/p&gt;

&lt;p&gt;The answer is bigger than a faster team. For years, software companies treated AI as an assistant placed beside existing workflows. An engineer asks for code. A marketer asks for copy. A support agent asks for a draft reply. That is useful, yet it leaves the company shape unchanged. The org chart still routes information upward through meetings, managers, dashboards, and status reports. Decisions still move downward through plans, tickets, and follow ups.&lt;/p&gt;

&lt;p&gt;Blomfield argues that AI pushes founders toward a different company design. The company becomes a set of learning loops. Every loop has sensors, memory, rules, tools, checks, and feedback. Product analytics notice friction. Customer conversations reveal repeated confusion. Sales calls surface objections. Engineering logs expose failures. AI systems read those signals, propose changes, test them, pass them through quality gates, and write the result back into the company memory.&lt;/p&gt;

&lt;p&gt;The phrase that matters is legible to AI. A company cannot improve itself if its knowledge lives only in private context and hallway conversation. Emails, calls, meetings, tickets, code reviews, decisions, experiments, and customer notes need to become searchable artifacts. The asset is the living context of the company. Software can be rebuilt. Reports can be regenerated. Playbooks can be rewritten. The durable advantage is a company brain that understands what the team has learned and can apply it again.&lt;/p&gt;

&lt;p&gt;This has immediate consequences for founders. The old question was how many people are needed to operate a function. The new question is how much of this function can become a closed learning loop. A product loop can find where users drop off, generate a test, ship a variant, measure the result, and update the roadmap. A support loop can cluster complaints, detect missing documentation, draft fixes, and route sensitive cases to humans. A research loop can collect papers, extract equations and charts, summarize evidence, and turn messy ideas into reusable assets. In that kind of workflow, tools such as &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; help convert formula images into usable math, &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; turns AI generated paper figures into editable vector graphics, while &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; and &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; can help reason across notes, drafts, code, and decisions.&lt;/p&gt;

&lt;p&gt;The provocative management implication is that coordination changes shape. Middle management historically existed because information was expensive to transmit and interpret. AI lowers that cost. A founder can ask the company brain what customers are struggling with this week, which experiments failed, which support topics are rising, and which policies conflict with the product roadmap. A strong individual contributor can operate with far more leverage because the system captures context, remembers decisions, and suggests the next move.&lt;/p&gt;

&lt;p&gt;Human judgment becomes more precise. People remain essential at the edge of the system: unusual customer situations, ethical calls, creative taste, final accountability, and moments where trust matters more than speed. The best human role shifts from moving information around to designing loops, setting standards, inspecting outputs, and handling ambiguity.&lt;/p&gt;

&lt;p&gt;There is also a measurement trap. Burning tokens can become more important than adding headcount, because tokens are the fuel of machine intelligence. Yet token usage alone is a poor trophy. A company can spend a huge number of tokens and learn nothing. The useful metric is whether each loop produces a better artifact: clearer documentation, better conversion, fewer support repeats, faster research, safer deployments, or stronger customer understanding.&lt;/p&gt;

&lt;p&gt;The path starts small. Pick one recurring workflow with clear inputs and visible outcomes. Capture every artifact. Define the policy. Give the AI tools it can use. Add quality gates. Measure the result. Feed the learning back into the system. Then repeat. The first version will feel modest. The tenth version may feel like a department that keeps improving while the team sleeps.&lt;/p&gt;

&lt;p&gt;The message from Blomfield lands because it changes the ambition. An AI company becomes an organization designed to see itself, remember itself, and revise itself. Founders who understand that shift will build teams that are smaller, faster, and stranger in the best possible way: companies whose real product is their own capacity to learn.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Hermes Agent Leads Global Token Use. What Does That Actually Mean?</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Wed, 27 May 2026 02:32:48 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/hermes-agent-leads-global-token-use-what-does-that-actually-mean-4n0n</link>
      <guid>https://dev.to/jacob_is_surfing/hermes-agent-leads-global-token-use-what-does-that-actually-mean-4n0n</guid>
      <description>&lt;p&gt;On May 26, 2026, OpenRouter displayed Hermes Agent at the top of its global app and agent ranking. The page reported 9.9 trillion tracked tokens for Hermes Agent and 6.25 trillion for OpenClaw in the popular apps view. Its daily global view showed 629 billion tokens for Hermes Agent and 154 billion for OpenClaw. This is an extraordinary surge for a young open source agent. It is also an invitation to ask a harder question. What does a token lead measure?&lt;/p&gt;

&lt;p&gt;The first answer is precise and limited. OpenRouter measures routed token activity from public applications and agents that choose to participate in usage tracking. The ranking reveals where a great deal of model computation is flowing on that platform. It does not measure every private deployment, local model run, direct provider request, task outcome, or user satisfaction score. Hermes has captured enormous visible activity. Capability evaluation still needs evidence about results, cost, latency, reliability, and risk.&lt;/p&gt;

&lt;p&gt;The activity itself makes sense when one examines the product. Nous Research presents Hermes Agent as a persistent, self improving agent with cross session memory, reusable skills formed from experience, more than 40 built in tools, scheduled automations, and subagents. A system designed to remain available, recall context, inspect environments, use tools, and revise its own routines has many occasions to call a model. OpenClaw also acts across messaging apps and real user actions. Both sit in the class of systems where one request can unfold into planning, browsing, tool calls, checking, memory updates, and follow up work.&lt;/p&gt;

&lt;p&gt;This shift changes the meaning of demand. During the chatbot era, a large token count often meant many people typed many questions. During the agent era, a large count may mean fewer users delegated longer processes. One research request can open sources, compare claims, extract data, draft a report, test outputs, and preserve lessons for another session. The useful unit for buyers and builders becomes successful work completed per dollar, per minute, and per permitted risk, rather than raw tokens processed.&lt;/p&gt;

&lt;p&gt;Research is already warning against simple conclusions. A recent study on token consumption in agentic coding tasks reports that agent work can consume vastly more tokens than coding chat, with input context driving much of the cost. The study also finds large variation between runs of the same task and reports that more token consumption does not reliably produce higher accuracy. These results fit a practical observation. An agent may spend tokens because it is exploring a difficult route, repeatedly recovering from mistakes, carrying unnecessary context, or completing valuable multi step work. A total count alone cannot distinguish among those cases.&lt;/p&gt;

&lt;p&gt;Hermes reaching first place therefore matters in three ways. It signals real appetite for persistent agents that remember and learn routines. It places cost control at the center of product design, since memory, skills, and unattended loops can multiply context rapidly. It also raises the bar for measurement. A credible dashboard should pair token totals with completion rates, human correction rates, cache use, tool failure rates, elapsed time, model mix, permissions exercised, and cost per accepted deliverable.&lt;/p&gt;
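
&lt;p&gt;The dashboard idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical &lt;code&gt;AgentRun&lt;/code&gt; record and made up sample numbers. None of the field names come from OpenRouter or Hermes Agent. It shows how a token total might sit beside completion rate, human correction rate, and cost per accepted deliverable.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent task attempt. Every field here is a hypothetical illustration."""
    tokens: int             # total tokens consumed by the run
    cost_usd: float         # billed cost of the run
    completed: bool         # the agent reported the task as finished
    accepted: bool          # a human accepted the deliverable
    human_corrections: int  # edits a reviewer had to make


def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Pair the raw token total with outcome metrics."""
    if not runs:
        raise ValueError("need at least one run")
    total_cost = sum(r.cost_usd for r in runs)
    completed = sum(1 for r in runs if r.completed)
    accepted = sum(1 for r in runs if r.accepted)
    corrected = sum(1 for r in runs if r.human_corrections > 0)
    return {
        "total_tokens": float(sum(r.tokens for r in runs)),
        "completion_rate": completed / len(runs),
        "human_correction_rate": corrected / len(runs),
        # Cost is divided by accepted deliverables only, so failed runs
        # raise the price of each success instead of disappearing.
        "cost_per_accepted": total_cost / accepted if accepted else float("inf"),
    }


# Made up sample data, not measurements from any real agent.
runs = [
    AgentRun(tokens=120_000, cost_usd=0.90, completed=True, accepted=True, human_corrections=0),
    AgentRun(tokens=480_000, cost_usd=3.10, completed=True, accepted=False, human_corrections=4),
    AgentRun(tokens=95_000, cost_usd=0.70, completed=False, accepted=False, human_corrections=0),
    AgentRun(tokens=210_000, cost_usd=1.30, completed=True, accepted=True, human_corrections=1),
]
report = summarize(runs)
```

&lt;p&gt;In this sample the largest run by tokens is also the one a reviewer rejected, which is the kind of case a bare token total would hide.&lt;/p&gt;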

&lt;p&gt;For individual users, the right question is concrete. Did the agent finish a meaningful task with an acceptable bill and a reviewable trail? For teams, governance becomes equally concrete. Put budgets on unattended jobs. Record tool calls. Require approval for sensitive actions. Evaluate agents on representative tasks before broad deployment. Use smaller or cheaper models for routine stages when quality remains sufficient. Compress history and retrieve only relevant memory when the work allows it.&lt;/p&gt;
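
&lt;p&gt;Three of those governance rules, a budget, a tool call record, and an approval gate, fit in a small wrapper. The sketch below is an assumption for illustration only. &lt;code&gt;GuardedJob&lt;/code&gt;, its exceptions, and the tool names are hypothetical and do not belong to any real agent framework.&lt;/p&gt;

```python
class BudgetExceeded(RuntimeError):
    """Raised when an unattended job would spend past its token budget."""


class ApprovalRequired(RuntimeError):
    """Raised when a sensitive tool is called without human approval."""


class GuardedJob:
    """Wraps tool calls with a token budget, an audit log, and an approval gate."""

    def __init__(self, token_budget: int, sensitive_tools: set[str]) -> None:
        self.remaining = token_budget
        self.sensitive_tools = sensitive_tools
        self.log: list[tuple[str, int]] = []  # (tool name, tokens spent)

    def call_tool(self, name: str, tokens: int, approved: bool = False) -> None:
        # Sensitive actions stop here unless a human has signed off.
        if name in self.sensitive_tools and not approved:
            raise ApprovalRequired(name)
        # Refuse the call rather than overspend the budget.
        if tokens > self.remaining:
            raise BudgetExceeded(f"{name} needs {tokens}, {self.remaining} left")
        self.remaining -= tokens
        self.log.append((name, tokens))  # every allowed call is recorded


# Hypothetical tool names and token figures.
job = GuardedJob(token_budget=10_000, sensitive_tools={"send_email"})
job.call_tool("search_docs", tokens=4_000)
job.call_tool("send_email", tokens=500, approved=True)
```

&lt;p&gt;A real deployment would estimate tokens before the call and reconcile afterward. The sketch only shows where the checks sit.&lt;/p&gt;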

&lt;p&gt;The same logic applies to content and research workflows. A team might use &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; or &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; to organize a literature scan and shape a first draft. When a paper contains an equation only as an image, &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can turn that visual equation into editable mathematical content. When an AI generated academic figure needs precise publication edits, &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can convert it into an editable vector format. Tokens become worthwhile when the workflow reaches usable, checkable artifacts and avoids repeated manual reconstruction.&lt;/p&gt;

&lt;p&gt;There is also a strategic message for agent makers. More autonomy expands the surface area of every design decision. Loading every skill into every turn can waste context. Repeating failed tool loops can burn budgets. Persistent memory can improve continuity while also increasing retrieval and privacy obligations. Token efficiency is part of product quality because it influences affordability, speed, environmental load, and trust.&lt;/p&gt;

&lt;p&gt;Hermes Agent leading OpenRouter is a meaningful market signal. Developers appear eager to run agents that do sustained work and learn reusable procedures. The leaderboard provides evidence of attention and computation moving toward that model. The next contest is more demanding. Agents must show that their trillions of tokens turn into completed work, controlled costs, accountable actions, and outputs people can confidently use.&lt;/p&gt;

&lt;h2&gt;Sources&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openrouter.ai/apps" rel="noopener noreferrer"&gt;OpenRouter App and Agent Rankings&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Nous Research Hermes Agent Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2604.22750" rel="noopener noreferrer"&gt;Study on Token Consumption in Agentic Coding Tasks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2605.10912" rel="noopener noreferrer"&gt;WildClawBench Evaluation of Long Horizon Agents&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Why Skills Matter When A Prompt Already Exists</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Tue, 26 May 2026 02:57:21 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/why-skills-matter-when-a-prompt-already-exists-18e6</link>
      <guid>https://dev.to/jacob_is_surfing/why-skills-matter-when-a-prompt-already-exists-18e6</guid>
      <description>&lt;p&gt;The skeptical question is reasonable. A skill often begins as a Markdown file full of instructions. A prompt is also text full of instructions. If both eventually enter a model context, why has the industry suddenly become so excited about skills?&lt;/p&gt;

&lt;p&gt;The short answer is that a prompt captures a request, while a skill captures a repeatable way of completing work. The difference appears once a task has files, tools, checks, team standards, and a need to work reliably again next week.&lt;/p&gt;

&lt;p&gt;The Agent Skills specification makes this concrete. A skill is a directory with a required &lt;code&gt;SKILL.md&lt;/code&gt; file. That file contains a name, a description, and operating instructions. The directory may also contain scripts, reference documents, and assets such as templates. This design turns a useful instruction into a portable work package. A team can read it, version it, revise it, and run it in compatible agent environments.&lt;/p&gt;

&lt;p&gt;Its most important design choice is progressive disclosure. At the start, an agent sees only a skill name and description. When the task matches, it loads the detailed instructions. It reaches for references, scripts, or assets only when the workflow requires them. The specification recommends a compact main instruction file and focused supporting resources. This gives an agent access to rich expertise while protecting attention and context capacity during unrelated work.&lt;/p&gt;

&lt;p&gt;A long reusable prompt has a different cost profile. Paste a complete style guide, spreadsheet policy, chart checklist, and publication template into every conversation, and the model must process all of them even for a small question. Keep those materials inside a skill, and the agent can discover the relevant capability before loading only the part needed for the current job. Good skills therefore improve context management as much as instruction quality.&lt;/p&gt;
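
&lt;p&gt;The two stage loading described above can be sketched as follows. This is a rough illustration, not how any particular agent implements it. The sample skill, the helper functions, and the keyword match are all hypothetical, and the metadata block at the top of &lt;code&gt;SKILL.md&lt;/code&gt; is an assumed layout parsed with plain string handling.&lt;/p&gt;

```python
import tempfile
from pathlib import Path

# A hypothetical skill file: short metadata first, detailed steps after.
SKILL_TEXT = """---
name: chart-checklist
description: Review charts against the team publication checklist.
---
1. Confirm every axis has a label and unit.
2. Check colors against the approved palette.
"""


def read_metadata(skill_dir: Path) -> dict[str, str]:
    """Read only the name and description from the top of SKILL.md."""
    meta: dict[str, str] = {}
    lines = (skill_dir / "SKILL.md").read_text().splitlines()
    for line in lines[1:]:         # skip the opening '---'
        if line.strip() == "---":  # end of the metadata block
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def load_instructions(skill_dir: Path) -> str:
    """Load the full body, called only after a task matches the skill."""
    text = (skill_dir / "SKILL.md").read_text()
    return text.split("---", 2)[2].strip()


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "chart-checklist"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(SKILL_TEXT)

    # Stage 1: the agent context holds only short metadata for each skill.
    catalog = {skill_dir: read_metadata(skill_dir)}

    # Stage 2: a naive keyword match stands in for the model's own decision.
    task = "please review this chart before publication"
    loaded = [
        load_instructions(path)
        for path, meta in catalog.items()
        if "chart" in task and "chart" in meta["description"].lower()
    ]
```

&lt;p&gt;The catalog costs a few dozen words per skill, while the numbered steps enter the context only for the task that needs them.&lt;/p&gt;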

&lt;p&gt;OpenAI Academy described skills for &lt;a href="https://chatgpt.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; in April 2026 as reusable workflows for recurring tasks. The practical point matters more than the label. A successful workflow records the required inputs, ordered steps, output format, supporting resources, and final checks. When such a workflow is shared, quality depends less on whether one person remembers the perfect prompt at the perfect moment.&lt;/p&gt;

&lt;p&gt;Consider preparing technical content for a research team. A prompt could ask an assistant to write a methods summary. A skill can require the assistant to collect the source files, apply an approved outline, preserve citation fields, flag missing evidence, and perform a final terminology check. If an equation arrives as a screenshot, the workflow can route it through &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; to produce editable mathematical content before writing begins. If an AI generated scientific illustration needs publication edits, it can route the image through &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; to obtain an editable vector format for precise labels, colors, and layout. The skill carries the procedure and its decision points, so the user does not need to reconstruct them in each conversation.&lt;/p&gt;

&lt;p&gt;This also explains why skills attract platform builders. They provide a clean layer between a general model and a particular organization. A laboratory can package figure preparation rules. A marketing group can package brand review. A finance team can package reporting steps and validation scripts. The underlying model can improve over time while the organization keeps its playbooks inspectable and under version control.&lt;/p&gt;

&lt;p&gt;There is no magic in the file extension. A poor skill with vague instructions can fail just as a poor prompt fails. A skill that triggers too broadly wastes context. A skill that hides unsafe scripts introduces real risk. Microsoft documentation for Agent Skills warns that skill instructions and executable scripts deserve the review and governance applied to third party code. Teams should inspect sources, restrict permissions, test on representative tasks, and keep approval boundaries visible.&lt;/p&gt;

&lt;p&gt;The most useful test is simple. A one time question deserves a well written prompt. A recurring process with source material, tools, quality gates, and multiple users is a strong candidate for a skill. Skills become valuable when they reduce repeated explanation, preserve hard won decisions, and let experts improve a workflow once for everyone who uses it later.&lt;/p&gt;

&lt;p&gt;That is the real reason for the attention. Models can already follow instructions. Skills give those instructions a home, a loading strategy, supporting materials, execution paths, and a reviewable life cycle. In daily work, reliability rarely comes from a dazzling sentence typed at the last minute. It grows from a process that can be found, followed, checked, and refined.&lt;/p&gt;

&lt;h2&gt;Sources&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agentskills.io/specification" rel="noopener noreferrer"&gt;Agent Skills Specification&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openai.com/academy/skills/" rel="noopener noreferrer"&gt;OpenAI Academy: Using skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/agent-framework/agents/skills" rel="noopener noreferrer"&gt;Microsoft Learn: Agent Skills&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Gemini New Usage Limits Turn Prompts Into A Budget Question</title>
      <dc:creator>Captain Jack Smith</dc:creator>
      <pubDate>Mon, 25 May 2026 02:37:24 +0000</pubDate>
      <link>https://dev.to/jacob_is_surfing/gemini-new-usage-limits-turn-prompts-into-a-budget-question-4l8l</link>
      <guid>https://dev.to/jacob_is_surfing/gemini-new-usage-limits-turn-prompts-into-a-budget-question-4l8l</guid>
      <description>&lt;p&gt;On May 17, 2026, a routine session in Gemini began to feel different for many users. A long analysis, a media creation task, or an extended conversation could consume visible capacity quickly. A familiar daily allowance had given way to a meter shaped by computing effort. The change raises a fair question. Is Google following the wider AI industry in making firm numbers harder to see?&lt;/p&gt;

&lt;p&gt;Subscription notices and Google support guidance reported on May 19, 2026 provide the clearest description. The Gemini app is moving from daily prompt limits to a model based on compute used. The complexity of a prompt, the feature selected, and the length of a chat all affect usage. Capacity refreshes every five hours until a weekly limit is reached. Google also describes tiers through relative capacity, including AI Pro and AI Ultra plans with higher usage than their reference tier.&lt;/p&gt;

&lt;p&gt;This approach has a rational technical foundation. One short request for a summary requires much less infrastructure than a large coding session, video generation, deep research, or reasoning over extensive attachments. A fixed message count treats radically different workloads as equal. A compute based system can allocate scarce accelerators with greater precision, especially as the product adds richer models and agent features.&lt;/p&gt;

&lt;p&gt;The problem is legibility. Under a daily prompt number, a subscriber can plan a workday in advance. Under a dynamic meter, the user discovers the price of a task during the task. A researcher preparing a report may begin with ample capacity, add files and a complex request, then see a large portion of the allowance disappear. A creator may hesitate before experimenting because each iteration has an uncertain cost. Predictability is part of product value, especially for paying professionals whose deadlines cannot wait for a five hour refresh.&lt;/p&gt;

&lt;p&gt;Google has increased visibility in one sense by exposing usage status and reset timing in the product. At the same time, the new vocabulary centers on multipliers and compute use rather than a stable list of prompt counts for ordinary Gemini conversations. These two choices create an unusual tradeoff. Users can see consumption after actions occur, while estimating consumption before a demanding action remains difficult. That is the origin of the frustration visible across subscriber discussions.&lt;/p&gt;

&lt;p&gt;Calling this simple concealment would miss an important part of the story. Variable pricing logic matches the real cost profile of multimodal AI. Yet transparency needs more than a meter. A strong implementation would show an estimated cost range before expensive actions, explain which features draw heavily on the current window, publish representative workload examples, and give subscribers a clear view of both their five hour capacity and weekly balance. Such information would help people choose models and workflows with confidence.&lt;/p&gt;

&lt;p&gt;The change also teaches users to separate casual prompting from production work. For quick brainstorming and general questions, &lt;a href="https://gemini.google.com/app" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; can remain a smooth starting point. For document heavy scientific workflows, specialized tools can preserve capacity and shorten revision cycles. When a formula is trapped in a screenshot or scanned page, &lt;a href="https://imgtoformula.com/" rel="noopener noreferrer"&gt;Miss Formula&lt;/a&gt; can convert it into editable mathematical content before it enters a larger writing process. When an AI generated paper figure needs refinement for a journal submission, &lt;a href="https://editablefigure.com/" rel="noopener noreferrer"&gt;Editable Figure&lt;/a&gt; can transform it into an editable vector format so labels, colors, and layout can be revised directly.&lt;/p&gt;

&lt;p&gt;This division of labor has a practical advantage. Sending cleaner inputs to a general assistant can reduce repeated clarification, reduce oversized context, and make every high value interaction more purposeful. It also keeps crucial research artifacts editable outside a chat session. A usage meter feels less threatening when the workflow does not depend on repeatedly asking one model to repair every intermediate asset.&lt;/p&gt;

&lt;p&gt;For Google, the stakes extend beyond server efficiency. An AI subscription is a promise about access during moments when intelligence is needed. If the consumer must treat an important prompt as an uncertain expense, trust becomes part of the quota debate. The company can earn that trust by publishing clearer guidance, improving cost previews, and making tier comparisons concrete enough for real projects.&lt;/p&gt;

&lt;p&gt;The May 17 shift is therefore a meaningful test for Gemini. Compute based limits may be the sensible infrastructure policy for increasingly powerful systems. Users also deserve planning tools that match the seriousness of their work. The winning AI service will offer capable models, visible limits, and enough foresight for a user to start an ambitious task without wondering whether the meter will end the session halfway through.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
  </channel>
</rss>
