<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tessl</title>
    <description>The latest articles on DEV Community by Tessl (@tessl).</description>
    <link>https://dev.to/tessl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12956%2Fa0174916-e61b-4172-b5d6-29c9445932f5.png</url>
      <title>DEV Community: Tessl</title>
      <link>https://dev.to/tessl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tessl"/>
    <language>en</language>
    <item>
      <title>AI Native DevCon’26: The London conference for developers building with AI</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Thu, 21 May 2026 06:06:32 +0000</pubDate>
      <link>https://dev.to/tessl/ai-native-devcon26-the-london-conference-for-developers-building-with-ai-4nm9</link>
      <guid>https://dev.to/tessl/ai-native-devcon26-the-london-conference-for-developers-building-with-ai-4nm9</guid>
      <description>&lt;p&gt;The bottleneck moved from writing code to governing it.&lt;/p&gt;

&lt;p&gt;The promise was 2× throughput. The reality is 2× the review queue, 2× the security exposure, and a CI signal you can no longer trust. &lt;a href="https://tessl.io/devcon" rel="noopener noreferrer"&gt;AI Native DevCon&lt;/a&gt; 2026 is for the engineering leaders who have to figure out how to ship anyway.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tessl.io/devcon" rel="noopener noreferrer"&gt;AI Native DevCon&lt;/a&gt; 2026 lands at The Brewery in London on June 1 and 2, with a hybrid track for remote attendees. This is the conference for VPs of engineering, CTOs, platform owners, security leads, and senior engineers running agents in production, or about to. 500+ builders. Four tracks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp2a7mihm3ub05mz53mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp2a7mihm3ub05mz53mx.png" alt="sponsors" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/guypo/" rel="noopener noreferrer"&gt;Guy Podjarny&lt;/a&gt;, founder of &lt;a href="https://tessl.io" rel="noopener noreferrer"&gt;Tessl&lt;/a&gt;, organizer of &lt;a href="https://tessl.io/devcon" rel="noopener noreferrer"&gt;AI Native DevCon&lt;/a&gt;, and previously of &lt;a href="https://snyk.io/" rel="noopener noreferrer"&gt;Snyk&lt;/a&gt;, frames the 2026 question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If 2025 was the year coding agents started showing real promise, 2026 is the year we figure out how they hold up in production. The challenge is no longer getting an agent to work, it is getting it to work consistently across teams, codebases, and environments without constant human correction.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The schedule is organized around four tracks: &lt;strong&gt;context engineering&lt;/strong&gt; (building with agents), &lt;strong&gt;agent orchestration&lt;/strong&gt; (verification when CI is no longer enough), &lt;strong&gt;organizational enablement&lt;/strong&gt; (coordination at agent throughput), and &lt;strong&gt;agent enablement&lt;/strong&gt; (security and governance). Each maps to a problem most teams are already hitting.&lt;/p&gt;

&lt;p&gt;The agenda is built around the problems engineering teams actually have right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. “We do not know how to build with agents yet.”
&lt;/h2&gt;

&lt;p&gt;How the engineer’s role is changing, and what products designed for humans need to do once agents start using them. By 2026, that question lands on every platform team. This is the context engineering track.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/ryanlopopolo/" rel="noopener noreferrer"&gt;&lt;strong&gt;Ryan Lopopolo&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(OpenAI)&lt;/strong&gt;, &lt;em&gt;Harness Engineering&lt;/em&gt;. Concrete patterns for systems where humans set direction and agents execute, including the review and approval surfaces that keep it safe at scale.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/dglawson" rel="noopener noreferrer"&gt;&lt;strong&gt;Dana Lawson&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Netlify, CTO)&lt;/strong&gt;, &lt;em&gt;Built for Humans. Now Agents Are Here.&lt;/em&gt; What changes in a developer platform when half the users are non-human, and the API and UX decisions Netlify made in response.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/anatomic/" rel="noopener noreferrer"&gt;&lt;strong&gt;Ian Thomas&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Meta)&lt;/strong&gt;, &lt;em&gt;AI Native Engineering&lt;/em&gt;. How a large engineering org is restructuring workflows around agent-assisted development.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://uk.linkedin.com/in/steve-ruiz-61a150239" rel="noopener noreferrer"&gt;&lt;strong&gt;Steve Ruiz&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(tldraw)&lt;/strong&gt;, &lt;em&gt;Agents on the canvas&lt;/em&gt;. Interaction patterns for visual agents, with shipping examples you can copy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. “We can generate code. We cannot verify it.”
&lt;/h2&gt;

&lt;p&gt;CI is no longer evidence of correctness. Two years of agent-generated code has proved it. The agent orchestration track is about what to put in its place.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/justincormack/" rel="noopener noreferrer"&gt;&lt;strong&gt;Justin Cormack&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(ex-Docker CTO)&lt;/strong&gt;, &lt;em&gt;When Tests Lie&lt;/em&gt;. Runtime signals that flag agent-introduced drift before it reaches users.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/dave-farley-a67927/" rel="noopener noreferrer"&gt;&lt;strong&gt;Dave Farley&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Founder &amp;amp; CEO of Continuous Delivery Ltd., 250k on YouTube)&lt;/strong&gt;, &lt;em&gt;Vibe Coding, really?&lt;/em&gt; The ideas that may actually survive the AI programming revolution, beyond hype, demos, and generated boilerplate.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/fowlerchad/" rel="noopener noreferrer"&gt;&lt;strong&gt;Chad Fowler&lt;/strong&gt;&lt;/a&gt;, &lt;em&gt;Regenerative Software&lt;/em&gt;. An architectural model where components are regenerated rather than patched, and what verification looks like when code is short-lived by design.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. “AI writes code faster than teams can coordinate.”
&lt;/h2&gt;

&lt;p&gt;Two years into coding-agent adoption, throughput is up roughly 2×. Coordination cost scaled with it. The organizational enablement track covers review, ownership, and team structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ng9y6sw60tvf7ann6jw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ng9y6sw60tvf7ann6jw.png" alt="guypo" width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/guypo/" rel="noopener noreferrer"&gt;&lt;strong&gt;Guy Podjarny&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Tessl)&lt;/strong&gt;, &lt;em&gt;Skills are the new Code&lt;/em&gt; (keynote). The case for &lt;a href="https://tessl.io/registry" rel="noopener noreferrer"&gt;treating skills as proper software&lt;/a&gt;: versioned, tested, owned, reviewed. With the Tessl Registry now holding 2,000+ evaluated skills, the talk covers what that means for repo structure and review process.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/birgittaboeckeler/" rel="noopener noreferrer"&gt;&lt;strong&gt;Birgitta Böckeler&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Thoughtworks)&lt;/strong&gt;, &lt;em&gt;State of Play: AI Coding Assistants&lt;/em&gt; (keynote). Two years of field data on which adoption patterns work and which create future technical debt.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/patrickdebois/" rel="noopener noreferrer"&gt;&lt;strong&gt;Patrick Debois&lt;/strong&gt;&lt;/a&gt;, &lt;em&gt;The Rise of Agent Enablement&lt;/em&gt;. &lt;a href="https://tessl.io/agent-enablement" rel="noopener noreferrer"&gt;Agent Enablement&lt;/a&gt; is the function that owns reliable agent adoption inside an engineering org. It defines standards for skills, evals, and workflows, and sits next to DevOps and Platform Engineering. Patrick’s session covers who owns it, what they do, and how teams formalize it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. “Your AI is a new attack surface.”
&lt;/h2&gt;

&lt;p&gt;Vulnerability classes that did not exist 18 months ago, and the controls most teams have not put in place yet. This is the agent enablement track from a security and governance angle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/talliran/" rel="noopener noreferrer"&gt;&lt;strong&gt;Liran Tal&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Snyk)&lt;/strong&gt;, &lt;em&gt;Your AI Agent Installed Malware Because a SKILL.md Told It To&lt;/em&gt;. Live demo of prompt-injection via SKILL.md manifests, with the threat model and mitigations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://linkedin.com/in/jkcso" rel="noopener noreferrer"&gt;&lt;strong&gt;Joseph Katsioloudes&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(GitHub)&lt;/strong&gt;, &lt;em&gt;Code Security Reinvented&lt;/em&gt;. How SAST, secret scanning, and review need to change for AI-generated code.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/jack-wotherspoon/" rel="noopener noreferrer"&gt;&lt;strong&gt;Jack Wotherspoon&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(Google)&lt;/strong&gt;, &lt;em&gt;Humans vs. Slop&lt;/em&gt;. New rules for open source maintainers when an unknown share of contributors are agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why engineering leaders should attend
&lt;/h2&gt;

&lt;p&gt;Five things your team brings back to Monday:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A verification model that does not assume CI catches the regression&lt;/li&gt;
&lt;li&gt;Threat models for prompt-injection and SKILL.md attacks, with mitigations&lt;/li&gt;
&lt;li&gt;Team structures and review workflows that scale with agent throughput&lt;/li&gt;
&lt;li&gt;A working definition of Agent Enablement as a discipline, including ownership and scope&lt;/li&gt;
&lt;li&gt;A model for evaluating skills before they go org-wide, with review patterns and KPIs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtxq925k99a5393mqw2d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtxq925k99a5393mqw2d.png" alt="crowd" width="799" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosts and the wider lineup
&lt;/h2&gt;

&lt;p&gt;Hosted by &lt;a href="https://www.linkedin.com/in/sammyhepburn/" rel="noopener noreferrer"&gt;&lt;strong&gt;Sam Hepburn&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/patrickdebois/" rel="noopener noreferrer"&gt;&lt;strong&gt;Patrick Debois&lt;/strong&gt;&lt;/a&gt;. Day-one keynote from &lt;a href="https://x.com/lievenscheire" rel="noopener noreferrer"&gt;&lt;strong&gt;Lieven Scheire&lt;/strong&gt;&lt;/a&gt; on AI from outside the engineering bubble. The wider roster covers agent observability, MCP transports, runtime intelligence, brownfield adoption, and team-level adoption metrics, with practitioners from Anthropic, OpenAI, NVIDIA, Adobe, Hugging Face, Mozilla.ai, Cisco, Nearform, GitHub, and many more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdvcmrgty5seazs7mhf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdvcmrgty5seazs7mhf6.png" alt="speakers" width="800" height="857"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Full speaker list and abstracts: &lt;a href="https://tessl.io/devcon" rel="noopener noreferrer"&gt;tessl.io/devcon&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dates:&lt;/strong&gt; June 1 and 2, 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format:&lt;/strong&gt; 2 days in-person or 1 day virtual&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Venue:&lt;/strong&gt; The Brewery, Barbican, London.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register:&lt;/strong&gt; &lt;a href="https://luma.com/aidevcon-ldn26?coupon=R30" rel="noopener noreferrer"&gt;https://luma.com/aidevcon-ldn26?coupon=R30&lt;/a&gt; (&lt;code&gt;R30&lt;/code&gt; auto-applies at checkout to knock 30% off)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bringing a team?&lt;/strong&gt; Contact us at &lt;a href="https://tessl.io/get-in-touch/" rel="noopener noreferrer"&gt;tessl.io/get-in-touch&lt;/a&gt;, and we can arrange a group purchase discount.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See you at the event!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>career</category>
      <category>eventsinyourcity</category>
    </item>
    <item>
      <title>Stop trusting your agent skills with vibes. Eliminate the context security risk.</title>
      <dc:creator>Tessl</dc:creator>
      <pubDate>Fri, 15 May 2026 04:55:29 +0000</pubDate>
      <link>https://dev.to/tessl/stop-trusting-your-agent-skills-with-vibes-eliminate-the-context-security-risk-1jld</link>
      <guid>https://dev.to/tessl/stop-trusting-your-agent-skills-with-vibes-eliminate-the-context-security-risk-1jld</guid>
      <description>&lt;p&gt;When you install an npm package, you can run &lt;code&gt;npm audit&lt;/code&gt;. When you install a Python package, there's &lt;code&gt;pip-audit&lt;/code&gt;. But when you install plugins that give your AI agent new skills and rules, you know, things that directly shape how it reasons and what it does, what do you run?&lt;/p&gt;

&lt;p&gt;If your answer is "nothing", you're not alone, and that's why I built &lt;code&gt;tessl-audit&lt;/code&gt;! You can check it out on &lt;a href="https://github.com/AI-Native-Dev-Community/tessl-audit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://www.npmjs.com/package/tessl-audit" rel="noopener noreferrer"&gt;npm&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than you think
&lt;/h2&gt;

&lt;p&gt;Agent plugins are &lt;em&gt;instructions&lt;/em&gt; that get loaded into your AI agent's context. A plugin with a security issue doesn't just expose a server endpoint. It can influence the agent's behaviour in ways that are subtle and hard to detect, perhaps nudging it toward unsafe patterns, exposing data it shouldn't, or simply making it worse at its job.&lt;/p&gt;

&lt;p&gt;Ask yourself these three questions about your agent skills, and if the answer to any of them is no, you’re seconds away from being able to say yes, with tessl-audit.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Have all your skills been security scanned?&lt;/strong&gt; If so, what was the result?&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Can you prove your skills are any good?&lt;/strong&gt; Quality scores tell you how well-written and complete a plugin is. A low score means the agent is getting poor guidance.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Do your skills and plugins actually help?&lt;/strong&gt; Uplift scores measure whether a plugin improves agent task performance compared to a vanilla agent alone.&lt;/li&gt;
&lt;/ol&gt;
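
&lt;p&gt;To make the third question concrete, here is a minimal sketch of what an uplift number could look like. It assumes uplift is the percentage-point change in task pass rate between an agent with the plugin and the same agent without it, measured on the same task set. That formula and the function name &lt;code&gt;uplift_points&lt;/code&gt; are illustrative assumptions, not Tessl's published definition.&lt;/p&gt;

```python
def uplift_points(passed_with_plugin, passed_baseline, total_tasks):
    """Percentage-point change in pass rate when the plugin is installed.

    Assumed definition for illustration only; both runs use the same tasks.
    """
    if total_tasks == 0:
        raise ValueError("need at least one task to compare")
    return 100 * (passed_with_plugin - passed_baseline) / total_tasks


print(uplift_points(8, 5, 10))  # 30.0: the plugin helped on this task set
print(uplift_points(4, 5, 10))  # -10.0: the agent did better without it
```

&lt;p&gt;A negative value is the interesting case: under this assumed definition, it means the plugin is making the agent worse at the tasks it was evaluated on.&lt;/p&gt;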

&lt;p&gt;&lt;a href="https://tessl.io/devcon" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbraf14e4s66n7ibzuwuk.png" alt="Join us at AI Native DevCon" width="800" height="267"&gt;&lt;/a&gt;&lt;br&gt;Join us at AI Native DevCon (use C0DE30 for a 30% discount)
&lt;/p&gt;
&lt;h2&gt;
  
  
  Why not try it right now?
&lt;/h2&gt;

&lt;p&gt;It’s a free open source tool that uses Tessl under the covers. If you have a Tessl project with plugins installed, just run this in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npx tessl-audit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait, is that it? Absolutely, that's it. It reads your &lt;code&gt;tessl.json&lt;/code&gt;, fetches live data from the registry for every plugin, and prints a report in about 30 seconds.&lt;/p&gt;

&lt;p&gt;The script begins by looking through all the context files listed in the &lt;code&gt;tessl.json&lt;/code&gt; manifest. This should complete quickly, and you’ll soon see the table below, with a breakdown of your project context and the types of warnings that have been picked up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0rrz9ig4r2nebvw87p3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0rrz9ig4r2nebvw87p3.png" alt="image1" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, the tool prints a posture summary of all your context, with more detail on the riskiest skills in your project and what their issues are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9xwxk46mxgxqvjtqios.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9xwxk46mxgxqvjtqios.png" alt="img2" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can click through on any of these links to see the actual issues in the registry web UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsib0z1ar0osa3lfxvrau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsib0z1ar0osa3lfxvrau.png" alt="img3" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the tool lists next steps: the CLI commands to use (you can also have an agent call them) to optimize your skills and to create and run evals on them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwtr6gssymroeyl5g4cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwtr6gssymroeyl5g4cf.png" alt="img4" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The "so what" for each finding
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Advisory, Risky, or Critical security status?
&lt;/h3&gt;

&lt;p&gt;The report prints each flagged plugin with its warning codes and a direct link to the full security report on the registry. There is no need to chase them down: the security posture report shows the full summary in one listing, so you can deep dive where needed. Just open the link, read the finding, and decide if it applies to your use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quality below 80%?
&lt;/h3&gt;

&lt;p&gt;The plugin you’re using is giving your agent incomplete or poorly-structured guidance. Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;tessl skill review --optimize workspace/plugin-name
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs a quality review and applies automatic improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  No uplift data?
&lt;/h3&gt;

&lt;p&gt;The plugin has never been evaluated against real tasks, so you have no idea if it's helping or hurting. Fix that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;tessl scenario generate --count 5 workspace/plugin-name
tessl eval run workspace/plugin-name
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generate a set of test scenarios from the plugin, then run the eval. You'll get a concrete uplift score showing whether the plugin is worth keeping.&lt;/p&gt;
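
&lt;p&gt;The three findings above map to follow-up actions roughly as in the sketch below. The &lt;code&gt;PluginFinding&lt;/code&gt; shape is a hypothetical summary invented for illustration, not the actual output format of &lt;code&gt;tessl-audit&lt;/code&gt;; only the 80% threshold and the CLI commands come from this article.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_THRESHOLD = 80  # percent; the cut-off this article uses for a review


@dataclass
class PluginFinding:
    """Hypothetical per-plugin summary, not tessl-audit's real output."""
    name: str                # e.g. "workspace/plugin-name"
    security_status: str     # "Advisory", "Risky", "Critical", or anything else when clean (assumed)
    quality: int             # quality score in percent
    uplift: Optional[float]  # None when the plugin has never been evaluated


def next_steps(finding):
    """Return the follow-up actions the article suggests for one plugin."""
    steps = []
    if finding.security_status in ("Advisory", "Risky", "Critical"):
        steps.append(f"read the security report for {finding.name}")
    if QUALITY_THRESHOLD > finding.quality:
        steps.append(f"tessl skill review --optimize {finding.name}")
    if finding.uplift is None:
        steps.append(f"tessl scenario generate --count 5 {finding.name}")
        steps.append(f"tessl eval run {finding.name}")
    return steps


finding = PluginFinding("workspace/plugin-name", "Advisory", 72, None)
for step in next_steps(finding):
    print(step)
```

&lt;p&gt;The function only returns the commands as text, so you can review them or hand them to an agent before anything runs.&lt;/p&gt;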

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;Every team that uses AI agents is building a dependency graph of skills, rules, and knowledge, just like they build a dependency graph of packages. The tooling for auditing that graph is still being built, but the risks are real and growing.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tessl-audit&lt;/code&gt; is a small, practical step: one command, zero installation, actionable output. Run it today and find out what your agent is actually working with.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npx tessl-audit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;code&gt;tessl-audit&lt;/code&gt; requires the Tessl CLI (no worries, it’s already a dependency) and an authenticated Tessl session (just create a free account if you haven’t got one). You’ll also need a &lt;code&gt;tessl.json&lt;/code&gt;, the context manifest file, in your project to run &lt;code&gt;tessl-audit&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
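
&lt;p&gt;If you want to wire this into a script, a small preflight check along these lines can catch missing requirements before the audit runs. This is a sketch under assumptions: the helper name &lt;code&gt;preflight&lt;/code&gt; is made up for illustration, and it checks only for &lt;code&gt;tessl.json&lt;/code&gt; and for &lt;code&gt;npx&lt;/code&gt; on your PATH. It does not verify that your Tessl session is authenticated.&lt;/p&gt;

```python
import shutil
from pathlib import Path


def preflight(project_root="."):
    """List problems that would stop `npx tessl-audit` from running here."""
    problems = []
    # The article says the tool reads tessl.json from the project root.
    if not (Path(project_root) / "tessl.json").is_file():
        problems.append("no tessl.json found in the project root")
    # npx ships with Node.js; without it the command cannot start.
    if shutil.which("npx") is None:
        problems.append("npx is not on PATH")
    return problems


for problem in preflight():
    print(f"preflight: {problem}")
```

&lt;p&gt;An empty list means both checks passed and it is worth running &lt;code&gt;npx tessl-audit&lt;/code&gt;.&lt;/p&gt;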

&lt;p&gt;&lt;strong&gt;Useful docs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://docs.tessl.io/evaluate/evaluate-skill-quality-using-scenarios" rel="noopener noreferrer"&gt;Evaluate skill quality using scenarios&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://docs.tessl.io/evaluate/evaluating-skills" rel="noopener noreferrer"&gt;Review a skill against best practices&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://tessl.io/registry/tessl-labs/skill-optimizer" rel="noopener noreferrer"&gt;Skill Optimizer plugin&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Tessl Admin Guide: Organizations, Workspaces, and Roles</title>
      <dc:creator>Tessl</dc:creator>
      <pubDate>Thu, 14 May 2026 06:45:55 +0000</pubDate>
      <link>https://dev.to/tessl/tessl-admin-guide-organizations-workspaces-and-roles-4m75</link>
      <guid>https://dev.to/tessl/tessl-admin-guide-organizations-workspaces-and-roles-4m75</guid>
      <description>&lt;p&gt;Just signed up to Tessl? Wondering next steps to rolling Tessl out to your team? The following article will take you through the steps of managing your top level Organization, invite your users, set policy items, then create your workspaces, assigning membership to those workspaces and defining their &lt;a href="https://docs.tessl.io/reference/roles" rel="noopener noreferrer"&gt;roles&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Organizations and Workspaces work in Tessl
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Organizations&lt;/em&gt;&lt;/strong&gt; are top-level entities, often representing the billing or corporate entity. Each contains &lt;strong&gt;&lt;em&gt;Workspaces&lt;/em&gt;&lt;/strong&gt;, which provide role-based access for users across the company.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8e06vahqpkgmjfc731a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8e06vahqpkgmjfc731a.png" alt="A diagram showing a top-level Organization with many workspaces below: some with Search, Install, and Publish permissions, some with just Install and Publish, and one with no access." width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;
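
&lt;p&gt;One way to picture the relationship is as a small data model: an organization holds workspaces, and each workspace records what a given user may do in it. The sketch below is purely illustrative. The class, the method names, and the permission strings are assumptions made for this example, and the real role definitions live in the Tessl roles reference.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Toy model: workspace name, then user, then granted permissions."""
    name: str
    workspaces: dict = field(default_factory=dict)

    def grant(self, workspace, user, permissions):
        """Record the permissions a user holds in one workspace."""
        self.workspaces.setdefault(workspace, {})[user] = set(permissions)

    def can(self, user, workspace, permission):
        """A user with no entry in a workspace has no access to it."""
        return permission in self.workspaces.get(workspace, {}).get(user, set())


org = Organization("YourCompanyName")
org.grant("YourCompanyName-Engineering", "dev@example.com", {"search", "install", "publish"})
org.grant("YourCompanyName-Design", "dev@example.com", {"search", "install"})

print(org.can("dev@example.com", "YourCompanyName-Engineering", "publish"))  # True
print(org.can("dev@example.com", "YourCompanyName-Design", "publish"))       # False
print(org.can("dev@example.com", "YourCompanyName-Finance", "search"))       # False
```

&lt;p&gt;The last call shows the "no access" case: the user was never added to that workspace, so every permission check fails.&lt;/p&gt;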

&lt;h2&gt;
  
  
  Setting up your Tessl Organization
&lt;/h2&gt;

&lt;p&gt;Organizations are sometimes created during the presales phase of acquiring Tessl, or may be created later. If one has not been created, it will be created automatically when you create your first workspace. If prompted, click &lt;strong&gt;Create workspace&lt;/strong&gt; and name it after your team (e.g. YourCompanyName-Engineering) to start.&lt;/p&gt;

&lt;p&gt;Note that workspace names must currently be unique, and they appear in plugin names in search results. This is most noticeable when plugins are published publicly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Festzqydqhxluufhxo384.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Festzqydqhxluufhxo384.png" alt="View of the registry page where a Create workspace button is being displayed." width="800" height="1466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The workspace should now be visible from the main interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitmjpzw8kyrt0ii7ag52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitmjpzw8kyrt0ii7ag52.png" alt="The workspace selector will appear, displaying the workspaces you have access to, with sub menu items like eval runs, projects, etc dependent on your permissions." width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can now view the organization by clicking your Account, where your name is displayed, at the bottom left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl7k6wrz7zz90yaf4zkm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl7k6wrz7zz90yaf4zkm.png" alt="By selecting your account/profile, the organization will be displayed with sub menu of members, settings, admin keys, depending on your permissions." width="800" height="1167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the organization exists, navigate to its &lt;strong&gt;&lt;em&gt;Settings&lt;/em&gt;&lt;/strong&gt;, rename it to your company name, and use the toggle to specify whether users can publicly share &lt;a href="https://docs.tessl.io/create/creating-skills" rel="noopener noreferrer"&gt;skills&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdjvzjeg5zdqw9ceb5im.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdjvzjeg5zdqw9ceb5im.png" alt="Organization settings displays an organization name, the ability to save, and an option to block public tile publishing by toggling a selector." width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating and managing Users in Tessl
&lt;/h3&gt;

&lt;p&gt;Next, invite users to your organization by navigating to the Organization’s &lt;strong&gt;&lt;em&gt;Members&lt;/em&gt;&lt;/strong&gt; menu and assigning the workspaces each user will have access to. Users are created with the &lt;strong&gt;&lt;em&gt;members&lt;/em&gt;&lt;/strong&gt; role, which lets them see, search, and install skills from the chosen workspaces. Permissions can be promoted from the Workspace &lt;strong&gt;Members&lt;/strong&gt; menu, discussed below. Users need to accept the invite they are sent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4g6r3dl7lo16v46uut0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4g6r3dl7lo16v46uut0.png" alt="Invite member screen displays an email address, a selection of workspaces that can be added to the user specified." width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once a user has been created, you can elevate them to Admin so they can create workspaces or manage users. To do so, navigate to the Organization &lt;strong&gt;&lt;em&gt;Members&lt;/em&gt;&lt;/strong&gt; screen, click the three dots under &lt;strong&gt;&lt;em&gt;Actions&lt;/em&gt;&lt;/strong&gt;, and assign an appropriate role. Examples of some common configurations are provided below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fsvjyckwcicbqzup30w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fsvjyckwcicbqzup30w.png" alt="Expanding the options menu, which is three dots, next to each name yields a submenu with change role and remove" width="800" height="774"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Admin keys
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrfooi6l0jsv9cxkw57k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrfooi6l0jsv9cxkw57k.png" alt="image.png" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Admin keys are for integrations and applications that require programmatic access across workspaces. They are typically used for automation, and an expiration of up to one year can be set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Workspaces and Users in Tessl
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo5e2gi1574xb0he4gaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo5e2gi1574xb0he4gaj.png" alt="On the side menu of the screen, users can select all plugins, eval runs, projects and members from a specified workspace." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click the workspace drop-down to navigate workspaces. Navigate to &lt;strong&gt;&lt;em&gt;Members&lt;/em&gt;&lt;/strong&gt; at the workspace level to specify &lt;a href="https://docs.tessl.io/reference/roles" rel="noopener noreferrer"&gt;Roles&lt;/a&gt; for users who require more capabilities within the workspace, such as running evaluations, publishing or managing users.&lt;/p&gt;

&lt;p&gt;To modify a user, search for their name, select their checkbox, choose a &lt;a href="https://docs.tessl.io/reference/roles" rel="noopener noreferrer"&gt;role&lt;/a&gt;, and click the &lt;strong&gt;Add&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9fog1k29i17joa6vga0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9fog1k29i17joa6vga0.png" alt="The role selector allows user to select consumer, member, publisher, manager and owner when adding a user to a workspace." width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example role configurations for your team(s)
&lt;/h2&gt;

&lt;p&gt;The following users demonstrate common configurations and &lt;a href="https://docs.tessl.io/reference/roles" rel="noopener noreferrer"&gt;roles&lt;/a&gt; that may be used when rolling Tessl out:&lt;/p&gt;

&lt;h3&gt;
  
  
  Samira - Org. Admin
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Samira&lt;/strong&gt;, the administrator and skills champion, needs to manage all workspaces, assign users, and create new workspaces. Make her an Organization admin.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f8q5xoajbwb8bfdcf8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f8q5xoajbwb8bfdcf8m.png" alt="A diagram showing Samira with admin privileges at the Organization level , giving her full permissions on the workspaces below as a result" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Eddie - Lead Engineer
&lt;/h3&gt;

&lt;p&gt;Another user, &lt;strong&gt;Eddie&lt;/strong&gt;, might be a member of an engineering workspace. He needs to use plugins (skills) that have been published, but may also need to publish skills within the engineering workspace for others on his team. This could mean Eddie has the publisher &lt;a href="https://docs.tessl.io/reference/roles" rel="noopener noreferrer"&gt;role&lt;/a&gt; in certain workspaces and the member role in other workspaces where he only needs to search and install.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwgzuipack4tlngxvr26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwgzuipack4tlngxvr26.png" alt="A diagram showing an organization with several workspaces. The user has publisher permission on several, giving search. install, and publish rights. Several other workspaces the user is only a member, providing more limited permissions like Search and Install. One workspace is no access because they were not given permissions." width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Jennifer - Manager
&lt;/h3&gt;

&lt;p&gt;Jennifer may need to add users to a workspace that she owns, publish skills, and possibly remove other managers. Typically the workspace role "owner" or "manager" is given to such a user, depending on whether she needs to remove other owners or delete the workspace.&lt;/p&gt;

&lt;h3&gt;
  
  
  Joe - New hire engineer
&lt;/h3&gt;

&lt;p&gt;Finally, Joe, a new hire, can search and install skills from the engineering workspace, but cannot share or create skills until he has gained a little more experience. Joe would be made a member of “engineering” with just a “consumer” role.&lt;/p&gt;
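
&lt;p&gt;The four profiles above can be summarised as a small lookup, sketched here in Python. This is only an illustration of the reasoning: the role names come from this guide, but the permission sets are a simplification of the prose (consumer and member are treated alike for search and install), the &lt;code&gt;design&lt;/code&gt; workspace and the &lt;code&gt;can&lt;/code&gt; helper are hypothetical, and none of it is Tessl's actual permission model or API.&lt;/p&gt;

```python
# Illustrative only: a simplified reading of the roles described in this guide,
# not Tessl's real permission model.
ROLE_PERMISSIONS = {
    "consumer": {"search", "install"},
    "member": {"search", "install"},
    "publisher": {"search", "install", "publish"},
    "manager": {"search", "install", "publish", "manage_users"},
    "owner": {"search", "install", "publish", "manage_users", "delete_workspace"},
}

# Organization admins get full permissions on every workspace.
ORG_ADMINS = {"samira"}

# Per-workspace role assignments; "design" is a hypothetical second workspace.
WORKSPACE_ROLES = {
    "engineering": {"eddie": "publisher", "jennifer": "manager", "joe": "consumer"},
    "design": {"eddie": "member"},
}


def can(user, workspace, action):
    """Return True if the user may perform the action in the workspace."""
    if user in ORG_ADMINS:
        return True
    role = WORKSPACE_ROLES.get(workspace, {}).get(user)
    if role is None:
        # Not invited to this workspace: no access at all.
        return False
    return action in ROLE_PERMISSIONS[role]


for user, workspace, action in [
    ("joe", "engineering", "install"),
    ("joe", "engineering", "publish"),
    ("eddie", "engineering", "publish"),
    ("eddie", "design", "publish"),
    ("samira", "design", "delete_workspace"),
]:
    print(user, workspace, action, can(user, workspace, action))
```

&lt;p&gt;In this sketch Joe can install but not publish in engineering, Eddie can publish in engineering but not in the second workspace, and Samira can do anything anywhere because she is an Organization admin.&lt;/p&gt;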

&lt;h2&gt;
  
  
  Next steps!
&lt;/h2&gt;

&lt;p&gt;Now that your users are in and have roles assigned in the different workspaces, you and your users can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Start creating &lt;a href="https://docs.tessl.io/create/creating-skills" rel="noopener noreferrer"&gt;new skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Evaluate the effectiveness of new or existing skills using &lt;a href="https://docs.tessl.io/evaluate/evaluating-skills" rel="noopener noreferrer"&gt;Reviews&lt;/a&gt; and &lt;a href="https://docs.tessl.io/evaluate/evaluate-skill-quality-using-scenarios" rel="noopener noreferrer"&gt;Evals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Publish those skills to the Tessl registry to share them with your users and agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Let us know what you think! Tessl would love to hear from you through any one of our &lt;a href="https://docs.tessl.io/support/giving-feedback" rel="noopener noreferrer"&gt;feedback channels (Discord, Email, CLI Feedback, etc)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://tessl.io/blog/tessl-admin-guide-organizations-workspaces-and-roles/" rel="noopener noreferrer"&gt;Tessl.blogs&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>agents</category>
    </item>
    <item>
      <title>GPT-5.5 is OpenAI's best model. But paying more for it makes no sense.</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Wed, 06 May 2026 13:13:28 +0000</pubDate>
      <link>https://dev.to/tessl/gpt-55-is-openais-best-model-but-paying-more-for-it-makes-no-sense-2227</link>
      <guid>https://dev.to/tessl/gpt-55-is-openais-best-model-but-paying-more-for-it-makes-no-sense-2227</guid>
      <description>&lt;p&gt;We added OpenAI’s gpt-5.5 model to our eval suite the day it launched. We ran 1,742 tests overall, covering over 45 task scenarios across 11 real engineering skills. Each scenario was run 6 times, and the figures in this post are averages across those runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The gpt-5.5 model has the highest raw capability of any OpenAI model we've tested. When it uses agent skills and performs the same tasks, it pretty much ties with gpt-5.4 on score but costs 63% more per run.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best Codex model out of the box?&lt;/td&gt;
&lt;td&gt;gpt-5.5: 75.6 avg baseline, highest in the family&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best Codex model with skills loaded?&lt;/td&gt;
&lt;td&gt;gpt-5.4 and gpt-5.5 tie at 89.3 and 89.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worth the 63% price premium over gpt-5.4?&lt;/td&gt;
&lt;td&gt;With this data, we don’t think so&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any scenario where it wins?&lt;/td&gt;
&lt;td&gt;Latency: 89.5s vs 135.4s for gpt-5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Should you use gpt-5.3 instead?&lt;/td&gt;
&lt;td&gt;No, oddly enough, gpt-5.3 costs 47% more than gpt-5.4 for a worse result because of the token bloat.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The one-line verdict: gpt-5.5 is the most capable Codex model we've benchmarked, but when agent skills guide its tasks, it performs pretty much identically to a model that costs about 39% less per run. The interesting story is actually gpt-5.3, which costs more per run than gpt-5.4 and scores worse because of its token bloat. The per-token price is, of course, higher with gpt-5.5.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Takeaways
&lt;/h2&gt;

&lt;p&gt;The most counterintuitive thing in this data: gpt-5.5 and gpt-5.4 score within 0.1 points of each other when given domain skills, 89.4 vs 89.3. The self-sufficiency story holds directionally, but these two models are functionally the same on skill-augmented work. The question is purely cost.&lt;/p&gt;

&lt;p&gt;The gpt-5.3 story is sharper. The headline numbers put it at 83.9 with skills against 89.3 for gpt-5.4, a 5.4 point gap. It also costs $0.44 per run against $0.30 for gpt-5.4. You pay more and get less, which is a complete description of a bad deal.&lt;/p&gt;

&lt;p&gt;You pay $0.49/run for 89.4 points with gpt-5.5. You pay $0.30/run for 89.3 points with gpt-5.4. The only dimension where gpt-5.5 leads is latency, at 89.5s against 135.4s. If you're running latency-constrained agents and can absorb the cost, it's a defensible choice. Otherwise you're paying a 63% premium for 0.1 points.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Stacks Up
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Task Scores (using agent skill)&lt;/th&gt;
&lt;th&gt;Cost/run&lt;/th&gt;
&lt;th&gt;Score/$&lt;/th&gt;
&lt;th&gt;Avg lift&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;93.4&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;td&gt;+12.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cursor:composer-2&lt;/td&gt;
&lt;td&gt;89.6&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;389&lt;/td&gt;
&lt;td&gt;+15.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;89.4&lt;/td&gt;
&lt;td&gt;$0.49&lt;/td&gt;
&lt;td&gt;182&lt;/td&gt;
&lt;td&gt;+13.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4&lt;/td&gt;
&lt;td&gt;89.3&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;298&lt;/td&gt;
&lt;td&gt;+15.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.3-codex&lt;/td&gt;
&lt;td&gt;83.9&lt;/td&gt;
&lt;td&gt;$0.44&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;td&gt;+18.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5-codex&lt;/td&gt;
&lt;td&gt;78.7&lt;/td&gt;
&lt;td&gt;$1.05&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;+10.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;gpt-5.5 and gpt-5.4 are functionally interchangeable on skill performance. The question is whether saving roughly 46 seconds per run is worth $0.19.&lt;/p&gt;
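
&lt;p&gt;The percentages and Score/$ figures quoted in this post follow directly from the per-run numbers. The short Python sketch below reproduces them from the scores and costs in the table above and the latencies in the TL;DR. The helper names are ours, and the rounding to whole numbers is an assumption about how the published figures were derived.&lt;/p&gt;

```python
# Scores and costs copied from the table above; latencies from the TL;DR.
MODELS = {
    "gpt-5.5": {"score": 89.4, "cost": 0.49, "seconds": 89.5},
    "gpt-5.4": {"score": 89.3, "cost": 0.30, "seconds": 135.4},
    "gpt-5.3-codex": {"score": 83.9, "cost": 0.44},
}


def premium_pct(model, reference="gpt-5.4"):
    """Percentage by which a model's cost per run exceeds the reference's."""
    ratio = MODELS[model]["cost"] / MODELS[reference]["cost"]
    return round((ratio - 1) * 100)


def score_per_dollar(model):
    """With-skill score divided by cost per run, rounded to a whole number."""
    return round(MODELS[model]["score"] / MODELS[model]["cost"])


extra_cost = round(MODELS["gpt-5.5"]["cost"] - MODELS["gpt-5.4"]["cost"], 2)
extra_points = round(MODELS["gpt-5.5"]["score"] - MODELS["gpt-5.4"]["score"], 1)
seconds_saved = round(MODELS["gpt-5.4"]["seconds"] - MODELS["gpt-5.5"]["seconds"], 1)

print("gpt-5.5 premium over gpt-5.4:", premium_pct("gpt-5.5"), "%")
print("gpt-5.3-codex premium over gpt-5.4:", premium_pct("gpt-5.3-codex"), "%")
print("Score/$:", score_per_dollar("gpt-5.5"), "vs", score_per_dollar("gpt-5.4"))
print("Extra spend per run:", extra_cost, "for", extra_points, "points")
print("Seconds saved per run:", seconds_saved)
```

&lt;p&gt;Run as written, this gives a 63% premium for gpt-5.5, a 47% premium for gpt-5.3-codex, Score/$ of 182 against 298, and 45.9 seconds saved for an extra $0.19 per run.&lt;/p&gt;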

&lt;h2&gt;
  
  
  What We Tested
&lt;/h2&gt;

&lt;p&gt;This benchmark runs on &lt;a href="https://tessl.io" rel="noopener noreferrer"&gt;Tessl&lt;/a&gt;, an agentic evaluation platform. A skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file, which is a structured markdown document containing rules, patterns, and examples for a specific domain. For the baseline run, the agent sees only the task prompt with no additional context. For the with-skill run, the &lt;code&gt;SKILL.md&lt;/code&gt; is loaded into the agent's context alongside the task: same model, same task, same rubric. The score delta is the lift. The platform runs each scenario in both configurations (baseline and with-skill) and automatically scores the output against a pre-written rubric checklist.&lt;/p&gt;

&lt;p&gt;Each scenario was run 6 times and scored independently; all figures are averaged across those runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why rubric checklists?&lt;/strong&gt; Because the scenarios have objectively right answers. "Does the agent delete &lt;code&gt;.eslintrc.json&lt;/code&gt; and create &lt;code&gt;eslint.config.js&lt;/code&gt;?" is not a matter of opinion. Neither is "Does it use PKCE method S256?" or "Does it call &lt;code&gt;pipeline()&lt;/code&gt; instead of chaining &lt;code&gt;.pipe()&lt;/code&gt;?" Binary criteria eliminate evaluation noise wherever possible.&lt;/p&gt;

&lt;p&gt;Example rubric: &lt;em&gt;Modernize the Linting Setup for a Node.js Library&lt;/em&gt;, 11 criteria, 101 points.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Points&lt;/th&gt;
&lt;th&gt;Pass condition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;neostandard installed&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;neostandard present in devDependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;standard uninstalled&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;standard absent from devDependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flat config file&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;eslint.config.js or .mjs exists, not .eslintrc*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;neostandard in config&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Config imports from neostandard and calls neostandard()&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lint script uses eslint&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;package.json lint script runs eslint ., not neostandard . or standard .&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;migrate command used&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Instructions reference npx neostandard --migrate to generate the config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lint:fix script present&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;lint:fix script runs eslint . --fix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI uses non-fix run&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;CI config runs lint without --fix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;standard config removed&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;No top-level standard key in package.json&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lint-staged uses eslint&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Pre-commit hook runs eslint --fix, not neostandard or standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;eslint@9 installed&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;eslint at version 9.x in devDependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A model that migrates the config correctly but leaves &lt;code&gt;standard&lt;/code&gt; in devDependencies scores 91/101. One that creates &lt;code&gt;eslint.config.js&lt;/code&gt; alongside &lt;code&gt;.eslintrc.json&lt;/code&gt; instead of replacing it scores 0 on three criteria at once.&lt;/p&gt;

&lt;p&gt;All skills and rubrics are published at &lt;a href="https://tessl.io/registry/simon/skills" rel="noopener noreferrer"&gt;simon/skills on the Tessl registry&lt;/a&gt;. Full eval results for this run &lt;a href="https://tessl.io/registry/simon/skills/evals" rel="noopener noreferrer"&gt;can be found here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baseline scores (no skill), sorted by highest average
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;docs&lt;/th&gt;
&lt;th&gt;fastify&lt;/th&gt;
&lt;th&gt;init&lt;/th&gt;
&lt;th&gt;lint&lt;/th&gt;
&lt;th&gt;node&lt;/th&gt;
&lt;th&gt;node-core&lt;/th&gt;
&lt;th&gt;oauth&lt;/th&gt;
&lt;th&gt;octocat&lt;/th&gt;
&lt;th&gt;skill-opt&lt;/th&gt;
&lt;th&gt;snip&lt;/th&gt;
&lt;th&gt;ts&lt;/th&gt;
&lt;th&gt;Avg&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;85.7&lt;/td&gt;
&lt;td&gt;80.9&lt;/td&gt;
&lt;td&gt;79.7&lt;/td&gt;
&lt;td&gt;92.9&lt;/td&gt;
&lt;td&gt;73.7&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;75.7&lt;/td&gt;
&lt;td&gt;84.7&lt;/td&gt;
&lt;td&gt;85.0&lt;/td&gt;
&lt;td&gt;60.1&lt;/td&gt;
&lt;td&gt;78.8&lt;/td&gt;
&lt;td&gt;80.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;89.9&lt;/td&gt;
&lt;td&gt;71.8&lt;/td&gt;
&lt;td&gt;63.6&lt;/td&gt;
&lt;td&gt;94.4&lt;/td&gt;
&lt;td&gt;64.6&lt;/td&gt;
&lt;td&gt;72.3&lt;/td&gt;
&lt;td&gt;73.6&lt;/td&gt;
&lt;td&gt;85.5&lt;/td&gt;
&lt;td&gt;83.2&lt;/td&gt;
&lt;td&gt;54.7&lt;/td&gt;
&lt;td&gt;78.3&lt;/td&gt;
&lt;td&gt;75.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4&lt;/td&gt;
&lt;td&gt;87.6&lt;/td&gt;
&lt;td&gt;66.7&lt;/td&gt;
&lt;td&gt;71.1&lt;/td&gt;
&lt;td&gt;84.5&lt;/td&gt;
&lt;td&gt;62.3&lt;/td&gt;
&lt;td&gt;77.4&lt;/td&gt;
&lt;td&gt;77.5&lt;/td&gt;
&lt;td&gt;80.5&lt;/td&gt;
&lt;td&gt;80.8&lt;/td&gt;
&lt;td&gt;50.9&lt;/td&gt;
&lt;td&gt;75.9&lt;/td&gt;
&lt;td&gt;74.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cursor:composer-2&lt;/td&gt;
&lt;td&gt;84.3&lt;/td&gt;
&lt;td&gt;74.7&lt;/td&gt;
&lt;td&gt;61.6&lt;/td&gt;
&lt;td&gt;94.1&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;78.8&lt;/td&gt;
&lt;td&gt;73.1&lt;/td&gt;
&lt;td&gt;78.5&lt;/td&gt;
&lt;td&gt;82.3&lt;/td&gt;
&lt;td&gt;58.5&lt;/td&gt;
&lt;td&gt;65.5&lt;/td&gt;
&lt;td&gt;74.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5-codex&lt;/td&gt;
&lt;td&gt;80.2&lt;/td&gt;
&lt;td&gt;67.3&lt;/td&gt;
&lt;td&gt;60.2&lt;/td&gt;
&lt;td&gt;84.9&lt;/td&gt;
&lt;td&gt;60.4&lt;/td&gt;
&lt;td&gt;76.5&lt;/td&gt;
&lt;td&gt;72.9&lt;/td&gt;
&lt;td&gt;75.3&lt;/td&gt;
&lt;td&gt;63.8&lt;/td&gt;
&lt;td&gt;47.5&lt;/td&gt;
&lt;td&gt;66.5&lt;/td&gt;
&lt;td&gt;68.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.3-codex&lt;/td&gt;
&lt;td&gt;63.5&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;52.1&lt;/td&gt;
&lt;td&gt;76.5&lt;/td&gt;
&lt;td&gt;62.4&lt;/td&gt;
&lt;td&gt;75.3&lt;/td&gt;
&lt;td&gt;77.9&lt;/td&gt;
&lt;td&gt;68.3&lt;/td&gt;
&lt;td&gt;70.5&lt;/td&gt;
&lt;td&gt;42.1&lt;/td&gt;
&lt;td&gt;66.4&lt;/td&gt;
&lt;td&gt;65.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  With-skill scores, sorted by highest average
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;docs&lt;/th&gt;
&lt;th&gt;fastify&lt;/th&gt;
&lt;th&gt;init&lt;/th&gt;
&lt;th&gt;lint&lt;/th&gt;
&lt;th&gt;node&lt;/th&gt;
&lt;th&gt;node-core&lt;/th&gt;
&lt;th&gt;oauth&lt;/th&gt;
&lt;th&gt;octocat&lt;/th&gt;
&lt;th&gt;skill-opt&lt;/th&gt;
&lt;th&gt;snip&lt;/th&gt;
&lt;th&gt;ts&lt;/th&gt;
&lt;th&gt;Avg&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;96.7&lt;/td&gt;
&lt;td&gt;98.9&lt;/td&gt;
&lt;td&gt;82.3&lt;/td&gt;
&lt;td&gt;97.2&lt;/td&gt;
&lt;td&gt;95.1&lt;/td&gt;
&lt;td&gt;84.7&lt;/td&gt;
&lt;td&gt;94.3&lt;/td&gt;
&lt;td&gt;97.7&lt;/td&gt;
&lt;td&gt;99.7&lt;/td&gt;
&lt;td&gt;92.9&lt;/td&gt;
&lt;td&gt;88.0&lt;/td&gt;
&lt;td&gt;93.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cursor:composer-2&lt;/td&gt;
&lt;td&gt;95.6&lt;/td&gt;
&lt;td&gt;93.9&lt;/td&gt;
&lt;td&gt;85.7&lt;/td&gt;
&lt;td&gt;96.4&lt;/td&gt;
&lt;td&gt;94.0&lt;/td&gt;
&lt;td&gt;92.3&lt;/td&gt;
&lt;td&gt;83.9&lt;/td&gt;
&lt;td&gt;94.5&lt;/td&gt;
&lt;td&gt;93.7&lt;/td&gt;
&lt;td&gt;85.3&lt;/td&gt;
&lt;td&gt;70.4&lt;/td&gt;
&lt;td&gt;89.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;96.1&lt;/td&gt;
&lt;td&gt;86.0&lt;/td&gt;
&lt;td&gt;81.8&lt;/td&gt;
&lt;td&gt;96.3&lt;/td&gt;
&lt;td&gt;88.3&lt;/td&gt;
&lt;td&gt;88.6&lt;/td&gt;
&lt;td&gt;91.7&lt;/td&gt;
&lt;td&gt;92.1&lt;/td&gt;
&lt;td&gt;96.0&lt;/td&gt;
&lt;td&gt;86.5&lt;/td&gt;
&lt;td&gt;79.2&lt;/td&gt;
&lt;td&gt;89.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4&lt;/td&gt;
&lt;td&gt;97.1&lt;/td&gt;
&lt;td&gt;76.9&lt;/td&gt;
&lt;td&gt;80.0&lt;/td&gt;
&lt;td&gt;98.1&lt;/td&gt;
&lt;td&gt;84.8&lt;/td&gt;
&lt;td&gt;93.7&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;95.7&lt;/td&gt;
&lt;td&gt;94.6&lt;/td&gt;
&lt;td&gt;90.9&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;td&gt;89.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.3-codex&lt;/td&gt;
&lt;td&gt;96.9&lt;/td&gt;
&lt;td&gt;86.1&lt;/td&gt;
&lt;td&gt;80.4&lt;/td&gt;
&lt;td&gt;90.2&lt;/td&gt;
&lt;td&gt;75.9&lt;/td&gt;
&lt;td&gt;77.1&lt;/td&gt;
&lt;td&gt;93.1&lt;/td&gt;
&lt;td&gt;92.3&lt;/td&gt;
&lt;td&gt;77.3&lt;/td&gt;
&lt;td&gt;79.4&lt;/td&gt;
&lt;td&gt;74.1&lt;/td&gt;
&lt;td&gt;83.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5-codex&lt;/td&gt;
&lt;td&gt;62.9&lt;/td&gt;
&lt;td&gt;88.9&lt;/td&gt;
&lt;td&gt;74.8&lt;/td&gt;
&lt;td&gt;92.1&lt;/td&gt;
&lt;td&gt;66.3&lt;/td&gt;
&lt;td&gt;77.7&lt;/td&gt;
&lt;td&gt;89.3&lt;/td&gt;
&lt;td&gt;85.9&lt;/td&gt;
&lt;td&gt;80.7&lt;/td&gt;
&lt;td&gt;86.0&lt;/td&gt;
&lt;td&gt;61.1&lt;/td&gt;
&lt;td&gt;78.7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Lift: what skills actually added per model, sorted by highest average lift
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;docs&lt;/th&gt;
&lt;th&gt;fastify&lt;/th&gt;
&lt;th&gt;init&lt;/th&gt;
&lt;th&gt;lint&lt;/th&gt;
&lt;th&gt;node&lt;/th&gt;
&lt;th&gt;node-core&lt;/th&gt;
&lt;th&gt;oauth&lt;/th&gt;
&lt;th&gt;octocat&lt;/th&gt;
&lt;th&gt;skill-opt&lt;/th&gt;
&lt;th&gt;snip&lt;/th&gt;
&lt;th&gt;ts&lt;/th&gt;
&lt;th&gt;Avg lift&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.3-codex&lt;/td&gt;
&lt;td&gt;+33.4&lt;/td&gt;
&lt;td&gt;+20.7&lt;/td&gt;
&lt;td&gt;+28.3&lt;/td&gt;
&lt;td&gt;+13.7&lt;/td&gt;
&lt;td&gt;+13.5&lt;/td&gt;
&lt;td&gt;+1.8&lt;/td&gt;
&lt;td&gt;+15.2&lt;/td&gt;
&lt;td&gt;+24.0&lt;/td&gt;
&lt;td&gt;+6.8&lt;/td&gt;
&lt;td&gt;+37.3&lt;/td&gt;
&lt;td&gt;+7.7&lt;/td&gt;
&lt;td&gt;+18.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cursor:composer-2&lt;/td&gt;
&lt;td&gt;+11.3&lt;/td&gt;
&lt;td&gt;+19.2&lt;/td&gt;
&lt;td&gt;+24.1&lt;/td&gt;
&lt;td&gt;+2.3&lt;/td&gt;
&lt;td&gt;+28.6&lt;/td&gt;
&lt;td&gt;+13.5&lt;/td&gt;
&lt;td&gt;+10.8&lt;/td&gt;
&lt;td&gt;+16.0&lt;/td&gt;
&lt;td&gt;+11.4&lt;/td&gt;
&lt;td&gt;+26.8&lt;/td&gt;
&lt;td&gt;+4.9&lt;/td&gt;
&lt;td&gt;+15.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4&lt;/td&gt;
&lt;td&gt;+9.5&lt;/td&gt;
&lt;td&gt;+10.2&lt;/td&gt;
&lt;td&gt;+8.9&lt;/td&gt;
&lt;td&gt;+13.6&lt;/td&gt;
&lt;td&gt;+22.5&lt;/td&gt;
&lt;td&gt;+16.3&lt;/td&gt;
&lt;td&gt;+14.1&lt;/td&gt;
&lt;td&gt;+15.2&lt;/td&gt;
&lt;td&gt;+13.8&lt;/td&gt;
&lt;td&gt;+40.0&lt;/td&gt;
&lt;td&gt;+3.1&lt;/td&gt;
&lt;td&gt;+15.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;+6.2&lt;/td&gt;
&lt;td&gt;+14.2&lt;/td&gt;
&lt;td&gt;+18.2&lt;/td&gt;
&lt;td&gt;+1.9&lt;/td&gt;
&lt;td&gt;+23.7&lt;/td&gt;
&lt;td&gt;+16.3&lt;/td&gt;
&lt;td&gt;+18.1&lt;/td&gt;
&lt;td&gt;+6.6&lt;/td&gt;
&lt;td&gt;+12.8&lt;/td&gt;
&lt;td&gt;+31.8&lt;/td&gt;
&lt;td&gt;+0.9&lt;/td&gt;
&lt;td&gt;+13.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;+11.0&lt;/td&gt;
&lt;td&gt;+18.0&lt;/td&gt;
&lt;td&gt;+2.6&lt;/td&gt;
&lt;td&gt;+4.3&lt;/td&gt;
&lt;td&gt;+21.4&lt;/td&gt;
&lt;td&gt;-6.9&lt;/td&gt;
&lt;td&gt;+18.6&lt;/td&gt;
&lt;td&gt;+13.0&lt;/td&gt;
&lt;td&gt;+14.7&lt;/td&gt;
&lt;td&gt;+32.8&lt;/td&gt;
&lt;td&gt;+9.2&lt;/td&gt;
&lt;td&gt;+12.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5-codex&lt;/td&gt;
&lt;td&gt;-17.3&lt;/td&gt;
&lt;td&gt;+21.6&lt;/td&gt;
&lt;td&gt;+14.6&lt;/td&gt;
&lt;td&gt;+7.2&lt;/td&gt;
&lt;td&gt;+5.9&lt;/td&gt;
&lt;td&gt;+1.2&lt;/td&gt;
&lt;td&gt;+16.4&lt;/td&gt;
&lt;td&gt;+10.6&lt;/td&gt;
&lt;td&gt;+16.9&lt;/td&gt;
&lt;td&gt;+38.5&lt;/td&gt;
&lt;td&gt;-5.4&lt;/td&gt;
&lt;td&gt;+10.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Reading the lift table.&lt;/strong&gt; A few observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;claude-opus-4-7 &lt;code&gt;node-core&lt;/code&gt;: -6.9.&lt;/strong&gt; Opus starts at 91.6 baseline on Node.js internals, the highest baseline on this skill of any model in the benchmark. Adding a skill that prescribes specific patterns for primordials and commit message format on top of a model that already knows the material produced interference, not uplift. The skill was written to close a gap that Opus doesn't have.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;gpt-5-codex &lt;code&gt;docs&lt;/code&gt;: -17.3.&lt;/strong&gt; The same skill that boosted gpt-5.3-codex by +33.4 points degraded gpt-5-codex by 17.3. The Diátaxis framework is highly prescriptive about structure: tutorial titles must start with verbs, reference sections must contain no instruction. gpt-5-codex starts at an 80.2 baseline for docs and produces fluent, correct-seeming prose, so the skill's structural constraints appear to actively conflict with its default output style. High baseline does not predict positive lift.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;gpt-5-codex &lt;code&gt;ts&lt;/code&gt;: -5.4.&lt;/strong&gt; Same pattern. A 66.5 baseline on TypeScript drops to 61.1 with the skill. The TypeScript skill enforces branded types and zero &lt;code&gt;any&lt;/code&gt;, rules that require restructuring code rather than extending it. For a model with established TypeScript habits, the prescriptive guidance appears to create noise rather than correct the specific gaps.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;claude-opus-4-7 &lt;code&gt;init&lt;/code&gt;: +2.6.&lt;/strong&gt; The lowest lift on this skill of any model. Claude Opus is the model that introduced the &lt;code&gt;AGENTS.md&lt;/code&gt; convention, and it already had the highest baseline on this skill (79.7) before any context was added.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;gpt-5.4 &lt;code&gt;snip&lt;/code&gt;: +40.0.&lt;/strong&gt; The single highest lift cell in the entire dataset. snipgrapher's private CLI documentation gives a model that knows nothing a complete specification for a tool it's never encountered. gpt-5.4's strong instruction-following amplifies that advantage cleanly.&lt;/li&gt;
&lt;/ul&gt;
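
&lt;p&gt;Each cell in the lift table is simply the with-skill score minus the baseline score for the same model and skill. As a quick check, the Python sketch below recomputes the gpt-5.5 row from the two score tables above. The variable names are ours, and rounding to one decimal place is an assumption about how the published figures were produced.&lt;/p&gt;

```python
# gpt-5.5 rows copied from the baseline and with-skill tables above.
BASELINE = {
    "docs": 89.9, "fastify": 71.8, "init": 63.6, "lint": 94.4,
    "node": 64.6, "node-core": 72.3, "oauth": 73.6, "octocat": 85.5,
    "skill-opt": 83.2, "snip": 54.7, "ts": 78.3,
}
WITH_SKILL = {
    "docs": 96.1, "fastify": 86.0, "init": 81.8, "lint": 96.3,
    "node": 88.3, "node-core": 88.6, "oauth": 91.7, "octocat": 92.1,
    "skill-opt": 96.0, "snip": 86.5, "ts": 79.2,
}

# Lift per skill: with-skill score minus baseline score.
lifts = {
    skill: round(WITH_SKILL[skill] - BASELINE[skill], 1)
    for skill in BASELINE
}

# The Avg lift column is the gap between the two Avg columns.
avg_lift = round(89.4 - 75.6, 1)

largest = max(lifts, key=lifts.get)
smallest = min(lifts, key=lifts.get)

for skill, lift in lifts.items():
    print(f"{skill:10s} {lift:+.1f}")
print("largest lift:", largest, "smallest lift:", smallest)
print("average lift:", avg_lift)
```

&lt;p&gt;For gpt-5.5 the largest gain is on &lt;code&gt;snip&lt;/code&gt; (+31.8) and the smallest on &lt;code&gt;ts&lt;/code&gt; (+0.9), matching the table. The same subtraction on the gpt-5-codex &lt;code&gt;docs&lt;/code&gt; scores (62.9 with the skill, 80.2 without) gives the -17.3 discussed above.&lt;/p&gt;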

&lt;h3&gt;
  
  
  The cost of running gpt-5.5 vs the alternatives
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost/run (with skill)&lt;/th&gt;
&lt;th&gt;Time (with skill)&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Score/$&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cursor:composer-2&lt;/td&gt;
&lt;td&gt;$0.23&lt;/td&gt;
&lt;td&gt;152.0s&lt;/td&gt;
&lt;td&gt;89.6&lt;/td&gt;
&lt;td&gt;389&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;135.4s&lt;/td&gt;
&lt;td&gt;89.3&lt;/td&gt;
&lt;td&gt;298&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.3-codex&lt;/td&gt;
&lt;td&gt;$0.44&lt;/td&gt;
&lt;td&gt;87.9s&lt;/td&gt;
&lt;td&gt;83.9&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;$0.49&lt;/td&gt;
&lt;td&gt;89.5s&lt;/td&gt;
&lt;td&gt;89.4&lt;/td&gt;
&lt;td&gt;182&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;158.9s&lt;/td&gt;
&lt;td&gt;93.4&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5-codex&lt;/td&gt;
&lt;td&gt;$1.05&lt;/td&gt;
&lt;td&gt;136.2s&lt;/td&gt;
&lt;td&gt;78.7&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  More details about the 11 skills and scenarios
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;fastify-best-practices&lt;/code&gt;:&lt;/strong&gt; Fastify has strong opinions, and the skill encodes them. Scenarios: &lt;em&gt;Security Hardening for a Healthcare Web API&lt;/em&gt; (CORS scoped to two named origins, CSP + HSTS headers, HTTPS redirect, a wildcard &lt;code&gt;*&lt;/code&gt; or a missing header scores zero); &lt;em&gt;Authentication Service for a SaaS Platform&lt;/em&gt; (passwords migrated from bcrypt to argon2id, in-memory rate limiting replaced with Redis for multi-instance correctness, SIGTERM handled with &lt;code&gt;close-with-grace&lt;/code&gt;); &lt;em&gt;Protecting a Product Catalogue API from Overload&lt;/em&gt; (does it reach for &lt;code&gt;@fastify/under-pressure&lt;/code&gt; or invent its own backpressure loop?); &lt;em&gt;Order Management API with PostgreSQL&lt;/em&gt; (uses &lt;code&gt;@fastify/postgres&lt;/code&gt; with correct pool lifecycle, not raw &lt;code&gt;pg&lt;/code&gt;); &lt;em&gt;Consistent Error Handling for a Multi-Tenant SaaS API&lt;/em&gt; (typed &lt;code&gt;createError&lt;/code&gt;, uniform JSON shape, no stack traces to clients).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;node-best-practices&lt;/code&gt;:&lt;/strong&gt; The patterns in this skill diverge from what you'd find on Stack Overflow. Scenarios: &lt;em&gt;Hardening Logging in a Fintech API&lt;/em&gt; (pino must redact auth tokens and raw card fields before they reach the SIEM, masking after the fact doesn't count); &lt;em&gt;Webhook Receiver Service&lt;/em&gt; (structured logging of sensitive payment provider fields, graceful shutdown under concurrent in-flight requests); &lt;em&gt;Fix Throughput Degradation in a High-Load API Gateway&lt;/em&gt; (&lt;code&gt;dns.lookup()&lt;/code&gt; saturating the libuv thread pool, the fix is &lt;code&gt;dns.resolve4()&lt;/code&gt; and &lt;code&gt;UV_THREADPOOL_SIZE&lt;/code&gt;, not a caching layer); &lt;em&gt;High-Throughput Merchant DNS Routing Service&lt;/em&gt; (concurrent resolution under load, observable thread pool saturation).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;snipgrapher&lt;/code&gt;:&lt;/strong&gt; A custom internal CLI with a non-public API. The model has never seen its documentation. At baseline, every model is essentially guessing (scores range from 42.1 to 60.1 out of 100). With the skill, agents either follow the spec or they don't. Scenarios: &lt;em&gt;Automating Changelog Snippet Images in CI&lt;/em&gt; (correct flag order, env var overrides, pipeline integration) and &lt;em&gt;Code Snippet Image Pipeline for Documentation Site&lt;/em&gt; (batch rendering, profile configuration). This skill delivers the highest lift in the benchmark for every model except cursor:composer-2, where it is a close second, with lifts ranging from roughly 27 to 40 points. The reason: it encodes knowledge that does not exist on the internet. Public skills are becoming less necessary as frontier models grow stronger. Private tooling is where skills still dominate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;typescript-magician&lt;/code&gt;:&lt;/strong&gt; Not "add types to this function." Scenarios: &lt;em&gt;Domain-Safe Payment Processing Types&lt;/em&gt; (branded types for &lt;code&gt;AccountId&lt;/code&gt;, &lt;code&gt;PaymentId&lt;/code&gt;, &lt;code&gt;RefundId&lt;/code&gt;; plain type aliases don't count, and &lt;code&gt;as&lt;/code&gt; casts score zero); &lt;em&gt;Product Catalog API for an E-Commerce Platform&lt;/em&gt; (TypeBox schemas inferred as TypeScript types end-to-end, internal cost fields stripped from public responses, no &lt;code&gt;any&lt;/code&gt;); &lt;em&gt;Eliminate &lt;code&gt;any&lt;/code&gt; from a Data Pipeline Utility Library&lt;/em&gt; (&lt;code&gt;tsc&lt;/code&gt; output captured before and after, zero &lt;code&gt;any&lt;/code&gt; remaining, no &lt;code&gt;@ts-ignore&lt;/code&gt;); &lt;em&gt;Project Bootstrap: Node.js TypeScript Service&lt;/em&gt; (native &lt;code&gt;--strip-types&lt;/code&gt;, no &lt;code&gt;ts-node&lt;/code&gt;, no build step, no &lt;code&gt;tsc&lt;/code&gt; in the start script).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;oauth&lt;/code&gt;:&lt;/strong&gt; Is the implicit flow explicitly removed? Is PKCE method S256? Is the refresh token replaced on rotation? Scenarios: &lt;em&gt;Add User Authentication to a Fastify API&lt;/em&gt; (full Authorization Code + PKCE flow with &lt;code&gt;@fastify/oauth2&lt;/code&gt;, state verification, token rotation); &lt;em&gt;OAuth Login Integration for a Fastify Web App&lt;/em&gt; (CSRF-hardened flow, &lt;code&gt;@fastify/session&lt;/code&gt; for state, correct cookie flags).&lt;/p&gt;
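
&lt;p&gt;For reference, the S256 method that the &lt;code&gt;oauth&lt;/code&gt; scenarios check can be sketched in a few lines of TypeScript. This is a minimal illustration using only &lt;code&gt;node:crypto&lt;/code&gt;, not the skill's or &lt;code&gt;@fastify/oauth2&lt;/code&gt;'s implementation, and the function names are hypothetical. Per RFC 7636, the challenge is the unpadded base64url encoding of the SHA-256 hash of the verifier.&lt;/p&gt;

```typescript
import { createHash, randomBytes } from 'node:crypto'

// Hypothetical helper: derive the S256 code challenge from a verifier.
// RFC 7636 section 4.2: BASE64URL(SHA256(ASCII(code_verifier))), no padding.
function codeChallengeS256 (verifier: string): string {
  return createHash('sha256').update(verifier, 'ascii').digest('base64url')
}

// Hypothetical helper: 32 random bytes encode to a 43-character URL-safe verifier.
function createCodeVerifier (): string {
  return randomBytes(32).toString('base64url')
}

const verifier = createCodeVerifier()
console.log({
  code_verifier: verifier,
  code_challenge: codeChallengeS256(verifier),
  code_challenge_method: 'S256'
})
```

&lt;p&gt;The client sends the challenge with &lt;code&gt;code_challenge_method=S256&lt;/code&gt; on the authorization request and presents the verifier at the token exchange.&lt;/p&gt;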

&lt;p&gt;&lt;strong&gt;&lt;code&gt;linting-neostandard-eslint9&lt;/code&gt;:&lt;/strong&gt; ESLint v9's flat config is a breaking change. Scenarios checked whether agents actually migrated, not just created a new config alongside the old one. Is &lt;code&gt;.eslintrc.json&lt;/code&gt; gone? Is &lt;code&gt;standard&lt;/code&gt; removed from devDependencies? Scenarios: &lt;em&gt;Modernize the Linting Setup&lt;/em&gt; (two variants: &lt;code&gt;envparser&lt;/code&gt; open-source library and &lt;code&gt;payments-api&lt;/code&gt; service); &lt;em&gt;Add Linting to the Inventory Service&lt;/em&gt; (neostandard from scratch); &lt;em&gt;Set Up Automated Lint Enforcement&lt;/em&gt; (husky + lint-staged pre-commit hook, CI step that blocks on violations).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;documentation&lt;/code&gt;:&lt;/strong&gt; Based on the Diátaxis framework. The skill teaches agents when to write a tutorial vs a how-to vs reference vs explanation. Scenarios: &lt;em&gt;Restructure Documentation for a Configuration Library&lt;/em&gt; (sprawling &lt;code&gt;confz&lt;/code&gt; README split into four Diátaxis types, tutorial title must start with a verb, reference section must contain no instruction); &lt;em&gt;Getting Started Guide for a CLI Deployment Tool&lt;/em&gt; (&lt;code&gt;shipctl&lt;/code&gt; onboarding tutorial with Goal→Prerequisites→Numbered steps→Verifiable result structure, no conceptual digressions in the steps).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;init&lt;/code&gt;:&lt;/strong&gt; Writing &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt; files that actually help AI assistants. Scenarios: &lt;em&gt;Set Up Agent Instructions for a Growing Python Monorepo&lt;/em&gt; (3-year-old codebase, multiple service packages, identify the three constraints that cause the most agent damage); &lt;em&gt;Set Up Agent Instructions for a Node.js Monorepo&lt;/em&gt; (workspace-aware package manager, per-package test commands, legacy directory exclusion); &lt;em&gt;Audit and Slim Down a Bloated AGENTS.md&lt;/em&gt; (what to cut, what to keep, signal vs noise after a year of uncurated growth); &lt;em&gt;Set Up Agent Instructions for a Growing Monorepo&lt;/em&gt; (hierarchical root-level vs per-package instructions, discoverability filtering).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;octocat&lt;/code&gt;:&lt;/strong&gt; GitHub CLI patterns and correct flag usage. Scenarios: &lt;em&gt;Automate Feature Branch PR Submission&lt;/em&gt; (correct &lt;code&gt;gh pr create&lt;/code&gt; flags, CI polling with &lt;code&gt;gh run watch&lt;/code&gt;, merge only after checks pass); &lt;em&gt;Preparing Commits for a Node.js Core Module Contribution&lt;/em&gt; (subsystem prefix, 72-char subject, &lt;code&gt;Reviewed-By&lt;/code&gt; trailers, the format changelog toolbots parse); &lt;em&gt;Prepare Node.js Core Contribution Commits&lt;/em&gt; (backport workflow, correct metadata for automated release pipelines); &lt;em&gt;Automate Pull Request Workflow&lt;/em&gt; (reusable shell script, idempotent, surfaces CI failures before merge).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;nodejs-core&lt;/code&gt;:&lt;/strong&gt; Contributing to Node.js core: primordials, commit message format, native addons with &lt;code&gt;AsyncWorker&lt;/code&gt;. Scenarios: &lt;em&gt;Product Catalog Caching Service&lt;/em&gt; (async-cache-dedupe, concurrency control to prevent thundering herd on a rate-limited upstream); &lt;em&gt;Microservice Routing Layer: Latency Spike Investigation&lt;/em&gt; (diagnosing &lt;code&gt;UV_THREADPOOL_SIZE&lt;/code&gt; exhaustion, &lt;code&gt;dns.lookup()&lt;/code&gt; blocking the pool); &lt;em&gt;Diagnose and Fix V8 Performance Regression in Analytics Processor&lt;/em&gt; (&lt;code&gt;--prof&lt;/code&gt;, &lt;code&gt;--trace-opt&lt;/code&gt;, reading &lt;code&gt;isolate-*.log&lt;/code&gt;, acting on deoptimization reasons).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;skill-optimizer&lt;/code&gt;:&lt;/strong&gt; Meta: given a poorly-written skill or benchmark report, improve it or interpret it correctly.&lt;/p&gt;
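
&lt;p&gt;To make the Node.js core commit shape from the &lt;code&gt;octocat&lt;/code&gt; scenarios concrete, here is a rough TypeScript sketch of a checker for the three properties named there: a subsystem prefix, a subject of at most 72 characters, and a &lt;code&gt;Reviewed-By&lt;/code&gt; trailer. The function name, the prefix pattern, and the sample message are illustrative assumptions, not the benchmark's actual grader.&lt;/p&gt;

```typescript
// Assumed prefix shape: lowercase subsystem name(s), then ": " and a summary.
const SUBJECT_PATTERN = /^[a-z0-9_,-]+: \S.*$/

// Hypothetical checker: returns a list of problems, empty when the message passes.
function checkCommitMessage (message: string): string[] {
  const problems: string[] = []
  const lines = message.trimEnd().split('\n')
  const subject = lines[0] ?? ''

  if (!SUBJECT_PATTERN.test(subject)) {
    problems.push('subject needs a "subsystem: " prefix')
  }
  if (subject.length > 72) {
    problems.push('subject exceeds 72 characters')
  }
  if (!lines.some((line) => line.startsWith('Reviewed-By: '))) {
    problems.push('missing Reviewed-By trailer')
  }
  return problems
}

// Made-up sample message for illustration only.
const sample = [
  'doc: fix typo in fs.readFile example',
  '',
  'Reviewed-By: Jane Doe'
].join('\n')

console.log(checkCommitMessage(sample)) // prints []
```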

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;gpt-5.5 is a better model than gpt-5.4 on raw capability, and on latency it is not close. For everything else, they are the same model at different price points. Pay the 63% premium if you need the speed. Skip it if you care about cost or value per dollar.&lt;/p&gt;

&lt;p&gt;The model to actually avoid is gpt-5.3. It costs 47% more than gpt-5.4 and scores 5.4 points worse. If you are running gpt-5.3 today, the case for switching to gpt-5.4 is strong on both cost and performance.&lt;/p&gt;

&lt;p&gt;Frontier models are becoming more self-sufficient. The ROI on domain skills is concentrating in genuinely proprietary knowledge: your internal APIs, your custom tooling, patterns that simply aren't on the internet. Snipgrapher lifted every model by 27 to 40 points because no model had ever seen its documentation. ESLint v9 flat config lifted them by only 2 to 14 points because capable models already know it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was written by &lt;a href="//uk.linkedin.com/in/simonmaple"&gt;Simon Maple&lt;/a&gt; and originally published on &lt;a href="https://tessl.io/blog/gpt-55-is-openais-best-model-but-paying-more-for-it-makes-no-sense/" rel="noopener noreferrer"&gt;tessl.io/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Simon Maple is the Head of Developer Relations at Tessl, and AI Native Dev co-host. Previously, Simon was the Field CTO, and VP Developer Relations at Snyk, ZeroTurnaround, and IBM. He became a Java Champion in 2014, JavaOne Rockstar speaker in 2014 and 2017, Duke’s Choice award winner, Virtual JUG founder and organiser, and London Java Community co-leader.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>openai</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
