<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shmulik Cohen</title>
    <description>The latest articles on DEV Community by Shmulik Cohen (@shmulc).</description>
    <link>https://dev.to/shmulc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3800211%2F6b7433c9-9f19-4c23-99f4-bccd3df7d4bf.jpg</url>
      <title>DEV Community: Shmulik Cohen</title>
      <link>https://dev.to/shmulc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shmulc"/>
    <language>en</language>
    <item>
      <title>Stop “Vibe Merging”</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:32:07 +0000</pubDate>
      <link>https://dev.to/shmulc/stop-vibe-merging-1jpo</link>
      <guid>https://dev.to/shmulc/stop-vibe-merging-1jpo</guid>
      <description>&lt;p&gt;&lt;em&gt;A Deep Dive into the Code Review Bench Results&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flozt1ovnv2287nat8pue.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flozt1ovnv2287nat8pue.jpeg" width="700" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The AI Code Explosion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We are living through an AI code explosion. Coding agents are writing more code than ever before, churning out boilerplate and building new features at a record pace. But this incredible speed has created a massive new bottleneck: generating code is fast, but reviewing it is slow.&lt;/p&gt;

&lt;p&gt;Recent telemetry from across the industry has exposed a “Productivity Paradox.” While developers using AI are completing more tasks, their Pull Request (PR) review time has spiked by &lt;strong&gt;91%&lt;/strong&gt; [&lt;strong&gt;Faros AI:&lt;/strong&gt; &lt;em&gt;The AI Productivity Paradox Research Report&lt;/em&gt;]. We’ve reached a point where individual velocity is up, but organizational delivery is stalling because the human verification layer cannot scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this post, we’ll explore:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Code Review Crisis:&lt;/strong&gt; Why AI-generated code is actually harder to review than human code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The “Vibe Merging” Danger:&lt;/strong&gt; How teams are accidentally sacrificing safety for speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Market Divide:&lt;/strong&gt; The difference between “All-in-One” giants and “Pure-Play” specialists.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code Review Bench:&lt;/strong&gt; A deep dive into the first neutral, data-driven benchmark to rank the top agents, and a look at its results.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Code Review Crisis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s be clear: code review is a notoriously difficult problem even for senior engineers. It requires holding massive amounts of context in your head to ensure that a “simple” change doesn’t break a distant, existing system.&lt;/p&gt;

&lt;p&gt;AI code review is fundamentally harder than generation. While an agent can generate a local fix in seconds, a reviewer must reason globally across multiple files, architectural patterns, and intricate system integrations. AI-assisted changes are often larger and touch more surfaces, making the cognitive load on reviewers nearly unbearable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6b27r7itzp64ic134onw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6b27r7itzp64ic134onw.png" width="700" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stakes at the review stage have never been higher. While AI can generate code &lt;strong&gt;10x&lt;/strong&gt; faster, data shows that AI-generated code produces &lt;strong&gt;1.7x more logic and correctness issues&lt;/strong&gt; than human-written code [&lt;strong&gt;CodeRabbit:&lt;/strong&gt; &lt;em&gt;AI vs. Human Code Quality Analysis&lt;/em&gt;]. The main problem in software engineering has shifted: it’s no longer about how fast we can author code, but how accurately we can ensure its quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The “Good Case” vs. The Reality&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The bottleneck described above is actually the &lt;strong&gt;good case&lt;/strong&gt;. In this scenario, teams are at least attempting to maintain their standards and keep a “human in the loop” to catch errors before they hit production.&lt;/p&gt;

&lt;p&gt;The far more dangerous reality is what’s happening in companies that have simply given up on the bottleneck. When PR queues get backed up for weeks, the pressure to ship becomes overwhelming. We are seeing a surge in &lt;strong&gt;“vibe merging”&lt;/strong&gt; — where developers, overwhelmed by the volume of AI-generated code, simply skim the diff or hit “Approve” based on a gut feeling rather than a proper review.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Vibe Merging:&lt;/strong&gt; The act of approving a Pull Request based on a “gut feeling” or the reputation of the author, rather than a line-by-line verification of the logic.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Is your team “Vibe Merging”? Look for these symptoms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The “LGTM” Speedrun:&lt;/strong&gt; Approving a 300+ line diff in under three minutes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Green Light Fallacy:&lt;/strong&gt; Assuming that because the CI/CD pipeline passed, the logic must be sound. CI/CD catches &lt;em&gt;syntax&lt;/em&gt; and &lt;em&gt;crashes&lt;/em&gt;, but it doesn’t understand &lt;em&gt;intent&lt;/em&gt;. Vibe merging happens when we trust the “green check” to do the thinking for us.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Seniority Pass:&lt;/strong&gt; Skimming a PR because the author is a “rockstar” who rarely makes mistakes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Ghost Review:&lt;/strong&gt; Adding a comment like “Nice work!” without actually catching the logic bug on line 42.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
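&lt;p&gt;To make the “Green Light Fallacy” concrete, here is a minimal, hypothetical sketch (the function and test are invented for illustration): a logic bug that sails through CI because the test only checks that the code runs, not what it means.&lt;/p&gt;

```python
# Hypothetical example: a logic bug that passes a shallow CI check.
# The function is *supposed* to apply a 20% discount, i.e. return 80.0.
def discounted_price(price, discount):
    # Bug: multiplies by the discount instead of (1 - discount).
    return price * discount

def test_discounted_price_runs():
    # A "green check" style test: asserts the code runs and returns a float,
    # but never verifies the business logic (the intent).
    result = discounted_price(100.0, 0.2)
    assert isinstance(result, float)  # passes, so CI goes green

test_discounted_price_runs()
print("CI is green, but the price is", discounted_price(100.0, 0.2))  # 20.0, not 80.0
```

&lt;p&gt;CI happily reports green here; only a reviewer reasoning about intent catches that the customer just got an 80% discount.&lt;/p&gt;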

&lt;p&gt;When companies merge code to &lt;code&gt;main&lt;/code&gt; without deep verification, they aren’t just moving faster; they are accumulating technical debt and security risks at an exponential rate. This lack of a “Quality Gate” is how massive regressions and vulnerabilities slip into production unnoticed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Market’s Answer: Code Review Agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The industry hasn’t ignored this crisis. In the last year, the market for “Code Review Agents” has exploded, but the players generally fall into two distinct camps:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The “All-in-One” Giants&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Platform powerhouses like &lt;strong&gt;GitHub (Copilot)&lt;/strong&gt;, &lt;strong&gt;Anthropic (Claude Code)&lt;/strong&gt;, and &lt;strong&gt;Anysphere (Cursor)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Their Worldview:&lt;/strong&gt; They want to own the entire developer experience. They believe the best review comes from the same agent that helped you write the code, leveraging the shared context of your intent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Power Play:&lt;/strong&gt; This shift was cemented when &lt;strong&gt;Anysphere&lt;/strong&gt; acquired &lt;strong&gt;Graphite&lt;/strong&gt; to bridge the gap between local coding and the final merge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The “Pure-Play” Specialists&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Companies like &lt;strong&gt;CodeRabbit&lt;/strong&gt;, &lt;strong&gt;Qodo&lt;/strong&gt;, and &lt;strong&gt;Baz&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Their Worldview:&lt;/strong&gt; They believe the agent writing the code shouldn’t be the one grading it. They focus exclusively on the review layer, investing in deeper repository indexing to catch architectural breaks that “generalist” agents overlook.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Evaluation Nightmare: Why We’re Still Flying Blind&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Choosing between these players is a nightmare because they all appear nearly identical on the surface. This has left engineering leaders stuck with three flawed ways to evaluate their choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The “Vibe Check”:&lt;/strong&gt; Install a tool, wait a week, and see if it “feels” correct. It’s subjective and ignores the critical bugs the tool missed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Internal Benchmark:&lt;/strong&gt; Trusting vendor marketing. As the saying goes, “Every vendor is #1 on their own test.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The False Economy (Cheapest Option):&lt;/strong&gt; Choosing based on price. You may save budget, but you can’t measure the impact on your actual safety goals.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Precision Trap:&lt;/strong&gt; The biggest hidden danger is &lt;strong&gt;Noise&lt;/strong&gt;. A player might claim a high “Recall” (they catch every bug), but if they achieve that by leaving 50 “nitpick” comments on a 10-line PR, they cause “Alert Fatigue.” Developers start ignoring the bot, eventually reverting to “vibe merging” just to clear the queue.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introducing Code Review Bench&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We’ve needed a neutral, third-party way to measure these agents. That is exactly why &lt;strong&gt;Martian&lt;/strong&gt; built &lt;strong&gt;Code Review Bench&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Who is Martian?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Martian is an independent AI research lab (the team behind the Model Router). Because they do not sell a code generation or review agent themselves, they are in a unique and unbiased position to referee the industry. Their core research focuses on &lt;strong&gt;mechanistic interpretability&lt;/strong&gt; — unpacking the “black box” of LLMs to understand exactly how they make decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1j9hbb60d70ezgsghr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1j9hbb60d70ezgsghr6.png" width="270" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Methodology: Beyond the Static Test&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Released recently, &lt;strong&gt;Code Review Bench&lt;/strong&gt; is a public, &lt;a href="https://github.com/withmartian/code-review-benchmark" rel="noopener noreferrer"&gt;open-source&lt;/a&gt; benchmark designed to keep AI tools honest. Unlike previous benchmarks that rely on static datasets (which agents can eventually “memorize” or game), Martian uses a dual-layer approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Offline Benchmark (The Gold Set):&lt;/strong&gt; This is the controlled environment. Martian uses a curated set of 50 PRs from 5 major open source repositories with human-verified golden comments.&lt;br&gt;&lt;br&gt;
Each PR has curated golden comments with severity labels. An LLM judge matches each tool’s review against the golden comments and computes precision and recall.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Online Benchmark (The Continuous&lt;/strong&gt; &lt;strong&gt;Reality Check):&lt;/strong&gt; This is where the benchmark gets revolutionary. It continuously samples fresh real-world PRs from GitHub where code review bots left comments. Because the PRs are recent, tools can’t have memorized them during training.&lt;br&gt;&lt;br&gt;
Each tool is ranked by extracting its suggestions on each PR and matching them against the human response to each comment: did the developer (or their agent) fix the issue or ignore it?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
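&lt;p&gt;As a sketch of the offline scoring math (my reading of the methodology, not Martian’s actual code): once the LLM judge has matched a tool’s comments against the golden set, precision, recall, and F1 fall out of simple counts. The function and variable names below are illustrative.&lt;/p&gt;

```python
# Sketch of the offline scoring, assuming the LLM judge has already matched
# tool comments to golden comments. Names are illustrative, not Martian's API.
def score_review(matched, tool_comments_total, golden_comments_total):
    """Compute precision, recall, and F1 from match counts."""
    precision = matched / tool_comments_total   # how many of its comments were right
    recall = matched / golden_comments_total    # how many real bugs it found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy numbers: a tool left 10 comments on a PR set containing 12 golden bugs,
# and the judge matched 6 of those comments to golden comments.
p, r, f1 = score_review(matched=6, tool_comments_total=10, golden_comments_total=12)
print(f"precision={p:.1%} recall={r:.1%} f1={f1:.1%}")
```

&lt;p&gt;This is also why precision and recall pull against each other: commenting more raises the chance of matching golden comments (recall) while diluting the share of correct comments (precision).&lt;/p&gt;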

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6631j7541vc6i66stc0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6631j7541vc6i66stc0k.png" width="700" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How the LLM judge works&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Standouts: Deciphering the Leaderboard&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The most important takeaway from Martian’s data is that there is no single “best” tool, only the best tool for your specific goals. While high-volume tools often dominate the charts, the results split clearly between controlled “lab” performance and real-world behavior, revealing where each vendor focuses.&lt;/p&gt;

&lt;p&gt;To check the dataset yourself, take a look at &lt;a href="https://codereview.withmartian.com/" rel="noopener noreferrer"&gt;https://codereview.withmartian.com/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Below, I’ll walk through the results as I see them.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Offline Mode: The Augment Dominance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4mt4xlf5szwl7qvxg8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4mt4xlf5szwl7qvxg8q.png" width="700" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the controlled Offline Benchmark (the “Gold Set”), &lt;strong&gt;Augment&lt;/strong&gt; didn’t just lead — they dominated. In a “closed-book” environment where bugs are verified and static, Augment’s engine proved remarkably adept at connecting the dots.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Leader:&lt;/strong&gt; Augment took the top spot with a powerful &lt;strong&gt;53.8% F1 score&lt;/strong&gt;, creating a massive gap over the second-place finisher, Cursor, at &lt;strong&gt;44.9%&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Balance:&lt;/strong&gt; With &lt;strong&gt;62.8% recall&lt;/strong&gt; and &lt;strong&gt;47.0% precision&lt;/strong&gt;, Augment shows it can find a significant portion of problems without drowning the developer in noise.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4uvuw8dlj7zuodjv5dr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4uvuw8dlj7zuodjv5dr.jpeg" width="700" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, the &lt;strong&gt;Graphite&lt;/strong&gt; case is perhaps the most interesting outlier in the offline set. Graphite operates like a surgeon: it achieved a staggering &lt;strong&gt;75.0% precision,&lt;/strong&gt; the highest in the category by a wide margin, but it came at a major cost. Its &lt;strong&gt;recall was only 8.8%&lt;/strong&gt;, leading to an overall &lt;strong&gt;15.7% F1 score&lt;/strong&gt;. This suggests that while Graphite is almost always right when it speaks, it stays silent on the vast majority of issues in the PR.&lt;/p&gt;

&lt;h4&gt;
  
  
  Update from 7.3.2026:
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1g9efh7h2yntalod0mpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1g9efh7h2yntalod0mpb.png" width="800" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I didn’t fully realize this when I wrote the previous section, but I’ve since learned that the “offline” dataset is actually a replication of the code review benchmark used by companies like Augment and Greptile. This context makes Augment’s achievement feel slightly less groundbreaking, as they were essentially tested on familiar ground.&lt;/p&gt;

&lt;p&gt;Currently, &lt;strong&gt;Qodo&lt;/strong&gt; (another impressive Israeli company) holds second place with an F1 score of 47.9%. The story behind their ranking is quite interesting: they were initially in 5th place, but after internal review, the team discovered they had run the benchmark using an incorrect and outdated configuration. They coordinated with &lt;strong&gt;Martian&lt;/strong&gt; to resolve the issue, and once the correct settings were applied, their score jumped — marking another Israeli achievement to be proud of.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Online Mode: Baz’s “David &amp;amp; Goliath” Story&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ht8npewukvpf708tpqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ht8npewukvpf708tpqa.png" width="700" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we move to the &lt;strong&gt;Online Benchmark&lt;/strong&gt; — which tracks how real developers react to AI comments in the wild — the narrative shifts toward the “underdog.” This is where &lt;strong&gt;Baz&lt;/strong&gt;, a newer Israeli startup, put up staggering numbers that challenge the industry giants.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The “Surgical Sniper” Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Taken at face value, Baz dominated the leaderboard in quality-centric metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;#1 in Precision — 70.9%&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;#1 in F1 Score — 52.5%&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;#1 in F0.5 Score — 62.2%&lt;/strong&gt; (a metric that weights precision more heavily than recall)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
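&lt;p&gt;A quick note on F0.5, since it decides this ranking: it is the general Fβ score with β = 0.5, which counts precision roughly twice as heavily as recall. A small sketch of the arithmetic (my own back-of-the-envelope check against the published Baz numbers, not Martian’s code):&lt;/p&gt;

```python
# F-beta: beta controls the precision/recall trade-off.
# beta=1 is the balanced F1; beta=0.5 weights precision more heavily.
def f_beta(precision, recall, beta):
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Baz's published online numbers: 70.9% precision and a 52.5% F1.
# Solving F1 = 2PR/(P+R) for recall gives the implied recall.
p = 0.709
r = 0.525 * p / (2 * p - 0.525)       # roughly 41.7%
print(f"implied recall = {r:.1%}")
print(f"F0.5 = {f_beta(p, r, 0.5):.1%}")   # lands near the published 62.2%
```

&lt;p&gt;In other words, Baz’s F0.5 lead is precisely what you would expect from a tool that trades raw coverage for being right when it speaks.&lt;/p&gt;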

&lt;p&gt;The performance was so impressive that the Martian team actually made &lt;strong&gt;Baz&lt;/strong&gt; the default reviewer for their own &lt;strong&gt;ARES&lt;/strong&gt; repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fel58asjso15vk33715go.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fel58asjso15vk33715go.png" width="700" height="679"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Elephant in the Room: Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We have to be intellectually honest about the “Sample Size” gap: &lt;strong&gt;CodeRabbit&lt;/strong&gt; has tracked nearly 300,000 PRs in this benchmark; &lt;strong&gt;Baz&lt;/strong&gt; has tracked just 790.&lt;/p&gt;

&lt;p&gt;Because Baz is a smaller, newer player, their data is naturally noisier. With a smaller user base, a tool can provide specialized attention that is harder to maintain at a “Goliath” scale. However, Baz’s #1 ranking in precision suggests they are operating as a &lt;strong&gt;“Surgical Sniper.”&lt;/strong&gt; They aren’t trying to find every possible bug (which causes alert fatigue); they are trying to ensure that when they &lt;em&gt;do&lt;/em&gt; interrupt a developer, they are roughly 70% likely to be right.&lt;/p&gt;

&lt;p&gt;It is incredible to see Israeli tech competing at this elite level in this field. While the giants have the data, the underdogs currently have the precision and attention. The real test will be whether Baz can maintain these surgical numbers as they scale to meet the volume of the industry leaders.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Giants: CodeRabbit and Cursor&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoxdqpt0c9qpx5bggi03.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoxdqpt0c9qpx5bggi03.jpeg" width="700" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we have to look at the tools that are actually “stress-tested” by the market every single day (at least 3,000 PRs each). Their results highlight the classic trade-off between &lt;strong&gt;coverage&lt;/strong&gt; and &lt;strong&gt;conciseness&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CodeRabbit (The High-Volume King):&lt;/strong&gt; CodeRabbit is the clear leader in terms of sheer scale. It achieved the &lt;strong&gt;best F1 score (51.2%)&lt;/strong&gt; and the &lt;strong&gt;best recall (53.5%)&lt;/strong&gt; in the online category. If your priority is a “safety net” that catches as many bugs as possible across a massive organization, CodeRabbit is the current gold standard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cursor (The High-Precision Specialist):&lt;/strong&gt; Cursor maintains its “surgical” reputation even in the wild. While its recall sits lower at &lt;strong&gt;36.6%&lt;/strong&gt;, it boasts a high &lt;strong&gt;precision of 68.1%&lt;/strong&gt;. Cursor isn’t trying to find every single bug; it’s trying to ensure that when it interrupts a developer’s flow, it’s for a very good reason.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Matters: Moving Beyond Goodhart’s Law&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This benchmark brings a much-needed layer of accountability to the AI coding space. It does for code review what &lt;strong&gt;SWE-bench&lt;/strong&gt; did for code generation, but with a critical evolutionary step: it accounts for human behavior.&lt;/p&gt;

&lt;p&gt;We’ve learned that static benchmarks will always eventually fall victim to &lt;strong&gt;Goodhart’s Law&lt;/strong&gt;: &lt;em&gt;“When a measure becomes a target, it ceases to be a good measure.”&lt;/em&gt; If AI vendors only optimize for a static “Gold Set,” they will eventually game those metrics without actually helping real-world developers.&lt;/p&gt;

&lt;p&gt;The future of AI evaluation requires these &lt;strong&gt;dynamic systems&lt;/strong&gt; tied directly to real-world impact. Whether you are optimizing for the highest possible recall to catch every potential bug, or the highest precision to protect your senior engineers from alert fatigue, we finally have the data to stop guessing and start measuring.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Which Agent Should You Choose?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Choose High Recall (Augment, CodeRabbit)&lt;/strong&gt; if you are in a high-stakes industry (FinTech/Security) or have many junior devs. You want the bot to catch everything, even if it adds some noise.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Choose High Precision (Cursor, Graphite, Baz)&lt;/strong&gt; if you have a lean team of senior engineers. You only want the bot to speak up if it’s 90% sure it found a real logic flaw, protecting your team from “Alert Fatigue.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Bottom Line&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We are still in the early innings of the “Verifier Era.” No tool has yet cracked the code on perfect, human-level review — the 63% recall ceiling proves that. But with frameworks like &lt;strong&gt;Code Review Bench&lt;/strong&gt;, we are finally moving past the “Vibe Check” and toward a future where we can trust the agents that help us ship.&lt;/p&gt;

&lt;p&gt;Before you buy a tool, look at your own telemetry. If your “Time-to-Merge” has spiked while your “Comments-per-PR” has dropped, you are already Vibe Merging. Use the &lt;strong&gt;Code Review Bench&lt;/strong&gt; results to pick a partner that fits your risk tolerance — whether you need a high-recall safety net or a high-precision surgical assistant.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codereview</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>Vercel Skills 101</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Fri, 06 Feb 2026 14:25:06 +0000</pubDate>
      <link>https://dev.to/shmulc/vercel-skills-101-334l</link>
      <guid>https://dev.to/shmulc/vercel-skills-101-334l</guid>
      <description>&lt;p&gt;&lt;em&gt;The Package Manager Your AI Agents Were Missing&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1zusdujkjvheocwmom6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1zusdujkjvheocwmom6.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Skills: The New Standard
&lt;/h4&gt;

&lt;p&gt;If you’ve been following the AI space lately, you’ve likely heard about &lt;strong&gt;Agent Skills&lt;/strong&gt;. Pioneered by Anthropic, this open standard allows us to package specialized instructions, tools, and scripts into a modular format.&lt;/p&gt;

&lt;p&gt;It’s a brilliant architectural shift: instead of bloating an AI’s context with a 10,000-line system prompt, you provide “dormant” manuals. The agent only reads and “activates” them when it actually needs to perform a specific task.&lt;/p&gt;

&lt;p&gt;In my previous post, &lt;strong&gt;Demystifying Coding Agents&lt;/strong&gt;, I took a deep dive into the problems Skills solve and why they are the natural evolution of context management for coding agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Manual Installation Problem
&lt;/h4&gt;

&lt;p&gt;But here’s the catch: &lt;strong&gt;A standard is not a manager.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While tools like &lt;strong&gt;Claude Code&lt;/strong&gt; have built-in ways to fetch skills, the rest of the ecosystem is a fragmented mess. If you’re a developer jumping between &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Windsurf&lt;/strong&gt;, &lt;strong&gt;Claude Code&lt;/strong&gt;, and &lt;strong&gt;OpenClaw&lt;/strong&gt; (or just using one that doesn’t have a built-in way to install Skills), you’re currently living in the “Manual Installation Era.”&lt;/p&gt;

&lt;p&gt;To give your agent a new capability today, you usually have to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Hunt down a GitHub repo or a &lt;code&gt;SKILL.md&lt;/code&gt; from anywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download the directory or file manually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manually paste it into a specific hidden folder, like &lt;code&gt;.cursor/skills&lt;/code&gt; or &lt;code&gt;.windsurf/skills&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ve essentially regressed to the days before &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;pip&lt;/code&gt;, where “installing” a library meant dragging a ZIP file into your project and praying your environment variables were correct. It’s manual, it doesn’t scale, and it’s a massive barrier to building truly portable AI agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  A Package Manager for Skills
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Vercel Skills&lt;/strong&gt; solves exactly this. It isn’t trying to create a new standard, it is the &lt;strong&gt;Package Manager&lt;/strong&gt; for the existing one.&lt;/p&gt;

&lt;p&gt;Think of it as the “&lt;strong&gt;NPM for Agents&lt;/strong&gt;.” It provides a CLI and a central registry (&lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt;) to find, install, and manage these open-standard skills across every AI agent in your workflow.&lt;/p&gt;

&lt;p&gt;In the first installment of &lt;strong&gt;My Digital Arsenal&lt;/strong&gt;, we discussed how tools like &lt;code&gt;uv&lt;/code&gt; and &lt;code&gt;pip&lt;/code&gt; revolutionized Python development; Vercel’s Skills CLI brings that same level of sanity to the AI ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Use Vercel Skills
&lt;/h3&gt;

&lt;p&gt;The beauty of Vercel Skills is that there is &lt;strong&gt;nothing to install&lt;/strong&gt;. It lives in the cloud and runs on your machine via &lt;code&gt;npx&lt;/code&gt;, mirroring the “zero-config” philosophy of modern dev tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Discovery: Finding Your Edge&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Instead of scouring GitHub for the right instructions, you can search the registry directly from your terminal — &lt;code&gt;npx skills find&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F921na7wv393htv1o8nd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F921na7wv393htv1o8nd9.png" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This triggers an interactive, searchable list. If you’re looking for something specific, you can pass a keyword (for example, &lt;code&gt;npx skills find react&lt;/code&gt;).&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Installation: Adding Powers to Your Agents
&lt;/h4&gt;

&lt;p&gt;Once you’ve found a skill (e.g., &lt;code&gt;vercel-labs/agent-skills&lt;/code&gt;), you can download it with the command &lt;code&gt;npx skills add vercel-labs/agent-skills&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnah033qdm3f6sqc3hp2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnah033qdm3f6sqc3hp2r.png" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This starts an interactive process where you choose the exact skills you want, select which AI agents you are using, and decide if you want the skill at the &lt;strong&gt;global&lt;/strong&gt; or &lt;strong&gt;repo&lt;/strong&gt; level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works under the hood:&lt;/strong&gt; Vercel Skills clones the repo and then downloads the skills into a central &lt;code&gt;.agents/skills&lt;/code&gt; directory. For every agent you use, it &lt;strong&gt;copies or symlinks&lt;/strong&gt; the relevant files into the appropriate folders (like &lt;code&gt;.cursor/skills&lt;/code&gt; or &lt;code&gt;.windsurf/skills&lt;/code&gt;).&lt;/p&gt;
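&lt;p&gt;To make the mechanics concrete, here is a minimal Python sketch of that copy-or-symlink step. The function name and the fallback logic are illustrative assumptions, not Vercel’s actual implementation:&lt;/p&gt;

```python
import os
import shutil
from pathlib import Path

def link_skill(central_dir: Path, agent_dir: Path, skill: str) -> Path:
    """Expose one skill from the central store inside one agent's folder.

    Hypothetical sketch: tries a symlink first, then falls back to a
    full copy on platforms/filesystems where symlinks are unavailable.
    """
    src = central_dir / skill
    dest = agent_dir / skill
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Remove any stale link from a previous install before relinking.
        if dest.exists() or dest.is_symlink():
            dest.unlink()
        os.symlink(src, dest)
    except OSError:
        # Symlinks can fail (e.g. some Windows setups); copy instead.
        shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

&lt;p&gt;Either way, every agent ends up reading the same skill files from the single central directory.&lt;/p&gt;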

&lt;p&gt;It supports multiple source formats:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# GitHub shorthand&lt;br&gt;
npx skills add vercel-labs/agent-skills
&lt;h1&gt;
  
  
  Direct path to a specific skill within a repo
&lt;/h1&gt;

&lt;p&gt;npx skills add &lt;a href="https://github.com/vercel-labs/agent-skills/tree/main/skills/web-design-guidelines" rel="noopener noreferrer"&gt;https://github.com/vercel-labs/agent-skills/tree/main/skills/web-design-guidelines&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Local development paths
&lt;/h1&gt;

&lt;p&gt;npx skills add ./my-local-skills&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  3. Managing Your Arsenal
&lt;/h4&gt;


&lt;p&gt;Vercel Skills doesn’t just “drop and forget” files. It manages the lifecycle of your skills:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;npx skills list&lt;/code&gt;: List installed skills&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;npx skills check&lt;/code&gt;: Check for available skill updates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;npx skills update&lt;/code&gt;: Update all installed skills to latest versions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;npx skills remove [skills]&lt;/code&gt;: Remove installed skills from agents&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Verification: “What Can You Do?”
&lt;/h4&gt;

&lt;p&gt;After installing a skill, the best way to test it is simply to ask your agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What skills do you have access to right now?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If everything is wired up correctly, the agent will have access to the new skill, ready to be “woken up” when the task demands it.&lt;/p&gt;

&lt;p&gt;You can see the full list of commands at &lt;a href="https://github.com/vercel-labs/skills" rel="noopener noreferrer"&gt;vercel-labs/skills&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharing Your Own Skill
&lt;/h3&gt;

&lt;p&gt;Publishing your own skill is just as easy. To demonstrate, I built a simple GitHub repository: &lt;a href="https://github.com/anuk909/Skills" rel="noopener noreferrer"&gt;anuk909/Skills&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cs8v4ek34426cilu49v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cs8v4ek34426cilu49v.png" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It currently contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;gh-pr-review&lt;/code&gt;: Improves agent capabilities for GitHub PRs (based on &lt;code&gt;agynio/gh-pr-review&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;git-worktree-workflow&lt;/code&gt;: A skill I built (based on a Cursor rule from a colleague) that finally made the worktree workflow usable for me.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To try them out, just run: &lt;code&gt;npx skills add anuk909/skills&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuq3ovy5x7m5taxyee1gx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuq3ovy5x7m5taxyee1gx.png" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;Vercel Skills is the missing link for anyone serious about using AI agents. It solves the “manual installation” headache and makes specialized knowledge portable across our entire stack.&lt;/p&gt;

&lt;p&gt;However, we are still in the early days. Unlike &lt;code&gt;npm&lt;/code&gt; packages, Skills don’t yet have a robust way to manage &lt;strong&gt;versions&lt;/strong&gt; or &lt;strong&gt;dependencies&lt;/strong&gt;. While this simplicity makes them easy to deploy today, it will be interesting to see if the ecosystem evolves toward the complexity of traditional package managers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about you?&lt;/strong&gt; Have you started using Skills in your workflow yet? What’s your preferred method of installing them today?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devtools</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Demystifying Coding Agents</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Sun, 01 Feb 2026 22:51:49 +0000</pubDate>
      <link>https://dev.to/shmulc/demystifying-coding-agents-59il</link>
      <guid>https://dev.to/shmulc/demystifying-coding-agents-59il</guid>
      <description>&lt;p&gt;&lt;em&gt;Simple Concepts Can Take You a Long Way&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuadaw4dntsywse4g245.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftuadaw4dntsywse4g245.jpeg" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The transition from “Chat” to “Agent” in software development is often framed as a mystical leap in artificial intelligence. However, from a systems engineering perspective, the shift is actually a result of standardizing the interface between three specific components: &lt;strong&gt;The Reasoning Engine&lt;/strong&gt;, &lt;strong&gt;External Tooling&lt;/strong&gt;, and &lt;strong&gt;Context Management&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whether you are using Cursor, Windsurf, Claude Code, or a custom open-source setup, the underlying architecture follows a repeatable pattern that manages state and execution over a stateless core.&lt;/p&gt;

&lt;p&gt;The secret? The advancements in coding agents today aren’t about “magic”, they come down to these very simple concepts working in unison.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The LLM: The Reasoning Engine
&lt;/h2&gt;

&lt;p&gt;At the center of any agent is the Large Language Model. In 2026, we’ve moved past the “autocomplete” era. The leap from GPT-4 to the current generation (Claude 4.5, GPT-5) wasn’t just about parameters, it was the shift toward &lt;strong&gt;Native Reasoning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Models are now trained specifically to utilize larger context windows (2M+ tokens) without losing the “needle in the haystack,” and they are fine-tuned on synthetic “Chain of Thought” data.&lt;/p&gt;

&lt;p&gt;This allows the LLM to act as a CPU with a massive, high-fidelity RAM. It doesn’t just predict the next token, it simulates the logic of the code before typing it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Planning Loop:&lt;/strong&gt; Before writing a single line of code, a robust agent executes a “Plan → Critique → Act” cycle. It writes a plan, checks if that plan breaks anything, and then executes.&lt;/p&gt;
&lt;/blockquote&gt;
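&lt;p&gt;The cycle above can be sketched as a small control loop. The &lt;code&gt;planner&lt;/code&gt;, &lt;code&gt;critic&lt;/code&gt;, and &lt;code&gt;executor&lt;/code&gt; callables are hypothetical stand-ins for what would be LLM calls in a real agent:&lt;/p&gt;

```python
def plan_critique_act(task, planner, critic, executor, max_rounds=3):
    """One 'Plan -> Critique -> Act' cycle, retried until the critic approves.

    Illustrative sketch: planner(task) -> plan, critic(plan) -> (ok, feedback),
    executor(plan) -> result. In practice all three wrap model calls.
    """
    feedback = None
    for _ in range(max_rounds):
        # Re-plan with the critic's feedback folded into the prompt.
        prompt = task if feedback is None else f"{task}\nFix: {feedback}"
        plan = planner(prompt)
        ok, feedback = critic(plan)
        if ok:
            return executor(plan)
    raise RuntimeError("No plan survived critique")
```

&lt;p&gt;The point is structural: execution is gated behind an explicit approval step, so a bad plan costs a retry instead of a broken codebase.&lt;/p&gt;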

&lt;h4&gt;
  
  
  The Gateway (API Standardization)
&lt;/h4&gt;

&lt;p&gt;One of the silent drivers of the agent explosion is the standardization of the interface between the “Brain” and the machine.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Great Equalizer:&lt;/strong&gt; Libraries like &lt;strong&gt;LiteLLM&lt;/strong&gt; and standards like OpenAI’s structured outputs mean you can swap a local Llama-3 model for Claude 4.5 Opus with a single line of config. This “pluggability” allows agents to remain model-agnostic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Power Play:&lt;/strong&gt; Conversely, the “Big Three” (Anthropic, OpenAI, Google) often bake specialized headers into their APIs specifically for tool-calling. If you’re a big enough provider, you don’t follow the interface — you &lt;em&gt;are&lt;/em&gt; the interface, forcing Agent frameworks to write custom logic just to squeeze out that extra 5% of reliability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
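&lt;p&gt;Here is a toy illustration of that pluggability. The backends are stubs (not the real LiteLLM or provider APIs); the point is that the agent calls one uniform signature and swapping models is a one-key config change:&lt;/p&gt;

```python
# Stub backends standing in for real provider clients.
def _call_claude(messages):
    return "claude:" + messages[-1]["content"]

def _call_gpt(messages):
    return "gpt:" + messages[-1]["content"]

def _call_local_llama(messages):
    return "llama:" + messages[-1]["content"]

# The "gateway": every provider is adapted to the same call shape.
BACKENDS = {
    "anthropic/claude-4.5": _call_claude,
    "openai/gpt-5": _call_gpt,
    "local/llama-3": _call_local_llama,
}

def completion(model: str, messages: list) -> str:
    """Uniform entry point: the agent never touches provider-specific code."""
    return BACKENDS[model](messages)
```

&lt;p&gt;Swap &lt;code&gt;"openai/gpt-5"&lt;/code&gt; for &lt;code&gt;"local/llama-3"&lt;/code&gt; and nothing else in the agent changes; that is the model-agnosticism the article describes.&lt;/p&gt;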

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ywa91cy0b2fjkg4xmu5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ywa91cy0b2fjkg4xmu5.jpeg" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Tools &amp;amp; Instructions: The Execution Layer
&lt;/h2&gt;

&lt;p&gt;If the LLM is the &lt;strong&gt;Navigator&lt;/strong&gt;, the &lt;strong&gt;Tools&lt;/strong&gt; are the &lt;strong&gt;Driver&lt;/strong&gt;. An LLM on its own can only talk; Tools give it “hands.” The leap we’ve seen recently is about the orchestration of these hands through a specific set of instructions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tools
&lt;/h4&gt;

&lt;p&gt;Early tool calling was clunky, relying on rigid JSON blocks that often broke. Today, we use flexible, standardized execution environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool Calling:&lt;/strong&gt; Coding agents ship proprietary tools for file editing and task management. Tools like &lt;code&gt;apply_diff&lt;/code&gt; or &lt;code&gt;undo_rewrite&lt;/code&gt; allow for surgical changes to code, and tools like &lt;code&gt;todo_list&lt;/code&gt; keep track of progress.&lt;br&gt;&lt;br&gt;
Each coding agent uses a completely different toolset that defines a big part of its DNA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sandboxed Terminal Execution:&lt;/strong&gt; Modern agents have direct access to a &lt;strong&gt;pseudo-terminal (PTY)&lt;/strong&gt;. The LLM generates standard shell commands (&lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt;, &lt;code&gt;pytest&lt;/code&gt;, &lt;code&gt;sed&lt;/code&gt;) that run in a secure, isolated sandbox. By capturing &lt;code&gt;stdout&lt;/code&gt; and &lt;code&gt;stderr&lt;/code&gt;, the agent can “see” a compiler error or a failing test and self-correct, closing the loop between thinking and doing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP):&lt;/strong&gt; MCP is the open standard that connects AI assistants to systems. It decouples the tool logic from the agent UI. It allows a local or remote server to expose its resources, such as a database schema or a Jira board, via a unified JSON-RPC protocol.&lt;br&gt;&lt;br&gt;
The agent doesn’t need a custom plugin for every service, it only needs to speak MCP, and the server handles the rest.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
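&lt;p&gt;The observe-and-self-correct channel boils down to capturing &lt;code&gt;stdout&lt;/code&gt;, &lt;code&gt;stderr&lt;/code&gt;, and the exit code of each command. A simplified sketch (real agents add a PTY and an isolation layer such as a container, which are omitted here):&lt;/p&gt;

```python
import subprocess

def run_command(cmd: list, timeout: int = 30) -> dict:
    """Run a shell command and capture everything the agent needs to 'see'.

    The returned dict is what gets fed back into the LLM's context so it
    can notice a compiler error or failing test and self-correct.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

&lt;p&gt;A non-zero &lt;code&gt;exit_code&lt;/code&gt; plus the captured &lt;code&gt;stderr&lt;/code&gt; is exactly the signal that closes the loop between thinking and doing.&lt;/p&gt;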

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwife9zphy2v6fx158y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwife9zphy2v6fx158y.png" width="720" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Instructions
&lt;/h4&gt;

&lt;p&gt;This is the “Secret Sauce.” It’s why &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Windsurf&lt;/strong&gt;, and &lt;strong&gt;Claude Code&lt;/strong&gt; can all use the same Claude 4.5 Sonnet model but produce completely different results.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;System Prompt&lt;/strong&gt; is a massive, invisible set of instructions that acts as the agent’s “Operating Manual.” It tells the model &lt;em&gt;how&lt;/em&gt; to use its utility belt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Before you edit a file, you must search the codebase for related symbols. After every shell command, analyze the output for hidden warnings.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference in how these prompts are written, some prioritizing speed, others safety and testing, is what defines the product’s DNA. One agent feels like a cautious &lt;strong&gt;Senior Architect&lt;/strong&gt;, while another feels like a rapid-fire &lt;strong&gt;Prototyper&lt;/strong&gt;, all based on the orchestration of the same tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Context Management &amp;amp; Memory: The State Machine
&lt;/h3&gt;

&lt;p&gt;Since the LLM is a stateless engine, the Agent framework must maintain a stateful environment. This is the “Operating System” of the agent, and it’s where the heaviest engineering complexity lies.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Context Window Paradox
&lt;/h4&gt;

&lt;p&gt;A few years ago, we struggled with 8k token windows. In 2026, we have models with 2M+ tokens. However, a bigger bucket doesn’t automatically mean better results. &lt;strong&gt;The paradox is that as the window grows, our expectations grow faster.&lt;/strong&gt; We no longer ask for a single snippet, we expect agents to refactor entire modules, maintain architectural consistency across microservices, and debug complex integration errors.&lt;/p&gt;

&lt;p&gt;This “mission creep” means that even with millions of tokens, &lt;strong&gt;context remains the most precious currency in the system.&lt;/strong&gt; More noise increases the &lt;strong&gt;Needle in a Haystack risk&lt;/strong&gt;. The more ‘fluff’ you add to support these massive tasks, the more likely the LLM is to miss the one critical line of code that matters.&lt;/p&gt;

&lt;p&gt;To manage this, we use &lt;strong&gt;Context Caching&lt;/strong&gt; to keep the codebase “hot” and affordable in GPU memory, and we structure the “State Machine” into three distinct layers:&lt;/p&gt;

&lt;h4&gt;
  
  
  A. The Baseline (Static Rules)
&lt;/h4&gt;

&lt;p&gt;This is the “BIOS” or system configuration, rules that are always true.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Global Rules:&lt;/strong&gt; Project-wide constraints (e.g., &lt;code&gt;.cursorrules&lt;/code&gt;, &lt;code&gt;.instructions.md&lt;/code&gt;, &lt;code&gt;.windsurfrules&lt;/code&gt;) like “Never use external CSS libraries.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spatial Context:&lt;/strong&gt; Directory-specific rules (e.g., &lt;code&gt;AGENTS.md&lt;/code&gt;). The agent only loads the “map” for its current folder, keeping the context window lean and focused on the immediate task.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
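&lt;p&gt;A minimal sketch of that spatial loading, assuming an &lt;code&gt;AGENTS.md&lt;/code&gt; file may exist at each directory level. The override order (project root first, deepest folder last) is an illustrative choice, not a mandated one:&lt;/p&gt;

```python
from pathlib import Path

def collect_agent_rules(start: Path, root: Path) -> list:
    """Gather AGENTS.md contents from the repo root down to the working dir.

    Only the folders on the current path are loaded, keeping the context
    window lean; deeper files come last so local rules can override
    project-wide ones.
    """
    chain, cur, root = [], start.resolve(), root.resolve()
    while True:
        chain.append(cur)
        if cur == root or cur == cur.parent:  # stop at repo (or fs) root
            break
        cur = cur.parent
    rules = []
    for folder in reversed(chain):  # root first, deepest last
        f = folder / "AGENTS.md"
        if f.is_file():
            rules.append(f.read_text())
    return rules
```

&lt;p&gt;Concatenating that list (in order) is the “map” the agent carries for its current folder.&lt;/p&gt;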

&lt;h4&gt;
  
  
  B. The Knowledge (On-Demand Retrieval)
&lt;/h4&gt;

&lt;p&gt;This layer fetches information only when the agent realizes it doesn’t know something.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codebase RAG:&lt;/strong&gt; Using Vector Search (for concepts) and Code Graphs (for definitions) to pluck specific snippets. It acts as the agent’s “Library.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long-Term Memory:&lt;/strong&gt; Systems like Windsurf’s Cascade or Copilot index your past PRs and corrections. This creates a “Personal Profile” so the agent learns your specific habits over time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
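&lt;p&gt;Stripped of the infrastructure, the retrieval step is just similarity ranking. A toy sketch with hand-made embedding vectors; a real system uses a learned embedding model and an approximate-nearest-neighbor index, but the ranking logic is the same:&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=2):
    """Rank code snippets by embedding similarity -- the 'Library' lookup.

    `index` maps snippet text -> its (toy) embedding vector.
    """
    ranked = sorted(index, key=lambda s: cosine(query_vec, index[s]),
                    reverse=True)
    return ranked[:top_k]
```

&lt;p&gt;Only the top-ranked snippets are pasted into the context window, which is how RAG scales to codebases far larger than any model can read at once.&lt;/p&gt;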

&lt;h4&gt;
  
  
  &lt;strong&gt;C. The Manuals (Just-in-Time Skills)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Following the &lt;strong&gt;Anthropic Agent Skills&lt;/strong&gt; (agentskills.io) standard, “Skills” are dormant manuals.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JIT Loading:&lt;/strong&gt; You might have 500 specialized skills (e.g., “AWS Deployment,” “Stripe Integration”). The agent doesn’t “know” them by heart, it “copy-pastes” the relevant manual into its brain only when the task triggers that specific need.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native Contextual Awareness:&lt;/strong&gt; Modern agents now use “Context Caching.” Instead of re-sending your entire 50,000-line codebase with every message, the API “remembers” the base code, only charging you for the new tokens. This makes “Bigger Context” not just a technical feat, but an economic one.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
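&lt;p&gt;JIT loading can be sketched as a keyword-triggered lookup. The skill names, triggers, and manual strings below are made up for illustration:&lt;/p&gt;

```python
SKILLS = {
    # name: (trigger keywords, full manual loaded only on demand)
    "aws-deploy": ({"deploy", "aws", "lambda"}, "AWS deployment manual ..."),
    "stripe": ({"payment", "stripe", "charge"}, "Stripe integration manual ..."),
}

def load_relevant_skills(task: str) -> list:
    """'Copy-paste' only the manuals whose triggers match the task.

    Dormant skills cost zero context tokens until a keyword wakes them up.
    """
    words = set(task.lower().split())
    return [manual for triggers, manual in SKILLS.values()
            if triggers.intersection(words)]
```

&lt;p&gt;With 500 skills on disk, a task still only pays the token cost of the one or two manuals it actually triggers.&lt;/p&gt;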

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8napmc2stm7md091apb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8napmc2stm7md091apb.png" width="528" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. The Cumulative Toolkit: Layered Defense&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The advancement of the field isn’t about the newest tool replacing the old one, it’s about building a &lt;strong&gt;Layered Defense&lt;/strong&gt;. No single method is a silver bullet, so we stack them based on their specific strengths and operational costs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmws9zjbvv9lxnf621tjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmws9zjbvv9lxnf621tjs.png" width="700" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We use Static Rules for safety, RAG for scale, and Skills for precision. We don’t choose one, we layer them so the system has multiple chances to find the right context before it fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5. The Punchline: Standardization as the Catalyst&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The reason these tools finally feel like a “Senior Partner” today isn’t because the models became &lt;strong&gt;smarter&lt;/strong&gt;. It’s because we standardized the &lt;strong&gt;system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By moving toward open protocols like &lt;strong&gt;MCP&lt;/strong&gt; and &lt;strong&gt;Agent Skills&lt;/strong&gt; , we have replaced custom-coded complexity with &lt;strong&gt;composability&lt;/strong&gt;. You can write a skill once and share it across Cursor, Windsurf, or your own CLI.&lt;/p&gt;

&lt;p&gt;Once you strip away the marketing, you realize that ‘Agentic AI’ is mostly just a very sophisticated &lt;code&gt;while&lt;/code&gt; loop wrapped around a copy-paste mechanism. But in engineering, a sufficiently advanced loop is indistinguishable from intelligence.&lt;/p&gt;

&lt;p&gt;But that is the beauty of it. It’s not magic, it’s a highly efficient, automated loop of terminal calls and context-stuffing. When you apply that simple loop at scale, fueled by an engaged community building shared “manuals” and servers, the result is a system that works at a professional level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6auky9185n0ahybv2yzy.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6auky9185n0ahybv2yzy.jpeg" width="700" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next time you hear about a “groundbreaking” new trend in the AI world, there is a high chance it boils down to one of these simple concepts. And if it doesn’t? That’s when things get truly exciting.&lt;/p&gt;

&lt;p&gt;The future of coding isn’t only about building “smarter” brains, it’s about building better connections between the &lt;strong&gt;Brain&lt;/strong&gt; , the &lt;strong&gt;Tools&lt;/strong&gt; , and the &lt;strong&gt;Memory&lt;/strong&gt;. Simple concepts, standardization, and a community that shares its manuals will take us much further than a black box ever could.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>coding</category>
      <category>llm</category>
    </item>
    <item>
      <title>50 Shades of BERT</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Thu, 15 Jan 2026 12:04:48 +0000</pubDate>
      <link>https://dev.to/shmulc/50-shades-of-bert-10nn</link>
      <guid>https://dev.to/shmulc/50-shades-of-bert-10nn</guid>
      <description>&lt;p&gt;&lt;em&gt;The Encoder Architecture that Unified NLP&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvke998amnt5w2q0imn2s.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvke998amnt5w2q0imn2s.jpeg" width="700" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction: The Era of Fragmentation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before 2018, Natural Language Processing was a collection of siloed crafts. Researchers built custom sequential models, like LSTMs or GRUs, for every specific problem. If a model solved Sentiment Analysis, that progress did not easily transfer to Named Entity Recognition. It was an era of “reinventing the wheel” for every dataset.&lt;/p&gt;

&lt;p&gt;The release of Google’s BERT (Bidirectional Encoder Representations from Transformers) marked a turning point. It was built on the “Attention is All You Need” architecture, but it fundamentally changed the NLP world by being truly bidirectional and providing a single model that can solve many different tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i7d0mfgjbkqy9hboclx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i7d0mfgjbkqy9hboclx.png" width="700" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Encoder–Decoder Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To understand BERT, we must look back at the original Transformer. It was designed for &lt;strong&gt;Machine Translation&lt;/strong&gt; , which required two distinct specialized roles working together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Encoder (The Reader):&lt;/strong&gt; Its job is to take an input sentence and look at all the words simultaneously. Instead of just one “thought vector,” it generates a rich, contextual mathematical signature for every single word.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Decoder (The Writer):&lt;/strong&gt; Its job is to take those signatures and generate a translation one word at a time, ensuring each new word fits with the ones it has already written.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BERT is the pure distillation of the Encoder. It doesn’t just summarize, it provides a map of the entire sentence where every word “knows” about every other word.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Great Decoupling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In 2018, the AI community realized these components were powerful enough to stand on their own. This led to the two main lineages of modern AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decoder-Only (GPT-style):&lt;/strong&gt; Optimized for generation. These models are mathematically restricted to looking only at the past to predict the future.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encoder-Only (BERT-style):&lt;/strong&gt; Optimized for understanding. These models stack multiple Encoders to create a reading comprehension engine. They do not “chat,” but they understand context and nuance better than almost any other architecture.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ar9m18zy5v7dev0hmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ar9m18zy5v7dev0hmw.png" width="700" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Engine: How BERT Learns to Read&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Because BERT is an Encoder-only model, it utilizes bidirectional context. In a sentence like “The bank was closed,” a traditional left-to-right model is blind to the future. It doesn’t know if “bank” refers to a financial building or a riverbed until it reaches the very end. BERT, however, sees the entire sentence at once, looking at words before and after every token simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Two Games of Pre-training&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;BERT discovered the structure of language by solving billions of self-supervised puzzles through two primary methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Masked Language Modeling (MLM):&lt;/strong&gt; Researchers hide about 15% of the words in a sentence. BERT must guess the hidden words using the surrounding 85% of the context. This forces the model to understand how words relate to each other semantically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Next Sentence Prediction (NSP):&lt;/strong&gt; BERT is shown two sentences and must decide if the second logically follows the first. This teaches the model to understand the flow of ideas and the relationship between entire sentences.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
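&lt;p&gt;The MLM game is easy to sketch. This toy version only replaces tokens with &lt;code&gt;[MASK]&lt;/code&gt;; BERT’s full recipe also sometimes swaps in a random token or keeps the original, which is omitted here:&lt;/p&gt;

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """The MLM 'game': hide ~15% of tokens; the model must guess them.

    Returns the corrupted sequence plus the answer key
    (position -> original token) used to score the model's guesses.
    """
    rng = random.Random(seed)
    masked, answers = [], {}
    for i, tok in enumerate(tokens):
        if mask_rate > rng.random():
            masked.append("[MASK]")
            answers[i] = tok
        else:
            masked.append(tok)
    return masked, answers
```

&lt;p&gt;Training is then just: feed in &lt;code&gt;masked&lt;/code&gt;, compare the model’s predictions at the masked positions against &lt;code&gt;answers&lt;/code&gt;, and repeat billions of times.&lt;/p&gt;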

&lt;h3&gt;
  
  
  &lt;strong&gt;The Secret Sauce: Self-Attention&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine every word in a sentence is a person in a room. Self-attention is the process where every person looks at everyone else to decide who is most relevant to them. In the phrase “The animal didn’t cross the street because it was too tired,” the word “it” uses attention to look at every other word. It realizes that “it” has a much stronger mathematical relationship with “animal” than with “street.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqi9rqrqrzwetqwibjyn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqi9rqrqrzwetqwibjyn.png" width="700" height="876"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows BERT to create context-aware embeddings. Instead of having one static number for the word “bank,” BERT generates a unique mathematical signature for “bank” when it is near “money” and a completely different one when it is near “river.”&lt;/p&gt;
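&lt;p&gt;Self-attention itself is compact enough to write out. Here is a pure-Python sketch of scaled dot-product attention over toy word vectors (real BERT adds learned query/key/value projections and multiple heads):&lt;/p&gt;

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each word's output is a weighted mix
    of every word's value vector, weighted by relevance.

    This is the mechanism that gives 'bank' a different signature
    near 'money' than near 'river'.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax (shifted by the max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

&lt;p&gt;Because every query attends over every key, each output vector already “knows” about the whole sentence, which is the bidirectionality described above.&lt;/p&gt;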

&lt;h3&gt;
  
  
  &lt;strong&gt;Input and Output: The Transformation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While we think in words, BERT thinks in vectors (lists of numbers).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Input:&lt;/strong&gt; You feed BERT a sequence of tokens (words or pieces of words). Along with the words, we provide &lt;strong&gt;Positional Encodings,&lt;/strong&gt; essentially coordinates for each word so the model knows where they sit in the sentence compared to other words.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Output:&lt;/strong&gt; BERT outputs a high-dimensional vector for every single token you gave it. These aren’t just definitions, they are rich summaries of what that word means &lt;strong&gt;in that specific sentence&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fns1ekbxg43jtdlt11fbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fns1ekbxg43jtdlt11fbg.png" width="700" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here, you can take those outputs and feed them into a tiny final layer (a “head”) to perform your specific task, whether that is classifying an email or finding a person’s name.&lt;/p&gt;
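&lt;p&gt;That final “head” can be as small as one linear layer plus a softmax. A toy sketch with illustrative shapes (real heads are trained jointly with, or on top of, the frozen encoder):&lt;/p&gt;

```python
import math

def classify(cls_vector, weights, biases):
    """A tiny task 'head': linear layer + softmax over BERT's [CLS] output.

    The encoder does the heavy lifting (reading); the head just maps its
    summary vector to task labels, e.g. spam vs. not-spam.
    """
    # One logit per class: dot(weight_row, cls_vector) + bias.
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, biases)]
    # Softmax turns logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

&lt;p&gt;Swapping the head (and fine-tuning) is all it takes to repurpose the same encoder for classification, tagging, or span extraction.&lt;/p&gt;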

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is a high-level intuition, not a full mathematical breakdown. If you want deeper intuition (clearly explained with great visuals), I highly recommend this amazing video by &lt;strong&gt;StatQuest&lt;/strong&gt;:&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Encoder Renaissance&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The 5 Pillars: Deep Technical Variants&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In 2026, we don’t just use “BERT.” We use specialized “shades” optimized for different engineering constraints like speed, memory, and context length.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Original BERT (2018):&lt;/strong&gt; The Google pioneer. It established the bidirectional standard and the 512-token limit. While considered “legacy” by some, it remains the most documented and widely supported baseline for academic reproducibility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RoBERTa (2019):&lt;/strong&gt; Facebook’s “Robustly Optimized” upgrade. By removing the Next Sentence Prediction (NSP) task and training on &lt;strong&gt;10x more data&lt;/strong&gt; (160GB vs 16GB), it proved that BERT hadn’t been trained long enough. It remains the gold standard for pure accuracy on sentence-level tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DistilBERT (2019):&lt;/strong&gt; Hugging Face’s production workhorse. Using &lt;strong&gt;knowledge distillation&lt;/strong&gt;, it retains 97% of BERT’s performance while being &lt;strong&gt;40% smaller and 60% faster&lt;/strong&gt;. It is the go-to for low-latency sentiment or classification pipelines running on standard CPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TinyBERT (2020):&lt;/strong&gt; An ultra-compact variant from Huawei. Unlike other models, it uses &lt;strong&gt;layer-by-layer distillation&lt;/strong&gt; (mimicking the teacher’s attention and hidden states) to compress BERT down to just ~14.5M parameters. It is specifically designed for extreme constraints like mobile apps and IoT devices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ModernBERT (2024):&lt;/strong&gt; A breakthrough by &lt;strong&gt;Answer.AI&lt;/strong&gt; and &lt;strong&gt;LightOn&lt;/strong&gt; that drags the architecture into the modern era. It shatters the context limit with a native &lt;strong&gt;8,192-token window&lt;/strong&gt; using &lt;strong&gt;RoPE&lt;/strong&gt;. By integrating &lt;strong&gt;Flash Attention 2&lt;/strong&gt; and being heavily pre-trained on code, it is a hardware-optimized powerhouse that is faster and more accurate than its predecessors for almost every 2026 use case.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The 10 Faces of Inference: The Multiplier&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When you cross those 5 variants with these tasks, you get the “50 Shades.” However, BERT’s utility is best understood through its functional strengths:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Token-Level Precision&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NER (Named Entity Recognition):&lt;/strong&gt; Identifying medical codes or legal clauses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part-of-Speech Tagging:&lt;/strong&gt; Labeling grammar for deep linguistic analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coreference Resolution:&lt;/strong&gt; The “pronoun solver” (e.g., figuring out what “it” refers to).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Semantic Logic&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sentiment Analysis:&lt;/strong&gt; Quantifying emotional tone (e.g., brand reputation).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Aspect-Based Sentiment:&lt;/strong&gt; Analyzing specific features (e.g., Food: +, Service: -).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Natural Language Inference (NLI):&lt;/strong&gt; A logic-gate to check if statements are contradictory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero-Shot Classification:&lt;/strong&gt; Categorizing text into labels the model was never specifically trained for.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
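&lt;p&gt;Zero-shot classification is typically built on NLI under the hood: each candidate label becomes a hypothesis like “This text is about sports,” and the label whose hypothesis is most strongly entailed wins. The sketch below shows only that ranking logic, with a toy keyword scorer standing in for a real NLI model so it runs without downloading anything (a production system would back &lt;code&gt;entailment_score&lt;/code&gt; with an NLI-fine-tuned encoder):&lt;/p&gt;

```python
def zero_shot_classify(text, labels, entailment_score):
    """Rank candidate labels by how strongly `text` entails the
    hypothesis 'This text is about <label>.'

    `entailment_score(premise, hypothesis)` is any callable returning a
    float; a real system would use an NLI-fine-tuned encoder here.
    """
    hypotheses = {label: f"This text is about {label}." for label in labels}
    scores = {
        label: entailment_score(text, hyp) for label, hyp in hypotheses.items()
    }
    return max(scores, key=scores.get), scores

# Toy stand-in scorer: naive keyword overlap instead of a trained model.
def toy_scorer(premise, hypothesis):
    topic = hypothesis.removeprefix("This text is about ").rstrip(".")
    return 1.0 if topic in premise.lower() else 0.0

best, scores = zero_shot_classify(
    "The striker scored twice in the sports final",
    ["sports", "politics"],
    toy_scorer,
)
print(best)  # sports
```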

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Search &amp;amp; Retrieval&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extractive Question Answering:&lt;/strong&gt; Reading a 50-page PDF and highlighting the exact answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic Similarity:&lt;/strong&gt; Scoring how closely sentences align to deduplicate datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Paraphrase Detection:&lt;/strong&gt; Recognizing if two different search prompts seek the same intent.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why the Spotlight has Returned to Encoders&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;BERT was released in 2018, millennia ago in AI years, yet it remains the “Invisible Giant” of the ecosystem. Even today, the &lt;code&gt;bert-base-uncased&lt;/code&gt; checkpoint sees &lt;strong&gt;38M+ monthly downloads&lt;/strong&gt; (the 4th most downloaded model), maintaining its status as one of the most integrated architectures in history.&lt;/p&gt;

&lt;p&gt;In fact, the &lt;em&gt;&lt;a href="https://huggingface.co/models?sort=downloads" rel="noopener noreferrer"&gt;Hugging Face hub&lt;/a&gt;&lt;/em&gt; is dominated by Encoders. Models like &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; see over &lt;strong&gt;140M monthly downloads&lt;/strong&gt;, while others like &lt;code&gt;electra-base-discriminator&lt;/code&gt; pull in &lt;strong&gt;52M+&lt;/strong&gt;. This enduring popularity is due to an architecture that provides the surgical precision needed for high-stakes, real-world tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval (RAG):&lt;/strong&gt; Using sentence transformers to find exact context within massive datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification:&lt;/strong&gt; Powering instant content moderation and sentiment analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Entity Extraction:&lt;/strong&gt; Identifying specific names or codes for privacy and regulatory compliance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While the world focuses on chatty generative models, the numbers show that Encoders continue to do the heavy lifting where accuracy, cost, and latency matter most.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Blind Spots: When NOT to Use an Encoder&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While Encoders are surgical, they are not a universal solution for every understanding task. Even in “read-only” missions, there are structural boundaries where the BERT architecture reaches its limit. To be a practical architect, you must recognize when the task shifts from pattern recognition to complex reasoning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Fine-Tuning Tax:&lt;/strong&gt; Unlike large-scale Decoders that excel at Zero-Shot or Few-Shot tasks, BERT is not “plug-and-play.” To achieve its legendary precision, you generally need a substantial labeled dataset to fine-tune the model on your specific domain. If you lack the data to “teach” the model your nuances, a multi-billion parameter Decoder will likely outperform a raw Encoder through sheer scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Reasoning Ceiling:&lt;/strong&gt; BERT is a master of &lt;strong&gt;pattern matching&lt;/strong&gt;, but it is not a deep &lt;strong&gt;reasoner&lt;/strong&gt;. If your mission requires multi-step causal logic — such as tracing a complex security vulnerability across multiple code files or following an agentic workflow — the “shallow” understanding of a 300M parameter model cannot compete with the emergent logic found in massive Decoders.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Contextual Rigidity:&lt;/strong&gt; While ModernBERT has expanded the context window, Encoders still process information in a relatively “flat” manner. For tasks that require a “holistic” understanding of a massive project or the ability to weigh conflicting abstract concepts, the dense, multi-layered representations of the largest models still hold a significant edge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;My Personal Story: BERT Usage At Apiiro&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When I recently joined the AI team at Apiiro, I was surprised to find fine-tuned BERT models powering some of our most critical core projects. Initially, I thought they were historical relics. I quickly learned that &lt;strong&gt;for high-scale, mission-critical workloads, BERT isn’t just a fallback — it’s the winner.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt; When processing millions of queries, a CPU-based BERT beats a token-streaming LLM every time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Running specialized encoders on standard hardware is a fraction of the cost of generative APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Precision:&lt;/strong&gt; For “Discriminative” tasks, like identifying a specific vulnerability in code, BERT’s bidirectional context provides surgical accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fine-Tuning over Prompting:&lt;/strong&gt; Unlike API-based LLMs that rely on prompt engineering, BERT allows us to fine-tune the entire model on our specific domain data. This “muscle memory” makes the model a specialized expert that does one thing perfectly without being distracted by general-purpose “helpfulness.”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that initial experience, I got to work on another project involving GPU inference of BERT. That led me down a rabbit hole of evaluation, distillation, optimization, benchmarks, and platform comparisons.&lt;br&gt;&lt;br&gt;
But I will keep all of that (and much more) for another post.&lt;/p&gt;

&lt;p&gt;Overall, it was a humbling experience. I learned that sometimes the “senior” move isn’t using the newest model everyone talks about, but choosing the proven, efficient architecture that delivers the best results for your data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frei27iwv0fa6kny33rwg.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frei27iwv0fa6kny33rwg.jpeg" width="700" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation: Three Shades of BERT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Implementation has become trivial thanks to the &lt;code&gt;transformers&lt;/code&gt; library by Hugging Face. By 2026, the ecosystem has moved toward hardware-aware defaults, meaning these few lines of code often trigger specialized kernels like Flash Attention 2 automatically if they detect a compatible GPU.&lt;/p&gt;

&lt;p&gt;The beauty of these “shades” is that the API remains nearly identical. You simply swap the model checkpoint to change your entire performance profile.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import pipeline&lt;br&gt;
import torch
&lt;h1&gt;
  
  
  1. DistilBERT: The Production Workhorse
&lt;/h1&gt;
&lt;h1&gt;
  
  
  Task: Sentiment Analysis (High-throughput classification)
&lt;/h1&gt;

&lt;p&gt;classifier = pipeline(&lt;br&gt;
    “sentiment-analysis”, &lt;br&gt;
    model=”distilbert-base-uncased-finetuned-sst-2-english”&lt;br&gt;
)&lt;/p&gt;
&lt;h1&gt;
  
  
  2. RoBERTa: The Precision Specialist
&lt;/h1&gt;
&lt;h1&gt;
  
  
  Task: NER (Token-level sequence labeling)
&lt;/h1&gt;

&lt;p&gt;ner_tagger = pipeline(&lt;br&gt;
    “ner”, &lt;br&gt;
    model=”xlm-roberta-large-finetuned-conll03-en”&lt;br&gt;
)&lt;/p&gt;
&lt;h1&gt;
  
  
  3. ModernBERT: The 2026 Long-Context Standard
&lt;/h1&gt;
&lt;h1&gt;
  
  
  Task: Document-level Classification (Long-form analysis)
&lt;/h1&gt;

&lt;p&gt;doc_model = pipeline(&lt;br&gt;
    “text-classification”, &lt;br&gt;
    model=”answerdotai/ModernBERT-base”,&lt;br&gt;
    model_kwargs={”attn_implementation”: “flash_attention_2”} &lt;br&gt;
)&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion: The Silent Workhorse&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While Generative AI captures the headlines and the public imagination, the BERT family remains the invisible foundation of enterprise software. It is the silent workhorse behind global search engines, automated content moderation, and the high-speed data pipelines that keep modern applications running.&lt;/p&gt;

&lt;p&gt;Understanding these “shades” is what separates a prompt engineer from a practical NLP architect. It is about knowing that you do not always need a trillion parameters to solve a problem. Sometimes, you just need a specialized expert that understands the context of a single sentence with surgical precision.&lt;/p&gt;

&lt;p&gt;As we move further into 2026, the trend is clear: the most senior engineering moves are not about using the biggest, shiniest model, but about using the most efficient one for the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about you?&lt;/strong&gt; Have you found yourself reaching back for “old-school” encoders to solve cost or latency issues in your recent projects, or are you still trying to make generative models fit every classification task? Let’s discuss in the comments below!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>How AI Can Actually Make You More Authentic</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Sun, 30 Nov 2025 21:01:59 +0000</pubDate>
      <link>https://dev.to/shmulc/how-ai-can-actually-make-you-more-authentic-273b</link>
      <guid>https://dev.to/shmulc/how-ai-can-actually-make-you-more-authentic-273b</guid>
      <description>&lt;p&gt;&lt;em&gt;Use AI in personal branding the right way&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyo5obfyvq5xq3jkpzyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyo5obfyvq5xq3jkpzyc.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI Era presents a tough choice for creators:&lt;/strong&gt; authenticity or productivity? Ever since I started writing this blog, I’ve been wrestling with that very battle. On one side is the huge temptation and potential of cutting-edge AI tools; on the other, the need to maintain genuine, personal content in a web saturated with “AI slop.”&lt;/p&gt;

&lt;p&gt;My readers deserve my genuine voice and my personal experience. Where is the value in creating content when new tools promise to generate it without any human touch?&lt;/p&gt;

&lt;p&gt;I’m not the only one facing this. The following guest post is from James. He doesn’t just talk about the struggle; he &lt;strong&gt;designs authentic, AI-powered content systems that turn founders into Unpromptable thought leaders.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;His publication name says it all: he’s managed to achieve both productivity and authenticity. Read carefully, and see what lessons you can take away.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;AI won’t make you fake unless you let it&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You’ve probably heard the warnings. Use AI to build your brand and you’ll sound like everyone else. Automate your content and you’ll lose your voice. Let algorithms handle your messaging and you’ll become a hollow shell of ChatGPT-speak.&lt;/p&gt;

&lt;p&gt;There’s truth in those fears, of course.&lt;/p&gt;

&lt;p&gt;But only if you’re not paying attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The authenticity trap&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We need to separate what’s true from what’s not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;True:&lt;/strong&gt; Using AI makes you more likely to fall into this trap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False:&lt;/strong&gt; You can NEVER be authentic when you’re using AI. AI and authenticity are opposites; you can have one or the other, but never both.&lt;/p&gt;

&lt;p&gt;This shows up everywhere.&lt;/p&gt;

&lt;p&gt;Technical founders building in public worry that using AI to draft their updates makes them frauds. Creators using ChatGPT to speed up their newsletters feel guilty, like they’re cheating. Businesses who automate parts of their content pipeline wonder if they’re sacrificing the very thing that makes their brand theirs.&lt;/p&gt;

&lt;p&gt;So they choose. Either grind it out manually to stay “real,” or use AI and accept that their brand will feel manufactured.&lt;/p&gt;

&lt;p&gt;But that’s just not true.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;So, when does AI make you less authentic?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It’s when you implement AI mindlessly.&lt;/p&gt;

&lt;p&gt;Unmindful AI integration does exactly what the critics warn about.&lt;/p&gt;

&lt;p&gt;When you automate your thinking, you lose your edge. When you let AI decide what you should say, your voice disappears. When you use it to generate content without filtering through your values, your audience feels it immediately.&lt;/p&gt;

&lt;p&gt;They sense the hollow core.&lt;/p&gt;

&lt;p&gt;The posts that sound smart but say nothing. The articles that read like everyone else’s because they were generated the same way everyone else generates them. The fake images.&lt;/p&gt;

&lt;p&gt;This actively damages your brand.&lt;/p&gt;

&lt;p&gt;People don’t trust voices that feel manufactured. They scroll past content that could have come from anyone. They unfollow and block accounts that sound increasingly like bots.&lt;/p&gt;

&lt;p&gt;The fear isn’t unfounded. Bad AI use makes you promptable, easy to replicate, impossible to remember.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Does using AI for content creation make you more or less authentic?&lt;br&gt;&lt;br&gt;
Please share your personal experience in the comments.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;But it doesn’t have to be that way&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Those risks are real. But they’re not inevitable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can avoid them through mindful AI integration.&lt;/strong&gt; Not some complex framework, just conscious decision-making about where, why, and how you apply AI to your brand.&lt;/p&gt;

&lt;p&gt;The goal isn’t to use AI for everything. It’s to use AI for the right things, so you can be more human where it matters.&lt;/p&gt;

&lt;p&gt;How does this actually happen? Let’s go through the three most significant ways AI can make you more authentic, not less:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. AI clears mental space for the work that matters&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;When AI handles the grunt work, you gain clarity.&lt;/p&gt;

&lt;p&gt;Most creators can’t think strategically because they’re buried in execution. You’re too busy drafting, editing, formatting, and scheduling to ask bigger questions: What does my audience actually need? What change am I trying to create? What makes my perspective unique?&lt;/p&gt;

&lt;p&gt;Using AI to handle repetitive tasks like first drafts, research compilation or formatting, frees your mind to think at a higher level. You can step back and see the forest instead of counting trees.&lt;/p&gt;

&lt;p&gt;This makes you more authentic because you can align your work with your actual values instead of just surviving your task list. When you’re not exhausted from manual labor, you can ensure every piece of content serves your mission.&lt;/p&gt;

&lt;p&gt;You can also use AI as a thinking partner. Feed it your half-formed ideas and let it ask the questions you haven’t considered. Challenge your assumptions. Spot gaps in your logic. Not to replace your thinking, but to sharpen it.&lt;/p&gt;

&lt;p&gt;Need help thinking through this? AI can act as your thinking partner. Copy this prompt to your AI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Act as my thinking partner. I want to clarify my mission and positioning. Ask me 5 questions that will help me identify what truly matters in my work and what change I want to create. Wait for my answer after each question before moving to the next. Once you have enough information, summarize what you learned and give me suggestions.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; clearer positioning, stronger messaging, work that sounds more like you because you’ve had time to figure out what “you” actually means.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. AI forces you to define what’s truly yours&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;AI makes you more authentic by forcing you to decide what you want to keep doing yourself.&lt;/p&gt;

&lt;p&gt;When you start delegating tasks to AI, you have to get specific. What parts of content creation matter to you? What do you actually enjoy? What feels essential versus what feels like busywork?&lt;/p&gt;

&lt;p&gt;For me, that looked like this: I realized I love coming up with ideas based on personal experience, identifying the emotional or practical value, and structuring the argument. What I hate is the mechanical task of turning bullets into paragraphs, the tedious work of first-draft generation.&lt;/p&gt;

&lt;p&gt;So I outsource that.&lt;/p&gt;

&lt;p&gt;AI handles the painful parts. I handle the parts that I know will make the difference. I double down on my strengths while shoring up my weaknesses.&lt;/p&gt;

&lt;p&gt;And here’s the thing: your audience feels this too. When you double down on what you’re naturally good at and use tools to cover your weaknesses, the quality improves. Your content gets sharper. Your ideas land harder. You show up more consistently because you’re not burning out on tasks you hate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s a quick prompt you can use:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I need you to act as my world-class branding coach. Help me map my content creation process. For each step: ideation, research, outlining, drafting, editing, formatting, distribution, ask me: Do I enjoy this? Am I good at this? Does this feel essential to my voice?&lt;/p&gt;

&lt;p&gt;Based on my answers, show me what I should keep doing myself and what I could delegate to AI.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Authenticity isn’t about doing everything yourself. It’s about ensuring the unique value is yours, then using whatever tools help you deliver it.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. It creates space for what AI can never replace&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The equation is simple: less time on menial tasks means more time talking to people.&lt;/p&gt;

&lt;p&gt;Real conversations with your audience. Understanding their problems. Solving those problems. Building actual relationships, not just collecting followers.&lt;/p&gt;

&lt;p&gt;In the age of AI, this matters more than ever. Everyone can generate content now. Not everyone can be genuinely present with their community.&lt;/p&gt;

&lt;p&gt;So, use AI to handle the scalable stuff so you have energy left for the irreplaceable stuff. Responding to comments thoughtfully. Having real conversations in DMs. Hosting calls with your community. Noticing patterns in what they’re struggling with and adjusting your work accordingly.&lt;/p&gt;

&lt;p&gt;This makes you more authentic because nothing replicates your personal presence.&lt;/p&gt;

&lt;p&gt;There’s no substitute for your specific insights shaped by your specific experiences. Your willingness to show up and give a damn about the people following your work.&lt;/p&gt;

&lt;p&gt;Strong relationships can’t be automated. But if you don’t use AI or some form of leverage, you’ll never build them at scale. You’ll be too busy fighting with sentence structure to notice what your audience actually needs.&lt;/p&gt;

&lt;p&gt;To push you in the right direction, here’s a prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Analyze this piece of content or post:&lt;/p&gt;

&lt;p&gt;[insert link]&lt;/p&gt;

&lt;p&gt;Identify the top 3 recurring questions or problems my audience mentions. Then suggest 3 specific ways I could spend 30 minutes this week having real conversations with my community about these issues—without creating more content.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The authenticity equation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Mindful AI integration makes you more authentic, not less.&lt;/p&gt;

&lt;p&gt;Not because it makes you work harder, but because it lets you work on what matters. It handles the mechanical so you can focus on the meaningful. It takes care of the repeatable so you can invest in the irreplaceable.&lt;/p&gt;

&lt;p&gt;The creators who are building trust at scale aren’t just telling ChatGPT: “Write an article about X.”&lt;/p&gt;

&lt;p&gt;They’re using AI intentionally. They’re documenting their creative process and inserting AI where it matters. They’re creating space for the work only they can do.&lt;/p&gt;

&lt;p&gt;That’s how AI helps you be more authentic, not less.&lt;/p&gt;

&lt;p&gt;PS. Are you a founder or creator who wants to learn more about AI-powered authenticity in personal branding? Subscribe to my newsletter.&lt;/p&gt;

&lt;p&gt;Enjoyed the post? The most sincere compliment is sharing our work; it means a lot.&lt;/p&gt;




&lt;p&gt;Thanks so much to James for this incredible collaboration. Our interaction was truly pleasant, and I’m very much looking forward to sharing the post soon.&lt;/p&gt;

&lt;p&gt;If this inspires you, and you’re interested in a future collaboration, please reach out, let’s make it happen!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>psychology</category>
      <category>writing</category>
    </item>
    <item>
      <title>My Digital Arsenal #2: Keeping Your Codebase Clean with Pre-Commit Hooks</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Fri, 21 Nov 2025 13:38:32 +0000</pubDate>
      <link>https://dev.to/shmulc/my-digital-arsenal-2-keeping-your-codebase-clean-with-pre-commit-hooks-1goa</link>
      <guid>https://dev.to/shmulc/my-digital-arsenal-2-keeping-your-codebase-clean-with-pre-commit-hooks-1goa</guid>
      <description>&lt;p&gt;&lt;em&gt;Automate Code Quality with Pre-Commit Hooks&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhiuvs6brq17aq3kjgb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhiuvs6brq17aq3kjgb1.png" width="700" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Welcome back to the second installment of &lt;strong&gt;“My Digital Arsenal,”&lt;/strong&gt; the series where I share the essential tools that power my development workflow.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://open.substack.com/pub/shmulc/p/my-digital-arsenal-1?r=4bvghf&amp;amp;utm_campaign=post&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;first post&lt;/a&gt; we dove deep into the world of Python package managers, the unsung heroes that keep our project dependencies from collapsing into chaos.&lt;/p&gt;

&lt;p&gt;Today, we are moving from &lt;strong&gt;managing dependencies&lt;/strong&gt; to &lt;strong&gt;managing quality.&lt;/strong&gt; We are setting up our automated guardian for clean code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this post:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Why “consistency” matters more than you think.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The “Manual Trap” of standard linters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to set up &lt;code&gt;pre-commit&lt;/code&gt; with &lt;code&gt;uv&lt;/code&gt; (The 60-second setup).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bonus:&lt;/strong&gt; A look at &lt;code&gt;prek&lt;/code&gt;, the blazing-fast Rust alternative.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Code Quality Starts with Consistency&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we talk about tools, let’s talk about the code itself.&lt;/p&gt;

&lt;p&gt;We often think of &lt;strong&gt;“Code Quality”&lt;/strong&gt; as high-level architecture or efficient algorithms. But there is a lower, grittier level of quality that impacts us every single hour: &lt;strong&gt;Consistency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine reading a book where every paragraph uses a different font size, some sentences end with two periods, and random words are capitalized. Could you read it? Sure. Would it be exhausting? Absolutely.&lt;/p&gt;

&lt;p&gt;Code is no different. When you work on a team, or even alone on a project over several months, entropy naturally sets in.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One file uses single quotes, another uses double.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One function has trailing whitespace, another doesn’t.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Imports are scattered randomly at the top of the file.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Code Review Nightmare:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os, sys # messy imports
def  calculate( x):
    print( "debug") # remove print
    return x*2;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These might seem like “non-important” issues, but they create &lt;strong&gt;cognitive friction&lt;/strong&gt;. Every time your brain has to process inconsistent formatting, it has less energy for solving the actual business logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: The Mechanical Fixers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To remove the human element from style policing, the development community created two types of tools to automate the job:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linters (e.g., Flake8, Ruff, Pylint):&lt;/strong&gt; These are the inspectors. They analyze your code for structural rot, catching errors like undefined variables or unused imports.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Formatters (e.g., Black, YAPF, isort):&lt;/strong&gt; These are the beautifiers. They don’t care what your code &lt;em&gt;does&lt;/em&gt; ; they care how it &lt;em&gt;looks&lt;/em&gt;. They rewrite your code to strictly follow a style guide.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
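&lt;p&gt;For contrast, here is roughly what tools like these would leave of the earlier “nightmare” snippet (the exact output depends on the tool and its configuration):&lt;/p&gt;

```python
# After a formatter + linter pass: the unused imports, stray spaces,
# debug print, and trailing semicolon are all gone.
def calculate(x):
    return x * 2
```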

&lt;p&gt;&lt;strong&gt;The “Manual Execution” Trap&lt;/strong&gt; In the past, using these tools was a manual ritual. You had to remember to run a command like &lt;code&gt;black .&lt;/code&gt; or &lt;code&gt;flake8&lt;/code&gt; before every single commit.&lt;/p&gt;

&lt;p&gt;It sounds simple, but humans are terrible at repetitive tasks. If you were in a rush (and we are always in a rush), you would forget. You would push the code, wait for the CI pipeline, and then watch it fail 10 minutes later because of a trailing comma.&lt;/p&gt;

&lt;p&gt;This led to the infamous “Walk of Shame” in your git history: a stream of tiny commits labeled &lt;em&gt;“fix linting,”&lt;/em&gt; &lt;em&gt;“formatting,”&lt;/em&gt; and &lt;em&gt;“really fix formatting this time.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We had the tools, but we lacked the automation trigger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is exactly what drove me to adopt &lt;code&gt;pre-commit&lt;/code&gt;.&lt;/strong&gt; On a previous team, we had a CI stage that checked for linting errors, followed by a very long testing stage. If I forgot to run the formatter locally, the CI would fail early, but the context switch killed my momentum. I would have to fix a trivial whitespace error, push again, and wait for the pipeline to restart. We were losing hours of productivity to simple formatting mistakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Enter The Automated Guardian&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is where &lt;strong&gt;pre-commit&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;If a Continuous Integration (CI) pipeline is the security checkpoint at the airport, &lt;code&gt;pre-commit&lt;/code&gt; is the helper at your front door checking if you have your keys and wallet before you leave the house.&lt;/p&gt;

&lt;p&gt;Under the hood, Git has a feature called “hooks” — scripts that run automatically at specific points in the Git lifecycle. Historically, managing these hooks was a pain. You had to copy-paste unwieldy Bash scripts between projects.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pre-commit&lt;/code&gt; framework solves this. Instead of messy scripts, you define your rules in one simple &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; file. When you try to commit, the framework downloads the tools, runs them against your changes, and stops you if something is wrong.&lt;/p&gt;
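&lt;p&gt;To make that concrete, a minimal config can be just a few lines. The repo URL and hook ids below are the standard &lt;code&gt;pre-commit-hooks&lt;/code&gt; ones; the &lt;code&gt;rev&lt;/code&gt; is an example pin, so use whatever release is current:&lt;/p&gt;

```yaml
# .pre-commit-config.yaml — a minimal starting point
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0  # example pin; prefer the latest tagged release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```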

&lt;p&gt;Here is why it is an essential part of my arsenal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It Focuses Your Code Reviews:&lt;/strong&gt; The documentation says it best: it “allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It Fixes the “Small Stuff” Automatically:&lt;/strong&gt; It doesn’t just catch issues, it often fixes them. Trailing whitespace, end-of-file, and formatting issues are corrected before they ever hit your repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It’s Multi-Language:&lt;/strong&gt; While this series focuses on the &lt;strong&gt;Python ecosystem&lt;/strong&gt; (and the incredible modern tooling like Ruff), pre-commit is language-agnostic. It can manage hooks for JavaScript, Terraform, JSON, and more, all without you needing to manage &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;gem&lt;/code&gt; files manually.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivshvqfoigbtwuuuqvd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivshvqfoigbtwuuuqvd6.png" width="700" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;My Go-To Pre-Commit Toolkit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s walk through the minimal, high-impact setup I use to fix the most common annoyances.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. The 60-Second Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since we are already using &lt;strong&gt;uv&lt;/strong&gt; from the last article, let’s use it here. Instead of polluting your global Python install, we will install &lt;code&gt;pre-commit&lt;/code&gt; as an isolated tool.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# The modern way: Install as an isolated tool (can be done of course with pip too)
uv tool install pre-commit

# The “Magic” command that activates the hooks in this repo
pre-commit install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That second command is the most important one. It installs a tiny script into your &lt;code&gt;.git/hooks/&lt;/code&gt; directory. Now, every time you type &lt;code&gt;git commit&lt;/code&gt;, that script will trigger the framework before Git even saves your changes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ The “First Run” Warning:&lt;/strong&gt; The very first time you commit after setting this up, it will be &lt;strong&gt;slow&lt;/strong&gt;. You will see a message like &lt;code&gt;[INFO] Initializing environment for...&lt;/code&gt;. Don’t panic: it is creating isolated environments for your hooks. This happens only once. Future commits will be instantaneous.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. The Recipe: My &lt;/strong&gt;&lt;code&gt;.pre-commit-config.yaml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Create a file named &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; in your project’s root. Here is the configuration I use. It covers syntax checking, formatting, and basic file hygiene.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-ast # Is it valid Python?
      - id: check-case-conflict # Avoid case-sensitivity issues on Windows/Mac
      - id: check-merge-conflict # Block commits with '&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;' markers
      - id: check-toml # Validates pyproject.toml
      - id: check-yaml
      - id: check-json
      - id: end-of-file-fixer # Ensures files end with a newline
      - id: trailing-whitespace # Trims accidental whitespace at end of lines

  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.14.5
    hooks:
      # Run the linter (replaces Flake8)
      - id: ruff
        types_or: [python, pyi]
        args: [--fix]
      # Run the formatter (replaces Black)
      - id: ruff-format
        types_or: [python, pyi]

default_stages: [pre-commit]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in action:&lt;/strong&gt; When you commit, you’ll see this satisfying output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnojodckcoaqddmi15oz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnojodckcoaqddmi15oz.png" width="640" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. The “Escape Hatch” (For Emergencies)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sometimes, you just need to commit. Maybe you are working on a messy prototype, or you are saving work before your computer dies. If the hooks are blocking you and you need to bypass them, use the &lt;code&gt;--no-verify&lt;/code&gt; flag:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m “wip: messy code saving for later” --no-verify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Use this sparingly. If you bypass the guard too often, you defeat the purpose of having one.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Level Up: The &lt;/strong&gt;&lt;code&gt;pre-push&lt;/code&gt;&lt;strong&gt; Hook&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You will notice I added &lt;code&gt;default_stages: [pre-commit]&lt;/code&gt; at the bottom. This means these checks run on every &lt;em&gt;commit&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But what about heavier tasks? Running your entire unit test suite (&lt;code&gt;pytest&lt;/code&gt;) on every commit is too slow; it breaks your flow. But you definitely want those tests to run before you push your code to the team.&lt;/p&gt;

&lt;p&gt;Git has a specific hook for this called &lt;code&gt;pre-push&lt;/code&gt;. You can add a separate section to your config to run heavy tests only when you push:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- repo: local
    hooks:
      - id: pytest
        name: Run Unit Tests
        entry: uv run pytest
        language: system
        stages: [pre-push]      # &amp;lt;--- Only runs on ‘git push’
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To activate this, you need to run one extra install command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pre-commit install --hook-type pre-push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now you have a tiered defense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commit:&lt;/strong&gt; Fast linting &amp;amp; formatting (Instant).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Push:&lt;/strong&gt; Heavy testing &amp;amp; security scans (Slower, but safe).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 5: Expanding Your Toolkit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We have focused heavily on formatting, but the &lt;code&gt;pre-commit&lt;/code&gt; ecosystem is massive. You can find hooks for almost anything—from enforcing static typing to preventing security leaks.&lt;/p&gt;

&lt;p&gt;I highly recommend exploring hooks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;mypy&lt;/code&gt;: To catch type errors before execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;detect-secrets&lt;/code&gt;: To prevent accidental commits of API keys or passwords.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;commitizen&lt;/code&gt;: To enforce standardized commit messages across your team.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
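&lt;p&gt;As a sketch of how these slot into the same config, you would simply append more entries under &lt;code&gt;repos:&lt;/code&gt;. The &lt;code&gt;rev&lt;/code&gt; values below are example pins, not recommendations; check each project for its current release:&lt;/p&gt;

```yaml
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.13.0  # example pin
    hooks:
      - id: mypy
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0   # example pin
    hooks:
      - id: detect-secrets
```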

&lt;p&gt;&lt;strong&gt;Where to find more?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The best place to start is the &lt;a href="https://pre-commit.com/hooks.html" rel="noopener noreferrer"&gt;Official Supported Hooks Index&lt;/a&gt;. It is a searchable database of thousands of hooks for every language and task imaginable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For a curated deep dive, I strongly recommend checking out Gatlen Culp’s article: &lt;strong&gt;&lt;a href="https://gatlenculp.medium.com/effortless-code-quality-the-ultimate-pre-commit-hooks-guide-for-2025-57ca501d9835" rel="noopener noreferrer"&gt;Effortless Code Quality: The Ultimate Pre-Commit Hooks Guide for 2025&lt;/a&gt;&lt;/strong&gt;. It was a major inspiration for this post and helped me refine my own setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; If you already rely on a specific CLI tool (like &lt;code&gt;bandit&lt;/code&gt;, &lt;code&gt;hadolint&lt;/code&gt;, or &lt;code&gt;sqlfluff&lt;/code&gt;), just Google &lt;code&gt;“tool-name pre-commit”&lt;/code&gt;. If the tool is popular, there is a very high chance a hook for it already exists.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 6: The Rust Revolution (&lt;/strong&gt;&lt;code&gt;prek&lt;/code&gt;&lt;strong&gt;)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you read my last post about &lt;code&gt;uv&lt;/code&gt;, you know I am bullish on the “Rust-ification” of the Python ecosystem. We are seeing a wave of tools that are faster, smarter, and easier to use than their predecessors.&lt;/p&gt;

&lt;p&gt;While pre-commit is the industry standard, it is starting to show its age. It requires a Python runtime and can be slow on large repos.&lt;/p&gt;

&lt;p&gt;Then there is the &lt;strong&gt;social aspect&lt;/strong&gt;. I had heard rumors that the maintainer’s interaction style could be ‘abrasive’, but I didn’t get it until I looked at the issue tracker myself. After reading through a few threads and rejected feature requests, I understood why some developers are looking for a friendlier alternative.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enter &lt;/strong&gt;&lt;code&gt;prek&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;prek&lt;/code&gt; is a reimagined version of pre-commit, built entirely in Rust. It is designed to be a drop-in replacement that respects your existing config but runs circles around the original.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why I’m keeping my eye on it:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architectural Efficiency (Speed &amp;amp; Disk Space):&lt;/strong&gt; It completely redesigned how environments are managed. Unlike the original, &lt;code&gt;prek&lt;/code&gt; shares toolchains between hooks rather than duplicating them. It also clones repositories and installs hooks in parallel. Combined with its internal use of &lt;code&gt;uv&lt;/code&gt;, this results in significantly faster setups and half the disk usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero-Hassle Setup:&lt;/strong&gt; It compiles to a single binary with no dependencies. You don’t need to manage Python versions or virtual environments manually — &lt;code&gt;prek&lt;/code&gt; handles all of that automatically. You just download it and run it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modern Workflow Features:&lt;/strong&gt; It solves long-standing pain points like Monorepo support (via “workspaces”) out of the box. It also adds smarter CLI commands we’ve always wanted, like &lt;code&gt;prek run --directory &amp;lt;dir&amp;gt;&lt;/code&gt; to scope checks to a specific folder, or &lt;code&gt;--last-commit&lt;/code&gt; to check only your latest work.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to try it&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since we are already using &lt;code&gt;uv&lt;/code&gt;, installing &lt;code&gt;prek&lt;/code&gt; is a one-liner:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# The modern way: Install as an isolated tool (can be done of course with pip too)
uv tool install

# The “Magic” command that activates the hooks in this repo
prek install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then, instead of running &lt;code&gt;pre-commit run&lt;/code&gt;, you just run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prek run --all-files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;prek&lt;/code&gt; is newer than &lt;code&gt;pre-commit&lt;/code&gt;, but it is already battle-tested, powering massive projects like &lt;em&gt;Apache Airflow&lt;/em&gt;. While it is still reaching full feature parity, the speed and “plug-and-play” experience make it a compelling alternative if you are tired of the friction with the legacy tool.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion: The Guardian at the Gate&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We started this series by taming our dependencies. Today, we tamed our code quality.&lt;/p&gt;

&lt;p&gt;By setting up &lt;code&gt;pre-commit&lt;/code&gt; (or &lt;code&gt;prek&lt;/code&gt;), you are doing your future self a massive favor. You are stopping the entropy that slowly kills codebases. You are freeing your brain from worrying about whitespace and commas so you can focus on logic and architecture.&lt;/p&gt;

&lt;p&gt;Set it up once. Configure it. Then forget about it, and let the robots do the cleaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you have a favorite custom hook I didn’t mention? Let me know in the comments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shmulc.substack.com/p/my-digital-arsenal-2/comments" rel="noopener noreferrer"&gt; Leave a comment&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you in the next installment of &lt;strong&gt;My Digital Arsenal&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>codequality</category>
      <category>git</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Zero-Cost Automation: How Students Get n8n Free for a Year</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Sun, 12 Oct 2025 11:39:35 +0000</pubDate>
      <link>https://dev.to/shmulc/zero-cost-automation-how-students-get-n8n-free-for-a-year-3e3f</link>
      <guid>https://dev.to/shmulc/zero-cost-automation-how-students-get-n8n-free-for-a-year-3e3f</guid>
      <description>&lt;p&gt;&lt;em&gt;The Step-by-Step to Max Out Your GitHub Student Developer Pack&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1gjklf4nrwd86uobvo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1gjklf4nrwd86uobvo2.png" width="560" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Automation is the superpower of the modern developer. Whether you want to scrape a hundred websites, connect five different apps, or build an AI workflow that summarizes posts for you, tools like &lt;strong&gt;n8n&lt;/strong&gt; make it incredibly simple with visual, low-code nodes. n8n allows you to build complex integrations that save you hours of manual work and introduce a whole new level of efficiency to your projects.&lt;/p&gt;

&lt;p&gt;The problem? That automated magic has to run somewhere. While it’s easy to build a powerful workflow, the associated hosting costs can quickly add up. You could pay for the convenience of n8n’s cloud service, spend a few dollars a month on a cloud hosting platform like AWS, or even run the instance locally on your laptop 24/7.&lt;/p&gt;

&lt;p&gt;Beyond just hosting, you’ll also typically need a domain name to easily access and manage your n8n instance from anywhere. For students, or anyone just running a random side-script or learning a new skill, all those options often feel like overkill, or just plain too expensive (or at least you worry they will be if you don’t pay attention).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if you could have the full power of n8n running in the cloud, accessible via your own domain, for an entire year without touching your wallet?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://shmulc.substack.com/subscribe?" rel="noopener noreferrer"&gt; Subscribe now&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;GitHub Student Developer Pack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The secret to achieving zero-cost automation lies with &lt;strong&gt;GitHub&lt;/strong&gt; and its incredibly generous Student Developer Pack (GSDP). This isn’t just a free GitHub Pro account, it’s a bundled collection of free premium software and services that would normally cost hundreds of dollars.&lt;/p&gt;

&lt;p&gt;The GSDP gives you direct access to essential perks like &lt;strong&gt;GitHub Copilot Pro&lt;/strong&gt; for AI-powered coding, &lt;strong&gt;cloud credits&lt;/strong&gt; for hosting platforms like &lt;strong&gt;DigitalOcean&lt;/strong&gt;, and a free domain for a year from providers like &lt;strong&gt;Namecheap&lt;/strong&gt;. By getting this pack, you unlock the specific benefits needed to host your self-managed n8n instance and secure a domain, all for free for a full year.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The First Step: Getting Your Developer Pack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To unlock these benefits, you first need to gain access to the GSDP.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sign up for a GitHub account:&lt;/strong&gt; If you don’t have one, create a free account at &lt;a href="https://github.com/" rel="noopener noreferrer"&gt;github.com&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Apply for the GSDP:&lt;/strong&gt; Head to &lt;a href="https://education.github.com/pack" rel="noopener noreferrer"&gt;https://education.github.com/pack&lt;/a&gt; and apply for the Student Developer Pack. You’ll need to provide proof of your student status (like a school email or student ID).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wait for Approval:&lt;/strong&gt; This verification process can take a few days. You’ll receive a confirmation email once you’re approved and gain access to all the partner offers.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This video walks through the eligibility criteria and the sign-up process for the GitHub Student Developer Pack:&lt;/p&gt;

&lt;p&gt;Once your GitHub account is approved, you will be able to claim all the benefits at &lt;a href="https://education.github.com/pack" rel="noopener noreferrer"&gt;https://education.github.com/pack&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Second Step: Claim your $200 in DigitalOcean Credits&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the most valuable perks for hosting n8n is the &lt;strong&gt;$200 in free credits&lt;/strong&gt; from DigitalOcean. This provides more than enough allowance to run a robust n8n instance for a full year.&lt;/p&gt;

&lt;p&gt;To claim your credits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Visit the special DigitalOcean GitHub Student Pack &lt;a href="https://cloud.digitalocean.com/github-student-pack?onboarding_origin=github-student-pack&amp;amp;activation_redirect=%2Fgithub-student-pack." rel="noopener noreferrer"&gt;offer page&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You may need to register for a new DigitalOcean account&lt;/strong&gt; if you don’t already have one. Simply follow the on-screen prompts to create your account and link it to your GitHub Student Developer Pack.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once successfully claimed, the $200 credit will be applied to your DigitalOcean account, ready to be used!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Third Step: Registering a Free Domain with Namecheap&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To make your n8n instance accessible from anywhere and to ensure secure connections, you’ll need a domain name and an SSL certificate. Thankfully, the GitHub Student Developer Pack has you covered here too!&lt;/p&gt;

&lt;p&gt;For this, we’ll use Namecheap’s free .me domain offer, which also includes a one-year free SSL certificate.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Head over to the Namecheap GitHub Student Developer Pack portal: &lt;a href="https://nc.me/github/auth" rel="noopener noreferrer"&gt;https://nc.me/github/auth&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow the instructions to register your free &lt;code&gt;.me&lt;/code&gt; domain. I personally chose &lt;code&gt;shmulc.me&lt;/code&gt;, but you can pick any available &lt;code&gt;.me&lt;/code&gt; domain (or explore other domain offers within the GSDP if you prefer a different ending).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;During the process, ensure you activate the included free one-year SSL certificate. This is crucial for securing your n8n instance with HTTPS.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you need a bit more guidance, this video provides a helpful walkthrough for registering your domain with &lt;strong&gt;Namecheap&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;With your &lt;strong&gt;DigitalOcean&lt;/strong&gt; credits and a free domain (with SSL) in hand, you’re now fully equipped for the final step: deploying n8n!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Final Step: Creating Your n8n Instance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You now have all the components for a powerful, zero-cost n8n setup: the DigitalOcean credits for hosting and the free domain for access. The only thing left is the deployment itself.&lt;/p&gt;

&lt;p&gt;This video by &lt;strong&gt;Nick Saraev&lt;/strong&gt; does a great job visually explaining how to set up and deploy your n8n instance using a cloud server and your new domain name. This link takes you directly to the relevant part of the video that focuses on the deployment steps with &lt;strong&gt;DigitalOcean:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once your instance is live and secure under your new domain, the deployment setup is complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Automate!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now that you’ve deployed your private n8n instance, your final task is to simply create great automations and share them with the community! Go build those AI summarizers, Telegram bots, and complex data pipelines without worrying about the bill for the next year.&lt;/p&gt;

&lt;p&gt;To help you get started, I highly recommend diving into the &lt;a href="https://docs.n8n.io/courses/" rel="noopener noreferrer"&gt;official text courses&lt;/a&gt;. Once you’ve grasped the fundamentals, challenge yourself to think about a cool automation that could genuinely improve your life or streamline your student workflow. The web is brimming with resources and examples; a quick search will unveil countless possibilities. If there’s a specific n8n tutorial you’d like me to cover in the future, just let me know!&lt;/p&gt;

&lt;p&gt;I truly hope this guide helps you break through the initial automation barrier and empowers you to build incredible things. I’m genuinely excited to hear what you create with this powerful, zero-cost setup! Feel free to share your projects in the comments below or reach out to me directly if you build something cool.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>n8n</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Forget Manual Solving, Let Z3 Crack The Code</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Wed, 08 Oct 2025 17:34:32 +0000</pubDate>
      <link>https://dev.to/shmulc/forget-manual-solving-let-z3-crack-the-code-2op4</link>
      <guid>https://dev.to/shmulc/forget-manual-solving-let-z3-crack-the-code-2op4</guid>
      <description>&lt;p&gt;&lt;em&gt;A Formal Approach for Solving Logic Puzzles with an SMT Solver&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5urwhvjriidwb88glk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5urwhvjriidwb88glk.jpeg" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You know the feeling. That moment when a &lt;strong&gt;Sudoku&lt;/strong&gt; grid snaps into place, the hidden shapes of a &lt;strong&gt;Nonogram&lt;/strong&gt; finally emerge from the pixels, or a perfectly balanced &lt;strong&gt;Kakuro&lt;/strong&gt; sum falls into line. It’s the thrill of logic, the satisfaction of a challenge overcome by sheer brainpower.&lt;/p&gt;

&lt;p&gt;But have you ever noticed just how similar these seemingly different puzzles are? At their core, they all boil down to the exact same thing: &lt;strong&gt;a set of constraints that must be satisfied&lt;/strong&gt;. What if I told you there’s a powerful, elegant way to encode those rules, not just for one puzzle, but for &lt;em&gt;all&lt;/em&gt; of them, and instantly reveal the answer? Welcome to the formal approach to puzzle-solving.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;For the rest of this post, I’ll be assuming you’re familiar with the standard rules of &lt;strong&gt;Sudoku&lt;/strong&gt;, &lt;strong&gt;Kakuro&lt;/strong&gt;, and &lt;strong&gt;Nonograms&lt;/strong&gt;. If you need a quick refresher on any of them, please take a moment now to look them up before we proceed.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introducing the SMT Solver&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You are right to see the similarity. The elegant solution to these diverse problems lies in &lt;strong&gt;Constraint Satisfaction&lt;/strong&gt;. This is where we introduce the star of the show: an &lt;strong&gt;SMT Solver&lt;/strong&gt;, specifically &lt;strong&gt;Z3&lt;/strong&gt; from Microsoft Research. SMT stands for &lt;strong&gt;Satisfiability Modulo Theories&lt;/strong&gt;. Don’t let the name scare you. Think of it as a highly specialized logic engine that takes all your puzzle’s rules, your “constraints”, and determines if a configuration exists that makes every single rule true simultaneously. If it does, Z3 finds that configuration for you: the solution.&lt;/p&gt;

&lt;p&gt;If no configuration exists that satisfies all the rules, Z3 returns a result of “&lt;strong&gt;unsatisfiable&lt;/strong&gt;.” This happens when your puzzle is logically flawed, perhaps because it was designed poorly or because the starting clues are contradictory. For instance, if you provide a Sudoku with two of the same number in a single row as a starting point, Z3 will quickly determine that the puzzle is impossible, which is often a useful insight in itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmtiqtvndlq3uvedmp0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmtiqtvndlq3uvedmp0m.png" width="502" height="504"&gt;&lt;/a&gt;Another example for Unsatisfiable Sudoku, can you see why?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Not a SAT Solver&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, you may have heard of a simpler &lt;strong&gt;SAT Solver&lt;/strong&gt;, which only deals with pure Boolean logic, meaning statements that are strictly true or false. It’s worth a quick aside: in theoretical computer science, SAT is an &lt;strong&gt;NP-complete problem&lt;/strong&gt;, meaning it’s generally considered one of the hardest problems to solve efficiently. However, in the real world, modern SAT solvers use clever &lt;strong&gt;heuristics&lt;/strong&gt; and techniques that make them incredibly fast and capable of solving most practical instances.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;SMT Solver&lt;/strong&gt;, however, is much more powerful. It extends SAT by adding support for common mathematical concepts, or “&lt;strong&gt;Theories&lt;/strong&gt;,” like integers, real numbers, and arithmetic. This is critical for puzzles like &lt;strong&gt;Kakuro&lt;/strong&gt;, which rely heavily on summing numbers, or even &lt;strong&gt;Sudoku&lt;/strong&gt;, which uses integer constraints. Z3 allows us to easily model all those numeric, non-Boolean puzzle requirements, giving us a universal key to cracking these challenges.&lt;/p&gt;

&lt;p&gt;The concept of using computers to solve logic puzzles is certainly not new; specialized Sudoku and Nonogram solvers have been around for years. However, the true elegance of the &lt;strong&gt;formal approach&lt;/strong&gt; lies in its &lt;strong&gt;generality&lt;/strong&gt;. Instead of building a new, dedicated solver for every puzzle type, we can use a single general tool like Z3 to tackle a huge range of problems with surprisingly little, highly adaptable code. That versatility is the real game-changer, and we’ll see exactly how it works soon.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introducing Z3&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we dive into the code, let’s formally introduce the tool. &lt;strong&gt;Z3&lt;/strong&gt; is one of the world’s most powerful and widely used &lt;strong&gt;SMT solvers&lt;/strong&gt;, developed by &lt;strong&gt;Microsoft Research&lt;/strong&gt; and open-sourced under the &lt;strong&gt;MIT License&lt;/strong&gt;. Initially built for complex problems in &lt;strong&gt;software verification&lt;/strong&gt;, Z3 handles logical formulas and mathematical constraints with exceptional speed. While its core engine is written in C++, it provides robust bindings for languages like &lt;strong&gt;C, C++, Java, .NET, and Python&lt;/strong&gt;. For this post, we will exclusively use the user-friendly &lt;strong&gt;Python binding&lt;/strong&gt;, which lets us model our puzzles quickly and elegantly without unnecessary overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Z3 101: The Essentials&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, you’ll need the Python bindings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install z3-solver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, let’s look at the four core components you need to translate any problem into Z3 code.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Variables: Defining the Unknowns&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In Z3, everything starts with variables that represent the unknown values you’re trying to find. Because Z3 supports various logical &lt;em&gt;Theories&lt;/em&gt;, these variables aren’t just true/false statements; they can represent numbers, arrays, and more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Int('name')&lt;/code&gt;: Creates an &lt;strong&gt;Integer&lt;/strong&gt; variable (e.g., for numbers 1-9).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Bool('name')&lt;/code&gt;: Creates a &lt;strong&gt;Boolean&lt;/strong&gt; variable (True/False).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Ints('a b c')&lt;/code&gt;: A shorthand for creating multiple integer variables.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from z3 import *
x = Int('x')
flag = Bool('flag')  # avoid naming this is_true, which shadows z3.is_true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;2. Constraints: Writing the Rules&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Constraints are the logical formulas that must be true for the system to have a solution. You build these using familiar comparison operators (&lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;==&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;) and Z3’s specialized logic functions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;And(c1, c2, ...)&lt;/code&gt;: Requires all contained constraints to be true.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Or(c1, c2, ...)&lt;/code&gt;: Requires at least one contained constraint to be true.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Distinct(a, b, c)&lt;/code&gt;: Forces all listed variables to have different values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;x + y == 10&lt;/code&gt;: Represents standard arithmetic constraints.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can use the simple &lt;code&gt;solve()&lt;/code&gt; function to find values for variables that meet a system of constraints:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from z3 import *
x, y = Ints('x y')
solve(x &amp;gt; 2, y &amp;lt; 10, x + 2*y == 7)
# Output: [y = 0, x = 7]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;3. The Solver: The Engine&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Solver&lt;/code&gt; object is the central component where you accumulate your constraints and ask Z3 to find a solution.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;s = Solver()&lt;/code&gt;: Initializes the solver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;s.add(constraint)&lt;/code&gt;: Asserts constraints into the solver’s stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;s.check()&lt;/code&gt;: Executes the search. Returns &lt;code&gt;sat&lt;/code&gt; (solvable), &lt;code&gt;unsat&lt;/code&gt; (unsolvable), or &lt;code&gt;unknown&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;unsat&lt;/code&gt; Result:&lt;/strong&gt; If &lt;code&gt;s.check()&lt;/code&gt; returns &lt;code&gt;unsat&lt;/code&gt;, your constraints are contradictory: no assignment of values can satisfy all the rules simultaneously.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;4. The Model: The Answer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;s.check()&lt;/code&gt; returns &lt;code&gt;sat&lt;/code&gt; (satisfiable), the solver has successfully found a solution. This assignment of values is called the &lt;strong&gt;Model&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;m = s.model()&lt;/code&gt;: Retrieves the solution object.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;m[variable]&lt;/code&gt;: Extracts the solved value for a specific variable from the model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from z3 import *

x = Int('x')
s = Solver()
s.add(x * 2 == 14)

if s.check() == sat:
    m = s.model()
    print(m[x])  # prints 7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deeper guidance and advanced topics, I suggest reading the &lt;a href="https://microsoft.github.io/z3guide/programming/Z3%20Python%20-%20Readonly/Introduction" rel="noopener noreferrer"&gt;Official Tutorial&lt;/a&gt; — it’s a great resource for getting a deeper understanding of Z3’s more advanced features.&lt;/p&gt;

&lt;p&gt;By understanding these four building blocks, you have everything required to translate the rules of our logic puzzles into Z3’s logical language. Our goal is to create a single, general function for each puzzle that captures its fundamental rules. Once those core constraints are in the solver, plugging in a specific puzzle instance is trivial. We’ll start with the most familiar puzzle.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Sudoku: The Constraint Classic&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A standard 9x9 Sudoku has four universal constraints: every cell must contain a value from 1 to 9, and every row, column, and 3x3 block must contain distinct values. We can translate these rules directly into Z3.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The General Solver Function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We’ll define a function, &lt;code&gt;solve_sudoku&lt;/code&gt;, that takes a &lt;code&gt;sudoku&lt;/code&gt; object (which holds the puzzle’s structure and clues) as its input.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from z3 import *

def solve_sudoku(sudoku: SudokuPuzzle) -&amp;gt; dict[str, str]:
  """Solves the given Sudoku puzzle using Z3's SMT capabilities."""

  solver = Solver()

  # 9x9 grid of Z3 Integer variables
  # Assuming 'sudoku.positions' is a list of all 81 cell names ('A1', 'A2', ... 'I9')
  symbols = {pos: Int(pos) for pos in sudoku.positions}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. General Constraints (The Rules)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Next, we add the three core uniqueness constraints using Z3’s powerful &lt;code&gt;Distinct&lt;/code&gt; function.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Constraint 1: Cell Range [1, 9]
  for symbol in symbols.values():
    solver.add(And(symbol &amp;gt;= 1, symbol &amp;lt;= 9))

# Constraint 2: Rows must have distinct values
for row in “ABCDEFGHI”:
  row_symbols = [symbols[row + col] for col in “123456789”]
  solver.add(Distinct(row_symbols))

# Constraint 3: Columns must have distinct values
for col in “123456789”:
  col_symbols = [symbols[row + col] for row in “ABCDEFGHI”]
  solver.add(Distinct(col_symbols))

# Constraint 4: 3x3 Blocks must have distinct values
for i in range(3):
  for j in range(3):
    # List comprehension to select cells in the 3x3 block
    block_symbols = [symbols[”ABCDEFGHI”[m + i * 3] + “123456789”[n + j * 3]] 
                     for m in range(3) for n in range(3)]
    solver.add(Distinct(block_symbols))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;2. Instance Constraints (The Clues)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With the general rules loaded, we now add the specific clues from the puzzle instance passed in the &lt;code&gt;sudoku&lt;/code&gt; parameter.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assuming ‘sudoku.grid’ is a dictionary like {’A1’: ‘5’, ‘A2’: ‘0’, ...}
for pos, value in sudoku.grid.items():
  if value != “0”: 
    solver.add(symbols[pos] == int(value)) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;3. Solving and Interpreting the Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The final step is to ask the solver to find a solution and extract the model.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if solver.check() != sat:
  raise Exception(”Unsolvable Sudoku provided!”)

  # Retrieve the model (the solution)
  model = solver.model()

  # Extract the final integer value for every cell
  values = {pos: model.evaluate(symbol).as_string() for pos, symbol in symbols.items()}

  # The function returns the solved values
  return values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The resulting code is essentially a direct translation of the Sudoku rules into Z3 constraints. The puzzle-agnostic power of the SMT solver is clear: we didn’t write an algorithm to &lt;em&gt;search&lt;/em&gt; for the solution; we simply wrote code that &lt;em&gt;describes&lt;/em&gt; the solution.&lt;/p&gt;

&lt;p&gt;Now, let’s look at &lt;strong&gt;Kakuro&lt;/strong&gt; , where we’ll leverage Z3’s powerful &lt;strong&gt;Arithmetic Theory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20r810skf7vnn0s6m1iq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20r810skf7vnn0s6m1iq.png" width="360" height="360"&gt;&lt;/a&gt;An example of a Sudoku puzzle&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Kakuro: The Arithmetic Challenge&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Unlike Sudoku, &lt;strong&gt;Kakuro&lt;/strong&gt; (or Cross Sums) requires both uniqueness (no repeated digits in a run) &lt;em&gt;and&lt;/em&gt; arithmetic (the cells in a run must sum to a target). This is where the power of the &lt;strong&gt;Arithmetic Theory&lt;/strong&gt; within the SMT solver becomes essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The General Solver Function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We define the &lt;code&gt;solve_kakuro&lt;/code&gt; function to accept the &lt;code&gt;KakuroPuzzle&lt;/code&gt; object. This solver leverages Z3’s &lt;code&gt;Sum&lt;/code&gt; and &lt;code&gt;Distinct&lt;/code&gt; functions together to enforce the puzzle rules.&lt;/p&gt;

&lt;p&gt;To manage the runs, we utilize a simple helper function, &lt;code&gt;get_sum_run&lt;/code&gt;, which identifies the sequence of cells for any given clue.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from z3 import Solver, Int, Sum, Distinct, sat, And, Not, Ints

# Helper function to identify the sequence of cells for a clue
def get_sum_run(
    puzzle: KakuroPuzzle, first_x: int, first_y: int, direction: str
) -&amp;gt; list[Cell]:
    “”“Get cells involved in a sum run starting from a clue cell”“”
    rows, cols = puzzle.size
    cells = []

    if direction == “right”:
        for x in range(first_x + 1, cols):
            if puzzle.is_wall(x, first_y): break
            cells.append((x, first_y))
    else:
        for y in range(first_y + 1, rows):
            if puzzle.is_wall(first_x, y): break
            cells.append((first_x, y))
    return cells

def solve_kakuro(puzzle: KakuroPuzzle) -&amp;gt; Solution | None:
    rows, cols = puzzle.size
    solver = Solver()

    # Create grid of Z3 Integer variables
    grid = [[Int(f”cell_{col}_{row}”) for row in range(rows)] for col in range(cols)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. General Constraints (The Rules and Clues)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;These constraints establish the size and valid range for every cell in the grid, including 0 for the clue/wall cells.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Constraint 1: Cell Range [1, 9] for fillable cells
for row in range(rows):
    for col in range(cols):
        if puzzle.get_clue(col, row):
            # Clue cells (walls) are assigned a constant 0
            solver.add(grid[col][row] == 0)
        else:
            # Fillable cells must be between 1 and 9
            solver.add(And(grid[col][row] &amp;gt;= 1, grid[col][row] &amp;lt;= 9))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;2. Instance Constraints (Sums and Uniqueness)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For every sum clue provided, we apply two simultaneous constraints to the run of cells that follows: the numbers must sum correctly, and all numbers must be unique.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add sum and uniqueness constraints for every clue
for clue in puzzle.clues:
    x, y, row_sum, col_sum = clue.x, clue.y, clue.row_sum, clue.col_sum

    # Horizontal Run
    if row_sum is not None:
        if right_cells := get_sum_run(puzzle, x, y, “right”):
            cell_vars = [grid[col][row] for col, row in right_cells]

            # Sum Constraint &amp;amp; Uniqueness Constraint
            solver.add(Sum(cell_vars) == row_sum)
            solver.add(Distinct(cell_vars))

    # Vertical Run
    if col_sum is not None:
        if down_cells := get_sum_run(puzzle, x, y, “down”):
            cell_vars = [grid[col][row] for col, row in down_cells]

            # Sum Constraint &amp;amp; Uniqueness Constraint
            solver.add(Sum(cell_vars) == col_sum)
            solver.add(Distinct(cell_vars))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;3. Solving and Interpreting the Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The final step is to retrieve the solution, filtering out the zeros corresponding to the clue/wall cells.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if solver.check() == sat:
        model = solver.model()
        solution_cells = []

        # Iterate over the grid to extract the non-zero (fillable) values
        for col in range(cols):
            for row in range(rows):
                if value := model.evaluate(grid[col][row]).as_long():
                    if value &amp;gt; 0:
                        solution_cells.append(SolutionCell(col, row, value))

        return solution_cells
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The difference between the Sudoku and Kakuro solutions is minimal in terms of code complexity — the SMT solver handles the massive increase in logical complexity (arithmetic) seamlessly.&lt;/p&gt;

&lt;p&gt;If you want to see the complete solutions, including a visualizer and a scraper, check out my &lt;a href="https://github.com/anuk909/Kakuro-Solver" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; for the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42rn52hgs0sg4ex9nsbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42rn52hgs0sg4ex9nsbh.png" width="449" height="449"&gt;&lt;/a&gt;An example of a Kakuro puzzle&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Nonogram: A Modeling Challenge&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The journey from &lt;strong&gt;Sudoku&lt;/strong&gt; (simple uniqueness) to &lt;strong&gt;Kakuro&lt;/strong&gt; (uniqueness plus arithmetic) shows how Z3’s &lt;strong&gt;SMT capabilities&lt;/strong&gt; handle increasing logical complexity with minimal code changes. However, &lt;strong&gt;Nonograms&lt;/strong&gt; (or Picross) present a slightly different, more abstract modeling challenge.&lt;/p&gt;

&lt;p&gt;Nonograms swap the &lt;strong&gt;Integer Theory&lt;/strong&gt; for a complex sequence problem. The constraints aren’t simple sums or distinct values; they are rules governing the &lt;em&gt;arrangement&lt;/em&gt; of blocks (True/False cells) in a line, separated by at least one space, to match the clue numbers.&lt;/p&gt;

&lt;p&gt;This problem is solvable using the exact same Z3 tools you’ve already learned — Boolean variables, logic operators, and creative constraint formulation — but correctly defining the sequence logic is considerably harder.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Your Challenge:&lt;/strong&gt; I spent several hours finding an elegant way to model the sequence constraints for Nonograms using the Z3 API. Now that you have the knowledge of &lt;strong&gt;Variables&lt;/strong&gt;, &lt;strong&gt;Constraints&lt;/strong&gt;, &lt;strong&gt;Solvers&lt;/strong&gt;, and &lt;strong&gt;Models&lt;/strong&gt;, I’m leaving the Nonogram solver as a challenge to you. If you solve it, I would love to see and discuss your solution in the comments! 💡&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thank you for joining me on this formal journey into puzzle-solving. I hope this post inspires you to look at your daily newspaper puzzles not just as a pastime, but as a fascinating challenge in &lt;strong&gt;Constraint Satisfaction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cnh9m2zgw9l2fcyw0ae.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cnh9m2zgw9l2fcyw0ae.jpeg" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>security</category>
    </item>
    <item>
      <title>The Table That Saved Me</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Mon, 14 Jul 2025 13:56:14 +0000</pubDate>
      <link>https://dev.to/shmulc/the-table-that-saved-me-5cmk</link>
      <guid>https://dev.to/shmulc/the-table-that-saved-me-5cmk</guid>
      <description>&lt;p&gt;&lt;em&gt;How The Big Tasks Table Rescued My Sanity and Deadlines&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3600hl6u0h8m7b5xpoow.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3600hl6u0h8m7b5xpoow.jpeg" width="800" height="533"&gt;&lt;/a&gt;An homage to the movie “The Girl That Saved Me”, which I’ve never watched and only found out about five minutes ago.&lt;/p&gt;

&lt;p&gt;You know that feeling when your brain just can’t hold all your to-dos? In the tech industry, that feeling goes into overdrive. We juggle countless meetings, intricate details, ongoing projects, and so many other obligations that demand serious organization, whether you’re working solo or collaborating with a team.&lt;/p&gt;

&lt;p&gt;While countless resources offer task management solutions, I want to introduce &lt;strong&gt;my personal method: “The Big Tasks Table.”&lt;/strong&gt; It’s far from perfect, incredibly simple, but surprisingly effective.&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk you through the method, share how I started using it, discuss its pros and cons compared to other approaches, and explain why discovering a personal system that genuinely suits your needs is so vital.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Task Management Tightrope&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Before I found my footing, task management felt like walking a tightrope. As a junior software engineer, my initial role was manageable with basic organization, mostly relying on OneNote. But my second role? That’s where things spiraled. Our team operated with tight Scrum sprints and GitLab issues, and for me, it was a nightmare. I spent almost half my time organizing and describing tasks instead of actually doing the work. I’m not saying these methods are inherently wrong; it was just my personal experience then, but it left me feeling traumatized by complex systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjslj39byrq4lbeds14w.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjslj39byrq4lbeds14w.jpeg" width="700" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I transitioned to being a course instructor for a technology course, the chaos continued. We had a demanding one to two months to organize the curriculum, followed by an intense course period. The sheer volume of tasks meant we had to be incredibly organized. One day, my own instructor from when I took the course came to help. They mentioned a common way to categorize tasks: &lt;strong&gt;“Must Have,” “Really Want,” and “Nice To Have.”&lt;/strong&gt; Something immediately clicked.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Introducing: The Big Tasks Table&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;That simple idea sparked “The Big Tasks Table.” I took those three categories and organized them into a structured table, which quickly became full of tasks, hence the name! It took some time to refine and adjust it, even experimenting with colors for collaborative work at one point. I’ve been using it consistently ever since.&lt;/p&gt;

&lt;p&gt;The method itself is incredibly straightforward: You create a table with &lt;strong&gt;three columns&lt;/strong&gt; and &lt;strong&gt;three rows&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Columns:&lt;/strong&gt; This Week, This Month, Ongoing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rows:&lt;/strong&gt; Must Have, Really Want, Nice to Have&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdam6u707c2tewasuw15t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdam6u707c2tewasuw15t.png" width="700" height="127"&gt;&lt;/a&gt;It should look like that!&lt;/p&gt;

&lt;p&gt;Each week, you fill the blocks with your tasks, marking them off as you complete them. At the end of the week, you simply copy the table as is, remove any finished tasks, reset any “ongoing” marks, and keep going. I primarily use OneNote for this, but it works just as well in Obsidian or any other note-taking app.&lt;/p&gt;

&lt;p&gt;When deciding which tasks to tackle, I recommend starting with the &lt;strong&gt;“Must Have”&lt;/strong&gt; items for &lt;strong&gt;“This Week”&lt;/strong&gt; (the upper-left block) and then moving across or down. If a task requires more detail or has sub-tasks, you can add them directly within that block. For tasks where you’re waiting on someone else, you can mark them in any way you prefer — I personally use highlighting to indicate a dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Handy OneNote Shortcuts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OneNote is, in my opinion, one of Microsoft’s best products. Here are some shortcuts worth knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ctrl+1 — Apply, select, or clear the &lt;strong&gt;To Do&lt;/strong&gt; tag.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl+2 — Apply or clear the &lt;strong&gt;Important&lt;/strong&gt; tag.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl+K — Insert a hyperlink.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl+Alt+H — Highlight the selected text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alt+Shift+D — Insert the current date.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alt+Shift+T — Insert the current time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl+F — Search the current page.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ctrl+E — Search all notebooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more, check out &lt;a href="https://support.microsoft.com/en-us/office/keyboard-shortcuts-in-onenote-44b8b3f4-c274-4bcc-a089-e80fdcc87950" rel="noopener noreferrer"&gt;Keyboard shortcuts in OneNote&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Why It Works (And When It Might Not)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The beauty of the Big Tasks Table lies in its simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Super Simple:&lt;/strong&gt; It takes only a few minutes to fill with all your tasks. If it takes longer, you’re probably overthinking it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Always Visible:&lt;/strong&gt; You can see all your tasks at a glance, ensuring nothing gets lost or forgotten.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Highly Adaptable:&lt;/strong&gt; It’s easy to adjust to your specific needs. For example, a friend I shared this method with had so many tasks that he adapted it to a daily and weekly view!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Challenges:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Minimal Detail:&lt;/strong&gt; Its simplicity means there isn’t much space for comprehensive task descriptions or detailed notes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Analytics:&lt;/strong&gt; You can’t run analytics or generate reports from it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Solo Focus:&lt;/strong&gt; It’s primarily designed for individual work and doesn’t seamlessly integrate with complex collaborative systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fciy19lekedxle94dkhs7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fciy19lekedxle94dkhs7.jpeg" width="580" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, these challenges are manageable. For the past two years, I’ve successfully used the Big Tasks Table alongside team-based systems like Jira. It takes almost no extra time to manage my personal tasks in OneNote, even when collaborating within more complex platforms. While I still don’t particularly enjoy intricate systems like Jira, I understand their necessity when working with a team.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Find Your Own System&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;So, do you already have a method to manage your tasks? If yes, I’d love to hear what it is. Share your insights in the comments below, including what works well for you and any unique adaptations you’ve made. Your experiences could inspire others!&lt;/p&gt;

&lt;p&gt;And if you don’t yet have a system, I truly recommend you start one. The specific tools or methodologies matter less than the act of taking control. Whether it’s my Big Tasks Table, a simple to-do list app, or complex project-management software, any system is infinitely better than no system at all.&lt;/p&gt;

&lt;p&gt;Experiment, adapt, and find what resonates with your personal workflow. The goal is to free up your mental energy from remembering tasks so you can focus on actually completing them, leading to less stress and more accomplished goals.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>learning</category>
      <category>career</category>
      <category>management</category>
    </item>
    <item>
      <title>AI Snippets — Napkin AI</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Sun, 29 Jun 2025 22:11:35 +0000</pubDate>
      <link>https://dev.to/shmulc/ai-snippets-napkin-ai-2nd6</link>
      <guid>https://dev.to/shmulc/ai-snippets-napkin-ai-2nd6</guid>
      <description>&lt;p&gt;&lt;em&gt;Become the Presentation Champion with only one Napkin&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;How often have you found yourself drowning in a sea of text, desperately wishing for a magic wand to turn it into something visually digestible? In the high-stakes world of presentations, words alone often fall short of making an impact.&lt;/p&gt;

&lt;p&gt;Welcome to the first AI Snippets post, a series dedicated to showcasing innovative AI solutions that tackle real-world problems. Today, we’re diving into &lt;strong&gt;Napkin AI&lt;/strong&gt;, a tool designed to revolutionize how you communicate by turning text into stunning diagrams in mere seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0xw62h7gd8i8q1bxf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k0xw62h7gd8i8q1bxf6.png" width="700" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Sometimes text is not enough&lt;/strong&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“A picture is worth a thousand words.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Has it ever happened to you that you struggled to understand a piece of text, and then a single diagram made everything clear? Or perhaps you needed to create a presentation that just looked too dry, no matter how much you tweaked the text?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visuals don’t just clarify; they make information memorable, foster engagement, and can transform a tedious presentation into a captivating experience.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Napkin AI website is designed to help you with exactly that: transforming any text into a visual diagram in mere seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Example — Cash Flow and Profit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To illustrate, let’s look at the following text:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Cash flow refers to the actual money moving in and out of a business. When a business receives payments from sales, loans, or investments, that’s cash inflow. When it pays for expenses like salaries, rent, or supplies, that’s cash outflow. Positive cash flow means the business has more money coming in than going out, which is essential for paying bills and funding operations. Negative cash flow means more money is leaving the business than entering it, which can lead to financial difficulties.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Profit, or net income, is the money left over after all expenses have been subtracted from total revenue. This includes both cash expenses (like salaries) and non-cash expenses (like depreciation, which accounts for the wear and tear on assets over time). A business can be profitable on paper (meaning its revenues exceed its expenses) but still have poor cash flow if customers are not paying quickly enough, or if a lot of its expenses are due immediately. Therefore, both strong cash flow and good profit are important indicators of a business’s financial health.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To truly put Napkin AI to the test, I fed it the text on cash flow and profit. After exploring various diagram options, I settled on a visual that clarifies these concepts:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd08dmwwftqoza8gu6fz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd08dmwwftqoza8gu6fz.png" width="700" height="685"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Getting Started with Napkin AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To start using Napkin AI, go to &lt;a href="https://www.napkin.ai/" rel="noopener noreferrer"&gt;https://www.napkin.ai/&lt;/a&gt; (PC only) and click “Get Napkin Free/Sign In”. You can then sign in with your Google account and start creating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh1bxxj4wv2oij5xja5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh1bxxj4wv2oij5xja5o.png" width="700" height="339"&gt;&lt;/a&gt;Opening Screen&lt;/p&gt;

&lt;p&gt;On the website, you can create a new “Napkin” in three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with a blank one&lt;/strong&gt; and insert your text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import text from an existing file&lt;/strong&gt;, with support for Docs, PDF, PPTX, MD, and HTML.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate text with AI&lt;/strong&gt;, where you can choose any topic in the world and receive AI-generated text on it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Turtles!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To test the last option, I prepared some text about turtles (of course) with a few diagrams I created in a minute. Feel free to take a look:&lt;a href="https://app.napkin.ai/page/CgoiCHByb2Qtb25lEiwKBFBhZ2UaJDc3MWZhODg1LWE0NmQtNGY3Yy05OThlLWI4ZWJhYTg4OGVjNw?s=1" rel="noopener noreferrer"&gt; https://app.napkin.ai/page/CgoiCHByb2Qtb25lEiwKBFBhZ2UaJDc3MWZhODg1LWE0NmQtNGY3Yy05OThlLWI4ZWJhYTg4OGVjNw?s=1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here are some of the diagrams I got:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuffcd8z7o1u7huw6uqv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuffcd8z7o1u7huw6uqv.png" width="552" height="408"&gt;&lt;/a&gt;Understanding Turtle Biology&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kqvow0xxioz58n2eyfc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kqvow0xxioz58n2eyfc.png" width="700" height="552"&gt;&lt;/a&gt;Turtle Reproduction Cycle&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Pricing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Currently, Napkin is in beta, and &lt;strong&gt;most of its features are available for free&lt;/strong&gt;. For users needing more, Napkin offers &lt;strong&gt;Plus&lt;/strong&gt; and &lt;strong&gt;Pro&lt;/strong&gt; tiers. These paid plans provide additional benefits such as more AI credits, the option to remove Napkin branding, team management tools, exclusive designs and more.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Plus plan costs $12 per month&lt;/strong&gt;, while the &lt;strong&gt;Pro plan is $30 per month&lt;/strong&gt;. You can also get a &lt;strong&gt;25% discount if you opt for an annual subscription&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For most casual users, the free tier will likely be more than enough. One commendable aspect of Napkin’s approach is their refreshingly &lt;strong&gt;unobtrusive promotion of premium features&lt;/strong&gt;. Unlike many freemium services, they don’t aggressively push upgrades; I didn’t even know there was a paid plan until I stumbled on the pricing tab by accident.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypcxf1elp89r191n711o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypcxf1elp89r191n711o.png" width="700" height="424"&gt;&lt;/a&gt;Full Plans Information&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Limitations, and Human Touch&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;So, will Napkin AI solve all your problems? In my opinion, not yet. It’s an innovative and unique product, but the diagrams can be a bit repetitive (at least on the free plan), and it can falter on more complex texts full of numbers. Sometimes none of the suggested diagrams fits what you have in mind, and you walk away with nothing.&lt;/p&gt;

&lt;p&gt;It’s important to note that human involvement is still required in the process to decide which diagram is most suitable for the case and to ensure there are no errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In conclusion, I believe this is a brilliant tool that can often help you transform dry text into an impressive visual diagram. And even when it doesn’t, exploring its suggestions can give you better ideas for presenting the text.&lt;/p&gt;

&lt;p&gt;Why not give Napkin AI a spin yourself? Head over to &lt;a href="https://www.napkin.ai/" rel="noopener noreferrer"&gt;https://www.napkin.ai/&lt;/a&gt; and unleash your inner presentation champion.&lt;/p&gt;

&lt;p&gt;I would love to see the incredible diagrams you create; share them in the comments below!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq92neptyyhht732srle.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq92neptyyhht732srle.png" width="700" height="394"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tools</category>
      <category>productivity</category>
      <category>design</category>
    </item>
    <item>
      <title>My Digital Arsenal #1: Mastering Python Package Managers</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Wed, 04 Jun 2025 22:49:11 +0000</pubDate>
      <link>https://dev.to/shmulc/my-digital-arsenal-1-mastering-python-package-managers-3gb7</link>
      <guid>https://dev.to/shmulc/my-digital-arsenal-1-mastering-python-package-managers-3gb7</guid>
      <description>&lt;p&gt;&lt;em&gt;Mastering Python Package Managers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzidexgb7q9rn4qstqbpv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzidexgb7q9rn4qstqbpv.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Welcome to the first installment of my new series, “My Digital Arsenal,” where I’ll be sharing the essential tools that power my development workflow. Forget the dusty old toolbox; this is about the sleek, powerful software that makes coding less of a chore and more of a creative adventure. Each post will spotlight a tool or family of tools I’ve found invaluable and think you might too.&lt;/p&gt;

&lt;p&gt;Today, we’re diving headfirst into the often-underestimated world of &lt;strong&gt;package managers&lt;/strong&gt;. These aren’t just utilities; they’re your project’s lifeline, your sanity savers, and the unsung heroes that prevent your coding environment from collapsing into a chaotic mess of conflicting libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Our Focus: Python’s Package Management&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;It’s true that nearly every modern programming language comes equipped with its own package manager(s) — you might have heard of &lt;strong&gt;npm&lt;/strong&gt; for JavaScript, &lt;strong&gt;Maven&lt;/strong&gt; or &lt;strong&gt;Gradle&lt;/strong&gt; for Java, and &lt;strong&gt;Cargo&lt;/strong&gt; for Rust, to name a few. So, you might wonder why this series kicks off with Python, and why it will likely remain a frequent topic. The main reason is straightforward: Python has been my primary programming language for several years now, and I’m most familiar with its ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Appeal of Python in This Context&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Beyond personal preference, Python has some distinct advantages, especially when we talk about managing external code. It’s known for being incredibly beginner-friendly. If you’ve ever faced the challenge of manually handling dependencies in a language like C++, the relative simplicity of Python’s package management can feel like a breath of fresh air. Coupled with Python’s vast versatility and widespread popularity across many fields, understanding how to effectively manage its packages is a truly valuable skill for any developer working with the language.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Are Python Package Managers &amp;amp; Why Use Them?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Python package manager&lt;/strong&gt; is a tool that automates managing the external “code toolkits” (packages, libraries or tools) your project needs. It handles finding, installing, updating, and resolving issues for these packages.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why they’re essential:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Without one, you’re navigating a tricky path! Here’s how they save the day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handle Dependencies:&lt;/strong&gt; Packages often rely on other packages. A manager automatically finds and installs all these necessary “sub-packages” correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ensure Stability with Versioning:&lt;/strong&gt; Code toolkits change. Package managers let you control which version of a toolkit your project uses. This prevents updates from unexpectedly breaking your code and ensures everyone on your team uses the same versions (often via a “lock file”).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep Projects Separate:&lt;/strong&gt; Using &lt;strong&gt;virtual environments&lt;/strong&gt; (often managed by or with the package manager), you can isolate each project’s toolkits. This stops different projects from having conflicting toolkit versions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Save Time:&lt;/strong&gt; They automate the tedious tasks of downloading, installing, and managing packages, letting you focus on coding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simplify Teamwork:&lt;/strong&gt; When everyone uses the same package manager and configuration, it’s easy to share projects and ensure everyone has a consistent development environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
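
&lt;p&gt;As a minimal sketch of the isolation idea, here is how a project-local virtual environment is created with Python’s built-in &lt;code&gt;venv&lt;/code&gt; module — the same mechanism the package managers below automate for you:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create an isolated environment in the .venv folder (stdlib, no extra install)
python -m venv .venv

# Activate it (Linux/macOS); on Windows use .venv\Scripts\activate
source .venv/bin/activate

# Anything installed now lands inside .venv, not in the system Python
pip install requests

# Leave the environment when done
deactivate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;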

&lt;p&gt;In short, package managers are crucial for efficiently and reliably using external code in Python, preventing many common headaches.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;My Go-To Python Package Management Toolkit&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Let’s look at the tools I frequently reach for in my Python adventures.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;pip: The OG, The Standard, The Everywhere Tool&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What It Is&lt;/strong&gt; : &lt;strong&gt;pip&lt;/strong&gt; (Pip Installs Packages) is Python’s standard package installer. It’s the command-line tool you’ll use most often to add libraries from the Python Package Index (PyPI) and other sources to your projects. If you’re using a modern version of Python, pip is typically available right out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It’s Foundational&lt;/strong&gt; :&lt;/p&gt;

&lt;p&gt;Pip is the universal starting point for Python package management due to its ubiquity with Python installs. Its simple commands (like &lt;code&gt;pip install package-name&lt;/code&gt;) make common tasks straightforward, and it reliably handles core installation needs for countless developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Catch?&lt;/strong&gt; : pip is primarily an &lt;em&gt;installer&lt;/em&gt;. Its dependency resolver aims to find a compatible set of packages rather than creating a deterministic lock file for the entire dependency graph. This can sometimes lead to subtle conflicts or variations in environments over time if dependencies are not meticulously pinned. It also doesn’t inherently create a “lock file” to guarantee identical environments everywhere, though &lt;code&gt;pip freeze&lt;/code&gt; is a common way to pin versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Usage (The Classics)&lt;/strong&gt; :&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install a package from PyPI (Python Package Index)
pip install requests

# Install a specific version
pip install requests==2.25.1

# Generate a list of installed packages
pip freeze &amp;gt; requirements.txt

# Install all packages from a requirements file
pip install -r requirements.txt

# Uninstall a package
pip uninstall requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Useful Tip:&lt;/strong&gt; If you find yourself with a Python installation that somehow doesn’t have pip, you can usually install it using Python’s built-in &lt;code&gt;ensurepip&lt;/code&gt; module: &lt;code&gt;python -m ensurepip --upgrade&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtvgq4p77epbt3g8ip7n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtvgq4p77epbt3g8ip7n.jpeg" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Poetry: The All-in-One Project &amp;amp; Dependency Maestro&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What It Is&lt;/strong&gt; : &lt;strong&gt;Poetry&lt;/strong&gt; is a modern Python tool for comprehensive dependency management, packaging, and project organization. It moves beyond just installing packages, offering an integrated workflow for developing, managing, and distributing Python applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It’s a Game Changer&lt;/strong&gt; : Poetry brings robust structure, consistency, and reliability to Python projects. It standardizes project configuration and metadata through the &lt;code&gt;pyproject.toml&lt;/code&gt; file and excels at creating fully reproducible environments, which is critical for serious development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features &amp;amp; How It Addresses pip's Limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unified Project Definition with &lt;code&gt;pyproject.toml&lt;/code&gt;&lt;/strong&gt;: Poetry uses &lt;code&gt;pyproject.toml&lt;/code&gt; (a now-standard Python project file) as the single source of truth for your project's metadata, dependencies (both main and development), scripts, and even other tool configurations. This is more organized than relying solely on a &lt;code&gt;requirements.txt&lt;/code&gt; or a &lt;code&gt;setup.py&lt;/code&gt; file. Example &lt;code&gt;pyproject.toml&lt;/code&gt; snippet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[tool.poetry]
name = "my-awesome-project"
version = "0.1.0"
description = "A truly awesome project."
authors = ["Your Name &amp;lt;you@example.com&amp;gt;"]

[tool.poetry.dependencies]
python = "^3.8"  # Specifies compatible Python versions
requests = "^2.25.1" # Main dependency with version constraint

[tool.poetry.group.dev.dependencies]
pytest = "^7.0" # Development-only dependency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reliable, Reproducible Builds&lt;/strong&gt; : Addressing limitations of simpler tools, Poetry employs a sophisticated dependency resolver. This generates a detailed &lt;code&gt;poetry.lock&lt;/code&gt; file, which, while often large and machine-generated (and should be committed to your repository), precisely records all package versions and hashes. This guarantees identical, deterministic builds for everyone on the team.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic Virtual Environments&lt;/strong&gt; : Poetry automatically creates and manages a dedicated virtual environment per project, simplifying isolation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Potential Downsides&lt;/strong&gt; : While powerful, Poetry might be overkill for very simple scripts and projects. Its thorough dependency resolution can sometimes be slower than &lt;code&gt;pip&lt;/code&gt; for initial setups in complex projects, and it presents a slightly steeper learning curve due to its richer feature set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Usage (A Full Workflow)&lt;/strong&gt; :&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install Poetry (pip works; official docs also recommend their installer
# or pipx for isolated installs)
pip install poetry

# Start a new Poetry-managed project
poetry new my-awesome-project

# Navigate into your project
cd my-awesome-project

# Add a new dependency (updates pyproject.toml and poetry.lock)
poetry add requests

# Install all dependencies from poetry.lock (or pyproject.toml if no lock)
poetry install

# Run a script within the project's virtual environment
poetry run python your_script.py

# Update a specific package (and its dependencies if needed)
poetry update requests # Or 'poetry update' to update all

# To regenerate the lock file based on pyproject.toml (e.g., after manual edits)
poetry lock

# See your dependency tree
poetry show --tree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Good to Know&lt;/strong&gt; : Poetry also streamlines the process of building your project into distributable formats (e.g., wheels, sdists) and publishing them to PyPI or private repositories, making it a complete lifecycle tool. You can also manage different dependency groups (e.g., for ‘dev’ tools, ‘docs’ generation, or ‘testing’) beyond the main project dependencies.&lt;/p&gt;
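
&lt;p&gt;In outline, that build-and-publish workflow looks like this (a sketch; the &lt;code&gt;docs&lt;/code&gt; group here is a hypothetical example and must be defined in your &lt;code&gt;pyproject.toml&lt;/code&gt;, and publishing requires credentials or a configured token):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build a wheel and an sdist into the dist/ folder
poetry build

# Upload the built artifacts to PyPI (or a configured private repository)
poetry publish

# Install the main dependencies plus an optional group, e.g. 'docs' tooling
poetry install --with docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;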

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xfinkvtyn089dnnpik9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xfinkvtyn089dnnpik9.png" width="700" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;uv: The Blazing-Fast Python Packager ⚡&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What It Is&lt;/strong&gt; : &lt;strong&gt;uv&lt;/strong&gt; is an extremely fast Python package installer, resolver, and virtual environment manager, built in Rust by Astral (the creators of Ruff, which I hope to write more about in a future post). It’s designed to be a significantly faster alternative to &lt;strong&gt;pip&lt;/strong&gt; and &lt;strong&gt;virtualenv&lt;/strong&gt;, and can work with &lt;code&gt;pyproject.toml&lt;/code&gt; for full project management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It’s Gaining Traction&lt;/strong&gt; : &lt;strong&gt;uv&lt;/strong&gt;’s standout feature is its &lt;strong&gt;exceptional speed&lt;/strong&gt;. It can be 10-100x faster than &lt;strong&gt;pip&lt;/strong&gt; for common operations like installing packages or creating virtual environments, especially when leveraging its global cache. This performance dramatically reduces waiting times, particularly in CI/CD and for large projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21IPvd%21%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F7e711305-60a1-42d5-90ce-c0418d46b517_496x107.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21IPvd%21%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F7e711305-60a1-42d5-90ce-c0418d46b517_496x107.svg" width="496" height="107"&gt;&lt;/a&gt;Speed Comparison&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Capabilities &amp;amp; How It Compares&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speed Demon:&lt;/strong&gt; This is &lt;strong&gt;uv&lt;/strong&gt;’s hallmark, drastically cutting down time for package installation, resolution, and virtual environment creation across all its usage modes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Versatile as a pip Replacement:&lt;/strong&gt; &lt;strong&gt;uv&lt;/strong&gt; offers a &lt;code&gt;uv pip&lt;/code&gt; interface mirroring many common &lt;strong&gt;pip&lt;/strong&gt; commands (&lt;code&gt;install&lt;/code&gt;, &lt;code&gt;freeze&lt;/code&gt;, &lt;code&gt;uninstall&lt;/code&gt;), serving as a high-speed drop-in for &lt;code&gt;requirements.txt&lt;/code&gt;-based workflows. It also includes &lt;code&gt;uv venv&lt;/code&gt; for rapid virtual environment creation and &lt;code&gt;uv pip compile&lt;/code&gt; for fast &lt;code&gt;requirements.in&lt;/code&gt; to &lt;code&gt;requirements.txt&lt;/code&gt; compilation (similar to &lt;code&gt;pip-tools&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Powerful as a Project Manager:&lt;/strong&gt; &lt;strong&gt;uv&lt;/strong&gt; also excels at managing projects that use a &lt;code&gt;pyproject.toml&lt;/code&gt; file (with the standard &lt;code&gt;[project]&lt;/code&gt; table for dependencies), offering an organized, centralized approach. Example &lt;code&gt;pyproject.toml&lt;/code&gt; snippet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[project]
name = "my-awesome-project"
version = "0.1.0"
description = "A truly awesome project."
readme = "README.md"
dependencies = [
  "httpx",
  "ruff&amp;gt;=0.3.0"
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dedicated CLI Tool Management:&lt;/strong&gt; &lt;strong&gt;uv&lt;/strong&gt; includes commands like &lt;code&gt;uv tool install&lt;/code&gt; and &lt;code&gt;uv tool run&lt;/code&gt; to install and run Python CLI applications (such as &lt;strong&gt;ruff&lt;/strong&gt;) in isolated environments, similar to &lt;strong&gt;pipx&lt;/strong&gt;, offering a convenient way to manage your Python-based developer tools.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
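
&lt;p&gt;The tool-management commands mentioned above look like this in practice (a sketch; &lt;code&gt;ruff&lt;/code&gt; is just an example CLI tool here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install a CLI tool into its own isolated environment and put it on PATH
uv tool install ruff

# Or run a tool ephemerally without a permanent install
uv tool run ruff check .

# See which tools uv currently manages
uv tool list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;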

&lt;p&gt;&lt;strong&gt;Potential Downsides &amp;amp; Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maturity &amp;amp; Feature Gaps:&lt;/strong&gt; As a newer tool, &lt;strong&gt;uv&lt;/strong&gt; is still rapidly evolving. While highly capable, it might not yet mirror every niche feature of the more tenured &lt;strong&gt;pip&lt;/strong&gt; or offer Poetry's full, mature suite of integrated project lifecycle commands (e.g., complex build configurations or extensive publishing plugins). Most existing projects will also be using these established tools, which might influence adoption for ongoing work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Practical Adoption Hurdles:&lt;/strong&gt; Unlike &lt;strong&gt;pip&lt;/strong&gt; (usually bundled with Python), &lt;strong&gt;uv&lt;/strong&gt; requires a separate installation step. Furthermore, while its speed is a major draw, this specific benefit might be less critical for smaller projects or workflows with infrequent package operations, where existing tools may already be "good enough."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Basic Usage (Speeding Up Your Workflow):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install uv (e.g., via pip; check official uv docs for all install options)
pip install uv

# Create a blazing-fast virtual environment
uv venv .venv
# Activate: source .venv/bin/activate (Linux/macOS) or .venv\Scripts\activate (Windows)

# Install packages (like pip; can use pyproject.toml if present)
uv pip install requests

# Add a dependency to pyproject.toml &amp;amp; install it (in uv project mode)
uv add httpx

# Install packages from a requirements.txt file
uv pip install -r requirements.txt

# Generate requirements.txt from current environment (like pip freeze)
uv pip freeze &amp;gt; requirements.txt

# Compile a requirements.in file to requirements.txt
uv pip compile requirements.in -o requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Good to Know:&lt;/strong&gt; &lt;strong&gt;uv&lt;/strong&gt; is part of Astral's high-performance Python tooling suite. It's under very active development, with its feature set expanding quickly, and can even manage Python installations directly (&lt;code&gt;uv python install &amp;lt;version&amp;gt;&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x7otwco88gxluamrqjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8x7otwco88gxluamrqjj.png" width="700" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Conclusions: Choosing Your Package Management Ally&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Selecting the right Python package manager is key for an efficient, stable development experience. There’s no single “best” tool, as each shines in different scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;pip&lt;/strong&gt; is your foundational tool, perfect for simple scripts and its universal availability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Poetry&lt;/strong&gt; excels at robust, end-to-end project management, offering guaranteed reproducibility for libraries and applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;uv&lt;/strong&gt; delivers exceptional speed, whether used as a super-fast &lt;strong&gt;pip&lt;/strong&gt; replacement or an increasingly capable modern project manager.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While these three cover a wide range of needs, the Python ecosystem offers other excellent tools. For instance, &lt;strong&gt;PDM&lt;/strong&gt; provides a modern, comprehensive approach similar to &lt;strong&gt;Poetry&lt;/strong&gt;; &lt;strong&gt;Conda&lt;/strong&gt; is widely adopted in scientific computing for managing packages (Python and non-Python) and complex environments; and &lt;strong&gt;Hatch&lt;/strong&gt; offers powerful project management and build capabilities.&lt;/p&gt;

&lt;p&gt;Consider your project’s complexity, team collaboration needs, whether absolute reproducibility or sheer speed is paramount, &lt;strong&gt;and if your team (including yourself) leans towards adopting the newest, potentially fastest tools or prefers the stability and widespread familiarity of more established options.&lt;/strong&gt; The Python tooling landscape is always improving, so experiment to find what best suits your workflow — it’s a worthwhile investment for any developer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeo93ufsvczo9n5wolab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeo93ufsvczo9n5wolab.png" width="700" height="398"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>python</category>
      <category>tooling</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Stop Wasting LLM Tokens!</title>
      <dc:creator>Shmulik Cohen</dc:creator>
      <pubDate>Sun, 18 May 2025 19:21:19 +0000</pubDate>
      <link>https://dev.to/shmulc/stop-wasting-llm-tokens-5bj0</link>
      <guid>https://dev.to/shmulc/stop-wasting-llm-tokens-5bj0</guid>
      <description>&lt;p&gt;&lt;em&gt;The Shocking Truth About What Really Affects Your LLM&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquwgsx9mlkknskdc9d8o.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquwgsx9mlkknskdc9d8o.jpeg" width="800" height="639"&gt;&lt;/a&gt;Zero Waste Tokens&lt;/p&gt;

&lt;p&gt;In recent years, Large Language Models (LLMs) and vision-language models (VLMs) have taken the world by storm. With their meteoric rise, a new discipline emerged: &lt;strong&gt;Prompt Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As prompt engineering exploded, so did the myths around it. In this post, I break down what works, what doesn’t, and why being concise might just be the real prompt superpower.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Prompt Engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering is the art of crafting task-specific instructions — prompts — to elicit high-quality outputs from AI models, without modifying their internal architecture or retraining them. Instead of changing the model, we change the &lt;em&gt;input&lt;/em&gt; to unlock the model’s latent capabilities.&lt;/p&gt;

&lt;p&gt;These prompts can be natural language instructions, few-shot examples, or even learned vector embeddings. At their best, prompts act like keys that unlock the right behavior within a powerful pre-trained model.&lt;/p&gt;

&lt;p&gt;The results have been impressive. Prompt engineering has powered everything from more coherent summaries to stronger reasoning and even complex task automation. Naturally, this led to an explosion of prompt libraries, marketplaces, and tools promising “10x better results.”&lt;/p&gt;

&lt;h4&gt;
  
  
  The Prompt Engineering Hype
&lt;/h4&gt;

&lt;p&gt;Take a simple instruction like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Summarize this article in 3–4 paragraphs”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A prompt engineer might turn it into something like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq6hv92wtpp3q6l7kbna.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq6hv92wtpp3q6l7kbna.png" width="715" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Companies and researchers claim these elaborate prompts significantly outperform basic ones. Prompt marketplaces even emerged, selling optimized templates as premium assets.&lt;/p&gt;

&lt;p&gt;But here’s the thing: &lt;strong&gt;many of these verbose constructions don’t actually improve results as much as people think.&lt;/strong&gt; In many cases, they just waste tokens — and money.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Skeptical View
&lt;/h4&gt;

&lt;p&gt;I’ve always held a healthy skepticism toward “Prompt Engineering”. Not because it’s useless (it can be incredibly valuable, and I’ve seen that value firsthand many times), but because its impact is often &lt;strong&gt;overstated.&lt;/strong&gt; Many token-heavy components in so-called optimized prompts don’t meaningfully affect the output at all.&lt;/p&gt;

&lt;p&gt;In this post, I’ll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The Rise of Prompt Engineering, from academic labs to mainstream practice&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Great Simplification, which lets us craft excellent prompts with far fewer words&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prompt Debloat, a tool that lets you see which parts of your prompt matter most&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Future of Prompting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s separate prompt engineering fact from fiction and learn how to communicate with LLMs more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rise Of Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;When GPT-3 debuted in 2020, users made a fascinating discovery: the &lt;em&gt;way&lt;/em&gt; you asked the model a question mattered as much as the question itself. This observation birthed prompt engineering — a discipline focused on the art and science of communicating with AI.&lt;/p&gt;

&lt;h4&gt;
  
  
  From Academic Labs to Mainstream Practice
&lt;/h4&gt;

&lt;p&gt;The field evolved rapidly through key research breakthroughs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Few-Shot Learning (2020)&lt;/strong&gt;: Researchers found that showing a model examples of what you wanted — “Here’s how you solve problem X, now solve problem Y” — dramatically improved performance. This technique allowed models to adapt to new tasks with minimal guidance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chain-of-Thought (2022)&lt;/strong&gt;: The simple instruction “think step by step” revolutionized how models handled complex reasoning. Accuracy on math and logic problems jumped by 20–40%, simply by asking models to show their work.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These techniques helped bridge the gap between general-purpose language models and task-specific results — without any fine-tuning.&lt;/p&gt;
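&lt;p&gt;As an illustration of the few-shot idea, a prompt can be assembled by prepending worked examples to the new task. The &lt;code&gt;few_shot_prompt&lt;/code&gt; helper below is a minimal sketch of my own, not taken from any particular library, and the Q/A layout is just one common convention:&lt;/p&gt;

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked Q/A examples first, then the new task."""
    lines = []
    for question, answer in examples:
        lines.append("Q: " + question)
        lines.append("A: " + answer)
    # The trailing "A:" cues the model to continue the pattern
    # the examples establish.
    lines.append("Q: " + query)
    lines.append("A:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],
    "7 + 6",
)
```

&lt;p&gt;The point is structural: the model infers the task format from the examples, so no explicit instructions are needed.&lt;/p&gt;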

&lt;h4&gt;
  
  
  The Prompt-Heavy Era: Midjourney and Maximum Verbosity
&lt;/h4&gt;

&lt;p&gt;Nowhere was prompt maximalism more visible than in the early days of image generation. In Midjourney v1–4, generating a compelling image required long, detailed prompts:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a51szn0jjfmmgi6og45.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a51szn0jjfmmgi6og45.jpeg" width="800" height="800"&gt;&lt;/a&gt;Food photography of cute tiny people preparing a gigantic cheeseburger, ultra realistic, ultra detailed, UHD image — style raw — s 750 — v 5.1&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshrpgl9u13p1xc2fnr8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshrpgl9u13p1xc2fnr8w.png" width="800" height="533"&gt;&lt;/a&gt;A meticulously engineered external sensor pod for an isolated outpost on Mars, designed to capture environmental data in real time, with a futuristic, aerodynamic shape, integrated soft lighting, and rugged construction that withstands the extreme conditions of a Martian night, perfectly aligned with the advanced space exploration theme. in photorealistic style, Isometric 3D style, isolated on white background with negative space&lt;/p&gt;

&lt;p&gt;A whole subculture emerged on Discord around “prompt recipes.” People spent hours crafting elaborate incantations to control everything from lighting to lens distortion. Some prompts grew to hundreds of tokens. Prompt marketplaces like &lt;strong&gt;PromptBase&lt;/strong&gt; started selling “optimized prompts” that sometimes cost more to run than the image itself.&lt;/p&gt;

&lt;p&gt;The philosophy was clear: &lt;em&gt;more is better.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But that’s no longer the case.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Great Simplification
&lt;/h3&gt;

&lt;p&gt;As models advanced in 2023 and 2024, the need for elaborate prompts sharply declined. Why? Because the models got &lt;strong&gt;smarter&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Models Now Understand More with Less
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto-Reasoning&lt;/strong&gt;: GPT-4, Claude, and others began reasoning step-by-step without needing the phrase “let’s think step by step.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Intent Inference&lt;/strong&gt;: These models now infer your goal even from vague or poorly phrased requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-Prompting Architecture&lt;/strong&gt;: With systems like GPT-4o and Claude 3.5, the models effectively write internal prompts for themselves as they solve problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, this means your original prompt matters &lt;em&gt;less&lt;/em&gt; than it used to — especially for reasoning tasks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Midjourney v3 vs. GPT-4o: One Line Is Enough
&lt;/h4&gt;

&lt;p&gt;Want to generate a stunning image?&lt;/p&gt;

&lt;p&gt;In Midjourney v3, it might have taken you 80 carefully chosen tokens. In &lt;strong&gt;GPT-4o&lt;/strong&gt; , you can just type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“A surreal painting of a cat floating in space.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…and get something genuinely beautiful. No need to specify “high detail,” “octane render,” or “golden hour lighting” — the model fills in the gaps intelligently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc8nvktpl4rwqtl9d89p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc8nvktpl4rwqtl9d89p.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This reflects a broader truth: &lt;strong&gt;prompt engineering today is less about verbosity and more about clarity.&lt;/strong&gt; The goal isn’t to cast a magic spell — it’s to &lt;em&gt;specify what you want&lt;/em&gt; and &lt;em&gt;how you want it formatted&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reasoning Models: The Ultimate Simplification
&lt;/h4&gt;

&lt;p&gt;The latest frontier is models specifically designed for reasoning, such as OpenAI’s o1 or DeepSeek’s R1. These systems represent a fundamental shift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They don’t need explicit scaffolding to reason logically&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They &lt;strong&gt;internally&lt;/strong&gt; generate their own task breakdowns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The initial prompt has a weaker effect on the final result&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These models are designed to “figure things out” rather than follow exhaustive instructions. Prompt engineering for them is no longer about micromanaging behavior — it’s about &lt;em&gt;efficiently triggering&lt;/em&gt; the right internal processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt Debloat
&lt;/h3&gt;

&lt;p&gt;Recently I joined LinkedIn, and amidst the usual stream of motivational posts, job offers, and thoughts about the future of AI, I found this post by &lt;a href="https://www.linkedin.com/in/iddogino?miniProfileUrn=urn%3Ali%3Afsd_profile%3AACoAAAfVMegBgfDLAeNK-zqC-WSis-nNw0X4hCE" rel="noopener noreferrer"&gt;Iddo Gino&lt;/a&gt; (the founder of RapidAPI and a talented software engineer featured in Forbes 30 Under 30):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/posts/iddogino_aioptimization-promptengineering-activity-7320842006878908416-svq_?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAAAyeke8BITafzZWtP6WMCOl4gpfmzLZMDS4" rel="noopener noreferrer"&gt;Built a tool to analyze prompt bloat and quality | Iddo Gino on LinkedIn&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;“Been exploring the concept of ‘prompt bloat’ this week, and ended up building a tool to analyze the importance of words…”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the post, Gino introduced a tool he built to analyze which parts of a prompt actually influence the model’s output. The inspiration? A now-famous comment from Sam Altman noting that users adding “please” and “thank you” to prompts was costing OpenAI millions in unnecessary tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28mnndwb6c0c5ny13dic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28mnndwb6c0c5ny13dic.png" width="799" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gino’s tool is both clever and practical. It lets you visually inspect which parts of a prompt are contributing to the result — and which parts are just taking up space (and money). You can use it as an educational tool to understand prompt mechanics or as a utility to strip down bloated prompts to their most effective core.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of tooling the community needs more of: not magic recipes, but clarity tools that make prompting more intentional and efficient.&lt;/p&gt;

&lt;p&gt;You can try it out for yourself at this link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://promptdebloat.datawizz.ai/" rel="noopener noreferrer"&gt;https://promptdebloat.datawizz.ai/&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  A Few Examples
&lt;/h4&gt;

&lt;p&gt;To put the tool to the test, I ran some prompts I found online, and the results were stunning.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://docs.anthropic.com/en/resources/prompt-library/python-bug-buster" rel="noopener noreferrer"&gt;Python bug buster&lt;/a&gt;:&lt;/strong&gt; Anthropic maintains a Prompt Library of optimized prompts for a breadth of business and personal tasks, hardly a place you’d expect to find useless tokens. I tried the ‘Python bug buster’ prompt, and here are the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdim9ieuzq0t7nl4zo2z5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdim9ieuzq0t7nl4zo2z5.png" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;140 IQ Senior:&lt;/strong&gt; Recently, someone in a vibe coding group that I’m in asked about a general system prompt for an agent, and someone else responded with this draft:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You are a senior independent {role} with a 140 IQ and an unparalleled attention to detail. Your mission is to act as a quality assurance inspector for the final {task you want to criticize}.&lt;br&gt;&lt;br&gt;
 Your review must ensure that every aspect of the {task} is spot on, all assignment requirements are fully met, and answers are 100% accurate.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I checked this prompt too and got this result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuveffyscl9agjsl7oms3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuveffyscl9agjsl7oms3.png" width="768" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apparently the IQ reference had minimal impact, showing that verbosity doesn’t equate to utility.&lt;/p&gt;

&lt;p&gt;You can try any prompt you have (up to 500 words). But how does it work? And can you trust the results, or is this just random green and red coloring?&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How Does It Work?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Prompt Debloat&lt;/strong&gt; uses a method called &lt;strong&gt;token ablation&lt;/strong&gt; (also known as &lt;strong&gt;input perturbation&lt;/strong&gt;) to figure out which words in your prompt actually matter. The basic idea is simple: it removes words from your prompt one by one and sees how much the model’s response changes.&lt;/p&gt;

&lt;p&gt;If removing a word makes little or no difference to the output, that word is probably not pulling its weight — and might just be wasting tokens. On the other hand, if removing a word &lt;em&gt;does&lt;/em&gt; change the response significantly, it’s likely doing important work.&lt;/p&gt;

&lt;p&gt;This process helps you trim down your prompt by spotting the “bloat” — unnecessary words that can safely be cut to save on cost and improve clarity.&lt;/p&gt;
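&lt;p&gt;To make the idea concrete, here is a minimal leave-one-out sketch of my own. The &lt;code&gt;query_model&lt;/code&gt; callback and the &lt;code&gt;toy_model&lt;/code&gt; stand-in are hypothetical, and real tools typically compare token probabilities rather than raw string similarity:&lt;/p&gt;

```python
from difflib import SequenceMatcher

def ablate(prompt, query_model):
    """Leave-one-out ablation: drop each word, re-query, and score how much
    the output shifts (0 = no change, 1 = completely different)."""
    baseline = query_model(prompt)
    words = prompt.split()
    scores = {}
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        changed = query_model(reduced)
        scores[word] = round(1 - SequenceMatcher(None, baseline, changed).ratio(), 3)
    return scores

# Hypothetical stand-in model that only reacts to the word "summarize".
def toy_model(p):
    return "SUMMARY" if "summarize" in p else "ECHO: " + p

scores = ablate("Please kindly summarize this text", toy_model)
# "Please" and "kindly" score 0.0 here; "summarize" dominates.
```

&lt;p&gt;Each query re-runs the model, so the cost grows linearly with prompt length; that is one reason tools like this test one token at a time rather than every combination.&lt;/p&gt;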

&lt;h4&gt;
  
  
  Limitations of Token Ablation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context Matters&lt;/strong&gt;: A word that seems unimportant in one prompt might be critical in another. Results aren’t always universal.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Too Many Possibilities&lt;/strong&gt;: Testing every combination of tokens quickly becomes overwhelming as the prompt gets longer, so most tools test one token at a time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Subtle Changes&lt;/strong&gt;: Sometimes a word might influence the tone or nuance in ways that aren’t easy to measure just by comparing probabilities or visible output.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So now that you know how it works and what its limitations are, I invite you to run your most complex prompts through &lt;a href="https://promptdebloat.datawizz.ai/" rel="noopener noreferrer"&gt;Prompt Debloat&lt;/a&gt; and share the results in the comments below.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future of Prompting: Less Magic, More Simplicity
&lt;/h3&gt;

&lt;p&gt;Not long ago, I attended a meetup where the organizers shared their experience building an AI agent to assist with software engineering tasks. As expected, they kicked things off by talking about prompt engineering — how they refined their inputs, added detailed instructions, and experimented with all sorts of formatting tricks to boost performance.&lt;/p&gt;

&lt;p&gt;It sounded like classic prompt wizardry: custom templates, carefully worded system messages, and all the usual prompt engineering lore.&lt;/p&gt;

&lt;p&gt;But then they said something that caught everyone off guard.&lt;/p&gt;

&lt;p&gt;In the end, most of the improvements didn’t come from some secret prompt formula. What actually made the biggest difference? Just using &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;’s built-in prompt suggestions — simple, well-structured, and focused on clear output formats. That was it. No prompt maximalism, no elaborate frameworks. The tooling alone lifted their agent far beyond the baseline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8qkd716wg4uhfyixbr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8qkd716wg4uhfyixbr0.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That moment really stayed with me.&lt;/p&gt;

&lt;p&gt;It confirmed something I’ve been noticing for a while: the future of prompting isn’t about conjuring magic words. It’s about thoughtful design — clear intent, less noise, and trusting the model to do its job without micromanagement.&lt;/p&gt;

&lt;p&gt;Prompt engineering isn’t dead. But the “more is more” era is fading. The real power now lies in restraint — knowing what to say, what to leave out, and how to shape the interaction like a good interface, not a spellbook.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best prompts aren’t the longest. They’re the clearest.&lt;br&gt;&lt;br&gt;
Let’s stop wasting tokens — and start communicating better with our models.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shmulc.substack.com" rel="noopener noreferrer"&gt;AI Superhero&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
