<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pickuma</title>
    <description>The latest articles on DEV Community by pickuma (@pickuma).</description>
    <link>https://dev.to/pickuma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3926669%2Fb3923c39-364a-4953-b8f7-aa962d6419e0.jpg</url>
      <title>DEV Community: pickuma</title>
      <link>https://dev.to/pickuma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pickuma"/>
    <language>en</language>
    <item>
      <title>Woodpecker vs Lemlist vs Instantly: Cold Email Tools That Still Land in 2026</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:53:30 +0000</pubDate>
      <link>https://dev.to/pickuma/woodpecker-vs-lemlist-vs-instantly-cold-email-tools-that-still-land-in-2026-4236</link>
      <guid>https://dev.to/pickuma/woodpecker-vs-lemlist-vs-instantly-cold-email-tools-that-still-land-in-2026-4236</guid>
      <description>&lt;h2&gt;
  
  
  The Deliverability Reset
&lt;/h2&gt;

&lt;p&gt;In early 2024, Google and Yahoo rolled out new sender requirements that quietly killed half the cold email playbook everyone was running. SPF, DKIM, and DMARC became table stakes. Spam complaint rates above 0.3% started getting entire domains blocked. One-click unsubscribe became mandatory for senders moving over 5,000 messages a day.&lt;/p&gt;

&lt;p&gt;The tools that survived this transition aren't the ones with the prettiest editors — they're the ones that take inbox placement seriously. Warm-up isn't a feature anymore, it's a requirement. Inbox rotation matters more than personalization tokens.&lt;/p&gt;

&lt;p&gt;We ran a comparison across three platforms still standing: &lt;strong&gt;Woodpecker&lt;/strong&gt;, &lt;strong&gt;Lemlist&lt;/strong&gt;, and &lt;strong&gt;Instantly&lt;/strong&gt;. All three claim "cold email that lands." Here's how they actually compare for a small B2B SaaS team doing 500–5,000 outbound sends a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headline Comparison
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Where Woodpecker Pulls Ahead
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Deliverability infrastructure that compounds
&lt;/h3&gt;

&lt;p&gt;Woodpecker's warm-up runs on Mailivery — a dedicated deliverability network that's been around since before the 2024 reset. The warm-up isn't a checkbox feature; it's actively sending and replying to real conversations across the network, building sender reputation over weeks rather than days. Combined with their domain auditing tools (SPF/DKIM/DMARC pre-flight checks), it's the closest thing we've seen to "deliverability-as-a-service" on the SMB tier.&lt;/p&gt;

&lt;p&gt;Inbox rotation is the second half of this. When you're sending more than ~30 messages a day per inbox, Google flags pattern velocity. Woodpecker distributes sends across multiple connected mailboxes automatically — so 5 mailboxes can handle 150 sends/day without any single account tripping reputation thresholds.&lt;/p&gt;
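
&lt;p&gt;To make the rotation idea concrete, here is a toy round-robin distributor. This is our illustration of the concept, not Woodpecker's actual algorithm:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import cycle

def distribute(leads, mailboxes, per_inbox_cap=30):
    """Round-robin leads across mailboxes, respecting a daily per-inbox cap."""
    load = {box: [] for box in mailboxes}
    rotation = cycle(mailboxes)
    for lead in leads:
        for _ in range(len(mailboxes)):
            box = next(rotation)
            if len(load[box]) != per_inbox_cap:  # this inbox still has room
                load[box].append(lead)
                break
        else:
            raise RuntimeError("daily capacity exhausted; defer the rest")
    return load

# 5 mailboxes x 30/day handles 150 sends with no single account spiking
plan = distribute([f"lead{i}@example.com" for i in range(150)],
                  [f"sender{i}@outreach.example" for i in range(1, 6)])
assert all(len(sends) == 30 for sends in plan.values())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;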

&lt;h3&gt;
  
  
  2. The agency panel is genuinely useful
&lt;/h3&gt;

&lt;p&gt;If you're a founder who occasionally helps other founders with outbound, or a small agency taking on 3–10 clients, Woodpecker's agency panel lets you manage all of them from one login. Lemlist has something similar; Instantly's version is rougher. The differentiator is the per-client billing pass-through — agencies can mark up the platform fee to clients cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A real developer surface
&lt;/h3&gt;

&lt;p&gt;This is the surprise. Woodpecker ships a REST API, webhooks, &lt;strong&gt;and an MCP server&lt;/strong&gt; — meaning you can wire it into your AI agent stack directly. For SaaS founders who already have a working agentic prospect-research pipeline, the MCP integration is a quiet differentiator. Lemlist and Instantly both have APIs but lack the same developer-facing surface area.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most cold email tools assume a human-in-the-loop workflow: human picks leads, drafts sequences, reviews replies. If you've built (or are building) an AI agent that researches prospects, drafts personalized openers, and routes replies, an MCP server means your agent can talk to Woodpecker directly without an OAuth dance or an API wrapper. This is a quiet edge case today; in 12 months it'll be the default expectation.&lt;/p&gt;
&lt;/blockquote&gt;
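
&lt;p&gt;For the curious, here is roughly what that wiring could look like with the official MCP Python SDK. The launch command, env var, and tool name below are hypothetical placeholders; Woodpecker's docs have the real values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # hypothetical command and env var -- substitute the documented ones
    server = StdioServerParameters(command="woodpecker-mcp",
                                   env={"WOODPECKER_API_KEY": "..."})
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # discover the surface
            # hypothetical tool name and arguments:
            await session.call_tool("add_prospect", {"email": "jane@example.com"})

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;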

&lt;h2&gt;
  
  
  Where Lemlist Still Wins
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-channel from day one.&lt;/strong&gt; Lemlist's native LinkedIn + email + voice-note sequences are the most polished of the three. If your prospects respond to LinkedIn before email (founders, executive titles), Lemlist's threading is worth the price premium.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalization at scale.&lt;/strong&gt; Image personalization, video personalization, dynamic landing pages. Most of this is gimmicky, but for high-value enterprise outreach where reply rates of 2% matter, it's measurable.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where Instantly Still Wins
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unlimited warm-up.&lt;/strong&gt; Instantly bundles unlimited warm-up across unlimited inboxes on most plans. If you're running an agency model with 20+ client mailboxes, the per-inbox warm-up cost on Woodpecker adds up. Instantly's flat pricing is cleaner at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume senders.&lt;/strong&gt; Instantly is built for teams sending 10,000+ messages/day. Their platform handles velocity better than the others. If you're at SMB scale, this doesn't matter; if you're at sales agency scale, it does.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;For a 2-person SaaS doing ~1,500 cold sends/month across 3 connected mailboxes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Woodpecker Cold Email (Starter)&lt;/strong&gt;: ~$39/mo for 1 user, 3 slots. Warm-up included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lemlist Standard&lt;/strong&gt;: ~$59/mo for similar setup. Multi-channel included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instantly Hypergrowth&lt;/strong&gt;: ~$97/mo, unlimited inboxes + unlimited warm-up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing inverts depending on how many inboxes you connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1–3 inboxes&lt;/strong&gt;: Woodpecker is cheapest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5+ inboxes&lt;/strong&gt;: Instantly's unlimited model wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any inbox count, with LinkedIn in the mix&lt;/strong&gt;: Lemlist's multi-channel pays for itself if you actually use LinkedIn.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;All three platforms strongly recommend (or sell) secondary domains for cold sending — burning your main domain on cold reputation is a one-way street. Budget $10–20/month per secondary domain (registrar + Google Workspace seat or alternative). For 5 inboxes, that's $50–100/month on top of the platform fee. Most reviews skip this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to Decide in 5 Minutes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Are you sending more than 5,000 cold emails per month?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Instantly's unlimited model probably wins on cost.&lt;/li&gt;
&lt;li&gt;No → continue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Is LinkedIn outreach a core part of your motion?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Lemlist. The native multi-channel threading is the best in class.&lt;/li&gt;
&lt;li&gt;No → continue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Do you have (or want to build) an AI agent that automates prospect research?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Woodpecker. The MCP server is a real edge.&lt;/li&gt;
&lt;li&gt;No → Woodpecker on price, but Lemlist is fine if you'll grow into LinkedIn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most SMB SaaS founders at the 500–3,000 sends/month range with no LinkedIn play, Woodpecker is the default answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We'd Test in the Trial
&lt;/h2&gt;

&lt;p&gt;Woodpecker offers a 7-day free trial. We'd push hard on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The warm-up itself.&lt;/strong&gt; Connect a brand-new domain, start the warm-up, and use a tool like &lt;a href="https://glockapps.com" rel="noopener noreferrer"&gt;GlockApps&lt;/a&gt; or &lt;a href="https://www.mailreach.co" rel="noopener noreferrer"&gt;MailReach&lt;/a&gt; to test inbox placement after 7 days. The improvement curve is the real signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The inbox rotation logic.&lt;/strong&gt; Connect 3 mailboxes, set a daily send cap of 50/inbox, and verify the platform actually distributes evenly without manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The MCP server.&lt;/strong&gt; If you have an existing agent stack, wire it up. Test creating a campaign, adding leads, and pulling reply data through the MCP interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The reply detection.&lt;/strong&gt; Cold email tools live or die by how well they detect replies vs auto-responders vs out-of-office. Send a handful of test messages from various email providers and trigger each response type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain audit reports.&lt;/strong&gt; Run their domain audit on your existing setup. The findings should match what tools like &lt;a href="https://mxtoolbox.com" rel="noopener noreferrer"&gt;MXToolbox&lt;/a&gt; report, or the scripted spot-check sketched after this list.&lt;/li&gt;
&lt;/ul&gt;
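
&lt;p&gt;If you want to verify the DNS side yourself before trusting any vendor's report, a rough pre-flight with &lt;code&gt;dnspython&lt;/code&gt; covers SPF and DMARC. DKIM is omitted because it needs your selector, which varies by provider:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import dns.resolver

def txt_records(name):
    """Fetch TXT records for a name; empty list if none exist."""
    try:
        answers = dns.resolver.resolve(name, "TXT")
        return [rdata.to_text().strip('"') for rdata in answers]
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []

domain = "example.com"  # your sending domain
spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]
print("SPF:", spf or "MISSING")
print("DMARC:", dmarc or "MISSING")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;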




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/woodpecker-vs-lemlist-instantly-cold-email-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>saas</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OpenAI Codex vs Claude Code: Hands-On Python Benchmark for Devs</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:52:14 +0000</pubDate>
      <link>https://dev.to/pickuma/openai-codex-vs-claude-code-hands-on-python-benchmark-for-devs-5bb1</link>
      <guid>https://dev.to/pickuma/openai-codex-vs-claude-code-hands-on-python-benchmark-for-devs-5bb1</guid>
      <description>&lt;p&gt;OpenAI relaunched Codex this year as a full agentic CLI that lives in your terminal and talks to GPT-5 class models. Claude Code did the same thing for Anthropic, six months earlier. Both want to be the assistant you actually merge code from. We pointed both at the same Python project and tracked what each one shipped.&lt;/p&gt;

&lt;p&gt;The codebase under test: a mid-sized Flask + SQLAlchemy service with a real pytest suite and a handful of slow, gnarly modules begging to be refactored. We ran identical prompts through both tools, on the same hardware, against the same git SHA, and rewound the worktree between runs so neither tool saw the other's edits.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we structured the test
&lt;/h2&gt;

&lt;p&gt;We ran three kinds of tasks against each assistant, three trials per task per tool. Not enough trials for statistical certainty, but enough to catch behavior patterns that held across attempts.&lt;/p&gt;

&lt;p&gt;Task A: refactor a roughly 400-line module that mixed request handling, DB access, and template rendering into a service layer plus thin route handlers. Success criteria: tests still green, no regressions in a smoke flow we recorded with &lt;code&gt;httpx&lt;/code&gt;, and the resulting file structure passing &lt;code&gt;ruff&lt;/code&gt; and &lt;code&gt;mypy --strict&lt;/code&gt; cleanly.&lt;/p&gt;

&lt;p&gt;Task B: fix three known bugs. One off-by-one in a pagination helper. One race condition in a background worker that only surfaced under concurrent load. One Unicode normalization bug in a search endpoint. We handed each assistant only the failing pytest output and the file path, with no hints about the fix.&lt;/p&gt;
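
&lt;p&gt;To illustrate the first bug class (our reconstruction, not the project's actual code): the classic pagination off-by-one is floor division where ceiling division was needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def page_count_buggy(total_items, page_size):
    return total_items // page_size  # drops the final partial page

def page_count_fixed(total_items, page_size):
    return (total_items + page_size - 1) // page_size  # ceiling division

assert page_count_buggy(101, 20) == 5  # wrong: item 101 is unreachable
assert page_count_fixed(101, 20) == 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;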

&lt;p&gt;Task C: an agentic workflow. "Add OpenTelemetry tracing across the request lifecycle, including DB spans, then write tests proving spans are emitted." Open-ended, multi-file, requires reading the codebase before doing anything.&lt;/p&gt;

&lt;p&gt;We tracked wall-clock time, total tokens consumed, whether the diff merged cleanly, and whether the test suite stayed green at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where each tool diverged
&lt;/h2&gt;

&lt;p&gt;Claude Code finished Task A in roughly four minutes per trial. The service-layer extraction was clean: it picked up the project's existing repository pattern from a sibling module without prompting and matched the naming convention. Two of three trials passed the smoke test on first run. The third introduced a circular import that Claude caught on its own follow-up turn and fixed without us asking.&lt;/p&gt;

&lt;p&gt;Codex took longer on Task A, closer to seven minutes per trial, but produced a more aggressive refactor. It split logic into more files, added type hints throughout, and rewrote one helper function that wasn't part of the brief. The diff was larger, the tests still passed, but the review surface went up. One trial dropped a transactional boundary we wanted preserved; the test suite caught it, Codex fixed it on the next iteration.&lt;/p&gt;

&lt;p&gt;Task B was the more revealing split. Claude found the off-by-one in under two minutes with a one-line fix and an added test. Codex took longer on the same bug, wrote a longer explanation, and added two tests where one would have done — the second was redundant with the first.&lt;/p&gt;

&lt;p&gt;On the race condition, Claude wrote a regression test using &lt;code&gt;threading.Barrier&lt;/code&gt; to reliably reproduce the bug, then patched it with a context manager around the critical section. Codex initially proposed a &lt;code&gt;time.sleep&lt;/code&gt;-based test that we rejected. On retry it produced a cleaner fix using an asyncio lock. Both eventually solved it. Claude shipped a clean version the first time.&lt;/p&gt;
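
&lt;p&gt;The &lt;code&gt;threading.Barrier&lt;/code&gt; approach is worth stealing whichever tool you use. A sketch of the shape of such a regression test, with a toy shared counter standing in for the worker's state:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import threading

class Counter:
    """Toy stand-in for the worker's shared state."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value      # unguarded read-modify-write:
        self.value = current + 1  # the race under test

def test_concurrent_increments():
    counter = Counter()
    n_threads, iterations = 8, 10_000
    barrier = threading.Barrier(n_threads)  # release every thread at once

    def worker():
        barrier.wait()  # maximize overlap so the race reproduces reliably
        for _ in range(iterations):
            counter.unsafe_increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # With the race present this total usually comes up short; guarding the
    # increment with counter.lock makes it pass deterministically.
    assert counter.value == n_threads * iterations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;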

&lt;p&gt;The Unicode bug was effectively a tie. Both correctly identified that &lt;code&gt;unicodedata.normalize("NFKC", ...)&lt;/code&gt; was the right answer and produced near-identical diffs.&lt;/p&gt;
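
&lt;p&gt;For reference, that normalization folds compatibility characters such as ligatures and fullwidth forms into their plain equivalents before matching:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import unicodedata

# U+FB01 is the "fi" ligature; fullwidth digits come from CJK input methods
assert unicodedata.normalize("NFKC", "ﬁle") == "file"
assert unicodedata.normalize("NFKC", "１２３") == "123"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;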

&lt;blockquote&gt;
&lt;p&gt;Neither tool reliably caught subtle business-logic intent that wasn't expressed in tests or comments. On a fourth task we ran informally, both happily "fixed" a behavior that was actually load-bearing for an undocumented downstream consumer. Treat agentic assistants as fast pair programmers, not autonomous engineers, until your codebase has the test coverage to keep them honest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Agentic workflows, pricing, and the speed-vs-cost tradeoff
&lt;/h2&gt;

&lt;p&gt;Task C was where the agentic loops stretched their legs. Claude averaged a little over ten minutes wall-clock per run and burned through hundreds of thousands of tokens. Codex was meaningfully slower and more token-hungry on the same task — call it about a third more on both axes. Both produced working tracing setups with DB spans and tests that checked emitted span names against a recording exporter.&lt;/p&gt;

&lt;p&gt;Codex's solution was more thorough. It wired up &lt;code&gt;OTLPSpanExporter&lt;/code&gt; with environment-variable config, added a &lt;code&gt;pyproject.toml&lt;/code&gt; extra so the dependency was opt-in, and dropped a fresh &lt;code&gt;docs/observability.md&lt;/code&gt; into the repo. Claude's solution was tighter: it hooked into the existing Flask middleware, added one fixture, and stopped. If you want a starting point you will extend yourself, Claude got there faster. If you want a near-complete drop-in, Codex did more of the work — at higher cost.&lt;/p&gt;
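
&lt;p&gt;A minimal sketch of the pattern both tools converged on, using the standard OpenTelemetry Flask instrumentation and an in-memory exporter as the recording backend for tests. DB spans come from the analogous SQLAlchemy instrumentor, omitted here for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor

exporter = InMemorySpanExporter()  # recording exporter for tests
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)  # spans across the request lifecycle

@app.route("/health")
def health():
    return "ok"

def test_request_emits_span():
    app.test_client().get("/health")
    names = [span.name for span in exporter.get_finished_spans()]
    # span naming varies by instrumentation version, so match loosely
    assert any("GET" in name or "/health" in name for name in names)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;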

&lt;p&gt;Pricing during our test window: Claude Code running on Sonnet was the cheaper option per task by a clear margin. Codex on GPT-5 was higher both per call and in tokens consumed. Both Anthropic and OpenAI shifted prices during our window. Check current rates before extrapolating — order-of-magnitude conclusions are stable, but the gap may narrow or widen between when we tested and when you read this.&lt;/p&gt;

&lt;p&gt;The speed difference was consistent across trials: Claude was faster on most tasks we threw at it, sometimes by a wide margin on small fixes. Codex was more methodical, which costs you wall-clock time and tokens but occasionally catches things Claude skips.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick which
&lt;/h2&gt;

&lt;p&gt;Pick Claude Code when you're doing focused work — a single bug, a contained refactor, a feature that touches three files. The speed advantage compounds when you're iterating, and the cost difference adds up across a workday.&lt;/p&gt;

&lt;p&gt;Pick Codex when you want broader autonomy and don't mind a longer wall-clock loop. Big migrations, codebase-wide instrumentation, tasks where you would rather review a thorough proposal than steer one. Codex is also the better pick if you already pay for a ChatGPT Team or Enterprise seat that bundles Codex usage.&lt;/p&gt;

&lt;p&gt;Both tools changed our review workflow more than they changed our writing workflow. We spent less time typing and more time reading diffs. That is the benchmark that matters more than tokens or seconds: not who produces code faster, but who produces code you trust enough to merge without re-reading every line.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/openai-codex-vs-claude-code-python-benchmark/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mythos AI Found a Real Curl Vulnerability — What It Signals for Security Audits</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:45:53 +0000</pubDate>
      <link>https://dev.to/pickuma/mythos-ai-found-a-real-curl-vulnerability-what-it-signals-for-security-audits-2p6k</link>
      <guid>https://dev.to/pickuma/mythos-ai-found-a-real-curl-vulnerability-what-it-signals-for-security-audits-2p6k</guid>
      <description>&lt;p&gt;Curl has been the workhorse of HTTP for nearly three decades. It ships in roughly every Linux distribution, every macOS install, most embedded devices, and the dependency graph of half the internet. The codebase has survived years of human review, static analyzers, fuzzers, and bounty hunters. So when Daniel Stenberg, curl's longtime maintainer, posted on May 11, 2026 that an AI tool called Mythos surfaced a real vulnerability in the project, it landed differently than the usual "AI found a bug" headline.&lt;/p&gt;

&lt;p&gt;This wasn't a synthetic benchmark on a toy program. It was production code that thousands of security researchers had already crawled over.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mythos found and why it matters
&lt;/h2&gt;

&lt;p&gt;The detail that makes Stenberg's post worth reading is the &lt;em&gt;type&lt;/em&gt; of finding. Mythos didn't flag a textbook buffer overflow or a one-liner where someone forgot to check a return value. It identified a defect that required reasoning across the surrounding control flow — the kind of bug that historically needed a human to sit with the code, build a mental model, and notice the subtle interaction.&lt;/p&gt;

&lt;p&gt;For years, AI-assisted security tools have been stuck in two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern matchers&lt;/strong&gt; that essentially rebrand grep. They catch low-hanging issues, generate noise, and miss anything that requires understanding &lt;em&gt;intent&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM wrappers&lt;/strong&gt; that summarize diffs in plain English but can't tell you whether the change is safe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mythos is being positioned as something different: a system that reasons about code the way a senior reviewer does, traces data flow across function boundaries, and produces findings specific enough to triage. The curl result is the first public proof point that this category can produce a non-trivial finding in a heavily audited target.&lt;/p&gt;

&lt;p&gt;We're being careful with the framing here. One vulnerability, in one project, surfaced by one tool, does not prove "AI has solved security review." But the bar for a credible result in this space has been low for a long time, and Mythos cleared it on a target where the noise floor is very high.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The curl codebase already runs through OSS-Fuzz, Coverity, Clang's static analyzer, and dozens of human eyes per release cycle. Finding a real bug in this environment is meaningfully different from finding bugs in random GitHub repositories that have never been audited.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What this changes for your team
&lt;/h2&gt;

&lt;p&gt;If you ship code, the practical question is whether AI security review now belongs in your pipeline alongside static analysis and dependency scanning. The answer depends on what you're doing today.&lt;/p&gt;

&lt;p&gt;If your current security workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dependabot or Renovate&lt;/strong&gt; for dependency CVEs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A SAST tool&lt;/strong&gt; (Semgrep, CodeQL, Snyk Code) running in CI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Occasional pentesting&lt;/strong&gt; before major releases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then AI-assisted review is best treated as a fourth layer, not a replacement. Static analyzers catch a different class of bugs efficiently and cheaply. LLM-based reviewers catch a different class — the ones requiring narrative reasoning about what the code is supposed to do — but at higher latency and higher cost per scan.&lt;/p&gt;

&lt;p&gt;The migration pattern teams are converging on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run LLM-based review on &lt;strong&gt;changed code only&lt;/strong&gt; (diff-scoped), not the entire repository (a minimal CI sketch follows this list)&lt;/li&gt;
&lt;li&gt;Trigger on pull requests that touch security-sensitive paths: auth, crypto, parsers, anywhere external input crosses a trust boundary&lt;/li&gt;
&lt;li&gt;Treat findings as &lt;strong&gt;hypotheses for a human to confirm&lt;/strong&gt;, not as gating signals&lt;/li&gt;
&lt;li&gt;Track false positive rate per tool over a quarter before adjusting trust&lt;/li&gt;
&lt;/ul&gt;
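
&lt;p&gt;A minimal CI sketch of that diff-scoped trigger, assuming the Anthropic Python SDK. The sensitive-path list, model id, and prompt are illustrative, not any particular vendor's product:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import anthropic

SENSITIVE_PATHS = ("src/auth/", "src/crypto/", "src/parsers/")  # your trust boundaries

diff = subprocess.run(
    ["git", "diff", "origin/main...HEAD", "--", *SENSITIVE_PATHS],
    capture_output=True, text=True, check=True,
).stdout

if diff.strip():
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    message = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: substitute a current model id
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": "Review this diff for security defects. Report each "
                       "finding as a hypothesis with a file:line reference "
                       "and the trust boundary crossed.\n\n" + diff,
        }],
    )
    print(message.content[0].text)  # in real CI, post this as a PR comment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;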

&lt;p&gt;Cost discipline matters more than people admit. Running a frontier-model code review on every PR in a busy monorepo can run into thousands of dollars per month before you've shipped any real coverage. Scoping prevents the bill from outpacing the value.&lt;/p&gt;

&lt;h2&gt;
  
  
  The supply-chain angle
&lt;/h2&gt;

&lt;p&gt;The deeper story isn't curl's specific vulnerability — it's the asymmetry that Mythos's success implies. Attackers and defenders both now have AI tools that can reason about code. Whichever side scales the workflow first gets the structural advantage.&lt;/p&gt;

&lt;p&gt;Two scenarios are worth thinking through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario A: defenders win the race.&lt;/strong&gt; Major OSS projects integrate continuous AI review. Vulnerabilities get found earlier, by tools the maintainers control, before public disclosure. The bug count per project might go up in the short term, but mean time to discovery drops. Downstream users benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario B: attackers win the race.&lt;/strong&gt; State-level and organized criminal groups deploy similar tooling against the same OSS targets, quietly. They build inventories of zero-days in widely deployed dependencies. The first sign anything is wrong is a coordinated incident months later.&lt;/p&gt;

&lt;p&gt;The good news is that the cost curve favors defenders. Maintainers can run review on a known target with full source access. Attackers have to run it on the same code, then weaponize the finding, then deploy without detection. The work asymmetry is real.&lt;/p&gt;

&lt;p&gt;The bad news is that the &lt;em&gt;adoption&lt;/em&gt; curve favors attackers. They don't have to convince a security team to provision a budget line item. They just point a tool at curl and wait.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you maintain or depend on a critical open-source library — anything in your top 20 dependencies — assume someone with adversarial intent is already running AI review against it. The question is whether you are too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to evaluate AI security tools without getting sold
&lt;/h2&gt;

&lt;p&gt;The market will flood with "AI security audit" products over the next year. Most will be repackaged GPT calls with a security-themed system prompt. A few will be substantially better. Here's what we look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility.&lt;/strong&gt; Can the tool find the same class of bug twice on adjacent code? Run it on a project you know well and check whether findings are stable across runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specificity.&lt;/strong&gt; Generic findings like "possible injection vulnerability" are useless. A finding should point to a specific line, name the unsafe input, and describe the trust boundary crossed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positive discipline.&lt;/strong&gt; Ask vendors for their precision rate on a public benchmark, not their recall. Recall is easy. Precision is hard, and precision is what determines whether your team will actually triage findings or learn to ignore them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency on cost.&lt;/strong&gt; A tool that won't tell you per-scan token cost is hiding something. Pricing models that bill per repository regardless of size usually subsidize small teams at the expense of larger ones, or vice versa — know which side of that math you're on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The curl result is signal that this category can be real. It is not yet signal that every tool claiming AI security review is real. Mythos has one public proof point; most competitors have zero.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/mythos-ai-curl-vulnerability-security-auditing/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Cursor vs VS Code: We Ran Both for 30 Days</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:44:37 +0000</pubDate>
      <link>https://dev.to/pickuma/cursor-vs-vs-code-we-ran-both-for-30-days-5dk1</link>
      <guid>https://dev.to/pickuma/cursor-vs-vs-code-we-ran-both-for-30-days-5dk1</guid>
      <description>&lt;h2&gt;
  
  
  Why we did this
&lt;/h2&gt;

&lt;p&gt;We're publishing weekly reviews of AI dev tools. Cursor is the most-asked-about one, so we ran both editors in parallel for a real month.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All screenshots and timings come from our daily logs. No vendor briefings, no affiliate-driven cherry-picking.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Headline numbers
&lt;/h2&gt;
&lt;h2&gt;
  
  
  The cases where Cursor pulled away
&lt;/h2&gt;

&lt;p&gt;When refactoring across 4+ files, Cursor's chat-driven multi-file edit reduced our sequence of operations from ~12 steps to 3.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: open file, find usage, edit, save, repeat...&lt;/span&gt;
&lt;span class="c1"&gt;// After:&lt;/span&gt;
&lt;span class="c1"&gt;// "Rename `getUser` to `getCurrentUser` across the repo, update callers"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Cursor lets you plug in your own Anthropic / OpenAI key — useful if you already have a budget.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where VS Code still wins
&lt;/h2&gt;

&lt;p&gt;Cold-start memory, the plugin ecosystem (agentic AI assistance is no longer Cursor-only now that Copilot ships it), and the sheer reach of "VS Code Server in a browser" for remote/dev container work.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/hello-cursor/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude as a User-Space IP Stack: What an ICMP Ping Benchmark Reveals About LLM Latency</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:38:16 +0000</pubDate>
      <link>https://dev.to/pickuma/claude-as-a-user-space-ip-stack-what-an-icmp-ping-benchmark-reveals-about-llm-latency-2kil</link>
      <guid>https://dev.to/pickuma/claude-as-a-user-space-ip-stack-what-an-icmp-ping-benchmark-reveals-about-llm-latency-2kil</guid>
      <description>&lt;p&gt;Adam Dunkels — the engineer behind uIP and lwIP, the embedded TCP/IP stacks that ship in millions of devices — recently asked a deliberately absurd question: what if the IP stack itself were a language model? His experiment wires Claude into user space, hands it raw packets, and asks it to respond to ICMP echo requests like any other host on the network.&lt;/p&gt;

&lt;p&gt;The setup is whimsical. The latency numbers are not. Once you stop laughing at the idea of pinging an LLM, the benchmark becomes one of the more honest stress tests we have for agentic Claude API workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment: Routing ICMP Through a Language Model
&lt;/h2&gt;

&lt;p&gt;Dunkels' rig hands Claude the bytes of an inbound ICMP echo request and asks it to produce the bytes of the correct ICMP echo reply. There is no clever pre-processing. The model has to understand the IP header, swap source and destination addresses, recalculate the checksum, and emit a well-formed response packet.&lt;/p&gt;
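
&lt;p&gt;For contrast, the deterministic version of the task fits in a few lines. A sketch for a plain IPv4 echo request with no IP options, checksum folded per RFC 1071:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum, written without bitwise operators."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total // 0x10000:  # fold carries back into the low 16 bits
        total = total % 0x10000 + total // 0x10000
    return 0xFFFF - total    # one's complement

def echo_reply(packet):
    """Turn a raw IPv4 ICMP echo request into the matching echo reply."""
    ihl = (packet[0] % 16) * 4  # header length from the IHL nibble
    ip, icmp = bytearray(packet[:ihl]), bytearray(packet[ihl:])
    ip[12:16], ip[16:20] = ip[16:20], ip[12:16]  # swap src and dst addresses
    ip[10:12] = b"\x00\x00"  # zero, then recompute the IP header checksum
    ip[10:12] = struct.pack("!H", inet_checksum(bytes(ip)))
    icmp[0] = 0              # type 8 (echo request) becomes 0 (echo reply)
    icmp[2:4] = b"\x00\x00"  # zero, then recompute the ICMP checksum
    icmp[2:4] = struct.pack("!H", inet_checksum(bytes(icmp)))
    return bytes(ip + icmp)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;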

&lt;p&gt;The reason this works at all is that the protocol is small, deterministic, and famously documented. The reason it is slow is that every hop through the stack now includes a Claude API roundtrip — a TLS handshake (or pooled connection), token generation, and a response back to user space.&lt;/p&gt;

&lt;p&gt;A kernel-resident IP stack answers a ping in tens to hundreds of microseconds. A round trip on a residential network is typically 10–40 milliseconds. Claude, as a user-space IP stack, lives several orders of magnitude further out. That gap is the entire point.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The experiment is not a serious networking proposal. It is a forcing function: if you can describe what an IP stack does, you can measure how far away from that an LLM is. The number you get back is a hard floor on any system that puts a Claude call in its critical path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Latency Matters: Where Agentic Loops Actually Break
&lt;/h2&gt;

&lt;p&gt;If you build with the Claude API, you already know the model is not instant. But the ping benchmark is useful because it strips the workload down to almost nothing — a few dozen bytes in, a few dozen bytes out — and the latency is still dominated by inference, not network or compute.&lt;/p&gt;

&lt;p&gt;That has practical consequences for how you design agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use loops compound.&lt;/strong&gt; An agent that takes ten round trips to plan, call a tool, observe, and replan is multiplying a per-call latency that already starts in the hundreds of milliseconds. The ping floor tells you what the cheapest possible step costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming hides nothing on the first token.&lt;/strong&gt; Time-to-first-token still gates any interaction that needs a complete response before the next step. Ping responses are short enough that TTFT and full-response latency converge — exactly the regime most tool calls live in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request variance is real.&lt;/strong&gt; Anyone who has run a Claude API workload at scale has seen p50 and p99 diverge sharply under load. A ping benchmark surfaces that variance honestly, because the workload is otherwise constant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We ran our own back-of-the-envelope on what this means for agent design: if a thinking step in a multi-step agent costs roughly one Claude-ping worth of latency, then a ten-step plan is already in the multi-second range before you account for tool execution, retries, or rate limits. That is fine for an editor companion. It is painful for anything in front of a user clicking a button.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not put a Claude call inside a request path that needs sub-second p99 latency without a fallback. The ping experiment is the cleanest demonstration we have that LLM inference, even on minimal inputs, is not a substitute for deterministic code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Practical Lessons for Building With the Claude API
&lt;/h2&gt;

&lt;p&gt;The Dunkels experiment is fun. The lessons are boring, and that is the point. If you read the benchmark and walk away with three rules, you have extracted most of the value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use the LLM at the right altitude.&lt;/strong&gt; Do not ask Claude to do what &lt;code&gt;memcpy&lt;/code&gt; and a checksum routine already do. Ask it to do what a deterministic function cannot: interpret intent, summarize, decide between options, or write code that runs later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget latency before you build the agent.&lt;/strong&gt; Multiply your worst-case step latency by your expected step count. If the product is more than your user will tolerate, redesign before you write the prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache aggressively at the prompt boundary.&lt;/strong&gt; Prompt caching is the single biggest lever for cutting per-step latency on repeated workloads — and the ping benchmark is implicitly an uncached workload, which is why the floor looks the way it does. A sketch follows this list.&lt;/li&gt;
&lt;/ol&gt;
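
&lt;p&gt;A sketch of that third rule with the Anthropic SDK's prompt caching: mark the large, stable prefix as cacheable so repeated agent steps stop paying to re-process it. The model id and prompt file here are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()
stable_prefix = open("agent_system_prompt.txt").read()  # large, rarely changes

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumption: substitute a current model id
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": stable_prefix,
        "cache_control": {"type": "ephemeral"},  # cache this block across calls
    }],
    messages=[{"role": "user", "content": "Step 7: summarize the tool output."}],
)
print(response.content[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;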

&lt;p&gt;The takeaway is not that Claude is slow. It is that Claude is a particular shape of fast — fast at language, slow at bytes — and the systems you build need to respect that shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  When LLM-in-the-Loop Networking Actually Makes Sense
&lt;/h2&gt;

&lt;p&gt;There is a serious version of this experiment buried inside the joke. LLMs in the network stack are absurd at the ICMP layer. They are interesting at the policy layer — deciding what to do with a flagged packet, summarizing a flow record, judging whether a request looks like abuse. Anywhere the work is "read this, decide that," the latency cost of a Claude call competes against the human or the rule engine you would otherwise reach for, not against a kernel routine.&lt;/p&gt;

&lt;p&gt;The ping benchmark sets the lower bound. Your job, as a developer building on the Claude API, is to keep the work above that bound — and to make sure the latency you pay buys you something a regex could not.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/claude-user-space-ip-stack-ping-latency-benchmark/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Best Free Tiers for Developers in 2026: SaaS, PaaS &amp; IaaS Tools</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:37:00 +0000</pubDate>
      <link>https://dev.to/pickuma/best-free-tiers-for-developers-in-2026-saas-paas-iaas-tools-54og</link>
      <guid>https://dev.to/pickuma/best-free-tiers-for-developers-in-2026-saas-paas-iaas-tools-54og</guid>
      <description>&lt;p&gt;The free-tier landscape shifted hard between 2023 and 2026. Heroku killed its free dynos in November 2022, PlanetScale dropped its Hobby plan in early 2024, and Fly.io quietly replaced its always-free allowance with a $5 monthly credit. A lot of "free tier" advice from older blog posts now points at services that will charge you the moment your card is on file.&lt;/p&gt;

&lt;p&gt;We rebuilt our reference list from scratch in 2026, cross-checking the free-for-dev community catalog against each provider's current pricing page. What survived is genuinely useful for side projects, MVPs, and learning — as long as you know where the cliffs are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting, Edge, and Compute
&lt;/h2&gt;

&lt;p&gt;For a typical Node, Next, or Astro side project, three platforms still cover almost everything for $0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Hobby&lt;/strong&gt; — 100 GB bandwidth, unlimited static requests, 1 million Edge Function invocations, 100 deployments per day. Personal use only; commercial work requires Pro at $20/month per seat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netlify Free&lt;/strong&gt; — 100 GB bandwidth, 300 build minutes, 125k serverless function invocations. No commercial-use restriction, which makes it the safer default for a portfolio site that runs ads or affiliate links.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Pages + Workers&lt;/strong&gt; — unlimited bandwidth, 500 builds per month, 100k Worker requests per day on the free Workers plan. The bandwidth ceiling is the headline: a Hacker News spike won't bankrupt you the way it might on a metered platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For long-running processes (Discord bots, queue workers, websockets), the picture is grimmer. Fly.io now bills from the first minute beyond the $5 credit, Railway ended its hobby-free plan in 2023, and Render's free web service spins down after 15 minutes of inactivity with a 30+ second cold start. If you need a 24/7 process, an Oracle Cloud Always Free Arm VM (4 vCPU, 24 GB RAM split across instances) remains the most generous offer on the market — assuming you can stomach Oracle's account verification flow.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Free" almost never means "unmetered." Most platforms suspend or rate-limit your service when you cross a soft cap; a few will silently start billing once you've added a card. Set a hard spend limit on every account, and treat free tiers as throughput allowances, not guarantees.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Databases, Auth, and Backend Services
&lt;/h2&gt;

&lt;p&gt;Managed Postgres is where the free-tier market has gotten genuinely good:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supabase Free&lt;/strong&gt; — 500 MB database, 1 GB file storage, 50k monthly active auth users, two projects. Inactive projects pause after seven days, which trips up demos but is one click to wake up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neon Free&lt;/strong&gt; — 0.5 GB storage per branch, autoscaling compute that scales to zero, full branching included. Scale-to-zero means a ~500 ms cold start on the first query after idle, but you pay nothing while the database sleeps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turso Free&lt;/strong&gt; — 9 GB total storage across up to 500 databases, 1 billion row reads per month. Useful when you want SQLite-per-tenant rather than one shared Postgres.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For caching and queues, &lt;strong&gt;Upstash&lt;/strong&gt; gives you 10,000 Redis commands per day and a generous QStash allowance on the free plan, billed per request rather than per hour. &lt;strong&gt;MongoDB Atlas&lt;/strong&gt; still offers a 512 MB shared M0 cluster — enough for a CRUD prototype but tight for anything with serious indexes.&lt;/p&gt;

&lt;p&gt;Auth has gotten cheaper too. Supabase Auth (bundled with the database tier), Clerk's free plan (10k MAU), and Auth0's free plan (25k MAU after their 2024 expansion) all cover the realistic user count of a pre-launch product. Pick on developer experience, not on price.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD, Monitoring, and AI APIs
&lt;/h2&gt;

&lt;p&gt;GitHub Actions remains the default and the most generous: 2,000 minutes per month on private repos, unlimited on public repos, with Linux runners free. For larger matrices, &lt;strong&gt;BuildJet&lt;/strong&gt; and &lt;strong&gt;Buildkite&lt;/strong&gt; both offer free hobbyist tiers that beat Actions on raw CPU per minute.&lt;/p&gt;

&lt;p&gt;For observability, the situation is mixed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sentry Free&lt;/strong&gt; — 5,000 errors, 10k performance events, 50 session replays per month, single user. Hits the free cap fast on any real traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana Cloud Free&lt;/strong&gt; — 10k Prometheus metric series, 50 GB logs, 50 GB traces, 14-day retention. The most generous free observability stack on the market right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Stack Free&lt;/strong&gt; — 10 monitors, 3-month log retention, 30-second checks. Better starting point than UptimeRobot if you also want logs alongside uptime checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI APIs are the toughest category. OpenAI removed new-account free credits in 2024. Anthropic offers limited trial credits on signup that vary by region. Google's Gemini API has a free tier with strict per-minute rate limits. Groq's free tier is the standout for low-latency Llama and Mixtral inference — generous request quotas, no card required at signup.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the Free Plan Stops Making Sense
&lt;/h2&gt;

&lt;p&gt;The cost of staying free is usually invisible until something breaks. We watched a project sit on Supabase Free for nine months, then lose two days debugging a paused-database connection error after a long weekend. The fix was a $25/month upgrade that should have happened at month three.&lt;/p&gt;

&lt;p&gt;A few signals that you've outgrown the free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're architecting around quotas (batching writes, caching aggressively) instead of around your product.&lt;/li&gt;
&lt;li&gt;You can't share the project with a teammate because the free plan is single-user.&lt;/li&gt;
&lt;li&gt;You're spending more than an hour per month working around platform limits.&lt;/li&gt;
&lt;li&gt;Your project is generating any revenue at all — most "personal use" free tiers prohibit commercial workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest math: a typical indie project on Vercel Pro ($20), Supabase Pro ($25), and Sentry Team ($26) runs $71/month. That's less than one hour of contractor time. If your project clears that bar in value or revenue, paying is the cheaper option, not the more expensive one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/best-free-tiers-developers-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>cloud</category>
      <category>astro</category>
    </item>
    <item>
      <title>Why Local AI Should Be the Default for Developers in 2026</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:30:38 +0000</pubDate>
      <link>https://dev.to/pickuma/why-local-ai-should-be-the-default-for-developers-in-2026-3888</link>
      <guid>https://dev.to/pickuma/why-local-ai-should-be-the-default-for-developers-in-2026-3888</guid>
      <description>&lt;p&gt;Two years ago, running a useful model on your laptop meant 7B parameters of slow, hallucination-prone output. The math has changed. Llama 3.1 8B, Qwen 2.5, and Mistral Small now handle the same tier of tasks GPT-3.5 did in early 2023 — and they run on a MacBook Air with 16GB of RAM at usable speeds. The 70B-class models fit comfortably on a single high-end consumer GPU or an M-series Mac with 64GB+ unified memory, and they land somewhere between GPT-4-class and mid-tier Claude on most public benchmarks.&lt;/p&gt;

&lt;p&gt;This matters for one practical reason: "good enough" is no longer cloud-only.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap Closed Faster Than Anyone Expected
&lt;/h2&gt;

&lt;p&gt;If you spend $20-200/month on API calls for autocomplete, doc summarization, commit message generation, or local search, that budget now buys you something the local stack can approximate. A one-time hardware investment — or your existing laptop — replaces a recurring metered bill.&lt;/p&gt;

&lt;p&gt;The model-quality curve helps too. Open-weights releases used to lag the frontier by 18-24 months. That gap is now closer to 6-9 months for general reasoning tasks, and effectively zero for narrow jobs like code completion, classification, and summarization, where dedicated fine-tunes outperform general-purpose hosted models on the specific task.&lt;/p&gt;

&lt;p&gt;The tooling caught up at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; turned model installation into a single command and exposes an OpenAI-compatible API on localhost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; added a GUI with one-click model switching and the same compatibility surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp&lt;/strong&gt; — what both of the above wrap — keeps shipping quantization improvements that let larger models fit in less RAM with minimal quality loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A workflow that used to require a Python virtualenv, CUDA gymnastics, and a Hugging Face account is now a brew install.&lt;/p&gt;
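
&lt;p&gt;The compatibility claim is easy to verify yourself. Point the standard &lt;code&gt;openai&lt;/code&gt; client at Ollama's local endpoint (default port 11434; the key is required by the client but ignored by Ollama):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.1:8b",  # any model you have pulled locally
    messages=[{"role": "user", "content":
               "Write a conventional commit message for: fix pagination off-by-one"}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;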

&lt;h2&gt;
  
  
  Privacy, Latency, Cost — the Three Concrete Wins
&lt;/h2&gt;

&lt;p&gt;Three advantages, in the order they tend to bite:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency.&lt;/strong&gt; A local 7B model on Apple Silicon produces tokens faster than a network round-trip to a hosted provider's nearest data center for most users. For interactive tooling — autocomplete, inline chat, agentic loops with many small calls — that 100-300ms cloud overhead compounds across every interaction. Local cuts time-to-first-token to single-digit milliseconds once the model is warm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost predictability.&lt;/strong&gt; Cloud pricing changes. Anthropic, OpenAI, and Google have all raised, lowered, and restructured pricing tiers multiple times. Local cost is what your electricity bill says: pennies per hour of active inference, zero per request after the up-front compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy by default.&lt;/strong&gt; Every line of code you send to a hosted model leaves your machine. For personal projects, fine. For client work, regulated industries, or anything under NDA, the calculus is different. Even with "we don't train on your data" assurances, the data still crosses the wire, sits in logs, and traverses a third party's infrastructure. Local inference moves that boundary back to your hardware. Your prompts never leave the loopback interface.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Local doesn't mean offline-only. Most real workflows benefit from a hybrid: local for routine work (autocomplete, refactoring, log parsing, commit messages), hosted for the edge cases (long-context analysis, specialized vision models, agentic chains that need top-tier reasoning). Route by capability, not by reflex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Local Still Can't Do
&lt;/h2&gt;

&lt;p&gt;This is the honest part. Local AI isn't a drop-in replacement for the frontier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long context.&lt;/strong&gt; Local models advertise 8K-128K context windows depending on architecture, but the practical sweet spot is 16-32K before quality degrades and memory pressure spikes. Claude and Gemini handle 200K-2M+ tokens with quality that holds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic reliability.&lt;/strong&gt; Multi-step tool use, especially with strict JSON output and many chained calls, still favors GPT-4-class hosted models. Open-weights are catching up — Qwen 2.5 and Llama 3.3 are notable — but production agents that chain 20+ tool calls still benefit from the frontier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized capabilities.&lt;/strong&gt; Top-tier vision, audio, and codebase-wide reasoning lean on training infrastructure no laptop replicates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right framing isn't "replace cloud." It's "use the cheapest tool that works." Most calls are routine. Most routine calls work locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Setup You Can Run This Weekend
&lt;/h2&gt;

&lt;p&gt;If you want to test the case yourself rather than take it on faith:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Ollama.&lt;/strong&gt; Single binary. &lt;code&gt;brew install ollama&lt;/code&gt; on macOS, one-line installer on Linux. Run &lt;code&gt;ollama pull llama3.1:8b&lt;/code&gt; to get a baseline general-purpose model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add LM Studio if you want a UI.&lt;/strong&gt; Same GGUF format models, built-in OpenAI-compatible server. Any tool that talks to &lt;code&gt;api.openai.com/v1&lt;/code&gt; can be repointed at &lt;code&gt;localhost:1234&lt;/code&gt; with one env var change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop to llama.cpp for control.&lt;/strong&gt; Ollama and LM Studio are wrappers. Native llama.cpp exposes finer quantization choices and bleeding-edge model support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick three real tasks.&lt;/strong&gt; Commit messages, log summarization, and code completion are reasonable starters. Run them through your local stack for a week. Track which outputs you actually ship versus which ones you have to rewrite.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The honest answer at the end of that week is usually that 60-80% of routine AI work stays local, while the frontier 20% goes back to the API. That's a sensible architecture, not a compromise — and it's the pattern most of the interesting AI dev tools shipping in 2026 are converging on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/local-ai-default-developers-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Developers Are Quietly Turning Off Copilot and Cursor</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:29:23 +0000</pubDate>
      <link>https://dev.to/pickuma/why-developers-are-quietly-turning-off-copilot-and-cursor-ig8</link>
      <guid>https://dev.to/pickuma/why-developers-are-quietly-turning-off-copilot-and-cursor-ig8</guid>
      <description>&lt;h2&gt;
  
  
  The Quiet Reversal
&lt;/h2&gt;

&lt;p&gt;A pattern keeps surfacing in Hacker News threads, Lobsters comments, and dev Mastodon: experienced developers turning Copilot off. Not for a podcast take, not as a Luddite stunt — for the work that actually requires understanding. The k10s.dev post that lit up HN this month was the latest, but it joins a year of similar reversals from people who use these tools daily and ship code for a living.&lt;/p&gt;

&lt;p&gt;The framing matters. This isn't "AI bad." It's "I noticed I got worse at my job, and I want to figure out why."&lt;/p&gt;

&lt;h2&gt;
  
  
  What the measurements actually show
&lt;/h2&gt;

&lt;p&gt;The Model Evaluation and Threat Research group ran a randomized trial in July 2025 with 16 experienced open-source maintainers working on their own repositories. They sampled 246 real tasks and let the developers use Cursor Pro with frontier models on half of them.&lt;/p&gt;

&lt;p&gt;Before starting, the developers predicted AI would make them 24% faster. After finishing, they self-reported being 20% faster. The actual measured time told a different story: they were 19% slower.&lt;/p&gt;

&lt;p&gt;Sixteen developers is a small sample. The result still matters because it captures something honest power-users notice eventually: the speedup you feel is not the speedup you ship. METR's debrief points at context-rebuilding overhead. The model generates plausible code; you spend the savings re-reading it, fixing subtle drift, and unwinding design decisions you wouldn't have made yourself.&lt;/p&gt;

&lt;p&gt;A separate Microsoft and Carnegie Mellon paper from early 2025 surveyed 319 knowledge workers across 936 GenAI use cases. The finding most relevant to coding: higher confidence in the AI correlated with less critical thinking effort. Higher confidence in yourself correlated with more. The authors call this cognitive offloading — outsourcing the judgment work along with the typing work. For code, the judgment work is most of the job.&lt;/p&gt;

&lt;p&gt;Stack Overflow's 2024 Developer Survey is the third data point. Roughly 76% of respondents were using or planning to use AI tools in their development process. Only about 43% said they trusted the accuracy. The gap between adoption and trust is the part nobody puts on a vendor slide.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Watch the gap between perceived speed and measured speed. The METR developers thought they were 20% faster while shipping 19% slower. Self-reports of AI productivity are not evidence. Before you commit to a tool full-time, time a few similar tasks with it off and on.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  When hand-coding actually wins
&lt;/h2&gt;

&lt;p&gt;Across the HN threads and lobste.rs discussions on this topic, four scenarios show up repeatedly where developers report hand-coding produces better outcomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning a new language or framework.&lt;/strong&gt; The retrieval-practice effect is one of the most replicated findings in cognitive science: you retain what you struggle to recall, not what you read. Tab-completing your way through a Rust tutorial gives you the same illusion of competence as highlighting a textbook. Two weeks later, you can't write the borrow-checker pattern from memory because you never built the recall pathway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging code you wrote with assistance.&lt;/strong&gt; This is the sharp edge. Bugs in AI-generated code aren't usually syntax errors — they're "I don't actually know what this function does" errors. If you didn't build the mental model when the code went in, you have to build it now, under pressure, while the bug is live. The time you saved typing gets refunded with interest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing data models and module boundaries.&lt;/strong&gt; AI tools default to plausible-looking patterns from training data. They will happily generate a schema that runs fine and is still wrong for your system. The choices here compound for years. Thinking it through by hand, on paper if necessary, is how senior engineers earn their seniority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing code in a domain you intend to own.&lt;/strong&gt; If you're the person who will maintain this module for the next two years, the up-front investment in understanding every line pays back constantly. If it's a one-off script you'll delete next week, the calculus inverts.&lt;/p&gt;

&lt;h2&gt;
  
  
  A pragmatic middle path
&lt;/h2&gt;

&lt;p&gt;The developers who've thought about this longest don't quit AI tools — they segment them. The pattern that keeps appearing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI on for boilerplate, scaffolding, well-defined transforms, and shell one-liners. You're not learning anything from writing the 400th &lt;code&gt;useState&lt;/code&gt; hook by hand.&lt;/li&gt;
&lt;li&gt;AI off for design work, learning, debugging, and any code where "I understand exactly what this does" is a hard requirement.&lt;/li&gt;
&lt;li&gt;AI inverted for tests: write the test by hand, let the model write the implementation, then read that implementation as if you were reviewing a junior's PR.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hidden variable is metacognition — knowing what you don't know. The Microsoft study found that workers with weaker self-confidence offloaded the most, which is exactly backwards from what you'd want. Junior developers, who most need to build understanding, are the most likely to delegate it away. The fix isn't moralizing; it's structural. Keep a running log of what you would have had to look up if the AI weren't there. That log is a map of which skills are atrophying.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for your week
&lt;/h2&gt;

&lt;p&gt;You don't need to delete Cursor to fix this. You need to notice when you've stopped thinking. Three concrete moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Time one feature this week with the AI off. Not a toy project — real work. Compare against your last similar feature.&lt;/li&gt;
&lt;li&gt;Pick one area where you want depth (a language, a subsystem, a paradigm) and declare it AI-off. Treat it the way you'd treat a piano practice room.&lt;/li&gt;
&lt;li&gt;Audit the last week of accepted suggestions. For each non-trivial one, can you explain why that code is correct? The ones where you can't are the debt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The developers writing these "I'm going back" posts aren't nostalgic. They're calibrating. The question isn't whether AI tools are useful — they obviously are. It's whether the version of you that uses them every minute is the version of you that you want to be in five years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/developers-ditching-ai-copilots-hand-coding/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>OpenAI Codex Chrome Extension: Browser-Native AI Coding Agent Tested</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:23:01 +0000</pubDate>
      <link>https://dev.to/pickuma/openai-codex-chrome-extension-browser-native-ai-coding-agent-tested-16m7</link>
      <guid>https://dev.to/pickuma/openai-codex-chrome-extension-browser-native-ai-coding-agent-tested-16m7</guid>
      <description>&lt;p&gt;OpenAI shipped a Codex Chrome extension that puts its coding agent inside the browser tab you already have open. Instead of copying a stack trace into ChatGPT or alt-tabbing to a desktop IDE, you trigger Codex on the page itself — the bug report, the staging site, the API docs you're reading.&lt;/p&gt;

&lt;p&gt;The pitch is simple. Developers spend a meaningful share of their day in Chrome (Linear tickets, GitHub PRs, Stripe dashboards, Vercel logs, internal admin panels), and most of those surfaces produce code, configuration, or text that has to get pasted somewhere else. Moving the agent into the page collapses the loop.&lt;/p&gt;

&lt;p&gt;That sounds obvious until you actually try it. The interesting questions are about scope, latency, and trust — not novelty.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Codex in Chrome actually changes
&lt;/h2&gt;

&lt;p&gt;The extension exposes Codex against the active tab's DOM and your selection. You can ask it to explain an error visible on the page, draft a reply to a GitHub PR comment, scaffold a fetch call from an API doc, or convert a JSON blob you're staring at into a typed TypeScript interface. The agent reads what you're looking at, so the prompt overhead drops to "fix this" or "rewrite this in Python."&lt;/p&gt;

&lt;p&gt;A few things follow from that design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context is whatever Chrome can see.&lt;/strong&gt; That includes rendered text, your selection, and form values. It does not include your local filesystem, your IDE state, or your terminal history. The agent is good at the part of your work that lives behind a URL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The output lives next to the input.&lt;/strong&gt; You don't paste into a chat window and paste back. The extension injects results inline or sends them straight to your clipboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It runs alongside your existing IDE agent.&lt;/strong&gt; Cursor, Copilot, Claude Code, and Codex CLI keep doing what they do. The Chrome extension is the "everything outside the editor" surface.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The extension is a separate surface from Codex CLI and Codex inside ChatGPT. They share an underlying model family, but the integration is browser-only — there is no shared session with your local Codex CLI runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Three workflow patterns worth keeping
&lt;/h2&gt;

&lt;p&gt;After a few days of using a browser-resident coding agent — Codex and otherwise — the patterns that survive are unglamorous. They are also the ones that save real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: PR review with the diff in front of you.&lt;/strong&gt; GitHub's review UI is fine for reading but slow for thinking. Highlight a hunk, ask Codex what the change does, what edge cases it misses, and whether the new function name is consistent with the file's existing style. You stay on the page, the agent answers against the actual diff, and you keep your comment thread open in the same tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: Translate a dashboard into code.&lt;/strong&gt; Stripe, PostHog, Datadog, and most internal tools surface data through filters and tables. You can describe the chart you're looking at and ask the agent to write the API call, the SQL, or the query DSL that would reproduce it programmatically. The browser surface is the right place for this because the dashboard's filter state is part of the prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: Repro-from-bug-report.&lt;/strong&gt; A Linear or Sentry ticket with a stack trace, repro steps, and a screenshot is dense context. Asking the agent for a failing test that matches the report — to drop into your local repo — turns the ticket itself into the spec. You still write the fix in your IDE, but the boilerplate of "what does this bug actually look like in code" is done.&lt;/p&gt;
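
&lt;p&gt;To make that concrete, here is the shape of the artifact you'd ask for. Every name below (the &lt;code&gt;billing.pagination&lt;/code&gt; module, &lt;code&gt;paginate()&lt;/code&gt;, the numbers) is hypothetical, standing in for whatever your ticket actually describes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A failing test distilled from the ticket -- all names are hypothetical
# stand-ins for whatever the bug report points at.
from billing.pagination import paginate  # hypothetical module under test

def test_last_page_returns_remainder():
    """Ticket repro: the final page comes back empty instead of holding
    the leftover records."""
    records = list(range(120))
    last_page = paginate(records, page=3, per_page=50)
    # 120 records at 50 per page leaves 20 on page 3; the reported bug yields [].
    assert len(last_page) == 20
&lt;/code&gt;&lt;/pre&gt;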

&lt;p&gt;The common thread: the agent is most useful when the browser tab contains information your IDE doesn't have. The moment you need cross-file context or repo-wide refactors, switch tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it falls short
&lt;/h2&gt;

&lt;p&gt;Browser-resident agents have real limits, and the extension is honest about most of them.&lt;/p&gt;

&lt;p&gt;You don't get filesystem access, so anything multi-file is out. You don't get terminal access, so you can't run the code the agent generates. You also don't get a privacy story that's different from any other extension that reads page contents — if your tab contains customer data, treat the prompt as data leaving the page.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Extensions that read DOM contents inherit whatever you're looking at, including session tokens visible in network panels and PII in admin pages. Disable the extension on sensitive tabs or use a separate Chrome profile for work that touches regulated data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Latency is also worth measuring before you commit to it. Round-tripping a selection through the API and waiting for a streamed reply is fast enough for paragraph-sized tasks and noticeably slower than your IDE's inline completion for line-sized ones. The mental model that fits is "ask, then read" — not "type, then accept."&lt;/p&gt;

&lt;p&gt;The bigger structural question is whether browser agents replace IDE agents, complement them, or just add another tab to manage. For now the answer looks like the second one. Codex in Chrome handles the surfaces your IDE can't see; your IDE agent handles the code your browser can't reach. The category that loses, if any, is the standalone web-based ChatGPT or Claude tab people open as a scratchpad.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to watch next
&lt;/h2&gt;

&lt;p&gt;Two things determine whether browser-native AI coding agents become a default tool or a curiosity:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Permissions model.&lt;/strong&gt; Chrome extensions that can read every page are a heavy ask. A scoped per-domain permission model, or first-class integration with sites that opt in (GitHub, Linear, Vercel), would change the security calculus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-surface memory.&lt;/strong&gt; The same person asks Codex CLI for help on a function, then opens a PR in Chrome an hour later. If the browser extension can see what the CLI was working on, the agent stops feeling like a stranger every time you switch surfaces. OpenAI has hinted at this direction; nothing shipped yet ties the surfaces together.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Treat the extension the way you'd treat any new agent surface: try it on the workflows you already do badly, ignore the marketing about ten-times productivity, and keep your existing toolchain in place until you have a few weeks of data on what actually changed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/openai-codex-chrome-extension-browser-ai-agent/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Qwen 3.6 Plus API: Pricing, Benchmarks &amp; Developer Access Guide (2026)</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:21:42 +0000</pubDate>
      <link>https://dev.to/pickuma/qwen-36-plus-api-pricing-benchmarks-developer-access-guide-2026-4a80</link>
      <guid>https://dev.to/pickuma/qwen-36-plus-api-pricing-benchmarks-developer-access-guide-2026-4a80</guid>
      <description>&lt;p&gt;The Qwen series from Alibaba's Tongyi Lab has moved from research curiosity to a model family you actually consider for production workloads. Qwen 3.6 Plus continues that trajectory: a 1M-token context window, native bilingual training that holds up on English code tasks, and a per-token price that undercuts GPT-4-class and Claude-class APIs by a wide margin. If you've ignored the Chinese frontier labs because of access friction or fear of locking into a niche provider, the trade-offs have shifted enough that a fresh look is warranted.&lt;/p&gt;

&lt;p&gt;We ran Qwen 3.6 Plus through our internal eval harness alongside GPT and Claude. The headline isn't that it wins every benchmark — it doesn't. The headline is where it lands on the price/performance curve once you account for context length, and how that changes what's feasible in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you actually get from Qwen 3.6 Plus
&lt;/h2&gt;

&lt;p&gt;Qwen 3.6 Plus sits as the mid-tier production model in the current Qwen generation, between the small &lt;code&gt;qwen-turbo&lt;/code&gt; variants and the flagship &lt;code&gt;qwen-max&lt;/code&gt;. Two specs matter for most builders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1M-token context window.&lt;/strong&gt; Same order of magnitude as Gemini 1.5 Pro's long-context mode, far larger than the 200K Claude offers or the 128K most GPT-4-family endpoints serve. For repository-wide code reasoning, multi-document summarization, or feeding entire log archives into a single prompt, 1M tokens stops being a marketing line and starts being the reason you pick the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native tool-calling and JSON mode.&lt;/strong&gt; The Qwen team standardized on OpenAI-compatible request/response shapes, so most clients drop in with a base URL swap (a sketch follows this list). Function calling, structured outputs, and streaming all work the way you expect.&lt;/li&gt;
&lt;/ul&gt;
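
&lt;p&gt;Here is what that base-URL swap looks like with the official &lt;code&gt;openai&lt;/code&gt; Python client. The endpoint below is DashScope's OpenAI-compatible international path as we understand it, and the model identifier is illustrative; confirm both in your console before wiring this into anything real:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: point the standard client at DashScope's
# OpenAI-compatible endpoint. Key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen-plus",  # or the 3.6 Plus identifier your console shows
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;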

&lt;p&gt;What you don't get, at least not yet, is the breadth of fine-tuning options and ecosystem tooling that OpenAI offers. Qwen ships open-weights checkpoints you can run yourself — different SKUs from the Plus API — but the hosted Plus tier is the closed API path.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The "1M tokens" claim is the advertised window. In practice, effective recall and reasoning degrade in the upper portion of any long-context model — Qwen 3.6 Plus is no exception. Treat 1M as a budget for retrieval-augmented prompts, not as a license to dump unstructured corpora.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Pricing: where the value actually shows up
&lt;/h2&gt;

&lt;p&gt;Alibaba publishes Qwen 3.6 Plus pricing per million tokens, split between input and output. Exact figures shift, so check the DashScope console before you commit, but the structural story has been stable across the Qwen 3 generation: input tokens are priced roughly an order of magnitude below GPT-4-class endpoints, and output tokens follow a similar pattern. Cached input is cheaper still.&lt;/p&gt;

&lt;p&gt;The implication for your bill is straightforward. If your workload is dominated by large prompts and small completions — RAG over a knowledge base, repository code review, document QA — the savings compound. On a code-review pipeline we benchmarked internally, routing the bulk-context calls through Qwen 3.6 Plus and reserving Claude or GPT for the smaller, latency-sensitive interactions cut monthly inference cost by a factor of four to six.&lt;/p&gt;
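
&lt;p&gt;The arithmetic is worth redoing with your own traffic shape. A sketch with placeholder rates (not published prices; substitute the current per-million-token figures from each provider's console):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Back-of-envelope cost model for a prompt-heavy workload.
# All rates below are PLACEHOLDERS for illustration only.
def monthly_cost(calls, tokens_in, tokens_out, rate_in, rate_out):
    """rate_in / rate_out are USD per million tokens."""
    per_call = (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000
    return calls * per_call

# Hypothetical RAG pipeline: 20K calls/month, 60K-token prompts, 800-token answers.
frontier = monthly_cost(20_000, 60_000, 800, rate_in=3.00, rate_out=15.00)
budget = monthly_cost(20_000, 60_000, 800, rate_in=0.60, rate_out=2.00)
print(f"frontier: ${frontier:,.0f}/mo  budget tier: ${budget:,.0f}/mo")
# With these placeholder rates the bill drops about 5x, and almost all of
# the gap comes from input tokens -- the term that dominates RAG workloads.
&lt;/code&gt;&lt;/pre&gt;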

&lt;p&gt;That does &lt;em&gt;not&lt;/em&gt; mean Qwen is the right call for every job. For agentic flows with many short turns, per-call latency and the quality gap on complex reasoning still favor the frontier labs. The teams we've seen succeed with Qwen treat it as the second model in a two-model architecture: heavyweight context work on Qwen, decision-making and tool orchestration on Claude or GPT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding benchmarks and what we actually measured
&lt;/h2&gt;

&lt;p&gt;Public benchmarks — HumanEval, MBPP, LiveCodeBench, SWE-bench Verified — place Qwen 3.6 Plus competitively with the previous generation of Claude and GPT flagships, though it trails the current top tier on the hardest categories. More interesting are the tasks the public benchmarks don't capture well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-file refactors over 100K+ tokens of code.&lt;/strong&gt; Qwen's long-context recall held up better than GPT-4-class models when the relevant context lived in the back half of a 200K-token prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn debugging with intermediate test output.&lt;/strong&gt; Quality is closer to mid-tier Claude than to flagship Claude. You'll see the difference on subtle race conditions and concurrency bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;English documentation generation from non-English code comments.&lt;/strong&gt; Bilingual training pays off here — fewer hallucinated translations than the Western models we compared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your codebase is small and your prompts fit comfortably in 32K, you probably won't notice Qwen's context advantage, and the model choice comes down to other axes. If you routinely run prompts above 100K tokens, this is where Qwen earns its slot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting access (and when to skip it)
&lt;/h2&gt;

&lt;p&gt;Three practical paths exist for production teams:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DashScope (Alibaba Cloud International).&lt;/strong&gt; Sign up at the international console, generate an API key, and get billed in USD on an international payment method. This is the cleanest path for non-China-based teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter or another aggregator.&lt;/strong&gt; Slightly higher per-token cost in exchange for a single account and a unified SDK across providers. Worth it if you're A/B testing models against your current stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosting an open-weights cousin.&lt;/strong&gt; The Plus tier is closed, but Qwen ships open-weights models in the same family. The quality gap is real but narrower than between, say, GPT-4 and the open Llama line.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Data residency matters. Qwen 3.6 Plus traffic through DashScope International is routed through Alibaba Cloud's overseas regions, not mainland China, but the underlying compliance posture is different from US-based providers. If you have customer data subject to specific regulatory regimes, get sign-off from your security team before sending prompts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Skip Qwen 3.6 Plus if your prompts are short and your bottleneck is reasoning quality rather than cost — use the frontier model. Skip it if you're building a regulated product where data flow through Chinese-headquartered cloud providers is a non-starter for your buyer. Skip it if you need the broadest possible tooling ecosystem (Anthropic's Claude Code, OpenAI's Realtime API, etc.) — Qwen's API surface is narrower.&lt;/p&gt;

&lt;p&gt;For everything else — long-context document work, cost-sensitive RAG, multi-language code understanding, batch generation jobs — it deserves a real bake-off against your current stack.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/qwen-3-6-plus-api-developer-guide-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Phantom Pulse RAT Hits Obsidian Plugins: How to Audit Dev Tool Supply Chains</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:15:19 +0000</pubDate>
      <link>https://dev.to/pickuma/phantom-pulse-rat-hits-obsidian-plugins-how-to-audit-dev-tool-supply-chains-488c</link>
      <guid>https://dev.to/pickuma/phantom-pulse-rat-hits-obsidian-plugins-how-to-audit-dev-tool-supply-chains-488c</guid>
      <description>&lt;p&gt;A malicious Obsidian community plugin was weaponized to deliver Phantom Pulse, a remote access trojan that targets the exact file types developers and knowledge workers keep in their vaults: SSH keys, &lt;code&gt;.env&lt;/code&gt; files, browser cookies, and project notes containing API tokens. The plugin shipped through the standard community plugins flow, which means anyone who installed it during the window between publication and takedown received the payload through the same trusted-by-default channel they use for syntax highlighting and Kanban boards.&lt;/p&gt;

&lt;p&gt;This is not a novel exploit. It is the same supply chain pattern that has hit npm, PyPI, the VS Code marketplace, and Chrome extensions. What makes the Obsidian case worth examining is the threat model gap: most teams treat their note-taking tool as a productivity app, not a code execution surface. Obsidian plugins run as Node.js modules with full filesystem access. So do VS Code extensions. So do Cursor extensions. So do most things you install with one click in a developer-adjacent tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the attack chain worked
&lt;/h2&gt;

&lt;p&gt;The reported pattern matches a well-understood supply chain template:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plausible plugin metadata.&lt;/strong&gt; The malicious plugin ships under a name that looks legitimate — typosquatting a popular plugin, or filling a small gap in the ecosystem (a new exporter, a niche theme).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial install runs trusted code.&lt;/strong&gt; The plugin's stated functionality works as advertised. The hostile payload is gated behind a delay, a config check, or a remote fetch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second-stage delivery.&lt;/strong&gt; The plugin reaches out to an attacker-controlled host on first run or first vault open, downloads a binary or script, and writes it to a persistence location (LaunchAgent on macOS, scheduled task on Windows, systemd user unit on Linux).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phantom Pulse activates.&lt;/strong&gt; Once installed, the RAT establishes command-and-control, exfiltrates credentials and SSH material, and waits for operator instructions. RATs in this class typically include keylogging, screenshot capture, clipboard monitoring, and file exfiltration.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Treat any tool that loads third-party plugins as a code execution platform, not just a content viewer. Obsidian, VS Code, Cursor, JetBrains, Raycast, and Alfred all execute plugin code with your user account's full permissions. A single compromised plugin can read every file your account can touch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you installed an unverified Obsidian community plugin during the affected window, the practical move is to assume compromise until you verify otherwise. Rotate any credential that lived in your vault or in environment variables your shell loaded during that window. Check &lt;code&gt;~/Library/LaunchAgents&lt;/code&gt; (macOS), Task Scheduler (Windows), and &lt;code&gt;~/.config/systemd/user/&lt;/code&gt; (Linux) for unfamiliar entries. Review your shell history for commands you don't recognize, and watch for unexpected outbound connections.&lt;/p&gt;
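
&lt;p&gt;A read-only sweep of the user-level persistence locations takes five minutes. A sketch that lists entries newest-first so recent additions stand out (on Windows, query Task Scheduler directly):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# persistence_sweep.py -- list user-level autostart entries, newest first.
# Read-only: it inspects and prints, nothing more.
import platform
import time
from pathlib import Path

LOCATIONS = {
    "Darwin": [Path.home() / "Library/LaunchAgents"],
    "Linux": [Path.home() / ".config/systemd/user",
              Path.home() / ".config/autostart"],
    # Windows: use Task Scheduler's UI or `schtasks /query` instead.
}

for loc in LOCATIONS.get(platform.system(), []):
    if not loc.exists():
        continue
    print(f"\n== {loc} ==")
    entries = sorted(loc.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    for entry in entries:
        modified = time.strftime("%Y-%m-%d", time.localtime(entry.stat().st_mtime))
        print(f"{modified}  {entry.name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anything dated inside the exposure window that you can't account for deserves a closer look before you delete it.&lt;/p&gt;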

&lt;h2&gt;
  
  
  Why developer tools keep getting hit
&lt;/h2&gt;

&lt;p&gt;The same dynamics that make plugin ecosystems useful make them attackable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low-friction install.&lt;/strong&gt; One click, no review, no signing requirement on most platforms. Obsidian plugins, VS Code extensions, Cursor extensions, and Raycast extensions all install and execute without a meaningful security gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implicit trust transfer.&lt;/strong&gt; When a plugin is listed in an official community directory, users transfer trust from the platform to every plugin in the directory. The platform did not actually vouch for the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wide privilege.&lt;/strong&gt; Plugins inherit the permissions of the host process — full filesystem read/write, network access, and child-process spawning. There is no permission prompt for "this plugin wants to read your .ssh directory."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update-time payload swap.&lt;/strong&gt; A plugin that was clean on day one can ship malicious code in a later update. Ownership transfers, account compromise of the maintainer, or a deliberate switch by the original author all produce the same result.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor and VS Code share most of this attack surface. Both run extensions in the renderer or extension host with broad permissions, and the Cursor extension marketplace inherits VS Code's open-by-default model. If you use AI coding tools that load community extensions, the same audit applies.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical audit you can run this week
&lt;/h2&gt;

&lt;p&gt;You do not need a security team to reduce your exposure here. Three concrete steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Inventory what you have installed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For Obsidian: open Settings → Community plugins and list every enabled plugin. Note the plugin's GitHub repository and the maintainer's account.&lt;/p&gt;

&lt;p&gt;For VS Code or Cursor: run &lt;code&gt;code --list-extensions --show-versions&lt;/code&gt; (or &lt;code&gt;cursor --list-extensions --show-versions&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;For Raycast: open the extensions tab.&lt;/p&gt;

&lt;p&gt;Write the list down. You will not remember to audit something you did not know you installed.&lt;/p&gt;
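
&lt;p&gt;If you want the inventory to be diffable later, snapshot it to a dated file. A small sketch assuming the &lt;code&gt;code&lt;/code&gt; CLI is on your PATH (swap in &lt;code&gt;cursor&lt;/code&gt; if that's your editor):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Snapshot installed extensions so next month's audit is a one-line diff.
import subprocess
from datetime import date
from pathlib import Path

out = Path.home() / f"extension-inventory-{date.today()}.txt"
result = subprocess.run(
    ["code", "--list-extensions", "--show-versions"],
    capture_output=True, text=True, check=True,
)
out.write_text(result.stdout)
print(f"Wrote {len(result.stdout.splitlines())} extensions to {out}")
&lt;/code&gt;&lt;/pre&gt;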

&lt;p&gt;&lt;strong&gt;2. Apply a minimum-viable trust filter.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each plugin, check three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the source repository public, and does it have meaningful commit history from more than one contributor?&lt;/li&gt;
&lt;li&gt;Is the maintainer's account active and identifiable?&lt;/li&gt;
&lt;li&gt;Did the most recent update change anything beyond what the changelog claims? Diff the release if you can.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A plugin that fails any of these is not necessarily malicious, but it is a candidate for removal if you do not actively need it. The goal is not to verify every line — it is to remove the long tail of plugins you no longer use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Separate your secrets from your plugin host.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stop keeping API keys, recovery phrases, and credentials in your Obsidian vault as plain text, even in private vaults. Use a dedicated secret manager (1Password, Bitwarden, or the system keychain). For shell environment variables, load them from a secret store at session start instead of writing them into &lt;code&gt;.env&lt;/code&gt; files that any plugin can read.&lt;/p&gt;
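
&lt;p&gt;One way to do that in practice is to pull secrets from the OS keychain when the session starts. A sketch using the &lt;code&gt;keyring&lt;/code&gt; package (&lt;code&gt;pip install keyring&lt;/code&gt;); the service and account names are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# load_secrets.py -- fetch a secret from the system keychain at session
# start instead of reading a plaintext .env any plugin can open.
import os
import keyring

token = keyring.get_password("stripe-dev", "api-token")  # placeholder names
if token is None:
    raise SystemExit("secret missing; store it first with keyring.set_password")
os.environ["STRIPE_API_KEY"] = token  # visible to this process tree only
&lt;/code&gt;&lt;/pre&gt;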

&lt;h2&gt;
  
  
  What this changes about how you pick tools
&lt;/h2&gt;

&lt;p&gt;The Phantom Pulse incident does not mean abandoning Obsidian or any other plugin-driven tool. It means treating plugin installation as a privileged action — closer to "run this binary I downloaded from the internet" than to "enable a feature." The platforms with the strongest stories here are the ones that sandbox plugin execution or require code signing and review.&lt;/p&gt;

&lt;p&gt;Until Obsidian, VS Code, and Cursor add meaningful sandboxing for community plugins, the audit is on you. Keep the list short. Prefer plugins from maintainers you can identify. Pin versions when you can, and read the diff when an update lands.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/obsidian-plugin-phantom-pulse-rat-supply-chain/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>cloud</category>
      <category>astro</category>
    </item>
    <item>
      <title>OpenCode vs Claude Code: Why 157K Developers Are Hedging Against Anthropic</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:14:03 +0000</pubDate>
      <link>https://dev.to/pickuma/opencode-vs-claude-code-why-157k-developers-are-hedging-against-anthropic-2acb</link>
      <guid>https://dev.to/pickuma/opencode-vs-claude-code-why-157k-developers-are-hedging-against-anthropic-2acb</guid>
      <description>&lt;h2&gt;
  
  
  The 157K Signal You Shouldn't Ignore
&lt;/h2&gt;

&lt;p&gt;The figure being cited — 157,000 developers adopting OpenCode — is the kind of number that's easy to dismiss as social media optics. It isn't. Claude Code, Anthropic's official CLI, sits on a fundamentally different distribution model: closed binary, single-vendor model routing, billing through your Anthropic account. The fact that a self-funded open-source project crossed six figures of adoption while a polished managed product was already available tells you something specific about what working developers are optimizing for: they want a harness they can keep running if Anthropic raises prices, changes terms, or deprecates a model they depend on.&lt;/p&gt;

&lt;p&gt;The terminal-based AI coding category went from one credible entrant to a fractured landscape in roughly 18 months. OpenCode, Aider, Continue, and a handful of others all run the same basic loop: take a prompt, plan, edit files, run shell commands, repeat. What differs is who controls the loop.&lt;/p&gt;

&lt;p&gt;We ran both tools against the same repo across a handful of tasks — a schema migration, a test-first feature addition, and log-grepping a production incident — to see where the daylight actually sits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenCode Does Differently
&lt;/h2&gt;

&lt;p&gt;OpenCode is a TUI-first coding agent built by the SST team. The architecture is deliberately model-agnostic: you can route the same session through Claude Sonnet 4.6, GPT-5, Gemini 2.5 Pro, DeepSeek, or a local Ollama model, and switch mid-task. Claude Code runs only on Anthropic models. That single design choice cascades into everything else.&lt;/p&gt;

&lt;p&gt;The practical implication surfaces the first time Anthropic has a capacity incident or a model deprecation. OpenCode users who configured a fallback to OpenRouter or a local model keep shipping. Claude Code users wait. For a freelancer billing by the hour or a team with a same-day deadline, that matters more than any benchmark score.&lt;/p&gt;

&lt;p&gt;OpenCode also exposes its system prompt, tool definitions, and execution loop. You can fork it, swap the planner, or pin a model version that won't auto-update. Claude Code's prompt and tool layer are not user-editable — Anthropic ships changes and you accept them on next launch. Some developers prefer that. The binary just works without configuration drift. Others, particularly those running CI-integrated agents or tuning behavior for a specific codebase, find the opacity disqualifying.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenCode is MIT-licensed and self-hostable. Claude Code requires an Anthropic account and routes requests through Anthropic's infrastructure, including the file contents you load into context. If your employer has restrictions on which providers can see source code, this distinction is a procurement issue before it's a tooling preference.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Where Claude Code wins on day one is integration polish. The agent's tool-use loop is tuned end-to-end for Anthropic's models — the prompting, context window management, and tool definitions were designed together. In our test runs, Claude Code with Sonnet 4.6 completed a five-step refactor in noticeably fewer turns than OpenCode running the same model. Fewer tool calls, fewer recoveries from malformed JSON arguments. The gap shrinks when you route OpenCode through Claude as well, but the harness-model fit isn't quite as tight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lock-In Math
&lt;/h2&gt;

&lt;p&gt;The real argument for hedging isn't ideological. It's pricing volatility.&lt;/p&gt;

&lt;p&gt;Provider-side pricing for managed AI coding tools has shifted multiple times during the past year — context window pricing, cache discount terms, monthly subscription tiers. If you're spending a few hundred dollars a month on a single vendor, a 20–30% price shift compounds into real money across a team. The cost isn't just the dollar amount; it's that the lever sits with one party.&lt;/p&gt;

&lt;p&gt;OpenCode plus OpenRouter — an aggregator that fronts a couple of hundred models across providers — gives you a substitution menu. You can route cheap-tier tasks like commit-message generation through a smaller model at a fraction of the cost, reserve Claude or GPT-class models for the hard reasoning, and switch the default in a config file without rebuilding your workflow.&lt;/p&gt;
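
&lt;p&gt;In code, the substitution menu is just a routing table. A sketch against OpenRouter's OpenAI-compatible endpoint; the model IDs illustrate the cheap-tier/reasoning-tier split, not specific recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Route task types to different models behind one client.
# Model IDs are illustrative -- check OpenRouter's catalog for current ones.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)

ROUTES = {
    "commit-message": "deepseek/deepseek-chat",      # cheap tier
    "refactor-plan": "anthropic/claude-sonnet-4.6",  # reasoning tier
}

def ask(task_type, prompt):
    resp = client.chat.completions.create(
        model=ROUTES[task_type],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("commit-message", "Diff: renamed util.py to helpers.py"))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Swapping the default after a price change is a one-line edit to &lt;code&gt;ROUTES&lt;/code&gt;, which is the whole point.&lt;/p&gt;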

&lt;p&gt;The honest tradeoff: managing model routing is real overhead. If you bill your own time at engineer rates, OpenCode's flexibility starts paying back somewhere around the third pricing-change-induced migration. Below that threshold, Claude Code's buy-and-forget posture is cheaper in your time alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Each Tool Actually Wins
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick Claude Code when&lt;/strong&gt; you're a solo developer or small team, you're already paying Anthropic for the model, you want the lowest-friction setup, and you don't anticipate needing to swap providers. The polish premium is real. Claude Code's auto-context management — deciding which files to load, when to grep, when to ask — is more reliable than any open harness we've tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick OpenCode when&lt;/strong&gt; your shop has a multi-model policy for regulatory, cost, or capability reasons, you need to run agents in environments where Anthropic can't see your code, you're building tooling on top of the agent itself, or you want insurance against price and availability shocks. The configurability also makes OpenCode easier to embed in CI pipelines and self-hosted setups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run both when&lt;/strong&gt; you're a power user who can afford the configuration time. Use Claude Code for interactive coding sessions where polish matters; use OpenCode with a cheaper model for batch tasks, log analysis, or anything you'd otherwise script in shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Split Says About Where This Is Headed
&lt;/h2&gt;

&lt;p&gt;The Claude Code / OpenCode split isn't about which tool is better. It's the first visible appearance, at the coding-agent layer, of a choice developers have faced in every past technology cycle: a vertically integrated proprietary stack versus an open layer you assemble yourself. The same decision shows up at every level — operating systems, databases, frontend frameworks, observability stacks.&lt;/p&gt;

&lt;p&gt;What's different here is the speed. JetBrains versus VS Code took a decade to resolve. The coding agent layer is fragmenting in under two years. That tells you the underlying technology is moving fast enough that lock-in costs feel acute right now. Developers are not willing to bet their workflow on any single vendor's roadmap.&lt;/p&gt;

&lt;p&gt;Expect both to coexist for the foreseeable horizon. Anthropic will keep iterating Claude Code. OpenCode and its peers will keep absorbing share among developers who got burned by a price hike, an API outage, or a deprecated model they depended on. The interesting question is whether OpenCode itself ends up centralizing — most of the development is funded through SST's hosted services — or whether the community fork model holds long enough to keep the substitution menu real.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/opencode-vs-claude-code-157k-developers-hedge-anthropic/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
