<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pickuma</title>
    <description>The latest articles on DEV Community by pickuma (@pickuma).</description>
    <link>https://dev.to/pickuma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3926669%2Fb3923c39-364a-4953-b8f7-aa962d6419e0.jpg</url>
      <title>DEV Community: pickuma</title>
      <link>https://dev.to/pickuma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pickuma"/>
    <language>en</language>
    <item>
      <title>Woodpecker vs Lemlist vs Instantly: Cold Email Tools That Still Land in 2026</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:53:30 +0000</pubDate>
      <link>https://dev.to/pickuma/woodpecker-vs-lemlist-vs-instantly-cold-email-tools-that-still-land-in-2026-4236</link>
      <guid>https://dev.to/pickuma/woodpecker-vs-lemlist-vs-instantly-cold-email-tools-that-still-land-in-2026-4236</guid>
      <description>&lt;h2&gt;
  
  
  The Deliverability Reset
&lt;/h2&gt;

&lt;p&gt;In early 2024, Google and Yahoo rolled out new sender requirements that quietly killed half the cold email playbook everyone was running. SPF, DKIM, and DMARC became table stakes. Spam complaint rates above 0.3% started getting entire domains blocked. One-click unsubscribe became mandatory for senders moving over 5,000 messages a day.&lt;/p&gt;

&lt;p&gt;The tools that survived this transition aren't the ones with the prettiest editors — they're the ones that take inbox placement seriously. Warm-up isn't a feature anymore, it's a requirement. Inbox rotation matters more than personalization tokens.&lt;/p&gt;

&lt;p&gt;We ran a comparison across three platforms still standing: &lt;strong&gt;Woodpecker&lt;/strong&gt;, &lt;strong&gt;Lemlist&lt;/strong&gt;, and &lt;strong&gt;Instantly&lt;/strong&gt;. All three claim "cold email that lands." Here's how they actually compare for a small B2B SaaS team doing 500–5,000 outbound sends a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headline Comparison
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Where Woodpecker Pulls Ahead
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Deliverability infrastructure that compounds
&lt;/h3&gt;

&lt;p&gt;Woodpecker's warm-up runs on Mailivery — a dedicated deliverability network that's been around since before the 2024 reset. The warm-up isn't a checkbox feature; it's actively sending and replying to real conversations across the network, building sender reputation over weeks rather than days. Combined with their domain auditing tools (SPF/DKIM/DMARC pre-flight checks), it's the closest thing we've seen to "deliverability-as-a-service" on the SMB tier.&lt;/p&gt;

&lt;p&gt;Inbox rotation is the second half of this. When you're sending more than ~30 messages a day per inbox, Google flags pattern velocity. Woodpecker distributes sends across multiple connected mailboxes automatically — so 5 mailboxes can handle 150 sends/day without any single account tripping reputation thresholds.&lt;/p&gt;
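
&lt;p&gt;To make the rotation idea concrete, here is a toy round-robin distributor. This is our illustration of the concept, not Woodpecker's actual algorithm:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import cycle

def distribute(leads, mailboxes, per_inbox_cap=30):
    """Round-robin leads across mailboxes, respecting a daily per-inbox cap."""
    load = {box: [] for box in mailboxes}
    rotation = cycle(mailboxes)
    for lead in leads:
        for _ in range(len(mailboxes)):
            box = next(rotation)
            if len(load[box]) != per_inbox_cap:  # this inbox still has room
                load[box].append(lead)
                break
        else:
            raise RuntimeError("daily capacity exhausted; defer the rest")
    return load

# 5 mailboxes x 30/day handles 150 sends with no single account spiking
plan = distribute([f"lead{i}@example.com" for i in range(150)],
                  [f"sender{i}@outreach.example" for i in range(1, 6)])
assert all(len(sends) == 30 for sends in plan.values())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;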

&lt;h3&gt;
  
  
  2. The agency panel is genuinely useful
&lt;/h3&gt;

&lt;p&gt;If you're a founder who occasionally helps other founders with outbound, or a small agency taking on 3–10 clients, Woodpecker's agency panel lets you manage all of them from one login. Lemlist has something similar; Instantly's version is rougher. The differentiator is the per-client billing pass-through — agencies can mark up the platform fee to clients cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A real developer surface
&lt;/h3&gt;

&lt;p&gt;This is the surprise. Woodpecker ships a REST API, webhooks, &lt;strong&gt;and an MCP server&lt;/strong&gt; — meaning you can wire it into your AI agent stack directly. For SaaS founders who already have a working agentic prospect-research pipeline, the MCP integration is a quiet differentiator. Lemlist and Instantly both have APIs but lack the same developer-facing surface area.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most cold email tools assume a human-in-the-loop workflow: human picks leads, drafts sequences, reviews replies. If you've built (or are building) an AI agent that researches prospects, drafts personalized openers, and routes replies, an MCP server means your agent can talk to Woodpecker directly without an OAuth dance or an API wrapper. This is a quiet edge case today; in 12 months it'll be the default expectation.&lt;/p&gt;
&lt;/blockquote&gt;
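
&lt;p&gt;For the curious, here is roughly what that wiring could look like with the official MCP Python SDK. The launch command, env var, and tool name below are hypothetical placeholders; Woodpecker's docs have the real values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # hypothetical command and env var -- substitute the documented ones
    server = StdioServerParameters(command="woodpecker-mcp",
                                   env={"WOODPECKER_API_KEY": "..."})
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # discover the surface
            # hypothetical tool name and arguments:
            await session.call_tool("add_prospect", {"email": "jane@example.com"})

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;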

&lt;h2&gt;
  
  
  Where Lemlist Still Wins
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-channel from day one.&lt;/strong&gt; Lemlist's native LinkedIn + email + voice-note sequences are the most polished of the three. If your prospects respond to LinkedIn before email (founders, executive titles), Lemlist's threading is worth the price premium.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalization at scale.&lt;/strong&gt; Image personalization, video personalization, dynamic landing pages. Most of this is gimmicky, but for high-value enterprise outreach where reply rates of 2% matter, it's measurable.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where Instantly Still Wins
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unlimited warm-up.&lt;/strong&gt; Instantly bundles unlimited warm-up across unlimited inboxes on most plans. If you're running an agency model with 20+ client mailboxes, the per-inbox warm-up cost on Woodpecker adds up. Instantly's flat pricing is cleaner at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume senders.&lt;/strong&gt; Instantly is built for teams sending 10,000+ messages/day. Their platform handles velocity better than the others. If you're at SMB scale, this doesn't matter; if you're at sales agency scale, it does.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;For a 2-person SaaS doing ~1,500 cold sends/month across 3 connected mailboxes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Woodpecker Cold Email (Starter)&lt;/strong&gt;: ~$39/mo for 1 user, 3 slots. Warm-up included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lemlist Standard&lt;/strong&gt;: ~$59/mo for similar setup. Multi-channel included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instantly Hypergrowth&lt;/strong&gt;: ~$97/mo, unlimited inboxes + unlimited warm-up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing inverts depending on how many inboxes you connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1–3 inboxes&lt;/strong&gt;: Woodpecker is cheapest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5+ inboxes&lt;/strong&gt;: Instantly's unlimited model wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any inbox count, with LinkedIn in the mix&lt;/strong&gt;: Lemlist's multi-channel pays for itself if you actually use LinkedIn.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;All three platforms strongly recommend (or sell) secondary domains for cold sending — burning your main domain on cold reputation is a one-way street. Budget $10–20/month per secondary domain (registrar + Google Workspace seat or alternative). For 5 inboxes, that's $50–100/month on top of the platform fee. Most reviews skip this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to Decide in 5 Minutes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Are you sending more than 5,000 cold emails per month?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Instantly's unlimited model probably wins on cost.&lt;/li&gt;
&lt;li&gt;No → continue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Is LinkedIn outreach a core part of your motion?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Lemlist. The native multi-channel threading is the best in class.&lt;/li&gt;
&lt;li&gt;No → continue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Do you have (or want to build) an AI agent that automates prospect research?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes → Woodpecker. The MCP server is a real edge.&lt;/li&gt;
&lt;li&gt;No → Woodpecker on price, but Lemlist is fine if you'll grow into LinkedIn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most SMB SaaS founders at the 500–3,000 sends/month range with no LinkedIn play, Woodpecker is the default answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We'd Test in the Trial
&lt;/h2&gt;

&lt;p&gt;Woodpecker offers a 7-day free trial. We'd push hard on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The warm-up itself.&lt;/strong&gt; Connect a brand-new domain, start the warm-up, and use a tool like &lt;a href="https://glockapps.com" rel="noopener noreferrer"&gt;GlockApps&lt;/a&gt; or &lt;a href="https://www.mailreach.co" rel="noopener noreferrer"&gt;MailReach&lt;/a&gt; to test inbox placement after 7 days. The improvement curve is the real signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The inbox rotation logic.&lt;/strong&gt; Connect 3 mailboxes, set a daily send cap of 50/inbox, and verify the platform actually distributes evenly without manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The MCP server.&lt;/strong&gt; If you have an existing agent stack, wire it up. Test creating a campaign, adding leads, and pulling reply data through the MCP interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The reply detection.&lt;/strong&gt; Cold email tools live or die by how well they detect replies vs auto-responders vs out-of-office. Send a handful of test messages from various email providers and trigger each response type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain audit reports.&lt;/strong&gt; Run their domain audit on your existing setup. The findings should match what tools like &lt;a href="https://mxtoolbox.com" rel="noopener noreferrer"&gt;MXToolbox&lt;/a&gt; report, or the scripted spot-check sketched after this list.&lt;/li&gt;
&lt;/ul&gt;
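
&lt;p&gt;If you want to verify the DNS side yourself before trusting any vendor's report, a rough pre-flight with &lt;code&gt;dnspython&lt;/code&gt; covers SPF and DMARC. DKIM is omitted because it needs your selector, which varies by provider:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import dns.resolver

def txt_records(name):
    """Fetch TXT records for a name; empty list if none exist."""
    try:
        answers = dns.resolver.resolve(name, "TXT")
        return [rdata.to_text().strip('"') for rdata in answers]
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []

domain = "example.com"  # your sending domain
spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]
print("SPF:", spf or "MISSING")
print("DMARC:", dmarc or "MISSING")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;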




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/woodpecker-vs-lemlist-instantly-cold-email-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>saas</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OpenAI Codex vs Claude Code: Hands-On Python Benchmark for Devs</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:52:14 +0000</pubDate>
      <link>https://dev.to/pickuma/openai-codex-vs-claude-code-hands-on-python-benchmark-for-devs-5bb1</link>
      <guid>https://dev.to/pickuma/openai-codex-vs-claude-code-hands-on-python-benchmark-for-devs-5bb1</guid>
      <description>&lt;p&gt;OpenAI relaunched Codex this year as a full agentic CLI that lives in your terminal and talks to GPT-5 class models. Claude Code did the same thing for Anthropic, six months earlier. Both want to be the assistant you actually merge code from. We pointed both at the same Python project and tracked what each one shipped.&lt;/p&gt;

&lt;p&gt;The codebase under test: a mid-sized Flask + SQLAlchemy service with a real pytest suite and a handful of slow, gnarly modules begging to be refactored. We ran identical prompts through both tools, on the same hardware, against the same git SHA, and rewound the worktree between runs so neither tool saw the other's edits.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we structured the test
&lt;/h2&gt;

&lt;p&gt;We ran three kinds of tasks against each assistant, three trials per task per tool. Not enough trials for statistical certainty, but enough to catch behavior patterns that held across attempts.&lt;/p&gt;

&lt;p&gt;Task A: refactor a roughly 400-line module that mixed request handling, DB access, and template rendering into a service layer plus thin route handlers. Success criteria: tests still green, no regressions in a smoke flow we recorded with &lt;code&gt;httpx&lt;/code&gt;, and the resulting file structure passing &lt;code&gt;ruff&lt;/code&gt; and &lt;code&gt;mypy --strict&lt;/code&gt; cleanly.&lt;/p&gt;

&lt;p&gt;Task B: fix three known bugs. One off-by-one in a pagination helper. One race condition in a background worker that only surfaced under concurrent load. One Unicode normalization bug in a search endpoint. We handed each assistant only the failing pytest output and the file path, with no hints about the fix.&lt;/p&gt;
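
&lt;p&gt;To illustrate the first bug class (our reconstruction, not the project's actual code): the classic pagination off-by-one is floor division where ceiling division was needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def page_count_buggy(total_items, page_size):
    return total_items // page_size  # drops the final partial page

def page_count_fixed(total_items, page_size):
    return (total_items + page_size - 1) // page_size  # ceiling division

assert page_count_buggy(101, 20) == 5  # wrong: item 101 is unreachable
assert page_count_fixed(101, 20) == 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;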

&lt;p&gt;Task C: an agentic workflow. "Add OpenTelemetry tracing across the request lifecycle, including DB spans, then write tests proving spans are emitted." Open-ended, multi-file, requires reading the codebase before doing anything.&lt;/p&gt;

&lt;p&gt;We tracked wall-clock time, total tokens consumed, whether the diff merged cleanly, and whether the test suite stayed green at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where each tool diverged
&lt;/h2&gt;

&lt;p&gt;Claude Code finished Task A in roughly four minutes per trial. The service-layer extraction was clean: it picked up the project's existing repository pattern from a sibling module without prompting and matched the naming convention. Two of three trials passed the smoke test on first run. The third introduced a circular import that Claude caught on its own follow-up turn and fixed without us asking.&lt;/p&gt;

&lt;p&gt;Codex took longer on Task A, closer to seven minutes per trial, but produced a more aggressive refactor. It split logic into more files, added type hints throughout, and rewrote one helper function that wasn't part of the brief. The diff was larger, the tests still passed, but the review surface went up. One trial dropped a transactional boundary we wanted preserved; the test suite caught it, Codex fixed it on the next iteration.&lt;/p&gt;

&lt;p&gt;Task B was the more revealing split. Claude found the off-by-one in under two minutes with a one-line fix and an added test. Codex took longer on the same bug, wrote a longer explanation, and added two tests where one would have done — the second was redundant with the first.&lt;/p&gt;

&lt;p&gt;On the race condition, Claude wrote a regression test using &lt;code&gt;threading.Barrier&lt;/code&gt; to reliably reproduce the bug, then patched it with a context manager around the critical section. Codex initially proposed a &lt;code&gt;time.sleep&lt;/code&gt;-based test that we rejected. On retry it produced a cleaner fix using an asyncio lock. Both eventually solved it. Claude shipped a clean version the first time.&lt;/p&gt;
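
&lt;p&gt;The &lt;code&gt;threading.Barrier&lt;/code&gt; approach is worth stealing whichever tool you use. A sketch of the shape of such a regression test, with a toy shared counter standing in for the worker's state:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import threading

class Counter:
    """Toy stand-in for the worker's shared state."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value      # unguarded read-modify-write:
        self.value = current + 1  # the race under test

def test_concurrent_increments():
    counter = Counter()
    n_threads, iterations = 8, 10_000
    barrier = threading.Barrier(n_threads)  # release every thread at once

    def worker():
        barrier.wait()  # maximize overlap so the race reproduces reliably
        for _ in range(iterations):
            counter.unsafe_increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # With the race present this total usually comes up short; guarding the
    # increment with counter.lock makes it pass deterministically.
    assert counter.value == n_threads * iterations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;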

&lt;p&gt;The Unicode bug was effectively a tie. Both correctly identified that &lt;code&gt;unicodedata.normalize("NFKC", ...)&lt;/code&gt; was the right answer and produced near-identical diffs.&lt;/p&gt;
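
&lt;p&gt;For reference, that normalization folds compatibility characters such as ligatures and fullwidth forms into their plain equivalents before matching:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import unicodedata

# U+FB01 is the "fi" ligature; fullwidth digits come from CJK input methods
assert unicodedata.normalize("NFKC", "ﬁle") == "file"
assert unicodedata.normalize("NFKC", "１２３") == "123"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;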

&lt;blockquote&gt;
&lt;p&gt;Neither tool reliably caught subtle business-logic intent that wasn't expressed in tests or comments. On a fourth task we ran informally, both happily "fixed" a behavior that was actually load-bearing for an undocumented downstream consumer. Treat agentic assistants as fast pair programmers, not autonomous engineers, until your codebase has the test coverage to keep them honest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Agentic workflows, pricing, and the speed-vs-cost tradeoff
&lt;/h2&gt;

&lt;p&gt;Task C was where the agentic loops stretched their legs. Claude averaged a little over ten minutes wall-clock per run and burned through hundreds of thousands of tokens. Codex was meaningfully slower and more token-hungry on the same task — call it about a third more on both axes. Both produced working tracing setups with DB spans and tests that checked emitted span names against a recording exporter.&lt;/p&gt;

&lt;p&gt;Codex's solution was more thorough. It wired up &lt;code&gt;OTLPSpanExporter&lt;/code&gt; with environment-variable config, added a &lt;code&gt;pyproject.toml&lt;/code&gt; extra so the dependency was opt-in, and dropped a fresh &lt;code&gt;docs/observability.md&lt;/code&gt; into the repo. Claude's solution was tighter: it hooked into the existing Flask middleware, added one fixture, and stopped. If you want a starting point you will extend yourself, Claude got there faster. If you want a near-complete drop-in, Codex did more of the work — at higher cost.&lt;/p&gt;
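
&lt;p&gt;A minimal sketch of the pattern both tools converged on, using the standard OpenTelemetry Flask instrumentation and an in-memory exporter as the recording backend for tests. DB spans come from the analogous SQLAlchemy instrumentor, omitted here for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor

exporter = InMemorySpanExporter()  # recording exporter for tests
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)  # spans across the request lifecycle

@app.route("/health")
def health():
    return "ok"

def test_request_emits_span():
    app.test_client().get("/health")
    names = [span.name for span in exporter.get_finished_spans()]
    # span naming varies by instrumentation version, so match loosely
    assert any("GET" in name or "/health" in name for name in names)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;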

&lt;p&gt;Pricing during our test window: Claude Code running on Sonnet was the cheaper option per task by a clear margin. Codex on GPT-5 was higher both per call and in tokens consumed. Both Anthropic and OpenAI shifted prices during our window. Check current rates before extrapolating — order-of-magnitude conclusions are stable, but the gap may narrow or widen between when we tested and when you read this.&lt;/p&gt;

&lt;p&gt;The speed difference was consistent across trials: Claude was faster on most tasks we threw at it, sometimes by a wide margin on small fixes. Codex was more methodical, which costs you wall-clock time and tokens but occasionally catches things Claude skips.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick which
&lt;/h2&gt;

&lt;p&gt;Pick Claude Code when you're doing focused work — a single bug, a contained refactor, a feature that touches three files. The speed advantage compounds when you're iterating, and the cost difference adds up across a workday.&lt;/p&gt;

&lt;p&gt;Pick Codex when you want broader autonomy and don't mind a longer wall-clock loop. Big migrations, codebase-wide instrumentation, tasks where you would rather review a thorough proposal than steer one. Codex is also the better pick if you already pay for a ChatGPT Team or Enterprise seat that bundles Codex usage.&lt;/p&gt;

&lt;p&gt;Both tools changed our review workflow more than they changed our writing workflow. We spent less time typing and more time reading diffs. That is the benchmark that matters more than tokens or seconds: not who produces code faster, but who produces code you trust enough to merge without re-reading every line.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/openai-codex-vs-claude-code-python-benchmark/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mythos AI Found a Real Curl Vulnerability — What It Signals for Security Audits</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:45:53 +0000</pubDate>
      <link>https://dev.to/pickuma/mythos-ai-found-a-real-curl-vulnerability-what-it-signals-for-security-audits-2p6k</link>
      <guid>https://dev.to/pickuma/mythos-ai-found-a-real-curl-vulnerability-what-it-signals-for-security-audits-2p6k</guid>
      <description>&lt;p&gt;Curl has been the workhorse of HTTP for nearly three decades. It ships in roughly every Linux distribution, every macOS install, most embedded devices, and the dependency graph of half the internet. The codebase has survived years of human review, static analyzers, fuzzers, and bounty hunters. So when Daniel Stenberg, curl's longtime maintainer, posted on May 11, 2026 that an AI tool called Mythos surfaced a real vulnerability in the project, it landed differently than the usual "AI found a bug" headline.&lt;/p&gt;

&lt;p&gt;This wasn't a synthetic benchmark on a toy program. It was production code that thousands of security researchers had already crawled over.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mythos found and why it matters
&lt;/h2&gt;

&lt;p&gt;The detail that makes Stenberg's post worth reading is the &lt;em&gt;type&lt;/em&gt; of finding. Mythos didn't flag a textbook buffer overflow or a one-liner where someone forgot to check a return value. It identified a defect that required reasoning across the surrounding control flow — the kind of bug that historically needed a human to sit with the code, build a mental model, and notice the subtle interaction.&lt;/p&gt;

&lt;p&gt;For years, AI-assisted security tools have been stuck in two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern matchers&lt;/strong&gt; that essentially rebrand grep. They catch low-hanging issues, generate noise, and miss anything that requires understanding &lt;em&gt;intent&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM wrappers&lt;/strong&gt; that summarize diffs in plain English but can't tell you whether the change is safe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mythos is being positioned as something different: a system that reasons about code the way a senior reviewer does, traces data flow across function boundaries, and produces findings specific enough to triage. The curl result is the first public proof point that this category can produce a non-trivial finding in a heavily audited target.&lt;/p&gt;

&lt;p&gt;We're being careful with the framing here. One vulnerability, in one project, surfaced by one tool, does not prove "AI has solved security review." But the bar for a credible result in this space has been low for a long time, and Mythos cleared it on a target where the noise floor is very high.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The curl codebase already runs through OSS-Fuzz, Coverity, Clang's static analyzer, and dozens of human eyes per release cycle. Finding a real bug in this environment is meaningfully different from finding bugs in random GitHub repositories that have never been audited.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What this changes for your team
&lt;/h2&gt;

&lt;p&gt;If you ship code, the practical question is whether AI security review now belongs in your pipeline alongside static analysis and dependency scanning. The answer depends on what you're doing today.&lt;/p&gt;

&lt;p&gt;If your current security workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dependabot or Renovate&lt;/strong&gt; for dependency CVEs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A SAST tool&lt;/strong&gt; (Semgrep, CodeQL, Snyk Code) running in CI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Occasional pentesting&lt;/strong&gt; before major releases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then AI-assisted review is best treated as a fourth layer, not a replacement. Static analyzers catch a different class of bugs efficiently and cheaply. LLM-based reviewers catch a different class — the ones requiring narrative reasoning about what the code is supposed to do — but at higher latency and higher cost per scan.&lt;/p&gt;

&lt;p&gt;The migration pattern teams are converging on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run LLM-based review on &lt;strong&gt;changed code only&lt;/strong&gt; (diff-scoped), not the entire repository (a minimal CI sketch follows this list)&lt;/li&gt;
&lt;li&gt;Trigger on pull requests that touch security-sensitive paths: auth, crypto, parsers, anywhere external input crosses a trust boundary&lt;/li&gt;
&lt;li&gt;Treat findings as &lt;strong&gt;hypotheses for a human to confirm&lt;/strong&gt;, not as gating signals&lt;/li&gt;
&lt;li&gt;Track false positive rate per tool over a quarter before adjusting trust&lt;/li&gt;
&lt;/ul&gt;
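
&lt;p&gt;A minimal CI sketch of that diff-scoped trigger, assuming the Anthropic Python SDK. The sensitive-path list, model id, and prompt are illustrative, not any particular vendor's product:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import anthropic

SENSITIVE_PATHS = ("src/auth/", "src/crypto/", "src/parsers/")  # your trust boundaries

diff = subprocess.run(
    ["git", "diff", "origin/main...HEAD", "--", *SENSITIVE_PATHS],
    capture_output=True, text=True, check=True,
).stdout

if diff.strip():
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    message = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: substitute a current model id
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": "Review this diff for security defects. Report each "
                       "finding as a hypothesis with a file:line reference "
                       "and the trust boundary crossed.\n\n" + diff,
        }],
    )
    print(message.content[0].text)  # in real CI, post this as a PR comment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;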

&lt;p&gt;Cost discipline matters more than people admit. Running a frontier-model code review on every PR in a busy monorepo can run into thousands of dollars per month before you've shipped any real coverage. Scoping prevents the bill from outpacing the value.&lt;/p&gt;

&lt;h2&gt;
  
  
  The supply-chain angle
&lt;/h2&gt;

&lt;p&gt;The deeper story isn't curl's specific vulnerability — it's the asymmetry that Mythos's success implies. Attackers and defenders both now have AI tools that can reason about code. Whichever side scales the workflow first gets the structural advantage.&lt;/p&gt;

&lt;p&gt;Two scenarios are worth thinking through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario A: defenders win the race.&lt;/strong&gt; Major OSS projects integrate continuous AI review. Vulnerabilities get found earlier, by tools the maintainers control, before public disclosure. The bug count per project might go up in the short term, but mean time to discovery drops. Downstream users benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario B: attackers win the race.&lt;/strong&gt; State-level and organized criminal groups deploy similar tooling against the same OSS targets, quietly. They build inventories of zero-days in widely deployed dependencies. The first sign anything is wrong is a coordinated incident months later.&lt;/p&gt;

&lt;p&gt;The good news is that the cost curve favors defenders. Maintainers can run review on a known target with full source access. Attackers have to run it on the same code, then weaponize the finding, then deploy without detection. The work asymmetry is real.&lt;/p&gt;

&lt;p&gt;The bad news is that the &lt;em&gt;adoption&lt;/em&gt; curve favors attackers. They don't have to convince a security team to provision a budget line item. They just point a tool at curl and wait.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you maintain or depend on a critical open-source library — anything in your top 20 dependencies — assume someone with adversarial intent is already running AI review against it. The question is whether you are too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to evaluate AI security tools without getting sold
&lt;/h2&gt;

&lt;p&gt;The market will flood with "AI security audit" products over the next year. Most will be repackaged GPT calls with a security-themed system prompt. A few will be substantially better. Here's what we look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility.&lt;/strong&gt; Can the tool find the same class of bug twice on adjacent code? Run it on a project you know well and check whether findings are stable across runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specificity.&lt;/strong&gt; Generic findings like "possible injection vulnerability" are useless. A finding should point to a specific line, name the unsafe input, and describe the trust boundary crossed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positive discipline.&lt;/strong&gt; Ask vendors for their precision rate on a public benchmark, not their recall. Recall is easy. Precision is hard, and precision is what determines whether your team will actually triage findings or learn to ignore them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency on cost.&lt;/strong&gt; A tool that won't tell you per-scan token cost is hiding something. Pricing models that bill per repository regardless of size usually subsidize small teams at the expense of larger ones, or vice versa — know which side of that math you're on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The curl result is signal that this category can be real. It is not yet signal that every tool claiming AI security review is real. Mythos has one public proof point; most competitors have zero.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/mythos-ai-curl-vulnerability-security-auditing/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Cursor vs VS Code: We Ran Both for 30 Days</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:44:37 +0000</pubDate>
      <link>https://dev.to/pickuma/cursor-vs-vs-code-we-ran-both-for-30-days-5dk1</link>
      <guid>https://dev.to/pickuma/cursor-vs-vs-code-we-ran-both-for-30-days-5dk1</guid>
      <description>&lt;h2&gt;
  
  
  Why we did this
&lt;/h2&gt;

&lt;p&gt;We're publishing weekly reviews of AI dev tools. Cursor is the most-asked-about one, so we ran both editors in parallel for a real month.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All screenshots and timings come from our daily logs. No vendor briefings, no affiliate-driven cherry-picking.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Headline numbers
&lt;/h2&gt;
&lt;h2&gt;
  
  
  The cases where Cursor pulled away
&lt;/h2&gt;

&lt;p&gt;When refactoring across 4+ files, Cursor's chat-driven multi-file edit reduced our sequence of operations from ~12 steps to 3.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: open file, find usage, edit, save, repeat...&lt;/span&gt;
&lt;span class="c1"&gt;// After:&lt;/span&gt;
&lt;span class="c1"&gt;// "Rename `getUser` to `getCurrentUser` across the repo, update callers"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Cursor lets you plug in your own Anthropic / OpenAI key — useful if you already have a budget.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where VS Code still wins
&lt;/h2&gt;

&lt;p&gt;Cold-start memory, the plugin ecosystem (agentic AI assistance is no longer Cursor-only now that Copilot ships it), and the sheer reach of "VS Code Server in a browser" for remote/dev container work.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/hello-cursor/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude as a User-Space IP Stack: What an ICMP Ping Benchmark Reveals About LLM Latency</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:38:16 +0000</pubDate>
      <link>https://dev.to/pickuma/claude-as-a-user-space-ip-stack-what-an-icmp-ping-benchmark-reveals-about-llm-latency-2kil</link>
      <guid>https://dev.to/pickuma/claude-as-a-user-space-ip-stack-what-an-icmp-ping-benchmark-reveals-about-llm-latency-2kil</guid>
      <description>&lt;p&gt;Adam Dunkels — the engineer behind uIP and lwIP, the embedded TCP/IP stacks that ship in millions of devices — recently asked a deliberately absurd question: what if the IP stack itself were a language model? His experiment wires Claude into user space, hands it raw packets, and asks it to respond to ICMP echo requests like any other host on the network.&lt;/p&gt;

&lt;p&gt;The setup is whimsical. The latency numbers are not. Once you stop laughing at the idea of pinging an LLM, the benchmark becomes one of the more honest stress tests we have for agentic Claude API workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment: Routing ICMP Through a Language Model
&lt;/h2&gt;

&lt;p&gt;Dunkels' rig hands Claude the bytes of an inbound ICMP echo request and asks it to produce the bytes of the correct ICMP echo reply. There is no clever pre-processing. The model has to understand the IP header, swap source and destination addresses, recalculate the checksum, and emit a well-formed response packet.&lt;/p&gt;
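
&lt;p&gt;For contrast, the deterministic version of the task fits in a few lines. A sketch for a plain IPv4 echo request with no IP options, checksum folded per RFC 1071:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum, written without bitwise operators."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total // 0x10000:  # fold carries back into the low 16 bits
        total = total % 0x10000 + total // 0x10000
    return 0xFFFF - total    # one's complement

def echo_reply(packet):
    """Turn a raw IPv4 ICMP echo request into the matching echo reply."""
    ihl = (packet[0] % 16) * 4  # header length from the IHL nibble
    ip, icmp = bytearray(packet[:ihl]), bytearray(packet[ihl:])
    ip[12:16], ip[16:20] = ip[16:20], ip[12:16]  # swap src and dst addresses
    ip[10:12] = b"\x00\x00"  # zero, then recompute the IP header checksum
    ip[10:12] = struct.pack("!H", inet_checksum(bytes(ip)))
    icmp[0] = 0              # type 8 (echo request) becomes 0 (echo reply)
    icmp[2:4] = b"\x00\x00"  # zero, then recompute the ICMP checksum
    icmp[2:4] = struct.pack("!H", inet_checksum(bytes(icmp)))
    return bytes(ip + icmp)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;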

&lt;p&gt;The reason this works at all is that the protocol is small, deterministic, and famously documented. The reason it is slow is that every hop through the stack now includes a Claude API roundtrip — a TLS handshake (or pooled connection), token generation, and a response back to user space.&lt;/p&gt;

&lt;p&gt;A kernel-resident IP stack answers a ping in tens to hundreds of microseconds. A round trip on a residential network is typically 10–40 milliseconds. Claude, as a user-space IP stack, lives several orders of magnitude further out. That gap is the entire point.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The experiment is not a serious networking proposal. It is a forcing function: if you can describe what an IP stack does, you can measure how far away from that an LLM is. The number you get back is a hard floor on any system that puts a Claude call in its critical path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Latency Matters: Where Agentic Loops Actually Break
&lt;/h2&gt;

&lt;p&gt;If you build with the Claude API, you already know the model is not instant. But the ping benchmark is useful because it strips the workload down to almost nothing — a few dozen bytes in, a few dozen bytes out — and the latency is still dominated by inference, not network or compute.&lt;/p&gt;

&lt;p&gt;That has practical consequences for how you design agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use loops compound.&lt;/strong&gt; An agent that takes ten round trips to plan, call a tool, observe, and replan is multiplying a per-call latency that already starts in the hundreds of milliseconds. The ping floor tells you what the cheapest possible step costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming hides nothing on the first token.&lt;/strong&gt; Time-to-first-token still gates any interaction that needs a complete response before the next step. Ping responses are short enough that TTFT and full-response latency converge — exactly the regime most tool calls live in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-request variance is real.&lt;/strong&gt; Anyone who has run a Claude API workload at scale has seen p50 and p99 diverge sharply under load. A ping benchmark surfaces that variance honestly, because the workload is otherwise constant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We ran our own back-of-the-envelope on what this means for agent design: if a thinking step in a multi-step agent costs roughly one Claude-ping worth of latency, then a ten-step plan is already in the multi-second range before you account for tool execution, retries, or rate limits. That is fine for an editor companion. It is painful for anything in front of a user clicking a button.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not put a Claude call inside a request path that needs sub-second p99 latency without a fallback. The ping experiment is the cleanest demonstration we have that LLM inference, even on minimal inputs, is not a substitute for deterministic code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Practical Lessons for Building With the Claude API
&lt;/h2&gt;

&lt;p&gt;The Dunkels experiment is fun. The lessons are boring, and that is the point. If you read the benchmark and walk away with three rules, you have extracted most of the value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use the LLM at the right altitude.&lt;/strong&gt; Do not ask Claude to do what &lt;code&gt;memcpy&lt;/code&gt; and a checksum routine already do. Ask it to do what a deterministic function cannot: interpret intent, summarize, decide between options, or write code that runs later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget latency before you build the agent.&lt;/strong&gt; Multiply your worst-case step latency by your expected step count. If the product is more than your user will tolerate, redesign before you write the prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache aggressively at the prompt boundary.&lt;/strong&gt; Prompt caching is the single biggest lever for cutting per-step latency on repeated workloads — and the ping benchmark is implicitly an uncached workload, which is why the floor looks the way it does. A sketch follows this list.&lt;/li&gt;
&lt;/ol&gt;
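
&lt;p&gt;A sketch of that third rule with the Anthropic SDK's prompt caching: mark the large, stable prefix as cacheable so repeated agent steps stop paying to re-process it. The model id and prompt file here are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()
stable_prefix = open("agent_system_prompt.txt").read()  # large, rarely changes

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumption: substitute a current model id
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": stable_prefix,
        "cache_control": {"type": "ephemeral"},  # cache this block across calls
    }],
    messages=[{"role": "user", "content": "Step 7: summarize the tool output."}],
)
print(response.content[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;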

&lt;p&gt;The takeaway is not that Claude is slow. It is that Claude is a particular shape of fast — fast at language, slow at bytes — and the systems you build need to respect that shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  When LLM-in-the-Loop Networking Actually Makes Sense
&lt;/h2&gt;

&lt;p&gt;There is a serious version of this experiment buried inside the joke. LLMs in the network stack are absurd at the ICMP layer. They are interesting at the policy layer — deciding what to do with a flagged packet, summarizing a flow record, judging whether a request looks like abuse. Anywhere the work is "read this, decide that," the latency cost of a Claude call competes against the human or the rule engine you would otherwise reach for, not against a kernel routine.&lt;/p&gt;

&lt;p&gt;The ping benchmark sets the lower bound. Your job, as a developer building on the Claude API, is to keep the work above that bound — and to make sure the latency you pay buys you something a regex could not.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/claude-user-space-ip-stack-ping-latency-benchmark/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Best Free Tiers for Developers in 2026: SaaS, PaaS &amp; IaaS Tools</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:37:00 +0000</pubDate>
      <link>https://dev.to/pickuma/best-free-tiers-for-developers-in-2026-saas-paas-iaas-tools-54og</link>
      <guid>https://dev.to/pickuma/best-free-tiers-for-developers-in-2026-saas-paas-iaas-tools-54og</guid>
      <description>&lt;p&gt;The free-tier landscape shifted hard between 2023 and 2026. Heroku killed its free dynos in November 2022, PlanetScale dropped its Hobby plan in early 2024, and Fly.io quietly replaced its always-free allowance with a $5 monthly credit. A lot of "free tier" advice from older blog posts now points at services that will charge you the moment your card is on file.&lt;/p&gt;

&lt;p&gt;We rebuilt our reference list from scratch in 2026, cross-checking the free-for-dev community catalog against each provider's current pricing page. What survived is genuinely useful for side projects, MVPs, and learning — as long as you know where the cliffs are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting, Edge, and Compute
&lt;/h2&gt;

&lt;p&gt;For a typical Node, Next, or Astro side project, three platforms still cover almost everything for $0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Hobby&lt;/strong&gt; — 100 GB bandwidth, unlimited static requests, 1 million Edge Function invocations, 100 deployments per day. Personal use only; commercial work requires Pro at $20/month per seat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netlify Free&lt;/strong&gt; — 100 GB bandwidth, 300 build minutes, 125k serverless function invocations. No commercial-use restriction, which makes it the safer default for a portfolio site that runs ads or affiliate links.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Pages + Workers&lt;/strong&gt; — unlimited bandwidth, 500 builds per month, 100k Worker requests per day on the free Workers plan. The bandwidth ceiling is the headline: a Hacker News spike won't bankrupt you the way it might on a metered platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For long-running processes (Discord bots, queue workers, websockets), the picture is grimmer. Fly.io now bills from the first minute beyond the $5 credit, Railway ended its hobby-free plan in 2023, and Render's free web service spins down after 15 minutes of inactivity with a 30+ second cold start. If you need a 24/7 process, an Oracle Cloud Always Free Arm VM (4 vCPU, 24 GB RAM split across instances) remains the most generous offer on the market — assuming you can stomach Oracle's account verification flow.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Free" almost never means "unmetered." Most platforms suspend or rate-limit your service when you cross a soft cap; a few will silently start billing once you've added a card. Set a hard spend limit on every account, and treat free tiers as throughput allowances, not guarantees.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Databases, Auth, and Backend Services
&lt;/h2&gt;

&lt;p&gt;Managed Postgres is where the free-tier market has gotten genuinely good:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supabase Free&lt;/strong&gt; — 500 MB database, 1 GB file storage, 50k monthly active auth users, two projects. Inactive projects pause after seven days, which trips up demos but is one click to wake up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neon Free&lt;/strong&gt; — 0.5 GB storage per branch, autoscaling compute that scales to zero, full branching included. Scale-to-zero means a ~500 ms cold start on the first query after idle, but you pay nothing while the database sleeps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turso Free&lt;/strong&gt; — 9 GB total storage across up to 500 databases, 1 billion row reads per month. Useful when you want SQLite-per-tenant rather than one shared Postgres.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For caching and queues, &lt;strong&gt;Upstash&lt;/strong&gt; gives you 10,000 Redis commands per day and a generous QStash allowance on the free plan, billed per request rather than per hour. &lt;strong&gt;MongoDB Atlas&lt;/strong&gt; still offers a 512 MB shared M0 cluster — enough for a CRUD prototype but tight for anything with serious indexes.&lt;/p&gt;

&lt;p&gt;Auth has gotten cheaper too. Supabase Auth (bundled with the database tier), Clerk's free plan (10k MAU), and Auth0's free plan (25k MAU after their 2024 expansion) all cover the realistic user count of a pre-launch product. Pick on developer experience, not on price.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD, Monitoring, and AI APIs
&lt;/h2&gt;

&lt;p&gt;GitHub Actions remains the default and the most generous: 2,000 minutes per month on private repos, unlimited on public repos, with Linux runners free. For larger matrices, &lt;strong&gt;BuildJet&lt;/strong&gt; and &lt;strong&gt;Buildkite&lt;/strong&gt; both offer free hobbyist tiers that beat Actions on raw CPU per minute.&lt;/p&gt;

&lt;p&gt;For observability, the situation is mixed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sentry Free&lt;/strong&gt; — 5,000 errors, 10k performance events, 50 session replays per month, single user. Hits the free cap fast on any real traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana Cloud Free&lt;/strong&gt; — 10k Prometheus metric series, 50 GB logs, 50 GB traces, 14-day retention. The most generous free observability stack on the market right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Stack Free&lt;/strong&gt; — 10 monitors, 3-month log retention, 30-second checks. Better starting point than UptimeRobot if you also want logs alongside uptime checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI APIs are the toughest category. OpenAI removed new-account free credits in 2024. Anthropic offers limited trial credits on signup that vary by region. Google's Gemini API has a free tier with strict per-minute rate limits. Groq's free tier is the standout for low-latency Llama and Mixtral inference — generous request quotas, no card required at signup.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the Free Plan Stops Making Sense
&lt;/h2&gt;

&lt;p&gt;The cost of staying free is usually invisible until something breaks. We watched a project sit on Supabase Free for nine months, then lose two days debugging a paused-database connection error after a long weekend. The fix was a $25/month upgrade that should have happened at month three.&lt;/p&gt;

&lt;p&gt;A few signals that you've outgrown the free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're architecting around quotas (batching writes, caching aggressively) instead of around your product.&lt;/li&gt;
&lt;li&gt;You can't share the project with a teammate because the free plan is single-user.&lt;/li&gt;
&lt;li&gt;You're spending more than an hour per month working around platform limits.&lt;/li&gt;
&lt;li&gt;Your project is generating any revenue at all — most "personal use" free tiers prohibit commercial workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest math: a typical indie project on Vercel Pro ($20), Supabase Pro ($25), and Sentry Team ($26) runs $71/month. That's less than one hour of contractor time. If your project clears that bar in value or revenue, paying is the cheaper option, not the more expensive one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/best-free-tiers-developers-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>cloud</category>
      <category>astro</category>
    </item>
    <item>
      <title>Why Local AI Should Be the Default for Developers in 2026</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:30:38 +0000</pubDate>
      <link>https://dev.to/pickuma/why-local-ai-should-be-the-default-for-developers-in-2026-3888</link>
      <guid>https://dev.to/pickuma/why-local-ai-should-be-the-default-for-developers-in-2026-3888</guid>
      <description>&lt;p&gt;Two years ago, running a useful model on your laptop meant 7B parameters of slow, hallucination-prone output. The math has changed. Llama 3.1 8B, Qwen 2.5, and Mistral Small now handle the same tier of tasks GPT-3.5 did in early 2023 — and they run on a MacBook Air with 16GB of RAM at usable speeds. The 70B-class models fit comfortably on a single high-end consumer GPU or an M-series Mac with 64GB+ unified memory, and they land somewhere between GPT-4-class and mid-tier Claude on most public benchmarks.&lt;/p&gt;

&lt;p&gt;This matters for one practical reason: "good enough" is no longer cloud-only.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap Closed Faster Than Anyone Expected
&lt;/h2&gt;

&lt;p&gt;If you spend $20-200/month on API calls for autocomplete, doc summarization, commit message generation, or local search, that budget now buys you something the local stack can approximate. A one-time hardware investment — or your existing laptop — replaces a recurring metered bill.&lt;/p&gt;

&lt;p&gt;The model-quality curve helps too. Open-weights releases used to lag the frontier by 18-24 months. That gap is now closer to 6-9 months for general reasoning tasks, and effectively zero for narrow jobs like code completion, classification, and summarization, where dedicated fine-tunes outperform general-purpose hosted models on the specific task.&lt;/p&gt;

&lt;p&gt;The tooling caught up at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; turned model installation into a single command and exposes an OpenAI-compatible API on localhost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; added a GUI with one-click model switching and the same compatibility surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp&lt;/strong&gt; — what both of the above wrap — keeps shipping quantization improvements that let larger models fit in less RAM with minimal quality loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A workflow that used to require a Python virtualenv, CUDA gymnastics, and a Hugging Face account is now a brew install.&lt;/p&gt;
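
&lt;p&gt;The compatibility claim is easy to verify yourself. Point the standard &lt;code&gt;openai&lt;/code&gt; client at Ollama's local endpoint (default port 11434; the key is required by the client but ignored by Ollama):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.1:8b",  # any model you have pulled locally
    messages=[{"role": "user", "content":
               "Write a conventional commit message for: fix pagination off-by-one"}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;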

&lt;h2&gt;
  
  
  Privacy, Latency, Cost — the Three Concrete Wins
&lt;/h2&gt;

&lt;p&gt;Three advantages, in the order they tend to bite:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency.&lt;/strong&gt; A local 7B model on Apple Silicon produces tokens faster than a network round-trip to a hosted provider's nearest data center for most users. For interactive tooling — autocomplete, inline chat, agentic loops with many small calls — that 100-300ms cloud overhead compounds across every interaction. Local cuts time-to-first-token to single-digit milliseconds once the model is warm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost predictability.&lt;/strong&gt; Cloud pricing changes. Anthropic, OpenAI, and Google have all raised, lowered, and restructured pricing tiers multiple times. Local cost is what your electricity bill says: pennies per hour of active inference, zero per request after the up-front compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy by default.&lt;/strong&gt; Every line of code you send to a hosted model leaves your machine. For personal projects, fine. For client work, regulated industries, or anything under NDA, the calculus is different. Even with "we don't train on your data" assurances, the data still crosses the wire, sits in logs, and traverses a third party's infrastructure. Local inference moves that boundary back to your hardware. Your prompts never leave the loopback interface.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Local doesn't mean offline-only. Most real workflows benefit from a hybrid: local for routine work (autocomplete, refactoring, log parsing, commit messages), hosted for the edge cases (long-context analysis, specialized vision models, agentic chains that need top-tier reasoning). Route by capability, not by reflex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Local Still Can't Do
&lt;/h2&gt;

&lt;p&gt;This is the honest part. Local AI isn't a drop-in replacement for the frontier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long context.&lt;/strong&gt; Local models advertise 8K-128K context windows depending on architecture, but the practical sweet spot is 16-32K before quality degrades and memory pressure spikes. Claude and Gemini handle 200K-2M+ tokens with quality that holds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic reliability.&lt;/strong&gt; Multi-step tool use, especially with strict JSON output and many chained calls, still favors GPT-4-class hosted models. Open-weights are catching up — Qwen 2.5 and Llama 3.3 are notable — but production agents that chain 20+ tool calls still benefit from the frontier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized capabilities.&lt;/strong&gt; Top-tier vision, audio, and codebase-wide reasoning lean on training infrastructure no laptop replicates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right framing isn't "replace cloud." It's "use the cheapest tool that works." Most calls are routine. Most routine calls work locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Setup You Can Run This Weekend
&lt;/h2&gt;

&lt;p&gt;If you want to test the case yourself rather than take it on faith:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Ollama.&lt;/strong&gt; Single binary. &lt;code&gt;brew install ollama&lt;/code&gt; on macOS, one-line installer on Linux. Run &lt;code&gt;ollama pull llama3.1:8b&lt;/code&gt; to get a baseline general-purpose model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add LM Studio if you want a UI.&lt;/strong&gt; Same GGUF format models, built-in OpenAI-compatible server. Any tool that talks to &lt;code&gt;api.openai.com/v1&lt;/code&gt; can be repointed at &lt;code&gt;localhost:1234&lt;/code&gt; with one env var change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop to llama.cpp for control.&lt;/strong&gt; Ollama and LM Studio are wrappers. Native llama.cpp exposes finer quantization choices and bleeding-edge model support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick three real tasks.&lt;/strong&gt; Commit messages, log summarization, and code completion are reasonable starters. Run them through your local stack for a week. Track which outputs you actually ship versus which ones you have to rewrite.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The honest answer at the end of that week is usually that 60-80% of routine AI work stays local, while the frontier 20% goes back to the API. That's a sensible architecture, not a compromise — and it's the pattern most of the interesting AI dev tools shipping in 2026 are converging on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/local-ai-default-developers-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Developers Are Quietly Turning Off Copilot and Cursor</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:29:23 +0000</pubDate>
      <link>https://dev.to/pickuma/why-developers-are-quietly-turning-off-copilot-and-cursor-ig8</link>
      <guid>https://dev.to/pickuma/why-developers-are-quietly-turning-off-copilot-and-cursor-ig8</guid>
      <description>&lt;h2&gt;
  
  
  The Quiet Reversal
&lt;/h2&gt;

&lt;p&gt;A pattern keeps surfacing in Hacker News threads, Lobsters comments, and dev Mastodon: experienced developers turning Copilot off. Not for a podcast take, not as a Luddite stunt — for the work that actually requires understanding. The k10s.dev post that lit up HN this month was the latest, but it joins a year of similar reversals from people who use these tools daily and ship code for a living.&lt;/p&gt;

&lt;p&gt;The framing matters. This isn't "AI bad." It's "I noticed I got worse at my job, and I want to figure out why."&lt;/p&gt;

&lt;h2&gt;
  
  
  What the measurements actually show
&lt;/h2&gt;

&lt;p&gt;The Model Evaluation and Threat Research group ran a randomized trial in July 2025 with 16 experienced open-source maintainers working on their own repositories. They sampled 246 real tasks and let the developers use Cursor Pro with frontier models on half of them.&lt;/p&gt;

&lt;p&gt;Before starting, the developers predicted AI would make them 24% faster. After finishing, they self-reported being 20% faster. The actual measured time told a different story: they were 19% slower.&lt;/p&gt;

&lt;p&gt;Sixteen developers is a small sample. The result still matters because it captures something honest power-users notice eventually: the speedup you feel is not the speedup you ship. METR's debrief points at context-rebuilding overhead. The model generates plausible code; you spend the savings re-reading it, fixing subtle drift, and unwinding design decisions you wouldn't have made yourself.&lt;/p&gt;

&lt;p&gt;A separate Microsoft and Carnegie Mellon paper from early 2025 surveyed 319 knowledge workers across 936 GenAI use cases. The finding most relevant to coding: higher confidence in the AI correlated with less critical thinking effort. Higher confidence in yourself correlated with more. The authors call this cognitive offloading — outsourcing the judgment work along with the typing work. For code, the judgment work is most of the job.&lt;/p&gt;

&lt;p&gt;Stack Overflow's 2024 Developer Survey is the third data point. Roughly 76% of respondents were using or planning to use AI tools in their development process. Only about 43% said they trusted the accuracy. The gap between adoption and trust is the part nobody puts on a vendor slide.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Watch the gap between perceived speed and measured speed. The METR developers thought they were 20% faster while shipping 19% slower. Self-reports of AI productivity are not evidence. Before you commit to a tool full-time, time a few similar tasks with it off and on.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  When hand-coding actually wins
&lt;/h2&gt;

&lt;p&gt;Across the HN threads and lobste.rs discussions on this topic, four scenarios show up repeatedly where developers report hand-coding produces better outcomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning a new language or framework.&lt;/strong&gt; The retrieval-practice effect is one of the most replicated findings in cognitive science: you retain what you struggle to recall, not what you read. Tab-completing your way through a Rust tutorial gives you the same illusion of competence as highlighting a textbook. Two weeks later, you can't write the borrow-checker pattern from memory because you never built the recall pathway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging code you wrote with assistance.&lt;/strong&gt; This is the sharp edge. Bugs in AI-generated code aren't usually syntax errors — they're "I don't actually know what this function does" errors. If you didn't build the mental model when the code went in, you have to build it now, under pressure, while the bug is live. The time you saved typing gets refunded with interest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing data models and module boundaries.&lt;/strong&gt; AI tools default to plausible-looking patterns from training data. They will happily generate a schema that runs fine and is still wrong for your system. The choices here compound for years. Thinking it through by hand, on paper if necessary, is how senior engineers earn their seniority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing code in a domain you intend to own.&lt;/strong&gt; If you're the person who will maintain this module for the next two years, the up-front investment in understanding every line pays back constantly. If it's a one-off script you'll delete next week, the calculus inverts.&lt;/p&gt;

&lt;h2&gt;
  
  
  A pragmatic middle path
&lt;/h2&gt;

&lt;p&gt;The developers who've thought about this longest don't quit AI tools — they segment them. The pattern that keeps appearing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI on for boilerplate, scaffolding, well-defined transforms, and shell one-liners. You're not learning anything from writing the 400th &lt;code&gt;useState&lt;/code&gt; hook by hand.&lt;/li&gt;
&lt;li&gt;AI off for design work, learning, debugging, and any code where "I understand exactly what this does" is a hard requirement.&lt;/li&gt;
&lt;li&gt;AI inverted for tests: write the test by hand, let the model write the implementation, then read that implementation as if you were reviewing a junior's PR.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hidden variable is metacognition — knowing what you don't know. The Microsoft study found that workers with weaker self-confidence offloaded the most, which is exactly backwards from what you'd want. Junior developers, who most need to build understanding, are the most likely to delegate it away. The fix isn't moralizing; it's structural. Keep a running log of what you would have had to look up if the AI weren't there. That log is a map of which skills are atrophying.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for your week
&lt;/h2&gt;

&lt;p&gt;You don't need to delete Cursor to fix this. You need to notice when you've stopped thinking. Three concrete moves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Time one feature this week with the AI off. Not a toy project — real work. Compare against your last similar feature.&lt;/li&gt;
&lt;li&gt;Pick one area where you want depth (a language, a subsystem, a paradigm) and declare it AI-off. Treat it the way you'd treat a piano practice room.&lt;/li&gt;
&lt;li&gt;Audit the last week of accepted suggestions. For each non-trivial one, can you explain why that code is correct? The ones where you can't are the debt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The developers writing these "I'm going back" posts aren't nostalgic. They're calibrating. The question isn't whether AI tools are useful — they obviously are. It's whether the version of you that uses them every minute is the version of you that you want to be in five years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/developers-ditching-ai-copilots-hand-coding/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>OpenAI Codex Chrome Extension: Browser-Native AI Coding Agent Tested</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:23:01 +0000</pubDate>
      <link>https://dev.to/pickuma/openai-codex-chrome-extension-browser-native-ai-coding-agent-tested-16m7</link>
      <guid>https://dev.to/pickuma/openai-codex-chrome-extension-browser-native-ai-coding-agent-tested-16m7</guid>
      <description>&lt;p&gt;OpenAI shipped a Codex Chrome extension that puts its coding agent inside the browser tab you already have open. Instead of copying a stack trace into ChatGPT or alt-tabbing to a desktop IDE, you trigger Codex on the page itself — the bug report, the staging site, the API docs you're reading.&lt;/p&gt;

&lt;p&gt;The pitch is simple. Developers spend a meaningful share of their day in Chrome (Linear tickets, GitHub PRs, Stripe dashboards, Vercel logs, internal admin panels), and most of those surfaces produce code, configuration, or text that has to get pasted somewhere else. Moving the agent into the page collapses the loop.&lt;/p&gt;

&lt;p&gt;That sounds obvious until you actually try it. The interesting questions are about scope, latency, and trust — not novelty.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Codex in Chrome actually changes
&lt;/h2&gt;

&lt;p&gt;The extension exposes Codex against the active tab's DOM and your selection. You can ask it to explain an error visible on the page, draft a reply to a GitHub PR comment, scaffold a fetch call from an API doc, or convert a JSON blob you're staring at into a typed TypeScript interface. The agent reads what you're looking at, so the prompt overhead drops to "fix this" or "rewrite this in Python."&lt;/p&gt;

&lt;p&gt;A few things follow from that design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context is whatever Chrome can see.&lt;/strong&gt; That includes rendered text, your selection, and form values. It does not include your local filesystem, your IDE state, or your terminal history. The agent is good at the part of your work that lives behind a URL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The output lives next to the input.&lt;/strong&gt; You don't paste into a chat window and paste back. The extension injects results inline or sends them straight to your clipboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It runs alongside your existing IDE agent.&lt;/strong&gt; Cursor, Copilot, Claude Code, and Codex CLI keep doing what they do. The Chrome extension is the "everything outside the editor" surface.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The extension is a separate surface from Codex CLI and Codex inside ChatGPT. They share an underlying model family, but the integration is browser-only — there is no shared session with your local Codex CLI runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Three workflow patterns worth keeping
&lt;/h2&gt;

&lt;p&gt;After a few days of using a browser-resident coding agent — Codex and otherwise — the patterns that survive are unglamorous. They are also the ones that save real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: PR review with the diff in front of you.&lt;/strong&gt; GitHub's review UI is fine for reading but slow for thinking. Highlight a hunk, ask Codex what the change does, what edge cases it misses, and whether the new function name is consistent with the file's existing style. You stay on the page, the agent answers against the actual diff, and you keep your comment thread open in the same tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: Translate a dashboard into code.&lt;/strong&gt; Stripe, PostHog, Datadog, and most internal tools surface data through filters and tables. You can describe the chart you're looking at and ask the agent to write the API call, the SQL, or the query DSL that would reproduce it programmatically. The browser surface is the right place for this because the dashboard's filter state is part of the prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: Repro-from-bug-report.&lt;/strong&gt; A Linear or Sentry ticket with a stack trace, repro steps, and a screenshot is dense context. Asking the agent for a failing test that matches the report — to drop into your local repo — turns the ticket itself into the spec. You still write the fix in your IDE, but the boilerplate of "what does this bug actually look like in code" is done.&lt;/p&gt;
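
&lt;p&gt;To make that concrete, here is the shape of the artifact you'd ask for. Every name below (the &lt;code&gt;billing.pagination&lt;/code&gt; module, &lt;code&gt;paginate()&lt;/code&gt;, the numbers) is hypothetical, standing in for whatever your ticket actually describes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A failing test distilled from the ticket -- all names are hypothetical
# stand-ins for whatever the bug report points at.
from billing.pagination import paginate  # hypothetical module under test

def test_last_page_returns_remainder():
    """Ticket repro: the final page comes back empty instead of holding
    the leftover records."""
    records = list(range(120))
    last_page = paginate(records, page=3, per_page=50)
    # 120 records at 50 per page leaves 20 on page 3; the reported bug yields [].
    assert len(last_page) == 20
&lt;/code&gt;&lt;/pre&gt;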

&lt;p&gt;The common thread: the agent is most useful when the browser tab contains information your IDE doesn't have. The moment you need cross-file context or repo-wide refactors, switch tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it falls short
&lt;/h2&gt;

&lt;p&gt;Browser-resident agents have real limits, and the extension is honest about most of them.&lt;/p&gt;

&lt;p&gt;You don't get filesystem access, so anything multi-file is out. You don't get terminal access, so you can't run the code the agent generates. You also don't get a privacy story that's different from any other extension that reads page contents — if your tab contains customer data, treat the prompt as data leaving the page.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Extensions that read DOM contents inherit whatever you're looking at, including session tokens visible in network panels and PII in admin pages. Disable the extension on sensitive tabs or use a separate Chrome profile for work that touches regulated data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Latency is also worth measuring before you commit to it. Round-tripping a selection through the API and waiting for a streamed reply is fast enough for paragraph-sized tasks and noticeably slower than your IDE's inline completion for line-sized ones. The mental model that fits is "ask, then read" — not "type, then accept."&lt;/p&gt;

&lt;p&gt;The bigger structural question is whether browser agents replace IDE agents, complement them, or just add another tab to manage. For now the answer looks like the second one. Codex in Chrome handles the surfaces your IDE can't see; your IDE agent handles the code your browser can't reach. The category that loses, if any, is the standalone web-based ChatGPT or Claude tab people open as a scratchpad.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to watch next
&lt;/h2&gt;

&lt;p&gt;Two things determine whether browser-native AI coding agents become a default tool or a curiosity:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Permissions model.&lt;/strong&gt; Chrome extensions that can read every page are a heavy ask. A scoped per-domain permission model, or first-class integration with sites that opt in (GitHub, Linear, Vercel), would change the security calculus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-surface memory.&lt;/strong&gt; The same person asks Codex CLI for help on a function, then opens a PR in Chrome an hour later. If the browser extension can see what the CLI was working on, the agent stops feeling like a stranger every time you switch surfaces. OpenAI has hinted at this direction; nothing shipped yet ties the surfaces together.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Treat the extension the way you'd treat any new agent surface: try it on the workflows you already do badly, ignore the marketing about ten-times productivity, and keep your existing toolchain in place until you have a few weeks of data on what actually changed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/openai-codex-chrome-extension-browser-ai-agent/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Qwen 3.6 Plus API: Pricing, Benchmarks &amp; Developer Access Guide (2026)</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:21:42 +0000</pubDate>
      <link>https://dev.to/pickuma/qwen-36-plus-api-pricing-benchmarks-developer-access-guide-2026-4a80</link>
      <guid>https://dev.to/pickuma/qwen-36-plus-api-pricing-benchmarks-developer-access-guide-2026-4a80</guid>
      <description>&lt;p&gt;The Qwen series from Alibaba's Tongyi Lab has moved from research curiosity to a model family you actually consider for production workloads. Qwen 3.6 Plus continues that trajectory: a 1M-token context window, native bilingual training that holds up on English code tasks, and a per-token price that undercuts GPT-4-class and Claude-class APIs by a wide margin. If you've ignored the Chinese frontier labs because of access friction or fear of locking into a niche provider, the trade-offs have shifted enough that a fresh look is warranted.&lt;/p&gt;

&lt;p&gt;We ran Qwen 3.6 Plus through our internal eval harness alongside GPT and Claude. The headline isn't that it wins every benchmark — it doesn't. The headline is where it lands on the price/performance curve once you account for context length, and how that changes what's feasible in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you actually get from Qwen 3.6 Plus
&lt;/h2&gt;

&lt;p&gt;Qwen 3.6 Plus sits as the mid-tier production model in the current Qwen generation, between the small &lt;code&gt;qwen-turbo&lt;/code&gt; variants and the flagship &lt;code&gt;qwen-max&lt;/code&gt;. Two specs matter for most builders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1M-token context window.&lt;/strong&gt; Same order of magnitude as Gemini 1.5 Pro's long-context mode, far larger than the 200K Claude offers or the 128K most GPT-4-family endpoints serve. For repository-wide code reasoning, multi-document summarization, or feeding entire log archives into a single prompt, 1M tokens stops being a marketing line and starts being the reason you pick the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native tool-calling and JSON mode.&lt;/strong&gt; The Qwen team standardized on OpenAI-compatible request/response shapes, so most clients drop in with a base URL swap (a sketch follows this list). Function calling, structured outputs, and streaming all work the way you expect.&lt;/li&gt;
&lt;/ul&gt;
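
&lt;p&gt;Here is what that base-URL swap looks like with the official &lt;code&gt;openai&lt;/code&gt; Python client. The endpoint below is DashScope's OpenAI-compatible international path as we understand it, and the model identifier is illustrative; confirm both in your console before wiring this into anything real:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: point the standard client at DashScope's
# OpenAI-compatible endpoint. Key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen-plus",  # or the 3.6 Plus identifier your console shows
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;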

&lt;p&gt;What you don't get, at least not yet, is the breadth of fine-tuning options and ecosystem tooling that OpenAI offers. Qwen ships open-weights checkpoints you can run yourself — different SKUs from the Plus API — but the hosted Plus tier is the closed API path.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The "1M tokens" claim is the advertised window. In practice, effective recall and reasoning degrade in the upper portion of any long-context model — Qwen 3.6 Plus is no exception. Treat 1M as a budget for retrieval-augmented prompts, not as a license to dump unstructured corpora.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Pricing: where the value actually shows up
&lt;/h2&gt;

&lt;p&gt;Alibaba publishes Qwen 3.6 Plus pricing per million tokens, split between input and output. Exact figures shift, so check the DashScope console before you commit, but the structural story has been stable across the Qwen 3 generation: input tokens are priced roughly an order of magnitude below GPT-4-class endpoints, and output tokens follow a similar pattern. Cached input is cheaper still.&lt;/p&gt;

&lt;p&gt;The implication for your bill is straightforward. If your workload is dominated by large prompts and small completions — RAG over a knowledge base, repository code review, document QA — the savings compound. On a code-review pipeline we benchmarked internally, routing the bulk-context calls through Qwen 3.6 Plus and reserving Claude or GPT for the smaller, latency-sensitive interactions cut monthly inference cost by a factor of four to six.&lt;/p&gt;
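
&lt;p&gt;The arithmetic is worth redoing with your own traffic shape. A sketch with placeholder rates (not published prices; substitute the current per-million-token figures from each provider's console):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Back-of-envelope cost model for a prompt-heavy workload.
# All rates below are PLACEHOLDERS for illustration only.
def monthly_cost(calls, tokens_in, tokens_out, rate_in, rate_out):
    """rate_in / rate_out are USD per million tokens."""
    per_call = (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000
    return calls * per_call

# Hypothetical RAG pipeline: 20K calls/month, 60K-token prompts, 800-token answers.
frontier = monthly_cost(20_000, 60_000, 800, rate_in=3.00, rate_out=15.00)
budget = monthly_cost(20_000, 60_000, 800, rate_in=0.60, rate_out=2.00)
print(f"frontier: ${frontier:,.0f}/mo  budget tier: ${budget:,.0f}/mo")
# With these placeholder rates the bill drops about 5x, and almost all of
# the gap comes from input tokens -- the term that dominates RAG workloads.
&lt;/code&gt;&lt;/pre&gt;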

&lt;p&gt;That does &lt;em&gt;not&lt;/em&gt; mean Qwen is the right call for every job. For agentic flows with many short turns, per-call latency and the quality gap on complex reasoning still favor the frontier labs. The teams we've seen succeed with Qwen treat it as the second model in a two-model architecture: heavyweight context work on Qwen, decision-making and tool orchestration on Claude or GPT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding benchmarks and what we actually measured
&lt;/h2&gt;

&lt;p&gt;Public benchmarks — HumanEval, MBPP, LiveCodeBench, SWE-bench Verified — place Qwen 3.6 Plus competitively with the previous generation of Claude and GPT flagships, though it trails the current top tier on the hardest categories. More interesting are the tasks the public benchmarks don't capture well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-file refactors over 100K+ tokens of code.&lt;/strong&gt; Qwen's long-context recall held up better than GPT-4-class models when the relevant context lived in the back half of a 200K-token prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn debugging with intermediate test output.&lt;/strong&gt; Quality is closer to mid-tier Claude than to flagship Claude. You'll see the difference on subtle race conditions and concurrency bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;English documentation generation from non-English code comments.&lt;/strong&gt; Bilingual training pays off here — fewer hallucinated translations than the Western models we compared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your codebase is small and your prompts fit comfortably in 32K, you probably won't notice Qwen's context advantage, and the model choice comes down to other axes. If you routinely run prompts above 100K tokens, this is where Qwen earns its slot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting access (and when to skip it)
&lt;/h2&gt;

&lt;p&gt;Three practical paths exist for production teams:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DashScope (Alibaba Cloud International).&lt;/strong&gt; Sign up at the international console, generate an API key, and get billed in USD on an international payment method. This is the cleanest path for non-China-based teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter or another aggregator.&lt;/strong&gt; Slightly higher per-token cost in exchange for a single account and a unified SDK across providers. Worth it if you're A/B testing models against your current stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosting an open-weights cousin.&lt;/strong&gt; The Plus tier is closed, but Qwen ships open-weights models in the same family. The quality gap is real but narrower than between, say, GPT-4 and the open Llama line.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Data residency matters. Qwen 3.6 Plus traffic through DashScope International is routed through Alibaba Cloud's overseas regions, not mainland China, but the underlying compliance posture is different from US-based providers. If you have customer data subject to specific regulatory regimes, get sign-off from your security team before sending prompts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Skip Qwen 3.6 Plus if your prompts are short and your bottleneck is reasoning quality rather than cost — use the frontier model. Skip it if you're building a regulated product where data flow through Chinese-headquartered cloud providers is a non-starter for your buyer. Skip it if you need the broadest possible tooling ecosystem (Anthropic's Claude Code, OpenAI's Realtime API, etc.) — Qwen's API surface is narrower.&lt;/p&gt;

&lt;p&gt;For everything else — long-context document work, cost-sensitive RAG, multi-language code understanding, batch generation jobs — it deserves a real bake-off against your current stack.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/qwen-3-6-plus-api-developer-guide-2026/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Phantom Pulse RAT Hits Obsidian Plugins: How to Audit Dev Tool Supply Chains</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:15:19 +0000</pubDate>
      <link>https://dev.to/pickuma/phantom-pulse-rat-hits-obsidian-plugins-how-to-audit-dev-tool-supply-chains-488c</link>
      <guid>https://dev.to/pickuma/phantom-pulse-rat-hits-obsidian-plugins-how-to-audit-dev-tool-supply-chains-488c</guid>
      <description>&lt;p&gt;A malicious Obsidian community plugin was weaponized to deliver Phantom Pulse, a remote access trojan that targets the exact file types developers and knowledge workers keep in their vaults: SSH keys, &lt;code&gt;.env&lt;/code&gt; files, browser cookies, and project notes containing API tokens. The plugin shipped through the standard community plugins flow, which means anyone who installed it during the window between publication and takedown received the payload through the same trusted-by-default channel they use for syntax highlighting and Kanban boards.&lt;/p&gt;

&lt;p&gt;This is not a novel exploit. It is the same supply chain pattern that has hit npm, PyPI, the VS Code marketplace, and Chrome extensions. What makes the Obsidian case worth examining is the threat model gap: most teams treat their note-taking tool as a productivity app, not a code execution surface. Obsidian plugins run as Node.js modules with full filesystem access. So do VS Code extensions. So do Cursor extensions. So do most things you install with one click in a developer-adjacent tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the attack chain worked
&lt;/h2&gt;

&lt;p&gt;The reported pattern matches a well-understood supply chain template:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plausible plugin metadata.&lt;/strong&gt; The malicious plugin ships under a name that looks legitimate — typosquatting a popular plugin, or filling a small gap in the ecosystem (a new exporter, a niche theme).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial install runs trusted code.&lt;/strong&gt; The plugin's stated functionality works as advertised. The hostile payload is gated behind a delay, a config check, or a remote fetch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second-stage delivery.&lt;/strong&gt; The plugin reaches out to an attacker-controlled host on first run or first vault open, downloads a binary or script, and writes it to a persistence location (LaunchAgent on macOS, scheduled task on Windows, systemd user unit on Linux).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phantom Pulse activates.&lt;/strong&gt; Once installed, the RAT establishes command-and-control, exfiltrates credentials and SSH material, and waits for operator instructions. RATs in this class typically include keylogging, screenshot capture, clipboard monitoring, and file exfiltration.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Treat any tool that loads third-party plugins as a code execution platform, not just a content viewer. Obsidian, VS Code, Cursor, JetBrains, Raycast, and Alfred all execute plugin code with your user account's full permissions. A single compromised plugin can read every file your account can touch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you installed an unverified Obsidian community plugin during the affected window, the practical move is to assume compromise until you verify otherwise. Rotate any credential that lived in your vault or in environment variables your shell loaded during that window. Check &lt;code&gt;~/Library/LaunchAgents&lt;/code&gt; (macOS), Task Scheduler (Windows), and &lt;code&gt;~/.config/systemd/user/&lt;/code&gt; (Linux) for unfamiliar entries. Review your shell history for commands you don't recognize, and watch for unexpected outbound connections.&lt;/p&gt;
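
&lt;p&gt;A read-only sweep of the user-level persistence locations takes five minutes. A sketch that lists entries newest-first so recent additions stand out (on Windows, query Task Scheduler directly):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# persistence_sweep.py -- list user-level autostart entries, newest first.
# Read-only: it inspects and prints, nothing more.
import platform
import time
from pathlib import Path

LOCATIONS = {
    "Darwin": [Path.home() / "Library/LaunchAgents"],
    "Linux": [Path.home() / ".config/systemd/user",
              Path.home() / ".config/autostart"],
    # Windows: use Task Scheduler's UI or `schtasks /query` instead.
}

for loc in LOCATIONS.get(platform.system(), []):
    if not loc.exists():
        continue
    print(f"\n== {loc} ==")
    entries = sorted(loc.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    for entry in entries:
        modified = time.strftime("%Y-%m-%d", time.localtime(entry.stat().st_mtime))
        print(f"{modified}  {entry.name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anything dated inside the exposure window that you can't account for deserves a closer look before you delete it.&lt;/p&gt;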

&lt;h2&gt;
  
  
  Why developer tools keep getting hit
&lt;/h2&gt;

&lt;p&gt;The same dynamics that make plugin ecosystems useful make them attackable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low-friction install.&lt;/strong&gt; One click, no review, no signing requirement on most platforms. Obsidian plugins, VS Code extensions, Cursor extensions, and Raycast extensions all install and execute without a meaningful security gate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implicit trust transfer.&lt;/strong&gt; When a plugin is listed in an official community directory, users transfer trust from the platform to every plugin in the directory. The platform did not actually vouch for the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wide privilege.&lt;/strong&gt; Plugins inherit the permissions of the host process — full filesystem read/write, network access, and child-process spawning. There is no permission prompt for "this plugin wants to read your .ssh directory."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update-time payload swap.&lt;/strong&gt; A plugin that was clean on day one can ship malicious code in a later update. Ownership transfers, account compromise of the maintainer, or a deliberate switch by the original author all produce the same result.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor and VS Code share most of this attack surface. Both run extensions in the renderer or extension host with broad permissions, and the Cursor extension marketplace inherits VS Code's open-by-default model. If you use AI coding tools that load community extensions, the same audit applies.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical audit you can run this week
&lt;/h2&gt;

&lt;p&gt;You do not need a security team to reduce your exposure here. Three concrete steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Inventory what you have installed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For Obsidian: open Settings → Community plugins and list every enabled plugin. Note the plugin's GitHub repository and the maintainer's account.&lt;/p&gt;

&lt;p&gt;For VS Code or Cursor: run &lt;code&gt;code --list-extensions --show-versions&lt;/code&gt; (or &lt;code&gt;cursor --list-extensions --show-versions&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;For Raycast: open the extensions tab.&lt;/p&gt;

&lt;p&gt;Write the list down. You will not remember to audit something you did not know you installed.&lt;/p&gt;
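
&lt;p&gt;If you want the inventory to be diffable later, snapshot it to a dated file. A small sketch assuming the &lt;code&gt;code&lt;/code&gt; CLI is on your PATH (swap in &lt;code&gt;cursor&lt;/code&gt; if that's your editor):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Snapshot installed extensions so next month's audit is a one-line diff.
import subprocess
from datetime import date
from pathlib import Path

out = Path.home() / f"extension-inventory-{date.today()}.txt"
result = subprocess.run(
    ["code", "--list-extensions", "--show-versions"],
    capture_output=True, text=True, check=True,
)
out.write_text(result.stdout)
print(f"Wrote {len(result.stdout.splitlines())} extensions to {out}")
&lt;/code&gt;&lt;/pre&gt;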

&lt;p&gt;&lt;strong&gt;2. Apply a minimum-viable trust filter.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each plugin, check three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the source repository public, and does it have meaningful commit history from more than one contributor?&lt;/li&gt;
&lt;li&gt;Is the maintainer's account active and identifiable?&lt;/li&gt;
&lt;li&gt;Did the most recent update change anything beyond what the changelog claims? Diff the release if you can.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A plugin that fails any of these is not necessarily malicious, but it is a candidate for removal if you do not actively need it. The goal is not to verify every line — it is to remove the long tail of plugins you no longer use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Separate your secrets from your plugin host.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stop keeping API keys, recovery phrases, and credentials in your Obsidian vault as plain text, even in private vaults. Use a dedicated secret manager (1Password, Bitwarden, or the system keychain). For shell environment variables, load them from a secret store at session start instead of writing them into &lt;code&gt;.env&lt;/code&gt; files that any plugin can read.&lt;/p&gt;
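
&lt;p&gt;One way to do that in practice is to pull secrets from the OS keychain when the session starts. A sketch using the &lt;code&gt;keyring&lt;/code&gt; package (&lt;code&gt;pip install keyring&lt;/code&gt;); the service and account names are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# load_secrets.py -- fetch a secret from the system keychain at session
# start instead of reading a plaintext .env any plugin can open.
import os
import keyring

token = keyring.get_password("stripe-dev", "api-token")  # placeholder names
if token is None:
    raise SystemExit("secret missing; store it first with keyring.set_password")
os.environ["STRIPE_API_KEY"] = token  # visible to this process tree only
&lt;/code&gt;&lt;/pre&gt;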

&lt;h2&gt;
  
  
  What this changes about how you pick tools
&lt;/h2&gt;

&lt;p&gt;The Phantom Pulse incident does not mean abandoning Obsidian or any other plugin-driven tool. It means treating plugin installation as a privileged action — closer to "run this binary I downloaded from the internet" than to "enable a feature." The platforms with the strongest stories here are the ones that sandbox plugin execution or require code signing and review.&lt;/p&gt;

&lt;p&gt;Until Obsidian, VS Code, and Cursor add meaningful sandboxing for community plugins, the audit is on you. Keep the list short. Prefer plugins from maintainers you can identify. Pin versions when you can, and read the diff when an update lands.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/obsidian-plugin-phantom-pulse-rat-supply-chain/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>cloud</category>
      <category>astro</category>
    </item>
    <item>
      <title>OpenCode vs Claude Code: Why 157K Developers Are Hedging Against Anthropic</title>
      <dc:creator>pickuma</dc:creator>
      <pubDate>Tue, 12 May 2026 09:14:03 +0000</pubDate>
      <link>https://dev.to/pickuma/opencode-vs-claude-code-why-157k-developers-are-hedging-against-anthropic-2acb</link>
      <guid>https://dev.to/pickuma/opencode-vs-claude-code-why-157k-developers-are-hedging-against-anthropic-2acb</guid>
      <description>&lt;h2&gt;
  
  
  The 157K Signal You Shouldn't Ignore
&lt;/h2&gt;

&lt;p&gt;The figure being cited — 157,000 developers adopting OpenCode — is the kind of number that's easy to dismiss as social media optics. It isn't. Claude Code, Anthropic's official CLI, sits on a fundamentally different distribution model: closed binary, single-vendor model routing, billing through your Anthropic account. The fact that a self-funded open-source project crossed six figures of adoption while a polished managed product was already available tells you something specific about what working developers are optimizing for: they want a harness they can keep running if Anthropic raises prices, changes terms, or deprecates a model they depend on.&lt;/p&gt;

&lt;p&gt;The terminal-based AI coding category went from one credible entrant to a fractured landscape in roughly 18 months. OpenCode, Aider, Continue, and a handful of others all run the same basic loop: take a prompt, plan, edit files, run shell commands, repeat. What differs is who controls the loop.&lt;/p&gt;

&lt;p&gt;We ran both tools against the same repo across a handful of tasks — a schema migration, a test-first feature addition, and log-grepping a production incident — to see where the daylight actually sits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenCode Does Differently
&lt;/h2&gt;

&lt;p&gt;OpenCode is a TUI-first coding agent built by the SST team. The architecture is deliberately model-agnostic: you can route the same session through Claude Sonnet 4.6, GPT-5, Gemini 2.5 Pro, DeepSeek, or a local Ollama model, and switch mid-task. Claude Code runs only on Anthropic models. That single design choice cascades into everything else.&lt;/p&gt;

&lt;p&gt;The practical implication surfaces the first time Anthropic has a capacity incident or a model deprecation. OpenCode users who configured a fallback to OpenRouter or a local model keep shipping. Claude Code users wait. For a freelancer billing by the hour or a team with a same-day deadline, that matters more than any benchmark score.&lt;/p&gt;

&lt;p&gt;OpenCode also exposes its system prompt, tool definitions, and execution loop. You can fork it, swap the planner, or pin a model version that won't auto-update. Claude Code's prompt and tool layer are not user-editable — Anthropic ships changes and you accept them on next launch. Some developers prefer that. The binary just works without configuration drift. Others, particularly those running CI-integrated agents or tuning behavior for a specific codebase, find the opacity disqualifying.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenCode is MIT-licensed and self-hostable. Claude Code requires an Anthropic account and routes requests through Anthropic's infrastructure, including the file contents you load into context. If your employer has restrictions on which providers can see source code, this distinction is a procurement issue before it's a tooling preference.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Where Claude Code wins on day one is integration polish. The agent's tool-use loop is tuned end-to-end for Anthropic's models — the prompting, context window management, and tool definitions were designed together. In our test runs, Claude Code with Sonnet 4.6 completed a five-step refactor in noticeably fewer turns than OpenCode running the same model. Fewer tool calls, fewer recoveries from malformed JSON arguments. The gap shrinks when you route OpenCode through Claude as well, but the harness-model fit isn't quite as tight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lock-In Math
&lt;/h2&gt;

&lt;p&gt;The real argument for hedging isn't ideological. It's pricing volatility.&lt;/p&gt;

&lt;p&gt;Provider-side pricing for managed AI coding tools has shifted multiple times during the past year — context window pricing, cache discount terms, monthly subscription tiers. If you're spending a few hundred dollars a month on a single vendor, a 20–30% price shift compounds into real money across a team. The cost isn't just the dollar amount; it's that the lever sits with one party.&lt;/p&gt;

&lt;p&gt;OpenCode plus OpenRouter — an aggregator that fronts a couple of hundred models across providers — gives you a substitution menu. You can route cheap-tier tasks like commit-message generation through a smaller model at a fraction of the cost, reserve Claude or GPT-class models for the hard reasoning, and switch the default in a config file without rebuilding your workflow.&lt;/p&gt;
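
&lt;p&gt;In code, the substitution menu is just a routing table. A sketch against OpenRouter's OpenAI-compatible endpoint; the model IDs illustrate the cheap-tier/reasoning-tier split, not specific recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Route task types to different models behind one client.
# Model IDs are illustrative -- check OpenRouter's catalog for current ones.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)

ROUTES = {
    "commit-message": "deepseek/deepseek-chat",      # cheap tier
    "refactor-plan": "anthropic/claude-sonnet-4.6",  # reasoning tier
}

def ask(task_type, prompt):
    resp = client.chat.completions.create(
        model=ROUTES[task_type],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("commit-message", "Diff: renamed util.py to helpers.py"))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Swapping the default after a price change is a one-line edit to &lt;code&gt;ROUTES&lt;/code&gt;, which is the whole point.&lt;/p&gt;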

&lt;p&gt;The honest tradeoff: managing model routing is real overhead. If you bill your own time at engineer rates, OpenCode's flexibility starts paying back somewhere around the third pricing-change-induced migration. Below that threshold, Claude Code's buy-and-forget posture is cheaper in your time alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Each Tool Actually Wins
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick Claude Code when&lt;/strong&gt; you're a solo developer or small team, you're already paying Anthropic for the model, you want the lowest-friction setup, and you don't anticipate needing to swap providers. The polish premium is real. Claude Code's auto-context management — deciding which files to load, when to grep, when to ask — is more reliable than any open harness we've tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick OpenCode when&lt;/strong&gt; your shop has a multi-model policy for regulatory, cost, or capability reasons, you need to run agents in environments where Anthropic can't see your code, you're building tooling on top of the agent itself, or you want insurance against price and availability shocks. The configurability also makes OpenCode easier to embed in CI pipelines and self-hosted setups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run both when&lt;/strong&gt; you're a power user who can afford the configuration time. Use Claude Code for interactive coding sessions where polish matters; use OpenCode with a cheaper model for batch tasks, log analysis, or anything you'd otherwise script in shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Split Says About Where This Is Headed
&lt;/h2&gt;

&lt;p&gt;The Claude Code / OpenCode split isn't about which tool is better. It's the first visible appearance, at the coding-agent layer, of a choice developers have faced in every past technology cycle: a vertically integrated proprietary stack versus an open layer you assemble yourself. The same decision shows up at every level — operating systems, databases, frontend frameworks, observability stacks.&lt;/p&gt;

&lt;p&gt;What's different here is the speed. JetBrains versus VS Code took a decade to resolve. The coding agent layer is fragmenting in under two years. That tells you the underlying technology is moving fast enough that lock-in costs feel acute right now. Developers are not willing to bet their workflow on any single vendor's roadmap.&lt;/p&gt;

&lt;p&gt;Expect both to coexist for the foreseeable horizon. Anthropic will keep iterating Claude Code. OpenCode and its peers will keep absorbing share among developers who got burned by a price hike, an API outage, or a deprecated model they depended on. The interesting question is whether OpenCode itself ends up centralizing — most of the development is funded through SST's hosted services — or whether the community fork model holds long enough to keep the substitution menu real.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://pickuma.com/posts/opencode-vs-claude-code-157k-developers-hedge-anthropic/?utm_source=devto&amp;amp;utm_medium=crosspost&amp;amp;utm_campaign=blog" rel="noopener noreferrer"&gt;pickuma.com&lt;/a&gt;. Subscribe to &lt;a href="https://pickuma.com/rss.xml" rel="noopener noreferrer"&gt;the RSS&lt;/a&gt; or follow &lt;a href="https://bsky.app/profile/pickuma.bsky.social" rel="noopener noreferrer"&gt;@pickuma.bsky.social&lt;/a&gt; for new reviews.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
