<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Solati</title>
    <description>The latest articles on DEV Community by Michael Solati (@michaelsolati).</description>
    <link>https://dev.to/michaelsolati</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2585%2F712e1551-ecda-443f-9bf2-2082b7a8b49e.jpg</url>
      <title>DEV Community: Michael Solati</title>
      <link>https://dev.to/michaelsolati</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michaelsolati"/>
    <language>en</language>
    <item>
      <title>How 129KB of Whitespace (and a Recursive Loop) Broke the Web</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Sat, 13 Dec 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/how-129kb-of-whitespace-and-a-recursive-loop-broke-the-web-lf7</link>
      <guid>https://dev.to/michaelsolati/how-129kb-of-whitespace-and-a-recursive-loop-broke-the-web-lf7</guid>
      <description>&lt;p&gt;It’s been about one week since the disclosure of &lt;strong&gt;React2Shell (CVE-2025-55182)&lt;/strong&gt;. The initial "drop everything" panic has mostly subsided, and hopefully, your PagerDuty alerts have stopped screaming. Now that the smoke has cleared, we can actually take a breath and look at the wreckage to understand what just happened to the React ecosystem.&lt;/p&gt;

&lt;p&gt;For me, the reality of the situation really hit home when I got &lt;strong&gt;8&lt;/strong&gt; emails from GCP (Google Cloud). It wasn't the usual billing alert (the other type of email that causes panic). It looked like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  &lt;strong&gt;New Advisory Notification&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Dear Google Cloud customer,&lt;/p&gt;

&lt;p&gt;You've received an important Google Cloud notification affecting your resource...&lt;/p&gt;

&lt;p&gt;Notification Title: &lt;strong&gt;Important Security Information Regarding React &amp;amp; Next.js Vulnerability (CVE-2025-55182)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When your cloud provider starts sending out a bunch of "Advisory Notification" emails naming a JavaScript framework, you know it’s not just a bug; it’s &lt;a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener noreferrer"&gt;an event&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;This wasn't just a bad week for Next.js developers; it was a wake-up call for the entire industry. So, with the benefit of hindsight (and some unfortunate new developments regarding a "Second Wave" of vulnerabilities), let's dissect exactly how a CVSS 10.0 vulnerability slipped into the default config of the world's most popular React framework.&lt;/p&gt;

&lt;h2&gt;
  How We Got Here
&lt;/h2&gt;

&lt;p&gt;To understand the exploit, you have to look at the architecture. For years, we pushed for a "seamless" integration between client and server. We wanted &lt;strong&gt;React Server Components (RSC)&lt;/strong&gt; to fetch data directly on the backend and stream it to the frontend.&lt;/p&gt;

&lt;p&gt;But here is the trade-off we don't talk about enough: &lt;strong&gt;Trust Boundaries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the old days (literally 12 months ago!), we mostly sent JSON back and forth. JSON is safe because it's dumb... It’s just data. But RSC needs to transport "execution context" (think Promises, Symbols, and Server Actions). JSON couldn't handle that, so React built the Flight protocol.&lt;/p&gt;

&lt;h3&gt;
  The Fatal Flaw in Flight
&lt;/h3&gt;

&lt;p&gt;The vulnerability lies in how the &lt;code&gt;react-server-dom-*&lt;/code&gt; packages handle the Flight protocol. By design, Flight allows the server to deserialize complex objects sent by the client.&lt;/p&gt;

&lt;p&gt;If you've studied security history, the word "deserialize" should make you flinch. &lt;a href="https://dev.to/cheetah100/lessons-from-react2shell-1m8b#violation-of-security-principles"&gt;Java (Struts), PHP, and Python have all suffered catastrophic failures here&lt;/a&gt;. React2Shell proved that JavaScript is not immune.&lt;/p&gt;

&lt;p&gt;The vulnerability allowed an unauthenticated attacker to send a specially crafted HTTP request, specifically manipulating &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise#thenables" rel="noopener noreferrer"&gt;Promise-like objects known as "thenables"&lt;/a&gt; to the server. React's internal logic would aggressively try to "resolve" this malicious object, allowing the attacker to &lt;a href="https://securitylabs.datadoghq.com/articles/cve-2025-55182-react2shell-remote-code-execution-react-server-components/" rel="noopener noreferrer"&gt;hijack the execution flow and run arbitrary code&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  The WAF Bypass (Why the Email Came Too Late)
&lt;/h2&gt;

&lt;p&gt;One of the most annoying parts of this week was watching the defenses we trusted fail. We assumed our Web Application Firewalls (WAFs) would catch this. They didn't.&lt;/p&gt;

&lt;p&gt;Attackers realized that most WAFs optimize for speed by only inspecting the first 8KB to 128KB of a request body.&lt;/p&gt;

&lt;p&gt;So, the attackers used a stupidly simple technique: &lt;strong&gt;padding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They simply added ~129KB of "junk" data (whitespace, comments) to the beginning of their malicious payloads. The WAF would scan the junk, see nothing wrong, and pass the request to the Next.js server. The server, which &lt;em&gt;does&lt;/em&gt; read the whole body, would then deserialize the payload and trigger the remote code execution.&lt;/p&gt;
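To make the mechanism concrete, here is a toy sketch. It is emphatically not any real WAF, and the "signature" string is made up; it only shows why an inspection cap plus a pile of leading whitespace equals a bypass:

```typescript
// Toy "WAF": like many real ones, it only scans the first chunk of the
// request body for known-bad signatures, for speed.
const INSPECTION_LIMIT = 128 * 1024; // 128KB inspection cap (illustrative)

function toyWafAllows(body: string): boolean {
  const inspected = body.slice(0, INSPECTION_LIMIT);
  return !inspected.includes("$$MALICIOUS$$"); // made-up signature
}

// The attacker pads the payload past the inspection window.
const padding = " ".repeat(129 * 1024); // ~129KB of junk whitespace
const paddedPayload = padding + "$$MALICIOUS$$";

console.log(toyWafAllows("$$MALICIOUS$$")); // false — caught when unpadded
console.log(toyWafAllows(paddedPayload));   // true — slips past the cap
```

The server, unlike the WAF, reads the entire body, so the payload still lands.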

&lt;h2&gt;
  The Second Wave: It Wasn't Just RCE
&lt;/h2&gt;

&lt;p&gt;And just when you think you can pat yourself on the back for patching the RCE, security researchers (and the React team) found that the rabbit hole went deeper.&lt;/p&gt;

&lt;p&gt;On December 11, we learned that the parser wasn't just vulnerable to code execution; it was fragile to structural abuse. This led to two new CVEs that you need to know about right now.&lt;/p&gt;

&lt;h3&gt;
  1. The Infinite Loop (CVE-2025-55184 &amp;amp; CVE-2025-67779)
&lt;/h3&gt;

&lt;p&gt;The Flight protocol deserializer is recursive by nature; it has to be, in order to resolve references within references. It turns out that if you send a payload where a chunk references itself in a specific loop, the Node.js process enters a synchronous infinite loop.&lt;/p&gt;

&lt;p&gt;Because Node.js is single-threaded, this is catastrophic. &lt;a href="https://vercel.com/kb/bulletin/security-bulletin-cve-2025-55184-and-cve-2025-55183" rel="noopener noreferrer"&gt;Your CPU spikes to 100%, and the server becomes instantly unresponsive to &lt;em&gt;all&lt;/em&gt; users&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the serverless world (Vercel, AWS Lambda), this leads to what we call &lt;strong&gt;"Denial of Wallet"&lt;/strong&gt;. An attacker can force your functions to run until they time out, spinning up thousands of maxed-out instances and racking up a massive bill for compute time you didn't actually use.&lt;/p&gt;

&lt;h3&gt;
  2. The Spy in the Reflection (CVE-2025-55183)
&lt;/h3&gt;

&lt;p&gt;This one is a bit spookier. &lt;a href="https://react.dev/blog/2025/12/11/denial-of-service-and-source-code-exposure-in-react-server-components" rel="noopener noreferrer"&gt;It allows an attacker to trick the server into revealing its own source code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If your Server Actions use &lt;code&gt;toString()&lt;/code&gt; on arguments (or implicitly convert them), an attacker can pass a crafted reference object that serializes the internal state of the closure back to the client.&lt;/p&gt;

&lt;p&gt;If you follow best practices and use environment variables (&lt;code&gt;process.env.DB_PASS&lt;/code&gt;), you're mostly okay; the attacker sees the variable name, not the value. But if you hardcoded API keys or secrets directly into your code? Those are now public knowledge.&lt;/p&gt;

&lt;h2&gt;
  The "Patch of the Patch"
&lt;/h2&gt;

&lt;p&gt;Here is the frustrating part that caught a lot of us off guard.&lt;/p&gt;

&lt;p&gt;When the DoS vulnerability was first found, React released version &lt;strong&gt;19.0.2&lt;/strong&gt;. We all updated. We thought we were safe.&lt;/p&gt;

&lt;p&gt;But researchers found a way to bypass &lt;em&gt;that&lt;/em&gt; fix by adding a layer of indirection to the circular reference. This forced a &lt;em&gt;second&lt;/em&gt; patch cycle. If you updated to fix the RCE but stopped there, you are still vulnerable to the DoS and Source Code Exposure flaws.&lt;/p&gt;

&lt;h2&gt;
  Where We Go From Here
&lt;/h2&gt;

&lt;p&gt;If you haven't patched in the last 24 hours, you are likely living on borrowed time. There is no configuration change that fully mitigates this vulnerability; upgrading dependencies is mandatory, and you need to be on the &lt;strong&gt;final&lt;/strong&gt; safe versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Upgrade List:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 15.x:&lt;/strong&gt; Update to &lt;strong&gt;15.0.7+&lt;/strong&gt; (Do not stop at 15.0.6)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 14:&lt;/strong&gt; Update to &lt;strong&gt;14.2.35+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React:&lt;/strong&gt; Update to &lt;strong&gt;19.0.3+&lt;/strong&gt; (for 19.0.x branch) or &lt;strong&gt;19.1.4+&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Crucial Step:&lt;/strong&gt; You must &lt;strong&gt;rebuild&lt;/strong&gt; (&lt;code&gt;next build&lt;/code&gt;) and &lt;strong&gt;redeploy&lt;/strong&gt; your application. The vulnerable code is bundled into your server artifacts; a simple restart won't save you.&lt;/p&gt;

&lt;h3&gt;
  The Hindsight Perspective
&lt;/h3&gt;

&lt;p&gt;React2Shell and its "offspring" vulnerabilities are going to change the conversation around "Full Stack" frameworks. We traded strict separation of concerns for developer convenience, and we got burned.&lt;/p&gt;

&lt;p&gt;Does this mean RSC is dead? No. But the days of assuming the server-side code in your Next.js app is "safe by default" are over. We need to treat our frontend-backend hybrids with the same security rigor we apply to backend-only services.&lt;/p&gt;

&lt;p&gt;Time to patch up (&lt;strong&gt;again&lt;/strong&gt;) and get back to building. Happy Friday!&lt;/p&gt;

</description>
      <category>react</category>
      <category>security</category>
      <category>nextjs</category>
      <category>node</category>
    </item>
    <item>
      <title>I Built an AI-Powered TTRPG Adventure Generator (Because Generic Hallucinations Are Boring)</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Wed, 03 Dec 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/i-built-an-ai-powered-ttrpg-adventure-generator-because-generic-hallucinations-are-boring-362m</link>
      <guid>https://dev.to/michaelsolati/i-built-an-ai-powered-ttrpg-adventure-generator-because-generic-hallucinations-are-boring-362m</guid>
      <description>&lt;p&gt;I grew up reading the gripping and petrifying &lt;a href="https://michaelsolati.com/blog/telling-a-typeform-story-on-the-google-assistant" rel="noopener noreferrer"&gt;narratives of R.L. Stine&lt;/a&gt; and spending way too much time playing story-driven video games. Now that I'm older, I've gotten into the TTRPG space because it hits a certain &lt;em&gt;je ne sais quoi&lt;/em&gt; that tickles my lizard brain.&lt;/p&gt;

&lt;p&gt;I've also found that I'm not as good at coming up with ideas for adventures as I used to be. For anyone who has ever sat behind your laptop screen and keyboard, staring at the blinking cursor, you know the struggle: you have a cool concept, like "a Cyberpunk heist in a floating city," but when you try to flesh it out, you hit a wall.&lt;/p&gt;

&lt;p&gt;Naturally, we can now turn to AI for help. But here's the problem: standard LLMs are great at hallucinating &lt;em&gt;generic&lt;/em&gt; tropes. You ask for a "scary forest," and you get the same old "twisted trees and whispering winds." It lacks soul. It lacks... &lt;em&gt;planning&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I wanted a tool that didn't just make things up but actually &lt;em&gt;researched&lt;/em&gt; real-world lore (wikis, forums, and what other people have done) to generate grounded, creative adventures. So, in true developer fashion, I stopped prepping a campaign and spent the weekend building a tool to do it for me.&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;Adventure Weaver&lt;/strong&gt;, an application built with &lt;a href="https://exa.ai/" rel="noopener noreferrer"&gt;Exa&lt;/a&gt; that helps TTRPG Game Masters, and writers in general, overcome their writer's block by turning the entire internet into a procedurally generated library of inspiration.&lt;/p&gt;

&lt;h2&gt;
  The "Research-then-Generate" Workflow
&lt;/h2&gt;

&lt;p&gt;Unlike a standard chatbot that just spits out text, we are going to build something more sophisticated. Our app follows a "Research-then-Generate" workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;User Prompt&lt;/strong&gt;: You describe the vibe (e.g., "A city built on the back of a dying god").&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Agent&lt;/strong&gt;: We dispatch an AI agent via Exa. It doesn't just search for keywords; it understands &lt;em&gt;concepts&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Streaming Updates&lt;/strong&gt;: We stream the agent's actions ("Crawling wiki...", "Reading blog...") to the user in real time, because loading spinners are boring.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Inspiration Graph&lt;/strong&gt;: We visualize the web of inspiration using D3.js, so you can see exactly where that creepy villain idea came from.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is how I built it, and how you can get it running on your machine right now.&lt;/p&gt;

&lt;h2&gt;
  The Stack
&lt;/h2&gt;

&lt;p&gt;We are keeping it modern and fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js&lt;/strong&gt;: The framework for production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa&lt;/strong&gt;: The search engine made for AIs that powers our research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt;: Because I don't want to spend 3 hours centering a div.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D3.js&lt;/strong&gt;: For that "exploration board" visualization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvq48b3rftzs8ax316t7.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvq48b3rftzs8ax316t7.webp" alt="Fun graphic of technology used" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  Step 1: Defining the Adventure Schema
&lt;/h2&gt;

&lt;p&gt;To ensure the AI gives us usable data (and not just a wall of text), we need to define a strict JSON schema. This acts as a contract, telling the AI exactly what fields we need: titles, plot hooks, NPCs, and locations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/app/api/generate/route.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adventureSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adventure_title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;plot_hooks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npcs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;locations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;adventure_title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;plot_hooks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="na"&gt;npcs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;locations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  Step 2: The "Secret Sauce" (Exa)
&lt;/h2&gt;

&lt;p&gt;This is where the magic happens. We aren't just matching strings; we are doing &lt;strong&gt;Neural Search&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you search for "realistic dragon biology" on a normal engine, you might get a movie listicle. Exa's neural model understands you are looking for &lt;em&gt;speculative biology&lt;/em&gt; and can find niche blog posts or StackExchange threads that discuss the actual physics of fire-breathing.&lt;/p&gt;

&lt;p&gt;We create an API route to kick off the research. Note that we don't wait for the research to complete, because research takes time! We create the task and immediately return a &lt;code&gt;taskId&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/app/api/generate/route.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Exa&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exa-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;exa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Exa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ... adventureSchema definition ...&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a creative assistant for a TTRPG Game Master. 
  Use the user's prompt to find ideas from blogs, forums, and wikis to generate a compelling adventure.
  Please generate a title, a summary, a few plot hooks, some interesting NPCs, and some key locations for the adventure.
  Each one of the story components should be put into their respective schema.
  Something like the summary should not have the title, plot hooks, NPCs, etc... Those should be in their own schemas.
  For context, here is the user's prompt: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Create the research task, but don't wait for it to complete.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;researchTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;exa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;outputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;adventureSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Immediately return the task ID to the client.&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;researchTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;researchId&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  Step 3: Streaming the "Vibes"
&lt;/h2&gt;

&lt;p&gt;Waiting 30 seconds for a response feels like an eternity. To fix this, we use &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events" rel="noopener noreferrer"&gt;&lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt;&lt;/a&gt; to stream the agent's progress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fun fact! Did you know eBay uses Server-Sent Events to count down those final seconds before a sale ends?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This endpoint listens to the task stream. When Exa says "I'm searching for medieval castles," we push that message to the frontend instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/app/api/adventure/[taskId]/route.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Exa&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exa-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;CitationProcessor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@/lib/citation-processor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;exa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Exa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// We'll stream the progress from Exa.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReadableStream&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextEncoder&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;citationProcessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CitationProcessor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;exa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;taskStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;citationProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;task-operation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
          &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;plan-operation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Searching: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
              &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;crawl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Crawling: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
              &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;think&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;op&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
              &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Starting an unknown journey...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nx"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
              &lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;research-output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
          &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;finalResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deduplicatedCitations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;citationProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCitations&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resultData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;finalResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;deduplicatedCitations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;};&lt;/span&gt;

          &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nx"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
              &lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;result&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resultData&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;no-cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;keep-alive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddt2veuchhg2m4brpz3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddt2veuchhg2m4brpz3.gif" alt="Server-Sent Events (SSE) in action" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  Citation Mapping
&lt;/h3&gt;

&lt;p&gt;One of the coolest things about this setup is citation mapping. We can actually track which specific sub-task (like "Search for NPC names") produced which URL results. This lets us tag the generated NPCs with the actual folklore blog post that inspired them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/lib/citation-processor.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CitationProcessor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;taskIdToSection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;citationsBySection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Citation&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="nf"&gt;processEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Identify which section (NPCs, Plot Hooks, etc.) the agent is working on&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;task-definition&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npc&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskIdToSection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npcs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="c1"&gt;// ... map other sections (locations, plot hooks, etc.)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; 
    &lt;span class="c1"&gt;// 2. Capture search results and assign them to that section&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;section&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskIdToSection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Add the found URLs to the specific section's citation list&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;citationsBySection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
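&lt;p&gt;The &lt;code&gt;getCitations()&lt;/code&gt; call in the route handler returns deduplicated citations, but its body isn't shown above. Here's a plausible minimal sketch of that step (the &lt;code&gt;dedupeCitations&lt;/code&gt; name and the &lt;code&gt;url&lt;/code&gt; field are assumptions, not the repo's actual implementation):&lt;/p&gt;

```javascript
// Hedged sketch of the deduplication getCitations() might perform:
// walk citationsBySection and keep only the first occurrence of each URL.
function dedupeCitations(citationsBySection) {
  const seen = new Set();
  const result = {};
  for (const section of Object.keys(citationsBySection)) {
    result[section] = citationsBySection[section].filter(function (c) {
      if (seen.has(c.url)) {
        return false;
      }
      seen.add(c.url);
      return true;
    });
  }
  return result;
}

const deduped = dedupeCitations({
  npcs: [{ url: "https://a.example" }, { url: "https://b.example" }],
  locations: [{ url: "https://a.example" }],
});
console.log(deduped.npcs.length);      // 2
console.log(deduped.locations.length); // 0
```

&lt;p&gt;Keeping the first occurrence means a source cited by the NPC research isn't repeated under Locations, which keeps the final citation list tidy.&lt;/p&gt;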



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmw5pf25lxhvwei0n91u.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmw5pf25lxhvwei0n91u.gif" alt="Citation mapping in action via D3.js" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  🚀 Quickstart: Run It Locally
&lt;/h2&gt;

&lt;p&gt;Enough theory... Let's get this running on your machine so you can start generating campaigns for your next session!&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;You'll need &lt;strong&gt;Node.js&lt;/strong&gt; (v20 or higher) and &lt;strong&gt;npm&lt;/strong&gt;. You will also need an &lt;strong&gt;Exa API key&lt;/strong&gt; (for the research) and an &lt;strong&gt;OpenAI API key&lt;/strong&gt; (or compatible provider) for the generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clone the Repository
&lt;/h3&gt;

&lt;p&gt;Open up your terminal and grab the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/MichaelSolati/adventure-weaver.git
&lt;span class="nb"&gt;cd &lt;/span&gt;adventure-weaver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Install Dependencies
&lt;/h3&gt;

&lt;p&gt;Let npm do its thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Set Up Environment Variables
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env.local&lt;/code&gt; file in the root of your project. Next.js loads it automatically and the default scaffold ignores it in Git, so your keys stay out of version control. Add your keys here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXA_API_KEY=your_exa_api_key_here
LLM_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-4o  # or your preferred model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
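&lt;p&gt;If you'd rather fail fast at startup than chase a cryptic 401 later, a tiny guard helps. This &lt;code&gt;requireEnv&lt;/code&gt; helper is my own sketch, not part of the repo:&lt;/p&gt;

```javascript
// Hypothetical helper (not in the repo): throw a clear error at startup
// if a required environment variable is missing.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error("Missing required environment variable: " + name);
  }
  return value;
}

// e.g. const exa = new Exa(requireEnv("EXA_API_KEY"));
```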



&lt;h3&gt;
  
  
  4. Run the Development Server
&lt;/h3&gt;

&lt;p&gt;Fire it up!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;code&gt;http://localhost:3000&lt;/code&gt;, enter a prompt like "A cyberpunk scorched-earth assault to liberate the digital ghost of your former lover from a megacorp's tower" (IYKYK), and watch the magic happen!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;At the end of the day, the tools we use are just a means to an end. The real magic here is about shifting how we interact with AI. By moving from simple prompting to a "Research-then-Generate" workflow with Exa, we stop the AI from hallucinating generic tropes and start grounding it in actual creativity. It respects the nuance of capturing a specific "vibe," rather than just matching keywords.&lt;/p&gt;

&lt;p&gt;The result is richer, grounded content that feels less like a robot wrote it and more like a curated creative work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyrt5b5h4cwhiefrdgpm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyrt5b5h4cwhiefrdgpm.gif" alt="Full Exa research and generation workflow running" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;If you want to try weaving your own adventures, or if you're thinking, &lt;em&gt;"Show me the code!"&lt;/em&gt;, you can find the full source code on &lt;a href="https://github.com/MichaelSolati/adventure-weaver" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nextjs</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Visualizing the Event Loop: A Guide to Microtasks, Macros, and Timers</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Tue, 25 Nov 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/visualizing-the-event-loop-a-guide-to-microtasks-macros-and-timers-2l22</link>
      <guid>https://dev.to/michaelsolati/visualizing-the-event-loop-a-guide-to-microtasks-macros-and-timers-2l22</guid>
      <description>&lt;p&gt;I LOVE digging into the "weird" parts of JavaScript. When prepping for technical interviews, or just trying to debug why a UI update isn't rendering when I expect it to, I believe it's critical to understand not just &lt;em&gt;what&lt;/em&gt; the language does, but &lt;em&gt;how&lt;/em&gt; it schedules it.&lt;/p&gt;

&lt;p&gt;An interesting scenario I often throw at folks (and have been thrown at me) is the classic "predict the output" game. It seems simple on the surface, but it quickly reveals if you have a solid mental model of the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Execution_model" rel="noopener noreferrer"&gt;JavaScript execution model&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Imagine looking at this snippet. What order do the numbers print in?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1. Start&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2. Timeout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;3. Promise&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;4. End&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When faced with this as a junior developer, I used to think: "Okay, code runs top to bottom. But &lt;code&gt;setTimeout&lt;/code&gt; is asynchronous, so it waits. &lt;code&gt;Promise&lt;/code&gt; is also async. So maybe 'Start', 'End', then... whichever one is faster?"&lt;/p&gt;

&lt;p&gt;If you guessed: &lt;code&gt;Start&lt;/code&gt; -&amp;gt; &lt;code&gt;End&lt;/code&gt; -&amp;gt; &lt;code&gt;Timeout&lt;/code&gt; -&amp;gt; &lt;code&gt;Promise&lt;/code&gt;, you'd be following logical intuition, but you'd be wrong.&lt;/p&gt;

&lt;p&gt;The actual output is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;1. Start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;4. End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;3. Promise&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2. Timeout&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Wait, why? &lt;code&gt;setTimeout&lt;/code&gt; has a delay of &lt;code&gt;0&lt;/code&gt;, so shouldn't it run immediately after the main code finishes?&lt;/p&gt;

&lt;p&gt;After the anxiety settles, remember something your friend (me) said: "It's important to understand the capabilities and data structures in any given language." In this case, we need to talk about the &lt;a href="https://html.spec.whatwg.org/multipage/webappapis.html#event-loops" rel="noopener noreferrer"&gt;&lt;strong&gt;Event Loop&lt;/strong&gt;&lt;/a&gt;, and specifically, the difference between &lt;strong&gt;Macrotasks&lt;/strong&gt; and &lt;strong&gt;Microtasks&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model: Visualizing the Traffic
&lt;/h2&gt;

&lt;p&gt;To understand why the Promise beats the Timeout, we have to look at the architecture. JavaScript runs your code on a single main thread, coordinated by the Event Loop: it can only do one thing at a time.&lt;/p&gt;

&lt;p&gt;Here is the flow you need to visualize:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Call Stack:&lt;/strong&gt; This is where your code runs. "Start" and "End" happen here immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Web APIs:&lt;/strong&gt; When the browser sees &lt;code&gt;setTimeout&lt;/code&gt;, it hands that timer off to the Web APIs (or &lt;code&gt;libuv&lt;/code&gt; in Node.js). When the timer expires, even with a delay of &lt;code&gt;0&lt;/code&gt;, the callback doesn't jump straight back onto the stack; it goes into a queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Queues:&lt;/strong&gt; This is where the magic (and confusion) happens. There isn't just one queue.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Macrotask Queue (Task Queue):&lt;/strong&gt; This holds things like &lt;code&gt;setTimeout&lt;/code&gt;, &lt;code&gt;setInterval&lt;/code&gt;, and I/O operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Microtask Queue:&lt;/strong&gt; This holds &lt;code&gt;Promise&lt;/code&gt; callbacks (&lt;code&gt;.then&lt;/code&gt;, &lt;code&gt;.catch&lt;/code&gt;, &lt;code&gt;.finally&lt;/code&gt;), &lt;code&gt;queueMicrotask&lt;/code&gt;, and &lt;code&gt;MutationObserver&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
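&lt;p&gt;You can watch the two queues compete directly in Node or a browser console:&lt;/p&gt;

```javascript
// Both callbacks are scheduled "immediately", but the microtask always
// wins: the microtask queue drains before the next macrotask runs.
const order = [];

setTimeout(function () { order.push("setTimeout (macrotask)"); }, 0);
queueMicrotask(function () { order.push("queueMicrotask (microtask)"); });

// Check the result on a later timer tick, after both have run.
setTimeout(function () {
  console.log(order.join(" then "));
  // queueMicrotask (microtask) then setTimeout (macrotask)
}, 10);
```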

&lt;h2&gt;
  
  
  The Breakdown: The VIP Lane
&lt;/h2&gt;

&lt;p&gt;Here is the golden rule that solves the puzzle: &lt;strong&gt;The Event Loop performs a &lt;a href="https://html.spec.whatwg.org/multipage/webappapis.html#perform-a-microtask-checkpoint" rel="noopener noreferrer"&gt;Microtask Checkpoint&lt;/a&gt; immediately after the Call Stack empties.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Microtasks are like VIPs at a club. They get to cut the line. But more importantly, the Event Loop processes the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/HTML_DOM_API/Microtask_guide/In_depth" rel="noopener noreferrer"&gt;Microtask Queue exhaustively&lt;/a&gt;. If a microtask schedules &lt;em&gt;another&lt;/em&gt; microtask, it gets added to the same queue and processed in the same cycle. The runtime will not move on to the next Macrotask (or even update the UI!) until the VIP section is completely empty.&lt;/p&gt;
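&lt;p&gt;That exhaustive draining is easy to verify: a &lt;code&gt;.then&lt;/code&gt; callback queued by another &lt;code&gt;.then&lt;/code&gt; still runs before a &lt;code&gt;0&lt;/code&gt;ms timer fires.&lt;/p&gt;

```javascript
// A microtask that schedules another microtask: both run before the
// 0 ms setTimeout, because the microtask queue is drained exhaustively.
const log = [];

setTimeout(function () { log.push("timeout"); }, 0);

Promise.resolve()
  .then(function () { log.push("then 1"); })
  .then(function () { log.push("then 2 (queued by then 1)"); });

// Check the result on a later timer tick, after everything has run.
setTimeout(function () {
  console.log(log.join(", "));
  // then 1, then 2 (queued by then 1), timeout
}, 10);
```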

&lt;p&gt;Let's trace our code again with this model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;console.log('1. Start')&lt;/code&gt;: Pushed to Call Stack. Executed. Popped.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;Start&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;setTimeout(..., 0)&lt;/code&gt;: Pushed to Stack. The engine sees it's a timer, hands it to Web APIs. The Web API sees 0ms delay, so it queues the callback into the &lt;strong&gt;Macrotask Queue&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Promise.resolve().then(...)&lt;/code&gt;: Pushed to Stack. The engine sees a Promise resolution. It queues the &lt;code&gt;.then()&lt;/code&gt; callback into the &lt;strong&gt;Microtask Queue&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;console.log('4. End')&lt;/code&gt;: Pushed to Stack. Executed. Popped.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;End&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Now, the global code is done. The Call Stack is empty. The Event Loop wakes up and asks: &lt;em&gt;"Is there anything in the Microtask Queue?"&lt;/em&gt; Yes, there is! The Promise callback.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Event Loop moves the Promise callback to the Call Stack. Executed.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;Promise&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Now the stack is empty again. The Event Loop asks: &lt;em&gt;"Any more Microtasks?"&lt;/em&gt; No. &lt;em&gt;"Okay, let's move on."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rendering (Browser Only):&lt;/strong&gt; At this point, the browser decides whether it needs to update the rendering (Layout/Paint). This happens &lt;em&gt;after&lt;/em&gt; microtasks but &lt;em&gt;before&lt;/em&gt; the next Macrotask.&lt;/li&gt;
&lt;li&gt;The Event Loop moves the Timeout callback to the Call Stack. Executed.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;Timeout&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Solution: One Loop to Rule Them All
&lt;/h2&gt;

&lt;p&gt;So, next time you're faced with a snippet like this—whether in an interview or a tricky debugging session—you don't need to rely on intuition. You just need to trust the hierarchy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventLoopCheck&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Script Start&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// 1. Synchronous&lt;/span&gt;

  &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;setTimeout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// 3. Macrotask (Low Priority)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Promise&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// 2. Microtask (High Priority)&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Cheat Sheet
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous Code:&lt;/strong&gt; Runs first (Call Stack).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microtasks (Promises):&lt;/strong&gt; Run immediately after the stack clears, &lt;em&gt;before&lt;/em&gt; rendering or new tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rendering:&lt;/strong&gt; (Browser only) Happens after Microtasks but &lt;em&gt;before&lt;/em&gt; the next Macrotask.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Macrotasks (Timers):&lt;/strong&gt; Run only when the stack AND the Microtask queue are empty.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  A Note for Node.js Developers
&lt;/h3&gt;

&lt;p&gt;If you are running this in Node.js, there is a "Super VIP" lane called &lt;code&gt;process.nextTick&lt;/code&gt;. This runs even before Promises!&lt;/p&gt;

&lt;p&gt;In Node.js, &lt;a href="https://nodejs.org/en/learn/asynchronous-work/understanding-processnexttick" rel="noopener noreferrer"&gt;&lt;code&gt;process.nextTick&lt;/code&gt;&lt;/a&gt; is technically not part of the Event Loop phases; it is processed immediately after the current operation completes. This means &lt;code&gt;nextTick&lt;/code&gt; can actually starve your I/O if you aren't careful!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;I love these visual mental models! Hopefully, you found this helpful. If you have any interesting or clever ways you visualize the Event Loop, I'd love to hear them! Or if you have a trickier code snippet that stumps people, I'd love to see that too.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>beginners</category>
      <category>node</category>
    </item>
    <item>
      <title>LeetCode vs. Vibe Coding: The Reality of Interviewing in 2025</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Thu, 20 Nov 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/leetcode-vs-vibe-coding-the-reality-of-interviewing-in-2025-2582</link>
      <guid>https://dev.to/michaelsolati/leetcode-vs-vibe-coding-the-reality-of-interviewing-in-2025-2582</guid>
      <description>&lt;p&gt;I LOVE a good technical challenge. There is something satisfying about solving a complex problem, optimizing a data structure, or shaving off a few milliseconds of latency. But this past summer, I found myself back in the job market, and let me tell you… Things have changed.&lt;/p&gt;

&lt;p&gt;The industry has split into two distinct realities.&lt;/p&gt;

&lt;p&gt;On the one hand, I interviewed with established enterprise giants whose processes felt like second nature to me. I was solving algorithmic puzzles and showcasing my understanding of the DOM and general web development skills. On the other hand, I interviewed with startups where the interviewer essentially handed me the keys to GitHub Copilot and said, "Build this feature. You have 45 minutes. Go."&lt;/p&gt;

&lt;p&gt;It made me feel a bit like an "old man yelling at the sky," but it also raised some serious questions about where our industry is headed. Are we testing for engineering skills, or for subscription tiers?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Great Bifurcation
&lt;/h2&gt;

&lt;p&gt;If you've been interviewing lately, you've probably felt this whiplash. The data backs it up; we are seeing a &lt;a href="https://landing.underdog.io/blog/reality-of-tech-interviews-2025" rel="noopener noreferrer"&gt;dramatic bifurcation in how companies hire&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Enterprise Players
&lt;/h3&gt;

&lt;p&gt;The big players are committed to the LeetCode-style interview. Why? Because they are terrified of AI impostors.&lt;/p&gt;

&lt;p&gt;When I sat for these interviews, it was all about "Proof of Work". They wanted to know that &lt;em&gt;I&lt;/em&gt; possessed the raw cognitive bandwidth to manipulate logic without a robot whispering the answer in my ear. And honestly? I get it. With tools like ChatGPT, it's easier than ever to fake competence. In fact, &lt;a href="https://medium.com/fonzi-ai/how-ai-is-transforming-tech-interviews-in-2025-b591de563b5d" rel="noopener noreferrer"&gt;81% of interviewers at Big Tech companies suspect candidates of using AI to cheat&lt;/a&gt; during remote sessions.&lt;/p&gt;

&lt;p&gt;So, the LeetCode grind isn't going anywhere. It's the safeguard against the "AI-powered poser."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Startups
&lt;/h3&gt;

&lt;p&gt;Then there are the startups. These folks have embraced our new machine overlords. They aren't looking for someone who can write a linked list from memory; &lt;a href="https://www.finalroundai.com/blog/software-developer-skills-ctos-want-in-2025" rel="noopener noreferrer"&gt;they want "AI Editors" and "Sense-Makers"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In these interviews, the constraints weren't syntax or memory; they were my ability to prompt, debug, and integrate. It was eventually fun, but also frantic. I had to get over the mindset of "showing" or "explaining" my work. I wasn't there to show I know the native functions and behaviors of the DOM. Instead, there was this massive expectation for speed. Because AI tools are &lt;em&gt;supposed&lt;/em&gt; to make us 10x developers, the interview pacing was often set to "warp speed," expecting me to blaze through boilerplate code.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Hidden Barrier to Entry&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here is the part that really worries me, and it's something we aren't talking about enough: &lt;strong&gt;THE COST.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the traditional interview, all you needed was a brain and a whiteboard (or a laptop). Today, for these startup interviews, you are often expected to bring your own AI tools.&lt;/p&gt;

&lt;p&gt;The startups aren't always providing the enterprise seats for these AI tools during the interview process. So, are you on ChatGPT's free tier? Or are you shelling out $20/month for Plus? Do you have a personal GitHub Copilot subscription?&lt;/p&gt;

&lt;p&gt;It introduces a subtle but real economic barrier. Is your success in an interview dependent on whether you can afford Claude Code Max 20x ($200/month) vs. Cursor Pro ($20/month)? If my model hallucinates because I'm on a lower tier, and I didn't catch it, did I fail the interview?&lt;/p&gt;

&lt;p&gt;Startups are essentially asking candidates to pay for the privilege of being efficient enough to get hired.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Home Court Advantage" Problem
&lt;/h2&gt;

&lt;p&gt;Some companies, like Meta, are addressing this inequality by &lt;a href="https://www.businessinsider.com/meta-job-candidates-use-ai-coding-interviews-2025-7" rel="noopener noreferrer"&gt;introducing standardized interviews that use the same AI tool for everyone&lt;/a&gt;. On paper, this sounds fair. It levels the playing field and removes the cost barrier.&lt;/p&gt;

&lt;p&gt;But does it?&lt;/p&gt;

&lt;p&gt;We all have our "fine-tuned" setups. You may have a &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; file. We are "pros" with &lt;em&gt;our&lt;/em&gt; tools. Throwing a developer into a standardized AI environment, like &lt;a href="https://coderpad.io/blog/hiring-developers/ai-in-the-interview-is-not-cheating-it-is-the-job-according-to-meta/" rel="noopener noreferrer"&gt;Meta's CoderPad setup&lt;/a&gt;, is like handing a race car driver a rental sedan and asking them to set a lap record. You might know how to drive, but you don't know &lt;em&gt;this&lt;/em&gt; car.&lt;/p&gt;

&lt;p&gt;From my own experience working at Meta, I found that even their internal tool, Metamate, struggled to handle complex tasks on the actual Facebook codebase. It often felt like I was correcting everything it output rather than it actually making life better. Anyway, if you've never used their specific flavor of AI before, are you ready to use it like a pro in a high-stakes 60-minute interview? Probably not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Before You Buy
&lt;/h2&gt;

&lt;p&gt;Finally, this brings me to the &lt;strong&gt;Paid Work Trial&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Startups are increasingly asking candidates to join the team for a few days to work on actual production tickets. In theory, there is some merit to this. It gives you a really good idea of how well you'll click with a team, and it mitigates the risk of a "false positive" hire.&lt;/p&gt;

&lt;p&gt;But let's look at the logistics. &lt;strong&gt;What if you already have a job?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Can you really take 3 to 5 days off to work somewhere else? That's a massive time commitment that privileges people who are currently unemployed or have incredibly flexible schedules. Hell, does your current employer allow you to moonlight at another job? Most employment contracts have clauses that prevent us from working for another entity, even for a short "trial." The Paid Work Trial requires us to breach our current contract to potentially secure a new one.&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr Everything is Awful
&lt;/h2&gt;

&lt;p&gt;I wish I had a magic answer, but the reality is that for the foreseeable future, we're going to have to be bilingual.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keep your fundamentals sharp:&lt;/strong&gt; The "old way" isn't dead. We still need to prove we understand the code well enough to audit the AI's output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Master (Generic) Prompts:&lt;/strong&gt; Don't just rely on your custom configs. Get good at "Prompt Engineering" in a vanilla environment, because you might not get to bring your own configs to the interview.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protect your time:&lt;/strong&gt; If a company asks for a multiday trial, ensure it is &lt;strong&gt;paid&lt;/strong&gt; at market rates. Don't work for free.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's a weird time to be interviewing, caught between the LeetCode grinders and the vibe-coding speedrunners. But hey, at least it keeps things interesting, right?&lt;/p&gt;

&lt;p&gt;I'd love to hear your horror stories (or success stories!) from the 2025 interview circuit. Are you seeing more work trials? More AI? Let me know!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>discuss</category>
      <category>interview</category>
    </item>
    <item>
      <title>I'm Getting Serious Déjà Vu... But This Time It's Different</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Tue, 18 Nov 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/im-getting-serious-deja-vu-but-this-time-its-different-17f4</link>
      <guid>https://dev.to/michaelsolati/im-getting-serious-deja-vu-but-this-time-its-different-17f4</guid>
      <description>&lt;p&gt;I've been getting the strangest sense of déjà vu lately.&lt;/p&gt;

&lt;p&gt;I'm based in the Bay Area, and as I look at the tech landscape today, I'm transported back to somewhere around 2014 or 2015. I was just starting my own journey as a developer, and the coding bootcamps were &lt;em&gt;exploding&lt;/em&gt;. It wasn't just a feeling: &lt;a href="https://www.coursereport.com/reports/2015-coding-bootcamp-market-size-study" rel="noopener noreferrer"&gt;the market grew by 138% in 2015&lt;/a&gt; alone, with graduates jumping from just over 6,700 to more than 16,000 in a single year. I would know, I was one of them! (Shoutout to the A100 team.) Every tech meetup, every "hack night," you'd meet a fresh-faced new developer, portfolio in hand, bright-eyed and ready to build.&lt;/p&gt;

&lt;p&gt;The hiring pool was flooded, and we all had to claw our way into this industry, reading every blog post, breaking (and fixing) side projects, and slowly building careers. Our path, which felt so personal, was actually just a &lt;a href="https://www.coursereport.com/reports/2015-coding-bootcamp-market-size-study#tuition" rel="noopener noreferrer"&gt;packaged and sold 12-week "sprint."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I remember feeling this perplexing mix of curiosity and concern. Was my job, my hard-won knowledge, just a commodity?&lt;/p&gt;

&lt;p&gt;Fast forward to today. I've had the opportunity to build a successful career for myself and feel comfortable and knowledgeable in my domain, and yet I'm getting that same feeling. But this time, the "bootcamp" isn't a school. It's GitHub Copilot. It's ChatGPT. It's the entire era of AI-vibe-coding.&lt;/p&gt;

&lt;p&gt;I see demos on Twitter (not calling it X) and LinkedIn of people building entire, complex applications with a few well-worded prompts. And that old, familiar question is back: Are our skills about to be commoditized... again?&lt;/p&gt;

&lt;p&gt;My initial instinct was to say, "It's the same story, just a new tool." But the more I looked at the data, the more I realized my déjà vu was misleading. The 2010s "bootcamp boom" and the 2020s "AI wave" are fundamentally different beasts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x0xkf3188w25s1aefxd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x0xkf3188w25s1aefxd.webp" alt="The 'glitch in the Matrix' scene: An animated GIF where Neo observes a black cat walk past him twice in the exact same way, a repeating anomaly that makes him say 'déjà vu.'" width="600" height="338"&gt;&lt;/a&gt;&lt;br&gt;Whoa. Déjà vu.
  &lt;/p&gt;

&lt;h2&gt;
  
  
  The Old "Filter": Horizontal Saturation
&lt;/h2&gt;

&lt;p&gt;The bootcamp boom was, at its heart, a promise of access. It promised that anyone with the drive (and the tuition) could become a developer. It was a "gold rush" for skills, and it minted many developers who were fantastic at the "how." They knew how to build a to-do list, a simple blog, or a full-stack project with the MERN/MEAN stack.&lt;/p&gt;

&lt;p&gt;But "how" is only half the story.&lt;/p&gt;

&lt;p&gt;The challenge back then was one of &lt;strong&gt;horizontal saturation&lt;/strong&gt;. The market saw a massive influx of new, similarly-skilled junior talent, all competing for a rapidly &lt;em&gt;expanding&lt;/em&gt; pool of junior-level roles.&lt;/p&gt;

&lt;p&gt;After that initial wave, the market &lt;em&gt;calibrated&lt;/em&gt;. A "Great Filter" emerged. It turned out that getting a job and &lt;em&gt;keeping&lt;/em&gt; a job were two very different things.&lt;/p&gt;

&lt;p&gt;The filter wasn't "Can you code?"&lt;/p&gt;

&lt;p&gt;It became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you read, understand, and debug someone &lt;em&gt;else's&lt;/em&gt; code?&lt;/li&gt;
&lt;li&gt;Can you sit in a planning meeting and translate a vague business need into a concrete technical spec?&lt;/li&gt;
&lt;li&gt;Can you explain &lt;em&gt;why&lt;/em&gt; you chose PostgreSQL over MongoDB for this specific feature, and can you defend that trade-off?&lt;/li&gt;
&lt;li&gt;Can you write a meaningful test?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The filter separated &lt;em&gt;coding&lt;/em&gt; from &lt;em&gt;engineering&lt;/em&gt;. Coding is the act of writing code. Engineering is the discipline of designing, building, maintaining, and owning systems in the real world.&lt;/p&gt;

&lt;p&gt;The market wasn't saturated with &lt;em&gt;engineers&lt;/em&gt;. It was saturated with people who could &lt;em&gt;code&lt;/em&gt;. The "cream" that rose to the top were the folks who understood that the 12-week program was just the starting line, not the finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New "Filter": Vertical Compression &amp;amp; Pipeline Elimination
&lt;/h2&gt;

&lt;p&gt;This is where the parallel breaks down. The 2020s market isn't just saturated; it's being squeezed from two directions by forces we didn't have in 2015.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Squeeze from the Top: "Vertical Compression"
&lt;/h3&gt;

&lt;p&gt;Today's market is defined by &lt;strong&gt;vertical compression&lt;/strong&gt;. The 2022-2024 period didn't just see a hiring slowdown; it saw &lt;a href="https://www.wearedevelopers.com/en/magazine/418/software-engineering-over-saturated" rel="noopener noreferrer"&gt;over 660,000 tech workers lose their jobs&lt;/a&gt;. Crucially, these layoffs weren't confined to junior staff. They included "top-paid and most experienced developers in the industry" from elite companies like Google, Meta, and Amazon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop2wm68q6w4w8pc778di.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop2wm68q6w4w8pc778di.webp" alt="An illustration of an office layoff, showing four diverse employees with sad expressions holding boxes of their belongings in front of empty office cubicles." width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;660k+ developers lost their jobs from 2022 to 2024
  &lt;/p&gt;

&lt;p&gt;This creates a "domino effect" that is crushing the job market from the top down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Laid-off senior architects compete for senior developer roles.&lt;/li&gt;
&lt;li&gt;Displaced senior developers compete for mid-level roles.&lt;/li&gt;
&lt;li&gt;Displaced mid-level developers compete for junior roles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leaves entry-level and new "AI wave" graduates in an almost impossible position. In 2015, I was competing with other bootcamp grads on a relatively even playing field. In 2025, new grads are competing with laid-off FAANG (MANGA?) engineers with years of experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Squeeze from the Bottom: "Pipeline Elimination"
&lt;/h3&gt;

&lt;p&gt;At the &lt;em&gt;exact&lt;/em&gt; same time, AI is automating the very tasks that new developers used to cut their teeth on.&lt;/p&gt;

&lt;p&gt;The real, immediate impact of AI is not replacing senior engineers. In fact, one study found that, for complex, real-world tasks, current AI tools &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;&lt;em&gt;slowed&lt;/em&gt; experienced developers&lt;/a&gt;, taking them 19% &lt;em&gt;longer&lt;/em&gt; to complete their work.&lt;/p&gt;

&lt;p&gt;The actual danger is at the very &lt;em&gt;bottom&lt;/em&gt; of the pipeline. A &lt;a href="https://stackoverflow.blog/2025/09/10/ai-vs-gen-z/" rel="noopener noreferrer"&gt;2024 survey&lt;/a&gt; found &lt;strong&gt;70% of hiring managers&lt;/strong&gt; believe AI can do the jobs of &lt;em&gt;interns&lt;/em&gt;, and &lt;strong&gt;37% of employers&lt;/strong&gt; stated they would rather "hire" AI than a recent graduate.&lt;/p&gt;

&lt;p&gt;So, new developers are squeezed from the top by "vertical compression" and from the bottom by AI-driven "task automation," all while fighting for a job pool that continues to shrink.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Story: The "Great Capital Reallocation"
&lt;/h2&gt;

&lt;p&gt;It's easy to look at this and say, "See! AI is taking the jobs!" But the data shows that's, at best, a "smokescreen".&lt;/p&gt;

&lt;p&gt;Classic economic forces drove the vast majority of layoffs: correcting "post-pandemic overhiring" and reacting to a harsh macro-economic shift (like rising interest rates) that made capital expensive. In fact, &lt;a href="https://nearshoreamericas.com/u-s-layoffs-surge-and-blaming-ai-is-part-of-the-smokescreen/" rel="noopener noreferrer"&gt;only 3-4% of job cuts&lt;/a&gt; through September 2025 were &lt;em&gt;explicitly&lt;/em&gt; tied to AI.&lt;/p&gt;

&lt;p&gt;So why are CEOs at companies like Amazon and Google &lt;em&gt;publicly&lt;/em&gt; blaming AI for "reorganizations" and "efficiency"?&lt;/p&gt;

&lt;p&gt;It's a strategic narrative for Wall Street. "AI Washing" allows a company to frame a "weak" narrative ("We made a mistake and overhired") as a "strong" one ("We are a lean, AI-first company pivoting to the future").&lt;/p&gt;

&lt;p&gt;This narrative obscures the &lt;em&gt;real&lt;/em&gt;, long-term shift: &lt;strong&gt;The Great Capital Reallocation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the "great tech paradox": CEOs hand out pink slips with one hand while signing billion-dollar AI investments with the other. For the first time, companies are strategically reallocating billions of dollars, shifting spending from Operational Expenditures, like our salaries, to Capital Expenditures, like massive, expensive GPU clusters.&lt;/p&gt;

&lt;p&gt;The layoffs are not just savings; they are the &lt;em&gt;source of funding&lt;/em&gt; for this new, capital-intensive arms race.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon:&lt;/strong&gt; Announced &lt;a href="https://www.crn.com/news/cloud/2025/amazon-confirms-14-000-layoffs-says-ai-innovation-reason-for-reducing-roles" rel="noopener noreferrer"&gt;14,000 corporate layoffs in October 2025&lt;/a&gt;, citing AI-driven "efficiency". At the &lt;em&gt;exact same time&lt;/em&gt;, they announced a &lt;strong&gt;$100 billion&lt;/strong&gt; capital expenditure plan for 2025 to expand AI and cloud data centers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta:&lt;/strong&gt; Cut &lt;a href="https://www.mindandmetrics.com/blog/tech-layoffs-ai-google-job-cuts" rel="noopener noreferrer"&gt;over 21,000 jobs&lt;/a&gt; under its "Year of Efficiency". Simultaneously, it &lt;em&gt;raised&lt;/em&gt; its 2025 capital expenditure guidance to &lt;a href="https://www.theguardian.com/technology/2025/oct/29/meta-earnings-report" rel="noopener noreferrer"&gt;&lt;strong&gt;$70-72 billion&lt;/strong&gt;&lt;/a&gt; to acquire a staggering &lt;a href="https://www.pcmag.com/news/zuckerberg-looks-to-double-metas-gpu-stock-to-13-million-for-ai-training" rel="noopener noreferrer"&gt;1.3 million+ GPUs&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the new "Great Filter." The challenge is not just "Can you code?" It's "Can you justify your $150k salary to a company that is 'strategically reallocating' its budget to a $100 billion hardware plan?"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcictkh569i5ffan3ppmz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcictkh569i5ffan3ppmz.webp" alt="A bar chart from Omdia Research of Nvidia H100 GPU Shipments by Customer showing estimated 2023 shipments." width="800" height="797"&gt;&lt;/a&gt;&lt;br&gt;These 65,000 GPUs, priced from $25k-$40k, represent a staggering $1.6 to $2.6 BILLION investment by these tech giants. &lt;a href="https://www.theverge.com/2023/12/4/23987953/the-gpu-haves-and-have-nots" rel="noopener noreferrer"&gt;(chart source)&lt;/a&gt;
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Don't Fear the Tool, Master the Craft (Now More Than Ever)
&lt;/h2&gt;

&lt;p&gt;This isn't a post to scare anyone. And it's &lt;em&gt;definitely&lt;/em&gt; not meant to be elitist or to gatekeep. On the contrary, this is a pragmatic call to action.&lt;/p&gt;

&lt;p&gt;The "chaff," then and now, are those who look for a shortcut and mistake it for the entire journey. The "cream" will be the same as it ever was: the curious, the pragmatic, the problem-solvers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"History Doesn't Repeat Itself, but It Often Rhymes"&lt;/p&gt;

&lt;p&gt;~ Mark Twain&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the "filter" &lt;em&gt;is&lt;/em&gt; different, and it's harsher.&lt;/p&gt;

&lt;p&gt;The new filter won't be, "Can you write a function to sort an array?" (AI will do that).&lt;/p&gt;

&lt;p&gt;The new filter will be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you validate that the 1,000+ lines of code the AI just generated are secure, performant, and (most importantly) &lt;em&gt;correct&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Can you architect a system of prompts, validations, and tests to get the output you need, every time, reliably?&lt;/li&gt;
&lt;li&gt;Can you debug a subtle logic error when the AI 'hallucinates' a solution that &lt;em&gt;looks&lt;/em&gt; right but is fundamentally wrong?&lt;/li&gt;
&lt;li&gt;Can you take full, systems-level ownership when the AI-generated part fails at 3 AM?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new "cream" will be the &lt;strong&gt;AI-Assisted Engineer&lt;/strong&gt;, not the &lt;strong&gt;Prompt-Jockey&lt;/strong&gt;. The "chaff" will be those who trust the black box, copy and paste the output, and call it a day.&lt;/p&gt;

&lt;p&gt;This new wave won't make good engineers obsolete; it will make them more potent than ever. But the path &lt;em&gt;into&lt;/em&gt; the industry is undeniably harder. New developers have to compete on three fronts: against other grads, against laid-off veterans, and against AI automating the entry-level pipeline.&lt;/p&gt;

&lt;p&gt;So I'm not perplexed anymore. I'm focused. The answer is the same as it was in 2015, only more so: Don't just learn the "how." Master the "why." Understand the &lt;em&gt;craft&lt;/em&gt;. Be the one who can build the &lt;em&gt;whole&lt;/em&gt; system, not just one function.&lt;/p&gt;

&lt;p&gt;I'm curious, what do you all think? What skills are &lt;em&gt;you&lt;/em&gt; focusing on to make sure you're on the "engineering" side of this new, very different line? Let me know!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>career</category>
      <category>learning</category>
    </item>
    <item>
      <title>The Production Readiness Checklist</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Thu, 13 Nov 2025 17:30:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/the-production-readiness-checklist-1922</link>
      <guid>https://dev.to/michaelsolati/the-production-readiness-checklist-1922</guid>
      <description>&lt;p&gt;I was tracking a package on the official USPS website on my phone this morning. Like many modern sites, it's a Progressive Web App (PWA). A small prompt appeared at the top of the screen asking if I wanted to install the app. I didn't tap "Install," but I did see something that caused a double-take.&lt;/p&gt;

&lt;p&gt;The confirmation prompt didn't say "Install USPS Tracking."&lt;/p&gt;

&lt;p&gt;It said: &lt;strong&gt;"Install Create React App Sample."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fthe-production-readiness-checklist%2Fusps-screenshot.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fthe-production-readiness-checklist%2Fusps-screenshot.webp" alt="A mobile screenshot of the USPS tracking page showing package status updates, overlaid with an unexpected 'Install Create React App Sample' browser prompt at the top of the screen." width="800" height="704"&gt;&lt;/a&gt;&lt;br&gt;When my government tells me to "Install Create React App Sample" I "Install Create React App Sample"
  &lt;/p&gt;

&lt;p&gt;I had to laugh. The official United States Postal Service website, a critical piece of infrastructure used by millions, was prompting me to install a default placeholder.&lt;/p&gt;

&lt;p&gt;My first thought wasn't, "Oh, some developer is in trouble." My first thought was, &lt;em&gt;"That's a process failure."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This isn't about blaming a single person who forgot to update a manifest.json file. This is a sign of a gap in the deployment process. It's a classic case of focusing so hard on the core features (the tracking logic) that we forget the "last 5%," the polish.&lt;/p&gt;

&lt;p&gt;But here's the secret: to a user, that "last 5%" is the product. It's the first thing they see and the main thing that builds or breaks their trust. How can you trust a site with your package details if it still has the default boilerplate text?&lt;/p&gt;

&lt;p&gt;This is why every team needs a "Production Readiness Checklist."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Production Readiness Checklist
&lt;/h2&gt;

&lt;p&gt;This isn't some 100-page document nobody reads. It's a set of automated guards and manual sign-offs that protect your team's professionalism. Here's what I recommend for every project.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The "Chrome" is Correct
&lt;/h3&gt;

&lt;p&gt;I'm going to call this stuff the "chrome": it's the frame around your application. It's &lt;strong&gt;everything but the features&lt;/strong&gt;, and it's the first thing a user sees.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;manifest.json&lt;/code&gt;:&lt;/strong&gt; Are the &lt;code&gt;short_name&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt; correct? Are the icons pointing to the right, non-default assets? (This is what the USPS missed.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;index.html&lt;/code&gt;:&lt;/strong&gt; Is the &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; tag correct? Is the &lt;code&gt;&amp;lt;meta name="description"&amp;gt;&lt;/code&gt; meaningful?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Favicons:&lt;/strong&gt; Do you have them? Are they loading? Don't be the broken image icon or the default React/Angular/Vue logo in the browser tab.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Pages:&lt;/strong&gt; Are your 404 Not Found and 500 Server Error pages branded? Or do they just show a default Nginx or server error that scares your users?&lt;/li&gt;
&lt;/ul&gt;
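
&lt;p&gt;To make the first check concrete, here's a minimal sketch of what a manifest sanity check could look like. This is illustrative, not an official tool: the &lt;code&gt;findManifestProblems&lt;/code&gt; helper and the list of default names are my own inventions, so extend them for your framework:&lt;/p&gt;

```typescript
// Sketch of a manifest.json sanity check. The helper name and the
// default-name list are hypothetical; add your own framework's defaults.
const DEFAULT_NAMES = ['Create React App Sample', 'React App', 'Vite App'];

interface WebAppManifest {
  name?: string;
  short_name?: string;
  icons?: { src: string }[];
}

function findManifestProblems(manifest: WebAppManifest): string[] {
  const problems: string[] = [];
  for (const field of ['name', 'short_name'] as const) {
    const value = manifest[field];
    if (!value) {
      problems.push(`Missing "${field}" in manifest.json`);
    } else if (DEFAULT_NAMES.some((d) => d.toLowerCase() === value.toLowerCase())) {
      // This is the exact failure mode from the USPS story.
      problems.push(`"${field}" is still a scaffold default: "${value}"`);
    }
  }
  if (!manifest.icons || manifest.icons.length === 0) {
    problems.push('No icons declared');
  }
  return problems;
}
```

Run it against your build output in CI and fail the deploy if the returned list is non-empty.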

&lt;h3&gt;
  
  
  2. Automate Your Professionalism (The CI Gate)
&lt;/h3&gt;

&lt;p&gt;This is the most important part. &lt;strong&gt;Humans forget. Computers don't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your CI/CD pipeline (e.g., GitHub Actions, GitLab CI, Jenkins) shouldn't just run unit tests; it should serve as your quality gatekeeper. This is something I've championed on my teams. We add various tests to our CI/CD pipeline that run before any deployment.&lt;/p&gt;

&lt;p&gt;For this situation, a simple &lt;code&gt;grep&lt;/code&gt; is all you need. Here's a "fail-if-found" shell script you can tinker with and add to your pipeline today:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Define placeholder strings to check for&lt;/span&gt;
&lt;span class="c"&gt;# Add your own framework's defaults!&lt;/span&gt;
&lt;span class="nv"&gt;PLACEHOLDERS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s2"&gt;"Create React App Sample"&lt;/span&gt;
  &lt;span class="s2"&gt;"React App"&lt;/span&gt;
  &lt;span class="s2"&gt;"lorem ipsum"&lt;/span&gt;
  &lt;span class="s2"&gt;"TODO"&lt;/span&gt;
  &lt;span class="s2"&gt;"FIXME"&lt;/span&gt;
  &lt;span class="s2"&gt;"powered by"&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Files to check in your build output&lt;/span&gt;
&lt;span class="nv"&gt;FILES_TO_CHECK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"dist/index.html dist/manifest.json"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Checking for placeholder text in production files..."&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$FILES_TO_CHECK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ℹ️ Skipping check for non-existent file: &lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;continue
  fi

  for &lt;/span&gt;placeholder &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PLACEHOLDERS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c"&gt;# Use grep to search for the placeholder&lt;/span&gt;
    &lt;span class="c"&gt;# -i: case-insensitive&lt;/span&gt;
    &lt;span class="c"&gt;# -q: quiet mode, just returns exit status&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$placeholder&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ ERROR: Found placeholder text '&lt;/span&gt;&lt;span class="nv"&gt;$placeholder&lt;/span&gt;&lt;span class="s2"&gt;' in '&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;'."&lt;/span&gt;
      &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deployment aborted. Please remove all placeholder text."&lt;/span&gt;
      &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
  done
done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ No placeholder text found. Good to go!"&lt;/span&gt;
&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tiny script, added to your deployment process, just saved you from a world of embarrassment. You're now treating your configuration as code, which is a key DevOps principle.&lt;/p&gt;

&lt;p&gt;Don't be the dev with &lt;a href="https://www.youtube.com/watch?v=oPk8d1jA34k&amp;amp;t=8s" rel="noopener noreferrer"&gt;concepts of a plan&lt;/a&gt;, ADD TESTS!&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Review the "Scaffolding"
&lt;/h3&gt;

&lt;p&gt;When we review pull requests, we're often trained to look at the &lt;code&gt;*.ts&lt;/code&gt; or &lt;code&gt;*.js&lt;/code&gt; files, the logic. We skim right past the 'scaffolding' files like &lt;code&gt;index.html&lt;/code&gt;, and the various JSON or YAML files.&lt;/p&gt;

&lt;p&gt;This needs to change. It's a mistake I still make, but these config files are part of the PR and deserve the same level of rigor. Ask in your PR reviews: "Did we update the title? Did we check the manifest?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Last 5%" is 100% of Your Reputation
&lt;/h2&gt;

&lt;p&gt;Back when I was a Developer Advocate (🥑), I worked a lot with PWAs and building sample apps. I would use tools like Firebase Hosting to get them in front of the public quickly and easily. But that simplicity can be a trap! It's so easy to run &lt;code&gt;firebase deploy&lt;/code&gt; that you can forget to check the &lt;code&gt;manifest.json&lt;/code&gt; that your framework scaffolding (like the Angular CLI) generated for you.&lt;/p&gt;

&lt;p&gt;Using your CI pipeline (say, GitHub Actions) to deploy to your hosting provider and adding a linting step is the perfect way to combine ease of use with professional-grade quality.&lt;/p&gt;

&lt;p&gt;Look, everyone ships bugs. Even massive organizations like the USPS. It's not about being perfect; it's about building robust systems that catch our simple, human mistakes.&lt;/p&gt;

&lt;p&gt;These little details (the "chrome"), like your PWA name and favicon, aren't "nice-to-haves." They are your digital storefront. They're the first impression you make, and they're critical for building the user trust that your application depends on.&lt;/p&gt;

&lt;p&gt;So, my challenge to you is: go look at your app's &lt;code&gt;manifest.json&lt;/code&gt; or &lt;code&gt;index.html&lt;/code&gt; in your &lt;code&gt;main&lt;/code&gt; branch. Right now. You might be surprised at what you find.&lt;/p&gt;

&lt;p&gt;What's the most embarrassing placeholder you've ever seen slip into production?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>pwa</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Fighting Framework Jank (What's Not in the Docs)</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Wed, 12 Nov 2025 05:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/fighting-framework-jank-whats-not-in-the-docs-mj5</link>
      <guid>https://dev.to/michaelsolati/fighting-framework-jank-whats-not-in-the-docs-mj5</guid>
      <description>&lt;p&gt;I’ve been there. We’ve &lt;em&gt;all&lt;/em&gt; been there. You've just shipped a new dashboard. It’s got charts, it’s got tables, it’s got pizazz ✨. And on your fancy, company issued, 16" MacBook Pro, it flies. Buttery smooth But then the first bug report comes in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Dashboard is laggy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or maybe you see an "it feels slow," or my personal favorite, "the page is janky."&lt;/p&gt;

&lt;p&gt;You open it on a different machine, like your cell phone, and your heart sinks... Those smooth animations are stuttering. The clicks feel... off. And then that moment of dread creeps in: "Is React (or Vue, or Angular) just... slow?"&lt;/p&gt;

&lt;p&gt;After going through an existential crisis (doubting my years as a software developer and realizing that my imposter syndrome is very justified), I decided to blame the framework or some library I was using. But after profiling the &lt;em&gt;very&lt;/em&gt; janky dashboard, I realized the problem wasn't the framework at all.&lt;/p&gt;

&lt;p&gt;The problem was me. I was so focused on the "framework way" of doing things that I was ignoring the most powerful performance tool I had: the browser itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Framework-Pure" Problem
&lt;/h2&gt;

&lt;p&gt;Let's look at a simplified version of my janky component. It had two main jobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Render a massive, complex, but totally static SVG icon.&lt;/li&gt;
&lt;li&gt;Fire off an analytics event as soon as it rendered to track that it was visible.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "pure React" way to write this looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Imagine this component returns a &amp;lt;svg&amp;gt; with hundreds of &amp;lt;path&amp;gt; elements&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MyHugeStaticChartIcon&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./MyHugeStaticChartIcon&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;sendAnalyticsEvent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./analytics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;JankyWidget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Fire this off as soon as we mount&lt;/span&gt;
    &lt;span class="nf"&gt;sendAnalyticsEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;widget_visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"widget"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;My Janky Widget&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MyHugeStaticChartIcon&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code &lt;em&gt;looks&lt;/em&gt; right, but it's a performance nightmare. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hydration Cost:&lt;/strong&gt; React has to create a Virtual DOM node for every single one of those hundreds of &lt;code&gt;&amp;lt;path&amp;gt;&lt;/code&gt; elements inside the SVG. That’s a ton of JavaScript objects to create and memory to allocate for something that &lt;em&gt;will never change&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Main Thread Blockage:&lt;/strong&gt; The &lt;code&gt;useEffect&lt;/code&gt; fires right after mount. That &lt;code&gt;sendAnalyticsEvent&lt;/code&gt; function, even if it's quick, is still work that's happening on the main thread. It's competing for resources with the browser, which is still trying to paint the screen and respond to the user's scroll.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This combination is what creates the "jank." The main thread is just too busy.&lt;/p&gt;
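
&lt;p&gt;You don't have to take the profiler's word for it, either: the browser will report this directly. Here's a minimal sketch using the Long Tasks API (Chromium-only at the time of writing, hence the guard; the &lt;code&gt;reportLongTask&lt;/code&gt; helper is just for illustration). Anything over 50ms is, by definition, blocking input and animation frames:&lt;/p&gt;

```typescript
// Sketch: confirming main-thread jank with the Long Tasks API.
// 50ms is the threshold the browser itself uses to define a "long task";
// the reportLongTask helper name is my own, for illustration.
const LONG_TASK_THRESHOLD_MS = 50;

function reportLongTask(durationMs: number): string | null {
  return durationMs > LONG_TASK_THRESHOLD_MS
    ? `Main thread blocked for ${Math.round(durationMs)}ms`
    : null;
}

// In the browser (guarded, since 'longtask' entries are Chromium-only):
const PO = (globalThis as any).PerformanceObserver;
if (typeof PO === 'function') {
  try {
    new PO((list: any) => {
      for (const entry of list.getEntries()) {
        const message = reportLongTask(entry.duration);
        if (message) console.warn(message);
      }
    }).observe({ type: 'longtask', buffered: true });
  } catch {
    // This environment doesn't support 'longtask' entries.
  }
}
```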

&lt;p&gt;

&lt;iframe src="https://stackblitz.com/edit/vitejs-vite-segdqqbc?embed=1&amp;amp;ctl=1&amp;amp;hidedevtools=1&amp;amp;file=src%2Fpages%2FJankyWidget.tsx&amp;amp;initialpath=janky" width="100%" height="500"&gt;
&lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can play with the janky version above!&lt;/p&gt;

&lt;h2&gt;
  
  
  The "One Weird Trick": Offload It to the Browser
&lt;/h2&gt;

&lt;p&gt;After hours of profiling, the "Aha!" moment hit me. The fix isn't a new library. It's to &lt;strong&gt;stop&lt;/strong&gt; asking the framework to do things the browser is already amazing at.&lt;/p&gt;

&lt;p&gt;This "trick" has two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Offload &lt;strong&gt;parsing&lt;/strong&gt; with the &lt;code&gt;&amp;lt;template&amp;gt;&lt;/code&gt; tag.&lt;/li&gt;
&lt;li&gt;Offload &lt;strong&gt;execution&lt;/strong&gt; with &lt;code&gt;requestIdleCallback&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Part 1: The &lt;code&gt;&amp;lt;template&amp;gt;&lt;/code&gt; Tag for Heavy Lifting
&lt;/h2&gt;

&lt;p&gt;First, that massive SVG. It's static. So why are we making JavaScript build it?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;template&amp;gt;&lt;/code&gt; tag is a native HTML element that is completely inert. The browser parses its content, but it doesn't render it, run scripts in it, or download images. It's just a chunk of DOM waiting to be used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Put your static HTML into your &lt;code&gt;index.html&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;template&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"my-chart-icon-template"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;svg&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"100"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"100"&lt;/span&gt; &lt;span class="na"&gt;viewBox=&lt;/span&gt;&lt;span class="s"&gt;"0 0 100 100"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;g&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;path&lt;/span&gt; &lt;span class="na"&gt;d=&lt;/span&gt;&lt;span class="s"&gt;"...a-very-complex-path..."&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;path&lt;/span&gt; &lt;span class="na"&gt;d=&lt;/span&gt;&lt;span class="s"&gt;"...another-complex-path..."&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;/g&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/svg&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/template&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Tweak your component to clone this content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;FastWidget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chartContainerRef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useRef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Find the template&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-chart-icon-template&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// 2. Clone its content (this is super fast)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cloneNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// 3. Stamp it into our component&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chartContainerRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;chartContainerRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// ... analytics call will go here ...&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"widget"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;My Fast Widget&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* This is now just an empty container */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;chartContainerRef&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom. We just saved React from having to manage hundreds of virtual DOM nodes. We offloaded all that parsing work to the browser, which it does much more efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: &lt;code&gt;requestIdleCallback&lt;/code&gt; for the "Nice-to-Haves"
&lt;/h2&gt;

&lt;p&gt;Okay, the component renders faster, but that analytics call is still blocking the main thread in &lt;code&gt;useEffect&lt;/code&gt;. This is where the second part of our "trick" comes in.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;requestIdleCallback&lt;/code&gt; is a browser API that's like saying, "Hey browser, I know you're busy. When you get a free second and you're not busy with user input or animations, could you please run this function for me?"&lt;/p&gt;

&lt;p&gt;It's &lt;em&gt;perfect&lt;/em&gt; for non-critical tasks like analytics.&lt;/p&gt;

&lt;p&gt;Let's add it to our &lt;code&gt;useEffect&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ... inside our FastWidget component ...&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// --- Template code from above ---&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-chart-icon-template&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cloneNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chartContainerRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;chartContainerRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// --- Our new, non-blocking analytics call ---&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;requestIdleCallback&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;requestIdleCallback&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;sendAnalyticsEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;widget_visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Fallback for older browsers&lt;/span&gt;
      &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;sendAnalyticsEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;widget_visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Payoff
&lt;/h2&gt;

&lt;p&gt;And just like that, the jank is gone.&lt;/p&gt;

&lt;p&gt;Our component now renders instantly. The state update (if we had one) happens immediately. The heavy lifting of parsing the SVG is handled by the browser. And the non-critical analytics call waits politely for its turn when the main thread is free.&lt;/p&gt;

&lt;p&gt;I love this kind of solution! It's not about "React vs. Vanilla JS." It's about remembering that your framework is a guest in the browser's house. By respecting the browser and using the native tools it provides, you can make your framework-based apps feel infinitely faster.&lt;/p&gt;

&lt;p&gt;So next time you're facing down some "jank," don't just reach for a new library. Ask yourself, "Can I just offload this to the browser?"&lt;/p&gt;
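
&lt;p&gt;If you end up using this pattern in more than one component, you can fold the feature check and fallback into a tiny helper. This is a sketch; the &lt;code&gt;runWhenIdle&lt;/code&gt; name and the 2-second timeout are my own choices, not a standard API:&lt;/p&gt;

```typescript
// A tiny helper for deferring non-critical work. The name and default
// timeout are assumptions for illustration, not a standard API.
function runWhenIdle(task: () => void, timeoutMs: number = 2000) {
  const g = globalThis as any;
  if (typeof g.requestIdleCallback === 'function') {
    // The timeout guarantees the task eventually runs, even on a busy page.
    g.requestIdleCallback(task, { timeout: timeoutMs });
  } else {
    // Fallback for environments without requestIdleCallback (notably Safari).
    setTimeout(task, 0);
  }
}

// Usage: defer the analytics call until the main thread has a free moment.
runWhenIdle(() => {
  console.log('analytics fired when the browser had a moment');
});
```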

&lt;p&gt;

&lt;iframe src="https://stackblitz.com/edit/vitejs-vite-segdqqbc?embed=1&amp;amp;ctl=1&amp;amp;hidedevtools=1&amp;amp;file=src%2Fpages%2FFastWidget.tsx&amp;amp;initialpath=fast" width="100%" height="500"&gt;
&lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>react</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Meta's Llama API: Open Models, Meet Developer Convenience</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Thu, 01 May 2025 18:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/metas-llama-api-open-models-meet-developer-convenience-2odd</link>
      <guid>https://dev.to/michaelsolati/metas-llama-api-open-models-meet-developer-convenience-2odd</guid>
      <description>&lt;p&gt;We keep seeing headlines about new LLMs reaching state-of-the-art performance and dominating benchmarks, which is genuinely incredible progress! But the gap between that and actually getting these powerful models deployed effectively within an application... that's often where the rubber meets the road, and frankly, where things can get pretty messy. Sure, downloading huge model weights is part of it, but creating a reliable, smooth operational workflow around them? That's the harder part.&lt;/p&gt;

&lt;p&gt;That's why Meta's announcement at LlamaCon 2025 wasn't just another model drop (though they keep doing that, too, bless their open source ❤️). They unveiled the official Llama API. This is a significant shift. Meta, the champions of open weights you can download and run yourself, is now stepping firmly into the hosted API game.&lt;/p&gt;

&lt;p&gt;Why should you, as a developer, care? This move is about bridging that gap between incredibly capable open source models and making them radically easier for us to use in our projects. Think about getting the flexibility and transparency we love about open models combined with the kind of developer experience and convenience we've (sometimes grudgingly) come to expect from the closed-source, &lt;a href="https://ai.meta.com/blog/llamacon-llama-news/" rel="noopener noreferrer"&gt;API-first players&lt;/a&gt;. Plus, Meta made some very interesting decisions with this API, like building in OpenAI compatibility from the get-go. Seriously.&lt;/p&gt;
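
&lt;p&gt;That OpenAI compatibility means calling the Llama API is mostly a matter of pointing an existing client at a different base URL. As a rough sketch of the request shape (the endpoint URL and model ID below are illustrative placeholders; check Meta's docs for the real values):&lt;/p&gt;

```typescript
// Sketch of the request shape for an OpenAI-compatible chat endpoint.
// The buildChatRequest helper is my own; the URL and model ID in the
// usage note are illustrative placeholders, not confirmed values.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  apiKey: string = ''
) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Usage (URL and model name are assumptions for illustration):
// await fetch('https://api.llama.com/compat/v1/chat/completions',
//   buildChatRequest('Llama-4-Scout-17B-16E-Instruct-FP8', [
//     { role: 'user', content: 'Hello, Llama!' },
//   ], myApiKey));
```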

&lt;p&gt;And from where I sit, over here thinking about real-time interactions all day at LiveKit, easier access to faster, cheaper, more capable models? That starts to unlock some really exciting possibilities. Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the Llama Family (Served via API)
&lt;/h2&gt;

&lt;p&gt;First, this isn't just an API for some legacy model. Meta is directly putting some of its latest and greatest Llama iterations into &lt;a href="https://llama.developer.meta.com/docs/overview/" rel="noopener noreferrer"&gt;its hosted service&lt;/a&gt;. When it debuted in preview, it featured the then-new Llama 4 Scout and Maverick models alongside Llama 3.3 8B. Looking at the official docs now, the lineup includes optimized FP8 versions of those Llama 4 models, plus the Llama 3.3 series in both 70B and 8B parameter sizes. (&lt;a href="https://llama.developer.meta.com/docs/models" rel="noopener noreferrer"&gt;See here&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Let's break down the current herd available via the official API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 4 Scout (Llama-4-Scout-17B-16E-Instruct-FP8):&lt;/strong&gt; This isn't your grandpa's text-only LLM. Scout is natively multimodal, meaning it understands text and images right out of the box. It uses a Mixture-of-Experts (MoE) architecture (17 billion active parameters, 16 'experts') which &lt;a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/" rel="noopener noreferrer"&gt;helps make it efficient&lt;/a&gt;. Think more intelligent routing of your requests to specialized parts of the model. The API version uses FP8 precision for efficiency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 4 Maverick (Llama-4-Maverick-17B-128E-Instruct-FP8):&lt;/strong&gt; Scout's sibling, also natively multimodal and running on an MoE architecture, but with way more experts (128 of them!) packed into its 17 billion active parameters. This suggests it might handle more complex or nuanced multimodal tasks. Benchmarks show Maverick punching well above its weight, often outperforming much larger models, especially in image understanding and coding. Again, the API serves an efficient FP8 version.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 70B (Llama-3.3-70B-Instruct):&lt;/strong&gt; The latest iteration of Meta's 70B text-only model line. It boasts improved reasoning, coding chops, multilingual support, and a beefy 128k token context window. Meta positions it as delivering performance comparable to the earlier massive Llama 3.1 405B for text-based tasks, but faster and cheaper.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 8B (Llama-3.3-8B-Instruct):&lt;/strong&gt; The lightweight, speedy sibling to the 70B. It keeps the 128k context window and multilingual capabilities but is designed for scenarios where you need quick responses and lower resource usage. It was also one of the first models available for fine-tuning via the API preview.&lt;/li&gt;
&lt;/ul&gt;
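The MoE routing that both Llama 4 models rely on can be sketched in a few lines of Python. This is a toy illustration of the general technique, not Meta's implementation: a router scores every expert for a given input, only the top-scoring experts actually run, and their outputs are blended using the router's weights.

```python
# Toy sketch of Mixture-of-Experts routing (illustrative, not Meta's code):
# a router scores every expert, only the top-k experts run, and their
# outputs are combined weighted by the router's (renormalized) scores.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the k best-scoring experts and mix their outputs."""
    weights = softmax(router_scores)
    # Indices of the k largest routing weights.
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    # Renormalize the selected weights so they sum to 1.
    total = sum(weights[i] for i in top)
    return sum(weights[i] / total * experts[i](x) for i in top)

# Sixteen tiny stand-in "experts", each just scaling the input differently.
experts = [lambda x, m=m: m * x for m in range(1, 17)]
scores = [0.0] * 16
scores[4] = 3.0   # router strongly prefers expert 4 ...
scores[9] = 1.0   # ... and mildly prefers expert 9
print(moe_forward(10.0, experts, scores, k=2))
```

Real MoE layers do this per token inside the transformer and train the router jointly with the experts; the efficiency win is that most expert parameters sit idle on any given forward pass.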

&lt;p&gt;Here's a quick look at the models currently listed in the &lt;a href="https://llama.developer.meta.com/docs/models" rel="noopener noreferrer"&gt;official API docs&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Input Modalities&lt;/th&gt;
&lt;th&gt;Output Modalities&lt;/th&gt;
&lt;th&gt;Context Length (API)&lt;/th&gt;
&lt;th&gt;Key Architecture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama-4-Scout-17B-16E-Instruct-FP8&lt;/td&gt;
&lt;td&gt;Text, image&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;128k&lt;/td&gt;
&lt;td&gt;MoE (16 Experts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-4-Maverick-17B-128E-Instruct-FP8&lt;/td&gt;
&lt;td&gt;Text, image&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;128k&lt;/td&gt;
&lt;td&gt;MoE (128 Experts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3.3-70B-Instruct&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;128k&lt;/td&gt;
&lt;td&gt;Transformer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3.3-8B-Instruct&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;128k&lt;/td&gt;
&lt;td&gt;Transformer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What's really interesting here isn't just the specs but the strategy. Meta isn't just incrementally improving text generation. They're adding fundamental new capabilities like &lt;a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/" rel="noopener noreferrer"&gt;native multimodality&lt;/a&gt; and architectural innovations like MoE that directly target limitations of older models and compete feature-for-feature with the big closed-source players like Google's Gemini and OpenAI's GPT-4 series. Offering these advanced models through an easy-to-use API signals Meta wants developers to have frictionless access to their cutting-edge, not just the older stuff.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvqo5q4lguk1utvcuhsv.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvqo5q4lguk1utvcuhsv.webp" alt="An image showing benchmark comparisons for Llama 4 models against other models like Gemini, Mistral, GPT, and DeepSeek on various tasks like MMLU, MathVista, ChartQA, DocQA, LiveCodeBench, and more." width="800" height="740"&gt;&lt;/a&gt;&lt;br&gt;Benchmarks of Llama 4 (&lt;a href="https://apidog.com/blog/llama-4-api/" rel="noopener noreferrer"&gt;source&lt;/a&gt;)
  &lt;/p&gt;

&lt;p&gt;The increased context windows across the board (up to 128k standard in the API, a massive leap from Llama 2/3's initial 8k) also unlock more sophisticated applications, from deeper conversations to analyzing larger documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Developer Experience: Less Yak Shaving, More Building
&lt;/h2&gt;

&lt;p&gt;Okay, powerful models are cool. But how easy is it to use them via this new API? This is where Meta has seriously considered reducing friction for developers.&lt;/p&gt;

&lt;p&gt;We're talking easy, one-click API key generation and interactive "playgrounds" to quickly test prompts and models. These days, this is standard fare for APIs, but nailing these basics is crucial for getting developers up and running quickly.&lt;/p&gt;

&lt;p&gt;As you'd hope, they've rolled out official &lt;a href="https://llama.developer.meta.com/docs/sdks" rel="noopener noreferrer"&gt;SDKs for Python and TypeScript&lt;/a&gt;. Installation looks super simple (for Python, it's just &lt;code&gt;pip install llama-api-client&lt;/code&gt;). The SDK examples lay out ways to use it for chat completion. Plus, you get support for async and streaming responses, which is great.&lt;/p&gt;
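A minimal sketch of what that looks like in Python. The client class and call shape below mirror the OpenAI-style surface the SDK docs describe, but treat the exact names and signatures as assumptions and verify against the official docs; the API key is read from an assumed LLAMA_API_KEY environment variable.

```python
# Sketch: chat completion with the official llama-api-client SDK.
# Class/method names follow the SDK's OpenAI-style surface; verify exact
# signatures against the docs at llama.developer.meta.com before relying on them.
import os

def build_request(model, prompt, stream=False):
    # One request shape covers the sync, async, and streaming call styles.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def main():
    # Deferred import so build_request stays usable without the package.
    from llama_api_client import LlamaAPIClient
    client = LlamaAPIClient(api_key=os.environ["LLAMA_API_KEY"])
    req = build_request("Llama-3.3-8B-Instruct", "Explain MoE in one sentence.")
    resp = client.chat.completions.create(**req)
    print(resp)

if __name__ == "__main__" and os.environ.get("LLAMA_API_KEY"):
    main()
```

Flip `stream=True` in the request to consume the response incrementally instead of waiting for the full completion.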

&lt;p&gt;Now, pay attention, because this next part is the real game-changer and tells you a lot about Meta's thinking: it &lt;a href="https://llama.developer.meta.com/docs/features/compatibility" rel="noopener noreferrer"&gt;works with the OpenAI API&lt;/a&gt;! Yep, you heard right. They've actually included a dedicated compatibility endpoint at &lt;code&gt;https://api.llama.com/compat/v1/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What does this mean? You can take your existing code that uses the official OpenAI client libraries, point it to Meta's base URL, and swap in your Llama API key. It should just work for core functionalities like listing models, chat completions (sync and streaming), and even image understanding with Llama 4. Meta explicitly provides examples showing how to do this. This is a massive olive branch for developers already invested in the OpenAI ecosystem. It dramatically lowers the barrier to trying out or switching to Llama. Meta removes the "but I'd have to rewrite my integration" excuse. It's a genius, pragmatic move acknowledging OpenAI's current de facto standard status while leveraging Llama's strengths (like cost and openness) as a compelling reason to make that tiny configuration change.&lt;/p&gt;
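That swap is small enough to show concretely. This sketch assumes the official openai Python package and an assumed LLAMA_API_KEY environment variable; relative to stock OpenAI code, only the base URL and the key change.

```python
# Sketch: pointing the official openai client at Meta's compatibility
# endpoint. Only base_url and the API key differ from stock OpenAI code.
# LLAMA_API_KEY and the helper names here are illustrative assumptions.
import os

LLAMA_COMPAT_BASE_URL = "https://api.llama.com/compat/v1/"

def llama_client():
    # Deferred import so the payload helper works without the package.
    from openai import OpenAI
    return OpenAI(base_url=LLAMA_COMPAT_BASE_URL, api_key=os.environ["LLAMA_API_KEY"])

def chat_payload(model, prompt):
    # The standard Chat Completions request shape, unchanged.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

if __name__ == "__main__" and os.environ.get("LLAMA_API_KEY"):
    client = llama_client()
    resp = client.chat.completions.create(
        **chat_payload("Llama-4-Scout-17B-16E-Instruct-FP8", "Say hello!")
    )
    print(resp.choices[0].message.content)
```

Everything downstream of the client construction is your existing OpenAI integration, untouched.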

&lt;p&gt;The API preview wasn't just for running inferences; it also packed tools for fine-tuning and evaluating models. They first showed this off with the Llama 3.3 8B model, letting developers build custom versions right there in the hosted API. This means you could tailor models to your specific needs without wrestling with complex training setups. It really signals that Meta gets that serious AI work often needs more than just a one-size-fits-all model – it needs specialization. Putting these tools in the API turns it from a basic inference point into a much more complete development platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Arms vs. Walled Gardens: Llama's Place in the AI Bazaar
&lt;/h2&gt;

&lt;p&gt;Meta's entire philosophy with Llama has been centered around openness. They release the model weights, allowing anyone (with the right hardware and expertise) to run, modify, and build upon them. The Llama API fits into this by providing an easier access point to these open models, but it doesn't lock you in. Meta explicitly states that models fine-tuned via the API are yours to &lt;a href="https://ai.meta.com/blog/llamacon-llama-news/" rel="noopener noreferrer"&gt;take and host elsewhere&lt;/a&gt; if you want. This is in contrast to the proprietary API-only approach of competitors like OpenAI, Anthropic, and Google, where the models remain firmly within their walled gardens.&lt;/p&gt;

&lt;p&gt;And let's be clear: these open models aren't just "good for open source," they are competitive on performance. Llama 3 models showed significant improvements over Llama 2. They outperformed other open models of similar size on various benchmarks like MMLU, HumanEval, and GSM-8K. The larger Llama 3.1 405B was positioned as rivaling top closed-source models. The newer Llama 4 models, like Maverick, are showing impressive results, even surpassing GPT-4o and Gemini 2 in areas like image understanding (ChartQA, DocVQA) and long-context tasks, according to &lt;a href="https://apidog.com/blog/llama-4-api/#standard-benchmark-performance-metrics" rel="noopener noreferrer"&gt;some benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But where Llama really throws down the gauntlet is cost. While Meta hasn't published official pricing for their own hosted API preview (it was mentioned as a free limited preview initially), the pricing from ecosystem partners who offer Llama models via API sets a clear and dramatic precedent. Meta themselves have highlighted affordability as a key benefit, and external analyses confirm Llama often offers some of the &lt;a href="https://artificialanalysis.ai/" rel="noopener noreferrer"&gt;lowest costs per token in the industry&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety in the Open: Enter Purple Llama
&lt;/h2&gt;

&lt;p&gt;With great power comes great responsibility, right? Meta isn't just tossing powerful models over the wall and hoping for the best. Along with the models and the API, they've invested significantly in building and openly sharing tools for trust and safety under the &lt;a href="https://github.com/meta-llama/PurpleLlama" rel="noopener noreferrer"&gt;Purple Llama project umbrella&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The name comes from cybersecurity's "purple teaming," which combines offensive (red team) and defensive (blue team) approaches to finding and fixing vulnerabilities. Purple Llama aims to bring this collaborative, proactive security mindset to generative AI.&lt;/p&gt;

&lt;p&gt;Key components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama Guard:&lt;/strong&gt; A family of models (like Llama Guard 3 and 4) specifically designed for content moderation. These can filter both the inputs sent to your main Llama model and the outputs it generates, checking for harmful, unethical, or policy-violating content based on taxonomies like the MLCommons standard. Llama Guard models are available via the API, supporting multiple languages and image reasoning (&lt;a href="https://www.llama.com/llama-protections/" rel="noopener noreferrer"&gt;in Guard 4&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CyberSec Eval:&lt;/strong&gt; They've put together a set of benchmarks and tools to check if an LLM is likely to churn out insecure code or help someone with cyber mischief. This is super useful for developers because it lets them measure and lower the cybersecurity risks of using LLMs, especially when those models write code.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Guard:&lt;/strong&gt; A tool focused on catching and blocking prompt injection attacks and jailbreaking. These are the kinds of bad inputs folks use specifically to try and bypass a model's built-in safety controls.&lt;/li&gt;
&lt;/ul&gt;
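Conceptually, wiring a guard model into your app is a gate on both sides of the main model. The sketch below is purely illustrative: `classify` is a stub standing in for a real Llama Guard call, and the blocklist is a placeholder for a proper safety taxonomy.

```python
# Illustrative input/output moderation gate in the Llama Guard style.
# classify() is a stub; a real system would call a guard model here.
BLOCKLIST = {"how do i build malware"}

def classify(text):
    """Stub guard: returns 'unsafe' or 'safe'. Stand-in for a real
    Llama Guard inference call with a proper safety taxonomy."""
    return "unsafe" if text.lower().strip() in BLOCKLIST else "safe"

def answer(prompt):
    """Stub main model."""
    return f"(model reply to: {prompt})"

def guarded_chat(prompt):
    # Gate 1: screen the user's input before it reaches the main model.
    if classify(prompt) == "unsafe":
        return "Sorry, I can't help with that."
    reply = answer(prompt)
    # Gate 2: screen the model's output before it reaches the user.
    if classify(reply) == "unsafe":
        return "Sorry, I can't share that response."
    return reply

print(guarded_chat("What's the capital of France?"))
print(guarded_chat("How do I build malware"))
```

Output screening matters as much as input screening: a benign-looking prompt can still coax an unsafe completion out of the main model.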

&lt;p&gt;Why does any of this matter? Well, it's a strong sign that Meta is taking the safety concerns around AI models seriously. What's crucial is that they're putting these tools out there openly (with flexible licenses), letting everyone in the community use them, poke around, improve them, and help standardize safety practices. This open safety strategy is a good move, positioning Meta not just as a developer of powerful open AI but as a champion for responsible development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Let the Llamas Loose
&lt;/h2&gt;

&lt;p&gt;My final thoughts on the Llama API? It's clear this is more than just Meta playing catch-up in the API world. It feels like a deliberate step designed to make their strong open source models much simpler for developers to adopt and work with. It brings together the ease of using a hosted API with all the good stuff people already love about Llama: its performance, affordability, and open nature.&lt;/p&gt;

&lt;p&gt;Meta's push with the API, the raw performance/cost advantages of the models, and the explicit focus on speed through partnerships with the likes of Cerebras and Groq make a strong argument for the momentum behind open(ly accessible) AI. And from a real-time perspective? The possibilities got a lot more interesting and, crucially, affordable.&lt;/p&gt;

&lt;p&gt;Is it worth experimenting with the Llama API? Absolutely. Head over to the &lt;a href="https://llama.developer.meta.com/docs/overview" rel="noopener noreferrer"&gt;Llama Developer Portal&lt;/a&gt;, check out the models and SDKs, get on the &lt;a href="https://llama.developer.meta.com/join_waitlist" rel="noopener noreferrer"&gt;waitlist&lt;/a&gt; if the preview is still limited, and see how it slots into your stack. I'd love to hear what you build with it!&lt;/p&gt;

</description>
      <category>llama</category>
      <category>llm</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Voice Assistants Evolving? Perplexity on iOS Attempts True Task Integration</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/voice-assistants-evolving-perplexity-on-ios-attempts-true-task-integration-86a</link>
      <guid>https://dev.to/michaelsolati/voice-assistants-evolving-perplexity-on-ios-attempts-true-task-integration-86a</guid>
      <description>&lt;p&gt;Okay, real talk: how often has Siri left you hanging with a "Here's what I found on the web" when you needed something done? 🙋‍♂️ It happens to the best of us. For years, it feels like we've been waiting for our iPhone assistants to get genuinely smart, especially with the AI revolution happening all around us.&lt;/p&gt;

&lt;p&gt;Well, buckle up because &lt;a href="https://www.perplexity.ai/" rel="noopener noreferrer"&gt;Perplexity AI&lt;/a&gt;, the "answer engine" known for giving you sourced answers instead of just links, brought its conversational voice assistant to &lt;a href="https://apps.apple.com/us/app/perplexity-ask-anything/id1668000334" rel="noopener noreferrer"&gt;iOS&lt;/a&gt; just last week! And the timing? Let's say it's very interesting, given the reported delays with Apple's own "Apple Intelligence" and the smarter Siri we were promised. Perplexity is sliding right into that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, What's the Big Deal? It's an Agent
&lt;/h2&gt;

&lt;p&gt;This isn't just about talking to your search engine. Perplexity is pushing into &lt;strong&gt;agentic AI&lt;/strong&gt;. What does that mean? It means the AI doesn't just find information; it aims to do things for you by interacting with other apps. It's trying to be the assistant that bridges the gap between knowing what you want and actually making it happen. They cleverly used standard &lt;a href="https://www.macstories.net/stories/what-siri-isnt-perplexitys-voice-assistant-and-the-potential-of-llms-integrated-with-ios/" rel="noopener noreferrer"&gt;iOS developer tools&lt;/a&gt; to make this happen, connecting their AI brain to the apps we use daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  What kind of stuff can it actually do?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bookings: Ask for a table tonight, and it can pop open OpenTable with the details mostly filled in. Need a ride? It can tee up an Uber request.&lt;/li&gt;
&lt;li&gt;Emails: Dictate an email, and it'll draft it in the Mail app, ready for your final check.&lt;/li&gt;
&lt;li&gt;Scheduling: Add events to your calendar or set reminders (with permission, of course).&lt;/li&gt;
&lt;li&gt;Media: Find specific songs on Apple Music or pull up videos on YouTube.&lt;/li&gt;
&lt;li&gt;Navigation: Get directions started in Apple Maps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Perplexity vs. Siri: The Showdown
&lt;/h2&gt;

&lt;p&gt;So, is it time to kick Siri to the curb? Well, not entirely. Siri still has exclusive access to core iPhone functions – setting system alarms, changing settings like Wi-Fi or Do Not Disturb, and sending texts directly via Messages. Perplexity can't touch those.&lt;/p&gt;

&lt;p&gt;But where they overlap, Perplexity often feels like it's playing a different game:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action, Not Just Links: This is huge. Ask Perplexity to book that table; it opens OpenTable and fills in the details. Ask Siri the same thing, and you're likely getting... web links. Perplexity tries to initiate the task, making it feel genuinely helpful.&lt;/li&gt;
&lt;li&gt;Smoother Conversations: Perplexity seems much better at understanding natural language and keeping track of the conversation flow. It feels less like issuing commands and more like having a dialogue.&lt;/li&gt;
&lt;li&gt;Works on Older iPhones: Unlike Apple Intelligence, which demands newer hardware, this runs on devices like the iPhone 12/13.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Perplexity makes you feel more capable and less frustrated with everyday assistant tasks. It's closer to the helpful AI companion many of us envisioned years ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Take: Worth Your Time? Heck Yes
&lt;/h2&gt;

&lt;p&gt;Perplexity's voice assistant on iOS is a genuinely exciting development. It's fast, conversational, and surprisingly good at orchestrating tasks across different apps (often better than Siri).&lt;/p&gt;

&lt;p&gt;No, it can't replace Siri for everything due to those system limitations. However, it offers a glimpse of a more powerful, proactive AI assistant for a wide range of common requests. It shows what's possible even within Apple's constraints. It pressures Apple to deliver on that "smarter Siri" promise.&lt;/p&gt;

&lt;p&gt;If you're curious, download the app, tap that waveform icon, and take it for a spin. You might find yourself reaching for it more often than you expect.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>assistant</category>
    </item>
    <item>
      <title>Google's Cookie Conundrum</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Tue, 30 Jul 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/michaelsolati/googles-cookie-conundrum-4ii5</link>
      <guid>https://dev.to/michaelsolati/googles-cookie-conundrum-4ii5</guid>
      <description>&lt;p&gt;Last Monday (July 22nd, 2024, for those of you reading in the future), Google made headlines by announcing a major policy change regarding its plans to phase out third-party tracking cookies in Chrome. This decision marks a pivot from Google's initial commitment to increase user privacy and calls into question the future of web privacy standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Announcement
&lt;/h2&gt;

&lt;p&gt;In the original proposal introduced several years ago as part of the Privacy Sandbox, Google aimed to gradually phase out third-party cookies, a powerful tool many advertisers use to track users across sites. However, the latest announcement marked a rather unexpected turn of events. Anthony Chavez, the vice president of the Privacy Sandbox, stated, "Instead of deprecating third-party cookies, we would introduce a new experience in Chrome that lets people make an informed choice that applies across their web browsing." &lt;a href="https://privacysandbox.com/news/privacy-sandbox-update/" rel="noopener noreferrer"&gt;(source)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While Chrome plans to roll out a user-choice prompt for users to decide about third-party cookies, this change underscores a growing reality: achieving privacy in a digital landscape saturated with tracking technologies is a tough battle.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Under the Surface?
&lt;/h2&gt;

&lt;p&gt;Industry experts have expressed skepticism regarding Google's decision to maintain third-party cookies under the guise of user choice. Safari and Firefox have already taken a hard stance against these tracking mechanisms, blocking third-party cookies by default since early 2020. Google's decision to backtrack highlights its predicament: balancing user privacy with its advertising revenue model.&lt;/p&gt;

&lt;p&gt;Apple, a fierce critic of Google's Topics API—a core component of Privacy Sandbox that categorizes user interests based on browsing history—added fuel to the fire. As Apple's WebKit team pointed out, "The user doesn't get told upfront which topics Chrome has tagged them with." &lt;a href="https://webkit.org/blog/15697/private-browsing-2-0/" rel="noopener noreferrer"&gt;(source)&lt;/a&gt; This raises legitimate questions about users being able to manage their data and maintain anonymity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Regulatory Landscape
&lt;/h2&gt;

&lt;p&gt;This Google pivot isn't just a matter of corporate policy; it's a broader commentary on regulatory pressures shaping tech giants' approaches to privacy. The UK's Competition and Markets Authority (CMA) is closely monitoring this new approach. They are evaluating the impact of Chrome's user-choice status on cookies and its implications for user privacy across the internet.&lt;/p&gt;

&lt;p&gt;Stephen Bonner, deputy commissioner at the Information Commissioner's Office (ICO), expressed disappointment in Google's adjustments, emphasizing the need for a more privacy-friendly internet. The ICO continues to advocate for the digital advertising industry to transition to more private alternatives to third-party cookies rather than veering into less transparent tracking forms. &lt;a href="https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2024/07/ico-statement-in-response-to-google-announcing-it-will-no-longer-block-third-party-cookies/" rel="noopener noreferrer"&gt;(source)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Implications
&lt;/h2&gt;

&lt;p&gt;So, what does this mean for everyone involved?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For Users:&lt;/strong&gt; Those who value online privacy may see this pivot as a step backward, reducing their control over personal data and how it's shared across the web.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For Advertisers:&lt;/strong&gt; The existing system that generates revenue through third-party tracking remains intact, leaving advertisers to navigate the older, more convoluted advertising landscape. This could lead to a shift in advertising strategies, focusing on first-party data and contextual advertising rather than relying on third-party tracking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For Regulators:&lt;/strong&gt; This shift prompts an ongoing dialogue around privacy legislation and the tech industry's accountability in safeguarding user data. Regulators are now faced with the challenge of protecting user privacy while allowing for innovation and competition in the digital advertising space.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The conclusion? Google's about-face on third-party cookies leaves everyone in a state of pause. While intended as a measure of user choice, it muddies the waters of privacy protection, sparking ethical debates about consent in the ever-evolving digital landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;As this saga unfolds, the situation serves as a reminder that the trajectory toward a more transparent, privacy-centric internet is fraught with challenges. While Google plans to roll out user-choice prompts, significant trust issues remain within the ecosystem. Whether this initiative will empower users by giving them more control over their data or serve as a façade of choice depends on execution and ongoing engagement with privacy advocates and regulators.&lt;/p&gt;

&lt;p&gt;As we navigate the future of online privacy, one question looms: can companies genuinely prioritize user data, or will advertising revenue continue to dictate the rules of the game?&lt;/p&gt;




&lt;p&gt;Do I still post on &lt;a href="https://twitter.com/MichaelSolati" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;? No, I gave up on it. But I plan on getting active on &lt;a href="https://www.threads.net/@michaelsolati" rel="noopener noreferrer"&gt;Threads&lt;/a&gt;, so follow me there!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>chrome</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Why Netflix Took a Bet on GraphQL</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Tue, 27 Jun 2023 05:37:14 +0000</pubDate>
      <link>https://dev.to/amplication/why-netflix-took-a-bet-on-graphql-56i2</link>
      <guid>https://dev.to/amplication/why-netflix-took-a-bet-on-graphql-56i2</guid>
      <description>&lt;p&gt;So you may have missed it, but about two weeks ago, the streaming giant Netflix shared the details of its massive leap forward by embracing GraphQL as its preferred API architecture. Let's dive into what Netflix did, why it made this bold move, how it executed the migration, and why other companies should seriously consider following suit.&lt;/p&gt;

&lt;p&gt;If you're interested in reading about their experience, be sure to check out the &lt;a href="https://netflixtechblog.com/migrating-netflix-to-graphql-safely-8e1e4d4f1e72" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; on the Netflix Technology Blog. But do that after you read this blog post to get our opinion first!&lt;/p&gt;

&lt;h2&gt;
  
  
  Netflix's Big Move
&lt;/h2&gt;

&lt;p&gt;In 2022, Netflix migrated their iOS and Android apps to a GraphQL backend. Until then, they had used their own home-baked API framework, &lt;a href="https://netflix.github.io/falcor/" rel="noopener noreferrer"&gt;Falcor&lt;/a&gt;, to power their mobile apps. Interestingly, the Falcor Java implementation team also managed the API server.&lt;/p&gt;

&lt;p&gt;Netflix's decision to adopt GraphQL was driven by its ambition to create a more flexible, efficient, and developer-friendly API architecture. Moving away from its monolithic REST API server, Netflix sought to empower its development teams, optimize the data transfer, and enhance the overall user experience.&lt;/p&gt;

&lt;p&gt;By breaking up their Falcor monolith, they allowed every team to manage their own GraphQL API thanks to a federated GraphQL gateway, removing a burden from the Falcor Java team while giving the other teams ownership of the code they were writing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Power of GraphQL
&lt;/h2&gt;

&lt;p&gt;So, what makes GraphQL so compelling? First and foremost, it offers unparalleled flexibility. Unlike REST APIs, where clients are constrained by fixed data structures, GraphQL empowers clients to request the data they need, eliminating over-fetching and under-fetching data. This leads to faster load times, improved performance, and, ultimately, happier users.&lt;/p&gt;

&lt;p&gt;GraphQL's declarative nature and powerful tooling also provide an enhanced developer experience. It simplifies data fetching and eliminates the need for multiple API endpoints, making development more efficient and productive. With GraphQL, developers can focus on delivering value without being hindered by rigid API structures.&lt;/p&gt;
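To make the over-fetching point concrete, here is a minimal sketch of a GraphQL call from Python using only the standard library. The endpoint and schema (a show with a title and cast) are hypothetical, purely for illustration:

```python
# Sketch: a GraphQL client asks for exactly the fields it needs, via one
# endpoint. The URL and schema below are hypothetical, for illustration only.
import json
import urllib.request

QUERY = """
query ShowSummary($id: ID!) {
  show(id: $id) {
    title
    cast { name }   # only the name, not each actor's full record
  }
}
"""

def graphql_request(url, query, variables):
    """Build the single POST that replaces several REST round-trips."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

if __name__ == "__main__":
    req = graphql_request("https://example.com/graphql", QUERY, {"id": "42"})
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it; the server returns only
    # the title and cast names, nothing more.
```

An equivalent REST flow might need one call for the show plus one per cast member; here a single request names exactly the fields the client will render.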

&lt;h2&gt;
  
  
  Netflix's Migration Journey
&lt;/h2&gt;

&lt;p&gt;Now, let's look at how Netflix executed its migration to GraphQL. They approached the transition in two phases, ensuring a smooth and seamless integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1 - Creating a GraphQL Shim Service
&lt;/h3&gt;

&lt;p&gt;Netflix's first step involved creating a GraphQL shim service on top of their monolithic Falcor API. This allowed client engineers to swiftly adopt GraphQL and explore client-side concerns without disrupting the server-side infrastructure. To launch this phase safely, Netflix employed A/B testing, evaluating the impact of GraphQL versus the legacy Falcor stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fwhy-netflix-took-a-bet-on-graphql%2F0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fwhy-netflix-took-a-bet-on-graphql%2F0.webp" alt="Diagram of GraphQL Shim Service in front of Legacy API Monolith." width="800" height="298"&gt;&lt;/a&gt;&lt;br&gt;Diagram of GraphQL Shim Service in front of Legacy API Monolith.
  &lt;/p&gt;



&lt;h3&gt;
  
  
  Phase 2 - Replacing Falcor with a GraphQL Gateway
&lt;/h3&gt;

&lt;p&gt;Then, Netflix deprecated the GraphQL shim service and the legacy Falcor API in favor of federated GraphQL services owned by domain teams. This decentralized approach enabled independent management and ownership of specific sections of the API. To ensure the correctness and functional accuracy of the migration, Netflix employed replay testing, comparing results between the GraphQL Shim and the new Video API service. They also leveraged sticky canaries, infrastructure experiments that assessed performance, and business metrics, to build confidence in the new GraphQL services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fwhy-netflix-took-a-bet-on-graphql%2F1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmichaelsolati.com%2Fblog%2Fwhy-netflix-took-a-bet-on-graphql%2F1.webp" alt="Diagram of Federated GraphQL Gateway replacing Shim Service." width="800" height="413"&gt;&lt;/a&gt;&lt;br&gt;Diagram of Federated GraphQL Gateway replacing Shim Service.
  &lt;/p&gt;



&lt;h2&gt;
  
  
  Why Should You Consider GraphQL?
&lt;/h2&gt;

&lt;p&gt;Netflix's successful migration is a powerful example for other companies evaluating their API strategies. Here are some compelling reasons why GraphQL should be seriously considered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Increased Efficiency&lt;/strong&gt;: GraphQL optimizes data retrieval by allowing clients to request only the required data, eliminating unnecessary network overhead. This leads to faster load times, improved performance, and optimized resource utilization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility and Adaptability&lt;/strong&gt;: With GraphQL, companies can quickly iterate and innovate, responding to changing business needs and user demands. Its flexible nature allows seamless additions, modifications, and deprecations without breaking existing client implementations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer-Friendly&lt;/strong&gt;: GraphQL's declarative nature, comprehensive documentation, and robust tooling make it a developer's dream. It simplifies data fetching and enhances productivity, empowering developers to deliver value more effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future-Proofing and Scalability&lt;/strong&gt;: GraphQL's flexibility and adaptability help future-proof API infrastructures, enabling long-term scalability, forward compatibility, and easy integration with evolving technologies.&lt;/li&gt;
&lt;/ol&gt;
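&lt;p&gt;Point 1 can be made concrete with a small sketch: a REST endpoint typically returns every field of a resource, while a GraphQL query names exactly the fields the client needs. The field names and shapes below are hypothetical, purely for illustration:&lt;/p&gt;

```typescript
// Hypothetical REST-style response: the endpoint returns every field,
// whether or not the client needs it.
const restResponse: { [key: string]: unknown } = {
  id: "v1",
  title: "Stranger Things",
  synopsis: "A small town confronts strange forces.",
  castList: ["Millie Bobby Brown", "Finn Wolfhard"],
  encodingDetails: { bitrate: 5000, codec: "h264" },
};

// A GraphQL client instead asks for exactly what it needs:
const query = '{ video(id: "v1") { id title } }';

// Minimal sketch of server-side field selection, mimicking how a
// GraphQL server shapes the payload to the query's selection set.
function selectFields(
  source: { [key: string]: unknown },
  fields: string[]
): { [key: string]: unknown } {
  const result: { [key: string]: unknown } = {};
  for (const field of fields) {
    if (field in source) {
      result[field] = source[field];
    }
  }
  return result;
}

const trimmed = selectFields(restResponse, ["id", "title"]);
console.log(Object.keys(trimmed)); // only "id" and "title" survive
```

&lt;p&gt;Only the two requested fields cross the wire; the synopsis, cast, and encoding metadata stay on the server, which is where GraphQL's bandwidth and latency savings come from.&lt;/p&gt;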

&lt;h2&gt;
  
  
  Want to Make the Jump?
&lt;/h2&gt;

&lt;p&gt;If you found this blog post interesting and are considering switching to GraphQL, first &lt;a href="https://github.com/amplication/amplication" rel="noopener noreferrer"&gt;give us a 🌟 on GitHub&lt;/a&gt;, and also be sure to check out &lt;a href="https://amplication.com/" rel="noopener noreferrer"&gt;Amplication&lt;/a&gt;. Besides excellent content like this, we also build the premier open-source developer tool for generating scalable, secure, and extensible backends using technologies like GraphQL.&lt;/p&gt;

&lt;p&gt;Our new Database Schema Import functionality makes it faster than ever to build on our platform, allowing you to seamlessly import your existing database schema into Amplication. It shortens the transition from your old backend to your new one by preserving your underlying database, so you can focus on enhancing your services.&lt;/p&gt;

&lt;p&gt;You can sign up for the &lt;a href="https://amplication.com/db-import-beta" rel="noopener noreferrer"&gt;beta here&lt;/a&gt;, migrate your backend, and build something as impressive as Netflix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Netflix's migration to GraphQL is a testament to the power and benefits of this revolutionary API architecture. By adopting GraphQL, Netflix achieved increased efficiency, flexibility, and developer-friendliness, while ensuring a seamless user experience. Other companies should take note of Netflix's success and seriously consider embracing GraphQL to take advantage of these benefits for themselves.&lt;/p&gt;

&lt;p&gt;Adapting and staying ahead of the curve is critical as the technological landscape evolves. GraphQL presents a paradigm shift in API architecture, reimagining how data is exchanged between clients and servers. So, whether you're a small startup or a tech giant, you should explore the possibilities of GraphQL and join the ranks of companies (like Netflix and Amplication) embracing this technology.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>api</category>
      <category>graphql</category>
      <category>news</category>
    </item>
    <item>
      <title>Amplication Partners With GitHub to Provide Students With Intern Opportunities</title>
      <dc:creator>Michael Solati</dc:creator>
      <pubDate>Thu, 15 Jun 2023 07:24:47 +0000</pubDate>
      <link>https://dev.to/amplication/amplication-partners-with-github-to-provide-students-with-intern-opportunities-4813</link>
      <guid>https://dev.to/amplication/amplication-partners-with-github-to-provide-students-with-intern-opportunities-4813</guid>
      <description>&lt;p&gt;Are you a student ready to take your skills to the next level? Well, look no further! We want to introduce you to the exciting GitHub Octernships program. This groundbreaking paid internship program connects students with industry partners, like Amplication, where they gain paid professional experiences and mentorship on open-source development projects. We'll dive into the Octernships program, why students should jump on this incredible opportunity, and why Amplication's involvement will add excitement to the mix.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Octernships?
&lt;/h2&gt;

&lt;p&gt;Imagine a world where you can work on real-world projects, get paid for them, and enhance your skills as a student. That's only a part of what Octernships offer. This program provides students with hands-on experience and the chance to work on diverse projects, helping them hone various software development skills. Octernships expose students to the full spectrum of software development endeavors, from open-source projects to documentation, design, and testing.&lt;/p&gt;

&lt;p&gt;One of the most valuable parts of Octernships is the meaningful mentorship industry experts provide. This mentorship helps students navigate their projects and facilitates personal and professional growth. By working alongside seasoned professionals, students gain insights, advice, and guidance that can significantly impact their career trajectory.&lt;/p&gt;

&lt;p&gt;Additionally, Octernships allow students to collaborate with other developers, expanding their network and forging connections that may prove key in their future endeavors. The program also provides a curated list of registered partners, allowing students to connect with potential employers and open doors to exciting career opportunities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Should Students Apply?
&lt;/h2&gt;

&lt;p&gt;Now let's discuss what an Octernship entails and why students should jump on this remarkable opportunity.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hands-On Experience: Octernships offer students the chance to gain practical, real-world experience in software development. This experience-based learning is invaluable in bridging the gap between the classroom and the real world. It equips students with the skills and confidence necessary to thrive in their future careers.&lt;/li&gt;
&lt;li&gt;Mentorship from Industry Experts: The mentorship provided through Octernships is a big deal. Being guided by experienced professionals allows students to learn from the best, receive personalized feedback, and gain insights into industry best practices. This mentorship fosters personal growth and accelerates students' learning curves.&lt;/li&gt;
&lt;li&gt;Building a Professional Network: The connections students make during Octernships can impact their careers. Collaborating with fellow developers and connecting with potential employers broaden their network and open doors to future collaborations and employment opportunities.&lt;/li&gt;
&lt;li&gt;Competitive Student Stipend: GitHub and its partners understand the importance of recognizing students' efforts and dedication. Participants receive a competitive stipend for each project, acknowledging their hard work and providing financial support.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Amplication and Octernships: A Dynamic Duo
&lt;/h2&gt;

&lt;p&gt;Octernships have recently taken an exciting turn with the involvement of Amplication. This leading software development platform empowers developers to build robust and scalable backends. Amplication's participation in Octernships adds extra excitement and opportunity for students. Octerns will get the chance to &lt;a href="https://github.com/Amplication-Octernships/Amplication-Plugin-Task#information" rel="noopener noreferrer"&gt;work on our plugin ecosystem&lt;/a&gt;, adding new functionality and features to Amplication that developers worldwide will use.&lt;/p&gt;

&lt;p&gt;So here's why Amplication's involvement is something to take note of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cutting-Edge Technology: Amplication brings cutting-edge technology to the Octernships program. Students participating in Amplication-related projects can work on building innovative tools that make developers' lives easier and help drive the industry forward. This experience will help prepare students for the ever-evolving landscape of software development.&lt;/li&gt;
&lt;li&gt;Meaningful Contributions to Open Source: Amplication is committed to its open-source origins, and through its involvement in the Octernship program, students can help the open-source community. By contributing to Amplication's open-source initiatives, students become part of a global collaborative effort, leaving a lasting mark on the software development ecosystem.&lt;/li&gt;
&lt;li&gt;Enhanced Learning Opportunities: Amplication's expertise in software development improves the learning experience for the Octerns. Students will benefit from Amplication's knowledge and resources, expanding their skill sets and exposing them to real professional development experiences.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The GitHub Octernships program offers an exceptional opportunity for students to gain hands-on experience, receive mentorship from industry experts, build their professional networks, and make a meaningful impact on the open-source community. With Amplication's involvement, students can further elevate their learning experiences and embrace cutting-edge technology in their Octernship journey.&lt;/p&gt;

&lt;p&gt;If you are a student passionate about software development, &lt;a href="https://education.github.com/students/octernships" rel="noopener noreferrer"&gt;take advantage of this incredible opportunity&lt;/a&gt;. GitHub Octernships are your gateway to unleashing your potential, kick-starting your tech career, and becoming part of the next generation of developers driving software innovation.&lt;/p&gt;

&lt;p&gt;Finally, Amplication's VP of Research &amp;amp; Development, Muly Gottlieb, will join GitHub to talk about our Octernship project and answer your questions! Check out &lt;a href="https://www.youtube.com/watch?v=uhHgBuJfBB0" rel="noopener noreferrer"&gt;the stream&lt;/a&gt; on June 21st at 3:55 PM UTC.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uhHgBuJfBB0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>opensource</category>
      <category>github</category>
      <category>news</category>
    </item>
  </channel>
</rss>
