DEV Community: neo xia

Next.js 16 on Cloudflare Workers: what broke and what didn't

neo xia — Wed, 22 Jul 2026 03:23:39 +0000

I shipped a Next.js 16 app on Cloudflare Workers via OpenNext. Not a demo. A real product with streaming chat, server components, D1 at the edge, and anonymous user sessions.

Here is what broke, what barely worked, and what turned out to be surprisingly fine.

The stack

Next.js 16.2 (App Router)
@opennextjs/cloudflare 1.19
D1 for SQLite at the edge
Streaming chat via the AI binding (DeepSeek-V3 through a Workers proxy)
React 19
Tailwind CSS 4
No auth wall, no OAuth, no database on the origin

The site runs a few thousand sessions a week across ~30 persona pages, blog posts, guides, and learning content. Most pages are statically generated. The chat interaction uses server-rendered components with streaming responses.

What worked surprisingly well

Static generation and ISR

Pages, blogs, guides, persona pages: everything that does not need user-specific rendering is generated as static HTML at build time.

Next.js 16 with generateStaticParams and fetch caching worked without modification. OpenNext handles the Cloudflare output format. The build step produces something Workers can serve. Revalidations are limited to Workers' cache API, but since most content changes at deploy time, I never hit that limit in production.

The one caveat: revalidateTag() does not work the same way in a Workers runtime. Tags are Node.js memory constructs, and Workers are stateless. If you depend on tag-based revalidation for content updates, you need to either trigger deploys or accept stale-while-revalidate behavior from the CDN.

D1 at the edge

D1 was the least surprising part of the stack. SQL queries from Next.js route handlers feel like calling a regular database. Sessions are stored in D1, messages are stored in D1, and the latency is low enough that restoring a full chat thread of 30 messages takes under 200ms cold.
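
To make the "feels like a regular database" point concrete, here is a minimal sketch of loading a thread. The `messages` table, its columns, and the `loadThread` helper are assumptions for illustration, not the app's real schema. The small interfaces only describe the part of the D1 binding the sketch touches (`prepare`, `bind`, `all`).

```typescript
// Minimal structural types for the subset of the D1 binding used here:
// prepare() -> bind() -> all(), where all() resolves to an object with a `results` array.
interface D1Statement {
  bind(...values: unknown[]): D1Statement;
  all<T>(): Promise<{ results: T[] }>;
}
interface D1Like {
  prepare(sql: string): D1Statement;
}

// Hypothetical row shape; the real schema is not shown in the post.
interface MessageRow {
  role: "user" | "assistant";
  content: string;
  created_at: number;
}

// Load the most recent messages for one session, oldest first.
async function loadThread(db: D1Like, sessionId: string, limit = 30): Promise<MessageRow[]> {
  const { results } = await db
    .prepare(
      "SELECT role, content, created_at FROM messages WHERE session_id = ? ORDER BY created_at DESC LIMIT ?"
    )
    .bind(sessionId, limit)
    .all<MessageRow>();
  // The query returns newest first so LIMIT keeps the latest turns; flip for display.
  return results.reverse();
}
```

In a route handler, `db` would be the D1 binding itself, since it already has this shape.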

The only sharp edge: D1 connections count against your Worker's concurrent request limit in development. With Next.js making its own fetch calls for compilation, I hit the D1 connection ceiling faster than expected. The fix was moving wrangler dev to use --experimental-json-config early, but the dev experience for D1 + Next.js hot reload is still rougher than it should be.

Streaming chat responses

The app streams model responses token by token. In dev, this was unreliable — the stream would drop mid-response on roughly 1 in 15 requests. I spent two days tracing it before realizing it was a local wrangler dev issue with HTTP chunked transfer encoding, not a production problem.

In production, streaming over Workers works. The AI binding handles the model request inside the Worker, and the response streams back through the Next.js route handler to the client without a hitch.
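
A minimal sketch of the streaming path, assuming the model output is available as an async iterable of text chunks. `tokensToStream` and `fakeModel` are hypothetical names; the real app's helper is not shown here. Only standard Web APIs are used (`ReadableStream`, `TextEncoder`).

```typescript
// Turn any async iterable of text chunks into a byte stream usable as a Response body.
function tokensToStream(tokens: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data, so a slow client applies backpressure.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    // If the client disconnects, stop the upstream generator as well.
    async cancel() {
      await iterator.return?.();
    },
  });
}

// Stand-in for the model stream (hypothetical); a real one would yield tokens from the model.
async function* fakeModel(): AsyncGenerator<string> {
  yield "Hello";
  yield ", ";
  yield "world";
}
```

A route handler could then return `new Response(tokensToStream(fakeModel()))`, swapping the fake generator for the real token source.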

Anonymous user sessions

The product ships without required sign-in. Users get a UUID from crypto.randomUUID() stored in localStorage, sent as X-User-Id on every chat request. On the server, D1 writes sessions against that ID.

Workers not having durable session state actually helps here. The identity arrives in headers. No sticky sessions. No login-session table. The Worker pulls the user ID, checks D1 for existing sessions or quota limits, and proceeds.
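
A rough sketch of both halves of that flow. The storage key, the helper names, and the UUID validation step are assumptions; the post only states that a UUID is kept in localStorage and sent as X-User-Id. The ID generator is passed in so the helper has no dependency on browser globals.

```typescript
// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const USER_ID_KEY = "user-id"; // hypothetical storage key

// Client side: reuse the stored ID, or mint one on first visit.
function getOrCreateUserId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem(USER_ID_KEY);
  if (existing) return existing;
  const id = newId();
  store.setItem(USER_ID_KEY, id);
  return id;
}

// Server side: the header is client-controlled, so accept only well-formed UUIDs.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function readUserId(headers: Headers): string | null {
  const raw = headers.get("X-User-Id");
  return raw !== null && UUID_RE.test(raw) ? raw.toLowerCase() : null;
}
```

In the browser the call would look like `getOrCreateUserId(localStorage, () => crypto.randomUUID())`. Because anyone can send any header value, the ID is only good for quotas and history, not for anything that needs real authentication.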

What broke

Middleware and edge runtime mismatch

Next.js 16 middleware runs on every request. On Cloudflare Workers, middleware executes in the edge runtime, not the Node.js runtime. This means any middleware that imports Node's crypto module or uses APIs outside the Workers subset will fail at request time, not at build time.

I had a middleware that checked rate limits using a counter in D1. Simple enough. But the D1 binding is not available in the middleware context through OpenNext the way it is in route handlers. I had to move rate limiting to a Cloudflare WAF rule instead of doing it in Next.js middleware. Not a dealbreaker, but the middleware → Workers gap is larger than the documentation suggests.

`next/image` and Workers

Static images work fine. Dynamic image optimization through next/image does not, because Workers lack the image processing libraries that Node.js uses.

The fix: I pre-processed all images at build time and served them as static assets. No runtime optimization needed. For an app with fewer than 50 unique images, this was trivial. For an app with user-uploaded content, it would be a hard problem.
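
One way to wire pre-processed images into next/image is a custom loader, which Next.js accepts through the `loader` prop or the `images.loaderFile` config option. The sketch below is an assumption about how that could look, not the setup the app uses. The width list and the `name-width.webp` naming scheme are hypothetical.

```typescript
// Widths that were generated at build time (hypothetical list).
const PREBUILT_WIDTHS = [320, 640, 960, 1280];

interface LoaderArgs {
  src: string;
  width: number;
  quality?: number; // accepted for compatibility with the loader signature, unused here
}

// Map a requested width to the smallest pre-built variant that covers it,
// then point at a static file such as /img/hero-640.webp.
function staticImageLoader({ src, width }: LoaderArgs): string {
  const chosen =
    PREBUILT_WIDTHS.find((w) => w >= width) ?? PREBUILT_WIDTHS[PREBUILT_WIDTHS.length - 1];
  const base = src.replace(/\.[a-z0-9]+$/i, ""); // drop the original extension
  return `${base}-${chosen}.webp`;
}
```

With this in place nothing is resized at request time. The Worker only serves files that already exist as static assets.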

Environment variables in client components

Environment variables prefixed with NEXT_PUBLIC_ are baked into the client bundle at build time. That works fine. But runtime environment variables accessed through process.env in server components behave differently in Workers.

Cloudflare Workers use an env binding, not process.env. OpenNext bridges this, but the bridge is not seamless. Variables I set in wrangler.toml were available in the Worker context but not through the process.env API that Next.js server components expect at runtime. The workaround was importing the Cloudflare bindings through the @opennextjs/cloudflare types and passing them explicitly.
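
A minimal sketch of "passing them explicitly", assuming the binding object is obtained once and handed to a small lookup helper. `readVar` is a hypothetical name. In an OpenNext app the first argument would typically come from `getCloudflareContext().env` in `@opennextjs/cloudflare`; it is a plain parameter here so the helper has no runtime dependency.

```typescript
// Prefer the Workers binding object, then fall back to process.env-style values.
// Throws early so a missing variable fails loudly instead of becoming `undefined` downstream.
function readVar(
  bindings: Record<string, unknown> | undefined,
  processEnv: Record<string, string | undefined>,
  name: string
): string {
  const fromBinding = bindings?.[name];
  if (typeof fromBinding === "string" && fromBinding !== "") return fromBinding;
  const fromProcess = processEnv[name];
  if (fromProcess !== undefined && fromProcess !== "") return fromProcess;
  throw new Error(`Missing environment variable: ${name}`);
}
```

Keeping the lookup in one function means server components never touch process.env directly, so a change in how OpenNext bridges the two only has to be handled in one place.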

The first-deploy cold start

On the first deploy after building with OpenNext, the first Worker request takes roughly 45-60 seconds. After that initial cold start, response times drop to normal.

This is a known Workers characteristic compounded by Next.js route chunking. The more routes your app has, the more chunks the Worker needs to load on the first request. My app has about 30 routes. A smaller app with 5 routes would cold-start faster.

Subsequent deploys are faster because Cloudflare caches the compilation output. But the first deploy of the day always has a cold start window.

Streaming + D1 in the same route handler

This one was subtle. A route handler that reads from D1 and then streams a response — for example, loading session history and then starting the chat stream — sometimes lost the D1 response before the stream completed.

The issue: Workers terminate the request context after the response is sent. If you read from D1 inside the same handler that starts a streaming response, the D1 bindings can close before the stream finishes if the stream outlasts the initial response resolution.

The fix: resolve all D1 reads before starting the stream. Load session data early, store it in a closure, then begin streaming.

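// Before: D1 read interleaved with streaming (illustrative shape)
// (db, ai, model, messages, newMessages, sessionId, and streamToReadableStream are defined elsewhere in the app)
export async function POST(req: Request) {
  // Started but not awaited: the query can still be in flight after the Response is returned
  const sessionPromise = db.prepare("SELECT * FROM sessions WHERE id = ?").bind(sessionId).first();
  const stream = await ai.beta.chat.completions.create({ model, messages, stream: true });
  // The session is only awaited later, from inside the stream
  return new Response(streamToReadableStream(stream, sessionPromise));
}

// After: resolve D1 reads first, then stream
export async function POST(req: Request) {
  // first() resolves to null when no row matches, so guard before reading messages
  const session = await db.prepare("SELECT * FROM sessions WHERE id = ?").bind(sessionId).first();
  const messagesWithHistory = [...(session?.messages ?? []), ...newMessages];
  const stream = await ai.beta.chat.completions.create({ model, messages: messagesWithHistory, stream: true });
  return new Response(streamToReadableStream(stream));
}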

Straightforward fix once I recognized the pattern. Cost me two evenings of debugging before I found it.

What I am still watching

ISR revalidation at scale. I have not reached the threshold where Workers cache API limits matter. If the app grows to hundreds of content pages with frequent revalidations, I may need to revisit the caching strategy.
D1 row limits. A few thousand sessions with 10-15 messages each is well under D1's limits. If the app crosses 100k+ sessions, I will need either a message archiving strategy or a move to R2 for stored messages.
OpenNext churn. The OpenNext → Cloudflare pipeline changes version to version. Two minor bumps have already required wrangler.toml changes. The stack is stable enough for production but not mature enough to set and forget.

The honest take

Next.js 16 on Cloudflare Workers works for a real production app. The static generation path is smooth. D1 is solid. Streaming is fine after you learn the D1-resolution-before-stream rule.

The rough edges are in the gaps between ecosystems: middleware runtimes, environment variable access patterns, and image optimization. Each gap has a workaround, but the workarounds are not always documented.

The OpenNext team has done the heavy lifting of bridging Next.js to Workers. The remaining friction is mostly about accepting that you are on a serverless platform with different constraints than Vercel. Once you internalize those constraints (resolve D1 first, pre-process images, avoid Node.js APIs in middleware), the stack holds.

For a production app with streaming chat, anonymous sessions, and static content: cosskill.com. The stack file is pinned on the repo if you want the full picture.

Why I Chose DeepSeek Over GPT-4 for a Free AI Conversation App

neo xia — Mon, 22 Jun 2026 06:43:39 +0000

I did not choose DeepSeek because I think GPT-4 is bad. I chose it because I was building a free app, and free apps teach you what actually matters pretty fast.

The question was simple: how do I keep sessions cheap enough that people can practice a lot without me lighting money on fire?

The answer pushed me toward DeepSeek-V3 (and later R1 for specific tasks).

The real constraint was volume

The app is a conversation practice tool. People come in to rehearse hard talks, not to admire the model.

A single practice session runs 8-15 turns. Each turn is roughly 300-600 tokens in, 100-300 out. Multiply that by five sessions a week per active user and the costs start compounding.

Here is what the math looked like when I was choosing (mid-2026 pricing):

Model	Input cost (per 1M tokens)	Output cost (per 1M tokens)	Cost per 10-turn session (est.)
GPT-4o	$2.50	$10.00	~$0.04-0.06
GPT-4 Turbo	$10.00	$30.00	~$0.12-0.18
DeepSeek-V3	$0.27	$1.10	~$0.004-0.007
DeepSeek-R1	$0.55	$2.19	~$0.008-0.012

At scale, the difference between $0.005 and $0.05 per session is the difference between running a free product and needing a paywall after three conversations. I wanted people to come back daily without hitting a wall.
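
The arithmetic behind that comparison can be sketched as below, using the prices from the table and the midpoint of the per-turn ranges (450 tokens in, 200 out, 10 turns). This is a naive estimate that ignores the system prompt and any history re-sent each turn, so the absolute figures come out lower than the table's estimates. The ratio between models is the part that matters.

```typescript
interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices copied from the table above.
const PRICES = {
  "GPT-4o": { inputPerMillion: 2.5, outputPerMillion: 10.0 },
  "GPT-4 Turbo": { inputPerMillion: 10.0, outputPerMillion: 30.0 },
  "DeepSeek-V3": { inputPerMillion: 0.27, outputPerMillion: 1.1 },
  "DeepSeek-R1": { inputPerMillion: 0.55, outputPerMillion: 2.19 },
};

// Cost of one session in USD, assuming every turn uses the same token counts.
function sessionCost(
  p: Pricing,
  turns: number,
  inTokensPerTurn: number,
  outTokensPerTurn: number
): number {
  const input = (turns * inTokensPerTurn * p.inputPerMillion) / 1e6;
  const output = (turns * outTokensPerTurn * p.outputPerMillion) / 1e6;
  return input + output;
}

const v3 = sessionCost(PRICES["DeepSeek-V3"], 10, 450, 200);
const gpt4o = sessionCost(PRICES["GPT-4o"], 10, 450, 200);
console.log(v3.toFixed(4), gpt4o.toFixed(4), (gpt4o / v3).toFixed(1));
```

Under these assumptions GPT-4o costs roughly nine times as much per session as DeepSeek-V3, which is in line with the order-of-magnitude gap described above.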

What DeepSeek handled well

It stayed in character for 10-15 turns. It pushed back when the user got vague. It followed persona heuristics (numbered if/then rules in the system prompt) about as reliably as GPT-4o did for our use case.

For salary negotiation rehearsal, the model needs to say "that's not in the budget" and hold that position for three more turns while the user tries different approaches. DeepSeek-V3 did this. Not perfectly, but reliably enough that sessions felt real.

It also made the app easier to run as a free product. People can try, fail, reset, and try again without me worrying about per-session cost.

Where GPT-4 was still better

GPT-4 (and 4o) is smoother with nuanced emotional wording. When a conversation gets subtle, loaded with subtext, or requires picking up on implied meaning, GPT-4 catches more.

For the breakup text persona, GPT-4o noticed when a user's "kind" message was actually passive-aggressive. DeepSeek missed that about 20% more often in my informal testing across ~100 sessions.

But polish was not the main bottleneck for this product. The main bottleneck was getting people enough reps to build actual comfort with discomfort.

The tradeoff I actually cared about

Do I want one beautiful session, or ten useful ones?

For this app, ten useful ones. Every time.

So I took the cheaper model, put the engineering effort into the prompt architecture (persona seed, heuristics, mode wrapper, boundaries), and accepted that 85-90% quality at 10x the volume was a better product than 95% quality at 1x.

The model matters. The scaffolding around it matters more.
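
As a rough illustration of that scaffolding, the layers named above (persona seed, heuristics, mode wrapper, boundaries) could be assembled like this. The type names, the wrapper wording, and the layout of the final prompt are all assumptions; the post names the layers but does not show them.

```typescript
interface Persona {
  seed: string; // short description of who the persona is
  heuristics: string[]; // if/then rules, rendered as a numbered list
  boundaries: string[]; // things the persona must never do
}

type Mode = "roleplay" | "scoring";

// Hypothetical mode wrappers; the real wording is not shown in the post.
const MODE_WRAPPERS: Record<Mode, string> = {
  roleplay: "Stay in character. Reply in at most 3 sentences.",
  scoring: "Step out of character and score the user's last message from 1 to 5.",
};

// Assemble the system prompt: seed first, then numbered rules, boundaries, and the mode wrapper.
function buildSystemPrompt(persona: Persona, mode: Mode): string {
  const rules = persona.heuristics.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  const limits = persona.boundaries.map((b) => `- ${b}`).join("\n");
  return [persona.seed, "Rules:", rules, "Boundaries:", limits, MODE_WRAPPERS[mode]].join("\n\n");
}
```

Keeping the rules as data rather than prose makes it cheap to tighten one rule and rerun the same session against it.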

What I changed to make DeepSeek work

A few things made the choice viable:

Tighter system prompts. DeepSeek drifts more with long, loose instructions. Shorter seed, more numbered rules.
Lower temperature (0.55 for roleplay, 0.2 for scoring). Kept persona variation without character breaks.
Max reply length cap in the mode wrapper. DeepSeek's default is wordier than GPT-4o, so I had to constrain it explicitly.
Built retries into the flow. A bad response does not kill the session; the user gets a fresh turn.

The last one is underrated for any practice app. The experience should not feel fragile.
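
A sketch of how the settings and the retry behavior might look in code. The temperatures are the ones quoted above. The reply-length caps, the attempt count, and the rule that an empty reply counts as a failure are assumptions, and automatic retry is only one reading of "built retries into the flow".

```typescript
// Temperatures are the ones quoted above; maxReplyTokens values are hypothetical.
const MODE_SETTINGS = {
  roleplay: { temperature: 0.55, maxReplyTokens: 200 },
  scoring: { temperature: 0.2, maxReplyTokens: 120 },
};

// Call the model up to `attempts` times, treating thrown errors and empty replies as failures.
async function withRetries(call: () => Promise<string>, attempts = 3): Promise<string> {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < attempts; i++) {
    try {
      const reply = (await call()).trim();
      if (reply.length > 0) return reply;
      lastError = new Error("empty reply");
    } catch (err) {
      lastError = err;
    }
  }
  // Every attempt failed: surface the last error so the UI can offer the user a fresh turn.
  throw lastError;
}
```

The caller wraps whatever function hits the model, for example `withRetries(() => askModel(prompt, MODE_SETTINGS.roleplay))`, where `askModel` is a hypothetical helper.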

My actual takeaway

If you are building a free AI app, the best model is not always the smartest one. It is the one that lets people come back tomorrow.

Not bragging rights. Not benchmark charts. Whether the app stays affordable enough to be used like a tool instead of a demo.

For cosskill, DeepSeek made more sense. It let me build something people use five times a week instead of something they try once and forget. Which is usually the whole game for a practice product anyway.

If you want to see the product, it is at cosskill.com.

I built an AI conversation simulator because I kept chickening out of real talks

neo xia — Mon, 01 Jun 2026 03:43:42 +0000

Last year I needed to ask for a raise. I knew my number, I'd read the guides, I had bullet points in my notes app. Then my manager said "let's chat about your goals for next quarter" and I said "sounds great, looking forward to it" and hung up. Never brought up money.

Same thing kept happening elsewhere. Coworker taking credit for my work, I said nothing. Relationship that should've ended months earlier, I kept postponing. I always knew what to say. I just couldn't say it with someone actually looking at me.

So I started building a thing to practice on. That thing became cosskill.

What it actually is

You pick a persona, tell it the situation in a sentence, and start talking. The persona doesn't help you. It holds position and pushes back. You practice not folding.

Think of it as a flight simulator for hard conversations. You rehearse until your opener comes out steady, then go do the real thing. 20 personas across five categories:

Operators (Musk, Jobs): first-principles thinking, harsh product feedback
Strategists (Trump, Buffett): treat everything as a deal or a bet
Relationship (Ex, Coworker): breakups, workplace friction, family money
Philosophy (Socrates, Aurelius, Confucius, Sun Tzu, four more): each tradition frames problems differently
Psychology (Rogers, Rosenberg, Ellis, Frankl, Kahneman, Jung): therapeutic frameworks on real situations

These aren't celebrity impressions. The Buffett persona won't hype your startup idea. It'll ask "what's the downside?" and keep asking until you have something concrete.

Tech stack

Next.js 16 on Cloudflare Workers. DeepSeek for inference. Cloudflare D1 (SQLite at the edge) for the bits that need to persist. No user accounts; chat history lives in localStorage.

Monthly cost stays low enough that the free tier (10 messages/day) doesn't worry me.

Why I made these choices

DeepSeek instead of GPT-4/Claude. Each conversation is 10-30 messages. At GPT-4 pricing a free product bleeds money. DeepSeek gives maybe 90% of the quality for a fraction of the cost on this specific task, which is maintaining persona consistency across a back-and-forth.

No accounts. Every signup form is friction between "I need to practice this talk" and actually practicing it. If I add a login wall, some percentage of people close the tab and go back to rehearsing in the shower. I'd rather have them practice.

Cloudflare Workers instead of Vercel. D1 is genuinely good for this. One database at the edge, no connection pooling, no separate DB service. DX is slightly worse for Next.js specifically, but the deploy simplicity makes up for it.

Personas that resist instead of a chatbot that helps. A generic AI assistant will agree with you. That's the problem. You don't need agreement. You need someone saying "that's not in the budget" while you practice not conceding immediately. Each persona has hardcoded positions and pushback patterns. They're useful by being difficult.

What I actually learned building this

Prompting a persona to stay in character is a solved problem. The hard part was figuring out what "practice" means for conversations.

Scripts don't work because they sound robotic and shatter on contact with a real human. Free-form chatting doesn't work because there's no improvement loop. What actually works: have one sentence you want to say, say it to something that pushes back, adjust based on what happens, try again. The first two minutes of any hard conversation set the rest. So you practice those two minutes.

After a few runs, people say the real conversation feels less scary. Not because they memorized a script. Because they already heard the worst response and survived. "That's not in the budget" doesn't feel like a gut punch when you've heard it four times from an AI and practiced not caving.

Where it's at

Early. Traffic is growing organically. The pages that get the most hits are salary negotiation practice and breakup text help. Draw your own conclusions about what people actually struggle with.

Free tier handles most users. Pro ($9.90/month, billed annually) is for people who want unlimited messages and custom personas.

Go try it

cosskill.com. No signup, just pick a persona and start. If you have a salary conversation coming up, try Buffett. Need to set a boundary with a coworker, try the Coworker persona. Want someone to tear your startup pitch apart, try Jobs.

Genuinely curious: what conversations do devs specifically need to practice that I haven't thought of? And what persona doesn't exist yet but should?

first time to here

neo xia — Mon, 01 Jun 2026 03:34:12 +0000