- I stopped guessing why builds break. I lint env vars.
- I generate .env.example from code. Not docs.
- I validate at runtime with Zod. One error message.
- I run a tiny Node script in CI. Fails fast.
Context
I ship small SaaS apps. Usually solo. Usually fast.
And I kept losing time to env vars.
The worst kind of bug. Works on my machine. Fails in CI. Or only fails after deploy. Or fails only in preview.
Real examples from the last month:
- NEXT_PUBLIC_APP_URL was missing in Vercel preview. OAuth callback broke.
- DATABASE_URL existed, but pointed to the wrong DB. Brutal.
- STRIPE_WEBHOOK_SECRET had a trailing space. Took me 40 minutes.
Cursor + Claude helped. But not by “prompting harder”.
I needed a system.
So I built an env pipeline:
1) single schema
2) runtime validation
3) .env.example generated from that schema
4) CI script that fails before Next.js even starts
1) I wrote one env schema. Everything else follows.
I used to scatter process.env.X across files.
That’s how you get silent undefined.
Then you “fix” it with || ''.
Then prod does something weird.
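A minimal sketch of that failure mode, not code from any real app. `readLoose` and `readStrict` are hypothetical helpers, and the `source` object stands in for `process.env`. The loose version turns a missing var into an empty string. The strict version throws on missing values and on stray whitespace.

```typescript
// Hypothetical helpers for illustration; `source` stands in for process.env.
type EnvSource = Record<string, string | undefined>;

// The "fix" that hides the problem: a missing var becomes an empty string.
function readLoose(source: EnvSource, key: string): string {
  return source[key] || "";
}

// The strict version: missing, empty, or whitespace-padded values throw.
function readStrict(source: EnvSource, key: string): string {
  const value = source[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing env var: ${key}`);
  }
  if (value !== value.trim()) {
    throw new Error(`Env var ${key} has leading or trailing whitespace`);
  }
  return value;
}

// Made-up value with a trailing space, similar to the Stripe case above.
const fakeEnv: EnvSource = { STRIPE_WEBHOOK_SECRET: "whsec_example " };

console.log(JSON.stringify(readLoose(fakeEnv, "DATABASE_URL"))); // prints ""
```

The loose read returns quietly, so the bug surfaces somewhere far from the cause. The strict read fails at the point where the var is looked up.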
Now I centralize env parsing.
One file. One export.
I use Zod because the error messages are readable.
And because it lets me do coercion without hacks.
// src/env.ts
import { z } from "zod";

const EnvSchema = z.object({
  // Server-only
  DATABASE_URL: z.string().url(),
  AUTH_SECRET: z.string().min(32),

  // Public (Next.js exposes these)
  NEXT_PUBLIC_APP_URL: z.string().url(),
  NEXT_PUBLIC_POSTHOG_KEY: z.string().min(1).optional(),

  // Example of coercion
  RATE_LIMIT_PER_MINUTE: z.coerce.number().int().positive().default(60),
});

// Parse once. Throw once.
export const env = EnvSchema.parse({
  DATABASE_URL: process.env.DATABASE_URL,
  AUTH_SECRET: process.env.AUTH_SECRET,
  NEXT_PUBLIC_APP_URL: process.env.NEXT_PUBLIC_APP_URL,
  NEXT_PUBLIC_POSTHOG_KEY: process.env.NEXT_PUBLIC_POSTHOG_KEY,
  RATE_LIMIT_PER_MINUTE: process.env.RATE_LIMIT_PER_MINUTE,
});

export type Env = z.infer<typeof EnvSchema>;
Cursor made this fast.
I highlighted my old process.env usage and asked it to “extract to env.ts with Zod”.
It got 80% right.
The other 20% was me catching mistakes.
Like it tried to mark DATABASE_URL as NEXT_PUBLIC_... once. Nope.
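One thing that bit me: Next.js runs code in weird places.
If you import env in a client component, the schema runs in the browser too. Next.js only inlines NEXT_PUBLIC_ vars there, so the server-only vars are undefined and the parse throws.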
So I keep env imports server-only.
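One way to enforce that rule, sketched under assumptions: a guard that throws when the module is evaluated somewhere a `window` global exists. `assertServerOnly` is a hypothetical helper, not a Next.js API. The `scope` parameter exists only so the guard can be exercised without a real browser.

```typescript
// Hypothetical guard; the idea is to call it at the top of src/env.ts.
// `scope` defaults to globalThis and is a parameter only to make it testable.
function assertServerOnly(
  moduleName: string,
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): void {
  if (typeof scope["window"] !== "undefined") {
    throw new Error(`${moduleName} was imported in the browser. Keep it server-only.`);
  }
}

// Simulated scopes: a server-like one without `window`, a browser-like one with it.
assertServerOnly("src/env.ts", {}); // passes silently
try {
  assertServerOnly("src/env.ts", { window: {} });
} catch (err) {
  console.log((err as Error).message);
}
```

The Next.js docs also describe a `server-only` package that turns this kind of mistake into a build error, which may be the better fit if you already depend on it.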
2) I made Next.js scream early (before any page renders)
Runtime validation is good.
But I wanted it earlier than “user hits route”.
So I validate inside an instrumentation hook.
This runs when the server starts.
If env is wrong, it dies immediately.
// src/instrumentation.ts
// Next.js will run this on server startup.
// It won't run in the browser.
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    // Import only on server to avoid bundling.
    await import("./env");
  }
}
This saved me from the dumbest deploy.
Preview environment missing AUTH_SECRET.
Instead of “random auth errors”, the build just failed with a Zod stack trace.
And yeah, the first time I wired this up I got:
Error: Cannot find module './env'
My fault.
Wrong path.
Spent 25 minutes.
Most of it was me staring at a working file.
3) I generate .env.example from the schema
Docs lie.
Old README snippets lie even harder.
I want .env.example to be derived from the schema.
So when I add a var, the example updates.
No manual steps.
I keep a tiny script.
It writes keys only. No secrets.
// scripts/generate-env-example.ts
import { writeFileSync } from "node:fs";
import { z } from "zod";

// Keep this list in sync with src/env.ts.
// I don't try to auto-parse TS. Too fragile.
const EnvSchema = z.object({
  DATABASE_URL: z.string().url(),
  AUTH_SECRET: z.string().min(32),
  NEXT_PUBLIC_APP_URL: z.string().url(),
  NEXT_PUBLIC_POSTHOG_KEY: z.string().min(1).optional(),
  RATE_LIMIT_PER_MINUTE: z.coerce.number().int().positive().default(60),
});

// z.object() already exposes its keys through .shape, so no cast is needed.
const shape = EnvSchema.shape;

const lines = Object.keys(shape)
  .sort()
  .map((key) => `${key}=`);

const header = [
  "# Auto-generated. Don't edit by hand.",
  "# Run: pnpm gen:env",
  "",
].join("\n");

writeFileSync(".env.example", header + lines.join("\n") + "\n", "utf8");
console.log(`Wrote .env.example with ${lines.length} keys`);
This isn’t perfect.
I’m duplicating the schema.
I tried to get Claude to “read the TS AST and extract keys”.
Spent 4 hours.
Most of it was wrong.
It kept breaking on:
- re-exports
- renamed imports
- schema composition (merge, extend)
So I stopped.
Duplication is fine if the script is dumb and stable.
I run it whenever I touch env.
Cursor makes it muscle memory because it keeps scripts/ open in the sidebar.
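Since the schema is duplicated, a cheap drift check can catch the day the two copies disagree. This is a sketch, not part of the original setup. `parseEnvKeys` and `diffKeys` are hypothetical helpers, and the key list is passed in as a plain array (in practice it could come from `Object.keys` of the schema's shape, assuming the schema is exported).

```typescript
// Hypothetical drift check: compares the keys a schema declares with the
// keys present in a .env.example file. Both inputs are plain values.

// Extract KEY names from dotenv-style text, skipping comments and blank lines.
function parseEnvKeys(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const eq = line.indexOf("=");
      return eq === -1 ? line : line.slice(0, eq).trim();
    });
}

// Report keys that exist on one side only.
function diffKeys(schemaKeys: string[], exampleKeys: string[]) {
  const missing = schemaKeys.filter((key) => !exampleKeys.includes(key));
  const stale = exampleKeys.filter((key) => !schemaKeys.includes(key));
  return { missing, stale };
}

// Made-up file contents for illustration.
const exampleFile = "# Auto-generated.\nAUTH_SECRET=\nDATABASE_URL=\nOLD_KEY=\n";
const drift = diffKeys(
  ["AUTH_SECRET", "DATABASE_URL", "NEXT_PUBLIC_APP_URL"],
  parseEnvKeys(exampleFile),
);

console.log(JSON.stringify(drift));
// prints {"missing":["NEXT_PUBLIC_APP_URL"],"stale":["OLD_KEY"]}
```

If either list is non-empty, exit non-zero. That keeps the script dumb and still guards against a forgotten regenerate.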
4) I added a CI check that fails fast
Local validation is nice.
CI validation is mandatory.
I want CI to fail with a single readable error.
Not “Next build failed somewhere”.
So I wrote a Node script that:
- loads .env locally (only when present)
- imports env to trigger Zod parsing
- exits non-zero on failure
// scripts/check-env.ts
import "dotenv/config";

async function main() {
  try {
    // Import triggers parsing + validation.
    await import("../src/env");
    console.log("env: OK");
  } catch (err: any) {
    console.error("env: INVALID\n");
    // Zod errors are readable, but nested.
    // Print the message and the cause if present.
    console.error(err?.message ?? err);
    if (err?.cause) console.error("\nCause:\n", err.cause);
    process.exit(1);
  }
}

main();
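The script above prints the raw error message. If that output feels too nested, one option is to flatten it to one line per variable. A Zod error carries an `issues` array whose entries have a `path` and a `message`, and the sketch below relies only on those two fields. `formatEnvIssues` and `IssueLike` are hypothetical names, and the sample issues are written by hand rather than captured from Zod.

```typescript
// Structural type covering the two fields used here from a Zod issue.
type IssueLike = { path: (string | number)[]; message: string };

// Turn a list of issues into one line per variable.
function formatEnvIssues(issues: IssueLike[]): string {
  return issues
    .map((issue) => `${issue.path.join(".") || "(root)"}: ${issue.message}`)
    .join("\n");
}

// Hand-written sample issues for illustration, not real Zod output.
const sampleIssues: IssueLike[] = [
  { path: ["DATABASE_URL"], message: "Invalid url" },
  { path: ["AUTH_SECRET"], message: "Required" },
];

console.log(formatEnvIssues(sampleIssues));
// DATABASE_URL: Invalid url
// AUTH_SECRET: Required
```

In the catch block, this could be called with `err.issues` when that property is an array, falling back to `err.message` otherwise.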
Then I wire it into package.json.
{
  "scripts": {
    "gen:env": "ts-node scripts/generate-env-example.ts",
    "check:env": "ts-node scripts/check-env.ts",
    "build": "pnpm check:env && next build"
  }
}
Yes, ts-node in CI can be slow.
Mine adds ~2 seconds.
Worth it.
If you hate ts-node, run the scripts with tsx or write them in plain JS.
I kept it simple.
One more thing that got me.
I once had CI passing but deploy failing.
Because CI had secrets configured, but preview didn’t.
So I now run check:env in preview too.
If the platform supports it, make it part of the build command.
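Why the same check has to run per environment, as a small sketch. `missingKeys` is a hypothetical helper and both env objects are made-up stand-ins for two differently configured environments. The check is identical. Only the inputs differ.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: list required keys that are absent or empty.
function missingKeys(requiredKeys: string[], source: EnvVars): string[] {
  return requiredKeys.filter((key) => !source[key]);
}

const requiredKeys = ["DATABASE_URL", "AUTH_SECRET", "NEXT_PUBLIC_APP_URL"];

// Made-up values: CI has every secret, preview is missing one.
const ciEnv: EnvVars = {
  DATABASE_URL: "postgres://localhost:5432/app",
  AUTH_SECRET: "x".repeat(32),
  NEXT_PUBLIC_APP_URL: "https://example.com",
};
const previewEnv: EnvVars = {
  DATABASE_URL: "postgres://localhost:5432/app",
  NEXT_PUBLIC_APP_URL: "https://example.com",
};

console.log(JSON.stringify(missingKeys(requiredKeys, ciEnv))); // prints []
console.log(JSON.stringify(missingKeys(requiredKeys, previewEnv))); // prints ["AUTH_SECRET"]
```

A green CI run says nothing about preview. Each environment needs its own pass.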
Results
Before this, env bugs were constant.
In April, I hit 9 separate env-related failures across 3 codebases.
I tracked them in a text file because I was annoyed.
After switching to the schema + startup validation + CI check, I hit 1 env failure in the last 14 days.
And that one was legit: I rotated AUTH_SECRET and forgot to update a preview environment.
Time-wise, I stopped losing 30–60 minutes per deploy.
Now it’s a 10-second failure with a clear message.
Key takeaways
- Put every env var in one schema file. No scattered process.env.
- Validate on server startup, not when a route gets hit.
- Generate .env.example from something deterministic. Humans won’t keep it updated.
- Make CI run check:env before next build. Fast failure beats log archaeology.
- Don’t try to get fancy with AST parsing unless you enjoy pain.
Closing
I’m curious about one specific thing.
Do you prefer strict env validation that fails the build, or do you allow missing optional vars in preview and only enforce them in production?