A familiar shape
Every codebase I've worked in has had a validation function that looks roughly like this:
interface ValidationResult {
  valid: boolean;
  error?: string;
  data?: File;
}

function validateFile(file: File): ValidationResult {
  if (file.size > MAX_SIZE) {
    return { valid: false, error: 'File too large' };
  }
  if (!ALLOWED_EXTENSIONS.includes(getExtension(file.name))) {
    return { valid: false, error: 'Invalid file type' };
  }
  return { valid: true, data: file };
}
And every codebase has consumers like this:
const result = validateFile(uploadedFile);
if (result.valid) {
  uploadToS3(result.data!);
}
That ! non-null assertion at the end is the tell. The type system doesn't know that data is guaranteed to exist when valid is true. The consumer has to assert it. And once a consumer is asserting it, nothing stops a different consumer from skipping the check entirely:
// Three months later, in a different file
const result = validateFile(uploadedFile);
uploadToS3(result.data!); // 'valid' check forgotten
The validation function ran. The check happened. The verdict was returned. And the data still made it through, unchecked.
This isn't a hypothetical. It's a failure mode I've seen repeatedly, in codebases written by teams who all knew better. The validator wasn't the problem. The validator's type signature was the problem.
This article argues that the type signature most validation libraries reach for is the worst possible signature for a security-relevant function. The replacement is three TypeScript patterns: discriminated unions, branded types, and exhaustiveness checks. Together they turn a validation library from a runtime check into a compile-time boundary.
Validation as a boundary, not a check
Before getting to types, the framing matters. There are two ways to think about what a file validation function does.
Frame 1: Validation is a check. A function runs some rules over an input and returns a verdict. Consumers can use the verdict however they like.
Frame 2: Validation is a boundary. A function decides whether unvalidated input has passed into a different category of data, called validated data, that downstream code is allowed to treat as trusted. Consumers can only use the new category of data. The raw input remains untrusted.
The modern TypeScript validation libraries (zod, valibot, arktype, effect/schema) have largely moved to Frame 2 for data validation. Their result types are discriminated unions like { success: true; data: T } | { success: false; error: ... } rather than boolean-and-optional-data shapes. If you are validating shape-conforming JSON, the pattern is well-established.
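For reference, here is that shape in action using zod's safeParse API (the schema and input are illustrative):

import { z } from 'zod';

declare const input: unknown;

const UserSchema = z.object({ name: z.string() });

const parsed = UserSchema.safeParse(input);
if (parsed.success) {
  parsed.data.name; // narrowed: 'data' only exists on the success branch
} else {
  parsed.error;     // narrowed: 'error' only exists on the failure branch
}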
This article is about file validation specifically. The patterns are less established here, the consequences of getting it wrong are more directly security-relevant, and the result type often models more than just success-or-failure. A file can be valid, rejected for any of a dozen reasons, in a pending-scan state while an async scanner runs, or expired because a scan never completed within the quarantine window. The result shape needs to carry that complexity in a way that downstream code cannot ignore.
When you frame file validation as a security boundary, three implications follow:
- The validator's output should make it impossible to use unvalidated data as if it were validated.
- The validator's failure modes should be enumerable and exhaustively handled by consumers.
- The validator should not lie. Silent nulls, swallowed errors, optimistic defaults: these are all forms of lying to the consumer about what the validator actually concluded.
The patterns that satisfy these implications are not new. Discriminated unions, branded types, and exhaustiveness checks have been in TypeScript for years. What is less common is applying all three together specifically to a validation library that exists as a security boundary. That is the focus of the rest of this article.
Why { valid: boolean; error?: string } is the worst possible signature
The shape itself is the problem. It's worth being specific about why, because the same pattern still ships in file upload code across enterprise codebases despite the discriminated-union pattern being available for years.
Failure 1: Optional fields are not type-coupled to the boolean. TypeScript cannot prove that data is present when valid is true. The consumer has to assert it with data! or check defensively with if (data). Neither is enforced. Both are easy to skip.
Failure 2: Error reasons collapse into a single string. The string says "File too large." It doesn't say what the actual size was. It doesn't say what the limit was. It doesn't say what category of attack this looked like. Information needed for UI display, audit logging, security telemetry, and user feedback is flattened into one untyped string that's been read by exactly one log aggregator.
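To make that concrete, here is the kind of consumer the flattened string breeds; a sketch, where showSizeHint is a hypothetical UI helper:

declare function showSizeHint(): void; // hypothetical UI helper

const result = validateFile(uploadedFile);
if (!result.valid && result.error?.includes('large')) {
  showSizeHint(); // breaks silently the day the message is reworded
}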
Failure 3: Adding new error types is invisible to existing consumers. When you add a new validation rule like polyglot file detection, the only signal to existing consumers is a new possible value in the error string. There's no compile error. There's no deprecation warning. Consumers keep doing what they were doing, except now there's a new error string they don't recognize, which probably falls into their default "Unknown error" UI branch.
Failure 4: There is no way to enforce that consumers handled the error case. A consumer can read the valid boolean, branch on it, and forget to surface the error string. The type checker has nothing to say. A different consumer might log the error string but ignore which kind of error it was, treating all rejections identically when some need user-visible feedback and others need silent security telemetry.
Failure 5: The signature lies about what validation produces. The validator runs a series of checks. The output should be a witness that those checks passed. Instead it's a boolean that any downstream code can ignore, plus a data field that exists whether or not the validation succeeded. The signature suggests that validation is an opinion. It should suggest that validation is a boundary.
Each of these is a real failure mode I've seen produce real bugs. They are not theoretical. The { valid: boolean; error?: string } shape is so common because it looks innocent. It is not innocent. It is a vector through which validation can be ignored, errors can be lost, and new threat categories can quietly fail to be handled.
TypeScript has the patterns to fix every one of these failures. The next three sections walk through them.
Discriminated unions: making the boundary type-level
Replace this:
interface ValidationResult {
  valid: boolean;
  error?: string;
  data?: File;
}
With this:
type ValidationResult =
  | { status: 'valid'; file: File }
  | { status: 'rejected'; reason: RejectionReason }
  | { status: 'pending-scan'; expiresAt: Date }
  | { status: 'expired'; receivedAt: Date };
That is the entire change at the result-shape level. The implications are substantial.
The file field is now coupled to the valid status. A consumer cannot access result.file without first narrowing the result to the 'valid' case. The type checker enforces this. The non-null assertion disappears entirely. Consumers must write:
const result = validateFile(uploadedFile);
if (result.status === 'valid') {
  uploadToS3(result.file); // type-safe, no assertion needed
}
Each failure mode has its own shape. A 'rejected' result carries a structured RejectionReason, not a string. A 'pending-scan' result carries the expiration time of the quarantine window. An 'expired' result carries the original receipt time so downstream code can surface "this file was received an hour ago but never scanned, please retry." Each variant carries exactly the data its consumers need.
Adding a new variant becomes a compile-time event. If you add a 'flagged-for-review' variant later, every consumer using exhaustiveness checking (more on that in a moment) will produce a compile error pointing at the unhandled case. New threat categories cannot ship silently.
Now consider the RejectionReason type. The same principle applies one level deeper:
type RejectionReason =
  | { kind: 'size-exceeded'; limitBytes: number; actualBytes: number }
  | { kind: 'extension-not-allowed'; extension: string; allowed: readonly string[] }
  | { kind: 'magic-byte-mismatch'; declared: MimeType; detected: MimeType }
  | { kind: 'polyglot-detected'; detectedAs: readonly MimeType[] }
  | { kind: 'filename-invalid'; reason: FilenameRejection }
  | { kind: 'scanner-flagged'; scanId: string; categories: readonly ThreatCategory[] };
Each rejection kind carries the specific information needed to respond appropriately. The size-exceeded case has the limit and the actual size, so the UI can render "Your file is 12 MB; the limit is 10 MB" instead of just "File too large." The magic-byte-mismatch case has both the declared type and the actually-detected type, which is exactly the information a security audit log needs to investigate whether someone is probing for upload bypass. The scanner-flagged case has the scan ID and threat categories, so the consumer can either show a generic "this file was flagged" message or escalate based on category.
The supporting types (MimeType, FilenameRejection, ThreatCategory) would be defined elsewhere in the library, each as discriminated unions or branded types in their own right. The point here is the shape: each rejection variant carries the data its consumers need, structured for the specific case, not flattened into a string.
The shape of the data carries the meaning of the result. The consumer doesn't have to parse strings or guess at what error: "File too large" means in context. The data is the meaning.
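As a sketch of what that buys a consumer, here is a deliberately partial message formatter over RejectionReason; the exhaustiveness section below shows how to make a switch like this total:

function describeRejection(reason: RejectionReason): string {
  const mb = (n: number) => (n / 1_000_000).toFixed(1);
  switch (reason.kind) {
    case 'size-exceeded':
      return `Your file is ${mb(reason.actualBytes)} MB; the limit is ${mb(reason.limitBytes)} MB.`;
    case 'magic-byte-mismatch':
      return `This file claims to be ${reason.declared} but its contents look like ${reason.detected}.`;
    default:
      return 'This file was rejected.'; // remaining kinds elided in this sketch
  }
}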
A note on readonly arrays in the type: this is engineering-standards hygiene, not a deep design choice. Validation results should not be mutable by consumers. The readonly modifier prevents accidental mutation of the result before it's used.
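A quick illustration of what the modifier blocks:

const result = validateFile(uploadedFile);
if (result.status === 'rejected' && result.reason.kind === 'extension-not-allowed') {
  result.reason.allowed.push('.exe');
  // compile error: Property 'push' does not exist on type 'readonly string[]'
}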
Branded types: the validator returns a proof
Discriminated unions close one loophole. They close it well. But there is a second loophole that even discriminated unions don't close. Once a consumer narrows to the 'valid' case, they can extract the file and pass it around as a raw File to any function that accepts a File.
if (result.status === 'valid') {
  someOldFunction(result.file); // accepts any File, validated or not
}
The result.file is a raw File object. The type system has no memory that it crossed a validation boundary. As soon as it leaves the narrow scope, it's indistinguishable from a File that came directly from a form input with no validation at all.
For most application code, this is fine. For security-relevant code, it isn't. The validator's whole job was to produce a category of data that downstream code can trust. If "trusted" and "untrusted" data have identical types, the trust boundary lives only in the head of the consumer who wrote the narrowing check. That is exactly the kind of boundary that erodes over months as the codebase grows.
Branded types close this loophole:
type Brand<T, B> = T & { readonly __brand: B };

type ValidatedFile = Brand<File, 'ValidatedFile'>;

type ValidationResult =
  | { status: 'valid'; file: ValidatedFile }
  | { status: 'rejected'; reason: RejectionReason }
  | { status: 'pending-scan'; expiresAt: Date }
  | { status: 'expired'; receivedAt: Date };
The ValidatedFile is structurally a File plus a phantom __brand property. At runtime, nothing changes. The __brand property doesn't exist; there is no overhead. At compile time, ValidatedFile is a distinct type from File, and the only way to produce one is to go through the validator.
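How does the validator produce a ValidatedFile? With the one sanctioned cast in the codebase, at the point where every check has passed. A minimal sketch, reusing the size check and MAX_SIZE from the opening example:

function validateFile(file: File): ValidationResult {
  if (file.size > MAX_SIZE) {
    return {
      status: 'rejected',
      reason: { kind: 'size-exceeded', limitBytes: MAX_SIZE, actualBytes: file.size },
    };
  }
  // ... remaining checks elided ...
  // The single place the brand is minted: every check above has passed.
  return { status: 'valid', file: file as ValidatedFile };
}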
Downstream functions can now require validated files specifically:
function uploadToS3(file: ValidatedFile): Promise<UploadResult> { ... }
function attachToFormSubmission(file: ValidatedFile, formId: string): void { ... }
function persistToDocumentStore(file: ValidatedFile, metadata: Metadata): Promise<DocumentId> { ... }
A caller that tries to pass a raw File to any of these gets a compile error. The error says, in effect: this function requires a file that has crossed the validation boundary; you have not produced one. The type system enforces the boundary.
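The rejection looks roughly like this (exact wording varies by TypeScript version):

declare const raw: File; // e.g. straight from an <input type="file">

uploadToS3(raw);
// error: Argument of type 'File' is not assignable to parameter of type 'ValidatedFile'.
//   Property '__brand' is missing in type 'File'.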
The standard objection to branded types: "But I can write someFile as ValidatedFile and bypass the whole thing." That is the point. Branded types are not a sandbox. They are a discipline tool. If a consumer wants to bypass the boundary, they have to write the cast explicitly, and the cast is now a code-review-visible artifact. Reviewers can grep for as ValidatedFile and ask why this code thinks it has a validated file. The cast is not impossible. It is visible, which is what you want.
Compare this to the original signature, where the equivalent bypass of using result.data! without checking valid was indistinguishable from correct code. The cast version is honest about what the consumer is doing. The boolean version was not.
Exhaustiveness checks: making missed cases a compile error
The third pillar. Discriminated unions describe the shape. Branded types enforce the boundary. Exhaustiveness checks ensure consumers handle every case.
Consider a function that converts a ValidationResult into UI state:
function toUIState(result: ValidationResult): UIState {
  switch (result.status) {
    case 'valid':
      return { kind: 'show-success', file: result.file };
    case 'rejected':
      return { kind: 'show-rejection', reason: result.reason };
    case 'pending-scan':
      return { kind: 'show-scanning-spinner', expiresAt: result.expiresAt };
    case 'expired':
      return { kind: 'show-expired-warning', receivedAt: result.receivedAt };
  }
}
This compiles. It runs. It handles all four current cases. And it has a silent bug waiting to happen.
When you add a fifth variant to ValidationResult, like 'flagged-for-review', what happens depends on your configuration. With strictNullChecks on, the explicit UIState return annotation does make the compiler flag this function, but only with a generic "lacks ending return statement" error, and only because every branch here happens to return a value. A consumer that switches purely for side effects, logging or telemetry, gets no error at all; a function with an inferred return type silently widens to UIState | undefined; and with strictNullChecks off, nothing fires anywhere. There is no default case to catch the new variant at runtime, so downstream code may render the undefined as a blank UI, crash, or silently skip the file. The new variant ships broken.
The assertNever pattern fixes this:
function assertNever(value: never): never {
  throw new Error(`Unhandled discriminated union variant: ${JSON.stringify(value)}`);
}

function toUIState(result: ValidationResult): UIState {
  switch (result.status) {
    case 'valid':
      return { kind: 'show-success', file: result.file };
    case 'rejected':
      return { kind: 'show-rejection', reason: result.reason };
    case 'pending-scan':
      return { kind: 'show-scanning-spinner', expiresAt: result.expiresAt };
    case 'expired':
      return { kind: 'show-expired-warning', receivedAt: result.receivedAt };
    default:
      return assertNever(result);
  }
}
The assertNever call requires its argument to be of type never. Inside the default case of an exhaustive switch over a discriminated union, the narrowed type is never, because there are no remaining variants. The function compiles.
When you add a fifth variant, the narrowed type at default is no longer never. It is the new variant. The assertNever call now receives a non-never argument, and the compiler produces an error pointing at the exact line. The error says, in effect: this function does not handle a case that the type system thinks is possible.
Every consumer that uses assertNever becomes a guided walk through every place the new variant needs handling. The new threat category does not ship until every consumer is updated. The compiler enforces the audit.
This is the difference between a codebase where adding new validation cases is a controlled, mechanically-guided change and a codebase where the same change is a slow audit chasing down places the new case might silently fail. The mechanism is small. The compounding effect over time is enormous.
A note on RejectionReason: the same pattern applies. Consumers that handle rejection reasons should use exhaustive switches on the kind field, with assertNever at the default. Adding a new rejection kind like polyglot detection, scanner integration, or filename normalization should require updating every consumer that produces user-visible feedback or audit logs. The compiler walks you through them.
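A sketch of what that looks like for an audit-log consumer; the AuditEvent shape and event names are hypothetical:

type AuditEvent = { event: string; detail: Record<string, unknown> };

function toAuditEvent(reason: RejectionReason): AuditEvent {
  switch (reason.kind) {
    case 'size-exceeded':
      return { event: 'upload.rejected.size', detail: { limit: reason.limitBytes, actual: reason.actualBytes } };
    case 'extension-not-allowed':
      return { event: 'upload.rejected.extension', detail: { extension: reason.extension } };
    case 'magic-byte-mismatch':
      return { event: 'upload.rejected.magic-bytes', detail: { declared: reason.declared, detected: reason.detected } };
    case 'polyglot-detected':
      return { event: 'upload.rejected.polyglot', detail: { detectedAs: reason.detectedAs } };
    case 'filename-invalid':
      return { event: 'upload.rejected.filename', detail: { reason: reason.reason } };
    case 'scanner-flagged':
      return { event: 'upload.rejected.scanner', detail: { scanId: reason.scanId, categories: reason.categories } };
    default:
      return assertNever(reason); // adding a seventh kind breaks the build here
  }
}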
What this changes
The combined application of discriminated unions, branded types, and exhaustiveness checks produces a validation library where:
Silently bypassing the validator is impossible at the type level. Consumers cannot use unvalidated data as if it were validated, because the types are distinct; the only escape hatch is an explicit cast that survives as a code-review-visible artifact. They cannot forget the validation check, because the data and the verdict are not separable.
Adding new threat categories is a guided rollout. New variants in the result type or new kinds in the rejection reason produce compile errors at every consumer that uses exhaustiveness checking. The compiler tells you exactly where each consumer needs to be updated, before the new variant ships.
The information needed downstream is carried in the result. UI rendering, audit logging, security telemetry, and user feedback each get the structured data they need from the rejection kind they care about. No string-parsing. No guessing. No if (error.includes('size')) heuristics.
The validation library is a security boundary at the type-system level, not just a runtime check. The boundary is enforced by the compiler, not by the discipline of the consumers. It does not erode as the codebase grows.
There is a broader point underneath this. Types in TypeScript are usually framed as a documentation aid, a refactoring tool, or a way to catch typos. For libraries whose entire purpose is to enforce a boundary, whether security boundaries, validation boundaries, or capability boundaries, the type system is the mechanism that makes the boundary enforceable. Modern data validation libraries like zod and valibot use the type system this way. File validation libraries, where the consequences are more directly security-relevant, often don't. They reach for { valid: boolean; error?: string } because it looks simple, and the consequences of that choice ripple through every consumer for the lifetime of the codebase.
File validation libraries should be type-system-first. Discriminated unions for the result shape, including pending-scan and expired states that file pipelines actually have. Branded types for the trusted output, so a ValidatedFile cannot be confused with a raw File once it leaves the narrow scope. Exhaustiveness checks for every consumer of the variants, so new threat categories ship through a compiler-guided audit rather than a silent rollout. The patterns are not new. The combination, applied specifically to file validation as a security boundary, is what makes the difference between a library you can trust and a library that hopes consumers do the right thing.
If this kind of discipline resonates, I codify it more broadly in engineering-skills, a set of Claude Agent Skills that encode production engineering standards. The engineering-standards skill in that package covers the discipline this article describes: no any, discriminated unions over boolean flags for state, and the broader TypeScript hygiene that makes patterns like the ones above ship cleanly.