Koritsu Nezumi

Posted on May 9

Don't Validate Your Data

#programming #discuss #software #datascience

Most validation libraries make the same fundamental mistake: they assume your job is to judge whether data is good or bad, and then tell the user what went wrong.

They're wrong about both parts.

The Problem Nobody Talks About

Let's say you're building a form. A user enters a phone number that's too short. A validation library fires back with an error message: "Phone number must be at least 10 digits".

Now what? Your form renderer gets an error string and has to... guess what to do with it. Display it in red? Suggest padding? Disable the submit button? Hide the field? The library doesn't know. You have to write that logic yourself.

But here's the real problem: the library threw away context. It knew:

The input was a string
It was too short
The minimum length required
The actual length

Instead, you got a sentence. A string. And now you have to parse it back into facts to do anything useful with it.

This happens everywhere. API validation. ETL pipelines. Data cleaning. We waste cycles converting context into messages, then messages back into context.
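To make the loss concrete, here is a minimal sketch contrasting the two return shapes. The names `validatePhone` and `describePhone` are hypothetical and exist only for illustration; the status shape mirrors the one used in the listing at the end of this post.

```javascript
// Hypothetical message-style validator: the facts are folded into a sentence.
const validatePhone = (value, minLength = 10) =>
  value.length < minLength
    ? `Phone number must be at least ${minLength} digits`
    : null;

// Hypothetical status-style alternative: the same facts, kept as data.
const describePhone = (value, minLength = 10) =>
  value.length < minLength
    ? { _: "status.string.too-short", actual: value, minLength }
    : { _: "status.string", actual: value };

const message = validatePhone("55512");
const phoneStatus = describePhone("55512");

// With the message, a consumer has to parse text to recover the minimum.
const parsedMin = Number(message.match(/\d+/)[0]);

// With the status, the same fact is a property lookup.
const missing = phoneStatus.minLength - phoneStatus.actual.length;

console.log(parsedMin); // 10
console.log(missing); // 5
```

The regular expression works only as long as nobody rewords the message, which is exactly the fragility described above.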

A Different Approach

What if you stopped validating and started processing?

Instead of data being "valid" or "invalid," data just has a status. A processor doesn't judge the data; it just reads the status and transforms it into a new one.

A string processor sees the input and says: "This is a string." Now it's status.string.

A length processor looks at status.string and checks the length. If it's 5 characters and you need 16, it doesn't fail. It transforms to status.string.too-short with all the context intact: the actual value, the minimum length, everything.

Now the next processor in line can make a decision. If it's a recovery processor, it sees status.string.too-short and pads the string. If it's a UI processor, it sees the same status and renders a warning. If it's a logging processor, it sees the status and decides whether to alert someone.
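As a rough sketch of that idea, the three consumers below all read the same status object and each does something different with it. The consumer names, the render-hint shape, and the alert threshold are assumptions made up for this example, and the functions are synchronous for brevity.

```javascript
// One status object, as a length processor might emit it.
const tooShort = { _: "status.string.too-short", actual: "9712", minLength: 16 };

// Recovery consumer: pads the value back up to the minimum length.
const recover = (s) =>
  s._ === "status.string.too-short"
    ? { _: "status.string", actual: s.actual.padStart(s.minLength, ".") }
    : s;

// UI consumer (hypothetical shape): turns the status into a render hint.
const toUiHint = (s) =>
  s._ === "status.string.too-short"
    ? { level: "warning", remaining: s.minLength - s.actual.length }
    : { level: "ok", remaining: 0 };

// Logging consumer (hypothetical policy): only alert when the gap is large.
const shouldAlert = (s) =>
  s._ === "status.string.too-short" && s.minLength - s.actual.length > 8;

console.log(recover(tooShort).actual); // "............9712"
console.log(toUiHint(tooShort)); // { level: "warning", remaining: 12 }
console.log(shouldAlert(tooShort)); // true
```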

Same data. Infinite interpretations.

No errors. No messages. Just facts.

/**
 * let's process a card number!
 * we already know it is a string that has 16 digits
 * but we only want to show the last 4 digits and
 * hide the rest using dots.
 */
const processor = pipe(
  string.length.max(4), // status.string -> status.string.too-big
  string.length.trim.start(), // status.string.too-big -> status.string
  string.length.min(16), // status.string -> status.string.too-short
  string.length.pad.start("."), // status.string.too-short -> status.string
);

const result = await processor({
  _: "status.string",
  actual: "8493681269189712"
});

console.log(result); // { _: "status.string", actual: "............9712" }

Every step is pure. Every status is context-rich. No interpreter needed.

Why This Matters

For UI: Your form sees status.string.too-short and decides to show a text input with a character counter. Another UI might show a dropdown of suggestions. Same status, different UX.

For APIs: Your endpoint sees the same status and returns a machine-readable error code. No parsing error messages from clients. No lossy conversion.
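One way an endpoint could do that is sketched below. The `toResponse` helper, the 422 code, and the field names in the body are illustrative assumptions, not part of any library.

```javascript
// Hypothetical mapping from a status to an HTTP-style response object.
// The 422 code and the field names are illustrative choices.
const toResponse = (s) =>
  s._ === "status.string"
    ? { code: 200, body: { value: s.actual } }
    : {
        code: 422,
        body: { status: s._, minLength: s.minLength, actualLength: s.actual.length },
      };

const response = toResponse({
  _: "status.string.too-short",
  actual: "55512",
  minLength: 10,
});

console.log(JSON.stringify(response.body));
// {"status":"status.string.too-short","minLength":10,"actualLength":5}
```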

For ETL: Your pipeline sees status.date.unparseable and decides: skip this row, use a default, or escalate. You control the logic, not the library.
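A minimal sketch of those three choices follows. The `parseDate` processor and the `policies` object are hypothetical; only the status name `status.date.unparseable` comes from the text above.

```javascript
// Hypothetical date processor: tags a raw value instead of throwing.
const parseDate = (raw) => {
  const time = Date.parse(raw);
  return Number.isNaN(time)
    ? { _: "status.date.unparseable", actual: raw }
    : { _: "status.date", actual: new Date(time).toISOString() };
};

const isUnparseable = (row) => row._ === "status.date.unparseable";

// The pipeline owner, not the library, picks the policy per status.
const policies = {
  skip: (rows) => rows.filter((row) => !isUnparseable(row)),
  useDefault: (fallback) => (rows) =>
    rows.map((row) =>
      isUnparseable(row) ? { _: "status.date", actual: fallback } : row
    ),
  escalate: (rows) => rows.filter(isUnparseable),
};

const rows = ["2024-05-09T00:00:00Z", "not a date"].map(parseDate);

console.log(policies.skip(rows).length); // 1
console.log(policies.useDefault("1970-01-01T00:00:00.000Z")(rows)[1].actual);
// "1970-01-01T00:00:00.000Z"
console.log(policies.escalate(rows)[0].actual); // "not a date"
```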

For testing: You can generate test data from the statuses your system actually produces. There are no error states to mock, because the states are real.

The Design Philosophy

This isn't validation. It's not correction. It's state transition.

Every processor:

Reads the current status
Evaluates context
Outputs a new status

That's it. No judgments. No side effects. No special "error" states.
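The contract above can be sketched with a small helper. `when` is a hypothetical name introduced here, and the processors are synchronous for brevity, whereas the listing at the end of this post uses async ones.

```javascript
// Hypothetical helper: build a processor that only reacts to one status
// and passes every other status through untouched.
const when = (name, transform) => (status) =>
  status._ === name ? transform(status) : status;

// Reads "status.string", evaluates the length, outputs a new status.
const minLength = (min) =>
  when("status.string", (s) =>
    s.actual.length >= min
      ? s
      : { _: "status.string.too-short", actual: s.actual, minLength: min }
  );

const input = Object.freeze({ _: "status.string", actual: "abc" });
const output = minLength(5)(input);

console.log(output._); // "status.string.too-short"
console.log(input._); // "status.string" (the input object was not mutated)
console.log(minLength(5)(output) === output); // true: other statuses pass through
```

Because a processor ignores statuses it does not recognize, the order of the chain decides which processor gets to react to which status.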

The consumer interprets. Always. That's their job, not the library's.

This scales because:

Status explosion is a feature, not a bug. More statuses = more context for consumers.
Processors are composable. Chain them however you want. The pipeline doesn't care.
No framework lock-in. You're not married to one way of handling errors. You're just piping data.
Internationalization is trivial. Error messages require translation infrastructure. Status codes don't. A consumer in Japan sees status.string.too-short and renders their own message in Japanese. A consumer in Spain sees the same status and renders in Spanish. One source of truth, infinite languages.
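As an illustration of the internationalization point, each consumer could own a small message catalog keyed by status. The `catalogs` object and `render` function are hypothetical, and the wording of the messages is just one possible choice.

```javascript
// Hypothetical message catalogs, owned by each consumer rather than the library.
const catalogs = {
  ja: {
    "status.string.too-short": (s) => `${s.minLength}文字以上で入力してください`,
  },
  es: {
    "status.string.too-short": (s) => `Debe tener al menos ${s.minLength} caracteres`,
  },
};

// Look up the template for this locale and status; render nothing if none exists.
const render = (locale, status) => {
  const template = catalogs[locale]?.[status._];
  return template ? template(status) : "";
};

const shortStatus = { _: "status.string.too-short", actual: "9712", minLength: 16 };

console.log(render("ja", shortStatus)); // "16文字以上で入力してください"
console.log(render("es", shortStatus)); // "Debe tener al menos 16 caracteres"
```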

What This Enables

With this approach, you can build:

Adaptive pipelines that recover from failures automatically
UI frameworks that infer validation UI from the statuses they receive
Audit logs that capture exactly what happened at each step
Machine-readable contracts for APIs (no English error messages)
Data quality dashboards that show status distributions, not just pass/fail rates
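The audit log and dashboard ideas can be sketched with a variant of `pipe` that records each intermediate status. `tracedPipe` is a hypothetical, synchronous helper, and `max` and `trimStart` are simplified stand-ins for the processors in the listing at the end of this post.

```javascript
// Hypothetical synchronous variant of pipe that records every intermediate status.
const tracedPipe = (...tasks) => (input) =>
  tasks.reduce(
    ({ status, trail }, task) => {
      const next = task(status);
      return { status: next, trail: [...trail, next._] };
    },
    { status: input, trail: [input._] }
  );

// Simplified processors in the style of the final listing.
const max = (maxLength) => (s) =>
  s._ === "status.string" && s.actual.length > maxLength
    ? { _: "status.string.too-big", actual: s.actual, maxLength }
    : s;
const trimStart = () => (s) =>
  s._ === "status.string.too-big"
    ? { _: "status.string", actual: s.actual.slice(s.actual.length - s.maxLength) }
    : s;

const { status: lastStatus, trail } = tracedPipe(max(4), trimStart())({
  _: "status.string",
  actual: "8493681269189712",
});

console.log(lastStatus.actual); // "9712"
console.log(trail); // ["status.string", "status.string.too-big", "status.string"]

// Status distribution across the trail, the raw material for a dashboard.
const counts = trail.reduce(
  (acc, name) => ({ ...acc, [name]: (acc[name] ?? 0) + 1 }),
  {}
);
console.log(counts); // { "status.string": 2, "status.string.too-big": 1 }
```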

You stop thinking about "valid vs invalid" and start thinking about "what state is this data in, and what should we do about it?"

The Shift

This isn't a small library improvement. It's a philosophical shift about what data processing means.

Most tools were designed by people thinking: "How do I tell users their data is wrong?"

This is for the folx thinking: "How do I give consumers all the context they need to decide what to do?"

It's the difference between judgment and information.

Stop validating. Start processing.

// Run the tasks left to right, feeding each one the previous task's status.
const pipe = (...tasks) => async (input) =>
  tasks.reduce((acc, curr) => acc.then(curr), Promise.resolve(input));

// Every processor reacts to one status and passes any other status through.
const string = {
  "length": {
    // status.string -> status.string | status.string.too-short
    "min": (minLength) => async (status) =>
      status._ === "status.string"
      ? minLength <= status.actual.length
        ? { _: "status.string", actual: status.actual }
        : { _: "status.string.too-short", actual: status.actual, minLength }
      : status,
    // status.string -> status.string | status.string.too-big
    "max": (maxLength) => async (status) =>
      status._ === "status.string"
      ? maxLength >= status.actual.length
        ? { _: "status.string", actual: status.actual }
        : { _: "status.string.too-big", actual: status.actual, maxLength }
      : status,
    "pad": {
      // status.string.too-short -> status.string, padded up to minLength
      "start": (fillString) => async (status) =>
        status._ === "status.string.too-short"
        ? { _: "status.string", actual: status.actual.padStart(status.minLength, fillString) }
        : status,
    },
    "trim": {
      // status.string.too-big -> status.string, keeping the last maxLength characters
      "start": () => async (status) =>
        status._ === "status.string.too-big"
        ? { _: "status.string", actual: status.actual.slice(status.actual.length - status.maxLength) }
        : status,
    }
  },
};

/**
 * let's process a card number!
 * we already know it is a string that has 16 digits
 * but we only want to show the last 4 digits and
 * hide the rest using dots.
 */
const processor = pipe(
  string.length.max(4), // status.string -> status.string.too-big
  string.length.trim.start(), // status.string.too-big -> status.string
  string.length.min(16), // status.string -> status.string.too-short
  string.length.pad.start("."), // status.string.too-short -> status.string
);

const result = await processor({
  _: "status.string",
  actual: "8493681269189712"
});

console.log(result); // { _: "status.string", actual: "............9712" }
