Your Data Is Lying to You
That CSV your pipeline ingested at 3 AM? It has NULL customer IDs. The revenue column has negative values. And that status field someone added "cancled" to (yes, misspelled) last Tuesday? It's been silently corrupting your analytics for a week.
Every data team has a version of this story. The uncomfortable truth is that most data quality issues aren't caught: they're discovered (usually by someone staring at a dashboard that doesn't add up).
We built Pointblank to change that.
What Is Pointblank?
Pointblank is an open-source Python library for assessing and monitoring data quality. You define validation rules, Pointblank interrogates your data, and you get clear, visual reporting that the whole team can act on.
It works with the tools you already use: Polars, Pandas, DuckDB, PostgreSQL, MySQL, SQLite, Parquet, PySpark, and Snowflake. No new infrastructure required.
What makes Pointblank different from other validation libraries? Two things:
- Communication-first design. Validation results are rendered as beautiful, interactive HTML reports: not raw exceptions or log lines. These reports are made for sharing with stakeholders, not just debugging by engineers.
- A composable, chainable API. You build validation plans step by step using a fluent interface that reads like a specification of what your data should look like.
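To make "composable and chainable" concrete, here's a toy sketch of the fluent-builder idea in plain Python. This is not Pointblank's actual internals; the class and method names here are invented for illustration:

```python
# Toy fluent validation builder: each method records a rule and
# returns self, so calls chain into a readable specification.
class ToyValidate:
    def __init__(self, data):
        self.data = data          # list of row dicts
        self.rules = []           # (description, predicate) pairs

    def col_vals_lt(self, column, value):
        self.rules.append((f"{column} < {value}",
                           lambda row: row[column] < value))
        return self

    def col_vals_in_set(self, column, allowed):
        allowed = set(allowed)
        self.rules.append((f"{column} in {sorted(allowed)}",
                           lambda row: row[column] in allowed))
        return self

    def interrogate(self):
        # Count passing rows for every recorded rule.
        return {desc: sum(pred(row) for row in self.data)
                for desc, pred in self.rules}

rows = [{"a": 3, "f": "low"}, {"a": 12, "f": "high"}, {"a": 7, "f": "bad"}]
report = (
    ToyValidate(rows)
    .col_vals_lt("a", 10)
    .col_vals_in_set("f", ["low", "mid", "high"])
    .interrogate()
)
print(report)  # each rule maps to its count of passing rows
```

Because every step returns the builder, the chain reads top to bottom like a specification, which is the property the real API is built around.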
See It in Action
Here's the core pattern in three parts: create a `Validate` object, chain on validation steps, and finish with `interrogate()`:
```python
import pointblank as pb

validation = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_lt(columns="a", value=10)
    .col_vals_between(columns="d", left=0, right=5000)
    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
    .col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
    .interrogate()
)
```
That produces a validation report table.
Each row is a validation step. The left side shows your rules; the right side shows results like total test units, how many passed, how many failed, and whether any thresholds were exceeded. Failures even have a CSV download button so stakeholders can inspect the offending rows directly.
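As a sanity check on the regex step in the plan above, here's the same pattern run against a few values with the stdlib `re` module (the sample strings are made up):

```python
import re

# Pattern from the validation plan: one digit, dash, three lowercase
# letters, dash, three digits (e.g. "1-abc-123").
pattern = re.compile(r"^[0-9]-[a-z]{3}-[0-9]{3}$")

samples = ["1-abc-123", "5-xyz-000", "12-abc-123", "1-ABC-123", "1-ab-123"]
results = {s: bool(pattern.fullmatch(s)) for s in samples}
for value, ok in results.items():
    print(f"{value!r}: {'pass' if ok else 'fail'}")
```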
Go Deeper: Thresholds and Actions
Real-world validation isn't just pass/fail. Pointblank lets you set warning, error, and critical thresholds so you can distinguish between "a few oddities" and "the pipeline is on fire":
```python
import pointblank as pb

validation = (
    pb.Validate(
        data=sales_data,
        thresholds=(0.01, 0.02, 0.05),
        actions=pb.Actions(
            critical="Major data quality issue in step {step} ({time})."
        ),
    )
    .col_vals_between(columns=["price", "quantity"], left=0, right=1000)
    .col_vals_not_null(columns=pb.ends_with("_id"))
    .col_vals_regex(
        columns="email",
        pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    )
    .col_vals_in_set(
        columns="status",
        set=["pending", "shipped", "delivered", "returned"]
    )
    .interrogate()
)
```
When a threshold is breached, actions fire: print a message, send a Slack notification, trigger a webhook, or run custom Python code. Your validation plan becomes an active part of your pipeline, not an afterthought.
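The threshold logic boils down to comparing a step's failing fraction against the three cutoffs. Here's a minimal sketch of that idea in plain Python (a conceptual model, not Pointblank's actual implementation):

```python
# Map a step's failing fraction to a severity level, given
# (warning, error, critical) threshold fractions like (0.01, 0.02, 0.05).
def severity(n_failed, n_units, thresholds=(0.01, 0.02, 0.05)):
    frac = n_failed / n_units
    warning, error, critical = thresholds
    if frac >= critical:
        return "critical"
    if frac >= error:
        return "error"
    if frac >= warning:
        return "warning"
    return "ok"

print(severity(0, 1000))    # no failures: ok
print(severity(15, 1000))   # 1.5% failing: warning
print(severity(80, 1000))   # 8% failing: critical
```

The useful property is that one validation plan distinguishes "a few oddities" from "the pipeline is on fire" without separate code paths.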
Let AI Write the First Draft
Not sure where to start? Pointblank's DraftValidation feature uses an LLM to analyze your data and generate a complete validation plan:
```python
import pointblank as pb

data = pb.load_dataset("game_revenue")
pb.DraftValidation(data=data, model="anthropic:claude-sonnet-4-5")
```
It examines your schema, value distributions, and patterns, then produces a full validation plan with sensible rules. Use it as-is or refine it; either way, you go from zero to validated in minutes.
A CLI for Your Terminal and CI/CD
Pointblank also ships with a command-line interface so you can validate data without writing any Python:
```bash
# Preview a dataset
pb preview sales.csv

# Check for missing values
pb missing data.parquet

# Run a quick validation
pb validate sales.csv --check col-vals-not-null --column customer_id

# Execute a full YAML-based validation plan
pb run validation.yaml

# Use exit codes for CI/CD pipelines
pb run validation.yaml --exit-code
```
That `--exit-code` flag means you can drop Pointblank into a GitHub Actions workflow or any CI/CD system and fail builds when data quality degrades. Validation as a gate, not a suggestion.
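For instance, a minimal GitHub Actions step might look like this (the file path, workflow name, and surrounding job setup are placeholders, not from the Pointblank docs):

```yaml
# .github/workflows/data-quality.yml (sketch)
- name: Validate data
  run: |
    pip install pointblank
    pb run validation.yaml --exit-code   # non-zero exit fails the build
```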
YAML-Driven Validation for Teams
For team workflows, define your validation plan in a version-controlled YAML file:
```yaml
validate:
  data: sales_data.csv
  tbl_name: "sales_data"
  label: "Daily sales validation"
  steps:
    - col_vals_gt:
        columns: "revenue"
        value: 0
    - col_vals_not_null:
        columns: ["customer_id", "order_id"]
    - col_vals_in_set:
        columns: "status"
        set: ["pending", "shipped", "delivered", "returned"]
```
Review it in a PR, run it in CI, share it across environments. Data quality rules treated with the same rigor as application code.
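To illustrate what executing such a plan boils down to, here's a stdlib-only sketch that applies the same three rules to rows of data. The rule names mirror the YAML above, but this tiny runner is invented for illustration; Pointblank's own `pb run` does the real work:

```python
# Apply YAML-style rules to rows of data; each rule name maps to a check.
def run_plan(rows, steps):
    failures = []
    for step in steps:
        (name, params), = step.items()
        for i, row in enumerate(rows):
            if name == "col_vals_gt":
                ok = row[params["columns"]] > params["value"]
            elif name == "col_vals_not_null":
                ok = all(row[c] is not None for c in params["columns"])
            elif name == "col_vals_in_set":
                ok = row[params["columns"]] in set(params["set"])
            else:
                raise ValueError(f"unknown step: {name}")
            if not ok:
                failures.append((name, i))
    return failures

rows = [
    {"revenue": 120.0, "customer_id": "C1", "order_id": "O1", "status": "shipped"},
    {"revenue": -5.0,  "customer_id": None, "order_id": "O2", "status": "cancled"},
]
steps = [
    {"col_vals_gt": {"columns": "revenue", "value": 0}},
    {"col_vals_not_null": {"columns": ["customer_id", "order_id"]}},
    {"col_vals_in_set": {"columns": "status",
                         "set": ["pending", "shipped", "delivered", "returned"]}},
]
failures = run_plan(rows, steps)
print(failures)  # the second row fails all three checks
```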
Reports in 50 Languages
Your team is global? So are Pointblank's reports. Validation output renders in 50 languages (German, Japanese, Portuguese, Chinese, and many more) so every stakeholder reads results in their own language.
The Case for Validating Your Data
If you're not validating your data, you're trusting it on faith. And data, left unchecked, has a way of drifting:
- Schema changes happen without warning when upstream teams restructure tables
- Null values creep in when optional fields become required
- Business logic violations go unnoticed until a quarterly report doesn't reconcile
- Type mismatches cause silent failures in pandas operations that return NaN instead of errors
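That last point is easy to demonstrate with plain floats, and the same silent propagation happens inside pandas columns:

```python
import math

# NaN propagates through arithmetic without raising, so a single bad
# value can silently poison an aggregate.
values = [10.0, 20.0, float("nan"), 40.0]
total = sum(values)
print(total)               # nan, not an exception
print(math.isnan(total))   # True
```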
The cost of catching these issues after they've propagated through your pipeline is orders of magnitude higher than catching them at the source.
So validate early, validate often, and validate thoroughly!
Get Started
Pointblank is MIT-licensed, pip-installable, and ready to rock:
```bash
pip install pointblank
```
From there, the Quickstart guide will have you running your first validation in under five minutes.
Here's where to go next:
- GitHub: github.com/posit-dev/pointblank (star the repo, file issues, contribute!)
- Documentation: posit-dev.github.io/pointblank (full user guide, API reference, demos)
- PyPI: pypi.org/project/pointblank (install the latest release)
Here's a presentation I made that talks about Pointblank in the context of 'nice' Python packages.
If you've ever been burned by bad data, give Pointblank a try. And if it helps, tell us about it! We'd love to hear your story.