Darshan Bamankar

Posted on Jun 3

I Built My First Python Library From Scratch — Here's Everything I Learned

#programming #python #opensource #beginners

A few weeks ago I had a simple goal: understand how Python libraries actually work — not just how to use them, but how they're built, packaged, and shipped.

So I built one. From scratch. No tutorials that skip the hard parts. No boilerplate generators. Just me, Python, and a lot of mistakes.

The result is valify — a data validation library that's now at v0.7.0 with 2,000+ downloads. Here's everything I learned along the way.

Why Build a Validation Library?

I picked validation because it's something every project needs, it's genuinely useful, and it touches every important part of the Python ecosystem:

Clean OOP design
Custom exceptions
Type hints and mypy
Packaging and PyPI
Testing with pytest
Documentation with Sphinx

It also gave me a chance to study how real libraries like pydantic and marshmallow work under the hood.

The Project Structure That Professionals Use

The first thing I learned is that Python library structure matters a lot more than I thought.

Most tutorials show you this:

myproject/
├── mypackage/
│   └── __init__.py
└── setup.py

But professional libraries use the src layout:

valify/
├── src/
│   └── valify/          ← the actual package
│       ├── __init__.py
│       ├── exceptions.py
│       ├── validators.py
│       └── schema.py
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── CHANGELOG.md

Why src/? Without it, the project root sits on sys.path whenever you run Python from your project folder, so import valify picks up your local development files instead of the installed package, and your tests can pass even when the built package is broken. The src/ folder prevents that subtle bug.
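A quick way to see which copy Python would actually load is to ask the import system. This is a minimal sketch, not part of valify: module_origin is a hypothetical helper name, and "json" is only a stand-in so the snippet runs anywhere.

```python
from __future__ import annotations

import importlib.util


def module_origin(name: str) -> str | None:
    """Return the file Python would load for `name`, or None if it isn't importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# "json" is a stand-in; inside the project you would pass "valify" instead.
print(module_origin("json"))
```

With a regular (non-editable) install, the path printed for your own package should point into site-packages, not into your checkout.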

pyproject.toml — The Modern Standard

The old way of packaging Python involved three files: setup.py, setup.cfg, and MANIFEST.in. It was a mess.

Today, everything lives in one file:

[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "valify"
version = "0.7.0"
description = "A composable, expressive data validation library for Python"
readme = "README.md"
requires-python = ">=3.10"
authors = [
    {name = "Darshan Bamankar", email = "darshanbamankar7@gmail.com"}
]
dependencies = []

dependencies = [] is intentional — valify has zero external dependencies. This is a design goal. A good library doesn't bloat its users' environments.

Write Your Exceptions First — Always

This is the lesson I wish someone had told me before I started.

Every real library defines its own exception hierarchy. Here's why:

# Without custom exceptions: which library raised this?
try:
    schema.validate(data)
except ValueError:
    ...

# With custom exceptions: crystal clear
try:
    schema.validate(data)
except valify.ValidationError as e:
    print(e.field)    # which field failed
    print(e.value)    # what value was rejected
    print(e.message)  # human-readable message

Here's valify's exception hierarchy:

Exception
└── ValifyError              ← base — catch everything valify raises
    ├── ValidationError      ← a value failed validation
    │   └── RequiredFieldError  ← a required field was missing
    └── SchemaError          ← the schema definition is invalid

The key design decision: every exception stores structured data as attributes, not just a string message. This lets callers inspect errors programmatically.

from typing import Any

class ValidationError(ValifyError):
    def __init__(self, message: str, *, field: str | None = None, value: Any = None) -> None:
        self.message = message
        self.field = field      # "name"
        self.value = value      # "A"
        full_message = f"[{field}] {message}" if field else message
        super().__init__(full_message)
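To show how the whole hierarchy fits together, here is a self-contained sketch. The class names come from the tree above, but the bodies of ValifyError, RequiredFieldError, and SchemaError are assumptions made for illustration, not valify's actual source.

```python
from __future__ import annotations

from typing import Any


class ValifyError(Exception):
    """Base class: catch this to handle anything the library raises."""


class ValidationError(ValifyError):
    def __init__(self, message: str, *, field: str | None = None, value: Any = None) -> None:
        self.message = message
        self.field = field
        self.value = value
        super().__init__(f"[{field}] {message}" if field else message)


class RequiredFieldError(ValidationError):
    """Assumed shape: a missing field reuses ValidationError's attributes."""

    def __init__(self, field: str) -> None:
        super().__init__("Required field is missing.", field=field)


class SchemaError(ValifyError):
    """Raised when the schema definition itself is invalid."""


try:
    raise RequiredFieldError("email")
except ValidationError as exc:  # catching the parent also catches the subclass
    caught = (exc.field, exc.message, str(exc))
    print(caught)
```

Because RequiredFieldError sits under ValidationError, callers who only care that validation failed need a single except clause, while callers who care about missing fields can catch the narrower type.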

Validators as Objects — The Strategy Pattern

The core insight of valify's design: validators are objects, not functions.

# Function approach — can't reuse or compose
validate_string("hello", min_length=2)

# Object approach — reusable, composable
v = StringValidator(min_length=2)
v.validate("hello")

Every validator inherits from a base class:

class Validator:
    def validate(self, value: Any) -> Any:
        raise NotImplementedError(
            f"{type(self).__name__} must implement validate()"
        )

    def to_json_schema(self) -> dict[str, Any]:
        raise NotImplementedError(
            f"{type(self).__name__} must implement to_json_schema()"
        )

This is the Strategy Pattern — each validator encapsulates one validation strategy. Because they're objects, you can store them in dictionaries, pass them around, and compose them together in a Schema.
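As a rough illustration of one concrete strategy, here is what a string validator could look like on top of that base class. It is a sketch under assumptions: the ValidationError stand-in is simplified, and the "Must be a string." message and the constructor signature are guesses, not valify's real implementation.

```python
from __future__ import annotations

from typing import Any


class ValidationError(Exception):
    """Simplified stand-in for valify's ValidationError."""


class Validator:
    def validate(self, value: Any) -> Any:
        raise NotImplementedError(f"{type(self).__name__} must implement validate()")

    def to_json_schema(self) -> dict[str, Any]:
        raise NotImplementedError(f"{type(self).__name__} must implement to_json_schema()")


class StringValidator(Validator):
    def __init__(self, min_length: int | None = None) -> None:
        self.min_length = min_length  # configuration lives on the object

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValidationError("Must be a string.")
        if self.min_length is not None and len(value) < self.min_length:
            raise ValidationError(f"Must be at least {self.min_length} characters long.")
        return value

    def to_json_schema(self) -> dict[str, Any]:
        schema: dict[str, Any] = {"type": "string"}
        if self.min_length is not None:
            schema["minLength"] = self.min_length
        return schema


# Because validators are objects, they can be stored in a plain dict and reused.
fields: dict[str, Validator] = {"name": StringValidator(min_length=2)}
print(fields["name"].validate("hello"))
print(fields["name"].to_json_schema())
```

The configuration (min_length) is captured once at construction time, which is what makes the same validator reusable across many values and many schemas.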

The Detail That Trips Everyone Up — bool is a subclass of int

Python has a quirk that burned me early:

isinstance(True, int)   # True !!
isinstance(False, int)  # True !!

bool is a subclass of int in Python. This means if you check int first, True and False pass as valid integers. The fix:

if not isinstance(value, int) or isinstance(value, bool):
    raise ValidationError(...)

In a single guard like this one, exclude bool explicitly; in an if/elif chain, check bool before int. This pattern appears in IntValidator, FloatValidator, and Schema.from_example().
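Put together, an integer validator with that guard might look like the sketch below. The coerce flag mirrors the one used in the tests later in this post; the error messages and the exact coercion rules are assumptions for illustration.

```python
from typing import Any


class ValidationError(Exception):
    """Simplified stand-in for valify's ValidationError."""


class IntValidator:
    def __init__(self, coerce: bool = False) -> None:
        self.coerce = coerce

    def validate(self, value: Any) -> int:
        # Reject bool first: True and False would otherwise pass the int check.
        if isinstance(value, bool):
            raise ValidationError("Must be an integer, not a boolean.")
        # Assumed coercion rule: only strings are converted, and only on request.
        if self.coerce and isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                raise ValidationError(f"{value!r} cannot be converted to an integer.") from None
        if not isinstance(value, int):
            raise ValidationError("Must be an integer.")
        return value


print(IntValidator().validate(7))
print(IntValidator(coerce=True).validate("42"))
```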

Accumulating Errors — The Most Important UX Decision

A naive validator stops at the first error:

❌ name: too short
# stops here, never checks age or email

valify collects ALL errors before raising:

❌ name: Must be at least 2 characters long.
❌ age: Must be at least 0.
❌ email: 'bad' is not a valid email address.

The implementation uses a simple dict to collect errors:

def validate(self, data: dict[str, Any]) -> dict[str, Any]:
    errors: dict[str, str] = {}
    result: dict[str, Any] = {}

    for field_name, validator in self.fields.items():
        if field_name not in data:
            errors[field_name] = "Required field is missing."
            continue
        try:
            result[field_name] = validator.validate(data[field_name])
        except ValidationError as e:
            errors[field_name] = e.message  # collect, don't raise

    if errors:  # raise everything at once
        raise ValidationError(...)

    return result

This single design decision makes valify dramatically more useful for real applications.
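The snippet above leaves the final raise elided, so here is one possible way to carry the collected errors to the caller. Everything in it is a sketch: the errors attribute, the validate_all function, and the two small check functions are hypothetical names, not valify's API.

```python
from __future__ import annotations

from typing import Any, Callable


class ValidationError(Exception):
    def __init__(self, message: str, *, errors: dict[str, str] | None = None) -> None:
        self.message = message
        self.errors = errors or {}  # assumption: per-field messages ride along
        super().__init__(message)


def min_length_2(value: Any) -> Any:
    if not isinstance(value, str) or len(value) < 2:
        raise ValidationError("Must be at least 2 characters long.")
    return value


def non_negative(value: Any) -> Any:
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValidationError("Must be at least 0.")
    return value


def validate_all(fields: dict[str, Callable[[Any], Any]], data: dict[str, Any]) -> dict[str, Any]:
    errors: dict[str, str] = {}
    result: dict[str, Any] = {}
    for name, check in fields.items():
        if name not in data:
            errors[name] = "Required field is missing."
            continue
        try:
            result[name] = check(data[name])
        except ValidationError as exc:
            errors[name] = exc.message  # collect and keep going
    if errors:
        raise ValidationError(f"{len(errors)} field(s) failed validation.", errors=errors)
    return result


try:
    validate_all(
        {"name": min_length_2, "age": non_negative, "email": min_length_2},
        {"name": "A", "age": -1},
    )
except ValidationError as exc:
    collected = exc.errors
    for field, message in collected.items():
        print(f"{field}: {message}")
```

A single exception that carries a field-to-message mapping lets an API handler turn the whole failure into one response, instead of making the user fix and resubmit one field at a time.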

Schema.from_example() — The Killer Feature

This is valify's standout feature: it builds a schema straight from a sample of your data.

schema = Schema.from_example({
    "name":    "Darshan",
    "age":     20,
    "email":   "darshan@example.com",
    "score":   9.5,
    "active":  True,
    "address": {
        "city": "Pune",
        "pin":  "411001",
    },
    "tags": ["python", "developer"],
})

valify looks at your sample data and automatically infers:

"Darshan" → StringValidator()
20 → IntValidator()
"darshan@example.com" → EmailValidator() (detected via regex)
9.5 → FloatValidator()
True → BoolValidator()
{...} → nested Schema() (recursive!)
[...] → ListValidator() (inferred from first item)

The implementation uses a @classmethod — a factory method that creates a new Schema instance:

@classmethod
def from_example(cls, example: dict[str, Any]) -> "Schema":
    fields: dict[str, Validator] = {}

    for key, value in example.items():
        if isinstance(value, bool):      # bool before int — critical!
            fields[key] = BoolValidator()
        elif isinstance(value, int):
            fields[key] = IntValidator()
        elif isinstance(value, float):
            fields[key] = FloatValidator()
        elif isinstance(value, str):
            if _EMAIL_RE.match(value):
                fields[key] = EmailValidator()
            else:
                fields[key] = StringValidator()
        elif isinstance(value, dict):
            fields[key] = cls.from_example(value)  # recursion!
        elif isinstance(value, list) and value:
            # infer from first item
            ...

    return cls(fields)
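To make the inference rules easy to experiment with, here is a standalone sketch that returns validator names as strings instead of real validator objects. The infer function, the loose email pattern, and the TypeError for unsupported values are all assumptions; the list branch simply follows the "inferred from first item" rule described above and is not a reconstruction of the elided code.

```python
from __future__ import annotations

import re
from typing import Any

# Deliberately loose pattern, for illustration only.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def infer(value: Any) -> Any:
    """Map a sample value to the name of the validator that would handle it."""
    if isinstance(value, bool):  # bool before int, since bool subclasses int
        return "BoolValidator"
    if isinstance(value, int):
        return "IntValidator"
    if isinstance(value, float):
        return "FloatValidator"
    if isinstance(value, str):
        return "EmailValidator" if _EMAIL_RE.match(value) else "StringValidator"
    if isinstance(value, dict):
        return {key: infer(item) for key, item in value.items()}  # recursion
    if isinstance(value, list) and value:
        return ["ListValidator", infer(value[0])]  # item type from first element
    # Assumption: anything else (None, empty list, ...) is rejected outright.
    raise TypeError(f"Cannot infer a validator for {type(value).__name__}")


inferred = infer({
    "age": 20,
    "active": True,
    "email": "darshan@example.com",
    "score": 9.5,
    "address": {"pin": "411001"},
    "tags": ["python", "developer"],
})
print(inferred)
```

Swapping the first two checks is enough to make "active" come back as an IntValidator, which is a handy way to see the bool quirk in action.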

JSON Schema Export

Every validator in valify can export itself as standard JSON Schema:

from valify import Schema, StringValidator, IntValidator, EmailValidator
from valify import OptionalValidator
import json

schema = Schema({
    "name":  StringValidator(min_length=2),
    "age":   IntValidator(min_value=0, max_value=120),
    "email": EmailValidator(),
    "bio":   OptionalValidator(StringValidator(), default=""),
})

print(json.dumps(schema.to_json_schema(), indent=2))

Output:

{
  "type": "object",
  "properties": {
    "name": {"type": "string", "minLength": 2},
    "age":  {"type": "integer", "minimum": 0, "maximum": 120},
    "email": {"type": "string", "format": "email"},
    "bio":  {"anyOf": [{"type": "string"}, {"type": "null"}]}
  },
  "required": ["name", "age", "email"]
}

This means valify schemas can be used to generate OpenAPI/Swagger documentation, validate JSON APIs, and integrate with any tool that understands JSON Schema.
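The interesting part of the export is how the "required" list falls out of the field definitions. The sketch below shows one way per-field fragments could be combined into an object schema; object_schema and its optional parameter are hypothetical names used only for this example.

```python
from typing import Any


def object_schema(
    properties: dict[str, dict[str, Any]],
    optional: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Combine per-field JSON Schema fragments into one object schema."""
    return {
        "type": "object",
        "properties": properties,
        # Optional fields stay in "properties" but are left out of "required".
        "required": [name for name in properties if name not in optional],
    }


exported = object_schema(
    {
        "name": {"type": "string", "minLength": 2},
        "bio": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    optional=frozenset({"bio"}),
)
print(exported["required"])
```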

Type Hints and mypy — Non-Negotiable

Every method in valify is fully typed:

def validate(self, value: Any) -> str:
    ...

def to_json_schema(self) -> dict[str, Any]:
    ...

Running mypy --strict src/valify passes with zero errors. This isn't just for show: it catches real bugs before runtime and makes the library a pleasure to use in IDEs with autocomplete.

One lesson: annotate a local variable whenever the type mypy infers is narrower than the one you intend:

# mypy infers dict[str, str], which is too narrow
schema = {"type": "string"}
schema["minLength"] = 2  # error: incompatible types (int where str is expected)

# explicit annotation — correct
schema: dict[str, Any] = {"type": "string"}
schema["minLength"] = 2  # ✅

Testing — 76 Tests and Counting

Every feature has tests. Every validator, every edge case, every error path:

import pytest
from valify import IntValidator, ValidationError

class TestIntValidator:
    def test_bool_rejected(self):
        v = IntValidator()
        with pytest.raises(ValidationError):
            v.validate(True)  # bool must not pass as int

    def test_coerce_string_to_int(self):
        v = IntValidator(coerce=True)
        assert v.validate("42") == 42

The key testing principles I learned:

One assertion per test — when it fails, you know exactly what broke
Test the sad path as much as the happy path
Group tests in classes with setup_method for shared setup

The Numbers

2,000+ downloads in the first week
7 versions shipped
76 automated tests
0 external dependencies
Live at https://valify.readthedocs.io

What's Next

0.8.0 — Schema.is_valid(), Schema.errors(), RegexValidator
Flask/FastAPI integration
CLI tool — valify validate data.json schema.json
Road to 1.0

Try It

pip install valify

from valify import Schema, StringValidator, IntValidator, EmailValidator

schema = Schema({
    "name":  StringValidator(min_length=2),
    "age":   IntValidator(min_value=0, max_value=120),
    "email": EmailValidator(),
})

schema.validate({
    "name":  "Darshan",
    "age":   20,
    "email": "darshan@example.com",
})

📦 PyPI: https://pypi.org/project/valify/
⭐ GitHub: https://github.com/DarshanBamankar/valify
📖 Docs: https://valify.readthedocs.io

The Biggest Lesson

Building a library teaches you things that no tutorial can. You stop being a consumer of the ecosystem and start understanding how it actually works.

If you've been thinking about building your own library — just start. Pick something small, build it properly, ship it. The Python community is welcoming and the tooling has never been better.

And if you find valify useful, a ⭐ on GitHub means the world to a new open source maintainer.
