Super Schema Architecture

Alexey Balekhov — Fri, 05 Jun 2026 07:44:20 +0000

This article describes an approach to application development based on a single, highly detailed format for describing domain entities and contracts. It provides practical examples of how such descriptions can be used and demonstrates how declarative schemas can bring some of the conveniences of low-code platforms into conventional full-code programs.

The approach is independent of any particular technology stack and is especially useful in heterogeneous systems. For that reason, the examples draw from a variety of programming languages and technologies: Java, Python, TypeScript, REST, GraphQL, and Protocol Buffers.

Introduction

Every program operates on data. Most projects contain numerous declarative descriptions of that data: object-oriented class definitions, database table schemas, GraphQL and protobuf schemas, and so on. Each of these descriptions serves a specific technical purpose.

A PostgreSQL table schema exists so that the database can store data, while an ORM schema exists so that application code can interact with the database. Although both describe the same information, they are neither equivalent nor interchangeable. An ORM schema may not support defining database constraints, yet it may contain auxiliary metadata that does not exist in the database itself. Nevertheless, the logical connection between the two schemas is obvious: at a minimum, column names and types must remain consistent.

Example. Prisma ORM schema:

model users {
  id         String   @id @db.Uuid @default(uuid(7))
  email      String   @unique
  birth_date DateTime @db.Date
}

PostgreSQL schema:

CREATE TABLE users (
    id UUID PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    birth_date DATE NOT NULL,

    CONSTRAINT birth_date_in_past
        CHECK (birth_date < CURRENT_DATE)
);

The Prisma schema cannot express the constraint. However, it contains additional information about the default value for the id column, which is used by the ORM. Apart from that, the schemas largely duplicate each other.

Similar reasoning applies to metadata that is less tightly coupled. Consider an OpenAPI specification in the same project. API entities may have nothing in common with database models. The connection between the database and the API is implemented in endpoint code, and that code can be arbitrarily complex. However, when an endpoint merely passes data through without transformations, the API and database schemas may again duplicate metadata.

OpenAPI:

paths:
  /users/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid

      responses:
        "200":
          description: User
          content:
            application/json:
              schema:
                type: object
                required:
                  - id
                  - email
                  - birth_date
                properties:
                  id:
                    type: string
                    format: uuid
                  email:
                    type: string
                  birth_date:
                    type: string
                    format: date

Unlike the ORM example, this schema describes an independent entity. Yet it is not entirely disconnected from the database model. Since data is passed through unchanged in this simplified example, any metadata applicable to the database model is equally applicable to the endpoint response. On the one hand, we may not want to expose all of that metadata in the API contract. On the other hand, if we do need to duplicate some of it, maintenance becomes a problem. For example, detailed documentation attached to a database column may also need to be provided and kept up to date for API consumers.

My point is that the practice of completely separating database and API entities is often considered correct only because our tools do not always allow us to express more sophisticated relationships between them. We will return to this later. For now, let us focus on simple CRUD systems where entities at different layers effectively duplicate one another.

Naturally, the software industry has long recognized the problem of metadata duplication and has produced countless converters and code generators: OpenAPI or JSON Schema to programming language structures and back, XML Schema to XML validation code, even protobuf schemas to SQL. In my view, these solutions focus on isolated problems rather than the bigger picture.

SSA Level 0: Metadata Consolidation

Validation, Transport, and Storage

Consider the same CRUD system, but now with a web interface. Imagine the simplest possible implementation: a form that directly writes data into the database. What path does a piece of data travel through such an application? Let us examine a single birth date field.

The process starts with a UI form. The date picker stores its value as a JavaScript Date object.
The frontend may validate the value immediately. A birth date must always be in the past.
The data is converted into a format suitable for API transport. The in-memory Date object is serialized into a JSON string.
The backend converts the JSON into a more convenient in-memory representation. The date string becomes a backend language object.
The data is validated.
The backend converts the value into a representation suitable for database storage.

React fragment example:

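const [birthDate, setBirthDate] = useState<Date>();

// The handler must be async because it awaits the request below.
const handleSubmit = async () => {
    if (!birthDate || birthDate.valueOf() > new Date().valueOf()) {
        throw new Error("Birth date must be in the past");
    }

    await fetch(`/api/users/${userId}`, {
        method: 'PATCH',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            // toISOString() works in UTC, so the calendar date can shift in some time zones.
            birth_date: birthDate.toISOString().slice(0, 10)
        })
    });
}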

JSON:

{ "birth_date": "1990-05-20" }

Java:

public record PatchUserRequest(
    LocalDate birth_date
) {}

if (!request.birth_date().isBefore(LocalDate.now())) {
    throw new ValidationException();
}

PostgreSQL:

birth_date DATE NOT NULL

How many distinct forms of metadata exist along this path?

Frontend in-memory representation (Date).
Validation rules.
JSON representation (formatted string).
Backend in-memory representation (LocalDate).
Database schema (DATE column).

At least four different representations are involved, along with the validation rules that apply to them. Yet all of them are logically connected. Within a project, we can agree on how dates should be represented in each form and formally define the transformations between them. To do so, we introduce our own super schema format that captures all relevant metadata in one place.
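
As a rough illustration of "formally defining the transformations", the backend half of the birth date path could be sketched in Python as below. The function names are hypothetical, and passing `today` explicitly is an assumption made only to keep the check deterministic.

```python
from datetime import date


def decode_json_date(value: str) -> date:
    """JSON representation (ISO 8601 date string) -> in-memory date."""
    return date.fromisoformat(value)


def encode_json_date(value: date) -> str:
    """In-memory date -> JSON representation (ISO 8601 date string)."""
    return value.isoformat()


def validate_birth_date(value: date, today: date) -> None:
    """Validation rule: a birth date must be strictly in the past."""
    if value >= today:
        raise ValueError("Birth date must be in the past")


def encode_db_date(value: date) -> str:
    """In-memory date -> literal for a DATE column.

    Assumption: the value is sent as an ISO string; many database
    drivers would accept the date object directly instead.
    """
    return value.isoformat()


birth_date = decode_json_date("1990-05-20")
validate_birth_date(birth_date, today=date(2026, 6, 5))
db_value = encode_db_date(birth_date)
```

Each function corresponds to one arrow between two representations, which is exactly the knowledge a super schema is meant to hold in one place.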

What should such a super schema look like? It could be YAML, a dedicated DSL, or simply declarations in a general-purpose language. For the remaining examples I will use pseudocode written as declarations in a general-purpose language:

User = Model({
  key: 'id',
}, {
  id: Uuid({ autogenerate: 'v7' }),
  email: Unique(Email()),
  birth_date: PlainDate({ forbidFuture: true }),
})

The exact format is not important. Completeness is. From this schema we can automatically generate TypeScript types, frontend validation rules, JSON serialization logic, backend DTOs, and backend validation code. Since the description is technology-agnostic, we are less dependent on specific frameworks. For example, we could generate Zod schemas and start using them on the frontend without changing the source definition.

These super schemas become the single source of truth for all derived metadata. They define and synchronize in-memory representations, encoding rules, validation rules, and database storage types. This ensures consistency and type safety throughout the entire data lifecycle.
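
To make the idea of derived metadata more concrete, here is a minimal Python sketch that produces a PostgreSQL table definition and a JSON Schema fragment from one description of the user entity. The `Field` class, the type names, and both generator functions are assumptions invented for this illustration, not part of any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    kind: str                    # hypothetical type names: 'uuid', 'email', 'plain_date'
    unique: bool = False
    forbid_future: bool = False


# Single description of the entity; everything below is derived from it.
USER = {
    "id": Field("uuid"),
    "email": Field("email", unique=True),
    "birth_date": Field("plain_date", forbid_future=True),
}

SQL_TYPES = {"uuid": "UUID", "email": "TEXT", "plain_date": "DATE"}
JSON_FORMATS = {"uuid": "uuid", "email": "email", "plain_date": "date"}


def to_sql(table: str, key: str, fields: dict) -> str:
    """Derive a CREATE TABLE statement from the field descriptions."""
    columns = []
    for name, field in fields.items():
        column = f"{name} {SQL_TYPES[field.kind]}"
        column += " PRIMARY KEY" if name == key else " NOT NULL"
        if field.unique:
            column += " UNIQUE"
        if field.forbid_future:
            column += f" CHECK ({name} < CURRENT_DATE)"
        columns.append("    " + column)
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def to_json_schema(fields: dict) -> dict:
    """Derive a JSON Schema object for the same fields."""
    return {
        "type": "object",
        "required": list(fields),
        "properties": {
            name: {"type": "string", "format": JSON_FORMATS[field.kind]}
            for name, field in fields.items()
        },
    }


print(to_sql("users", "id", USER))
print(to_json_schema(USER))
```

Adding another target, such as TypeScript types or ORM models, would then mean adding another function over the same `USER` description rather than another hand-maintained copy.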

Presentation Logic

The information contained in a super schema can also be useful at the presentation layer.

If a field represents a phone number, we can automatically format it for display (+1 234-567-890). For input fields, we can automatically apply masks and request a numeric keyboard on mobile devices. This can be achieved by mapping data types to view components.

Such an approach enables reusable UI abstractions that work for any data described by a schema. For example, forms and data tables can be generated automatically.
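
Mapping data types to view behavior might look like the following Python sketch. The registry names, the mask syntax, and the ten-digit phone layout are assumptions chosen to match the example above; a real project would plug in its own components.

```python
import re


def format_phone(value: str) -> str:
    """Format digits as '+C AAA-BBB-CCC' (assumed layout: 1-digit country code)."""
    digits = re.sub(r"\D", "", value)
    return f"+{digits[0]} {digits[1:4]}-{digits[4:7]}-{digits[7:]}"


# Type name -> display formatter. Types without an entry fall back to str().
FORMATTERS = {"phone": format_phone}

# Type name -> hints for input widgets (hypothetical keys and mask syntax).
INPUT_HINTS = {"phone": {"inputmode": "tel", "mask": "+# ###-###-###"}}


def render_row(schema: dict, record: dict) -> dict:
    """Format every field of a record according to its declared type."""
    return {
        name: FORMATTERS.get(kind, str)(record[name])
        for name, kind in schema.items()
    }


contact_schema = {"email": "email", "phone": "phone"}
row = render_row(contact_schema, {"email": "a@example.com", "phone": "1234567890"})
print(row)
```

A generic table or form component only needs the schema and these registries; it never has to know which concrete entity it is rendering.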

At first glance, super schemas may appear to violate principles such as the Single Responsibility Principle. One declaration influences multiple architectural layers. The inclusion of presentation-related metadata may seem especially surprising.

In reality, the super schema does not mix architectural layers. It merely groups together facts about a particular type of data. Each layer consumes the information relevant to it.

The key idea is that a super schema describes data as completely as possible, not how the data should be used. Once we understand the nature of the data, we can leverage that knowledge across many different domains.

SSA Level 1: Derived Super Schemas

So far we have considered a degenerate case where information simply flows from one representation to another.

Even under this limitation, SSA can be useful in large systems. I once worked on the logging subsystem of a streaming platform. To eliminate bugs caused by schema mismatches, we generated Java, Swift, Python, and TypeScript DTOs, protobuf schemas, ClickHouse and Impala schemas and migrations from a single source of truth.

Let us now move to the general case.

The fact that a super schema can be used across all layers does not mean that the same schema should be used everywhere.

Real systems transform and enrich data between user input, API calls, and database writes. Therefore, schemas for presentation, transport, and storage may differ. To handle this, we need schema transformation operations that allow one schema to be derived from another.

At a minimum, we need:

Selecting a subset of fields from an existing schema.
Combining fields from multiple schemas into a new schema.

These operations allow us to express relationships between data models while avoiding duplication.

UserInApi = Compose(
  Pick(User, ['id', 'email', 'birth_date']),
  Pick(Profile, ['avatar']),
  Object({ age: Integer() })
)

Here, UserInApi is derived from User and Profile and extended with an additional age field.

Although all of these are super schemas, nothing prevents us from using UserInApi exclusively at the API layer while using User and Profile only for persistence and, for example, caching.
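
The two derivation operations are simple to implement when schemas are plain data. The Python sketch below models a schema as a dictionary from field name to type name; that representation, the `password_hash` and `user_id` fields, and the decision to reject duplicate field names are all assumptions made for illustration.

```python
def pick(schema: dict, names: list) -> dict:
    """Select a subset of fields, failing loudly on unknown names."""
    missing = [name for name in names if name not in schema]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    return {name: schema[name] for name in names}


def compose(*schemas: dict) -> dict:
    """Merge several schemas into one, rejecting duplicate field names."""
    result = {}
    for schema in schemas:
        clash = result.keys() & schema.keys()
        if clash:
            raise ValueError(f"duplicate fields: {sorted(clash)}")
        result.update(schema)
    return result


USER = {"id": "uuid", "email": "email", "birth_date": "plain_date",
        "password_hash": "text"}          # hypothetical storage-only field
PROFILE = {"user_id": "uuid", "avatar": "url"}

USER_IN_API = compose(
    pick(USER, ["id", "email", "birth_date"]),
    pick(PROFILE, ["avatar"]),
    {"age": "integer"},
)
print(USER_IN_API)
```

Because the derived schema keeps references to the original field descriptions, a change to `birth_date` in `USER` propagates to `USER_IN_API` automatically, while storage-only fields never leak into the API.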

SSA Level 2: API Contracts

The next logical step is to describe API contracts by combining input schemas, output schemas, and endpoint metadata.

userPatchEndpoint = createEndpoint({
    method: 'PATCH',
    path: '/users/:userId',
    params: { userId: Uuid() },
    body: Pick(User, ['birth_date']),
    result: UserInApi,
})

This abstraction is similar to endpoint definitions in OpenAPI, GraphQL, or gRPC and can be used to generate those specifications. But this information can also be leveraged at runtime:

perform additional serialization transformations when the transport layer does not support them natively;
abstract over transport protocols and support multiple protocols simultaneously;
preserve metadata for purposes beyond transport and validation.

For example, a frontend application could use such a specification not only to render forms but also to perform requests from generic code.
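
A sketch of such generic code, written in Python for consistency with the other sketches here: the `Endpoint` class and `build_request` function are hypothetical, and representing each parameter or body field by its encoder function is a simplifying assumption.

```python
import json
from dataclasses import dataclass
from datetime import date
from uuid import UUID


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str      # path template such as '/users/:userId'
    params: dict   # parameter name -> encoder function
    body: dict     # body field name -> encoder function


def build_request(endpoint: Endpoint, params: dict, body: dict) -> tuple:
    """Turn typed values into (method, url path, JSON payload) using only the contract."""
    path = endpoint.path
    for name, encode in endpoint.params.items():
        path = path.replace(f":{name}", encode(params[name]))
    unknown = body.keys() - endpoint.body.keys()
    if unknown:
        raise ValueError(f"fields not allowed by the contract: {sorted(unknown)}")
    payload = {name: endpoint.body[name](value) for name, value in body.items()}
    return endpoint.method, path, json.dumps(payload)


USER_PATCH = Endpoint(
    method="PATCH",
    path="/users/:userId",
    params={"userId": str},
    body={"birth_date": date.isoformat},
)

request = build_request(
    USER_PATCH,
    params={"userId": UUID("00000000-0000-7000-8000-000000000001")},
    body={"birth_date": date(1990, 5, 20)},
)
print(request)
```

The caller works with `UUID` and `date` objects only; the serialization rules live in the contract, so the same function serves every endpoint described this way.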

SSA Level 3: Semantic Operations

We can go even further.

Instead of merely describing endpoints, we can define endpoint categories. For example, a list endpoint accepts filtering, sorting, and pagination parameters and returns a collection of entities with a specific schema. Such abstractions represent contracts at a higher semantic level: they describe not only transport details but also the meaning of an operation.

userListEndpoint = createListEndpoint(User)
userUpdateEndpoint = createUpdateEndpoint(User)

These higher-level abstractions can, for example, be used to automate CRUDL workflows throughout the system. On the frontend, generic runtime logic can automatically update caches when a request is made to the update endpoint, or build a table with server-side sorting, filtering, and pagination from the list endpoint:

<ListEndpointTable endpoint={userListEndpoint} />

For simple cases on the backend, we can automatically generate a handler for the endpoint:

app = FastAPI()
handleListEndpoint(app, userListEndpoint)

At the same time, developers remain free to drop down to lower levels of abstraction whenever necessary. A list endpoint can still have a completely custom backend implementation or be consumed by custom frontend logic.
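
To show what a generated handler for a list endpoint might do internally, here is a framework-free Python sketch that works on in-memory rows. `ListEndpoint`, `create_list_endpoint`, and `handle_list` are hypothetical names, and equality-only filters are a deliberate simplification; a real implementation would translate the same parameters into a database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListEndpoint:
    path: str
    fields: tuple   # field names that may be used for filtering and sorting


def create_list_endpoint(name: str, schema: dict) -> ListEndpoint:
    """Derive the list contract from an entity schema."""
    return ListEndpoint(path=f"/{name}", fields=tuple(schema))


def handle_list(endpoint, rows, *, filters=None, sort_by=None,
                descending=False, offset=0, limit=20):
    """Generic filtering, sorting, and pagination driven by the contract."""
    filters = filters or {}
    for field in list(filters) + ([sort_by] if sort_by else []):
        if field not in endpoint.fields:
            raise ValueError(f"unknown field: {field}")
    matched = [row for row in rows
               if all(row[f] == v for f, v in filters.items())]
    if sort_by:
        matched.sort(key=lambda row: row[sort_by], reverse=descending)
    return {"total": len(matched), "items": matched[offset:offset + limit]}


USER = {"id": "uuid", "email": "email", "birth_date": "plain_date"}
user_list_endpoint = create_list_endpoint("users", USER)

rows = [
    {"id": "1", "email": "b@example.com", "birth_date": "1990-05-20"},
    {"id": "2", "email": "a@example.com", "birth_date": "1985-01-02"},
    {"id": "3", "email": "c@example.com", "birth_date": "1990-05-20"},
]
page = handle_list(user_list_endpoint, rows,
                   filters={"birth_date": "1990-05-20"},
                   sort_by="email", descending=True, limit=1)
print(page)
```

Since the allowed fields come from the schema, a frontend table component and the backend handler agree on what can be filtered or sorted without any extra coordination.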

This gives us many of the benefits associated with low-code platforms while preserving full control over application behavior.

Challenges

At present, I am not aware of any mature technology designed specifically for describing data independently of a particular use case.

Most metadata formats support extensions (OpenAPI, Protobuf, GraphQL, and others), but their focus on a specific technology and its type system makes them inconvenient for higher-level abstractions and schema derivation. A more promising candidate may be Microsoft's recently introduced TypeSpec. However, it remains heavily focused on network APIs.

As a result, one obvious challenge is implementing metadata transformations yourself. While code generators are usually straightforward, early-stage projects may lack the resources to build them, and adopting SSA in mature systems can require significant effort.

In large organizations, the benefits may not be immediately obvious to individual developers. Metadata duplication is often scattered across different parts of a system and therefore does not appear to be a single problem. SSA may seem relevant only to CRUD applications and admin panels. However, in real systems, data spreads across many surfaces: internal services, integrations, analytical pipelines, and more. The cost of propagating knowledge about data grows with every additional surface, because each one adds communication overhead. In such environments, having a single source of truth and a rigorous description of all data flows may be the most valuable consequence of SSA.

Conclusion

I tried to keep this article grounded in practical problems commonly encountered in mainstream software development. These problems are often treated as unavoidable and their impact tends to be underestimated.

At the same time, I wanted to demonstrate how a more declarative approach can unlock possibilities we might not otherwise consider. The description of SSA Level 3 only scratches the surface; many potential applications remain outside the scope of this article.

I believe that the availability and completeness of metadata represent an important step forward in software engineering practices.

Know your data.

I am actively seeking a senior or lead software engineering role. If you think my experience could be a good fit for your team, feel free to reach out.

DEV Community: Alexey Balekhov