DEV Community

Alexander Demin

Posted on • Originally published at Medium

Data Validation: Scratching Surface of Code Responsibility

Introduction

As responsible developers, our goal is to write code that is not just functional but also clear, maintainable, and adaptable. In this process, we often face a key but somewhat elusive question: the responsibility problem in clean code. Hold on, don’t close the page just yet. I’m not planning to describe the well-known Single Responsibility Principle (SRP). Instead, the idea is to take a look at something broader and yet sometimes more tricky — where should specific functionalities be placed within a system’s architecture?

To bring this topic into focus, let’s take a look at various elements that exist in almost any modern software system — data validation, error handling, database transaction management, caching logic, logging, metrics, and more. Each of these components has an important role, and yet, their optimal placement within a system’s architecture is a subject of ongoing debate (or it should be).

The idea of this blog post is to showcase my current view on some parts of this question and, hopefully, to help some of you architect your services in a cleaner way. If it raises more questions than answers, even better. In discussions, we can reveal the full picture and make an informed decision.

But before we dive into the main topic, let’s first go over some basic concepts. These are important for understanding the ideas we’ll discuss and how they affect decisions in software design.

Just One More Look at the Layered Service Architecture

Let’s think about layered architecture in software not as a clear abstraction, but as a company with different departments, each doing its part to provide value. Understanding this helps us figure out where to put different parts of our code later.

Imagine a software system as a company. In this company, we have different departments working together, each with a specific role:

  • Incoming Communication Layer (Transport Department). This layer acts as the bridge between external inputs (like user requests and messages from other services) and the internal business logic of the software. It is responsible for receiving data, transforming it into a format understood by the domain layer, and forwarding it appropriately. This layer ensures that data communication to and from the business logic is streamlined and efficient.
  • Business Logic Layer (Production Department). Think of this as the production department of the company. It takes the requests from the transport department and works on them. This is where the main action happens — the rules and processes of the software are applied here.
  • Persistence Layer (Storehouse). This is like the storehouse of the company. Once the production department has processed the request, the results (data) need to be stored. This layer handles saving data to databases or files and retrieving it when needed.

Though this analogy might seem like an oversimplification, it isn’t. This framing helps us make an informed decision later when deciding where to put what functionality. And it is important to understand not only what each layer does but, equally crucially, what it DOES NOT do.

Of course, there are usually more components than just those listed here, but overall, they typically fall into the same general structure (and it really does not matter which approach you follow: onion, hexagonal, or anything else):

  • Driving infra layer with synchronous HTTP/gRPC/console handlers, async events/commands consumers.
  • Domain layer with all the business logic.
  • Driven infra layer with database repositories, caches, async events/commands publishers, clients to other services.

Thinking about layered architecture like a company with different departments helps us see how each part of our software has its own job. Each layer focuses on its role without stepping into others’ tasks. This organization makes our software easier to work with, fix, and change.
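To make the department analogy concrete, here is a minimal Go sketch of the three layers wired together (the names UserHandler, UserService, UserRepository, and inMemoryRepo are all illustrative, not from any framework):

```go
package main

import "fmt"

// Driven infra layer: stores and retrieves domain entities.
type UserRepository interface {
	Save(name string) error
}

// Domain layer: owns the business rules.
type UserService struct {
	repo UserRepository
}

func (s *UserService) Register(name string) error {
	// Business rules would be enforced here before persisting.
	return s.repo.Save(name)
}

// Driving infra layer: receives external input and forwards it inward.
type UserHandler struct {
	service *UserService
}

func (h *UserHandler) Handle(input string) error {
	// Transport concerns (decoding, format conversion) would live here.
	return h.service.Register(input)
}

// inMemoryRepo is a toy implementation of the driven layer.
type inMemoryRepo struct{ users []string }

func (r *inMemoryRepo) Save(name string) error {
	r.users = append(r.users, name)
	return nil
}

func main() {
	repo := &inMemoryRepo{}
	handler := &UserHandler{service: &UserService{repo: repo}}
	if err := handler.Handle("alice"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stored users:", repo.users)
}
```

Note that dependencies point inward: the handler knows the service, the service knows only the repository interface, and the concrete storage is swappable.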

If you, by any chance, are not familiar with these concepts, I’d suggest checking out some dedicated articles; there is an enormous number of them on the internet.

Validation Place Is Obvious. Well, Not Quite.

Now that we’ve refreshed our understanding of the layers concept in software, let’s dive into a specific aspect of functionality placement. To keep things concise, we’ll focus on one example, exploring which part of the validation should live in which layer.

Typical Framework Approach to Validation

You might ask: why is this even a question? Haven’t we been doing this for years, with the functionality supported by the majority of existing frameworks right out of the box? Well, yes and no. Most frameworks offer a straightforward approach to validation. Here are just a few examples:

  • In Java Spring, you might use the @Valid annotation in the controller parameter, along with specific annotations for each important field in the DTO (Data Transfer Object).
@Entity
public class User {
    @NotNull
    @Size(min = 5, max = 255)
    private String name;

    @NotNull
    @Min(18)
    @Max(150)
    private Integer age;
}

@RestController
public class UserController {
    @PostMapping("/users")
    ResponseEntity<String> addUser(@Valid @RequestBody User user) {
        // ...
    }
}
  • PHP Laravel framework typically involves explicit request validation right in the controller.
public function addUser(Request $request) {
   $validated = $request->validate([
      'name' => 'required|min:5|max:255',
      'age' => 'numeric|min:18|max:150',
   ]);
}
  • In Golang, despite the near absence of full-fledged web frameworks, many libraries follow a similar approach to validation. Let’s take a look at an example with the github.com/go-playground/validator/v10 library:
type User struct {
    Name string `json:"name" validate:"required,min=5,max=255"`
    Age  int    `json:"age" validate:"gte=18,lte=150"`
}

func (a *API) createUserHandler(w http.ResponseWriter, r *http.Request) {
    var user User

    err := json.NewDecoder(r.Body).Decode(&user)
    if err != nil {
        // write error to the response
        return
    }

    validate := validator.New()

    err = validate.Struct(user)
    if err != nil {
        errors := err.(validator.ValidationErrors)
        // write errors to the response, 
        // most likely mapping specific validator errors to something
        // that can be understood by a client
        return
    }

    user, err = a.userService.CreateUser(user.Name, user.Age)
    if err != nil {
        // write error to the response
        return
    }

    w.WriteHeader(http.StatusOK)
    // write created user to the response
}

In these examples, the validation is done right in the Incoming Communication Layer, before any data is sent to the Domain Layer.

A Closer Look at Layer Responsibilities

When we examine the roles of the Incoming Communication Layer and the Domain Layer a bit deeper, a different approach to validation emerges:

  • Incoming Communication Layer is where data first enters our system. This layer is primarily concerned with transporting data, such as transforming received data into a format the Domain Layer can understand. Usually, it’s not aware of the deeper business logic or requirements.
  • Domain Layer is where the main action of our software occurs. This layer knows all the requirements and business invariants.

Considering this separation of responsibilities, does it still look like a good idea to put detailed validation in the infra layer? Just imagine if your transport department received accounting reports, opened all the boxes, and examined the documents for validity without asking the responsible department. They simply can’t do it even if they want to: there is not enough knowledge in place to decide what is right or wrong.

What the Incoming Communication Layer (read “HTTP handler/controller”) can really do is only perform basic checks like ensuring the data is in a generally correct format (e.g., valid JSON), check the payload size, and maybe do other checks related to its area of responsibility.

On the other hand, the Domain Layer has all the information and really can perform thorough validation, ensuring the data is meaningful and matches the domain requirements. This even includes checking for mandatory fields, the length of strings, and the correctness of field formats.

This approach differs from the simple usage of a validation library in an HTTP Controller/Handler. It creates a clear separation of responsibilities between the Infrastructure (Incoming Communication Layer) and the Domain Layer for the task of data validation:

  • Incoming Communication Layer has limited knowledge of payload internals, focuses on basic payload format and size checks.
  • Domain Layer is fully informed about the data, checks data against business rules, ensuring it meets all necessary criteria.

Adopting this idea prevents business logic from leaking outside the Domain Layer. Each layer does its part efficiently, supporting each other without overstepping boundaries. This method ensures that our software remains well-organized, with clear responsibilities, leading to better maintainability and scalability.

Let’s take a look at the revised example in Golang with the discussed separation of responsibilities:

package api

type User struct {
    Name string `json:"name"`
    Age  int    `json:"age"`
}

func (a *API) createUserHandler(w http.ResponseWriter, r *http.Request) {
    var dto User

    err := json.NewDecoder(r.Body).Decode(&dto)
    if err != nil {
        // write error to the response
        return
    }

    user, err := a.userService.CreateUser(dto.Name, dto.Age)
    if err != nil {
        // proper domain error handling
        return
    }

    w.WriteHeader(http.StatusOK)
    // marshal user entity and write to the response
}
package domain

type User struct {
    Name string
    Age  int
}

var ErrUserTooYoung = NewValidationError("user_too_young")
var ErrUserTooOld = NewValidationError("user_too_old")
// define other needed errors

func NewUser(name string, age int) (*User, error) {
    if age < 15 {
        return nil, ErrUserTooYoung
    }

    if age > 150 {
        return nil, ErrUserTooOld
    }

    // ... other validation checks

    return &User{Name: name, Age: age}, nil
}

type UserService struct {
    // ...
}

func (s *UserService) CreateUser(name string, age int) (*User, error) {
    user, err := NewUser(name, age)
    if err != nil {
        return nil, err
    }

    // store user in the database

    return user, nil
}

If you feel that the NewUser function becomes too complex, it might be a signal that it is time to use the Value Object approach. Some would even say that it is better to use Value Objects from the very beginning.

Let’s say with Age as a Value Object the example could look like this:

package domain

type Age int

func NewAge(age int) (Age, error) {
    if age < 15 {
        return 0, ErrUserTooYoung
    }

    if age > 150 {
        return 0, ErrUserTooOld
    }

    return Age(age), nil
}

type User struct {
    Name string
    Age  Age
}

// NewUser already accepts the Age value object instead of a generic int.
func NewUser(name string, age Age) (*User, error) {
    // ... other validation checks

    return &User{Name: name, Age: age}, nil
}

func (s *UserService) CreateUser(name string, age int) (*User, error) {
    // here we create age as a value object which validates itself
    userAge, err := NewAge(age)
    if err != nil {
        return nil, err
    }

    user, err := NewUser(name, userAge)
    if err != nil {
        return nil, err
    }

    // store user in the database

    return user, nil
}

And so on. At first glance, it might seem like more complex code, but in reality, it just becomes explicit: it is always clear what errors the code can return and in what format. It does not hide business rules but instead showcases them directly, right in the domain layer where they belong. With this approach, it becomes nearly impossible to forget something and create a wrongly structured domain entity. Validation becomes a first-class citizen of the domain.

Benefits

  • By segregating validation responsibilities according to the layers, the code becomes more organized. The Incoming Communication Layer handles basic data integrity, while the Domain Layer manages complex business rule validation.
  • With each layer performing its distinct validation checks, the system becomes more robust against invalid or malicious data inputs, reducing security vulnerabilities. This matters especially when other types of transport are added to a service, e.g. when you already have HTTP and add gRPC or async handlers.
  • There is no coupling to a specific validation library; everything is explicit and clear to other developers.

Potential Pitfalls

  • If responsibilities are not clearly defined, they may overlap, leading to redundant validations and increased complexity.
  • Coordinating error responses and providing meaningful feedback across different layers and to clients can be challenging.

For a deeper dive into effective error-handling strategies, particularly in a domain-centric approach using Go, refer to my dedicated article: Domain Centric Approach to Error Handling Using Go.

Conclusion

The way we handle responsibilities in software architecture has serious implications on the design and maintainability of our systems.

The discussed principle doesn’t just apply to the Incoming Communication and Domain Layers; it covers the driven infrastructure layers as well, such as repositories and event publishers. Their primary task is simply to store and retrieve domain entities without concerning themselves with the internals of those entities. It is the domain entities’ responsibility to determine the correctness of their internals.

I hope this article has been a bit thought-provoking and insightful for you. And I’ll be happy to hear your feedback!

  • Do you agree or disagree with the points raised? Why?
  • Or maybe the topic is obvious to you and you’re wondering why it was even brought to the table?
  • Would you find it useful to explore other examples of functionality responsibility issues in further posts?

Your views are really important, and I’m excited to keep talking about this. Thank you for reading, and stay tuned for more articles! ❤️


Find me on Linkedin: https://www.linkedin.com/in/alex-demin-dev/

Top comments (4)

RyTheTurtle

Agreed with different levels of validations. Though typically I've heard the "incoming communication layer" referred to as "system boundary", to avoid confusion when discussing things like layers in a layered architecture, where the layers are generally implied to be "stacked" top to bottom.

As a heuristic for what validation goes at the infrastructure layer/system boundary, I guide teams with the question "do I need external information to perform this validation?". Typically, the shape of data, data schema, and simple bounds checking should be done at the boundaries of the system. This has the added benefit of improved performance by avoiding unnecessary database queries and API calls if we can tell right away that a request or input is not valid.

Alexander Demin

Hey, thanks for the feedback. I agree that the "do I need external information to perform this validation?" question is actually the one that should inform logic placement. Bounds checking is where it becomes a bit tricky.

For example, take the user age from the provided example. Does the infra layer have enough information to validate it? It can check basic contract requirements, e.g. that the value should be in the age field of the JSON payload, that it should be an integer, and that it must be provided. But the check of the actual age is where it falls into the domain layer, since the infra layer does not have (and should not have) enough information: what age is allowed? Is it 18-60? Or 18-110? Or 21-110? This validation highly depends on the domain rules and can't be placed in the infra layer (system boundary) without a domain logic leak.

RyTheTurtle

That's a great point, and I agree the domain logic should not leak in to the system boundary / infra layer.

In my experience, the answer is that any particular field or validation doesn't have to exclusively be validated in one part or another. To take the same example, my answer would be two fold:

  • infra layer: Validate the age is present and is a positive integer.
  • domain layer: Validate the domain-specific rules for allowed ages for users.
Alexander Demin

Yes, 100% agree. The main point is that this is a question of code responsibility. It is not about specific fields, but more about the validation scope that can be handled in a specific place while still keeping strong boundaries between layers.

Some pieces of validation can even be duplicated, e.g. domain logic can be called not from infra but from another domain service, and proper validation should still be applied. So here is the question: what can/should be placed in the infra layer and for what reason? IMO it is almost always a question of optimization.