Gul Zaib

Posted on Feb 15

Optimizing PHP Applications: Why Separate Read and Write Models Matter

#php #laravel #architecture #database

Models are a great tool for communicating with a data store. We can define what the data looks like, which ensures it is compatible with the data store, typically a database. Once we have a model that validates our input and helps us write that data, we could be tempted to use it for retrieving data as well. Except for some basic CRUD applications, that is usually not a good idea. Let’s see why.

Set up a model to work with

Let’s use a simple User model and a repository interface; we don’t need the implementation details here. Let’s also assume we have some assertion library that we use to validate every model we create.

class User
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {
        Assert::email($email);
        Assert::notEmpty($name);
        Assert::password($password, strength: 3);
    }
}

interface UserRepository
{
    public function save(User $user): void;
}

The main use case: we get the data for a new user, and the model validates that the name is not empty, that the email is well formed, and that the password complies with whatever we defined as strength level 3. Then we send it to the repository to be saved. Job done.

$user = new User(
    $request->get('email'),
    $request->get('name'),
    $request->get('password'),
);
$repository->save($user);

Problem: Model properties that should not be read

Now we want to read a user by email from the database and return a JSON representation of it, so a client can present a user profile. What happens if we add a read method to our repository that reuses the same model?

interface UserRepository
{
    public function save(User $user): void;
    public function get(string $email): User;
}

// Inside some controller class
return new Response(
    json_encode(
        $repository->get($request->get('email'))
    ),
);

So what are we getting here?

{
  "email": "peter@dailybugle.com",
  "name": "Peter Parker",
  "password": "$2y$10$OEaTphGkW0HQv4QNxtptQOE.BSQDnpwrB.\/VGqIgjMEhvIr22jnFK"
}

The first thing that should cross our minds when we see this is that passwords, even hashed, should never, ever be sent in any kind of communication from the server. So this is an important security concern.

This is probably the worst possible case of an information leak caused by using a write model as a read model, but it’s not the only one. Another common issue is sending irrelevant information to the client. For example, we could have an active boolean for enabling or disabling users. It would be useless to the client, because if the user is not active the request will respond with a 404 Not Found. Irrelevant data means that we are sending bytes that will never be consumed, hurting performance. It may be small, but everything adds up, and this has an easy solution.

So what do we do? Add a method that returns only a restricted set of fields? That would solve these two problems.

class User
{
    // ...

    public function read(): array
    {
        return [
            'email' => $this->email,
            'name' => $this->name,
        ];
    }
}

But there are more issues to solve. Let’s see.

Problem: Unnecessary validations

Speaking of performance, we have validations in the model constructor, but are they needed when we fetch data that is already in the database? The values must have been valid the moment they were stored, so it can be argued that running those validations again is a waste.

But it’s not only a waste; it can be a real problem. Validations evolve, and if reads go through a write model that validates, a rule change can break our ability to fetch existing records. Suppose an application validates that user emails have a valid format, and at some point another rule is added to block certain email domains. The validation is updated, but the existing users can’t simply be changed, because they still expect communications at that email address.

Now we get a request for a list of 100 users, one of whom has a blocked domain. What happens? The whole request fails. And what do we send the client? A 400 Bad Request response, as if some user input was wrong? This is not the client’s fault but the server’s, so it should be some kind of 500 error.
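To make the failure concrete, here is a minimal sketch. The Assert class is a hypothetical stand-in for whatever assertion library is in use, the blocked domain spam.example is made up, and the $rows array stands in for a database result set. The password field is left out to keep it short.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the assertion library; not a real package API.
final class Assert
{
    /** Domains rejected by the newer rule (illustrative value). */
    private const BLOCKED_DOMAINS = ['spam.example'];

    public static function email(string $email): void
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException("Invalid email: $email");
        }

        // Rule added later, after some users had already registered.
        $domain = substr($email, strrpos($email, '@') + 1);
        if (in_array($domain, self::BLOCKED_DOMAINS, true)) {
            throw new InvalidArgumentException("Blocked domain: $domain");
        }
    }
}

// Write model that validates in its constructor.
final class User
{
    public function __construct(
        public string $email,
        public string $name,
    ) {
        Assert::email($email);
    }
}

// Rows as they might come back from storage; the second predates the new rule.
$rows = [
    ['email' => 'peter@dailybugle.example', 'name' => 'Peter Parker'],
    ['email' => 'old.user@spam.example', 'name' => 'Old User'],
];

try {
    $users = array_map(
        fn (array $row): User => new User($row['email'], $row['name']),
        $rows,
    );
    $outcome = 'listed ' . count($users) . ' users';
} catch (InvalidArgumentException $e) {
    // One legacy row is enough to fail the whole listing.
    $outcome = 'listing failed: ' . $e->getMessage();
}

echo $outcome, PHP_EOL; // prints "listing failed: Blocked domain: spam.example"
```

The first row is perfectly fine, yet the client gets nothing, because hydration stops at the first row that no longer passes the current rules.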

To avoid this, I’ve seen some complex solutions that use Reflection to create an instance without calling the constructor. If we really had to use the write model in cases where we don’t want to validate, I would rather move the assertions to a static constructor, like this.

class User
{
    public function __construct(
        public string $email,
        public string $name,
        public string $password,
    ) {}

    public static function create(string $email, string $name, string $password): self
    {
        Assert::email($email);
        Assert::notEmpty($name);
        Assert::password($password, strength: 3);

        return new self($email, $name, $password);
    }
}

This way, when creating a new model that requires validation I can call User::create(), and use the plain constructor when fetching data from the database. This solves some issues, but there are more.
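As a rough usage sketch of that split, the snippet below shows both paths side by side. The Assert class is again a hypothetical stand-in that only implements the email rule, and the $row array plays the role of a record fetched from storage.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the assertion library; only the email rule is sketched.
final class Assert
{
    public static function email(string $email): void
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException("Invalid email: $email");
        }
    }
}

final class User
{
    // Plain constructor: no validation, meant for hydrating stored data.
    public function __construct(
        public string $email,
        public string $name,
    ) {}

    // Named constructor: validates, meant for new input.
    public static function create(string $email, string $name): self
    {
        Assert::email($email);

        return new self($email, $name);
    }
}

// New input goes through the validating named constructor and is rejected.
try {
    User::create('not-an-email', 'Peter Parker');
    $rejected = false;
} catch (InvalidArgumentException) {
    $rejected = true;
}

// A stored row is hydrated through the plain constructor, skipping validation,
// so a value that would fail today's rules can still be loaded.
$row = ['email' => 'legacy-value', 'name' => 'Old User'];
$fromStorage = new User($row['email'], $row['name']);
```

The trade-off is that nothing stops calling code from using the plain constructor for new input too, so the convention has to be enforced by review or by making the constructor less visible.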

Problem: Adding extra data to the model

Another common situation is the client requiring some more data for the view. In our example, the view might need to show the number of comments that a user has created in the system. That’s not part of the model, but it looks wasteful not to add that in the same HTTP response and keep the client waiting for a second one just because the data does not match the write model.

Even if we try to add the data in the same request, sticking to this write model means that we can’t use a single database request to get the whole set of data, though in many cases that could be solved with a simple SQL join. Instead we get the write model and then do another database request to fetch the missing data, and compose it before sending it to the client.

return new Response(
    json_encode(
        array_merge(
            // array_merge() needs arrays, so cast the model's public properties
            (array) $repository->get($request->get('email')),
            ['comments' => $commentRepository->count($request->get('email'))],
        )
    ),
);

It works, but it means an extra database query, with its impact on performance. It also hurts reusability: you can’t just call the repository somewhere else, you also need to copy and paste the comments part.

Problem: Are inserts and updates really the same?

The last problem is not really about write vs. read models: when we update a model, can we really use the same class that we use when creating it?

If we create a new user with this model, we expect name, email and password. For creating a user that’s OK, but in our example our security expert requires that passwords are updated in a specific way: the user requests a password change, an email is sent to the user with a time-limited token, and that token is validated before the new password is accepted.

The password should never be updated in any other way, so what do we do if we use the same model we already have for updating the user? We will have two different places in the code where we update the user, one for password, another for anything else.

interface UserRepository
{
    public function save(User $user): void;
    public function update(User $user): void;
}

// Updating name
$user = new User(
    $request->get('email'),
    $request->get('name'),
    'WHAT DO WE DO WITH PASSWORD HERE?',
);

$repository->update($user);

// Updating password
$user = new User(
    $request->get('email'),
    'WHAT DO WE DO WITH NAME HERE?',
    $request->get('password'),
);

$repository->update($user);

Now we have to deal with data in the model that must not be processed, which will make our repository implementation unnecessarily more complex. It will also force the model creation to provide data that will not be available and will not be used, making code much harder to understand. And finally, we introduce a fragile implementation that, if used incorrectly, can cause the update of something that should not be updated, just because it is in the model. If we process the user name change in a way that triggers a password update, that’s a serious problem.

Solution: Individual model for each case

How can we solve all the problems when reading a user? A dedicated model will do.

final readonly class UserRead
{
    public function __construct(
        public string $email,
        public string $name,
        public int $commentCount,
    ) {}
}

We can have another repository to fetch it.

interface UserReadRepository
{
    public function get(string $email): UserRead;
}

This implementation, assuming a relational SQL database, would not select the password from the table, since it is not in the read model, solving problem number 1. The read model does not include validations, solving problem number 2. And it has a place for the comment count, which the new repository can populate with a join in a single query, solving problem number 3.
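A possible implementation of that repository could look like the sketch below. It uses PDO with an in-memory SQLite database (so it needs the pdo_sqlite extension). The class name PdoUserReadRepository and the users and comments tables with their columns are assumptions made for illustration, not something the original schema defines.

```php
<?php

declare(strict_types=1);

final readonly class UserRead
{
    public function __construct(
        public string $email,
        public string $name,
        public int $commentCount,
    ) {}
}

interface UserReadRepository
{
    public function get(string $email): UserRead;
}

// Hypothetical PDO implementation; table and column names are assumptions.
final class PdoUserReadRepository implements UserReadRepository
{
    public function __construct(private PDO $pdo) {}

    public function get(string $email): UserRead
    {
        // One query: the password column is never selected,
        // and the comment count comes from the join.
        $statement = $this->pdo->prepare(
            'SELECT u.email, u.name, COUNT(c.id) AS comment_count
             FROM users u
             LEFT JOIN comments c ON c.user_email = u.email
             WHERE u.email = :email
             GROUP BY u.email, u.name'
        );
        $statement->execute(['email' => $email]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);

        if ($row === false) {
            throw new RuntimeException("User not found: $email");
        }

        return new UserRead($row['email'], $row['name'], (int) $row['comment_count']);
    }
}

// In-memory database so the sketch runs on its own.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT, password TEXT)');
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, user_email TEXT, body TEXT)');
$pdo->exec("INSERT INTO users VALUES ('peter@dailybugle.com', 'Peter Parker', 'stored-hash')");
$pdo->exec("INSERT INTO comments (user_email, body) VALUES
    ('peter@dailybugle.com', 'First'), ('peter@dailybugle.com', 'Second')");

$user = (new PdoUserReadRepository($pdo))->get('peter@dailybugle.com');

echo json_encode($user), PHP_EOL;
// prints {"email":"peter@dailybugle.com","name":"Peter Parker","commentCount":2}
```

Because the read model has no password property at all, encoding it directly is safe: there is nothing sensitive on the object to leak.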

Even more, if we have more representations of a user, we should have a different read model to cover each one. We could have a UserWithLastCommentsRead for example.

And for the update problems? You probably guessed. Individual models for each update.

final readonly class UserDataUpdate
{
    public function __construct(
        public string $email,
        public string $name,
    ) {
        Assert::notEmpty($name);
    }
}

final readonly class UserPasswordUpdate
{
    public function __construct(
        public string $email,
        public string $password,
    ) {
        Assert::password($password, strength: 3);
    }
}

interface UserRepository
{
    public function save(User $user): void;
    public function updateData(UserDataUpdate $userDataUpdate): void;
    public function updatePassword(UserPasswordUpdate $userPasswordUpdate): void;
}

Now there is no unnecessary data and far less room for mistakes. Each update is isolated and much better protected from bugs.
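To illustrate the isolation, here is a sketch of how the two update methods might be implemented with PDO. The class PdoUserRepository and the users table are assumptions, the save method is left out, and the assertions are omitted from the update models for brevity. Hashing with password_hash is also an assumption about how passwords are stored.

```php
<?php

declare(strict_types=1);

final readonly class UserDataUpdate
{
    public function __construct(public string $email, public string $name) {}
}

final readonly class UserPasswordUpdate
{
    public function __construct(public string $email, public string $password) {}
}

// Hypothetical PDO implementation of the two update methods; schema is assumed.
final class PdoUserRepository
{
    public function __construct(private PDO $pdo) {}

    public function updateData(UserDataUpdate $update): void
    {
        // Only the name column is touched; the password cannot change here.
        $this->pdo
            ->prepare('UPDATE users SET name = :name WHERE email = :email')
            ->execute(['name' => $update->name, 'email' => $update->email]);
    }

    public function updatePassword(UserPasswordUpdate $update): void
    {
        // Only the password column is touched, and only a hash is stored.
        $this->pdo
            ->prepare('UPDATE users SET password = :password WHERE email = :email')
            ->execute([
                'password' => password_hash($update->password, PASSWORD_DEFAULT),
                'email' => $update->email,
            ]);
    }
}

// In-memory SQLite database so the sketch runs on its own (needs pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT, password TEXT)');
$pdo->exec("INSERT INTO users VALUES ('peter@dailybugle.com', 'Peter Parker', 'old-hash')");

$repository = new PdoUserRepository($pdo);
$repository->updateData(new UserDataUpdate('peter@dailybugle.com', 'Peter B. Parker'));

// The name changed; the stored password is exactly what it was before.
$row = $pdo->query('SELECT name, password FROM users')->fetch(PDO::FETCH_ASSOC);
```

Since updateData has no access to a password value, there is no code path in which a name change could overwrite it.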

Note that I didn’t add the email validation to the update models. That is intentional: the email is used to find the user, and if the validation has evolved, as discussed earlier, we would not be able to find older users whose emails are no longer valid but are still in the database.

Last words

This is really not that different from how we model objects in the real world. We never consider everything about a real-life object in a particular context. Take a car, for example.

If a car is modeled by a driver, we can expect the position of the seat and the rear-view mirrors to be really important, while they are irrelevant to a mechanic doing maintenance. The mechanic will probably be more concerned with engine metrics that are not important to the driver. And a kid at school learning about means of transport will probably just care that it is a land vehicle with 4 wheels.

If we use different models for the same real life objects we can definitely do the same for our code models.

Top comments (4)

david duymelinck • Feb 16

What you are calling models are actually DTOs and value objects. The difference between the two is that a DTO is just a class with properties, while for a value object the values of the properties have to be valid.

I would not create an object for every action, that creates way too many objects if you are going to do that for the whole application.
I would create a User DTO, and validate the data in the repository methods.

// User.php
class User {
    public function __construct(
        public string $email = '',
        public string $name = '',
        public string $password = '',
    ) {}
}

For the creation and updates of the user you can do.

// Create user
$user = new User();
$user->email = $request->get('email');
$user->password = $request->get('password');
$user->name = $request->get('name');

$repository->create($user);

// Updating name
$user = new User();
$user->email = $request->get('email');
$user->name = $request->get('name');

$repository->update($user);

// Updating password
$user = new User();
$user->email = $request->get('email');
$user->password = $request->get('password');

$repository->update($user);

You were using a value object in your first example, and that made you come up with a solution that is over-engineered.
Also doing this

return new Response(
    json_encode(
        $repository->get($request->get('email'))
    ),
);

Is not a real world scenario. Every reviewer would flag this as a problem.
Another problem with your solution is that you can't update the name and the password at the same time, so you have to create two objects, call two methods, and execute two database queries.

In the case of the comments I would use the name UserWithCommentCount, as it is more descriptive than UserRead, and I would name the repository method getUserWithCommentCount. And again, UserWithCommentCount is a DTO, because if something went wrong getting the data from the database, you have an object with empty values. And these can be checked before you go to the next step in the request handling.
I think you read somewhere that it is good to separate read and write queries, but that is for high performance on the database side and it is far more difficult to implement.