Barry O Sullivan

Posted on Jan 30, 2018

Projection Building Blocks: What you'll need to build projections

#cqrs #projections #php

(Originally posted on my blog)

Let’s talk about projections. This topic is quite large, so this is the first part in a four part series on projections.

Projection Building Blocks: What you'll need to build projections
Broadcasting events in PHP: Techniques and technologies
Designing Projections: How to design and implement real world projections
Projection DevOps: Continuously deploying new/updated projections with zero downtime

If you've read my previous articles, you should have the basics of event sourced/event driven Command Query Responsibility Segregation (CQRS) systems. At its core there are two concepts, a command side that outputs events and a query side that reads them.

Up until now I've focussed primarily on the command side, i.e. how we model state changes in our apps. I mention projections, but always with a hand-wavy statement and without any real detail. Let's do something about that.

What are projections?

Projections are a necessary part of any event sourced or CQRS system. These systems don't rely on a single generic data source such as a normalised MySQL database. Instead you build up your data sets by playing through the events, i.e the “film”, "projecting" them into the shape you want. This allows lot of flexibility as you're no longer bound by a single data model on which you have to run increasingly monstrous SQL queries (12+ joins anyone?). With projections you can build a data model specifically for the problem/question at hand.

For instance, say your app has the following requirements:

Webapp needs to fetch a user, their active cart, and the cart's items, all as a single document.
Marketing needs a list of how much each user spends over a 6 month period, broken down month by month.

Building a generic data-model that can produce both answers is possible, but it's difficult, and leads to complex SQL statements and brittle data structures. Instead, it's much easier to build up a custom dataset for each use-case, keeping them independent and minimal.

In the above, the webapp would listen for the appropriate events and build a tree of the user, their cart and its items. This structure would get stored in a document DB, such as MongoDB, and fetched later when it’s needed. Nice and easy.

For the Marketing report, it’s a little different. We listen for the same events, but we don’t build the same structure. Instead we keep track of three pieces of data: the users_id, the month, and how much they spent that month. We store that data in a RDBMS, such as MySQL, so that it’s easy to query.

These two datasets are simple yet completely different. They are each designed to answer their specific question and are thus simpler to understand and build. That's the power of projections.

The Building Blocks

Building a robust* projection system is not a trivial task, as there are many concepts and moving parts. Before you can build one, you need to understand each piece in isolation, then see how they all work together.

*An apt word that has been ruined for me due to overuse in college

Event

An event is a named object that represents some discreet change that occurred in your system. It's usually modelled as a class with a collection of properties, giving just enough formation to be useful.

Eg.

<?php namespace Domain\Selling\Events;

use ...;

class CartCreated extends Event
{
    /** @var Uuid */
    public $cart_id;

    /** @var Uuid */
    public $customer_id;     
}

Events also contain some generic meta information, info that each event should contain to make easier to work with. I'd recommend the following.

the ID of the event (unique)
when the event happened
the actor that triggered the event (could be a person or a system process)
the version of the event (events can change shape over time, we'll get into this later)

<?php namespace Domain\Events;

use ...;

abstract class Event 
{
    /** @var Uuid */
    public $id;

    /** @var Carbon */
    public $occurred_at;

    /** @var Actor */
    public $actor;

    /** @var integer */
    public $version = 1;
}

EventStore and EventStream

The EventStore is your access point to all the events that have ever occurred in your system. You give it a position/point in time and it gives you an EventStream that you can iterate through. From there it’s a simple as iterating through the stream until there are no events left.

<?php
$last_position = "b70931a6-b330-4866-97b4-0710c8ad5d3e";
$event_store = new EventStore();
$event_stream = $event_store->getStream($last_position);
while ($event = $event_stream->next()) {
    // Do things
}
?>

Under the hood Event stores and streams can be implemented in any number of technologies, SQL, NoSQL, Kafka, even the FileSystem. As long as they meet the following constraints you're good.

You can start reading from any point in the stream
The events are read in the order that they occurred
The event stream is append only (history doesn't change)

Now that we have a way to read events from the log, let's look at what we do with them.

Projector

In order to build up a data set, you need to listen for a set of events. That's where projectors come in. Their job is to listen for events then pass them through to the projection that's building up the dataset.

There are many ways to do this, but this is my preferred one.


<?php namespace App\Projections\Carts;

use Domain\Shopping\Events;

class Projector
{
    private $projection;

    public function __construct(Projection $projection)
    {
        $this->projection = $projection;
    }

    public function whenCartCreated(Events\CardCreated $event)
    {
        $this->projection->createCart($event->cart_id, $event->customer_id);
    }

    public function whenItemAddedToCart(Events\ItemAddedToCart $event)
    {
        $this->projection->addItemToCart($event->cart_id, $event->item);
    }

    public function whenItemRemovedFromCart(Events\ItemRemovedFromCart $event)
    {
        $this->projection->removeItemFromCart($event->cart_id, $event->item_id;
    }

    public function whenCartCheckedOut(Events\CartCheckedOut $event)
    {
        $this->projection->checkoutCart($event->cart_id);
    }
}

As you can see above, the method signature defines the event we're listening for, and the method itself extracts the data and feeds it through to the projections.

Projection

A projection is the result of "projecting" a sequence of events. It has two categories of functions: commands and queries (standard CQS pattern). Commands change the shape of the underlying dataset. Queries fetch results from the dataset, usually to answer business questions or to present data.

Here's a simple example that looks after a customer's cart and it's items.

<?php namespace DoctrinePlayground\App\Projections\Carts;

use Doctrine\DBAL\Connection;
use Ramsey\Uuid\UuidInterface;
use Carbon\Carbon;
use DoctrinePlayground\Domain\Selling\Entities\Item;

class Projection
{
    private $db;

    const CARTS_TABLE = 'carts';
    const CART_ITEMS_TABLE = 'cart_items';

    public function __construct(Connection $db)
    {
        $this->db = $db;
    }

    /** Example Commands **/

    public function createCart(UuidInterface $cart_id, UuidInterface $customer_id, Carbon $created_at)
    {
        $this->db->insert(self::CARTS_TABLE, [
            'cart_id'     => $cart_id,
            'customer_id' => $customer_id,
            'active'      => true,
            'created_at'  => $created_at->format("Y-m-d H:i:s"),
        ]);
    }

    public function addItemToCart(UuidInterface $cart_id, Item $item)
    {
        $this->db->insert(self::CART_ITEMS_TABLE, [
            'cart_id'     => $cart_id,
            'item_id'     => $item->item_id,
            'item_ref'    => $item->reference
        ]);
    }


    /** Example Queries **/

    public function getActiveCart(UuidInterface $customer_id)
    {
        $cart_arr = $this->db->fetchAssoc('SELECT * FROM '.self::CARTS_TABLE.' WHERE customer_id = ? AND active = 1', [$customer_id]);

        if (!$cart_arr) {
            return null;
        }

        $cart = (object)$cart_arr;

        $items = $this->db->fetchAll('SELECT * FROM '.self::CART_ITEMS_TABLE.' WHERE cart_id = ?', [$cart->cart_id]);

        $cart->items = array_map(function($item) {
            return (object)$item;
        }, $items);

        return $cart;
    }
}

An aside on implementation

If you plan to have many different implementations of the same projection, I'd recommend extracting the methods into an interface, otherwise don't bother. If you do write an interface, some devs advocate separating the Command and Query sides into their own interfaces, but I think this is overkill; We did this for each of our projections and it just made the code harder to navigate, understand and change, ie. it didn't bring any real benefit.

Projectionist

In keeping with the projection metaphor we also have a projectionist. The projectionist is responsible for playing a collection of projectors. Internally, the projectionist does this by keeping track of where each projector is in the stream (by event position or event id), then plays each projector forward from that point, recording the last event that each projector has seen.

Projector Position Ledger

As mentioned above, the projectionist needs some way to keep track of where each projector is in the event log. That's where the Projector Position Ledger comes in. This is a simple data store that keeps track of each projector and it's position. Its fairly simple and can be implemented in any storage technology.

A handy idea, it should also keep track of whether a projector is broken or not. If a projector tries to run and an unexpected exception is thrown, the projector should be marked as "broken" and the projectionist should stop attempting to play it. This way you won't try to keep playing a broken projector, filling up your bug tracker with duplicate exceptions.

Event Upgrader

This component is a little more advanced, but it's still worth mentioning. Sometimes you'll need to change the shape of events, usually by adding or changing properties. When this happens, you'll need to "upgrade" the event shape. This is why each event has a "version" attribute, so we can check the version of the event and apply the appropriate upgrader if it's required.

This component lives in the event stream and manipulates the event data before it is deserialized into the actual event classes. It is used by both the command and query side.

I won't get into too much detail here, as this is quite complex, just be aware that it exists. Think of event upgraders as migrations for events that are run on the fly and you're most of the way there.

Putting it all together

Those are all the components, so let's look at how they all work together.

The projectionist is given a collection of projectors. They go through each projector, and fetch an event stream that starts after the last event each has seen. The new events are played through the projector, and on completion the projectionists records the last even seen by the projector.

And that's that, those are the pieces you need to build an effective projection system, at least at the start.

Next Steps

There's one thing missing from above, how do you trigger your projectionist to play a projector? This is a complex question, especially if you're using PHP. So that's why I'm dedicating the next article to just that. We'll go through some implementation details, exploring the pros and cons of each, then settle on what I think is the best solution.

The third article will dive into building complex projections using various technologies designed to solve specific problems. In it we’ll highlight some of the techniques our team has found useful when building, designing and implementing projections.

The fourth article will dig into projection versioning, seamless releases and some more advanced concepts related to the projector/projectionist side of things.

Until then, best of luck!

Top comments (9)

Ben Halpern • Jan 30 '18

Great post

ww • Feb 4 '18

Great post as a concept description.
Unfortunately no one ever describes one super important thing - how to deal with broken projections. I don't mean, like, how to fix the logic itself.
So as far as I understood(actually it's like this on every article about projections), the projectionist keeping track of the global sequence per projection. Then, if something goes wrong(unhandled exception), the specific projection marked as broken. And then what? You system deployed on the other part of the world and the next time you, as a developer, have a chance to fix it and deploy your fix will be in a month or so.
What would be your solution to this?

Thank you.

Barry O Sullivan • Feb 4 '18

You raise an excellent point, how do you manage broken projectors?

I'm currently writing a blog post on this topic and a library to take care of it. You can see the library here it talks about how it handles these issues in the readme.
github.com/barryosull/the-projecti...

The short answer for now.

If you're booting new projectors in prep for a release and there's a failure you should:
1) Mark the broken projector as "broken"
2) Mark the other new projectors as "stalled"
3) Report the error
4) Stop the release

On the next release, the boot process should attempt to play all "broken"/"stalled" projectors forward again.

If you're playing projectors as normal and there's an error, you should:
1) Mark the broken projector as "broken"
2) Report the error
3) Continue as normal, ignoring "broken" projectors

We've found this solved the problem nicely.

As to your hypothetical situation, if you release broken code and you can't fix it for a month, then you've got some big problems, regardless of whether it's you're using projections or not.

Versioning your projectors is a great way to keep old projectors running, while booting up news ones. If the new one fails, the old one is still running in the background. This is given more detail on the library linked above.

ww • Feb 5 '18

Thanks for replying.

The thing is that we have been using all the technics you described for a while already. They play nice, indeed. But you described a happy path, even when something is broken.

What I am asking is how to deal with exact situation. I am kinda surprised, why no one ever touches this topic. Everyone writes nice code which never fails? Nah!

Barry O Sullivan • Feb 5 '18

You system deployed on the other part of the world and the next time you, as a developer, have a chance to fix it and deploy your fix will be in a month or so.
What would be your solution to this?

I'm really confused by this. If you deploy code with a bug, and you can't fix it for over a month, then it's going to be broken for a month. There's no way around this, it doesn't matter if it's event driven or not. That's why no one touches the topic.

If you're asking how you ensure the system is stable before releasing, then the answer is automated tests.

If the release process is the bottleneck, you should look into CI/CD, so you can get the fix out faster.

Otherwise you're just stuck.

ww • Feb 5 '18

None of the mentioned technics give me 100% protection(Of course you might say: there is no such thing). They are reducing the likelihood, but not completely.
When there are lots of moving parts: ui, domain, projection code, projection storage, it is really easy to miss something. For example, we have all of these: CI/CD, unit testing, integration testing, manual testing, several levels of UAT and yet, I was once in a situation deploying 4 times a day into prod, just because of those tiny missing things which were breaking the whole system. It is possible, I've seen it, I was fixing it. Luckily the users of this software were just next to. Even though they were not happy with such interruptions.

But currently we are in a difficult situation - all the users are somewhere and there is no way patching it right away.

Where were we? Right. Would you be happy as a client of a banking system, if a tiny mistake in my account blocks your account as well?
See where I am heading?

Sandor | tutorialhell.dev • Jul 29 '19

Do you mean that whenever a new event is created, for example, another item is added to the cart, the projection, that was created some time before needs to be rebuilt/replaced to reflect the latest state?
Since we store only the the users_id, the month, and how much they spent that month, all the other information we need for the report we need to query. Right?

Rafal Pienkowski • Jan 30 '18

Great article. You made my day. Cheers.

Barry O Sullivan • Jan 30 '18

Thank you, your comment has made mine.

DEV Community