
Discussion on: Event Sourcing: What it is and why it's awesome

Comment deleted
Barry O Sullivan

I'm not sure what you mean, so it's hard to answer.

In terms of migrations, event-sourced apps use migrations for the database-backed projections, in the same way regular apps do. If the schema changes shape, you create a migration. So as your system evolves, you'll get more and more migrations, like any standard web app.
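For example, a minimal sketch (the projection and column names are invented) of what such a projection migration might look like:

# A minimal sketch (invented projection/column names): the projection table is
# migrated like any other read table, then the new column is filled by
# replaying the relevant events through the projector.
def migrate_user_summary_projection(cur) -> None:
    # The projection's schema changed shape, so alter it like any read table...
    cur.execute("ALTER TABLE user_summary ADD COLUMN last_login_at DATETIME NULL")
    # ...then backfill last_login_at by replaying the relevant events.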

Hope that helps. If it's not on the right track, let me know and I'll do my best to answer.

Xavi Montero • Edited

Event migrations, and what to do if the event stream gets corrupted, is a topic worthy of a full book. I bought and read it and it was worth reading: leanpub.com/esversioning, "Versioning in an Event Sourced System" by Greg Young.

I store the log of events (as Barry pointed out) in MySQL, in JSON fields, and I add some "auto-computed" fields over that JSON so it can still be indexed (see the sketch right below). When the JSON evolves over time, this is what we do:
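Quick aside on those "auto-computed" fields: they are just MySQL 5.7+ generated columns over the JSON payload. Roughly, with invented table and column names:

# Rough sketch (invented table/column names) of the "auto-computed" fields:
# generated columns extracted from the JSON payload, so the log stays a single
# JSON column but can still be indexed and queried quickly.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(user="admin", password="secret",
                               host="localhost", database="eventstore")
cur = conn.cursor()
cur.execute("""
    ALTER TABLE event_log
      ADD COLUMN event_type VARCHAR(255)
        GENERATED ALWAYS AS (payload->>'$.type') STORED,
      ADD INDEX idx_event_log_type (event_type)
""")
conn.commit()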

Imagine we have this event:

{
    "eventId": "abcd",
    "timeStamp": "2020-01-10T00:18:44.123456Z",
    "type": "user.form.submitted",
    "version": "1.0.0",
    "applicationExecutionId": "1234",
    "form_submission_id": "xyz",
    "form_data":
    {
        "lead_name": "Alice",
        "lead_email": "alice@example.com",
        "trip_id": "9999",
        "trip_airport": "Barcelona"
    }
}

and now we want this (assuming we have, somewhere, a correct conversion table for the airport codes):

{
    "id": "abcd",
    "timeStamp": "2020-01-10T00:18:44.123456Z",
    "type": "user.form.submitted",
    "version": "2.0.0",
    "data":
    {
        "id": "xyz",
        "lead":
        {
            "name": "Alice",
            "emailAddress": "alice@example.com"
        },
        "trip":
        {
            "id": "9999",
            "airportIataCode": "BCN"
        }
    },
    "metadata":
    {
        "applicationExecutionId": "1234"
    }
}
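
For concreteness, a sketch of the upgrader this mapping implies (the tiny lookup dict is a stand-in for the real airport conversion table mentioned above):

AIRPORT_TO_IATA = {"Barcelona": "BCN"}  # stand-in for the real conversion table

def upgrade_1_0_0_to_2_0_0(old: dict) -> dict:
    # Reshape a v1.0.0 event into the v2.0.0 structure shown above.
    form = old["form_data"]
    return {
        "id": old["eventId"],
        "timeStamp": old["timeStamp"],
        "type": old["type"],
        "version": "2.0.0",
        "data": {
            "id": old["form_submission_id"],
            "lead": {
                "name": form["lead_name"],
                "emailAddress": form["lead_email"],
            },
            "trip": {
                "id": form["trip_id"],
                "airportIataCode": AIRPORT_TO_IATA[form["trip_airport"]],
            },
        },
        # Fields the new shape does not need go into "metadata", so that
        # no information is lost.
        "metadata": {"applicationExecutionId": old["applicationExecutionId"]},
    }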

And if, for some reason, we don't want to "translate on the fly" and instead want a "coherent data source" with one single version, here is what we do:

1) We write both an upgrader AND a downgrader
2) We (in staging) get a good snapshot of production
3) We upgrade from Table_1 into Table_2
4) We then downgrade from Table_2 into Table_3
5) We dump tables 1 and 3 and make a diff. If they match, it means we did not "forget" any critical field (sketched below).
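
In code, the double-check of step 5 is basically a round trip over every old event (continuing the sketch above; the downgrader is the exact inverse of the upgrader):

IATA_TO_AIRPORT = {v: k for k, v in AIRPORT_TO_IATA.items()}

def downgrade_2_0_0_to_1_0_0(new: dict) -> dict:
    # Inverse of upgrade_1_0_0_to_2_0_0: rebuild the old v1.0.0 shape.
    data = new["data"]
    return {
        "eventId": new["id"],
        "timeStamp": new["timeStamp"],
        "type": new["type"],
        "version": "1.0.0",
        "applicationExecutionId": new["metadata"]["applicationExecutionId"],
        "form_submission_id": data["id"],
        "form_data": {
            "lead_name": data["lead"]["name"],
            "lead_email": data["lead"]["emailAddress"],
            "trip_id": data["trip"]["id"],
            "trip_airport": IATA_TO_AIRPORT[data["trip"]["airportIataCode"]],
        },
    }

def verify_round_trip(old_events: list) -> None:
    # Step 5: if upgrading then downgrading gives back exactly what we started
    # with, we did not forget any critical field.
    for old in old_events:
        assert downgrade_2_0_0_to_1_0_0(upgrade_1_0_0_to_2_0_0(old)) == old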

When we know it works, then in production we first upgrade all events, and "mark" the pointer of the "old source" to keep track of which events were originally from there.

I mean, if we had 70 events in the old table, we "upgrade" events 1 to 70, and then event 71 is written to the new table in the new format. We "mark" somewhere that the old original table held events 1-70.

Then we start writing to the new table with the new version, but still "downgrade" the new events into the old table (this means we have two collections). Our new "source of truth" is the new version, but the old one still allows the old projectors to keep working.

This switching has to be done "in real time with zero downtime", so often we ship a release with a flag that allows us to say "write here or there", and that flag is flipped only once, without re-deploying.
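
As a sketch (invented flag and table names, continuing the Python above), the write path during this phase looks roughly like this:

import json

def insert_into(cur, table: str, event: dict) -> None:
    # Hypothetical helper: the log tables only ever receive INSERTs.
    cur.execute("INSERT INTO " + table + " (payload) VALUES (%s)",
                (json.dumps(event),))

def append_event(cur, new_event: dict, write_to_v2_log: bool) -> None:
    if write_to_v2_log:  # runtime flag, flipped once at cutover, no redeploy
        insert_into(cur, "event_log_v2", new_event)  # new source of truth
        # Keep the not-yet-migrated projectors alive with the old shape.
        insert_into(cur, "event_log_v1", downgrade_2_0_0_to_1_0_0(new_event))
    else:
        insert_into(cur, "event_log_v1", new_event)  # pre-cutover behaviour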

This way we "decouple" the coder's needs: we can focus on "writing" now and correct "reading" maybe next week. Then we progressively upgrade the projectors to read the new version (as both versions are in place, this lets us work without stress); this can take several sprints / weeks.

When no projectors are left on the old version and all read from the new version, we first kill the downgrader.

At some moment we take a complete backup (just in case) of the old collection, events 1-70 (we don't need the downgraded ones, as they were not the source of truth in the old table).

Once the backup is done you can "double-check" asynchronously (I mean, maybe the next week) by downgrading events 1-70 from the new table into a temporary table and diffing against the backup, just to verify the upgrade was also perfect in production and not a bit was lost.

If that's okay, the (now unneeded) backup can go to any long-term storage like AWS Glacier.

If for any reason this double-check fails, because there's a case that was not present the day you tested it in pre-production and there's a bug in the upgrader, you can still do this:

1) Create a corrected upgrader.
2) Create a new table and upgrade events 1-70 from the backup.
3) Copy events 71-xxx from the current source of truth to the new table.
4) Switch all writing to the new table.
5) Kill all the projections, reset the projection ledgers, and re-project everything (to avoid keeping anything that was projected from the corrupted upgrade).

(Steps 1 to 3 seem overkill; it'd seem simpler to just update events 1-70 in place in the log, but it's an obsession of mine to never UPDATE the logs. In fact, I find it good practice to limit the MySQL user to SELECT and INSERT, hard-forbidding UPDATE and DELETE on those tables, so not even a bug could corrupt them; I always think of those tables as WORM media.)
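
The lock-down itself is just grants; a sketch with made-up user and table names:

# Sketch (made-up user/table names) of treating the log as WORM media: the
# application account may SELECT and INSERT on the event log and nothing else,
# so not even a buggy release can UPDATE or DELETE history.
import mysql.connector

admin = mysql.connector.connect(user="root", password="secret", host="localhost")
cur = admin.cursor()
cur.execute("CREATE USER IF NOT EXISTS 'event_writer'@'%' IDENTIFIED BY 'writer_pw'")
cur.execute("GRANT SELECT, INSERT ON eventstore.event_log TO 'event_writer'@'%'")
# Deliberately: no UPDATE or DELETE grant on the log tables, ever.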

In fact the whole solution might seem overkill, but I think it takes longer to "explain" than to code...

The upgrader code is normally: SELECT all rows; for each row, hydrate the old event, create a new event from the old event (I use factories for that), and store the new event as if it had just been created now. Rather easy.
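
Roughly, reusing the upgrader sketched earlier (table names invented):

import json

def run_upgrader(cur) -> None:
    # SELECT every old row, hydrate the old event, build the new one through
    # the upgrader "factory", and append it to the new log.
    cur.execute("SELECT payload FROM event_log_v1 ORDER BY id")
    for (payload,) in cur.fetchall():
        old_event = json.loads(payload)
        new_event = upgrade_1_0_0_to_2_0_0(old_event)
        cur.execute("INSERT INTO event_log_v2 (payload) VALUES (%s)",
                    (json.dumps(new_event),))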

There's plenty of literature on the upgrader. My addition is to "double-check" with the downgrader, to ensure we did not lose any bit of information by accident.

I tend to put any "unneeded data" in the new version into a "metadata" block that most of my projectors just ignore. But the "full history" is there just in case I need it again :D

In addition we use "semantic versioning" for the events:

  • Major => the format changes.
  • Minor => we add fields but do not remove fields.
  • Patch => we change things at write time that can be fully decoded at read time without changing a single line of reader code: JSON UTF encoding from 0xnnn escapes into literal àèìáéí, ugly-print to pretty-print, etc.

Again... the book is very, very highly recommended. It is all about answering exactly this question.