Auto-generated IDs vs. manually created IDs: which approach to choose?

#webdev #dotnet #tutorial #microservices

Auto-incremented integer IDs generated at the database level were the default approach for a long time. They were (and still are) popular for good reason.

Advantages of generated IDs include:

No need to manage IDs manually in code, and the database guarantees uniqueness;
Integers are fast at the database level for joins, comparisons, and indexing, and take less storage;
Natural ordering: sequential IDs work well with clustered indexes and reduce fragmentation.

However, with modern software architecture evolving, this approach may not be the best option.

Let's compare this approach with manually created IDs using an example. It is simplified, but it reflects a common business case without diving into details.

This article shows examples in C# and EF Core, but the same approach can be used for any language and ORM that implements the Unit of Work pattern.

Example case description

Let's imagine there are three independent microservices or modules: OrderService, PaymentService, and ShippingService. Each module has its own database.

Order service saves an Order with all its information, including OrderItems. Payment service is not interested in order details; it only tracks whether the Order was paid. Shipping service wants its own Order, but it should contain only shipment information.

Orders should be easy to sync between services, so that the most recent information about an order is available in every service and it is easy to get information about the order from other services if needed. Payment service stores an OrderId, and that identifier should be the same for the Order in all services.

Using auto-generated IDs

The following code shows what it looks like with the IDs generated on the DB level.

Disclaimer: I advocate initializing domain entities through constructors, but in an article object initializers are more readable.

public async Task<int> Handle(CreateOrderCommand request, CancellationToken ct)
{
    var order = new Order
    {
        TotalPrice = request.TotalPrice,
        OrderItems = request.Items.Select(i => new OrderItem
        {
            ProductId = i.ProductId,
            Quantity = i.Quantity
        }).ToList(),
        // UTC avoids time-zone ambiguity between services.
        OrderedAt = DateTime.UtcNow
    };

    _dbContext.Orders.Add(order);
    // order.Id is assigned by the database, so it is known only after this call.
    await _dbContext.SaveChangesAsync(ct);

    var paymentEvent = new OrderCreatedForPayment
    {
        OrderId = order.Id,
        TotalPrice = order.TotalPrice
    };

    var shippingEvent = new OrderCreatedForShipping
    {
        OrderId = order.Id
    };

    await _messageBus.PublishAsync(paymentEvent, ct);
    await _messageBus.PublishAsync(shippingEvent, ct);

    return order.Id;
}

This example does not look too bad, but it has potential issues:

The Order and its events cannot be written in one SaveChangesAsync() call, which is the simplest way to apply the Outbox pattern, because the Order Id is available only after SaveChangesAsync() has been called;
There is no guarantee that the Order in other services, Shipping service in particular, will be created with the same ID. This point is crucial;
Entities rely on the infrastructure (the database) to obtain their identity.

The same object should be identified the same way across different services or modules. Of course, it is possible to solve this by adding a separate OrderId column alongside the database ID. But that adds complexity, makes the code harder to maintain, and may introduce errors if the two identifiers are mixed up.

Using manually created IDs (Guid)

Let's see how we can improve this case by manually creating IDs.
In C#, the common type for such IDs is Guid. In the database, it maps to a UUID column type (for example, uuid in PostgreSQL or uniqueidentifier in SQL Server). Important: to generate a new random Guid (a version 4 UUID), call the static Guid.NewGuid() method. Writing new Guid() returns Guid.Empty, which is all zeros.
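A quick way to see the difference, as a minimal sketch that uses only the base class library (the variable names are illustrative):

```csharp
using System;

// Guid is a struct: its parameterless constructor yields the all-zero value.
Guid empty = new Guid();
Guid first = Guid.NewGuid();
Guid second = Guid.NewGuid();

Console.WriteLine(empty == Guid.Empty); // True
Console.WriteLine(empty);               // 00000000-0000-0000-0000-000000000000
Console.WriteLine(first == second);     // False: each call produces a new value

// In the default string format, the character at index 14 is the UUID version digit.
Console.WriteLine(first.ToString()[14]); // 4
```

With that in mind, the handler can create the Order ID before anything is saved.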

public async Task<Guid> Handle(CreateOrderCommand request, CancellationToken ct)
{
    var orderId = Guid.NewGuid();

    var order = new Order
    {
        Id = orderId,
        TotalPrice = request.TotalPrice,
        OrderItems = request.Items.Select(i => new OrderItem
        {
            ProductId = i.ProductId,
            Quantity = i.Quantity
        }).ToList(),
        // UTC avoids time-zone ambiguity between services.
        OrderedAt = DateTime.UtcNow
    };

    _dbContext.Orders.Add(order);

    var paymentEvent = new OrderCreatedForPayment
    {
        OrderId = orderId,
        TotalPrice = order.TotalPrice
    };

    var shippingEvent = new OrderCreatedForShipping
    {
        OrderId = orderId
    };

    _dbContext.OutboxMessages.Add(new OutboxMessage
    {
        Id = Guid.NewGuid(),
        Type = nameof(OrderCreatedForPayment),
        Payload = JsonSerializer.Serialize(paymentEvent),
        OccurredOnUtc = DateTime.UtcNow
    });

    _dbContext.OutboxMessages.Add(new OutboxMessage
    {
        Id = Guid.NewGuid(),
        Type = nameof(OrderCreatedForShipping),
        Payload = JsonSerializer.Serialize(shippingEvent),
        OccurredOnUtc = DateTime.UtcNow
    });

    // One call persists the order and both outbox messages in a single transaction.
    await _dbContext.SaveChangesAsync(ct);

    return orderId;
}

Advantages of this approach are:

The Order is created in the same transaction as the corresponding events, which ensures data integrity;
Only a single database connection is opened for this piece of code;
The entity is decoupled from the infrastructure;
The Order is created with the same identifier across all microservices, modules, or even systems. This means data can be easily synced or merged if needed;
We are fully in charge of how the identifier is generated. We could use any UUID version or even compose the ID with additional information. For example, if we want natural ordering and a timestamp-based identifier, we may use UUIDv7;
It is easier to write tests, because there is no need to save the entity to access its ID. This may not sound important, but it is crucial that the code is easily testable: that leads to better test quality and therefore fewer problems in the future.
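For the timestamp-based option mentioned in the list, .NET 9 and later provide Guid.CreateVersion7(). The sketch below assumes .NET 9 or newer; the timestamps are arbitrary sample values chosen only to make the ordering visible.

```csharp
using System;

// Requires .NET 9 or later. The timestamps are arbitrary sample values.
var t1 = new DateTimeOffset(2025, 1, 1, 12, 0, 0, TimeSpan.Zero);
var t2 = t1.AddSeconds(1);

Guid earlier = Guid.CreateVersion7(t1);
Guid later = Guid.CreateVersion7(t2);

// The leading 48 bits hold the Unix timestamp in milliseconds,
// so an ID created later compares as greater.
Console.WriteLine(earlier.CompareTo(later) < 0); // True

// The version digit (index 14 of the default string format) is 7.
Console.WriteLine(earlier.ToString()[14]);       // 7
```

Whether a database index benefits from this ordering depends on how the provider stores and compares UUID values, so it is worth checking for the database in use.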

Note: for this implementation, you will also need a background service that reads OutboxMessages and publishes them to the message broker.
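A minimal sketch of what such a dispatcher could look like. Everything here is an assumption made for illustration: the IOutboxPublisher interface, the ProcessedOnUtc property, and the in-memory list that stands in for the OutboxMessages table. In a real service this logic would typically run in a loop inside a hosted background service and persist the processed flag through the DbContext.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Demo: an in-memory list stands in for the OutboxMessages table.
var outbox = new List<OutboxMessage>
{
    new() { Id = Guid.NewGuid(), Type = "OrderCreatedForPayment", Payload = "{}", OccurredOnUtc = DateTime.UtcNow },
    new() { Id = Guid.NewGuid(), Type = "OrderCreatedForShipping", Payload = "{}", OccurredOnUtc = DateTime.UtcNow }
};
var dispatcher = new OutboxDispatcher(outbox, new ConsolePublisher());

int published = await dispatcher.DispatchPendingAsync(CancellationToken.None);
int publishedAgain = await dispatcher.DispatchPendingAsync(CancellationToken.None);
Console.WriteLine($"First run: {published}, second run: {publishedAgain}"); // First run: 2, second run: 0

// Hypothetical shape of the outbox row; ProcessedOnUtc is an assumed extra column.
public sealed class OutboxMessage
{
    public Guid Id { get; init; }
    public string Type { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime OccurredOnUtc { get; init; }
    public DateTime? ProcessedOnUtc { get; set; }
}

// Hypothetical abstraction over the message broker.
public interface IOutboxPublisher
{
    Task PublishAsync(string type, string payload, CancellationToken ct);
}

public sealed class ConsolePublisher : IOutboxPublisher
{
    public Task PublishAsync(string type, string payload, CancellationToken ct)
    {
        Console.WriteLine($"Publishing {type}: {payload}");
        return Task.CompletedTask;
    }
}

public sealed class OutboxDispatcher
{
    private readonly List<OutboxMessage> _outbox;
    private readonly IOutboxPublisher _publisher;

    public OutboxDispatcher(List<OutboxMessage> outbox, IOutboxPublisher publisher)
    {
        _outbox = outbox;
        _publisher = publisher;
    }

    // Publishes unprocessed messages, oldest first, and marks each one as processed.
    public async Task<int> DispatchPendingAsync(CancellationToken ct)
    {
        var pending = _outbox
            .Where(m => m.ProcessedOnUtc is null)
            .OrderBy(m => m.OccurredOnUtc)
            .ToList();

        foreach (var message in pending)
        {
            await _publisher.PublishAsync(message.Type, message.Payload, ct);
            message.ProcessedOnUtc = DateTime.UtcNow; // a real service would save this change
        }

        return pending.Count;
    }
}
```

If the process stops after publishing but before the message is marked as processed, the message is published again on the next run, so consumers should be prepared to handle duplicates.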

EF Core Configuration

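By default, EF Core sets up value generation for identifiers depending on their type: integer or Guid. For Guid keys it keeps a value that you assign explicitly, so the configuration below mainly makes the intent explicit.
Here is how to configure EF Core to not generate IDs using the Fluent API. This should be a part of the DbContext: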

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Order>()
        .Property(o => o.Id)
        .ValueGeneratedNever();
}
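The same intent can also be expressed with a data annotation on the entity instead of the Fluent API. A minimal sketch, assuming a simplified Order entity that contains only the key:

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Simplified entity for illustration; the real Order has more properties.
public class Order
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)] // the application supplies the value
    public Guid Id { get; set; }
}
```

Which style to use is mostly a matter of taste; the Fluent API keeps the entity class free of persistence attributes.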

What about sequences?

There is a workaround for getting the next value of an auto-generated ID in advance. You can tell the ORM, in our case EF Core, to use a sequence as the default ID generator for an entity.

A sequence is a database object that generates a sequence of numbers (1, 2, 3, …) independent of any table. When you ask it for the next value, it reserves that value so that it cannot be used in another transaction.
The next value can be obtained in code with a raw SQL call (which is not very elegant).

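// Materialize the result before taking the value: composing further LINQ operators
// over the query would make EF Core wrap the statement in a subquery.
var nextId = (await context.Database
    .SqlQueryRaw<int>("SELECT NEXT VALUE FOR MySequence")
    .ToListAsync())
    .Single();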
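The raw SQL call above assumes that the sequence already exists. A minimal sketch of how it could be declared and used as the default ID generator with the Fluent API, assuming the SQL Server provider (the NEXT VALUE FOR syntax is specific to it), a hypothetical OrderDbContext, and a simplified Order entity. It needs a relational EF Core provider package to compile.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical context and simplified entity, for illustration only.
public class OrderDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Declares the sequence so that migrations create it in the database.
        modelBuilder.HasSequence<int>("MySequence");

        // Uses the sequence as the column default when no ID is supplied.
        modelBuilder.Entity<Order>()
            .Property(o => o.Id)
            .HasDefaultValueSql("NEXT VALUE FOR MySequence");
    }
}

public class Order
{
    public int Id { get; set; }
}
```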

Could it solve all our concerns? Not really.

Imagine that you need to add a new service that also produces Orders, for example, a Subscription service. Each service has its own database, and therefore its own sequence. This means there is a very high chance of ending up with Orders that have the same ID in different services.

Yes, this may never happen, but business requirements change rapidly and are not always predictable.

Remember, an entity's identifier should be unique across the whole system.
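A small simulation of that scenario, as a sketch only: two plain counters stand in for the sequences of two independent databases, and the service names are taken from the example above.

```csharp
using System;
using System.Linq;

// Each service owns its own sequence, and both start counting at 1.
var orderServiceIds = Enumerable.Range(1, 3).ToList();        // 1, 2, 3
var subscriptionServiceIds = Enumerable.Range(1, 3).ToList(); // 1, 2, 3

int clashingIntIds = orderServiceIds.Intersect(subscriptionServiceIds).Count();

// With Guids, each service generates its IDs independently as well.
var orderServiceGuids = Enumerable.Range(0, 3).Select(_ => Guid.NewGuid()).ToList();
var subscriptionServiceGuids = Enumerable.Range(0, 3).Select(_ => Guid.NewGuid()).ToList();

int clashingGuids = orderServiceGuids.Intersect(subscriptionServiceGuids).Count();

Console.WriteLine($"Clashing integer IDs: {clashingIntIds}"); // 3
Console.WriteLine($"Clashing Guid IDs: {clashingGuids}");     // 0 in practice
```

Every integer ID exists in both services, so an OrderId alone no longer says which Order is meant.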

Conclusion

Shifting responsibility to the database to manage ID values leads to:

No way to keep identifiers consistent between services or modules;
Concurrency issues when two or more threads try to create a row with the same ID;
Difficulties with synchronizing database data.

These points are crucial for microservice or event-driven architectures and can be addressed by creating IDs manually.

It is worth mentioning that integer IDs are faster to compare and take less disk space. However, databases today often run in the cloud and have significantly more performance headroom than before. For massive tables, smaller indexes may help, but for normal-sized cloud apps, the difference is often negligible. Additionally, there are timestamp-based UUIDs, which can cover natural ordering needs.

Of course, every decision is a trade-off. Manually created IDs are not a silver bullet. There is a chance of collision (although it is nearly zero), and they are less readable and take more space than integer values.
Every project is special and there is no single right way to build it.
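To put rough numbers on the size and collision trade-offs, here is a back-of-the-envelope sketch. It uses the standard birthday-bound approximation for random version 4 UUIDs, which carry 122 random bits; the figure of one billion IDs is an arbitrary sample value.

```csharp
using System;

// Birthday-bound approximation: p ≈ n^2 / (2 * 2^122) for n random version 4 UUIDs.
static double ApproximateCollisionProbability(double n) => n * n / (2 * Math.Pow(2, 122));

Console.WriteLine(sizeof(int));                         // 4 (bytes)
Console.WriteLine(sizeof(long));                        // 8 (bytes)
Console.WriteLine(Guid.NewGuid().ToByteArray().Length); // 16 (bytes)

// One billion generated IDs: the probability is about 9.4E-20.
double probability = ApproximateCollisionProbability(1e9);
Console.WriteLine(probability);
```

So a Guid key is two to four times wider than an integer key, while a collision remains extremely unlikely even at this volume.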

However, for most modern applications, especially those with an emphasis on domain logic, it is a great approach that can save you a lot of time and prevent unexpected errors.

Thank you for reading!

Feel free to share in the comments which approach you prefer and why.