DEV Community

Serif COLAKEL


SAGA Pattern in Go

In this article, we will explore how to coordinate distributed transactions in Go using the Saga pattern.

Coordinating Distributed Transactions Without Distributed Transactions

Modern distributed systems are built from independently deployable services.

That flexibility comes with a cost.

The moment a business operation spans multiple services, a simple database transaction is no longer enough.

Imagine an e-commerce checkout flow:

Order Service
    ↓
Inventory Service
    ↓
Payment Service
    ↓
Shipping Service

What happens if:

  • the order is created successfully
  • inventory is reserved
  • payment fails

You now have partially completed work spread across multiple services.

In a monolith, you would simply roll back the transaction.

In a distributed system, there is no single transaction to roll back.

This is where the Saga Pattern comes in.

Instead of relying on distributed transactions, a Saga coordinates a series of local transactions and compensating actions to maintain business consistency.

We'll look at how Go services can implement Saga workflows, the trade-offs involved, and practical patterns you can use in real microservices.


The Problem With Distributed Transactions

In a monolithic application, business operations are often protected by a single database transaction.

BEGIN;

INSERT INTO orders (...);

UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = ...;

INSERT INTO payments (...);

COMMIT;

Either everything succeeds or everything rolls back.

Life is good.

Microservices change the rules.

Each service owns its own database:

Order Service      -> orders_db
Payment Service    -> payments_db
Inventory Service  -> inventory_db
Shipping Service   -> shipping_db

No single transaction spans all of them.

Some teams attempt:

  • Two-Phase Commit (2PC)
  • XA Transactions
  • Distributed Locks

In theory, they provide all-or-nothing consistency across services.

In practice they introduce:

  • operational complexity
  • reduced availability
  • tight coupling
  • performance bottlenecks

Most modern systems choose a different path:

Accept eventual consistency and design for recovery.


What Is a Saga?

A Saga is a sequence of local transactions.

Each step:

  1. Performs some business action
  2. Commits locally
  3. Triggers the next step

If a later step fails:

  • previously completed steps execute compensating actions

Loosely, you can think of it as a distributed undo mechanism.


Traditional Transaction

BEGIN

Step A
Step B
Step C

COMMIT

Failure:

ROLLBACK

Saga Transaction

Step A ✓
Step B ✓
Step C ✗

Compensate B
Compensate A

Instead of undoing database state through a transaction log, we undo business actions through explicit compensation.


A Real Production Example

Consider an online marketplace.

Checkout workflow:

Create Order
Reserve Inventory
Charge Payment
Create Shipment

Everything looks simple until a dependency fails.

Scenario:

Order Created      ✓
Inventory Reserved ✓
Payment Failed     ✗

Inventory is now locked.

Customers cannot buy those products.

Warehouse reports incorrect stock.

This is a real production issue many teams encounter.

The solution is compensation.


Defining Saga Steps in Go

Let's start with a generic Saga implementation.

type Step struct {
    Name       string
    Execute    func(context.Context) error
    Compensate func(context.Context) error
}

Each step knows:

  • how to execute
  • how to undo itself

Now define the Saga.

type Saga struct {
    steps []Step
}

Executing a Saga

func (s *Saga) Execute(ctx context.Context) error {
    var completed []Step

    for _, step := range s.steps {
        if err := step.Execute(ctx); err != nil {
            s.rollback(ctx, completed)
            return fmt.Errorf(
                "saga failed at step %s: %w",
                step.Name,
                err,
            )
        }

        completed = append(completed, step)
    }

    return nil
}

If a step fails:

  • rollback starts immediately
  • previously completed steps are compensated

Implementing Compensation

func (s *Saga) rollback(
    ctx context.Context,
    completed []Step,
) {
    for i := len(completed) - 1; i >= 0; i-- {
        step := completed[i]

        if err := step.Compensate(ctx); err != nil {
            log.Printf(
                "compensation failed for %s: %v",
                step.Name,
                err,
            )
        }
    }
}

Compensation happens in reverse order.

Just like a stack unwind.


Production Checkout Workflow

Let's model an order process.


Step 1: Create Order

func createOrder(
    ctx context.Context,
    orderID string,
) error {
    log.Printf("order created: %s", orderID)

    return nil
}

Compensation:

func cancelOrder(
    ctx context.Context,
    orderID string,
) error {
    log.Printf("order cancelled: %s", orderID)

    return nil
}

Step 2: Reserve Inventory

func reserveInventory(
    ctx context.Context,
    productID string,
) error {
    log.Printf(
        "inventory reserved: %s",
        productID,
    )

    return nil
}

Compensation:

func releaseInventory(
    ctx context.Context,
    productID string,
) error {
    log.Printf(
        "inventory released: %s",
        productID,
    )

    return nil
}

Step 3: Charge Payment

func chargePayment(
    ctx context.Context,
    orderID string,
) error {
    return errors.New(
        "payment provider unavailable",
    )
}

Compensation:

func refundPayment(
    ctx context.Context,
    orderID string,
) error {
    log.Printf(
        "payment refunded: %s",
        orderID,
    )

    return nil
}

Running the Saga

saga := Saga{
    steps: []Step{
        {
            Name: "Create Order",
            Execute: func(ctx context.Context) error {
                return createOrder(ctx, "order-123")
            },
            Compensate: func(ctx context.Context) error {
                return cancelOrder(ctx, "order-123")
            },
        },
        {
            Name: "Reserve Inventory",
            Execute: func(ctx context.Context) error {
                return reserveInventory(
                    ctx,
                    "product-1",
                )
            },
            Compensate: func(ctx context.Context) error {
                return releaseInventory(
                    ctx,
                    "product-1",
                )
            },
        },
        {
            Name: "Charge Payment",
            Execute: func(ctx context.Context) error {
                return chargePayment(
                    ctx,
                    "order-123",
                )
            },
            Compensate: func(ctx context.Context) error {
                return refundPayment(
                    ctx,
                    "order-123",
                )
            },
        },
    },
}

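err := saga.Execute(context.Background())
if err != nil {
    log.Println(err)
}

Output (log timestamps omitted):

order created: order-123
inventory reserved: product-1
inventory released: product-1
order cancelled: order-123
saga failed at step Charge Payment: payment provider unavailable

Compensation runs inside Execute, so the error is logged only after both compensations finish. refundPayment never runs: the payment step failed, so it never entered the completed list and there is nothing to refund.

Business consistency restored.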


Compensation Is Not Rollback

This is one of the biggest misconceptions.

Many engineers assume:

Compensation == Rollback

It isn't.

Consider payment processing.

You cannot magically undo:

Bank Transfer
Credit Card Charge
Email Sent
SMS Delivered

You can only perform another business action.

Examples:

Charge Card
↓
Refund Card

Create Shipment
↓
Cancel Shipment

A refund does not erase the original charge; both remain in the payment history.

Compensation is business logic.
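To make the distinction concrete, here is a minimal in-memory sketch in which a refund is recorded as a new ledger entry instead of deleting the original charge. The Ledger type, its methods, and the entry kinds are hypothetical stand-ins for a real payment provider; the point is that the balance returns to zero while the history keeps both business actions.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is one immutable fact in a hypothetical payment ledger.
type Entry struct {
	OrderID string
	Kind    string // "charge" or "refund"
	Cents   int64
}

// Ledger only ever appends; compensation never deletes history.
type Ledger struct {
	entries []Entry
}

func (l *Ledger) Charge(orderID string, cents int64) {
	l.entries = append(l.entries, Entry{OrderID: orderID, Kind: "charge", Cents: cents})
}

// Balance is the net amount currently held for an order.
func (l *Ledger) Balance(orderID string) int64 {
	var net int64
	for _, e := range l.entries {
		if e.OrderID != orderID {
			continue
		}
		if e.Kind == "charge" {
			net += e.Cents
		} else {
			net -= e.Cents
		}
	}
	return net
}

// Refund compensates a charge by appending a new, opposite entry for
// whatever is still outstanding. Repeating it is harmless.
func (l *Ledger) Refund(orderID string) error {
	outstanding := l.Balance(orderID)
	if outstanding == 0 {
		for _, e := range l.entries {
			if e.OrderID == orderID {
				return nil // already fully refunded
			}
		}
		return errors.New("no charge found for " + orderID)
	}
	l.entries = append(l.entries, Entry{OrderID: orderID, Kind: "refund", Cents: outstanding})
	return nil
}

func main() {
	var l Ledger
	l.Charge("order-123", 4999)

	if err := l.Refund("order-123"); err != nil {
		fmt.Println(err)
	}

	// The money is back, but both business actions remain on record.
	fmt.Println(l.Balance("order-123"), len(l.entries)) // prints 0 2
}
```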


Choreography vs Orchestration

Two common Saga styles exist.


Choreography

Services communicate through events.

OrderCreated
      ↓
InventoryReserved
      ↓
PaymentProcessed
      ↓
ShipmentCreated

Each service reacts independently.

Advantages:

  • loosely coupled
  • scalable
  • no central coordinator

Disadvantages:

  • difficult debugging
  • event explosion
  • hidden dependencies

Large systems often struggle with visibility.


Orchestration

A dedicated coordinator controls the flow.

Saga Orchestrator
       ↓
Inventory
       ↓
Payment
       ↓
Shipping

Advantages:

  • easier monitoring
  • centralized workflow
  • simpler debugging

Disadvantages:

  • additional component
  • orchestration logic grows over time

Many enterprise systems prefer orchestration because operational visibility matters.


Handling Retries Properly

Distributed systems fail.

Compensation can fail too.

Consider:

Payment Failed
↓
Release Inventory
↓
Inventory Service Down

Now the compensation itself has failed.

Production systems usually implement:

  • retries
  • dead-letter queues
  • manual recovery workflows

Example:

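func retry(
    ctx context.Context,
    attempts int,
    fn func() error,
) error {
    var lastErr error

    for i := 0; i < attempts; i++ {
        if lastErr = fn(); lastErr == nil {
            return nil
        }

        // Don't sleep after the final attempt.
        if i == attempts-1 {
            break
        }

        // Linear backoff (1s, 2s, 3s, ...), cut short if ctx is cancelled.
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(time.Duration(i+1) * time.Second):
        }
    }

    return fmt.Errorf("retry attempts exhausted: %w", lastErr)
}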

Never assume compensation always succeeds.


Saga + Outbox Pattern

This is where things become interesting.

Most production systems combine:

Saga
+
Outbox Pattern

Why?

Because Saga introduces events:

OrderCreated
InventoryReserved
PaymentCompleted

Those events must be delivered reliably.

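The Outbox Pattern writes each event to an outbox table in the same local transaction as the business change, and a separate relay publishes it afterwards. This provides:

  • atomic persistence: the event is committed if and only if the business change is
  • no event loss: if the broker is down, the event waits in the outbox
  • safe retries: the relay republishes until delivery succeeds, so delivery is at-least-once and consumers must be idempotent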

This combination is extremely common in modern microservices.


Idempotency Is Mandatory

Compensation may execute twice.

Retries may happen.

Network failures may duplicate requests.

Your operations must tolerate duplication.

Bad (every duplicate delivery changes stock again):

inventory -= 10

Good (check whether the work was already done before doing it):

if reservationAlreadyReleased {
    return nil
}

Idempotency is not optional.

It is foundational to Saga reliability.
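The guard above can be made concrete by keying both the action and its compensation on a reservation ID and recording its state. A minimal in-memory sketch; the Inventory type, reservation IDs, and method names are hypothetical. Repeating Reserve or Release for the same reservation changes stock only once.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type reservation struct {
	productID string
	qty       int
	released  bool
}

// Inventory tracks stock plus every reservation by ID, which is what
// makes duplicate requests detectable.
type Inventory struct {
	mu           sync.Mutex
	available    map[string]int
	reservations map[string]*reservation
}

func NewInventory(stock map[string]int) *Inventory {
	return &Inventory{available: stock, reservations: map[string]*reservation{}}
}

// Reserve is idempotent: repeating the same reservation ID is a no-op.
func (inv *Inventory) Reserve(resID, productID string, qty int) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if _, exists := inv.reservations[resID]; exists {
		return nil
	}
	if inv.available[productID] < qty {
		return errors.New("insufficient stock for " + productID)
	}
	inv.available[productID] -= qty
	inv.reservations[resID] = &reservation{productID: productID, qty: qty}
	return nil
}

// Release is the compensation. Duplicates and unknown IDs are no-ops, so it
// is safe to retry. It returns error to fit the Step.Compensate signature.
func (inv *Inventory) Release(resID string) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	r, ok := inv.reservations[resID]
	if !ok || r.released {
		return nil
	}
	inv.available[r.productID] += r.qty
	r.released = true
	return nil
}

func main() {
	inv := NewInventory(map[string]int{"product-1": 10})
	_ = inv.Reserve("res-1", "product-1", 3)
	_ = inv.Release("res-1")
	_ = inv.Release("res-1") // duplicate delivery of the compensation
	fmt.Println(inv.available["product-1"]) // prints 10
}
```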


Observability Matters

Track:

  • saga started
  • saga completed
  • saga compensated
  • compensation failures
  • execution duration
  • retry count

Useful metrics:

saga_execution_total

saga_compensation_total

saga_failure_total

saga_duration_seconds

If you cannot observe Saga behavior, you will eventually debug failures through database queries at 3 AM.


A Production Incident

A payment provider began timing out during a Black Friday campaign.

Order creation succeeded.

Inventory reservations succeeded.

Payment confirmations never arrived.

Without compensation:

50,000 products locked

Customers could not purchase inventory that physically existed.

The warehouse team believed stock was depleted.

After implementing Saga compensation:

Payment Timeout
↓
Inventory Released
↓
Order Cancelled

The system recovered automatically.

No manual intervention required.

This is exactly the type of failure Saga patterns are designed to handle.


Key Takeaways

  • Distributed transactions rarely scale well in microservices.
  • Saga patterns embrace eventual consistency rather than fighting it.
  • Compensation is business logic, not database rollback.
  • Retries and idempotency are mandatory.
  • Most production systems combine Saga and Outbox patterns.
  • Observability is critical for debugging distributed workflows.

Microservices make distributed failures inevitable.

Saga patterns don't eliminate those failures.

They make them survivable.

And in production systems, survivability is often more important than perfection.


Happy Coding 🚀
