Serif COLAKEL

Posted on Jun 13

🧠 Exactly-Once Processing in Go: The Myth, Reality, and Production Patterns

#go #productivity #software #backend

Duplicate processing is not a bug.

It is the default behavior of reliable distributed systems.

Every distributed system eventually faces the same uncomfortable truth:

The moment you introduce:

Retries
Failovers
Message brokers
Network partitions
Service crashes

duplicates become inevitable.

Yet many engineers still believe that "exactly-once processing" is something a broker can magically provide.

It isn't.

In this article we'll explore:

Why exactly-once delivery is mostly a myth
Why at-least-once is the real-world standard
How idempotency actually works
How to build Inbox and Outbox patterns in Go
How Kafka, PostgreSQL, and Redis fit together in production

🧨 Exactly-Once Is Not a Transport Property

A message broker cannot guarantee exactly-once processing across your entire system.

Exactly-once delivery requires a Single Point of Truth (SPoT) capable of enforcing uniqueness.

Why?

Because delivery, processing, and persistence are separate failure domains.

Even if a broker behaves perfectly:

Your service can crash after processing
Your database can commit while ACK fails
Your consumer can retry unknowingly

This creates unavoidable duplication scenarios.

📊 Delivery Semantics in Real Systems

| Model | Meaning | Reality |
| --- | --- | --- |
| At-most-once | Message may be lost | Common in fire-and-forget systems |
| At-least-once | Message may be duplicated | Kafka, SQS, RabbitMQ |
| Exactly-once | Message processed once | Only possible inside bounded systems |

👉 In practice, any system that retries to avoid losing messages is at-least-once.

💥 Why Duplicates Happen

Consumer Crash After Processing

```go
func handle(msg Message) error {
	if err := process(msg); err != nil {
		return err
	}

	return ack(msg)
}
```

Failure window:

```text
Process Message   ✅
Commit Database   ✅
Crash Service     ❌
ACK Never Sent    ❌
```

Result:

```text
Broker thinks processing failed
        ↓
Message redelivered
        ↓
Duplicate processing
```

Network Timeout After Success

```go
err := process(msg)

if err != nil {
	return err
}

return broker.Ack(msg)
```

What happens if:

```text
ACK sent
        ↓
Network timeout
        ↓
Broker never receives ACK
```

The broker retries.

The operation runs again.
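The lost-ACK case is easy to reproduce without a real broker. The sketch below is a toy model, not any broker's API: `fakeBroker` and `deliver` are hypothetical names, and the only behavior assumed is "redeliver until an ACK is recorded".

```go
package main

import (
	"errors"
	"fmt"
)

// fakeBroker is a hypothetical in-memory stand-in for a real broker:
// it redelivers a message until it has recorded an ACK for it.
type fakeBroker struct {
	acked    bool
	dropAcks int // number of ACKs to "lose" in transit
}

var errAckLost = errors.New("ack lost in transit")

// Ack simulates a network failure for the first dropAcks calls.
func (b *fakeBroker) Ack() error {
	if b.dropAcks > 0 {
		b.dropAcks--
		return errAckLost
	}
	b.acked = true
	return nil
}

// deliver keeps redelivering until the broker sees an ACK and
// returns how many times the handler's side effect ran.
func deliver(b *fakeBroker, process func()) int {
	runs := 0
	for !b.acked {
		process() // the side effect happens on every delivery
		runs++
		_ = b.Ack() // a lost ACK leaves the message unacknowledged
	}
	return runs
}

func main() {
	charges := 0
	b := &fakeBroker{dropAcks: 1}
	runs := deliver(b, func() { charges++ })
	fmt.Println("handler runs:", runs, "charges:", charges)
}
```

One lost ACK is enough for the side effect to run twice, even though the handler itself never returned an error.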

Retry Storms

A temporary latency spike can trigger:

```text
Client Retry
        ↓
Gateway Retry
        ↓
Service Retry
        ↓
Consumer Retry
```

Result:

```text
1 failure
        ↓
50 duplicate executions
```
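Why does one failure fan out so far? Retries at each layer multiply the retries of the layer below. A rough worst-case estimate, with illustrative attempt counts that are assumptions rather than measurements (`worstCaseExecutions` is a hypothetical helper):

```go
package main

import "fmt"

// worstCaseExecutions returns how many times the innermost operation can
// run when every layer independently makes up to attemptsPerLayer[i]
// attempts. Retries at each layer multiply those of the layer below.
func worstCaseExecutions(attemptsPerLayer []int) int {
	total := 1
	for _, a := range attemptsPerLayer {
		total *= a
	}
	return total
}

func main() {
	// Client, gateway, service, consumer: 3 attempts each (illustrative).
	fmt.Println(worstCaseExecutions([]int{3, 3, 3, 3})) // 81
}
```

Four layers with a modest three attempts each already allow 81 executions of the same operation.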

🧠 The Real Solution: Idempotency

Instead of preventing duplicates:

```text
Wrong Question:
How do I prevent duplicates?
```

Ask:

```text
Correct Question:
How do I make duplicates harmless?
```

That's where idempotency comes in.

An idempotent operation produces the same final state no matter how many times it executes.
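A small contrast makes the definition concrete. `Account`, `Deposit`, and `ApplyDeposit` are hypothetical names invented for this sketch, and the in-memory map only stands in for whatever durable store remembers applied events.

```go
package main

import "fmt"

// Account is a hypothetical type used only for this illustration.
type Account struct {
	Balance int
	applied map[string]bool // event IDs already applied
}

// Deposit is NOT idempotent: every call changes the balance again.
func (a *Account) Deposit(amount int) { a.Balance += amount }

// ApplyDeposit is idempotent: repeating a call with the same eventID
// leaves the balance unchanged after the first application.
func (a *Account) ApplyDeposit(eventID string, amount int) {
	if a.applied == nil {
		a.applied = make(map[string]bool)
	}
	if a.applied[eventID] {
		return
	}
	a.applied[eventID] = true
	a.Balance += amount
}

func main() {
	naive, safe := &Account{}, &Account{}
	for i := 0; i < 3; i++ { // the same event delivered three times
		naive.Deposit(100)
		safe.ApplyDeposit("evt-1", 100)
	}
	fmt.Println(naive.Balance, safe.Balance) // 300 100
}
```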

🚨 Naive Payment Service

Let's start with a broken implementation.

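```go
// Charge records a payment. It is NOT safe to retry: assuming payments.id
// has no unique constraint (or req.ID is regenerated per attempt),
// nothing stops a retried request from inserting a second row.
func Charge(
	ctx context.Context,
	req PaymentRequest,
) error {
	_, err := db.Exec(ctx, `
		INSERT INTO payments (
			id,
			amount
		)
		VALUES ($1, $2)
	`,
		req.ID,
		req.Amount,
	)

	return err
}
```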

Now imagine:

```text
Request arrives
        ↓
Payment inserted
        ↓
Response lost
        ↓
Client retries
        ↓
Payment inserted again
```

Customer charged twice.

✅ Production Solution #1: Database Unique Constraints

Create an Inbox table.

```sql
CREATE TABLE idempotency_keys (
    event_id   TEXT PRIMARY KEY,
    created_at TIMESTAMP DEFAULT now()
);
```

Attempt to claim the key first.

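```go
_, err := tx.Exec(ctx, `
	INSERT INTO idempotency_keys(event_id)
	VALUES($1)
`, eventID)

if err != nil {
	var pgErr *pgconn.PgError

	// 23505 is PostgreSQL's unique_violation: the key was already claimed.
	// The failed INSERT aborts tx, so the caller must roll it back.
	if errors.As(err, &pgErr) &&
		pgErr.Code == "23505" {
		return nil
	}

	return err
}
```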

The database becomes the source of truth.
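A variation on the same idea, not the snippet above: `ON CONFLICT DO NOTHING` turns a duplicate into "zero rows affected" instead of an error, which keeps the surrounding transaction usable. The sketch swaps pgx for the standard `database/sql` interface so it runs without a driver; `execer`, `claimKey`, `fakeTx`, and `demoClaims` are hypothetical names, and `fakeTx` only imitates the primary-key behavior.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx (and *sql.DB) this sketch needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// claimKey reports whether this call was the first to claim eventID.
// A duplicate shows up as zero affected rows rather than an error.
func claimKey(ctx context.Context, tx execer, eventID string) (bool, error) {
	res, err := tx.ExecContext(ctx,
		`INSERT INTO idempotency_keys (event_id) VALUES ($1)
		 ON CONFLICT (event_id) DO NOTHING`, eventID)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}

// fakeTx is an in-memory stand-in so the sketch runs without a database.
type fakeTx struct{ seen map[string]bool }

// rows implements sql.Result with a fixed affected-row count.
type rows int64

func (r rows) LastInsertId() (int64, error) { return 0, nil }
func (r rows) RowsAffected() (int64, error) { return int64(r), nil }

func (f *fakeTx) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	id := args[0].(string)
	if f.seen[id] {
		return rows(0), nil
	}
	f.seen[id] = true
	return rows(1), nil
}

// demoClaims claims the same key twice against the fake.
func demoClaims() (first, second bool) {
	tx := &fakeTx{seen: map[string]bool{}}
	first, _ = claimKey(context.Background(), tx, "evt-1")
	second, _ = claimKey(context.Background(), tx, "evt-1")
	return first, second
}

func main() {
	fmt.Println(demoClaims()) // true false
}
```

When `claimKey` returns false, the handler can skip the business logic and commit, since the event was already processed.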

⚠️ Common pgx Pitfall

This subtle bug catches many Go teams.

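Wrong (with pgx v5):

```go
import "github.com/jackc/pgconn"
```

Correct:

```go
import "github.com/jackc/pgx/v5/pgconn"
```

Otherwise:

```go
errors.As(err, &pgErr)
```

compiles fine but always returns false, because the driver returns a different PgError type than the one you are matching against.

Your duplicate protection stops working.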

🧪 Testing Duplicate Safety

Let's simulate 100 concurrent requests.

```go
func TestIdempotency(
	t *testing.T,
) {
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)

		go func() {
			defer wg.Done()

			processOrder(
				context.Background(),
				"same-event-id",
			)
		}()
	}

	wg.Wait()
}
```

Expected outcome:

```text
100 requests
        ↓
1 insert succeeds
        ↓
99 rejected safely
```
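A test like this is only useful if it asserts the outcome. Here is one possible shape for that assertion, run against an in-memory stand-in rather than PostgreSQL; `keyStore` and `raceClaims` are hypothetical names, and a real test would count rows in the idempotency table instead.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// keyStore is a hypothetical in-memory stand-in for the idempotency_keys table.
type keyStore struct{ m sync.Map }

// claim returns true only for the first caller of a given event ID.
func (s *keyStore) claim(eventID string) bool {
	_, loaded := s.m.LoadOrStore(eventID, struct{}{})
	return !loaded
}

// raceClaims fires n concurrent claims for the same ID and counts winners.
func raceClaims(n int) int64 {
	var (
		store   keyStore
		wg      sync.WaitGroup
		winners atomic.Int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if store.claim("same-event-id") {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return winners.Load()
}

func main() {
	fmt.Println("winners:", raceClaims(100)) // winners: 1
}
```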

⚡ Redis Optimization

Postgres guarantees correctness.

Redis improves performance.

Use Redis only as:

```text
Fast Lock Layer
```

Not as:

```text
Source of Truth
```

Atomic Lua Guard:

```lua
local key = KEYS[1]

if redis.call("GET", key) then
    return 1
end

redis.call(
    "SET",
    key,
    "PENDING",
    "EX",
    60
)

return 0
```

This protects the database from thundering herds.

📦 Kafka Exactly-Once: What It Actually Means

Many engineers believe:

```text
Kafka Exactly Once
=
Business Logic Executes Once
```

Wrong.

Kafka's exactly-once semantics (idempotent producers and transactions) cover only what stays inside Kafka:

```text
Producer
        ↓
Kafka
        ↓
Consumer
```

Kafka does NOT guarantee:

```text
Producer
        ↓
Kafka
        ↓
Consumer
        ↓
Postgres
```

The moment you touch an external database:

exactly-once disappears.

⚠️ Disable Auto Commit

Never rely on Kafka auto commits.

Bad:

```text
enable.auto.commit=true
```

Good:

```go
consumer, err := kafka.NewConsumer(
	&kafka.ConfigMap{
		// bootstrap.servers, group.id, and other settings omitted.
		"enable.auto.commit": false,
	},
)
if err != nil {
	return err
}
```

Process first.

Commit later.

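```go
err := handleOrder(
	ctx,
	db,
	msg,
)

// Commit the offset only after the handler succeeded,
// and do not ignore a failed commit.
if err == nil {
	_, err = consumer.CommitMessage(msg)
}
```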
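The order of "process" and "commit" decides which failure you get. The toy simulation below uses no Kafka client at all; `replay` is a hypothetical function that models a single crash and a restart from the last committed offset.

```go
package main

import "fmt"

// replay simulates a consumer reading a log of n messages that crashes once
// while handling message crashAt, restarts from the last committed offset,
// and then runs to completion. It returns how often each message was processed.
func replay(n, crashAt int, commitFirst bool) []int {
	processed := make([]int, n)
	committed := 0 // next offset to read after a restart
	crashed := false
	for committed < n {
		off := committed
		if commitFirst {
			committed = off + 1 // offset stored before the work is done
		}
		if off == crashAt && !crashed {
			crashed = true
			if !commitFirst {
				processed[off]++ // work finished, crash hits before the commit
			}
			continue // restart from the committed offset
		}
		processed[off]++
		if !commitFirst {
			committed = off + 1
		}
	}
	return processed
}

func main() {
	fmt.Println(replay(3, 1, true))  // [1 0 1]: message 1 is lost
	fmt.Println(replay(3, 1, false)) // [1 2 1]: message 1 is duplicated
}
```

Committing first trades duplicates for loss. Processing first trades loss for duplicates, which idempotency can absorb.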

📤 Solving the Dual-Write Problem

This architecture is broken:

```text
Insert Order
        ↓
Publish Event
```

What if Kafka is down?

```text
Order exists
        ↓
Event lost forever
```

That's the Dual Write Problem.

🚀 Transactional Outbox Pattern

Store event publishing intent in the same transaction.

```sql
CREATE TABLE outbox (
    id      UUID PRIMARY KEY,
    payload JSONB,
    status  TEXT DEFAULT 'PENDING'
);
```

```go
_, err = tx.Exec(ctx, `
	INSERT INTO orders(...)
	VALUES(...)
`)
if err != nil {
	return err
}

_, err = tx.Exec(ctx, `
	INSERT INTO outbox(...)
	VALUES(...)
`)
if err != nil {
	return err
}
```

Commit both together.

🔄 Outbox Worker

```go
type OutboxWorker struct {
	db     *pgxpool.Pool
	broker Broker
}

func (w *OutboxWorker) Run(
	ctx context.Context,
) {
	ticker := time.NewTicker(
		time.Second,
	)

	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return

		case <-ticker.C:
			w.processBatch(ctx)
		}
	}
}
```
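The worker calls `processBatch` without showing it. One possible shape is sketched below with in-memory fakes in place of PostgreSQL and the broker. `Event`, `Store`, `memStore`, `flakyBroker`, and `runTwice` are hypothetical, the `Broker` interface here is an assumed shape, and the function takes its dependencies as arguments instead of being a method. The one rule it illustrates: mark a row sent only after the broker accepted it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Event, Store and Broker are hypothetical shapes for this sketch.
type Event struct {
	ID      string
	Payload []byte
}

type Store interface {
	FetchPending(ctx context.Context, limit int) ([]Event, error)
	MarkSent(ctx context.Context, id string) error
}

type Broker interface {
	Publish(ctx context.Context, e Event) error
}

// processBatch publishes pending events and marks each one sent only after
// the broker accepted it. A failure leaves the row pending for the next tick,
// so publishing is at-least-once and consumers must still deduplicate.
func processBatch(ctx context.Context, s Store, b Broker) (int, error) {
	sent := 0
	events, err := s.FetchPending(ctx, 10)
	if err != nil {
		return 0, err
	}
	for _, e := range events {
		if err := b.Publish(ctx, e); err != nil {
			return sent, fmt.Errorf("publish %s: %w", e.ID, err)
		}
		if err := s.MarkSent(ctx, e.ID); err != nil {
			return sent, fmt.Errorf("mark sent %s: %w", e.ID, err)
		}
		sent++
	}
	return sent, nil
}

// In-memory fakes so the sketch runs without PostgreSQL or Kafka.
type memStore struct{ pending []Event }

func (m *memStore) FetchPending(_ context.Context, limit int) ([]Event, error) {
	if len(m.pending) < limit {
		limit = len(m.pending)
	}
	return append([]Event(nil), m.pending[:limit]...), nil
}

func (m *memStore) MarkSent(_ context.Context, id string) error {
	for i, e := range m.pending {
		if e.ID == id {
			m.pending = append(m.pending[:i], m.pending[i+1:]...)
			return nil
		}
	}
	return errors.New("unknown event " + id)
}

type flakyBroker struct{ failOn string }

func (f *flakyBroker) Publish(_ context.Context, e Event) error {
	if e.ID == f.failOn {
		f.failOn = "" // fail once, succeed on the retry
		return errors.New("broker unavailable")
	}
	return nil
}

// runTwice runs two ticks against a broker that fails once on event "b".
func runTwice() (first, second, left int) {
	s := &memStore{pending: []Event{{ID: "a"}, {ID: "b"}}}
	b := &flakyBroker{failOn: "b"}
	first, _ = processBatch(context.Background(), s, b)
	second, _ = processBatch(context.Background(), s, b)
	return first, second, len(s.pending)
}

func main() {
	fmt.Println(runTwice()) // 1 1 0
}
```

If the process dies between `Publish` and `MarkSent`, the event goes out again on the next tick. That is why the consuming side still needs the Inbox check.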

⚡ Scaling Workers Safely

Use PostgreSQL native locking.

```sql
SELECT *
FROM outbox
WHERE status = 'PENDING'
LIMIT 10
FOR UPDATE SKIP LOCKED;
```

Benefits:

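Workers skip rows that are already locked instead of waiting on them
No two workers pick up the same row at the same time
Workers can be added without extra coordination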

💳 The Hardest Part: External Side Effects

Database writes are easy.

External systems are not.

Examples:

Stripe
Twilio
SendGrid
Payment Gateways

Without idempotency:

```text
Duplicate Message
        ↓
Duplicate Payment
        ↓
Real Money Lost
```

Always verify external APIs support idempotency keys.
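Where a provider does accept idempotency keys (Stripe, for one, reads an `Idempotency-Key` header), derive the key from the event ID so that a redelivered event reuses it. A minimal sketch with the standard library; the URL, the `charge-` prefix, and the helper names `newChargeRequest` and `keyFor` are placeholders, not any provider's API.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// newChargeRequest builds a POST whose Idempotency-Key is derived from the
// event ID, so a redelivered event reuses the same key. The URL and the key
// format are placeholders; check your provider's documentation for the
// header name and key lifetime it supports.
func newChargeRequest(ctx context.Context, url, eventID, body string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Idempotency-Key", "charge-"+eventID)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// keyFor is a small helper used by the demo below.
func keyFor(eventID string) string {
	req, err := newChargeRequest(context.Background(),
		"https://payments.example.com/v1/charges", eventID, "amount=100")
	if err != nil {
		return ""
	}
	return req.Header.Get("Idempotency-Key")
}

func main() {
	fmt.Println(keyFor("evt-42") == keyFor("evt-42")) // true
}
```

A key generated fresh per attempt (a random UUID inside the handler, for example) defeats the purpose, because the retry would look like a new request.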

🛡️ Graceful Shutdown

Kubernetes gives you roughly 30 seconds (the default terminationGracePeriodSeconds) between SIGTERM and SIGKILL.

Handle shutdown properly.

```go
ctx, stop := signal.NotifyContext(
	context.Background(),
	syscall.SIGTERM,
)

defer stop()
```

Allow in-flight transactions to finish.
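What "allow in-flight work to finish" can look like, as a sketch: workers check the context only between jobs, and shutdown waits for all of them. `runWorkers` and `drainDemo` are hypothetical names, and a plain `cancel()` stands in for the SIGTERM-driven context.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkers starts n workers that handle jobs until ctx is cancelled.
// A worker checks ctx only between jobs, so a job that has started always
// finishes. It returns after every worker has exited.
func runWorkers(ctx context.Context, n int, job func()) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				job() // never interrupted mid-flight
			}
		}()
	}
	wg.Wait()
}

// drainDemo cancels the context while jobs are running and reports how many
// jobs started and how many finished.
func drainDemo() (started, finished int64) {
	var s, f atomic.Int64
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(20 * time.Millisecond)
		cancel() // stands in for SIGTERM
	}()
	runWorkers(ctx, 4, func() {
		s.Add(1)
		time.Sleep(5 * time.Millisecond) // simulated transaction
		f.Add(1)
	})
	return s.Load(), f.Load()
}

func main() {
	started, finished := drainDemo()
	fmt.Println(started == finished) // true: no job was cut off
}
```

In a real service the wait should also be bounded by a deadline shorter than the grace period, so a stuck job cannot hold the process until SIGKILL.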

📈 Structured Observability

Track duplicates explicitly.

```go
slog.Info(
	"duplicate detected",
	"event_id", eventID,
)
```

If you can't measure duplicates:

you can't prove idempotency works.
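A log line helps with debugging, but a counter is what you can graph and alert on. A dependency-free sketch: `duplicates`, `recordDuplicate`, and `handle` are hypothetical names, the map stands in for the Inbox table, and in production the counter would more likely live in a metrics library.

```go
package main

import (
	"fmt"
	"log/slog"
	"sync/atomic"
)

// duplicates counts skipped redeliveries. A plain atomic keeps the sketch
// dependency-free; a metrics counter would normally take its place.
var duplicates atomic.Int64

// recordDuplicate logs the event and bumps the counter.
func recordDuplicate(eventID string) {
	duplicates.Add(1)
	slog.Info("duplicate detected", "event_id", eventID)
}

// handle processes an event once; seen stands in for the Inbox table.
func handle(seen map[string]bool, eventID string) {
	if seen[eventID] {
		recordDuplicate(eventID)
		return
	}
	seen[eventID] = true
}

func main() {
	seen := map[string]bool{}
	for _, id := range []string{"evt-1", "evt-2", "evt-1", "evt-1"} {
		handle(seen, id)
	}
	fmt.Println("duplicates:", duplicates.Load()) // duplicates: 2
}
```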

🏗️ Final Production Architecture

```text
Kafka
  │
  ▼
Inbox Pattern (Idempotency)
  │
  ▼
PostgreSQL
  │
  ▼
Outbox Pattern
  │
  ▼
Kafka / RabbitMQ
```

This architecture embraces reality:

```text
At-Least-Once Delivery
+
Idempotency
+
Recovery
=
Production Reliability
```

🚀 Key Takeaways

Exactly-once delivery does not exist across distributed systems.
At-least-once delivery is the industry standard.
Idempotency transforms duplicates into harmless events.
PostgreSQL UNIQUE constraints should be the source of truth.
Redis is an optimization layer, not a consistency layer.
Kafka auto commits can cause data loss.
Inbox + Outbox patterns form the backbone of reliable event-driven systems.
Reliability is not about preventing failures.
Reliability is about remaining correct despite failures.

DEV Community