How We Built a Trust Registry in Go

Nnaa

Posted on • Originally published at truthlocks.com

The trust registry is the most critical service in the Truthlocks platform. Every verification request, every agent identity check, every trust score query, every scope validation passes through it. If the trust registry is slow, everything is slow. If it is down, everything is down. If it has a bug, every verification decision is potentially wrong.

We built it in Go. This post explains why, and walks through the engineering patterns that let us serve verification requests with P99 latency under 40 milliseconds while maintaining strict multi-tenant isolation and a complete audit trail.

Why Go

We evaluated Go, Rust, Java, and TypeScript for the trust registry. We chose Go for three reasons.

First, Go's concurrency model matches our workload perfectly. The trust registry handles thousands of concurrent verification requests, each of which involves multiple I/O operations: database queries, cache lookups, and occasionally external service calls. Goroutines let us handle this concurrency without the thread pool tuning that Java requires or the async complexity that Rust and TypeScript introduce.

Second, Go produces small, statically linked binaries that start in milliseconds. Our Docker images are under 30 MB. This matters for our deployment model (AWS ECS Fargate) where container startup time directly affects scaling responsiveness and cost.

Third, Go's standard library is excellent for the kind of service we are building. The net/http server, the database/sql interface, the encoding/json package, the crypto packages for Ed25519 operations: these are production quality, well tested, and performant. We added relatively few external dependencies.

Service Architecture

The trust registry follows a clean layered architecture. The HTTP handler layer parses requests, validates input, and delegates to the service layer. The service layer contains business logic and orchestrates calls to the repository layer. The repository layer handles all database interactions through a PostgreSQL connection pool.

Each layer has a defined interface. Handlers depend on service interfaces. Services depend on repository interfaces. The concrete implementations are wired together at startup through dependency injection. This is not a framework. It is plain Go interfaces and struct composition. No magic, no reflection, no code generation.

Multi-Tenant Isolation

Every request to the trust registry carries a tenant context. The API gateway extracts the tenant identifier from the authentication token and passes it as an X-Tenant-ID header. The first thing the handler does is extract this header and propagate the tenant context through the entire request lifecycle.

At the database level, every table has a tenant_id column, and PostgreSQL row-level security (RLS) policies enforce that queries can only access rows belonging to the current tenant. We set the tenant context on the transaction before executing any query:

// withTenantContext runs fn inside a transaction with the tenant set for
// the RLS policies. PostgreSQL's SET statement cannot take bind parameters,
// so we use set_config with is_local = true, which scopes the setting to
// the current transaction (equivalent to SET LOCAL).
func (r *repo) withTenantContext(ctx context.Context, tenantID uuid.UUID, fn func(tx pgx.Tx) error) error {
    return r.pool.BeginFunc(ctx, func(tx pgx.Tx) error {
        _, err := tx.Exec(ctx, "SELECT set_config('app.current_tenant', $1, true)", tenantID.String())
        if err != nil {
            return err
        }
        return fn(tx)
    })
}


Because the setting is transaction-local, the tenant context is automatically cleared when the transaction completes, preventing tenant context leakage between requests that share the same connection.

The Verification Hot Path

The verification endpoint is the most latency-sensitive path in the system. When a verifier submits an attestation for verification, the service needs to:

1. Resolve the issuer's identity from the trust registry.
2. Validate the issuer's trust level and active status.
3. Retrieve the signing key and verify the cryptographic signature.
4. Check the revocation registry.
5. Verify transparency log inclusion.

We execute as many of these steps concurrently as possible. The issuer lookup and revocation check are independent and run in parallel goroutines. The signature verification starts as soon as the key is available. The transparency log check runs concurrently with the revocation check.

func (s *verificationService) Verify(ctx context.Context, req VerifyRequest) (*VerifyResult, error) {
    // The issuer lookup and the revocation check are independent, so they
    // run in parallel. errgroup cancels the shared context if either fails.
    g, ctx := errgroup.WithContext(ctx)

    var issuer *Issuer
    var revoked bool

    g.Go(func() error {
        var err error
        issuer, err = s.registry.GetIssuer(ctx, req.IssuerDID)
        return err
    })

    g.Go(func() error {
        var err error
        revoked, err = s.revocation.IsRevoked(ctx, req.AttestationID)
        return err
    })

    if err := g.Wait(); err != nil {
        return nil, err
    }

    // Signature verification needs the issuer's key, so it starts as soon
    // as the lookup completes. (The transparency log check follows the same
    // concurrent pattern; it is omitted here for brevity.)
    valid := s.crypto.VerifyEd25519(issuer.PublicKey, req.Payload, req.Signature)

    return &VerifyResult{
        Valid:      valid && !revoked,
        TrustLevel: issuer.TrustLevel,
    }, nil
}


The concurrent execution pattern shaves roughly 60% off the total latency compared to sequential execution.

Caching Strategy

The trust registry uses a two-tier caching strategy. An in-process LRU cache holds recently accessed issuer records and key material. A connection pool to PostgreSQL with prepared statements handles cache misses. We do not use an external cache like Redis because the added network hop would negate the latency benefit for our access patterns.

Cache invalidation is straightforward because the trust registry uses event sourcing. When an issuer's state changes (trust level update, key rotation, revocation), the event handler invalidates the corresponding cache entries. The next request fetches the current state from the database.
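A minimal sketch of the in-process tier, built on `container/list` from the standard library, with an `Invalidate` hook for the event handler to call. The production cache also tracks key material and TTLs; this illustrates only the LRU-plus-invalidation shape:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// lruCache is a bounded map with least-recently-used eviction.
type lruCache struct {
	mu    sync.Mutex
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding *entry
}

type entry struct {
	key string
	val any
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) (any, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as recently used
	return el.Value.(*entry).val, true
}

func (c *lruCache) Put(key string, val any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, val})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Invalidate is what the event handler calls on trust level updates,
// key rotations, and revocations.
func (c *lruCache) Invalidate(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.Remove(el)
		delete(c.items, key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Put("did:ex:a", "high")
	cache.Put("did:ex:b", "medium")
	cache.Put("did:ex:c", "low") // capacity 2: evicts did:ex:a
	_, ok := cache.Get("did:ex:a")
	fmt.Println(ok) // false
}
```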

Graceful Shutdown and Health Checks

In a containerized environment, graceful shutdown is critical. When ECS sends a SIGTERM, the service stops accepting new requests, waits for in-flight requests to complete (with a 30-second timeout), closes database connections cleanly, and then exits.

Health checks run at two levels. The liveness check confirms the process is running and responsive. The readiness check confirms that the database connection pool is healthy and the service can serve requests. ECS uses the readiness check to determine whether to route traffic to the container.

What We Would Do Differently

If we were starting the trust registry from scratch today, we would change two things.

We would use structured logging from day one. We started with unstructured log lines and migrated to structured JSON logging later. The migration was straightforward but time consuming. Starting with structured logging would have saved us weeks of log parsing scripts and dashboard rebuilds.

We would implement distributed tracing earlier. We added OpenTelemetry instrumentation after the initial launch. Having traces from the beginning would have significantly accelerated our ability to diagnose latency issues and optimize the verification hot path.

Everything else, from the language choice to the architecture to the database strategy, has proven out well under production load. Go was the right choice for this workload, and we are confident it will scale to the next order of magnitude.


Truthlocks provides machine identity infrastructure for AI agents. Register, verify, and manage non-human identities with trust scoring and instant revocation.
