DEV Community

Cover image for The Anti-Corruption Layer for a Legacy Database You Cannot Replace
Gabriel Anhaia
Gabriel Anhaia

Posted on

The Anti-Corruption Layer for a Legacy Database You Cannot Replace


A team I worked with inherited a Go service from a vendor rewrite. The brief was generous: build a fresh hexagonal domain with full test coverage. There was one catch.

The database was off-limits.

It had been running since 2003. Reports were built on top of it. Three other systems read from the same tables. A migration was on the five-year roadmap, which meant it would never happen. The new Go service had to read and write the same customers_v2 table the old system used, with the same column names and Y/N flags, and dt_cre timestamps that some rows stored as text and some as timestamp.

You cannot redesign a schema you do not own. You can put a wall between it and your domain.

What an Anti-Corruption Layer Is, in This Context

The pattern comes from Eric Evans's Domain-Driven Design (Addison-Wesley, 2003). An anti-corruption layer (ACL) is a translation boundary between your model and an external model whose shape you do not control.

Most ACL posts use it for a third-party API. This is the same pattern aimed inward. The legacy database is the foreign model. Your domain has a Customer aggregate. The schema has customers_v2 with twenty-three columns. Four are deprecated. Two are nullable but never null in practice. One is a VARCHAR(1) flag that means three different things depending on a sibling row.

The ACL is a Postgres adapter that speaks the legacy schema fluently and hands the domain a clean Customer. The domain never sees tflag1. It never has to.

The Schema You Inherit

Here is the table. Names sanitised, shape kept honest:

CREATE TABLE customers_v2 (
    cust_no       VARCHAR(12)  PRIMARY KEY,
    cust_nm       VARCHAR(80)  NOT NULL,
    em_addr       VARCHAR(120),
    tflag1        VARCHAR(1)   DEFAULT 'N',
    tflag2        VARCHAR(1)   DEFAULT 'N',
    cntry_cd      CHAR(2),
    dt_cre        VARCHAR(20),
    dt_upd        TIMESTAMP,
    bill_addr_1   VARCHAR(80),
    bill_addr_2   VARCHAR(80),
    bill_city     VARCHAR(60),
    bill_zip      VARCHAR(20),
    legacy_csid   INTEGER,
    notes_blob    TEXT
);
Enter fullscreen mode Exit fullscreen mode

A short tour, because every legacy schema has the same kinds of scars:

  • cust_no is the PK. It is a string with a checksum digit. Do not treat it as numeric.
  • tflag1 is 'Y' for active, 'N' for inactive, 'D' for deleted-but-kept-for-reports. Three states in a one-character column.
  • tflag2 is 'Y' if marketing consent was given, 'N' if not, NULL if the row predates GDPR.
  • dt_cre is sometimes 2003-04-12, sometimes 12/04/2003, sometimes 2003-04-12T00:00:00. The original loader was inconsistent. All three formats still appear in production.
  • The billing address is denormalised across four columns. The current system has a separate addresses table, but a migration was never finished.

You are not going to fix any of this. You are going to translate it.

The Domain Aggregate

The domain expresses the model your business uses today. Not what the schema looks like. What your code wants.

package customer

import "time"

type Status int

const (
    StatusActive Status = iota
    StatusInactive
    StatusArchived
)

type Address struct {
    Line1      string
    Line2      string
    City       string
    PostalCode string
    Country    string
}

type Customer struct {
    ID                string
    Name              string
    Email             string
    Status            Status
    MarketingConsent  bool
    BillingAddress    Address
    CreatedAt         time.Time
    UpdatedAt         time.Time
}
Enter fullscreen mode Exit fullscreen mode

No tflag1. No cntry_cd. No three-format string date. The domain has a Status enum with three named states, a MarketingConsent boolean, and a time.Time. These are the types service code, tests, and the HTTP layer all speak.

The Port

The domain defines what it needs from persistence. Defined where the caller lives.

package customer

import (
    "context"
    "errors"
)

var (
    ErrNotFound = errors.New("customer not found")
    ErrConflict = errors.New("customer write conflict")
)

type Repository interface {
    FindByID(
        ctx context.Context,
        id string,
    ) (Customer, error)

    Save(
        ctx context.Context,
        c Customer,
    ) error
}
Enter fullscreen mode Exit fullscreen mode

Two methods. No SQL. No mention of customers_v2. The repository contract is what the domain wants, not what the database offers.

The Adapter: Where the Translation Lives

The ACL itself: a Postgres adapter that knows the legacy schema and translates it to the domain on the way in, back to the schema on the way out.

A note on the implementation choice: we use database/sql with explicit column lists, not GORM auto-mapping. Auto-mappers tempt you into giving struct tags to your domain type so it lines up with the schema. That is the corruption you are trying to prevent. The whole point of the ACL is that the domain does not know tflag1 exists.

package legacydb

import (
    "context"
    "database/sql"
    "errors"
    "fmt"
    "strings"
    "time"

    "yourapp/domain/customer"
)

type CustomerRepo struct {
    db *sql.DB
}

func NewCustomerRepo(db *sql.DB) *CustomerRepo {
    return &CustomerRepo{db: db}
}

const customerCols = `
    cust_no, cust_nm, em_addr,
    tflag1, tflag2, cntry_cd,
    dt_cre, dt_upd,
    bill_addr_1, bill_addr_2,
    bill_city, bill_zip
`
Enter fullscreen mode Exit fullscreen mode

FindByID reads the legacy row into local scan variables, then hands the row to a mapper that does the translation:

func (r *CustomerRepo) FindByID(
    ctx context.Context,
    id string,
) (customer.Customer, error) {
    query := `SELECT ` + customerCols +
        ` FROM customers_v2 WHERE cust_no = $1`

    row := r.db.QueryRowContext(ctx, query, id)

    var raw legacyCustomerRow
    err := row.Scan(
        &raw.CustNo, &raw.CustNm, &raw.EmAddr,
        &raw.TFlag1, &raw.TFlag2, &raw.CntryCd,
        &raw.DtCre, &raw.DtUpd,
        &raw.BillAddr1, &raw.BillAddr2,
        &raw.BillCity, &raw.BillZip,
    )
    if errors.Is(err, sql.ErrNoRows) {
        return customer.Customer{},
            customer.ErrNotFound
    }
    if err != nil {
        return customer.Customer{},
            fmt.Errorf("scanning customer: %w", err)
    }

    return mapToDomain(raw)
}
Enter fullscreen mode Exit fullscreen mode

The legacyCustomerRow is a private struct that mirrors the schema exactly. It uses sql.NullString and sql.NullTime because the schema is full of nullable columns that the domain does not care about:

type legacyCustomerRow struct {
    CustNo    string
    CustNm    string
    EmAddr    sql.NullString
    TFlag1    sql.NullString
    TFlag2    sql.NullString
    CntryCd   sql.NullString
    DtCre     sql.NullString
    DtUpd     sql.NullTime
    BillAddr1 sql.NullString
    BillAddr2 sql.NullString
    BillCity  sql.NullString
    BillZip   sql.NullString
}
Enter fullscreen mode Exit fullscreen mode

This struct never leaves the adapter package. The domain has no idea it exists.

The mapper is the heart of the ACL: the function that turns schema into meaning.

func mapToDomain(
    r legacyCustomerRow,
) (customer.Customer, error) {
    status, err := mapStatus(r.TFlag1)
    if err != nil {
        return customer.Customer{}, err
    }

    createdAt, err := parseLegacyDate(r.DtCre)
    if err != nil {
        return customer.Customer{},
            fmt.Errorf(
                "customer %s has bad dt_cre: %w",
                r.CustNo, err,
            )
    }

    updatedAt := r.DtUpd.Time
    if !r.DtUpd.Valid {
        updatedAt = createdAt
    }

    return customer.Customer{
        ID:               r.CustNo,
        Name:             r.CustNm,
        Email:            r.EmAddr.String,
        Status:           status,
        MarketingConsent: mapConsent(r.TFlag2),
        BillingAddress: customer.Address{
            Line1:      r.BillAddr1.String,
            Line2:      r.BillAddr2.String,
            City:       r.BillCity.String,
            PostalCode: r.BillZip.String,
            Country:    r.CntryCd.String,
        },
        CreatedAt: createdAt,
        UpdatedAt: updatedAt,
    }, nil
}
Enter fullscreen mode Exit fullscreen mode

The three small helpers do the awkward work that would otherwise sit in the domain:

func mapStatus(
    flag sql.NullString,
) (customer.Status, error) {
    if !flag.Valid {
        return customer.StatusInactive, nil
    }
    switch strings.ToUpper(flag.String) {
    case "Y":
        return customer.StatusActive, nil
    case "N":
        return customer.StatusInactive, nil
    case "D":
        return customer.StatusArchived, nil
    default:
        return 0, fmt.Errorf(
            "unknown tflag1 value %q",
            flag.String,
        )
    }
}

func mapConsent(flag sql.NullString) bool {
    if !flag.Valid {
        return false
    }
    return strings.EqualFold(flag.String, "Y")
}

var dateFormats = []string{
    "2006-01-02T15:04:05",
    "2006-01-02",
    "02/01/2006",
}

func parseLegacyDate(
    s sql.NullString,
) (time.Time, error) {
    if !s.Valid || s.String == "" {
        return time.Time{}, nil
    }
    for _, f := range dateFormats {
        t, err := time.Parse(f, s.String)
        if err == nil {
            return t, nil
        }
    }
    return time.Time{}, fmt.Errorf(
        "unrecognised date %q", s.String,
    )
}
Enter fullscreen mode Exit fullscreen mode

The corruption stops here. The domain never sees 'Y', never sees a NullString, never has to guess which date format the row was written in.

The Write Path

The same translation runs in reverse for Save. Domain types in, legacy columns out:

func (r *CustomerRepo) Save(
    ctx context.Context,
    c customer.Customer,
) error {
    flag1, err := domainStatusToFlag(c.Status)
    if err != nil {
        return err
    }
    flag2 := "N"
    if c.MarketingConsent {
        flag2 = "Y"
    }

    _, err = r.db.ExecContext(ctx,
        `INSERT INTO customers_v2 (
            cust_no, cust_nm, em_addr,
            tflag1, tflag2, cntry_cd,
            dt_cre, dt_upd,
            bill_addr_1, bill_addr_2,
            bill_city, bill_zip
        ) VALUES (
            $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12
        )
        ON CONFLICT (cust_no) DO UPDATE SET
            cust_nm     = EXCLUDED.cust_nm,
            em_addr     = EXCLUDED.em_addr,
            tflag1      = EXCLUDED.tflag1,
            tflag2      = EXCLUDED.tflag2,
            cntry_cd    = EXCLUDED.cntry_cd,
            dt_upd      = EXCLUDED.dt_upd,
            bill_addr_1 = EXCLUDED.bill_addr_1,
            bill_addr_2 = EXCLUDED.bill_addr_2,
            bill_city   = EXCLUDED.bill_city,
            bill_zip    = EXCLUDED.bill_zip`,
        c.ID, c.Name, c.Email,
        flag1, flag2, c.BillingAddress.Country,
        c.CreatedAt.Format("2006-01-02T15:04:05"),
        c.UpdatedAt,
        c.BillingAddress.Line1,
        c.BillingAddress.Line2,
        c.BillingAddress.City,
        c.BillingAddress.PostalCode,
    )
    if err != nil {
        return fmt.Errorf("upserting customer: %w", err)
    }
    return nil
}

func domainStatusToFlag(
    s customer.Status,
) (string, error) {
    switch s {
    case customer.StatusActive:
        return "Y", nil
    case customer.StatusInactive:
        return "N", nil
    case customer.StatusArchived:
        return "D", nil
    default:
        return "", fmt.Errorf(
            "unknown status %d", s,
        )
    }
}
Enter fullscreen mode Exit fullscreen mode

New rows always write dt_cre in the canonical format. The adapter cleans the data on the way out without breaking historic rows on the way in.

What You Drop, What You Keep

The legacy schema also has columns the domain has no use for. legacy_csid, notes_blob, deprecated audit columns. The ACL leaves them out of the SELECT and the INSERT. The other systems that depend on them keep working because their own queries name the columns they care about.

Two rules hold the line:

  • The adapter package is the only code that imports database/sql and the only code that knows the table name.
  • The domain types have no tags, no SQL knowledge, no knowledge that a tflag1 ever existed.

If a developer ever needs to ask "what does tflag1 = 'D' mean?", the answer lives in one file: legacydb/customer.go. Not scattered across handlers, services, and reports.

Testing the Translation Directly

The mapper is a pure function. Test it without a database:

func TestMapToDomain_StatusFlags(t *testing.T) {
    cases := []struct {
        flag string
        want customer.Status
    }{
        {"Y", customer.StatusActive},
        {"N", customer.StatusInactive},
        {"D", customer.StatusArchived},
    }
    for _, c := range cases {
        row := legacyCustomerRow{
            CustNo: "C-1",
            CustNm: "Acme",
            TFlag1: sql.NullString{
                String: c.flag, Valid: true,
            },
            DtCre: sql.NullString{
                String: "2003-04-12", Valid: true,
            },
        }
        got, err := mapToDomain(row)
        if err != nil {
            t.Fatalf("flag=%s: %v", c.flag, err)
        }
        if got.Status != c.want {
            t.Fatalf(
                "flag=%s: got %v want %v",
                c.flag, got.Status, c.want,
            )
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Date-parsing tests are equally cheap. Every legacy quirk that bit you in production becomes a one-line test case. The mapper grows resilient as you find more dirty rows.

When the Roadmap Finally Lets You Migrate

The day the schema migration is approved — five years late, like always — you replace the adapter, not the domain.

A new postgresv3.CustomerRepo reads from the new clean tables. The composition root in main.go swaps one constructor:

// repo := legacydb.NewCustomerRepo(db)
repo := postgresv3.NewCustomerRepo(db)
Enter fullscreen mode Exit fullscreen mode

Service code does not change. Handlers do not change. Tests do not change. The domain has been speaking its own language all along, waiting for the database to catch up.

That is the payoff of an anti-corruption layer pointed inward. A schema you cannot replace today does not deserve veto power over the code you write tomorrow.


If this was useful

The anti-corruption layer is one of the load-bearing patterns when hexagonal architecture meets a real codebase. Hexagonal Architecture in Go walks through the full picture: port design, adapters for inbound and outbound boundaries, the difference between a vendor ACL and a database ACL, and how to migrate a service that started as direct SQL and grew into a domain.

Thinking in Go — the 2-book series on Go programming and hexagonal architecture

Top comments (0)