Gabriel Anhaia

Posted on Apr 29

The Strange Case of Go's nil Interface Comparison

#go #debugging #interfaces #backend

Book: The Complete Guide to Go Programming
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A junior engineer joins your team. Their first PR is a clean
refactor of an error path. The CI passes. Two reviewers approve.
It ships on a Friday afternoon and the on-call rotation gets paged
on Saturday morning because a service that has run fine for two
years suddenly starts panicking under load.

Twelve lines in the diff. One of them is the bug. The fix is
one keyword: return the untyped nil instead of a typed nil
pointer through an error interface. If that sentence felt
obvious, this post is not for you. If it did not, sit down,
because this is a bug that bites most Go shops sooner or later,
and it is going to bite yours next.

The Comparison That Lies

Run this. It is the smallest version of the bug.

package main

import "fmt"

type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

func doWork() error {
    var e *MyError = nil
    return e
}

func main() {
    err := doWork()
    fmt.Println(err == nil)   // false
    fmt.Println(err)          // <nil>
}

Output:

false
<nil>

The function returned a nil pointer. Printing the value shows
<nil>. Comparing it against nil returns false. Three statements,
all true at the same time. The last one is the check your caller
wrote, and the one that just paged your on-call.

Why This Happens

A Go interface value is not a pointer. It is a two-word pair the
runtime stores as (type, value). The first word identifies the
concrete type stored in the interface (for a non-empty interface
such as error, it points to an itab that records that type). The
second word is the data, which for pointer types is the pointer
itself.

An interface compares equal to the literal nil only when both
words are zero (type unset and value unset). Any interface that
has a concrete type assigned is non-nil, even if the value half
of the pair is a nil pointer.

When doWork returns e, Go has to widen *MyError (a concrete
type) into error (an interface). The widening sets the type
word to *MyError and the value word to nil. The returned
interface is (type=*MyError, value=nil). That pair is not
the zero pair. err == nil is therefore false.

The print is misleading, though not on purpose. fmt.Println sees
that err implements error and calls its Error method through the
interface. Error reads e.msg through a nil *MyError and panics.
fmt recovers panics raised inside Error and String methods, and
when the receiver is a nil pointer it prints <nil> instead of a
panic message. That <nil> describes the pointer in the value
word, not the interface that wraps it.

Three Real Shapes of This Bug

Teams I have talked to keep hitting one of three shapes.

Shape 1: named return + early defer

The classic shape. A function declares a named err error
return, defers a recovery handler, and assigns a typed nil to
the named return inside the recovery.

type DBError struct{ code int }

func (e *DBError) Error() string {
    return fmt.Sprintf("db error %d", e.code)
}

func Save(rec Record) (err error) {
    defer func() {
        if r := recover(); r != nil {
            var dbe *DBError
            // intent: "no DB error to report"
            err = dbe
        }
    }()
    return persist(rec)
}

func main() {
    var rec Record
    if err := Save(rec); err != nil {
        log.Fatalf("save failed: %v", err)
    }
}

The author thought "I will set err to nil if I do not have a
real error to report." They wrote err = dbe where dbe is a
nil *DBError. The named return becomes (type=*DBError,
value=nil). The caller's err != nil check fires and the
program exits with save failed: <nil>.

The fix is a single keyword. Assign the untyped nil literal to
the interface, never a typed nil pointer.

defer func() {
    if r := recover(); r != nil {
        err = nil   // both words zero
    }
}()
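A runnable before/after, assuming a hypothetical persist that always panics so the recovery handler actually runs (the post's real persist and Record are not shown). saveBuggy and saveFixed are illustrative names for the two versions of Save.

```go
package main

import "fmt"

type DBError struct{ code int }

func (e *DBError) Error() string { return fmt.Sprintf("db error %d", e.code) }

// persist is a hypothetical stand-in that always panics,
// so the deferred recovery handler runs.
func persist() error { panic("connection reset") }

// saveBuggy assigns a typed nil to the named return inside recover.
func saveBuggy() (err error) {
    defer func() {
        if r := recover(); r != nil {
            var dbe *DBError
            err = dbe // (type=*DBError, value=nil): not nil
        }
    }()
    return persist()
}

// saveFixed assigns the untyped nil keyword instead.
func saveFixed() (err error) {
    defer func() {
        if r := recover(); r != nil {
            err = nil // both words zero
        }
    }()
    return persist()
}

func main() {
    fmt.Println(saveBuggy() != nil) // true
    fmt.Println(saveFixed() != nil) // false
}
```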

Shape 2: repository returning a typed nil

The repository pattern is where this bug lives in long-running
Go services. The repository method returns a domain interface.
The lookup declares a typed pointer, leaves it nil on the
not-found branch, and returns it.

type User interface {
    ID() string
    Email() string
}

type pgUser struct{ id, email string }

func (u *pgUser) ID() string    { return u.id }
func (u *pgUser) Email() string { return u.email }

type UserRepo struct{ db *sql.DB }

func (r *UserRepo) FindByID(id string) (User, error) {
    row := r.db.QueryRow(
        "SELECT id, email FROM users WHERE id = $1", id,
    )
    var u *pgUser
    var uid, email string
    if err := row.Scan(&uid, &email); err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return u, nil   // BUG: typed nil into User
        }
        return nil, err
    }
    u = &pgUser{id: uid, email: email}
    return u, nil
}

Scanning into locals matters here: row.Scan(&u.id, &u.email)
with a nil u would panic while evaluating its arguments, before
Scan even runs. The bug that survives review is the
return u, nil on the not-found branch. u is still a nil *pgUser
there. Returning it widens to User and produces
(type=*pgUser, value=nil). The caller writes:

user, err := repo.FindByID(id)
if err != nil { return err }
if user == nil {
    return ErrUserNotFound      // never fires
}
fmt.Println(user.Email())       // panic: nil pointer deref

The user == nil check looks correct and is wrong. The next
line dereferences the typed nil and panics.

The fix is to return the untyped nil interface and the sentinel
error:

if errors.Is(err, sql.ErrNoRows) {
    return nil, ErrUserNotFound
}

Or, if you want to keep the not-found-as-nil convention, write
return nil, nil with the literal nil, so the interface position
never carries a typed nil.

Shape 3: options struct with `any`

The third shape is the one that survives review the longest
because the offending pointer is two indirections away from the
return statement. An options struct holds a field of type any
(or interface{}). A constructor stores a typed nil into that
field. A consumer reads the field and checks it against nil.

type Options struct {
    Logger any   // accepts any logger; nil means "default"
}

type FileLogger struct{ path string }

func (l *FileLogger) Log(s string) { /* ... */ }

func NewOptions(logPath string) Options {
    var fl *FileLogger
    if logPath != "" {
        fl = &FileLogger{path: logPath}
    }
    return Options{Logger: fl}
}

func main() {
    opts := NewOptions("")
    if opts.Logger == nil {
        fmt.Println("using default logger")
    } else {
        fmt.Println("using configured logger")
    }
}

Output:

using configured logger

fl is a nil *FileLogger. Storing it in Options.Logger
widens it into any as (type=*FileLogger, value=nil). The
nil check fails. The program goes down the configured-logger
branch, and the first call that asserts Logger back to a logger
and invokes Log hands the method a nil receiver, which panics as
soon as the method touches a field such as l.path.
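One way to fix NewOptions, sketched under the assumption that callers treat a nil Logger as "use the default": build the Options value first and assign the field only when a logger actually exists, so the zero interface survives on the empty-path branch. The Log body here is illustrative; the post elides it.

```go
package main

import "fmt"

type Options struct {
    Logger any // nil means "default"
}

type FileLogger struct{ path string }

// Log reads l.path, so it would panic on a nil receiver.
func (l *FileLogger) Log(s string) { fmt.Println(l.path, s) }

// NewOptions stores a logger only when one was actually built,
// leaving Logger as the untyped zero interface otherwise.
func NewOptions(logPath string) Options {
    var opts Options
    if logPath != "" {
        opts.Logger = &FileLogger{path: logPath}
    }
    return opts
}

func main() {
    fmt.Println(NewOptions("").Logger == nil)        // true
    fmt.Println(NewOptions("app.log").Logger == nil) // false
}
```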

The Fixes That Actually Work

Three habits cover most of the cases that show up in production.

Habit 1: constructors return untyped nil interfaces

When a function's return type is an interface, never assign a
typed pointer to a return slot you intend to be nil. Return the
literal nil.

func FindByID(id string) (User, error) {
    if notFound(id) {                 // notFound: hypothetical lookup
        return nil, ErrUserNotFound   // good: untyped nil
    }
    var u *pgUser
    return u, nil                     // bad: typed nil
}

The fix is mechanical. If the slot you are returning into has an
interface type, the nil you write is the keyword, not a
zero-valued typed variable.
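When a concrete pointer crosses into an interface in several places, a small boundary helper keeps the rule in one spot. asUser below is a hypothetical helper, not part of the post's code: it maps a nil *pgUser to the untyped nil User and passes anything else through.

```go
package main

import "fmt"

type User interface {
    ID() string
    Email() string
}

type pgUser struct{ id, email string }

func (u *pgUser) ID() string    { return u.id }
func (u *pgUser) Email() string { return u.email }

// asUser converts a possibly-nil *pgUser into User,
// mapping a nil pointer to the untyped nil interface.
func asUser(u *pgUser) User {
    if u == nil {
        return nil // untyped nil: both words zero
    }
    return u
}

func main() {
    var missing *pgUser
    fmt.Println(asUser(missing) == nil)          // true
    fmt.Println(asUser(&pgUser{id: "1"}) == nil) // false
}
```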

Habit 2: check the concrete type, not the interface

When you cannot guarantee the producer follows habit 1 (because
it is third-party code, or generated code, or pre-existing code
the team has not converged on), do the nil check on the
underlying concrete type.

func saveStrict(rec Record) error {
    err := persist(rec)
    if err == nil {
        return nil
    }
    var dbe *DBError
    if errors.As(err, &dbe) && dbe == nil {
        return nil   // typed-nil DBError, treat as no error
    }
    return err
}

errors.As extracts the concrete pointer. The value you check
is a *DBError, where == nil does the right thing because there
is no interface wrapper to confuse the comparison. Treat this as
a defensive bridge for producers you do not control, not a
default pattern: silently swallowing typed-nil errors will hide
real bugs the day a producer changes its semantics. reflect
offers the heavyweight, type-agnostic version:
reflect.ValueOf(x).IsNil() returns true for a typed nil stored in
an interface, but it panics when x is itself a nil interface
(ValueOf returns the zero Value) or holds a kind that cannot be
nil, such as an int or a struct, and it pays for a runtime
reflection call on every check.
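If you do reach for reflect, a guarded version avoids both panics. isNil is a hypothetical helper and only a sketch: it short-circuits the nil interface, then calls IsNil only for kinds that support it. reflect.Pointer needs Go 1.18 or later.

```go
package main

import (
    "fmt"
    "reflect"
)

type DBError struct{ code int }

func (e *DBError) Error() string { return fmt.Sprintf("db error %d", e.code) }

// isNil reports whether x is a nil interface or holds a nil value of a
// nilable kind. Kind is checked first because IsNil panics for kinds
// such as int or struct, and ValueOf(nil) is the zero Value.
func isNil(x any) bool {
    if x == nil {
        return true
    }
    v := reflect.ValueOf(x)
    switch v.Kind() {
    case reflect.Chan, reflect.Func, reflect.Interface, reflect.Map,
        reflect.Pointer, reflect.Slice, reflect.UnsafePointer:
        return v.IsNil()
    }
    return false
}

func main() {
    var dbe *DBError
    var err error = dbe
    fmt.Println(err == nil, isNil(err)) // false true
    fmt.Println(isNil(42))              // false
}
```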

Habit 3: lint it before review

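go vet's default analyzers do not flag this directly. Other
linters get closer. The nilness analyzer from golang.org/x/tools
catches some of it, but it is not in go vet's default set, so you
have to enable it. nilnil flags functions that return a nil value
together with a nil error, the (nil, nil) pair. staticcheck (a
third-party tool, not part of the Go distribution) runs SA4023
("impossible comparison of interface value with untyped nil"),
the check aimed most directly at this family: it reports nil
comparisons that can never succeed because the interface always
carries a concrete type.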
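For a sense of what SA4023 is aimed at, here is a hypothetical validate that always returns a *MyError through error. The nil check at the call site can never succeed; a comparison of this shape is what the check exists to report, though the exact diagnostic text depends on your staticcheck version.

```go
package main

import "fmt"

type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

// validate always widens a *MyError into error, so its result is never
// the nil interface, even when the pointer inside is nil.
func validate(ok bool) error {
    var e *MyError
    if !ok {
        e = &MyError{msg: "invalid"}
    }
    return e
}

func main() {
    // This comparison can never be true.
    if validate(true) == nil {
        fmt.Println("valid")
    } else {
        fmt.Println("invalid?") // always taken
    }
}
```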

Wire one or all of those into CI. The check costs little on
each run and pays for itself the first time it stops a typed-nil
return from reaching production.

Go find it in your repo

Grep your repo for var \w+ \*\w+ followed by a return into an
interface position. If your editor has good Go tooling, hovering
the return value will show you the interface type. Walk every
site. The bug is sitting in one of them, and now you know its
name.

If this was useful

Interfaces are where Go's type system meets the runtime, and
typed-nil is one of about a dozen quiet traps that live there.
The Complete Guide to Go Programming walks the runtime end to
end: how interface values are laid out, how widening works, why
the comparison rules are what they are, and how to write Go code
that does not depend on the reader having read this post.

DEV Community
