DEV Community

SEN LLC


Writing dig in 500 Lines of Rust (with hickory-resolver, the trust-dns successor)


dig is the standard but its output is cryptic. nslookup is dead. host is too terse for anything past A records. I wanted a tiny CLI that does A / AAAA / MX / TXT / CNAME / NS / SOA queries with readable output, supports any resolver and DNS-over-HTTPS, and prints JSON when I want to script against it. ~500 lines of Rust later, here it is.

📦 GitHub: https://github.com/sen-ltd/dns-lookup

dns-lookup screenshot

The problem with the existing tools

If you debug DNS often, you have a love-hate relationship with at least one of dig, nslookup, and host. They each have a reason to exist and they each have a reason to make you swear:

  • dig is the canonical tool. It tells you everything. The problem is that "everything" includes flag bits, opcode names, the question section, the authority section, the additional section, query times measured against a server you didn't ask about, and a wire-format-ish answer section where the only information you actually wanted is buried two columns into a tab-separated mess.
  • nslookup is interactive-mode legacy from when network engineers ran it manually in the 1990s. It's been "deprecated" forever but ships everywhere because half the world's runbooks still reference it. Its output is somehow both terse and full of irrelevant lines like Non-authoritative answer: that nobody is parsing.
  • host is the friendly one, but it answers exactly one question (host example.com) and gives up the moment you want to see anything other than the default A/AAAA/MX bundle in its default format.

What I actually want most days is: given this name, here are the records of these types, with some indication of how long the lookup took and which server answered. That's it. No question section, no opcode header, no authority chase. And when I'm in a script, I want the same data as JSON.

So I wrote one. In Rust, because I wanted to see how good hickory-resolver had gotten, and because I wanted a single statically-linked binary I could drop into an Alpine container that's 11 MB total.

The hickory-resolver pleasant surprise

A quick note on the name first: hickory-dns is the project formerly known as trust-dns. The rename happened in 2023 to avoid trademark friction with the unrelated "Trust" company name, and the crates are now hickory-proto, hickory-resolver, hickory-server, and hickory-recursor. Same author (Benjamin Fry), same code lineage, same APIs in 0.24; just a different namespace. If you still have trust-dns-resolver in your Cargo.toml, a search-and-replace gets you most of the way there.
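In practice the migration is mostly a dependency rename. A sketch of the before/after Cargo.toml change (version numbers illustrative, not a pinned recommendation):

```toml
# Before: the old trust-dns naming
# trust-dns-resolver = "0.23"

# After: the hickory naming, same API lineage
hickory-resolver = "0.24"
```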

The pleasant surprise is how much it does for you. I expected to write a UDP socket and a wire-format parser. I wrote neither. Here's what the entire dependency surface for this project looks like:

[dependencies]
clap = { version = "4", features = ["derive"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time"] }
hickory-resolver = { version = "0.24", features = ["dns-over-https-rustls", "tokio-runtime"] }

Three deps. clap for the CLI, tokio because the resolver is async, and hickory-resolver for the actual DNS work. With one feature flag, you get DoH-over-rustls bundled in.

Picking a transport without writing socket code

hickory has a ResolverConfig that owns the list of name servers, and a Protocol enum (Udp / Tcp / Https / Tls / Quic) that says how each name server is reached. The resolver itself is one type, TokioAsyncResolver, that you build from a config and a set of options. You don't pick a transport at the call site; you bake it into the resolver, and then just lookup(name, RecordType).

That's the design insight I think is worth pausing on. I came in expecting the API to be roughly udp_query(server, name, type) plus a separate tcp_query plus a separate doh_query. Instead, the transport is a property of the server, and the resolver knows how to reach each server in its list. UDP/TCP fallback (when a UDP response is truncated, you retry over TCP) is handled inside the connection abstraction, not at the call site, because if you've configured a UDP name server, the resolver knows it has the option to escalate to TCP without your intervention.
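That fallback hinges on a single bit in the wire format. A minimal sketch of the check, assuming raw response bytes (this is not hickory's code; the resolver does this internally and you never write it yourself):

```rust
// The TC (TrunCation) bit lives in the third byte of the 12-byte DNS
// header: that byte is QR | Opcode(4 bits) | AA | TC | RD, so TC is
// mask 0x02. A UDP response with TC set means "retry over TCP".
fn is_truncated(response: &[u8]) -> bool {
    response.len() >= 12 && (response[2] & 0x02) != 0
}

fn main() {
    // Flags byte 0x82 = QR set + TC set: a truncated UDP response.
    let truncated = [0x12, 0x34, 0x82, 0x00, 0, 0, 0, 0, 0, 0, 0, 0];
    assert!(is_truncated(&truncated));
    // An all-zero header: not truncated.
    assert!(!is_truncated(&[0u8; 12]));
}
```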

For dns-lookup, that means the wiring layer is small. Here's the whole resolver builder, lightly trimmed:

use std::net::{IpAddr, SocketAddr};
use std::time::Duration;

use hickory_resolver::config::{
    NameServerConfig, Protocol as HProtocol, ResolverConfig, ResolverOpts,
};
use hickory_resolver::TokioAsyncResolver;

// `Protocol` below is the CLI's own enum (Udp | Tcp | Https), defined in cli.rs.

pub fn build_resolver(
    resolver: Option<&str>,
    protocol: Protocol,
    timeout: Duration,
) -> Result<(TokioAsyncResolver, String), String> {
    let mut opts = ResolverOpts::default();
    opts.timeout = timeout;
    opts.attempts = 1;

    let (config, label) = if let Some(addr) = resolver {
        let ip: IpAddr = addr.parse()
            .map_err(|_| format!("--resolver: not an IP address: {addr}"))?;

        // DoH against a raw IP needs a TLS server name we don't have,
        // so for `--resolver IP --protocol https` we fall back to TCP/53
        // and surface that in the label.
        let (port, hproto, used_label) = match protocol {
            Protocol::Udp   => (53u16, HProtocol::Udp, "udp"),
            Protocol::Tcp   => (53u16, HProtocol::Tcp, "tcp"),
            Protocol::Https => (53u16, HProtocol::Tcp, "tcp(doh-fallback)"),
        };

        let mut cfg = ResolverConfig::new();
        cfg.add_name_server(NameServerConfig {
            socket_addr: SocketAddr::new(ip, port),
            protocol: hproto,
            tls_dns_name: None,
            tls_config: None,
            trust_negative_responses: true,
            bind_addr: None,
        });
        (cfg, format!("{addr}:{port}/{used_label}"))
    } else {
        // No explicit resolver: pick a known good public set so the
        // protocol flag is honored without fighting /etc/resolv.conf.
        let cfg = match protocol {
            Protocol::Https => ResolverConfig::cloudflare_https(),
            Protocol::Tcp | Protocol::Udp => ResolverConfig::cloudflare(),
        };
        (cfg, format!("cloudflare/{:?}", protocol))
    };

    Ok((TokioAsyncResolver::tokio(config, opts), label))
}

A couple of things worth noting:

opts.attempts = 1. By default hickory will retry timed-out queries a couple of times before giving up. For an interactive CLI where I want a sharp error message and a clean elapsed-time number, retries are the wrong default: they make the user think their network is fine ("only" 6 seconds!) when actually it's two failed attempts at 3 seconds each. Setting attempts = 1 is the difference between a CLI that feels honest and one that feels evasive.

The tls_dns_name/tls_config fields exist for TLS transports. When you use real DoH against a known provider via cloudflare_https(), hickory fills in the server name cloudflare-dns.com for the TLS handshake and brings its own bundled rustls config. When you point at an arbitrary IP, you don't have either of those, which is why my code explicitly downgrades --resolver IP --protocol https to plain TCP/53 and labels the answer tcp(doh-fallback) so the user can see what happened. The honest move is to surface the limitation in the output rather than silently produce nothing.

The whole transport choice is data, not control flow. If I wanted to add QUIC tomorrow, it's one more arm in the match and one more feature flag in Cargo.toml. The lookup code never sees it.

The record-type-as-data trick

Inside Rust, hickory hands you a Lookup that contains a stream of RData enum variants: RData::A(A(Ipv4Addr)), RData::MX(MX), RData::TXT(TXT), etc. Every variant has its own little helper struct with named accessors (mx.preference(), mx.exchange(), soa.serial(), …), and they all live in the hickory_proto::rr::rdata module.

This is correct β€” DNS records have wildly different shapes and you really do want different field names β€” but it makes the formatter's life harder, because the formatter doesn't want to know about hickory_proto::rr::rdata::soa::SOA at all. The formatter wants a flat data type it can match on, and tests want to be able to construct example records without standing up a hickory Message.

The fix is a one-way conversion at the resolver boundary. I have my own enum:

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Record {
    A(IpAddr),
    Aaaa(IpAddr),
    Mx { preference: u16, exchange: String },
    Txt(String),
    Cname(String),
    Ns(String),
    Soa {
        mname: String,
        rname: String,
        serial: u32,
        refresh: i32,
        retry: i32,
        expire: i32,
        minimum: u32,
    },
}

…and a decode_rdata function that does the boring conversion exactly once:

pub fn decode_rdata(rdata: &hickory_resolver::proto::rr::RData) -> Option<Record> {
    use hickory_resolver::proto::rr::RData;
    match rdata {
        RData::A(a)    => Some(Record::A(IpAddr::V4(a.0))),
        RData::AAAA(a) => Some(Record::Aaaa(IpAddr::V6(a.0))),
        RData::MX(mx)  => Some(Record::Mx {
            preference: mx.preference(),
            exchange:   mx.exchange().to_utf8(),
        }),
        RData::TXT(txt) => {
            let joined = txt.txt_data().iter()
                .map(|b| String::from_utf8_lossy(b).to_string())
                .collect::<Vec<_>>().join("");
            Some(Record::Txt(joined))
        }
        RData::CNAME(name) => Some(Record::Cname(name.0.to_utf8())),
        RData::NS(name)    => Some(Record::Ns(name.0.to_utf8())),
        RData::SOA(soa)    => Some(Record::Soa {
            mname:   soa.mname().to_utf8(),
            rname:   soa.rname().to_utf8(),
            serial:  soa.serial(),
            refresh: soa.refresh(),
            retry:   soa.retry(),
            expire:  soa.expire(),
            minimum: soa.minimum(),
        }),
        _ => None,
    }
}

After decode_rdata, nothing in the program imports hickory_* except the lookup function itself. The formatter, the JSON encoder, the dig-style printer, and every test in the project use the plain Record enum.

Two payoffs:

  1. The unit tests don't need the network. I can construct a Record::Mx { preference: 10, exchange: "mail.example.com".into() } in Rust source and run my JSON formatter against it. The whole formatter test suite (17 tests covering text grouping, dig-style tabbing, JSON shape, escape rules, plural-vs-singular wording, duration formatting) runs in 0.00s with no sockets.
  2. TXT-record weirdness is encapsulated. TXT records can hold multiple strings per record (each up to 255 bytes), because the wire format is length-prefixed and a single TXT RDATA can chain several of those length-prefixed chunks. Most callers want one string. So my decoder joins them, which is the right call for dig-style output, and the rest of the program never has to know.

The TXT thing in particular is the kind of detail you only learn by writing this. SPF records frequently exceed 255 bytes and are stored as two or three concatenated chunks; if you only print the first chunk, you've silently corrupted the SPF policy of whoever you're debugging. That's the bug class that makes dig output look ugly: it's printing every chunk separately with quotes around each, because it doesn't want to make the joining decision for you. I'm willing to make the decision; that's the point of writing a friendlier tool.
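The joining logic itself is tiny. A self-contained sketch of the decision, operating on raw chunks so it doesn't depend on hickory's TXT type (in the real decoder the chunks come from txt.txt_data()):

```rust
// A TXT record's RDATA is a sequence of length-prefixed character
// strings, each at most 255 bytes. We concatenate them into one logical
// string, which is what SPF/DKIM consumers expect.
fn join_txt_chunks(chunks: &[&[u8]]) -> String {
    chunks
        .iter()
        .map(|b| String::from_utf8_lossy(b))
        .collect::<Vec<_>>()
        .join("")
}

fn main() {
    // An SPF policy too long for one chunk, split across two on the wire.
    let chunks: [&[u8]; 2] = [b"v=spf1 include:_spf.example.com ", b"~all"];
    assert_eq!(
        join_txt_chunks(&chunks),
        "v=spf1 include:_spf.example.com ~all"
    );
}
```

Printing only the first chunk here would have silently dropped the `~all` terminator, exactly the bug class described above.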

Format negotiation: text, json, dig

The CLI takes --format text|json|dig and the formatter is one entry point that fans out:

pub fn format(fmt: OutFormat, name: &str, outcome: &LookupOutcome, palette: &Palette) -> String {
    match fmt {
        OutFormat::Text => format_text(name, outcome, palette),
        OutFormat::Json => format_json(name, outcome),
        OutFormat::Dig  => format_dig(name, outcome),
    }
}

For JSON, I deliberately did not pull in serde. There's exactly one type to serialize and the field set is finite. Hand-writing the encoder is shorter than the derive ceremony, and I wanted the test suite to be able to assert on exact substrings of the JSON output without worrying about field ordering changing across serde versions. The whole encoder is one function with a manual escape pass:

fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"'  => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

That's a complete RFC 8259-compliant string encoder for the subset I emit. Probably 30 lines saved over the serde route, and one fewer dependency in the tree. The test that proves it works against an SPF record containing both "include" and \all:

#[test]
fn json_escapes_quotes_and_backslashes_in_txt() {
    let oc = outcome(vec![(
        RecordKind::Txt,
        Record::Txt(r#"v=spf1 "include" \all"#.into()),
    )]);
    let out = format_json("example.com", &oc);
    assert!(out.contains(r#"\"include\""#));
    assert!(out.contains(r"\\all"));
}

For the text format, the only mildly clever thing is grouping by record type while preserving insertion order, which I do by tracking the previously-emitted kind in a one-element state machine. No HashMap<RecordKind, Vec<Record>>, no sorting, no two passes. Just a single linear walk over outcome.records.
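A self-contained sketch of that single pass. `Kind` here is a stand-in for the project's real RecordKind enum, and the header wording is illustrative:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind { A, Mx }

// Emit a type header only when the kind changes: the "one-element state
// machine" is just `prev`. Insertion order is preserved, no map, no sort.
fn format_grouped(records: &[(Kind, &str)]) -> String {
    let mut out = String::new();
    let mut prev: Option<Kind> = None;
    for (kind, value) in records {
        if prev != Some(*kind) {
            out.push_str(&format!("{:?} records:\n", kind));
            prev = Some(*kind);
        }
        out.push_str(&format!("  {value}\n"));
    }
    out
}

fn main() {
    let recs = [
        (Kind::A, "192.0.2.1"),
        (Kind::A, "192.0.2.2"),
        (Kind::Mx, "10 mail.example.com."),
    ];
    let text = format_grouped(&recs);
    // Two A records share one header; the Mx record gets its own.
    assert_eq!(text.matches("A records:").count(), 1);
    assert!(text.contains("Mx records:\n  10 mail.example.com."));
}
```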

Exit codes for shell scripting

0 β€” Query succeeded and returned at least one record
1 β€” Query succeeded but found nothing (NXDOMAIN / NoData)
2 β€” Bad arguments or network/protocol error

The split between 1 and 2 matters more than people realize. If you alias dns-lookup into a Bash health check, you want "host doesn't exist anymore" (exit 1) to be distinguishable from "the resolver itself is broken" (exit 2). Most shell scripts collapse them into "non-zero is bad" and that's fine, but the option is there.

To make this work, the resolver wrapper specifically catches ResolveErrorKind::NoRecordsFound and turns it into an empty Vec<Record> with Ok status, rather than propagating it as an error. That maps the DNS-protocol-level "the query worked, here's no answer" onto the shell-level exit-code split.
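The mapping from outcome to exit code can be sketched in a few lines. The types here are illustrative stand-ins, not the project's actual ones: Ok carries the decoded records (possibly empty, after NoRecordsFound has been swallowed), Err carries a hard failure:

```rust
// 0 = at least one record; 1 = query worked but no answer (NXDOMAIN /
// NoData, already mapped to an empty Vec); 2 = bad args or network error.
fn exit_code(outcome: &Result<Vec<String>, String>) -> u8 {
    match outcome {
        Ok(records) if !records.is_empty() => 0,
        Ok(_) => 1,
        Err(_) => 2,
    }
}

fn main() {
    assert_eq!(exit_code(&Ok(vec!["192.0.2.1".into()])), 0);
    assert_eq!(exit_code(&Ok(vec![])), 1);
    assert_eq!(exit_code(&Err("connection refused".into())), 2);
    // The real binary would end with std::process::exit(code as i32)
    // or return std::process::ExitCode::from(code).
}
```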

Tradeoffs (a.k.a. things I deliberately didn't ship)

This is v1, and I left the following on the floor:

  • No DNSSEC validation. hickory supports it, but the API surface is non-trivial and validating signatures correctly requires a trust anchor and a chain walk that I don't want to handle in 500 lines. Use delv (the successor to dig's since-removed +sigchase option) if you actually need this.
  • No reverse DNS (-x). Easy to add β€” the code is addr.in-addr.arpa with a PTR query β€” but I haven't because I rarely use it interactively and I wanted v1 to be small.
  • No EDNS Client Subnet, no +trace, no zone transfers. Same reason. These are debugging features for DNS operators, and that's not the audience for dns-lookup. I want this tool to be the one a backend engineer reaches for when they're trying to figure out why their service can't reach the database.
  • DoH against an arbitrary --resolver IP falls back to TCP/53. As discussed above, the TLS handshake needs a server name we don't have. Use --protocol https without --resolver and you get real DoH against Cloudflare.
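For the reverse-DNS bullet, the name construction really is that small. A sketch for the IPv4 case, assuming a hypothetical helper name (IPv6 uses nibble-reversed ip6.arpa and is slightly longer):

```rust
use std::net::Ipv4Addr;

// IPv4 PTR lookups reverse the octets and append in-addr.arpa; the
// resulting name is then queried with RecordType::PTR.
fn ptr_name_v4(ip: Ipv4Addr) -> String {
    let [a, b, c, d] = ip.octets();
    format!("{d}.{c}.{b}.{a}.in-addr.arpa.")
}

fn main() {
    assert_eq!(
        ptr_name_v4(Ipv4Addr::new(192, 0, 2, 1)),
        "1.2.0.192.in-addr.arpa."
    );
}
```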

Try it in 30 seconds

docker run --rm ghcr.io/sen-ltd/dns-lookup example.com --type ALL --format json | jq .

Or build from source: Rust compiles in 17 seconds in the Alpine builder image, the runtime stage is alpine:3.20 with ca-certificates, and the final image is 11 MB.

git clone https://github.com/sen-ltd/dns-lookup
cd dns-lookup
cargo test                # 23 offline tests, no network
cargo build --release
./target/release/dns-lookup example.com --type MX

The whole codebase is five files: cli.rs (clap definitions), resolver.rs (the hickory wrapper and the Record decoder), formatter.rs (the three output modes plus all the unit tests), color.rs (a tiny no-deps ANSI palette), and main.rs (60 lines of wiring). If you want to learn how DNS records actually look in a typed language, or how to draw a hard line between an external library's data model and your own, I think it's a useful read.

The interesting takeaway for me was that hickory has gotten very good. I expected to wrestle with byte buffers; I wrote a match against an RData enum and went home. If you've been avoiding the project because you remember trust-dns from 2019, or because you assumed pure-Rust DNS would be a research project, both of those things are no longer true.
