SEN LLC

Posted on Apr 19

A Cross-OS Port Finder in Rust — One CLI, Three Completely Different Data Formats

#rust #cli #linux #networking

A tiny Rust CLI that answers "who is holding port 3000?" on macOS, Linux and Windows with the same flags and the same output shape, and optionally kills the offender. 488 KB binary, 42 tests, and no external crates beyond clap, serde and serde_json.

npm run dev dies with EADDRINUSE: address already in use :::3000 and once again there's a zombie Node sitting on the port. On macOS the incantation is lsof -nP -iTCP:3000 -sTCP:LISTEN. On Linux it's ss -Hlntp "( sport = :3000 )" — unless it's Alpine, in which case ss may not have -p. On Windows it's netstat -ano | findstr :3000 followed by tasklist /FI "PID eq ..." to get the process name.

Three operating systems, three completely different commands, three different output shapes. If you ship software that runs on all three — and "ship" here includes "SSH into customer boxes for debugging" — you end up re-learning one of those spells every couple of weeks.

I wanted a single tool:

Same command on every OS. port-finder 3000 works on macOS, Linux and Windows.
Same output shape. Fixed five columns: PORT / PID / COMMAND / USER / ADDRESS.
Don't lie about dual-stack. A process bound to both 0.0.0.0:3000 and [::]:3000 is two sockets, not one — show both rows.
One-shot kill. --kill (SIGTERM) or --force (SIGKILL) without pulling out a second command.
--json for scripts. Anything I build has a monitoring / automation path.
cargo install-able static binary. No C dependency, no OpenSSL.

Writing it turned out to be more interesting than I expected. Each OS exposes "who is listening on this socket" in a genuinely different shape — not just different flags, but different data formats. The three parsers inside port-finder have almost nothing in common.

GitHub: https://github.com/sen-ltd/port-finder

The surface

# single port
$ port-finder 3000

# several
$ port-finder 3000 8080 5432

# range
$ port-finder 3000-3100

# show everything
$ port-finder

# kill after showing
$ port-finder 3000 --kill          # SIGTERM
$ port-finder 3000 --force         # SIGKILL

# into a pipeline
$ port-finder --json 8080 | jq '.listeners[] | {pid, command}'
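The three argument forms above (single port, several ports, a range) can all be reduced to one sorted, de-duplicated list before any backend runs. The following is a minimal sketch of that expansion, not the tool's actual code; the function name `expand_port_specs` and the error strings are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Expand CLI arguments such as "3000" or "3000-3100" into a sorted,
/// de-duplicated port list. Hypothetical helper, not taken from port-finder.
fn expand_port_specs(specs: &[&str]) -> Result<Vec<u16>, String> {
    let mut ports = BTreeSet::new();
    for spec in specs {
        match spec.split_once('-') {
            // "lo-hi" form: both ends must be valid u16 values, lo <= hi.
            Some((lo, hi)) => {
                let lo: u16 = lo.parse().map_err(|_| format!("bad port: {spec}"))?;
                let hi: u16 = hi.parse().map_err(|_| format!("bad port: {spec}"))?;
                if lo > hi {
                    return Err(format!("empty range: {spec}"));
                }
                ports.extend(lo..=hi);
            }
            // Single-port form.
            None => {
                let port: u16 = spec.parse().map_err(|_| format!("bad port: {spec}"))?;
                ports.insert(port);
            }
        }
    }
    Ok(ports.into_iter().collect())
}
```

Parsing into `u16` rejects out-of-range values such as 70000 for free, and the `BTreeSet` handles the case where a range and a single port overlap.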

Exit codes are three-valued:

Code	Meaning
`0`	At least one listener matched (or no ports were given).
`1`	The ports you asked about had no listeners.
`2`	Bad arguments, command failure, or a `--kill` that didn't land.
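Read as a decision rule, the table comes down to three inputs: whether anything failed, how many ports were requested, and how many listeners matched. A small sketch under that reading (the function name and parameters are hypothetical, not taken from the tool's source):

```rust
/// Map the outcome of a run to the exit codes in the table above.
/// `had_error` covers bad arguments, command failures and failed kills.
fn exit_code(ports_requested: usize, listeners_found: usize, had_error: bool) -> i32 {
    if had_error {
        2
    } else if ports_requested == 0 || listeners_found > 0 {
        0
    } else {
        1
    }
}
```

Checking the error case first means a failed `--kill` reports 2 even though a listener was found.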

Three operating systems, three data formats

The fun starts here. Each OS has a completely different way of exposing socket → process attribution. port-finder has one backend per platform, selected at build time via #[cfg(target_os = ...)], but they all land in the same Vec<Listener>.

1. macOS — `lsof -F`, the "one field per line" format

macOS ships lsof. Called plainly, it's column-aligned — which breaks as soon as a command name has a space in it (Code Helper (Plugin), for one real-world example). Use -F to get one field per line instead:

$ lsof -nP -iTCP -sTCP:LISTEN -F pcLnt
p1103
crapportd
Lme
f12
tIPv4
n*:57768
f14
tIPv6
n*:57768

Each line starts with a single-letter tag:

Tag	Meaning	Scope
`p`	PID — starts a new process block	process
`c`	command name	process
`L`	login name (user)	process
`f`	fd — starts a new file block	file
`t`	file type (`IPv4` or `IPv6`)	file
`n`	name — for sockets, `addr:port`	file

The parser is a tiny state machine. Track the current (pid, command, user) at the process level and the current address family at the file level. Emit a Listener whenever you see an n. Reset file-level state at every f, reset everything at every p.

for raw in input.lines() {
    // Skip blank lines and lines without a one-byte tag; split_at(1) would panic on them.
    if !raw.is_char_boundary(1) { continue; }
    let (tag, value) = raw.split_at(1);
    match tag {
        "p" => { pid = Some(value.parse()?); command = None; user = None; ipv6 = false; }
        "c" => command = Some(value.into()),
        "L" => user    = Some(value.into()),
        "f" => ipv6    = false,                 // new fd → family unknown until `t` arrives
        "t" => ipv6    = value == "IPv6",
        "n" => { /* split `addr:port`, push a Listener */ }
        _   => {}
    }
}

The dual-stack trap lives here. lsof reports the IPv4 socket and the IPv6 socket of a dual-bound process with identical strings (*:57768 for both). Without the t marker you lose the distinction. The fix is to rewrite the * using the family we just tracked:

let address = match addr {
    "*" if ipv6 => "[::]".to_string(),
    "*"         => "0.0.0.0".to_string(),
    other       => other.to_string(),
};

Skip that step and you spend an evening wondering why killing "the one process on port 57768" doesn't free the port — because there were two sockets, bound to separate address families, and you only killed one of them.
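Putting the state machine and the wildcard rewrite together, a self-contained version might look like the sketch below. It is an illustration rather than the repository's parser: the `Listener` fields and the choice to silently skip malformed lines are assumptions.

```rust
#[derive(Debug, PartialEq)]
struct Listener {
    pid: u32,
    command: String,
    user: String,
    address: String,
    port: u16,
}

/// Parse `lsof -F pcLnt` output into listeners (illustrative sketch).
fn parse_lsof(input: &str) -> Vec<Listener> {
    let mut out = Vec::new();
    let (mut pid, mut command, mut user) = (None::<u32>, String::new(), String::new());
    let mut ipv6 = false;
    for raw in input.lines() {
        let Some(tag) = raw.chars().next() else { continue };
        let value = &raw[tag.len_utf8()..];
        match tag {
            // A new process block resets everything.
            'p' => { pid = value.parse().ok(); command.clear(); user.clear(); ipv6 = false; }
            'c' => command = value.to_string(),
            'L' => user = value.to_string(),
            // A new fd resets only the file-level state.
            'f' => ipv6 = false,
            't' => ipv6 = value == "IPv6",
            'n' => {
                let (Some(pid), Some((addr, port))) = (pid, value.rsplit_once(':')) else { continue };
                let Ok(port) = port.parse::<u16>() else { continue };
                // Rewrite the wildcard using the family tracked from the `t` line.
                let address = match addr {
                    "*" if ipv6 => "[::]".to_string(),
                    "*" => "0.0.0.0".to_string(),
                    other => other.to_string(),
                };
                out.push(Listener { pid, command: command.clone(), user: user.clone(), address, port });
            }
            _ => {}
        }
    }
    out
}
```

Fed the sample output shown earlier, this yields two rows for PID 1103, one with address `0.0.0.0` and one with `[::]`, both on port 57768.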

2. Linux — `/proc/net/tcp` hex and byte order

Linux shells out to nothing. Everything lives in /proc/net/tcp and /proc/net/tcp6:

  sl  local_address rem_address   st ... uid ... inode
   0: 00000000:0BB8 00000000:0000 0A ...  1000 ... 987654 ...

There are three small traps and one larger design question.

Trap 1: Filter on state.

The st column's 0A is TCP_LISTEN (defined in include/net/tcp_states.h). Without the filter, your result set includes ESTABLISHED, TIME_WAIT and everything else.

Trap 2: The address is a __be32 printed with %X.

On a little-endian host, that prints the bytes reversed. 0100007F is 127.0.0.1:

u32::from_str_radix("0100007F", 16) → 0x0100007F (host-order u32)
.to_le_bytes() → [0x7F, 0x00, 0x00, 0x01] ← the original network-order bytes

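The key move is .to_le_bytes(), not Ipv4Addr::from(u32). The From<u32> impl for Ipv4Addr takes the most significant byte of the integer as the first octet, so Ipv4Addr::from(0x0100007F) yields 1.0.0.127, the address reversed. (On a big-endian host the kernel's %X output is already in network order; .to_ne_bytes() covers both cases when the parser runs on the machine that produced the file.)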

let word = u32::from_str_radix(addr_hex, 16)?;
Ipv4Addr::from(word.to_le_bytes())          // not Ipv4Addr::from(word)

IPv6 is the same trick four times: 32 hex chars → four 8-char chunks → four u32s → four .to_le_bytes() calls → 16 bytes → Ipv6Addr::from([u8; 16]).
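Both decoders fit in a few lines. This is a sketch assuming the file came from a little-endian host, as the text above does; the function names `decode_ipv4` and `decode_ipv6` are made up for the example.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Decode the 8-hex-digit address column of /proc/net/tcp (little-endian host).
fn decode_ipv4(hex: &str) -> Option<Ipv4Addr> {
    if hex.len() != 8 {
        return None;
    }
    let word = u32::from_str_radix(hex, 16).ok()?;
    // to_le_bytes() recovers the original network-order bytes.
    Some(Ipv4Addr::from(word.to_le_bytes()))
}

/// Decode the 32-hex-digit address column of /proc/net/tcp6:
/// four u32 words, each byte-swapped independently.
fn decode_ipv6(hex: &str) -> Option<Ipv6Addr> {
    if hex.len() != 32 || !hex.is_ascii() {
        return None;
    }
    let mut bytes = [0u8; 16];
    for (i, chunk) in bytes.chunks_exact_mut(4).enumerate() {
        let word = u32::from_str_radix(&hex[i * 8..i * 8 + 8], 16).ok()?;
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    Some(Ipv6Addr::from(bytes))
}
```

For example, `decode_ipv4("0100007F")` gives 127.0.0.1, and the tcp6 loopback entry `00000000000000000000000001000000` decodes to `::1`, because only the last word is non-zero and its bytes come out as `[0, 0, 0, 1]`.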

Trap 3: The port is not reversed.

This one bit me on a previous project and I was ready for it this time. The kernel runs the port through ntohs(inet->inet_sport) before printing it (see get_tcp4_sock in net/ipv4/tcp_ipv4.c), so it comes out in host byte order already:

let port = u16::from_str_radix(port_hex, 16)?;   // just parse

0x0BB8 is 3000. If you "helpfully" swap bytes here, you get 0xB80B = 47115, and absolutely nothing matches.
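The three traps combine into one row parser. The sketch below assumes the usual whitespace-separated column layout of /proc/net/tcp, with `st` at index 3, `uid` at index 7 and `inode` at index 9; the `TcpRow` struct and the function name are illustrative, not the tool's types.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
struct TcpRow {
    address: Ipv4Addr,
    port: u16,
    uid: u32,
    inode: u64,
}

/// Parse one /proc/net/tcp line, keeping only sockets in the LISTEN state.
fn parse_tcp4_row(line: &str) -> Option<TcpRow> {
    let cols: Vec<&str> = line.split_whitespace().collect();
    // Trap 1: 0A is TCP_LISTEN. The header line fails this check too.
    if cols.len() < 10 || cols[3] != "0A" {
        return None;
    }
    let (addr_hex, port_hex) = cols[1].split_once(':')?;
    // Trap 2: the address needs its bytes swapped back (little-endian host).
    let word = u32::from_str_radix(addr_hex, 16).ok()?;
    Some(TcpRow {
        address: Ipv4Addr::from(word.to_le_bytes()),
        // Trap 3: the port is already in host order, so just parse it.
        port: u16::from_str_radix(port_hex, 16).ok()?,
        uid: cols[7].parse().ok()?,
        inode: cols[9].parse().ok()?,
    })
}
```

Returning `Option` keeps the caller simple: `lines().filter_map(parse_tcp4_row)` drops the header, non-LISTEN rows and anything malformed in one pass.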

Design question: inode → PID attribution.

/proc/net/tcp gives you a socket inode. It does not tell you which process owns that socket. You get the mapping by walking /proc/[0-9]+/fd/*, readlink()-ing each entry, and matching the socket:[<inode>] form:

fn scan_proc_sockets() -> HashMap<u64, u32> {
    let mut map = HashMap::new();
    // This function does not return a Result, so `?` is not available:
    // an unreadable /proc simply yields an empty map.
    let Ok(entries) = std::fs::read_dir("/proc") else { return map };
    for entry in entries.flatten() {
        let Ok(pid) = entry.file_name().to_string_lossy().parse::<u32>() else { continue };
        let Ok(fds) = std::fs::read_dir(format!("/proc/{pid}/fd")) else { continue };
        for fd in fds.flatten() {
            let Ok(link) = std::fs::read_link(fd.path()) else { continue };
            if let Some(inode) = extract_socket_inode(&link.to_string_lossy()) {
                map.entry(inode).or_insert(pid);
            }
        }
    }
    map
}
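The helper `extract_socket_inode` is not shown above. One plausible shape for it, assuming it takes the readlink target as a string and returns the inode only for the `socket:[<inode>]` form:

```rust
/// Extract the inode from a /proc/<pid>/fd link target such as "socket:[987654]".
/// Returns None for pipes, regular files and anything else.
fn extract_socket_inode(link: &str) -> Option<u64> {
    link.strip_prefix("socket:[")?
        .strip_suffix(']')?
        .parse()
        .ok()
}
```

Using `strip_prefix` and `strip_suffix` avoids manual index arithmetic, and the final `parse` rejects an empty or non-numeric body.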

Running without sudo means you can only read your own processes' fd directories, so inodes owned by other users may appear orphaned. lsof has the exact same constraint — this isn't a deficiency of the approach, it's how the kernel exposes the data.

Command name comes from /proc/<pid>/comm. The user name resolves by parsing /etc/passwd for the UID that was right there on the original row. Everything is stdlib.
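The UID lookup can be sketched as a one-time parse of the passwd file into a map. The function name `parse_passwd` is hypothetical, and taking the file contents as a string (rather than reading /etc/passwd inside the function) is an assumption that keeps it testable with fixtures.

```rust
use std::collections::HashMap;

/// Build a uid → login name map from passwd-format text
/// (colon-separated fields: name, password, uid, ...).
fn parse_passwd(contents: &str) -> HashMap<u32, String> {
    let mut users = HashMap::new();
    for line in contents.lines() {
        if line.starts_with('#') {
            continue;
        }
        let mut fields = line.split(':');
        let (Some(name), Some(_password), Some(uid)) =
            (fields.next(), fields.next(), fields.next())
        else {
            continue;
        };
        if let Ok(uid) = uid.parse::<u32>() {
            // Keep the first name when several entries share a uid.
            users.entry(uid).or_insert_with(|| name.to_string());
        }
    }
    users
}
```

Accounts that exist only in LDAP or another NSS source never appear in /etc/passwd, so a lookup can miss; falling back to printing the numeric UID is one reasonable way to handle that.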

3. Windows — `netstat` + `tasklist`, two passes

Windows has neither lsof nor /proc, so we combine the output of two built-ins:

> netstat -ano -p TCP
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       964
  TCP    [::]:3000              [::]:0                 LISTENING       12345

Trap: the IPv6 literal's internal colons.

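127.0.0.1:3000 and [::1]:3000 share one parser. A naïve split(':') or rsplit(':') tears the IPv6 address apart at every colon. rsplit_once(':') copes with well-formed input, but it would misread a bare [::1] with no port as address "[::" and port "1]". Tracking bracket depth rejects that case. Split on the last colon that sits outside brackets: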

pub fn split_address(s: &str) -> Option<(&str, &str)> {
    let mut depth = 0i32;
    let mut last = None;
    for (i, ch) in s.char_indices() {
        match ch {
            '[' => depth += 1,
            ']' => depth -= 1,
            ':' if depth == 0 => last = Some(i),
            _ => {}
        }
    }
    let i = last?;
    Some((&s[..i], &s[i + 1..]))
}
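With the address splitter in place, a netstat row reduces to five whitespace-separated columns. The sketch below repeats `split_address` so it stands alone; `NetstatRow` and `parse_netstat_line` are illustrative names, and matching on the literal `LISTENING` assumes English-locale output.

```rust
fn split_address(s: &str) -> Option<(&str, &str)> {
    let mut depth = 0i32;
    let mut last = None;
    for (i, ch) in s.char_indices() {
        match ch {
            '[' => depth += 1,
            ']' => depth -= 1,
            ':' if depth == 0 => last = Some(i),
            _ => {}
        }
    }
    let i = last?;
    Some((&s[..i], &s[i + 1..]))
}

#[derive(Debug, PartialEq)]
struct NetstatRow {
    address: String,
    port: u16,
    pid: u32,
}

/// Parse one `netstat -ano` TCP row; returns None for headers and non-listeners.
fn parse_netstat_line(line: &str) -> Option<NetstatRow> {
    // Columns: proto, local address, foreign address, state, PID.
    let cols: Vec<&str> = line.split_whitespace().collect();
    if cols.len() != 5 || cols[0] != "TCP" || cols[3] != "LISTENING" {
        return None;
    }
    let (addr, port) = split_address(cols[1])?;
    Some(NetstatRow {
        address: addr.to_string(),
        port: port.parse().ok()?,
        pid: cols[4].parse().ok()?,
    })
}
```

The `[::]:3000` row from the sample parses to address `[::]`, port 3000, PID 12345, while `split_address("[::1]")` returns `None` because no colon sits outside the brackets.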

Then tasklist maps PID → image name:

> tasklist /FO CSV /NH
"System","4","Services","0","136 K"
"node.exe","12345","Console","1","100,032 K"

Trap: the memory column embeds commas.

The last column, "Mem Usage", formats as 100,032 K. A naïve split(',') turns that five-column row into six fields, leaves the surrounding quotes attached to every value, and would shift the PID column as soon as an image name contained a comma. Real CSV parsing is the only answer:

fn split_csv_row(line: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        match ch {
            '"' => {
                if in_quotes && chars.peek() == Some(&'"') {
                    cur.push('"'); chars.next();      // `""` inside a quoted field is a literal quote
                } else {
                    in_quotes = !in_quotes;
                }
            }
            ',' if !in_quotes => out.push(std::mem::take(&mut cur)),
            _ => cur.push(ch),
        }
    }
    out.push(cur);
    out
}

Twenty lines. Handles the "" escape rule for good measure, so the parser doesn't fall over if some future column ever embeds a quote.
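To close the loop, the CSV rows become a PID → image name map that the netstat rows can be joined against. The sketch repeats `split_csv_row` so it compiles on its own; `parse_tasklist` is a hypothetical name, and it assumes the `/FO CSV /NH` layout shown above with the image name in column 0 and the PID in column 1.

```rust
use std::collections::HashMap;

fn split_csv_row(line: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        match ch {
            '"' => {
                if in_quotes && chars.peek() == Some(&'"') {
                    cur.push('"');
                    chars.next(); // `""` inside a quoted field is a literal quote
                } else {
                    in_quotes = !in_quotes;
                }
            }
            ',' if !in_quotes => out.push(std::mem::take(&mut cur)),
            _ => cur.push(ch),
        }
    }
    out.push(cur);
    out
}

/// Build a PID → image name map from `tasklist /FO CSV /NH` output.
fn parse_tasklist(output: &str) -> HashMap<u32, String> {
    let mut names = HashMap::new();
    for line in output.lines() {
        let cols = split_csv_row(line.trim());
        if cols.len() < 2 {
            continue;
        }
        // Rows whose second column is not a number are skipped.
        if let Ok(pid) = cols[1].parse::<u32>() {
            names.insert(pid, cols[0].clone());
        }
    }
    names
}
```

On the `node.exe` row, `split_csv_row` returns five fields with `100,032 K` intact as the last one, where `split(',')` would have produced six.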

Testing all three backends from anywhere

Three backends means CI gets awkward. A GitHub Actions Ubuntu runner can't exercise the macOS parser — unless you structure the code so it can.

The pattern: expose the parser as a pub fn parse(input: &str, ...) -> Result<...> that takes a plain string. Keep the live command invocation inside a separate #[cfg(target_os = "...")] function. Now any host can test any parser against fixture strings pulled from real output:

// src/linux.rs
pub fn parse(tcp: &str, tcp6: &str, ports: &[u16]) -> Result<Vec<TcpEntry>, Error> { ... }

#[cfg(target_os = "linux")]
pub fn find(ports: &[u16]) -> Result<Vec<Listener>, Error> {
    let tcp  = std::fs::read_to_string("/proc/net/tcp").unwrap_or_default();
    let tcp6 = std::fs::read_to_string("/proc/net/tcp6").unwrap_or_default();
    parse(&tcp, &tcp6, ports).map(/* + inode → pid resolution */)
}

#[cfg(not(target_os = "linux"))]
pub fn find(_: &[u16]) -> Result<Vec<Listener>, Error> { Err(Error::Unsupported) }

Fixtures are straight excerpts from real /proc/net/tcp, lsof -F, netstat -ano runs:

// Row 1 is ESTABLISHED (st = 01) and must be skipped by the parser.
const TCP4: &str = "\
   0: 00000000:0BB8 00000000:0000 0A ...  1000 ... 987654 ...
   1: 0100007F:1F90 0100007F:C442 01 ...     0 ... 111111 ...
   2: 0100007F:0050 00000000:0000 0A ...     0 ... 222222 ...
";

#[test]
fn listen_rows_only() {
    let got = parse(TCP4, "", &[]).unwrap();
    assert_eq!(got.len(), 2);         // the ESTABLISHED row at index 1 is excluded
    assert_eq!(got[0].port, 3000);
    assert_eq!(got[0].address, "0.0.0.0");
}

This test runs on macOS, on Windows under CI, anywhere. Live verification of the #[cfg(target_os)] paths still needs a real machine for each OS — I run cargo test && ./target/release/port-finder on a macOS laptop, a Linux EC2 box, and a Windows VM — but 95 % of the logic is exercised by the portable fixtures.

`--kill` is just shelling out

Killing is OS-specific in detail but trivial to shell out for. On Unix, kill -15 <pid> (or -9 with --force); on Windows, taskkill /PID <pid>, with /F added for --force. No need for libc::kill, no need for an extra crate:

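pub fn kill_pid(pid: u32, force: bool) -> Result<(), Error> {
    #[cfg(unix)]
    let status = {
        let sig = if force { "-9" } else { "-15" };
        Command::new("kill").arg(sig).arg(pid.to_string()).status()?
    };
    #[cfg(windows)]
    let status = {
        let mut cmd = Command::new("taskkill");
        cmd.arg("/PID").arg(pid.to_string());
        if force { cmd.arg("/F"); }
        cmd.status()?
    };
    // `?` only catches a failure to spawn. A kill that ran but did not land
    // (no such PID, permission denied) shows up as a non-zero exit status.
    // Relies on the same From<io::Error> conversion that `?` already uses.
    if !status.success() {
        let msg = format!("could not kill PID {pid}");
        return Err(std::io::Error::new(std::io::ErrorKind::Other, msg).into());
    }
    Ok(())
}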

port-finder 3000 --kill always prints the table before killing — you see what you're about to kill — and then de-duplicates PIDs before sending signals, so a process bound to 0.0.0.0:3000 and [::]:3000 gets one signal, not two.
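The de-duplication step is small enough to show in full. This is a sketch that keeps first-seen order so signals go out in the same order as the table rows; the function name `unique_pids` is an assumption, not the tool's API.

```rust
use std::collections::HashSet;

/// Collapse repeated PIDs while preserving the order in which they first appear.
fn unique_pids(pids: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    pids.iter()
        .copied()
        // HashSet::insert returns false for a value it already holds.
        .filter(|pid| seen.insert(*pid))
        .collect()
}
```

For a dual-stack listener that shows up as two rows with PID 12345, plus a second process with PID 964, `unique_pids(&[12345, 12345, 964])` returns `[12345, 964]`.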

Tests

Kind	Count	Covers
unit (macos)	7	lsof `-F` parser, IPv4/IPv6 wildcard rewrite, fd-boundary state reset
unit (linux)	10	`/proc/net/tcp` parser, hex decoding, LISTEN filter, inode extraction, uid resolution
unit (windows)	7	netstat column parser, bracketed IPv6, tasklist CSV (commas, `""` escape)
unit (render)	6	table-width computation, USER column auto-drop, JSON shape, null-user handling
unit (main)	5	port range expansion, dedup, bad-value rejection
CLI integration	6	`--help` / `--version` / invalid input / JSON wellformedness

42 tests, sub-second runtime. Everything is static fixtures and port-spec arithmetic — no /proc, no sockets, no containers required.

test result: ok. 31 passed (lib)
test result: ok. 5 passed (main)
test result: ok. 6 passed (cli integration)

Release profile

The usual Rust size-squeeze:

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

Three deps (clap, serde, serde_json), no TLS, no crypto, no C libraries. macOS (arm64) comes out to 488 KB. Cheap enough to cargo install --path . into every toolbox you've got.

Wrap

port-finder is a tiny tool for a tiny question — "who is holding this port?" — but writing it surfaces three genuinely different worlds of data: lsof's tag-per-line fields, /proc/net/tcp's host-endian hex, and netstat's column output paired with tasklist's CSV-with-commas. The fact that "the same thing" is stored in three such different shapes is operating-system history written directly into the API surface.

Cross-platform CLIs are small pieces of unification layered over a lot of legacy. The 488 KB binary that falls out of cargo build --release carries all three decoders and answers a single question consistently. That's a trade I'll take.

Next time you yell "who took port 3000?" — try it out.
