Mathieu Kerjouan

Posted on Jun 18

Packet Filtering with nftables on Linux

#linux #firewall #sysadmin #nftables

iptables is probably the firewall most used by sysadmins in the Linux ecosystem. The most recent wave of Linux admins also uses ufw, mostly because of Ubuntu. Coming from the BSD world, I was using ipfw (on FreeBSD) and packet-filter (on FreeBSD and OpenBSD), and I hated iptables for its confusing interface. Time passed, and nftables came along as a stable alternative to iptables. So, let's have some fun with it.

This post is not about NAT or packet forwarding. Those topics will require another publication due to their complexity. Furthermore, most sysadmins configure a firewall for servers (bare-metal) or cloud instances (virtual machines). Sometimes a firewall is also used for containers, but most of the time administrators rely on the configuration automatically generated by Docker, for example. In short: this post is more about filtering than translating packets.

Bootstrapping

It's rare to see nftables installed by default on any Linux distribution. Usually, the kernel is already compiled with the requirements, but the userland tools are not there. On Debian-like distributions, the nftables package can easily be installed with the apt-get command.

$ sudo apt-get install nftables

The same can be done on your favorite Linux distribution with the help of the specific package manager like on AlpineLinux, Gentoo, ArchLinux and so on.

For the rest of this post, I will assume you are using Debian 13 (Trixie), which is, along with Ubuntu, probably one of the most used Linux distributions. Even if you are using another distribution, the userland should be quite similar. All commands will also be executed as root to avoid adding sudo or doas every time.

Command Line Interface

The main userland tool to manage nftables is called nft. This firewall is dynamic, and any kind of rules can be added or removed even when the firewall is enabled and active.

# nft --help
Usage: nft [ options ] [ cmds... ]

Options (general):
  -h, --help                      Show this help
  -v, --version                   Show version information
  -V                              Show extended version information

Options (ruleset input handling):
  -f, --file <filename>           Read input from <filename>
  -D, --define <name=value>       Define variable, e.g. --define foo=1.2.3.4
  -i, --interactive               Read input from interactive CLI
  -I, --includepath <directory>   Add <directory> to the paths searched for include files. Default is: /etc
  -c, --check                     Check commands validity without actually applying the changes.
  -o, --optimize                  Optimize ruleset

Options (ruleset list formatting):
  -a, --handle                    Output rule handle.
  -s, --stateless                 Omit stateful information of ruleset.
  -t, --terse                     Omit contents of sets.
  -S, --service                   Translate ports to service names as described in /etc/services.
  -N, --reversedns                Translate IP addresses to names.
  -u, --guid                      Print UID/GID as defined in /etc/passwd and /etc/group.
  -n, --numeric                   Print fully numerical output.
  -y, --numeric-priority          Print chain priority numerically.
  -p, --numeric-protocol          Print layer 4 protocols numerically.
  -T, --numeric-time              Print time values numerically.

Options (command output formatting):
  -e, --echo                      Echo what has been added, inserted or replaced.
  -j, --json                      Format output in JSON
  -d, --debug <level [,level...]> Specify debugging level (scanner, parser, eval, netlink, mnl, proto-ctx, segtree, all)

I personally think editing firewall rules like iptables or nftables directly from the command line can be dangerous. In case of a mistake, one can easily lose access to the server, but at least, if the rule was not stored in the configuration file, a simple reboot of the machine should fix the issue. Unfortunately, it's also easier to make mistakes when using the CLI, especially if the rules are complex. One should always be careful and read the command more than once before executing it.

It is also good practice to use the -c (check) flag, which validates a command without applying it, before altering the loaded configuration. Before starting our journey with nftables, we will flush the whole configured ruleset and recreate something from scratch. If you are using nftables right now, you should probably save your configuration first, but I assume that if you are here, it's probably the first time you are using it, so the ruleset should already be empty.

# nft flush ruleset

More than one table can be created, but here we will try to recreate the default permissive ruleset from scratch, using only the command line. A table is configured for an address family: a table created with the family ip will only see IPv4 packets and nothing else. If you want to deal with IPv4 and IPv6 at the same time, the inet family was made for you. At this time, 6 families exist:

ip: catch only IPv4 packets;
ip6: catch only IPv6 packets;
inet: catch both IPv4 and IPv6 packets (dual stack);
arp: catch IPv4 ARP packets;
bridge: catch packets from a bridge;
netdev: catch packets on ingress and egress.

To create a new table, the command nft add table ${family} ${table_name} will be used. To list its content (or check if the table already exists), nft list table ${family} ${table_name} or nft list ruleset can be used.

# nft add table inet filter 

# nft list ruleset
table inet filter {
}

This new table is the only one on our system, but a table without chains does nothing. A chain contains the firewall rules to apply to packets, and more than one chain can be created inside a table. A chain attached to the network stack (a base chain) is also defined by its type, its hook and its priority; all these parameters are mandatory. The only type we will study today is the filter type, mostly used to do filtering actions on packets.

A hook defines the flow of packets, for example, a chain set with the input hook will catch only incoming packets, a chain set with the output hook will catch only outgoing packets and finally a chain with the forward hook will catch only the packets being forwarded.

Finally, the priority parameter defines the order of evaluation when other chains are attached to the same hook. Indeed, when an incoming packet is caught by nftables, the chains registered on the matching hook are traversed in ascending priority order, lowest value first.

The syntax here is a bit odd, but it is required. The parameters of the chain must be passed inside curly brackets, which the shell may interpret itself (the semicolons in particular). To avoid this situation, the command should be passed as a single string and not as a list of arguments. In short: you need single or double quotes around it. The command used to create a chain is nft add chain ${family} ${table_name} ${chain_name} { $parameters; }. Let's create our first input chain.

# nft 'add chain inet filter input {type filter hook input priority filter; policy accept;}'

# nft list chain inet filter input
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }
}
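To make the priority parameter more concrete, here is a minimal sketch. It assumes a scratch table named prio_demo and chain names early and late, all hypothetical and only there for illustration. The named priority filter corresponds to the numeric value 0, so a chain registered with -10 on the same hook is evaluated first. The file is only validated with -c, never loaded.

```shell
# Scratch file for the demo; any path works.
conf=$(mktemp)
cat > "$conf" <<'EOF'
table inet prio_demo {
  # Lower value: evaluated first on the input hook.
  chain early {
    type filter hook input priority -10; policy accept;
    counter
  }
  # "filter" is the named priority for 0: evaluated after "early".
  chain late {
    type filter hook input priority filter; policy accept;
    counter
  }
}
EOF

# Validate only; nothing is applied to the live ruleset.
if command -v nft >/dev/null 2>&1; then
  nft -c -f "$conf" || echo "check failed (syntax error, or not root?)" >&2
fi
```

A packet accepted in early still goes through late, because an accept verdict only ends the evaluation of the current base chain; a drop verdict, on the other hand, is final.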

Great, next, we can create our output chain using the same process.

# nft 'add chain inet filter output {type filter hook output priority filter; policy accept;}'

# nft list chain inet filter output
table inet filter {
        chain output {
                type filter hook output priority filter; policy accept;
        }
}

Finally, we can create our forward chain. We will not use it in this article, but it can be required by some tools to work correctly, like Docker. Again, practically the same command.

# nft 'add chain inet filter forward {type filter hook forward priority filter; policy accept;}'

# nft list chain inet filter forward
table inet filter {
        chain forward {
                type filter hook forward priority filter; policy accept;
        }
}

If you have checked the commands before executing them, you probably saw the policy parameter. This chain's parameter is used to define the default policy of the chain. When a packet enters the firewall, it can then be accepted (accept) or dropped (drop). The default value is accept, but a serious firewall configuration should use drop instead.

Anyway, our table and chains are now all created on our system, and we can list them with nft -a list ruleset. The -a flag also prints the handle of each chain and rule, which can be used to delete or modify it.

# nft -a list ruleset
table inet filter { # handle 2
        chain input { # handle 1
                type filter hook input priority filter; policy accept;
        }

        chain output { # handle 2
                type filter hook output priority filter; policy accept;
        }

        chain forward { # handle 3
                type filter hook forward priority filter; policy accept;
        }
}

Lucky us, this output can also be used as configuration file! You can simply redirect it to a file, and voilà, you have your firewall configuration saved on your disk. Only the header will be missing, but it can easily be added with an echo or printf command.

# touch nftables_default.conf

# chmod +x nftables_default.conf

# echo '#!/usr/sbin/nft -f' > nftables_default.conf

# echo 'flush ruleset' | tee -a nftables_default.conf

# nft -a list ruleset | tee -a nftables_default.conf

Now that we have a backup of our default permissive firewall configuration, we can play with the rules. If something goes wrong, for example if you are losing your connection, or if you simply want to come back to this milestone, the only command you have to execute is the one below.

# ./nftables_default.conf
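The backup steps above can also be collapsed into one small script. This is only a sketch: the OUT variable is a name introduced here for illustration, and -s (stateless) is used so that counters and other volatile state do not end up in the saved file.

```shell
# Destination file; OUT is a hypothetical override, the default matches the name used above.
out=${OUT:-./nftables_default.conf}

{
  # Header: shebang so the file can be executed, then start from a clean slate.
  printf '%s\n' '#!/usr/sbin/nft -f' 'flush ruleset'
  # Dump the live ruleset without stateful information (requires root).
  nft -s list ruleset || echo "warning: could not dump the ruleset" >&2
} > "$out"

chmod +x "$out"
```

Because everything is written through a single redirection, the header and the ruleset always end up in the same file in the right order.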

As you know, all our chains are permissive, accepting all packets by default. We will keep this policy during the design phase of the rules. Usually, if you have a server running, you are probably using sshd on it to grab a shell. It would be unfortunate to lose access to the server because of a firewall rule. Right? Yeah, so let's add our first rule to explicitly allow ssh connections. The command to add a rule is nft add rule ${family} ${table_name} ${chain_name} ${rule}, where ${rule} is a string made of keywords that match packets and define the action to apply to them. In our case, we want to allow all packets from anywhere to the destination TCP/22 port.

# nft add rule inet filter input tcp dport 22 accept

# nft -a list chain inet filter input
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy accept;
                tcp dport 22 accept # handle 4
        }
}

If you are on a server, you probably have other daemons running, like nginx or httpd (HTTP servers). You probably also want TCP/80 and TCP/443 to be available to everyone, but having a safeguard there, to restrict access for bots for example, can be great. Furthermore, you are lazy, and you don't want to modify and reload your rules every time you restrict access for some wild IP addresses. This is where sets will help us. A set can be created with the nft add set ${family} ${table_name} ${set_name} { params* } command.

# nft 'add set inet filter blackhole {
    type inet_proto;
    flags timeout;
    timeout 60s; 
  }'

The parameters given to a set matter. For example, the blackhole set previously created stores inet_proto values (layer 4 protocols such as TCP or UDP; ports would use the inet_service type), and the timeout flag together with the timeout parameter configures the retention time of each element, in our case 60 seconds. To delete a set, nft delete set ${family} ${table_name} ${set_name} can be used.

# nft 'delete set inet filter blackhole'

Let's create a new set called blackhole4; this one will store IPv4 addresses for 1 hour.

# nft 'add set inet filter blackhole4 {
    type ipv4_addr ; 
    flags timeout; 
    timeout 1h; 
  }'

The specification and the content of a set can be returned via the nft list set ${family} ${table_name} ${set_name} command.

# nft 'list set inet filter blackhole4'
table inet filter {
        set blackhole4 {
                type ipv4_addr
                timeout 1h
        }
}

Unless it is initialized with the elements parameter, a set is empty by default. To add a new element to it, nft add element ${family} ${table_name} ${set_name} { ELEMENT } can be executed. Let's try it out with one IP address.

# nft 'add element inet filter blackhole4 { 1.1.1.1 }'

# nft 'list set inet filter blackhole4'
table inet filter {
        set blackhole4 {
                type ipv4_addr
                timeout 1h
                elements = { 1.1.1.1 expires 59m58s166ms }
        }
}

The same can be done for IPv6.

# nft 'add set inet filter blackhole6 { type ipv6_addr ; flags timeout; timeout 3600s; }'

# nft 'list set inet filter blackhole6'
table inet filter {
        set blackhole6 {
                type ipv6_addr
                timeout 1h
        }
}

If you want to support IPv4 or IPv6 range (1.2.3.4/32 or ffce::/64 for example), the flag interval is required.

# nft 'delete set inet filter blackhole4'

# nft 'add set inet filter blackhole4 {
    type ipv4_addr ;
    flags interval, timeout ;
    timeout 1h ;
  }'

# nft 'delete set inet filter blackhole6'

# nft 'add set inet filter blackhole6 {
    type ipv6_addr ;
    flags interval, timeout ;
    timeout 1h ;
  }'

Now, IP address ranges can be added.

# nft 'add element inet filter blackhole4 { 1.2.3.4/24 }'

# nft 'list set inet filter blackhole4'
table inet filter {
        set blackhole4 {
                type ipv4_addr
                flags interval,timeout
                timeout 1h
                elements = { 1.2.3.0/24 expires 59m57s486ms }
        }
}

Why use sets? They will make your life easier. You can create many of them and add IP addresses directly from the command line. The only drawback here is the retention: if you plan to keep fixed IP addresses in those sets, you can instead use anonymous sets, or configure the elements parameter with the list of persistent elements.
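As a rough illustration of those two alternatives, the sketch below writes a small ruleset to a scratch file and only validates it with -c. The table name demo, the set name trusted4 and the addresses (taken from the documentation ranges) are assumptions made for the example, not part of the ruleset built in this post.

```shell
# Scratch file for the demo; any path works.
conf=$(mktemp)
cat > "$conf" <<'EOF'
table inet demo {
  # Named set with fixed members: no timeout, so entries never expire.
  set trusted4 {
    type ipv4_addr
    flags interval
    elements = { 192.0.2.0/24, 198.51.100.7 }
  }

  chain input {
    type filter hook input priority filter; policy accept;
    ip saddr @trusted4 accept
    # Anonymous set: defined inline, only changed by replacing the rule.
    tcp dport { 80, 443 } accept
  }
}
EOF

# Validate only; nothing is applied to the live ruleset.
if command -v nft >/dev/null 2>&1; then
  nft -c -f "$conf" || echo "check failed (syntax error, or not root?)" >&2
fi
```

A named set can still be updated at runtime with add element and delete element, while an anonymous set is part of the rule itself.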

A rule can use those named sets by prefixing their name with @ in the rule definition. For example, if we want to block all IP addresses from the blackhole4 and blackhole6 sets, we can use the following rules.

# nft 'add rule inet filter input log ip saddr @blackhole4 drop'

# nft 'add rule inet filter input log ip6 saddr @blackhole6 drop'

Note the log statement in these rules: because it comes before the match, every packet that reaches the rule is logged, not only the ones that end up dropped. Then, we can allow HTTP (TCP/80) and HTTPS (TCP/443).

# nft 'add rule inet filter input tcp dport 80 accept'

# nft 'add rule inet filter input tcp dport 443 accept'

Here are the final rules defined in the input chain.

# nft list chain inet filter input
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
                tcp dport 22 accept
                log ip saddr @blackhole4 drop
                log ip6 saddr @blackhole6 drop
                tcp dport 80 accept
                tcp dport 443 accept
        }
}
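In the rules listed above, log is written before the address match, so it fires for every packet that gets that far. A possible variant, sketched here in a scratch table named demo (hypothetical, as is the prefix text), puts the match first so that only blackholed sources are logged. The file is only checked with -c, not loaded.

```shell
# Scratch file for the demo; any path works.
conf=$(mktemp)
cat > "$conf" <<'EOF'
table inet demo {
  set blackhole4 {
    type ipv4_addr
    flags interval,timeout
    timeout 1h
  }

  chain input {
    type filter hook input priority filter; policy accept;
    # Match first, then log, then drop: only matching packets are logged.
    ip saddr @blackhole4 log prefix "blackhole4: " drop
  }
}
EOF

# Validate only; nothing is applied to the live ruleset.
if command -v nft >/dev/null 2>&1; then
  nft -c -f "$conf" || echo "check failed (syntax error, or not root?)" >&2
fi
```

The prefix makes those entries easy to find in the kernel log.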

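If you feel confident enough, you can now switch the default policy to drop. If your services are simply listening on TCP/22, TCP/80 and TCP/443, it will change nothing for them, but all other daemons potentially listening on other ports will no longer be visible from the outside world. Be aware that replies to connections initiated by the server itself will be dropped as well, unless a rule such as ct state established,related accept is added.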

# nft 'add chain inet filter input { policy drop ; }'

# nft 'list chain inet filter input'
table inet filter {
        chain input {
                type filter hook input priority filter; policy drop;
                tcp dport 22 accept
                log ip saddr @blackhole4 drop
                log ip6 saddr @blackhole6 drop
                tcp dport 80 accept
                tcp dport 443 accept
        }
}

Ah! But we did something very bad! By default, we are now blocking everything on all interfaces, including the loopback (lo). It can be a problem! Let's list the chain with its handles to see where a fix can go.

# nft -a 'list chain inet filter input'
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy drop;
                tcp dport 22 accept # handle 19
                log ip saddr @blackhole4 drop # handle 20
                log ip6 saddr @blackhole6 drop # handle 21
                tcp dport 80 accept # handle 22
                tcp dport 443 accept # handle 23
        }
}

We want to add a rule accepting loopback traffic before the SSH rule (handle 19); to do that, the insert command will be used instead of add. To match an interface, the iif or iifname keywords can be used. Here, iif will do the job. iifname is useful when some rules are required for interfaces that do not exist yet (an answer from StackOverflow about the difference between those two keywords is great).

# nft -a 'insert rule inet filter input handle 19 iif lo accept'

# nft -a 'list chain inet filter input'                                                                                                                                                      
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy drop;
                iif "lo" accept # handle 31
                tcp dport 22 accept # handle 19
                log ip saddr @blackhole4 drop # handle 20
                log ip6 saddr @blackhole6 drop # handle 21
                tcp dport 80 accept # handle 22
                tcp dport 443 accept # handle 23
        }
}

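As you can see, the rule has been added before the SSH one, with the handle 31. That's pretty cool, but we would also like to have some statistics regarding the usage of each rule. The counter statement does exactly that, and replace rule lets us rewrite an existing rule in place using its handle.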

# nft 'replace rule inet filter input handle 19 tcp dport 22 counter accept'

# nft -a 'list chain inet filter input'
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy drop;
                iif "lo" accept # handle 31
                tcp dport 22 counter packets 0 bytes 0 accept # handle 19
                log ip saddr @blackhole4 drop # handle 20
                log ip6 saddr @blackhole6 drop # handle 21
                tcp dport 80 accept # handle 22
                tcp dport 443 accept # handle 23
        }
}

# nc -zv localhost 22
Connection to 127.0.0.1 22 port [tcp/ssh] succeeded!

# nft -a 'list chain inet filter input'
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy drop;
                iif "lo" accept # handle 31
                tcp dport 22 counter packets 17 bytes 996 accept # handle 19
                log ip saddr @blackhole4 drop # handle 20
                log ip6 saddr @blackhole6 drop # handle 21
                tcp dport 80 accept # handle 22
                tcp dport 443 accept # handle 23
        }
}

Before deploying this kind of configuration, always test it many times, and don't hesitate to read the documentation or ask questions on mailing lists (the nftables mailing list or NANOG) or forums. Be aware that a wrong firewall configuration can have disastrous effects on your network. If you are happy with your rules, and you want to have them enabled when the system is booting, don't forget to activate the service with systemd on Debian (or with the service manager of your distribution).

# systemctl status nftables

# systemctl enable nftables

# systemctl start nftables

Configuration and Syntax

The main nftables configuration can be found in /etc/nftables.conf on most distributions, but it can also be stored in the /etc/nftables.rules.d directory (e.g. Gentoo). This file contains the firewall rules loaded by nftables at boot time, so it is critical to be sure the rules have been tested first, to avoid being locked out if you are using SSH for example.

distribution	file or directory
Debian-like	`/etc/nftables.conf`
Gentoo	`/etc/nftables.rules`
Gentoo	`/etc/nftables.rules.d`
Alpine	`/etc/nftables.nft`
ArchLinux	`/etc/nftables.conf`

On Debian, and probably on other distributions as well, one can find a few use cases and examples in the /usr/share/doc/nftables/examples directory. It can be helpful if you are starting with nftables and have some trouble understanding how to configure it correctly.

# find /usr/share/doc/nftables/examples/ | sort
/usr/share/doc/nftables/examples/
/usr/share/doc/nftables/examples/all-in-one.nft
/usr/share/doc/nftables/examples/arp-filter.nft
/usr/share/doc/nftables/examples/bridge-filter.nft
/usr/share/doc/nftables/examples/ct_helpers.nft
/usr/share/doc/nftables/examples/inet-filter.nft
/usr/share/doc/nftables/examples/inet-nat.nft
/usr/share/doc/nftables/examples/ipv4-filter.nft
/usr/share/doc/nftables/examples/ipv4-mangle.nft
/usr/share/doc/nftables/examples/ipv4-nat.nft
/usr/share/doc/nftables/examples/ipv4-raw.nft
/usr/share/doc/nftables/examples/ipv6-filter.nft
/usr/share/doc/nftables/examples/ipv6-mangle.nft
/usr/share/doc/nftables/examples/ipv6-nat.nft
/usr/share/doc/nftables/examples/ipv6-raw.nft
/usr/share/doc/nftables/examples/load_balancing.nft
/usr/share/doc/nftables/examples/nat.nft
/usr/share/doc/nftables/examples/netdev-ingress.nft
/usr/share/doc/nftables/examples/overview.nft
/usr/share/doc/nftables/examples/pf.os
/usr/share/doc/nftables/examples/README
/usr/share/doc/nftables/examples/secmark.nft
/usr/share/doc/nftables/examples/sets_and_maps.nft
/usr/share/doc/nftables/examples/sysvinit
/usr/share/doc/nftables/examples/sysvinit/nftables.init
/usr/share/doc/nftables/examples/sysvinit/README
/usr/share/doc/nftables/examples/workstation.nft

Most distributions ship an ultra-permissive ruleset by default: it allows everything from everywhere. Even when activated, the firewall is mostly useless in this case. You can see below the content of /etc/nftables.conf on Debian.

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority filter;
        }
        chain forward {
                type filter hook forward priority filter;
        }
        chain output {
                type filter hook output priority filter;
        }
}

Two methods exist to load this code, the first one is to simply execute the file (note the shebang pointing to the nft command in the first line of the configuration).

# /etc/nftables.conf

The second method is to load it manually with nft and the -f flag followed by the path where the configuration is stored.

# nft -f /etc/nftables.conf

In both cases, the firewall will load the rules, and they can now be displayed with nft.

# nft list ruleset
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}
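Before loading a file for real, the -c flag mentioned earlier can be combined with -f to validate it without touching the live ruleset. The sketch below uses a throwaway file created with mktemp (an assumption of this example; in practice you would point it at your own configuration file).

```shell
# Throwaway candidate configuration; replace with your own file in practice.
conf=$(mktemp)
cat > "$conf" <<'EOF'
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority filter; policy drop;
    iif "lo" accept
    tcp dport 22 accept
  }
}
EOF

# -c parses and validates the file; the flush and the new table are NOT applied.
if command -v nft >/dev/null 2>&1; then
  nft -c -f "$conf" && echo "syntax OK" || echo "check failed (syntax error, or not root?)" >&2
fi
```

Only once the check passes would the same file be loaded with nft -f.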

Using a configuration file containing nftables rules is easier to deal with than hacking and grinding on the command line interface. In fact, I usually prefer designing my firewall this way, to avoid having to manage the annoying index and handle references. Let's re-create our previous rules from the command line section in /tmp/nftables_rules.conf to improve them.

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
  set blackhole4 {
    type ipv4_addr
    flags interval,timeout
    timeout 1h
  }

  set blackhole6 {
    type ipv6_addr
    flags interval,timeout
    timeout 1h
  }

  chain input {
    type filter hook input priority filter; policy drop;
    iif "lo" accept
    tcp dport 22 counter accept
    log ip saddr @blackhole4 drop
    log ip6 saddr @blackhole6 drop
    tcp dport 80 accept
    tcp dport 443 accept
  }

  chain forward {
    type filter hook forward priority filter; policy accept;
  }

  chain output {
    type filter hook output priority filter; policy accept;
  }
}

What can we do with that? Well, we can group the duplicated rules or the rules having the same behavior. For example, allowing TCP/80 and TCP/443 relates to the same service, so why not group them together inside their own chain?

  # ...
  chain http {
    counter
    tcp dport {80, 443} accept
  }
  # ...

Now, we can remove those rules from the input chain and use a jump or a goto statement. The jump statement sends a packet through another chain and gets it back if that chain ends without a verdict. The goto statement hands the packet to another chain and never gets it back: if that chain ends without a verdict, the policy of the base chain is applied directly.

  # ...
  chain input {
    type filter hook input priority filter; policy drop;
    iif "lo" accept
    tcp dport 22 counter accept
    log ip saddr @blackhole4 drop
    log ip6 saddr @blackhole6 drop
    jump http
  }
  # ...

# nft -f /tmp/nftables_rules.conf

# nft list ruleset
table inet filter {
        set blackhole4 {
                type ipv4_addr
                flags interval,timeout
                timeout 1h
        }

        set blackhole6 {
                type ipv6_addr
                flags interval,timeout
                timeout 1h
        }

        chain input {
                type filter hook input priority filter; policy drop;
                iif "lo" accept
                tcp dport 22 counter packets 17 bytes 996 accept
                log ip saddr @blackhole4 drop
                log ip6 saddr @blackhole6 drop
                jump http
        }

        chain http {
                counter packets 4 bytes 216
                tcp dport 80 accept
                tcp dport 443 accept
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}
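To compare jump and goto side by side, here is a small sketch using two scratch tables, jump_demo and goto_demo (hypothetical names). Both base chains keep an accept policy so that the example stays harmless, and the file is only validated with -c.

```shell
# Scratch file for the demo; any path works.
conf=$(mktemp)
cat > "$conf" <<'EOF'
table inet jump_demo {
  chain http {
    tcp dport { 80, 443 } counter accept
  }
  chain input {
    type filter hook input priority filter; policy accept;
    jump http
    counter comment "reached by packets that http did not accept"
  }
}

table inet goto_demo {
  chain http {
    tcp dport { 80, 443 } counter accept
  }
  chain input {
    type filter hook input priority filter; policy accept;
    goto http
    counter comment "never reached: goto does not come back"
  }
}
EOF

# Validate only; nothing is applied to the live ruleset.
if command -v nft >/dev/null 2>&1; then
  nft -c -f "$conf" || echo "check failed (syntax error, or not root?)" >&2
fi
```

If both tables were loaded, the second counter in jump_demo would grow with non-HTTP traffic, while the one in goto_demo would stay at zero.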

`netns` Network Simulation

Most readers should know Docker, perhaps Podman or LXC/Incus. Maybe OpenVZ was one of your tools in the past. In fact, those projects are userland interfaces to control Linux namespaces and cgroups. All those features mixed together offer a way to isolate running processes from different points of view (pid, network, users, etc.).

Simulating a network on Linux has never been so easy. A huge effort was made by the Linux developers to create enough tools to deal with virtual networks. A new network stack is created for each new netns. On most Linux distributions, no named network namespace exists by default. This can be checked using the ip netns command followed by the list subcommand. It should return nothing, because no namespace is present.

# ip netns list

To create such a namespace, use the ip netns command and the add subcommand followed by the name of the namespace. Let's create 3 network namespaces: alice, bob and eve. As you can see with the last command, those namespaces are now returned when we invoke the ip netns list command.

# ip netns add alice

# ip netns add bob

# ip netns add eve

# ip netns list
eve
bob
alice

These new network stacks are totally isolated from the rest of the system (in fact, not totally, because we are sharing the same kernel, and those namespaces are not running inside a virtual machine). Each of them is initialized with a lo (loopback) interface by default. To execute a command in a specific namespace, one can use ip netns exec followed by the name of the namespace and the command to execute. On Linux, we can use ifconfig (legacy) or ip (modern). I would recommend using ip because of its integration with recent kernels and the insane amount of features it already supports. So in our case, if we want to see the interfaces created, we can invoke ip link show, which prints the complete list of devices present. The full command should look like this: ip netns exec ${namespace_name} ip link show.

# ip netns exec alice ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

# ip netns exec bob ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

# ip netns exec eve ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Now, we can simulate a real network. First thing to do, create a schema to see how the namespaces will communicate together.

Not sure if this is really accurate, but as you can see, we will have 3 namespaces; alice and bob will be connected to eve via a bridge. nftables will also be enabled and active in the three namespaces. Now, how do we create a virtual interface? To do that, we can use a veth interface, which provides a way to link two namespaces. You can also see it a bit like a virtual cable with two ends, one plugged into one namespace and the other into the other namespace. The command to use when creating a new veth interface is ip link add. The syntax is a bit unusual for a Unix/Linux system, but it looks like the following snippet.

ip link add \
  ${veth_netns1_name} netns ${netns1_name} \
  type veth \
  peer ${veth_netns2_name} netns ${netns2_name}

${veth_netns1_name} is the name given to the interface created in the ${netns1_name} namespace. ${veth_netns2_name} is the name given to the interface created in the ${netns2_name} namespace. Let's try those commands. As you can see, after executing them, the eve@if2 interface has been added in the alice namespace, the eve@if3 interface has been added in the bob namespace, and two interfaces called alice@if2 and bob@if2 have been added in the eve namespace.

# ip link add eve netns alice type veth peer alice netns eve

# ip link add eve netns bob type veth peer bob netns eve

# ip netns exec alice ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eve@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether be:af:4b:81:c7:de brd ff:ff:ff:ff:ff:ff link-netns eve

# ip netns exec bob ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eve@if3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 76:4c:0d:53:e5:4a brd ff:ff:ff:ff:ff:ff link-netns eve

# ip netns exec eve ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: alice@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff link-netns alice
3: bob@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 66:6a:89:88:06:7b brd ff:ff:ff:ff:ff:ff link-netns bob
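With more namespaces, the two ip link add commands above could be generated in a loop. The sketch below is only one way to do it: the RUN variable and the link_to_hub function are names made up for this example. By default it runs in dry-run mode and only prints the commands; running it as root with RUN set to an empty value would execute them.

```shell
# Dry-run by default: each command is printed instead of executed.
# Run with RUN= (empty) as root to really create the interfaces.
RUN=${RUN-echo}

# link_to_hub HUB NS: create a veth pair between namespace NS and namespace HUB.
# The end living in NS is named after HUB, the end living in HUB is named after NS.
link_to_hub() {
  hub=$1
  ns=$2
  $RUN ip link add "$hub" netns "$ns" type veth peer "$ns" netns "$hub"
}

for ns in alice bob; do
  link_to_hub eve "$ns"
done
```

In dry-run mode, the loop prints the same two commands that were typed by hand earlier.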

Now that all our namespaces are connected via eve, we can create a bridge in it. Again, the command ip link add can be used here. All the veth interfaces added in the eve namespace will be plugged into bridge0. To do that, all those interfaces must have it as master.

# ip netns exec eve ip link add name bridge0 type bridge

# ip netns exec eve ip link set dev alice master bridge0

# ip netns exec eve ip link set dev bob master bridge0

# ip netns exec eve ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: alice@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bridge0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff link-netns alice
3: bob@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bridge0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 66:6a:89:88:06:7b brd ff:ff:ff:ff:ff:ff link-netns bob
4: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff
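
The name bridge0 is not given on the command line above: the kernel picks it when no name is passed. A minimal sketch that passes the name explicitly could look like this, assuming the same eve, alice and bob names. The bridge_cmds helper is hypothetical (it is not part of iproute2) and only prints the commands, so they can be reviewed before being piped to sh as root.

```shell
#!/bin/sh
# bridge_cmds NAMESPACE BRIDGE PORT...
# Hypothetical helper: prints (does not run) the commands that create a
# named bridge in a namespace and attach each port to it.
bridge_cmds() {
    ns=$1
    bridge=$2
    shift 2
    echo "ip netns exec $ns ip link add name $bridge type bridge"
    for port in "$@"; do
        echo "ip netns exec $ns ip link set dev $port master $bridge"
    done
}

# Review the output first; append "| sh" as root to apply it.
bridge_cmds eve bridge0 alice bob
```

Naming the bridge explicitly keeps the later commands valid even if another bridge already exists in the namespace.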

It looks good: alice@if2 and bob@if2 have been added to bridge0. Now we can bring every interface up (from DOWN to UP). The command is simple: ip link set dev ${device_name} up. Let's execute that for the alice namespace.

# ip netns exec alice ip link set dev eve up

# ip netns exec alice ip link show 
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eve@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether be:af:4b:81:c7:de brd ff:ff:ff:ff:ff:ff link-netns eve

We can do the same for the bob namespace...

# ip netns exec bob ip link set dev eve up

# ip netns exec bob ip link show 
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eve@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 76:4c:0d:53:e5:4a brd ff:ff:ff:ff:ff:ff link-netns eve

And finally activate all interfaces in the eve namespace.

# ip netns exec eve ip link set dev bridge0 up

# ip netns exec eve ip link set dev alice up

# ip netns exec eve ip link set dev bob up

# ip netns exec eve ip link show 
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: alice@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff link-netns alice
3: bob@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether 66:6a:89:88:06:7b brd ff:ff:ff:ff:ff:ff link-netns bob
4: bridge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff

That's great, we can now configure IP addresses. eve will act as a router here, so we can give it 10.0.0.1/8.

To configure the L3 network stack (e.g. IPv4, IPv6) the ip address command can be used. To add a new address simply use ip address add dev ${device} ${address}. To show the current configuration, one can use ip address show.

# ip netns exec eve ip address add dev bridge0 10.0.0.1/8

# ip netns exec eve ip address show dev bridge0
4: bridge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:a9:54:25:ec:5a brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/8 scope global bridge0
       valid_lft forever preferred_lft forever
    inet6 fe80::38a9:54ff:fe25:ec5a/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

alice will have the 10.0.0.2/8 IP address. It should now be able to reach eve by using ICMP ECHO packets via ping.

# ip netns exec alice ip address add dev eve 10.0.0.2/8

# ip netns exec alice ip address show dev eve
2: eve@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether be:af:4b:81:c7:de brd ff:ff:ff:ff:ff:ff link-netns eve
    inet 10.0.0.2/8 scope global eve
       valid_lft forever preferred_lft forever
    inet6 fe80::bcaf:4bff:fe81:c7de/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

# ip netns exec alice ping -c 1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.046 ms

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.046/0.046/0.046/0.000 ms

It works! bob will receive the 10.0.0.3/8 IP address. Let's do the same test with ping.

# ip netns exec bob ip address add dev eve 10.0.0.3/8

# ip netns exec bob ip address show dev eve
2: eve@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 76:4c:0d:53:e5:4a brd ff:ff:ff:ff:ff:ff link-netns eve
    inet 10.0.0.3/8 scope global eve
       valid_lft forever preferred_lft forever
    inet6 fe80::744c:dff:fe53:e54a/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

# ip netns exec bob ping -c 1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.084 ms

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.084/0.084/0.084/0.000 ms
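
Both pings succeed without any route or gateway because 10.0.0.1, 10.0.0.2 and 10.0.0.3 all belong to the same 10.0.0.0/8 network. The check behind that statement can be sketched with plain shell arithmetic. The ip_to_int and same_network names are hypothetical helpers written for this illustration, and the sketch assumes well-formed dotted-quad addresses without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_network ADDR_A ADDR_B PREFIX_LEN
# Succeeds when both addresses share the same network part.
same_network() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

same_network 10.0.0.2 10.0.0.1 8 && echo "10.0.0.2 and 10.0.0.1 share a /8"
same_network 10.0.0.2 10.0.1.1 24 || echo "10.0.1.1 is outside 10.0.0.2/24"
```

With a /24 prefix the three lab addresses would still share one network, but an address such as 10.0.1.1 would then need a route.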

Our isolated test network is ready to test our firewall configurations. By default, a new namespace has no nftables ruleset loaded, so we need to load one. I assume here that /etc/nftables.conf contains the standard nftables configuration. If that's not the case on your side, you can easily recreate this ruleset. Let's enable nftables on the alice namespace first.

# ip netns exec alice nft list ruleset

# ip netns exec alice nft -f /etc/nftables.conf

# ip netns exec alice nft list ruleset
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}
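
If /etc/nftables.conf is missing or has been customised on your system, a file producing the ruleset listed above can be written by hand. The sketch below writes it to a temporary location. The NFT_CONF variable and its default path are assumptions made for this example, chosen to avoid overwriting the system file.

```shell
#!/bin/sh
# NFT_CONF is a hypothetical variable; adjust the path as needed.
NFT_CONF=${NFT_CONF:-${TMPDIR:-/tmp}/nftables.conf}

# Write a ruleset equivalent to the listing above: one inet table with
# three base chains, all using an accept policy.
cat > "$NFT_CONF" <<'EOF'
#!/usr/sbin/nft -f

# start from an empty ruleset in the current network namespace
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority filter; policy accept;
    }
    chain forward {
        type filter hook forward priority filter; policy accept;
    }
    chain output {
        type filter hook output priority filter; policy accept;
    }
}
EOF

echo "wrote $NFT_CONF"
```

The file can then be loaded into a namespace with ip netns exec alice nft -f followed by that path. Adding the -c option asks nft to check the file without applying it.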

Then, we can enable it on the bob namespace.

# ip netns exec bob nft list ruleset

# ip netns exec bob nft -f /etc/nftables.conf

# ip netns exec bob nft list ruleset
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}

Finally, eve can also have its own firewall enabled.

# ip netns exec eve nft list ruleset

# ip netns exec eve nft -f /etc/nftables.conf

# ip netns exec eve nft list ruleset
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}

The firewalls are all ready; we can now inject our configuration into each namespace and see what happens. To simulate a server (e.g. ssh) or a client, we can use nc. If we want to do something more complex, like simulating a port scanner, we can use nmap. Let's simulate sshd in each namespace. You can start a new terminal for each one.

# ip netns exec alice nc -vkl 22

# ip netns exec bob nc -kl 22

# ip netns exec eve nc -vkl 22

# ip netns exec alice nc -zv 10.0.0.1 22
Connection to 10.0.0.1 22 port [tcp/ssh] succeeded!

# ip netns exec bob nc -zv 10.0.0.1 22
Connection to 10.0.0.1 22 port [tcp/ssh] succeeded!

# ip netns exec eve nc -zv 10.0.0.2 22
Connection to 10.0.0.2 22 port [tcp/ssh] succeeded!

# ip netns exec eve nc -zv 10.0.0.3 22
Connection to 10.0.0.3 22 port [tcp/ssh] succeeded!

Let's update the input chain policy on alice to drop everything by default.

# ip netns exec alice nft 'add chain inet filter input { policy drop; }'

# ip netns exec alice nft list chain inet filter input
table inet filter {
        chain input {
                type filter hook input priority filter; policy drop;
        }
}

# ip netns exec alice ping -c1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# ip netns exec alice nft 'add rule inet filter input icmp type { * } accept'

# ip netns exec alice ping -c1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.067 ms

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.067/0.067/0.067/0.000 ms

# ip netns exec eve nc -w3 -zv 10.0.0.2 22
nc: connect to 10.0.0.2 port 22 (tcp) timed out: Operation now in progress

# ip netns exec alice nft 'add rule inet filter input tcp dport ssh accept'

# ip netns exec alice nft -a list chain inet filter input
table inet filter {
        chain input { # handle 1
                type filter hook input priority filter; policy drop;
                icmp type { * } accept # handle 8
                tcp dport 22 accept # handle 9
        }
}

# ip netns exec eve nc -w3 -zv 10.0.0.2 22
Connection to 10.0.0.2 22 port [tcp/ssh] succeeded!
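
The incremental commands applied to alice can also be written as one ruleset and loaded in a single nft -f transaction. Here is a sketch under two assumptions: alice_ruleset is a hypothetical helper that only prints the text, and meta l4proto icmp is used as a stand-in for the icmp type { * } rule above (it matches every ICMPv4 packet).

```shell
#!/bin/sh
# Hypothetical helper: prints a ruleset equivalent to the state reached
# above on alice (drop by default, allow ICMP and TCP port 22 in).
alice_ruleset() {
    cat <<'EOF'
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority filter; policy drop;
        # stand-in for "icmp type { * } accept"
        meta l4proto icmp accept
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority filter; policy accept;
    }
    chain output {
        type filter hook output priority filter; policy accept;
    }
}
EOF
}

# Review the text first; to apply it as root:
#   alice_ruleset | ip netns exec alice nft -f -
alice_ruleset
```

Keeping the rules in one file makes the result reproducible, while the add rule commands remain handy for quick experiments.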

alice can ping eve again because ICMP packets are now accepted on alice's side, and eve can reach port 22 on alice thanks to the tcp dport rule. If you want to simulate a real router with eve, you will also need to check your kernel configuration with sysctl or via sysfs. You will probably need to set the following parameters. If you want more details about them, you can check the IP Sysctl Documentation from Kernel.org.

net.ipv4.ip_forward: Forward Packets between interfaces;
net.ipv4.conf.alice.forwarding: Enable IP forwarding on this interface. This controls whether packets received on this interface can be forwarded;
net.ipv4.conf.all.forwarding: Enable IP forwarding on all interfaces;
net.ipv4.conf.bob.forwarding: Enable IP forwarding on this interface. This controls whether packets received on this interface can be forwarded;
net.ipv4.conf.bridge0.forwarding: Enable IP forwarding on this interface. This controls whether packets received on this interface can be forwarded;
net.ipv6.conf.all.forwarding: Enable global IPv6 forwarding between all interfaces.
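
Each of these parameters is also exposed as a file under /proc/sys, with the dots replaced by slashes. The mapping can be sketched as follows. sysctl_path is a hypothetical helper, and it assumes interface names that contain no dots.

```shell
#!/bin/sh
# Map a sysctl key to its file under /proc/sys
# (assumption: no dots inside interface names).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

for key in net.ipv4.ip_forward \
           net.ipv4.conf.all.forwarding \
           net.ipv6.conf.all.forwarding; do
    path=$(sysctl_path "$key")
    # Print the current value when the file is readable (0 = off, 1 = on).
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s: %s is not readable\n' "$key" "$path"
    fi
done
```

To inspect the router namespace, save the block in a file and run it through ip netns exec eve sh. Forwarding itself can be switched on with ip netns exec eve sysctl -w net.ipv4.ip_forward=1.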

If you prefer using sysfs, don't forget to run the command in the correct network namespace. Here is an example:

# ip netns exec eve find /sys/devices/virtual/net/ -maxdepth 1
/sys/devices/virtual/net/
/sys/devices/virtual/net/bridge0
/sys/devices/virtual/net/alice
/sys/devices/virtual/net/bob
/sys/devices/virtual/net/lo

# ip netns exec alice find /sys/devices/virtual/net/ -maxdepth 1
/sys/devices/virtual/net/
/sys/devices/virtual/net/eve
/sys/devices/virtual/net/lo

# ip netns exec bob find /sys/devices/virtual/net/ -maxdepth 1
/sys/devices/virtual/net/
/sys/devices/virtual/net/eve
/sys/devices/virtual/net/lo
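
sysfs also exposes the link state of each interface. Here is a sketch, assuming the usual /sys/class/net/<name>/operstate files. list_states is a hypothetical helper that takes the directory as an optional argument so it can be pointed at another tree.

```shell
#!/bin/sh
# list_states [DIR]
# Print "<interface> <operstate>" for every interface found in DIR
# (default: /sys/class/net). Entries without an operstate file are skipped.
list_states() {
    base=${1:-/sys/class/net}
    for dir in "$base"/*; do
        [ -r "$dir/operstate" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/operstate")"
    done
}

list_states
```

As with find above, the result depends on the namespace the command runs in, so save it in a file and run it through ip netns exec eve sh to see the interfaces of eve.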

Before cleaning everything up, and to prove that this network is really isolated from the rest of the system, you can try to ping without setting any network namespace.

# ping -c 1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# ping -c1 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.

--- 10.0.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# ping -c1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.

--- 10.0.0.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

When you are done with these last tests, you can delete the namespaces one by one; nothing will be saved, and your system will be back to its previous state. Neat, right? How to do that? You just have to invoke ip netns delete ${namespace_name}.

# ip netns delete alice

# ip netns delete bob

# ip netns delete eve

# ip netns list
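
If other namespaces exist on the host, the output of ip netns list can be filtered so that only the lab namespaces are removed. This is a sketch: lab_delete_cmds is a hypothetical helper that reads the listing on stdin and prints the delete commands instead of running them, and the sample input below is made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lab namespaces from an
# "ip netns list" listing and print one delete command for each.
lab_delete_cmds() {
    # Each line is a namespace name, optionally followed by "(id: N)";
    # only the first field is the name.
    awk '$1 == "alice" || $1 == "bob" || $1 == "eve" { print "ip netns delete " $1 }'
}

# Demonstration with sample input. As root, the real pipeline would be:
#   ip netns list | lab_delete_cmds | sh
printf 'eve (id: 2)\nbob (id: 1)\nalice (id: 0)\nother\n' | lab_delete_cmds
```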

Why use network namespaces to test a firewall configuration? Well, because it's cheap, it's highly flexible, and you don't risk cutting your own ssh connection, since you control the namespaces directly from the host itself. They can also be used to simulate different kinds of traffic behavior with the help of tc, create a VPN with wg, or configure vxlan, for example. Anyway, this is a cool way to learn how to configure a network and simulate it without paying a shit ton of money for real hardware.

Conclusion

This first post on nftables was inspired by a training session I gave ~5 years ago to one of my developer teams when they wanted to know more about network usage. Indeed, we were working in a company selling network solutions, but most of the developers were totally unaware of the complexity of the network. Furthermore, a big part of the equipment and devices were (and still are) crazy expensive, so it's kinda challenging to train people each with their own individual system.

Linux (or BSD systems) offers a great alternative here. If you didn't know, a big part of modern firewall, router, and switch appliances are based on modified FOSS. 20 years ago, when I began my career, we were dealing with Alcatel-Lucent core routers; all of them, at that time, were using a custom version of RHEL (I can't remember the name of the OS; after a quick search, it could be TiMOS). Redback, now part of Ericsson, was also selling its core network equipment on a modified version of NetBSD. So, why not use FOSS to learn how to deploy and operate a network like a netadmin?

Anyway, I hope this long post will motivate you to test nftables and perhaps move away from iptables. I also hope it will motivate you to learn more about networking protocols and how to use them in your environment. As usual, here is a list of links to help you improve your nftables knowledge.

Official nftables website, where you will find all official information related to nftables, lots of examples and the documentation;
Official nftables wiki is a gold mine, where you will find a lot of snippets, examples, use cases and best practices;
nftables documentation from the Gentoo Wiki, where you will find a lot of examples and some use cases;
nftables documentation from the Arch Linux Wiki, where you will also find a lot of examples and use cases;
nftables documentation from the Alpine Linux Wiki, another great place to learn more about nftables;
nftables documentation from the official Ubuntu documentation, a good place to learn how nftables has been integrated in Ubuntu;
nftables configuration from the official Red Hat documentation, nice to have if you are using Fedora or RHEL and you would like to use nftables by default;
Linux kernel nftables source code, where you will learn how nftables has been implemented in C and integrated in the Linux kernel;
nftables userland source code, where you will find the source code of the nft command and all other scripts or tools deployed by default on your favorite distribution.

As a final note regarding nftables: like any software, it has bugs and can also have security issues. Because it runs as a kernel module, these can have a big impact on all your running applications. In fact, nftables and netfilter have been, and still are, impacted by that. Anyone in charge of servers with nftables (and more generally with a Linux kernel installed) should always follow the Linux Kernel CVE announce page, and be ready to plan an upgrade in case of problem.

Happy hacking and have fun!

Image Cover by Guido Jansen on Unsplash