VLAN Segmentation: Balancing Security and Performance

#network #security #vlan #segmentation

When I see a single "flat" network with around 400-500 IP-addressable devices in a production facility or a medium-sized office, I know there's not just a security vulnerability, but also a significant performance drain. An environment where everyone can talk directly to everyone else, printers are constantly broadcasting, and an accounting computer can directly access a PLC on the production line is one where calling yourself a "system administrator" is a bit optimistic.

While working on a production ERP system in recent years, I experienced firsthand how this network chaos surfaced as seemingly "meaningless" timeout errors on the software side. Network segmentation isn't just about "writing firewall rules"; it also involves Layer 2 (L2) design, broadcast domain management, and hardware limitations. In this post, I'll share how I architect VLAN segmentation and keep performance and security in balance, with technical details from my 20 years of field experience.

The Flat Network Trap and the Invisible Cost of Broadcast Traffic

In most places, network setup begins with a "plug and play" mentality. A single /23 or /22 block (510 or 1022 usable hosts) is defined, handed to the DHCP server, and everything seems to work. However, broadcast traffic is "noise" that knocks on the door of every device on the network. When one device sends an ARP (Address Resolution Protocol) request, the CPU of every other device in that segment has to wake up, even if only briefly, to process the packet.

One of the worst situations I've seen in the field is broadcast and multicast traffic such as ARP, mDNS, and LLMNR reaching 15-20% of total traffic on a flat network of 1000 devices. This creates jitter (latency fluctuation), especially for latency-sensitive production devices or sensor gateways that provide real-time data streams. I generally limit a broadcast domain to roughly 250 hosts (a /24 block, 254 usable addresses). Once the device count grows past that, splitting the segment is no longer a matter of logical tidiness but a technical necessity.
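The sizing arithmetic above can be checked with Python's standard `ipaddress` module. This is only a sketch: the `10.20.0.0/22` block and the VLAN IDs 10 to 40 are made-up values for illustration, not taken from any real deployment.

```python
import ipaddress

def usable_hosts(net):
    """Usable host addresses in an IPv4 subnet (minus network and broadcast)."""
    return net.num_addresses - 2

# Hypothetical flat network that has outgrown a single broadcast domain
flat = ipaddress.ip_network("10.20.0.0/22")
print(flat, "usable hosts:", usable_hosts(flat))

# Split it into /24 broadcast domains, one per (hypothetical) VLAN ID
segments = list(flat.subnets(new_prefix=24))
for vlan_id, net in zip((10, 20, 30, 40), segments):
    print(f"VLAN {vlan_id}: {net} ({usable_hosts(net)} usable hosts)")
```

A /22 splits into exactly four /24 blocks, so each VLAN in this sketch ends up with its own 254-host broadcast domain.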

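# Monitor broadcast/multicast packets per second on a Linux machine (run as root)
# -l line-buffers tcpdump's output so pv sees each packet immediately; pv -l -r shows lines per second
tcpdump -l -n -i eth0 "broadcast or multicast" | pv -l -r > /dev/null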

You can gauge the network noise with the simple command above; each line is one broadcast or multicast packet. If the rate stays above roughly 100 lines per second, it's time to split that segment. When I implement segmentation, I'm not just improving security; I'm also reducing the load on switch processors.
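If you would rather keep a record than watch a live counter, the same threshold check can be scripted. The sketch below assumes the capture was taken with `tcpdump -tt`, so each line starts with an epoch timestamp. The function names and the synthetic sample lines are my own illustration, and the 100 packets/second threshold is the rule of thumb from above, not a standard.

```python
from collections import Counter

THRESHOLD_PPS = 100  # rule of thumb from the text, not a standard

def packets_per_second(lines):
    """Count packets per wall-clock second from `tcpdump -tt` text output."""
    counts = Counter()
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            counts[int(float(stamp))] += 1
        except ValueError:
            continue  # skip lines that do not start with an epoch timestamp
    return counts

def noisy_seconds(counts, threshold=THRESHOLD_PPS):
    """Return the seconds in which the packet count exceeded the threshold."""
    return sorted(sec for sec, n in counts.items() if n > threshold)

# Synthetic capture: 150 packets in one second, 20 in the next
sample = [
    f"1700000000.{i:06d} ARP, Request who-has 10.0.0.1 tell 10.0.0.{i % 250 + 2}"
    for i in range(150)
]
sample += [
    f"1700000001.{i:06d} IP 10.0.0.5.5353 > 224.0.0.251.5353: UDP"
    for i in range(20)
]

counts = packets_per_second(sample)
print(noisy_seconds(counts))  # [1700000000]
```

In practice you would pass `sys.stdin` or an open capture file instead of `sample`, for example after saving the output of `tcpdump -tt -n -i eth0 -c 5000 "broadcast or multicast"` to a text file.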

L2 vs L3 Segmentation: Where Does Performance Get Lost?

When segmenting, the most critical decision is where the traffic will terminate. Terminating all VLANs on a Firewall (L3) maximizes security but sacrifices performance. If you have a backup server transferring 10Gbps of data internally and all this traffic passes through the Firewall (Router-on-a-stick), you'll "choke" the Firewall's CPU.

In my own projects and in places I've consulted, I generally leave the "Inter-VLAN Routing" task to L3-capable core switches. While I route traffic between security-critical areas (e.g., between the accounting segment and the guest network) through the Firewall, I manage traffic between production line operator screens and the application server using ACLs (Access Control Lists) on the L3 switch.

Method	Security Level	Performance (Throughput)	Complexity
Router-on-a-stick	Very High	Low (Interface limited)	Medium
L3 Switch (SVI)	Medium	Very High (Wire-speed)	Low
Firewall (Sub-interfaces)	Highest	Medium (Low if DPI is loaded)	High

The key point to watch out for here is MTU (Maximum Transmission Unit) values. If Jumbo Frames (9000 bytes) are enabled on one side of a VLAN-to-VLAN hop and disabled on the other, oversized packets must either be fragmented at the routing hop or, when the DF (Don't Fragment) bit is set, dropped. TCP sessions whose MSS (Maximum Segment Size) was negotiated for the larger MTU then stall and retransmit until the sender learns the smaller path MTU. Once, while investigating why reports were taking 30 seconds on a production ERP, I discovered the problem wasn't the SQL query but a switch configuration trying to push 9000-byte MTU packets through a gateway with a 1500-byte MTU.
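The numbers behind that ERP incident are easy to work out. The sketch below assumes plain IPv4 and TCP headers without options (20 bytes each); the helper names are mine and exist only for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def path_mtu(hop_mtus):
    """The usable MTU of a path is that of its smallest hop."""
    return min(hop_mtus)

def oversize_bytes(sender_mtu, hop_mtus):
    """How many bytes a full-size packet from the sender exceeds the path by."""
    return max(0, sender_mtu - path_mtu(hop_mtus))

# Jumbo-frame server VLAN routed through a standard-MTU gateway
hops = [9000, 1500, 9000]
print("MSS at 9000 MTU:", tcp_mss(9000))            # 8960
print("MSS the path can carry:", tcp_mss(path_mtu(hops)))  # 1460
print("Bytes over the limit:", oversize_bytes(9000, hops))  # 7500
```

On Linux, `ping -M do -s 8972 <gateway>` is a quick field test for the same thing: 8972 bytes of payload plus 20 bytes of IP header and 8 bytes of ICMP header make a 9000-byte packet that is not allowed to fragment.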

Switch Hardening: A VLAN is More Than Just a Tag

Creating a VLAN and leaving it at that is like locking the door and leaving the key in it. When you set up the 802.1Q tagging system, you must also secure the ports on the switch. Attacks like "VLAN Hopping" are still real and can bypass your entire segmentation with a simple configuration error.

First, I disable all unused ports and assign them to a "Blackhole VLAN" (e.g., VLAN 999) that has no outbound connectivity. Furthermore, I never leave the native VLAN at its default of 1, and I never mix switch management traffic (Management VLAN) with user data.

⚠️ Native VLAN Danger

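If a trunk's native VLAN is left at the default VLAN 1, which is also where unconfigured access ports sit, an attacker on such a port can "double tag" frames: the first switch strips the outer native-VLAN tag and forwards the frame, with the attacker's inner tag, into another VLAN. Always set the native VLAN on trunk ports to an unused ID.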
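On a network with dozens of switches, checking this by eye gets old quickly. As a rough sketch, the script below scans an IOS-style running configuration for trunk ports that still use native VLAN 1. The sample configuration and interface names are invented for illustration, and the parser only understands the simple stanza layout shown here.

```python
import re

def split_interfaces(config_text):
    """Group indented config lines under the 'interface ...' line above them."""
    stanzas = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            stanzas[current] = []
        elif line.startswith(" ") and current is not None:
            stanzas[current].append(line.strip())
        else:
            current = None  # '!' or a global command ends the stanza
    return stanzas

def trunks_with_default_native_vlan(config_text):
    """Return trunk interfaces whose native VLAN is 1 (explicitly or by default)."""
    flagged = []
    for name, lines in split_interfaces(config_text).items():
        if "switchport mode trunk" not in lines:
            continue
        native = 1  # default when no native VLAN is configured
        for line in lines:
            match = re.fullmatch(r"switchport trunk native vlan (\d+)", line)
            if match:
                native = int(match.group(1))
        if native == 1:
            flagged.append(name)
    return flagged

# Hypothetical config excerpt: one hardened trunk, one forgotten trunk
SAMPLE = """\
interface GigabitEthernet0/23
 switchport mode trunk
 switchport trunk native vlan 999
!
interface GigabitEthernet0/24
 switchport mode trunk
!
interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10
"""

print(trunks_with_default_native_vlan(SAMPLE))  # ['GigabitEthernet0/24']
```

Feeding it saved `show running-config` output from each switch would give a quick list of trunks to fix, though a real audit tool would need to handle port-channels, interface ranges, and vendor differences.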

Below is an example of the basic hardening steps I apply on a corporate switch:

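! Access port: port security and DHCP rate limiting
interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10
 switchport port-security
 switchport port-security maximum 2
 switchport port-security violation shutdown
 ip dhcp snooping limit rate 10
!
! Uplink toward the legitimate DHCP server must be trusted, otherwise
! snooping drops the server's replies (Gi0/24 is only an example port)
interface GigabitEthernet0/24
 ip dhcp snooping trust
 ip arp inspection trust
!
! Global: enable snooping itself, then per-VLAN snooping and ARP inspection
ip dhcp snooping
ip dhcp snooping vlan 10
ip arp inspection vlan 10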

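The ip dhcp snooping and ip arp inspection commands here are the lifeblood of the network; segmentation without them provides only theoretical order. In the real world, someone eventually plugs a consumer modem or router into a wall port by accident. With DHCP snooping, the DHCP offers from that rogue device are dropped on the untrusted port instead of handing wrong gateways to the whole segment, and ARP inspection uses the same binding table to reject spoofed ARP replies.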

Management VLAN and Out-of-Band (OOB) Access

My biggest fear as a system administrator is losing access to a switch or server after making a rule change. That's why I always keep management traffic on a separate VLAN. I grant access to this VLAN only from specific IPs (usually my management workstation or a jumpbox).

In a client project, all switch management IPs were on the main user network. When a broadcast storm occurred, it became impossible to connect to the switches via SSH to resolve the issue because the switch CPU was so busy it couldn't respond to SSH packets. Since then, if physically possible, I set up an "Out-of-Band" management network with separate cabling, or at least a Management VLAN with the highest priority (marked with QoS/DSCP).

As I mentioned in my [related: network security] post, avoiding the "any" statement when writing permit rules on the management network can save your life. A dedicated management block with static IPs, reachable only from those permitted sources, stops malware on the user network from ever getting as far as brute-forcing your switches.

Zero-Trust Approach: Where VLANs Fall Short

Traditional VLAN segmentation is based on the distinction between "trusted zones" and "untrusted zones." However, in the modern world, especially with remote access (VPN and ZTNA, Zero Trust Network Access) becoming so prevalent, a device simply being on the "production VLAN" doesn't mean it's trusted. This is where zero-trust principles need to be brought down to the network layer.

My approach is to use VLANs solely as "traffic regulators" and perform the actual authorization device by device. If an operator panel (HMI) needs to join the network, I prefer to authenticate it on the switch port with certificate-based 802.1X rather than by MAC address. If it cannot present a valid certificate, it is automatically placed into a limited-privilege "Quarantine VLAN."

This approach completely eliminates scenarios like "someone came, unplugged the cable, and plugged in their own laptop." Yes, the configuration overhead increases slightly, but my 20 years of experience have shown me this: the 5 hours of configuration time you don't spend upfront will return to you as 50 hours of sleepless nights and lost prestige during an incident.

Monitoring and Troubleshooting: VLAN Flap and Loop Issues

We've segmented, written the rules, and everything is great. But when a problem arises, how do we know where it broke? The most insidious problems in a segmented network are "VLAN Flapping" and Spanning Tree (STP) loops. Especially in topologies where multiple switches are interconnected, a single loop can paralyze all your segments.

My monitoring dashboards (data collected via SNMP with Prometheus + Grafana) always include an "STP Topology Change" alarm. If I receive a topology change message out of the blue, it means someone is playing with cables somewhere, or a switch is failing.

! To check STP status on Cisco devices
show spanning-tree summary
show spanning-tree vlan 10

Additionally, I have a script that regularly sends ICMP requests to the gateway IP (SVI) of each VLAN and measures the latency. If the latency on a VLAN jumps from 1 ms to 50 ms, I can immediately see that a device on that segment is "acting up" (a babbling NIC) or that a broadcast loop has started. It may be surprising, but half of the database performance issues I discussed in my [related: PostgreSQL tuning] post turned out to be caused by network fluctuations like these.
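A minimal version of such a latency probe might look like the sketch below. It is not the exact script described above: it assumes the Linux iputils `ping` summary format (`rtt min/avg/max/mdev = ...`), the function names are my own, and the 50 ms threshold simply reuses the figure from the example.

```python
import re
import subprocess
import sys

# Matches the summary line of Linux iputils ping, e.g.
# "rtt min/avg/max/mdev = 0.321/0.402/0.512/0.078 ms"
RTT_LINE = re.compile(r"= [\d.]+/([\d.]+)/[\d.]+")

def parse_avg_rtt_ms(ping_output):
    """Extract the average RTT in ms, or None if no reply was received."""
    match = RTT_LINE.search(ping_output)
    return float(match.group(1)) if match else None

def classify(avg_ms, threshold_ms=50.0):
    """Turn a measurement into a simple status string."""
    if avg_ms is None:
        return "unreachable"
    return "slow" if avg_ms >= threshold_ms else "ok"

def measure(gateway, count=3):
    """Ping a gateway quietly and return its average RTT in ms (or None)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-q", "-W", "1", gateway],
        capture_output=True, text=True,
    )
    return parse_avg_rtt_ms(result.stdout)

# Usage: python3 vlan_latency.py 10.0.10.1 10.0.20.1  (example SVI addresses)
if __name__ == "__main__" and len(sys.argv) > 1:
    for gateway in sys.argv[1:]:
        avg = measure(gateway)
        print(f"{gateway}: {avg} ms -> {classify(avg)}")
```

Run from cron or a systemd timer and shipped to whatever dashboard you already use, this gives one latency figure per VLAN gateway to alert on.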

Conclusion: Striking the Right Balance

VLAN segmentation is not a destination, but an ongoing process. As a company grows and new devices are added, this structure needs to be updated. My golden rule is this: Seek security at the L3/Firewall layer, performance at the L2/Switch layer, and order through disciplined documentation.

Don't just see segmentation as "dividing IPs." It's the reflection of organizational flow onto the network. Production, accounting, camera systems, and IoT devices should play in their own sandboxes but pass through a controlled gate when they need each other.

In my next post, I'll detail how I'm hardening the security of Linux services running on this network structure (using auditd and cgroup limits). We've divided the network; now it's time to protect the hosts within that network.