Andrew Despres

Posted on Aug 5

Switch Port Configuration and Troubleshooting

#networking #network #comptia #beginners

Preamble:
This space will be utilized to synthesize my notes and help improve my learning process while I study for the CompTIA Network+ N10-009 certification exam. Please follow along for more Network+ notes and feel free to ask any questions or, if I get something wrong, offer suggestions to correct any mistakes.

Link Aggregation and NIC Teaming

Link aggregation means combining two or more separate cabled links into a single logical channel. For example, a single network adapter and cable segment might support 1 Gbps; bonding this with another adapter and cable segment gives a link of 2 Gbps. Link aggregation can also be used for an uplink between two switches, between a switch and a router, or between two routers.

NOTE: From the host end, this can also be called NIC teaming; at the switch end, it can be called port aggregation and is referred to by Cisco as an EtherChannel. The term "bonding" is also widely used in place of "aggregation."

A server node uses NIC teaming to create a 4 Gbps channel link from four 1 Gbps ports to a workgroup switch, while the workgroup switch bonds its uplink transceivers to create a 20 Gbps channel to a router.

Link aggregation can also provide redundancy; if one link is broken, the connection is still maintained by the remaining links. It is also often cost-effective; a four-port Gigabit Ethernet card might not match the bandwidth of a 10 GbE port but will cost less.

This configuration is fully redundant only if the business function does not depend on the full speed of the bonded link. If one port in a two-port, 2 Gbps bond fails, the link drops to 1 Gbps; if that bandwidth is insufficient, there is not full redundancy. Full redundancy is achieved only when the links that remain after a failure can still provide the required bandwidth, so that network operations continue without degradation in performance.
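The redundancy rule above boils down to a simple check: after the expected number of member failures, is the surviving bandwidth still enough? This is a minimal Python sketch that assumes equal-speed member links; the function names are hypothetical, not part of any switch tooling.

```python
def lag_bandwidth(member_gbps: float, links: int) -> float:
    """Nominal bandwidth of a LAG built from `links` equal-speed members."""
    return member_gbps * links


def is_fully_redundant(member_gbps: float, links: int,
                       required_gbps: float, failures: int = 1) -> bool:
    """True if the links that survive `failures` outages still carry the required load."""
    surviving = max(links - failures, 0)
    return lag_bandwidth(member_gbps, surviving) >= required_gbps


# Two 1 Gbps links bonded into a 2 Gbps channel.
print(lag_bandwidth(1, 2))            # 2
print(is_fully_redundant(1, 2, 0.8))  # True: one surviving link still carries 0.8 Gbps
print(is_fully_redundant(1, 2, 1.5))  # False: only 1 Gbps is left after one failure
```

The same check applies to the four-port server team: four 1 Gbps links are fully redundant against one failure only if the workload needs 3 Gbps or less.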

Link aggregation is typically implemented using the IEEE 802.3ad/802.1AX standard. Bonded 802.3ad interfaces are described as a link aggregation group (LAG). 802.3ad also defines the Link Aggregation Control Protocol (LACP), which can be used to detect configuration errors and recover from the failure of one of the physical links.

On a Cisco switch, the following commands configure LACP to group the first four Gigabit interfaces into a single channel with the ID 2:

interface range GigabitEthernet0/1-4
channel-group 2 mode active

The following commands configure the 10G interfaces into a channel group with ID 1. In this example, the 10G interfaces are on a different module than the Gigabit interfaces:

interface range TenGigabitEthernet1/1-2
channel-group 1 mode passive

On the router/layer 3 switch, the channel should be set to active:

interface range TenGigabitEthernet0/1-2
channel-group 1 mode active

Optionally, both sides can be configured as active. However, if both sides are set to passive, no bonded channel will be created. The channel ID on each side does not have to match, but it is easier to manage the connection if it is the same on both switches.
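The negotiation rule in the previous paragraph (at least one side must be active) can be written as a tiny truth table. This Python sketch only models the two LACP modes used above; any other channel-group modes a switch may offer are deliberately rejected rather than guessed at.

```python
# LACP modes as used in the "channel-group N mode ..." commands above.
ACTIVE, PASSIVE = "active", "passive"


def lacp_channel_forms(side_a: str, side_b: str) -> bool:
    """An LACP channel forms only if at least one side actively sends LACP packets."""
    for mode in (side_a, side_b):
        if mode not in (ACTIVE, PASSIVE):
            raise ValueError(f"mode not modeled in this sketch: {mode!r}")
    return ACTIVE in (side_a, side_b)


for a, b in [(ACTIVE, PASSIVE), (ACTIVE, ACTIVE), (PASSIVE, PASSIVE)]:
    print(f"{a:>7} / {b:<7} -> channel forms: {lacp_channel_forms(a, b)}")
```

A passive port only answers LACP packets, so two passive ends wait for each other indefinitely and the channel never comes up.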

Maximum Transmission Unit

A standard Ethernet frame has a maximum length of 1,518 bytes, excluding the preamble. Each frame carries 18 bytes of header and trailer fields:

6-byte destination and 6-byte source MAC address fields.
2-byte EtherType field.
4-byte error checking field (the frame check sequence, which is sent at the end of the frame).

The maximum size of the data payload is 1,500 bytes. This upper limit of the payload is also referred to as the maximum transmission unit (MTU).

In circumstances where data payloads are very large, a 1,500-byte MTU means using a lot of frames. A jumbo frame is one that supports a data payload of up to 9,216 bytes. This reduces the number of frames that need to be transmitted, which can reduce the amount of processing that switches and routers need to do. It also reduces the bandwidth requirement somewhat, as fewer frame headers are being transmitted. The benefits of jumbo frames are somewhat disputed, however.
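To put numbers on the frame-count argument, this Python sketch counts frames and framing overhead for a hypothetical 9 MB transfer. It assumes the 18-byte header/FCS overhead listed above and treats the MTU as pure payload, ignoring the preamble, interframe gap, and any IP/TCP headers inside the payload.

```python
FRAME_OVERHEAD = 18  # bytes: destination MAC + source MAC + EtherType + FCS


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames required when each frame carries at most `mtu` bytes of payload (ceiling division)."""
    return (payload_bytes + mtu - 1) // mtu


def overhead_bytes(payload_bytes: int, mtu: int) -> int:
    """Total header + FCS bytes sent for the whole transfer."""
    return frames_needed(payload_bytes, mtu) * FRAME_OVERHEAD


transfer = 9_000_000  # hypothetical 9 MB transfer
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_needed(transfer, mtu)} frames, "
          f"{overhead_bytes(transfer, mtu)} overhead bytes")
```

With a 1,500-byte MTU the transfer needs 6,000 frames (108,000 bytes of framing); with 9,000 bytes it needs 1,000 frames (18,000 bytes). The overhead saving is small relative to the data, which is one reason the benefits are debated, but the sixfold cut in frames to process is what jumbo frames are really after.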

When implementing jumbo frames, it is critical that all hosts and appliances (switches and routers) along the communications path support them and are configured to use them. It is also vital to ensure that each device uses the same MTU. Also, it can be complex to calculate the MTU if any additional headers are used (for IPsec, for instance).

Jumbo frame support can be configured using the command mtu 9018, where 9,018 is the required size. On some appliances, this must be configured for the whole system; on others, it can be configured on a per-interface basis.

Spanning Tree Protocol

Large networks make use of multiple switches configured in a mesh or partial mesh topology to implement redundant links. Multiple paths are part of good network design as they increase resilience; if one link fails, then the network can remain operational by forwarding frames over a different path. However, Ethernet has no concept of a "time to live" value for frames, so layer 2 broadcast and flooded traffic could continue to loop through a network with multiple paths indefinitely.

The Spanning Tree Protocol (STP) is a means for the bridges or switches to organize themselves into a hierarchy and block loops. The switch at the top of the hierarchy is the root. The switch with the lowest bridge ID, which combines a priority value and the MAC address, will be selected as the root.
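Root election is just a "lowest value wins" comparison in which priority is compared first and the MAC address only breaks ties. This Python sketch uses made-up switch names and MACs and ignores refinements such as the VLAN number that per-VLAN spanning tree adds to the priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # lower is preferred; 32768 is a common default
    mac: str       # e.g. "00:1a:2b:3c:4d:5e"

    def bridge_id(self) -> tuple[int, int]:
        """Bridge ID as a comparable tuple: priority first, then MAC as an integer."""
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=Bridge.bridge_id)


switches = [
    Bridge("access-1", 32768, "00:1a:2b:00:00:01"),
    Bridge("access-2", 32768, "00:1a:2b:00:00:02"),
    Bridge("core-1", 24576, "00:1a:2b:00:00:09"),
]
print(elect_root(switches).name)      # core-1: lower priority beats a lower MAC
print(elect_root(switches[:2]).name)  # access-1: equal priorities, so lowest MAC wins
```

This is why leaving every switch at the default priority hands the root role to whichever switch happens to have the lowest MAC address, often an old access switch.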

Each switch then determines the shortest path to the root bridge by exchanging information with other switches. This STP information is packaged as bridge protocol data unit (BPDU) multicast frames. Different port roles are assigned to the interfaces participating in the spanning tree. A port that forwards "up" to the root, possibly via intermediate switches, is identified as a root port. Ports that can forward traffic "down" through the network with the least cost are identified as designated ports. A port that would create a loop is identified as a blocking or non-designated port. Subsequently, bridges exchange Topology Change Notifications if devices are added or removed, enabling them to change the status of forwarding/blocked ports appropriately.

This image shows the minimum configuration necessary to prevent loops in a network with three bridges or switches. The root bridge has two designated ports (DP) connected to Bridge A and Bridge B. Bridges A and B both have root ports (RP) connected back to the interfaces on the root bridge. Bridges A and B also have a connection directly to one another. On Bridge A, this interface is a designated port in the forwarding state. On Bridge B, the interface is blocked (BP) to prevent a loop, so data frames do not cross this link, and traffic between Bridge A and Bridge B is forwarded via the root bridge.
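The port roles in that three-switch triangle can be derived mechanically. This Python sketch assumes every link has the same cost (4) and that Bridge A has a lower bridge ID than Bridge B; it also skips the port-ID tie-breaker real STP uses and assumes at most one link between any two switches.

```python
# Hypothetical triangle from the figure: equal link costs, bridge IDs as
# (priority, MAC-as-integer) tuples where lower is better.
BRIDGE_IDS = {"Root": (4096, 1), "A": (32768, 2), "B": (32768, 3)}
LINKS = [("Root", "A", 4), ("Root", "B", 4), ("A", "B", 4)]


def neighbors(bridge):
    """Yield (neighbor, link cost) for every link attached to bridge."""
    for u, v, cost in LINKS:
        if u == bridge:
            yield v, cost
        elif v == bridge:
            yield u, cost


def root_path_costs(root):
    """Lowest total cost from each bridge to the root (Bellman-Ford relaxation)."""
    cost = {b: float("inf") for b in BRIDGE_IDS}
    cost[root] = 0
    for _ in range(len(BRIDGE_IDS) - 1):
        for u, v, c in LINKS:
            cost[v] = min(cost[v], cost[u] + c)
            cost[u] = min(cost[u], cost[v] + c)
    return cost


def port_roles(root):
    """Map (bridge, neighbor) to RP, DP, or BP (blocked)."""
    cost = root_path_costs(root)

    def rank(bridge):
        return (cost[bridge], BRIDGE_IDS[bridge])  # lower rank = better path to root

    roles = {}
    for bridge in BRIDGE_IDS:
        if bridge != root:
            # Root port: the cheapest path to the root, ties broken by neighbor bridge ID.
            best, _ = min(neighbors(bridge),
                          key=lambda nc: (cost[nc[0]] + nc[1], BRIDGE_IDS[nc[0]]))
            roles[(bridge, best)] = "RP"
    for u, v, _ in LINKS:
        # On each link, the end with the better path to the root is designated...
        dp, other = (u, v) if rank(u) < rank(v) else (v, u)
        roles[(dp, other)] = "DP"
        # ...and the other end blocks unless it is already that bridge's root port.
        roles.setdefault((other, dp), "BP")
    return roles


for (bridge, neighbor), role in sorted(port_roles("Root").items()):
    print(f"{bridge} port toward {neighbor}: {role}")
```

A and B both reach the root at cost 4, so the A-B link is a tie on cost; A wins the designated role on its lower bridge ID, leaving B's end of that link blocked.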

Spanning Tree Protocol Configuration

If a switch supports spanning tree, it should operate by default without configuration. An administrator can (and should) set the priority value to predetermine root bridge selection. The root will usually be part of a high-bandwidth backbone or core switch group; performance will suffer if a switch on a low-bandwidth segment becomes root. You can use the show spanning-tree command to report the current configuration. The spanning-tree vlan id root primary and spanning-tree vlan id root secondary commands assign main and backup priority values to the chosen switches, where id is the VLAN number.

Viewing spanning tree configuration on a Cisco switch. This switch has been designated the root.

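The following port states are associated with spanning tree operation:

Blocking: the port drops all frames other than BPDUs.
Listening: the port processes BPDUs to detect loops but does not learn MAC addresses or forward frames.
Learning: the port learns MAC addresses from received frames but does not yet forward them.
Forwarding: the port operates normally, forwarding frames and processing BPDUs.
Disabled: the port has been administratively disabled and does not take part in spanning tree.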

When all ports on all switches are in forwarding or blocking states, the network is converged. Until the network has converged, ports that are still changing state do not forward traffic, so communications are disrupted. Under the original 802.1D standard, this made the network unavailable for extended periods (tens of seconds) during configuration changes. STP is now more likely to be implemented as 802.1D-2004/802.1w or Rapid STP (RSTP). The rapid version creates outages of a few seconds or less. In RSTP, the blocking, listening, and disabled states are aggregated into a discarding state.
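The "tens of seconds" figure for classic 802.1D follows directly from its default timers. This Python sketch assumes the standard defaults (max age 20 s, forward delay 15 s); switches can be tuned differently.

```python
# Default IEEE 802.1D timer values in seconds (switches can be tuned differently).
MAX_AGE = 20        # how long a bridge keeps stored BPDU information before discarding it
FORWARD_DELAY = 15  # time spent in each of the listening and learning states


def convergence_seconds(max_age: int = MAX_AGE, forward_delay: int = FORWARD_DELAY,
                        indirect_failure: bool = True) -> int:
    """Time for a blocked port to start forwarding after a topology change.

    An indirect failure must first wait for old BPDU information to age out (max age);
    a directly detected link failure skips that wait.
    """
    wait = max_age if indirect_failure else 0
    return wait + 2 * forward_delay  # listening + learning


print(convergence_seconds())                        # 50
print(convergence_seconds(indirect_failure=False))  # 30
```

RSTP avoids most of this by using explicit handshakes between neighbors instead of waiting out timers.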

Power Over Ethernet

Power over Ethernet (PoE) is a means of supplying electrical power from a switch port over ordinary data cabling to a connected powered device (PD), such as a VoIP handset, IP camera, or wireless access point. Powering these devices through a switch is more efficient than using a wall-socket AC adapter for each appliance. It also allows network management software to control the devices and apply schemes, such as putting unused devices into sleep states and applying power capping.

PoE is defined in several IEEE standards. The first two (802.3af and 802.3at) are now both rolled into 802.3-2018, while 802.3bt is a later amendment:

802.3af—Power is supplied as 350 mA at a nominal 48 V and limited to 15.4 W output. Given that some of this dissipates over the length of cable, it supports PDs that require up to about 13 W.
802.3at (PoE+)—Supplies up to 30 W, with a maximum current of 600 mA. This can support PD requirements of up to about 25 W.
802.3bt (PoE++)—Supplies up to 60 W (Type 3) or 90 W (Type 4), with up to 51 W and 71 W usable power, respectively.
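The gap between power supplied and power usable is resistive loss in the cable, which scales with the square of the current. This Python sketch assumes the worst-case values commonly cited for each standard (a minimum PSE voltage of 44 V for 802.3af and 50 V for 802.3at, and a 100 m channel loop resistance of 20 Ω and 12.5 Ω respectively); check the standard itself before relying on them.

```python
def pd_power(supply_volts: float, current_amps: float, loop_ohms: float) -> tuple[float, float]:
    """Return (PSE output watts, watts left at the PD) after I^2 * R loss in the cable."""
    pse_watts = supply_volts * current_amps
    cable_loss = current_amps ** 2 * loop_ohms
    return pse_watts, pse_watts - cable_loss


# Assumed worst-case figures: minimum PSE voltage, maximum current, maximum loop resistance.
for name, volts, amps, ohms in [("802.3af", 44.0, 0.350, 20.0),
                                ("802.3at", 50.0, 0.600, 12.5)]:
    pse, pd = pd_power(volts, amps, ohms)
    print(f"{name}: PSE {pse:.2f} W, PD {pd:.2f} W")
```

Under these assumptions, 802.3af delivers 15.40 W and leaves about 12.95 W at the PD, while 802.3at delivers 30 W and leaves 25.5 W, matching the "about 13 W" and "about 25 W" figures above.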

PoE switches are referred to as endspan (or endpoint) power sourcing equipment (PSE). On a Cisco switch, the command power inline auto max 15000 enables a port for PoE and sets a maximum output of 15,000 mW (or 15 W).

When a device is connected to a port on a PoE switch, the switch goes through a detection phase to determine whether the device is PoE enabled. If not, it does not supply power over the port and, therefore, does not damage non-PoE devices. If so, it determines the device's power class (its expected power consumption) and allocates power to the port accordingly.
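The detect-then-classify sequence can be sketched as a small decision function. This Python example assumes a valid PD signature of roughly 19 to 26.5 kΩ and the per-class PSE allocations of 802.3af/at; it is a teaching model only and covers neither 802.3bt classes nor any vendor's actual firmware logic.

```python
from typing import Optional

# Assumed values: the PSE treats a signature resistance of roughly 19-26.5 kilohms
# as a valid PD, and reserves power per 802.3af/at class as below (PSE output, watts).
VALID_SIGNATURE_KOHMS = (19.0, 26.5)
CLASS_POWER_WATTS = {0: 15.4, 1: 4.0, 2: 7.0, 3: 15.4, 4: 30.0}


def allocate_power(signature_kohms: float, pd_class: Optional[int]) -> float:
    """Return the watts to reserve for a port; 0.0 means leave the port unpowered."""
    low, high = VALID_SIGNATURE_KOHMS
    if not low <= signature_kohms <= high:
        # Detection failed: not a PoE device, so no power is applied.
        return 0.0
    if pd_class is None:
        pd_class = 0  # no class reported: assume the default class 0 allocation
    if pd_class not in CLASS_POWER_WATTS:
        raise ValueError(f"class {pd_class} is outside this 802.3af/at sketch")
    return CLASS_POWER_WATTS[pd_class]


print(allocate_power(25.0, 2))     # 7.0: a low-power class 2 PD
print(allocate_power(25.0, None))  # 15.4: an unclassified PD gets the default
print(allocate_power(2.0, None))   # 0.0: signature outside the valid window
```

Classification matters for budgeting: a switch that knows a phone needs only 7 W can keep the rest of its budget for other ports.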

Note: If a switch does not support PoE, a device called a power injector (or midspan) can be used.

Switch Troubleshooting

Ethernet switches and network adapters introduce the potential for issues at the Data Link layer and can reveal subtle cabling problems and interference at the Physical layer. Diagnosing and resolving problems gets more complex as you work up through the network stack. To resolve these issues, you need to combine your knowledge of cabling types and Ethernet framing with an awareness of the status indicators and commands available on network equipment.

Hardware Failure Issues

When you are using the CompTIA Network+ troubleshooting model, it is wise to rule out physical hardware failure and Data Link layer issues before diagnosing a Network layer or application issue.

Power Issues

Like any computer system, networks require stable power to operate properly. Power anomalies, such as surges and spikes, can damage devices; under-voltage events (brief sags or brownouts) can cause systems to lock up or reboot; and power failures will take down everything, including the lights. Enterprise sites have systems to protect against these issues. Uninterruptible power supplies (UPSs) can keep servers, switches, and routers running for a few minutes. This provides time to either switch in a secondary power source (a generator) or shut down the system gracefully, hopefully avoiding data loss. Most power problems will have to be escalated to an electrician or the power company, depending on where the fault lies.

Hardware Failure Issues
If power is not the issue, consider other components that might have experienced hardware failure, including host network adapters, switch/router/modem appliances, and the cabling between them. Complete hardware failure is relatively uncommon, so if you can rule out power and cabling problems, then for a network adapter, verify that the driver is working correctly. The easiest thing to do is to replace the driver (in Windows, use Device Manager to do this). For a network appliance, use status LEDs to confirm operation and check that things such as plug-in cards and modules are seated correctly. You should also consider overheating as a potential cause of hardware issues. Make sure there is good airflow around the intake and outlet vents. Check that fans and internal components are not clogged with dust and that systems are not exposed to direct sunlight.

At the Data Link layer, most wired hosts connect to the network via a switch. If you suspect a device such as a switch, analyze the topology of your network. You should be able to identify which users are experiencing the problem, which part of the network is affected, and which bridging or switching device is at fault.

When you have narrowed the problem to a device, you must determine what the nature of the problem is. It is often worth restarting the switch to see if that resolves the problem, as restarting network devices can clear transitory errors.

Note: Do be aware that restarting a switch, router, or server can be very disruptive to the rest of the network. Identify how to mitigate potential impacts and seek authorization for your plan before proceeding. Also, remember that a restart will apply the startup configuration. Any unsaved changes in the running configuration will be discarded.

Port Status Indicators
When you are troubleshooting a suspected layer 1 or layer 2 problem, check the LED status indicators on the NIC at one end and the switch/router port at the other. You will need the vendor documentation to interpret the LEDs. There may be separate LEDs for status and activity, or a single LED might use a mode button to show different information. On a switch port, the following LED link states are typical:

Solid green—The link is connected, but there is no traffic.
Flickering green—The link is operating normally (with traffic). The blink rate indicates the link speed.
No light—The link is not working, or the port is shut down.
Blinking amber—A fault has been detected (duplex mismatch, excessive collisions, or redundancy check errors, for instance).
Solid amber—The port is blocked by the spanning tree algorithm, which works to prevent loops within a switched network.

Switch Show Commands

If you can isolate the issue to a single host and then rule out cable, transceiver, and bad port issues at the Physical layer, bear in mind that the Data Link configuration might not be working.

In privileged mode, a variety of show commands can be used to display the current configuration of a switch. There are usually many show commands, but two of particular importance are as follows:

show config displays the switch's configuration. The startup configuration (show startup-config), which is applied on the next boot, could be different from the running configuration (show running-config). If there has been some undocumented change to the switch, comparing the output of these commands may reveal the source of a problem.
show interface lists the state of all interfaces or the specified interface. An interface has a line status (up if a host is connected via a good cable) and a protocol status (up if an Ethernet link is established). show interface will also report configuration details and traffic statistics if the link is up/up.

If an interface is not up/up, you need to diagnose the cause from the state:

Down/down—There is no link. This is typically because no host is attached, but it could also be caused by a speed mismatch.
Administratively down/down—The interface has been disabled using the shutdown command. Use no shutdown to bring it up.
Down/error disabled—The interface is disabled due to some error state. This is typically either a spanning tree or port security violation issue.
Up/down (suspended)—The port is part of a link aggregation group, and the channel has not been negotiated. Use show etherchannel to investigate the cause. Both sides should use the same speed, duplex, and link control type, and use the same number of ports. When using LACP, at least one side must be active.
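The state-to-cause mapping above lends itself to a small triage helper. This Python sketch parses the first line of a Cisco-style show interface report; the sample lines, including the parenthesized notes, are illustrative examples written for this sketch rather than captured output, so check your own platform's exact wording.

```python
import re

# Matches e.g. "GigabitEthernet0/1 is up, line protocol is up (connected)".
STATUS_RE = re.compile(
    r"^(?P<name>\S+) is (?P<line>administratively down|up|down), "
    r"line protocol is (?P<proto>up|down)(?: \((?P<note>[^)]+)\))?"
)


def diagnose(status_line: str) -> str:
    """Map an interface's line/protocol state to the likely cause."""
    m = STATUS_RE.match(status_line.strip())
    if not m:
        return "unrecognized status line"
    line, proto, note = m["line"], m["proto"], (m["note"] or "").lower()
    if (line, proto) == ("up", "up"):
        return "up/up: link is working; check error counters if performance is poor"
    if line == "administratively down":
        return "disabled with shutdown; use no shutdown"
    if "err-disabled" in note:
        return "error disabled: check spanning tree or port security violations"
    if line == "up" and proto == "down" and "suspended" in note:
        return "LAG member not negotiated: run show etherchannel"
    if (line, proto) == ("down", "down"):
        return "no link: check the cable, attached host, and speed settings"
    return f"{line}/{proto}: investigate further"


samples = [
    "GigabitEthernet0/1 is up, line protocol is up (connected)",
    "GigabitEthernet0/2 is administratively down, line protocol is down (disabled)",
    "GigabitEthernet0/3 is down, line protocol is down (err-disabled)",
    "GigabitEthernet0/4 is up, line protocol is down (suspended)",
    "GigabitEthernet0/5 is down, line protocol is down (notconnect)",
]
for s in samples:
    print(s.split()[0], "->", diagnose(s))
```

The error-disabled check runs before the generic down/down case because an error-disabled port also reports down/down; only the note distinguishes it.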

If the line and protocol status is down/down, check whether autonegotiation of speed and duplex settings is configured and whether it is failing. In most cases, a failure will be because either the adapter or the switch port has been manually configured. If a host is set to a fixed configuration and the switch is set to autonegotiate, the host will not negotiate with it. The switch can usually sense the host's speed, but it will fall back to half-duplex (and to 10 Mbps if it cannot sense the speed). So, if the host is manually configured to 100 Mbps/full-duplex, the link will either fail or suffer a duplex mismatch. Setting both ends to autonegotiate will generally solve the problem. A speed mismatch will cause the link to fail, while a duplex mismatch will slow the link down (it will cause high packet loss and late collisions).

Interface Error Counters
Interface status commands will also report whether any collisions are being generated. Collisions might occur if the duplex setting on the switch port and host is mismatched or if a legacy hub or a half-duplex host NIC is connected to a switch. Other types of interface errors might indicate a misconfiguration problem at the Data Link layer or interference at the Physical layer.

Increasing Interface Counters
An interface might change rapidly or "flap" between up and down states, making the problem harder to observe and diagnose. Interface counters record the number of events over time. This allows you to diagnose issues with an interface that is up but that is unreliable or performing poorly.

Link state—Measures whether an interface is working (up) or not (down). You would configure an alert if an interface goes down so that it can be investigated immediately. You may also want to track the uptime or downtime percentage so that you can assess a link's reliability over time.
Resets—The number of times an interface has restarted over the counter period. Interfaces may be reset manually or could restart automatically if traffic volume is very high or a large number of errors are experienced. Anything but occasional resets should be closely monitored and investigated.
Discards/drops—An interface may discard incoming and/or outgoing frames for several reasons, including checksum errors, mismatched MTUs, packets that are too small (runts) or too large (giants), high load, or permissions—the sender is not on the interface's access control list (ACL) or there is some sort of virtual LAN (VLAN) configuration problem, for instance. Each interface is likely to classify each type of discard or drop separately to assist with troubleshooting the precise cause.
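Because counters are cumulative, what matters is how fast they rise between polls, not their absolute values. This Python sketch compares two snapshots against per-counter alert thresholds; the CounterSnapshot fields and threshold values are hypothetical, and it assumes a counter that goes backward has been cleared, so the new value is counted from zero.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Hypothetical per-interface counters collected by polling the switch."""
    resets: int
    input_errors: int
    crc: int
    drops: int


def increasing_counters(before: CounterSnapshot, after: CounterSnapshot,
                        thresholds: dict[str, int]) -> dict[str, int]:
    """Return counters whose increase between two polls exceeds its alert threshold."""
    alerts = {}
    for field, limit in thresholds.items():
        delta = getattr(after, field) - getattr(before, field)
        if delta < 0:
            delta = getattr(after, field)  # counter was cleared: count from zero
        if delta > limit:
            alerts[field] = delta
    return alerts


t0 = CounterSnapshot(resets=2, input_errors=10, crc=8, drops=100)
t1 = CounterSnapshot(resets=2, input_errors=450, crc=430, drops=120)
print(increasing_counters(t0, t1, {"resets": 0, "input_errors": 50, "crc": 50, "drops": 500}))
# {'input_errors': 440, 'crc': 422}
```

In this made-up example, CRC errors account for almost all of the new input errors, which points toward interference or cabling rather than congestion.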

Cyclic Redundancy Check Errors
A cyclic redundancy check (CRC) is calculated by an interface when it sends a frame. The CRC algorithm derives a 32-bit value from the frame contents, and this value is appended to the end of the frame as the frame check sequence (FCS). The receiving interface performs the same calculation. If it derives a different value, the frame is rejected. The number of CRC errors can be monitored per interface.
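The sender/receiver check can be demonstrated with Python's zlib.crc32, which implements the same CRC-32 polynomial used for the Ethernet FCS. The frame contents are made up, and storing the FCS least-significant byte first is an assumption about on-the-wire layout that doesn't affect the demonstration.

```python
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Sender: append a 4-byte FCS computed with CRC-32 over the frame."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver: recompute the CRC over the frame and compare it with the received FCS."""
    frame, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(frame).to_bytes(4, "little") == fcs


# Hypothetical minimum-size frame: broadcast destination, made-up source MAC,
# IPv4 EtherType, and 46 bytes of zero padding as the payload.
frame = bytes.fromhex("ffffffffffff" "001a2b3c4d5e" "0800") + bytes(46)
sent = add_fcs(frame)
print(len(sent), fcs_ok(sent))  # 64 True

damaged = bytearray(sent)
damaged[20] ^= 0x01             # flip a single bit, as interference might
print(fcs_ok(bytes(damaged)))   # False: the receiver would count a CRC error
```

CRC-32 is guaranteed to catch every single-bit error, so the damaged frame always fails the check.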

CRC errors are usually caused by interference. This interference might be due to poor quality cable or termination, attenuation, mismatches between optical transceivers or cable types, or some external factor.

Runt Frame Errors
A runt is a frame that is smaller than the minimum size (64 bytes for Ethernet). A runt frame is usually caused by a collision. In a switched environment, collisions should only be experienced on an interface connected to a legacy hub device or where there is a duplex mismatch in the interface configuration (or possibly on a misconfigured link to a virtualization platform). If runts are generated in other conditions, suspect a driver issue on the transmitting host.

Giant Frame Errors
A giant is a frame that is larger than the maximum permissible size (1518 bytes). There are two likely causes of giant frames:

Jumbo frames—A host might be configured to use jumbo frames, but the switch interface is not configured to receive them. This type of issue often occurs when configuring storage area networks (SANs) or links between SANs and data networks. The MTU value in the show interface output will indicate whether jumbo frames are accepted on a particular port.

Ethernet trunks—A trunk link carries traffic between switches or between a switch and a router. Trunk links often use 802.1Q framing to carry virtual LAN (VLAN) information. If one switch interface is configured for 802.1Q framing, but the other is not, the frames will appear too large to the receiver, as 802.1Q adds 4 bytes to the header, making the maximum frame size 1522 bytes.

Note: An Ethernet frame that is slightly larger (up to 1600 bytes) is often referred to as a baby giant.
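The size limits from the last few paragraphs can be collected into one classifier. This Python sketch uses the 64-byte minimum, 1,518-byte untagged maximum, 1,522-byte 802.1Q maximum, and the 1,600-byte baby giant boundary from the note above; the max_frame parameter stands in for however a given port's MTU is configured.

```python
MIN_FRAME = 64          # bytes, including header and FCS
MAX_FRAME = 1518        # standard untagged maximum
MAX_TAGGED = 1522       # maximum with a 4-byte 802.1Q tag
BABY_GIANT_MAX = 1600   # "slightly larger" frames


def classify_frame(size: int, max_frame: int = MAX_FRAME) -> str:
    """Label a received frame size relative to what the port is configured to accept."""
    if size < MIN_FRAME:
        return "runt"
    if size <= max_frame:
        return "ok"
    if size <= BABY_GIANT_MAX:
        return "baby giant"
    return "giant"


for size in (60, 1518, 1522, 9018):
    print(size, classify_frame(size))
```

The same 1,522-byte frame is a baby giant on a port that does not expect 802.1Q tags but is fine on a trunk (classify_frame(1522, MAX_TAGGED)), and a 9,018-byte jumbo frame is only accepted if the port's limit has been raised to match.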

MAC Address Table
A switch learns MAC addresses by reading the source address when a frame is received on a port. The address mapping for that port is cached in a MAC address table. The address table is implemented as content addressable memory (CAM), a special type of memory optimized for searching, rather than random access. Consequently, the MAC address table is often also referred to as the CAM table. Entries remain in the MAC address table for a period of time before being flushed. This ensures problems are not encountered when network cards (MAC addresses) are changed.

If a MAC address cannot be found in the MAC address table, then the switch acts like a hub and transmits the frame out of all the ports, except for the source port. This is referred to as flooding.
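Learning, aging, and flooding fit in a few lines of code. This Python sketch is a toy model, not any vendor's implementation; the 300-second aging time is an assumed default, and the short MAC strings in the usage are placeholders.

```python
import time
from typing import Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of MAC learning, aging, filtering, and flooding."""

    def __init__(self, ports: list[int], aging_seconds: float = 300.0):
        self.ports = ports
        self.aging_seconds = aging_seconds
        self.table: dict[str, tuple[int, float]] = {}  # MAC -> (port, time last seen)

    def receive(self, in_port: int, src: str, dst: str,
                now: Optional[float] = None) -> list[int]:
        """Learn the source MAC, then return the port(s) the frame is sent out of."""
        now = time.monotonic() if now is None else now
        self.table[src] = (in_port, now)  # learn (or refresh) the sender's port

        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.aging_seconds:
            del self.table[dst]  # stale entry: flush it
            entry = None

        if entry is not None and dst != BROADCAST:
            out_port = entry[0]
            # Destination is on the port the frame came from: filter (drop) it.
            return [] if out_port == in_port else [out_port]
        # Unknown destination or broadcast: flood out every port except the source port.
        return [p for p in self.ports if p != in_port]


sw = LearningSwitch(ports=[1, 2, 3, 4])
print(sw.receive(1, "aa", "bb", now=0))    # [2, 3, 4]: bb unknown, so flooded
print(sw.receive(2, "bb", "aa", now=1))    # [1]: aa was learned on port 1
print(sw.receive(3, "cc", "aa", now=400))  # [1, 2, 4]: aa's entry has aged out
```

Aging is what lets the table recover when a NIC is swapped or a host moves: the stale mapping expires and the next frame is simply flooded until the address is relearned.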

Knowing the MAC addresses associated with a particular interface is often important for troubleshooting. You can query the MAC address table of a switch to find the MAC address or addresses associated with a particular port using a command such as show mac address-table.

Network Loop and Broadcast Storm Issues

A network loop is where flooded frames circulate the network perpetually. Because switches flood broadcasts out all ports, these frames will go down one link to the next switch, which will send the broadcast back up the redundant link, and back to the originating switch. As this repeats, the switches start to see source MAC addresses associated with multiple ports and so clear the MAC address table mapping, which causes them to start flooding unicast traffic too.

Without intervention, this loop will continue indefinitely, causing a broadcast storm. A broadcast storm will cause network utilization to go to near maximum capacity and the CPU utilization of the switches to jump to 80% or more. This makes the switched segment effectively unusable until the broadcast storm stops. A broadcast storm may quickly consume all link bandwidth and crash network appliances.

If there is a loop, spanning tree should shut down the port. This will isolate the problem to a segment of the network. Inspect physical ports that correspond to the disabled interfaces for looped connections. At the patch panel, this could mean a patch cable that connects two ports on the same switch. On the office floor, it could mean a patch cable between two wall ports. Check the switch for log events related to MAC address flapping.
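MAC flapping is the tell-tale sign of a loop: the same source address keeps appearing on different ports in quick succession. This Python sketch flags such addresses from (timestamp, MAC, port) events, which are assumed to have been parsed from switch logs beforehand; the 10-second window and three-move threshold are arbitrary illustrative values.

```python
from collections import defaultdict


def find_mac_flaps(events, window: float = 10.0, min_moves: int = 3) -> set[str]:
    """Flag MACs that change ports at least `min_moves` times within `window` seconds.

    events: iterable of (timestamp, mac, port) tuples.
    """
    last_port = {}
    moves = defaultdict(list)  # mac -> timestamps of recent port changes
    flapping = set()
    for ts, mac, port in sorted(events):
        if mac in last_port and last_port[mac] != port:
            # Record the move, keeping only moves inside the sliding window.
            moves[mac] = [t for t in moves[mac] + [ts] if ts - t <= window]
            if len(moves[mac]) >= min_moves:
                flapping.add(mac)
        last_port[mac] = port
    return flapping


events = [
    (0.0, "aa:aa:aa:aa:aa:aa", "Gi0/1"),
    (0.5, "aa:aa:aa:aa:aa:aa", "Gi0/2"),
    (1.0, "aa:aa:aa:aa:aa:aa", "Gi0/1"),
    (1.5, "aa:aa:aa:aa:aa:aa", "Gi0/2"),
    (2.0, "bb:bb:bb:bb:bb:bb", "Gi0/3"),
    (3.0, "bb:bb:bb:bb:bb:bb", "Gi0/3"),
]
print(find_mac_flaps(events))  # {'aa:aa:aa:aa:aa:aa'}
```

The window matters: a laptop that is unplugged and moved to another desk changes ports once and should not be flagged, while a looped frame bounces between two ports many times a second.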

If a broadcast storm occurs on a network where spanning tree is already enabled, you should investigate the following potential causes:

Verify compatible versions of Spanning Tree Protocol or Rapid Spanning Tree Protocol are enabled on all switches.
Verify the physical configuration of segments that use legacy equipment, such as Ethernet hubs.
Investigate networking devices in the user environment and verify that they are not connected as part of a loop. Typical sources of problems include unmanaged desktop switches and VoIP handsets.

Power Over Ethernet Issues

Power over Ethernet (PoE) uses data cabling to run lightweight Powered Device (PD) appliances, such as Voice over IP (VoIP) handsets, IP cameras, and wireless access points.

Cabling Issues
Cabling for PoE+ must be Cat 5e or better, but standards typically recommend the use of Cat 6A. Drawing power down the cable generates more heat. If this heat is not dissipated, it can affect data rates. Thermal performance is improved by using pure copper cabling with thicker conductors. A thin conductor will generate more heat through resistance. Shielded cabling is capable of dispersing heat more efficiently.

Note: Conductor thickness is measured as American Wire Gauge (AWG). Remember that smaller numbers mean thicker wires, so 23 AWG cable will have superior PoE performance to 24 AWG cable.
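The AWG note can be made concrete with the standard gauge formula, in which each gauge step changes the diameter by a factor of 92^(1/39). This Python sketch assumes solid annealed copper at about 20 °C and ignores stranding and temperature rise, so treat the resistance figures as approximate.

```python
import math

COPPER_RESISTIVITY = 1.724e-8  # ohm-metres, annealed copper at about 20 °C


def awg_diameter_mm(gauge: int) -> float:
    """Standard AWG formula: 36 AWG is 0.127 mm, and each step scales by 92**(1/39)."""
    return 0.127 * 92 ** ((36 - gauge) / 39)


def resistance_ohms_per_km(gauge: int) -> float:
    """DC resistance of one solid copper conductor, 1 km long."""
    radius_m = awg_diameter_mm(gauge) / 2 / 1000
    area_m2 = math.pi * radius_m ** 2
    return COPPER_RESISTIVITY / area_m2 * 1000


for g in (23, 24):
    print(f"{g} AWG: {awg_diameter_mm(g):.3f} mm, {resistance_ohms_per_km(g):.1f} ohm/km")

# Heat dissipated is I^2 * R, so at the same current the extra heat equals the resistance ratio.
ratio = resistance_ohms_per_km(24) / resistance_ohms_per_km(23)
print(f"24 AWG dissipates {ratio:.2f}x the heat of 23 AWG at the same current")
```

Under these assumptions, 24 AWG comes out at roughly 84 Ω/km against roughly 67 Ω/km for 23 AWG, so the thinner cable turns about a quarter more of the PoE current into heat.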

Incorrect Standard
A PD should be able to negotiate the correct mode and power output with the switch. However, this process can fail with some devices that only support the first PoE standard, especially if the switch interface is enabled for high power PoE++ Type 4 PDs. The switch and PD must negotiate a compatible mode:

Alternative A delivers power with data over pairs 1/2 and 3/6. This is compatible with 10/100 and 10/100/1000 links.
Alternative B delivers power over the 10/100 spare pairs (4/5 and 7/8). This is not compatible with Gigabit Ethernet.
Four-pair delivers power over all pairs. This is required by PoE++ Type 3 and Type 4 PDs. This is compatible with 10/100/1000 and also supports 10G.

Power Budget Exceeded
Each switch has a total power budget for all ports. This will typically be around 300–400 watts (300,000–400,000 milliwatts). If the power requirements of all connected devices exceed the budget, some will not be activated, or there might be intermittent resets. You can use the show power inline command to report the power budget and power consumption. If the power budget is exceeded, you will typically need to provision another switch, though it is also possible to use power injector devices to remove the load of selected PDs from the switch.
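The power budget works like a running balance. This Python sketch powers ports in the order given until the budget runs out; the switch size and device loads are hypothetical, and many switches let you assign per-port power priorities, which this sketch ignores.

```python
def allocate_budget(budget_watts: float,
                    requests: list[tuple[str, float]]) -> tuple[list[str], list[str]]:
    """Power ports in order until the switch budget runs out; return (powered, denied)."""
    powered, denied = [], []
    remaining = budget_watts
    for port, watts in requests:
        if watts <= remaining:
            remaining -= watts
            powered.append(port)
        else:
            denied.append(port)
    return powered, denied


# Hypothetical 370 W switch with twelve 30 W access points and two 60 W PTZ cameras.
requests = [(f"Gi0/{n}", 30.0) for n in range(1, 13)] + [("Gi0/13", 60.0), ("Gi0/14", 60.0)]
powered, denied = allocate_budget(370.0, requests)
print(f"{len(powered)} powered, denied: {denied}")  # 12 powered, denied: ['Gi0/13', 'Gi0/14']
```

The twelve access points consume 360 W, leaving only 10 W, so both cameras stay dark. This is the kind of shortfall show power inline would reveal and that a power injector or a second switch would fix.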

Note: Actual power consumption can fluctuate quite widely. For example, a camera with pan-tilt-zoom controls will use more power when its motor is active.

We've touched on a lot in these notes: using LAG/LACP, MTUs, STP, and PoE, and how to troubleshoot issues related to layer 2 switches. Grasping these concepts and understanding why and how issues occur on layer 2 switches is a key skill for your CompTIA Network+ exam. I hope you feel more confident with these concepts and are ready to take the next step toward taking that exam.

DEV Community