BGP, by design, is a lot more capable than most typical routing protocols. Here are a few ways MP-BGP/BGPv4 is fundamentally unique:
- Extensible: MP-BGP can carry multiple address families (AFIs) and subsequent address families (SAFIs) over the same sessions. With virtual networking, it's likely that we'll pick up a lot more AFIs for new use cases in the future. Here are some that I'm aware of or might use:
  - AFI: IPv4
    - SAFI: Unicast
    - SAFI: Multicast
    - SAFI: Flow Spec (Firewalling)
  - AFI: IPv6
    - SAFI: Unicast
    - SAFI: Multicast
    - SAFI: Flow Spec (Firewalling)
  - AFI: L2VPN
    - SAFI: EVPN
  - AFI: Link State (IGP Domain Traffic Engineering)
- Universal: Nearly all networking equipment can support BGP
- Stable: No flooding, no large-scale recalculations with ephemeral routing entries. This is especially important if you're scaling network segments on a regular basis
- It's designed for interlinking Autonomous Systems.
- Most routing protocols do not support import/export filtering - using filters breaks their loop prevention. BGP is designed to use them right out of the gate.
  - I'll repeat this one: with OSPF/IS-IS/EIGRP, you have to either run two IGP instances and configure redistribution between them, or trust any prefix a virtual instance (NSX-V, NSX-T, Vyatta, etc.) sends to your physical network!
- Portable: Since BGP runs on TCP (Unicast IPv4/IPv6) it'll run over just about anything, and get rid of static routes in a wide variety of use cases. This includes:
  - IPSec tunnels: You can keep the packet overhead GRE would normally consume. This is also the de facto approach for interconnecting public clouds:
    - Amazon AWS: Direct BGP-over-IPSec is supported on their VPN appliance and on Transit Gateway (TGW) https://docs.aws.amazon.com/vpn/latest/s2svpn/your-cgw.html#CGRequirements
    - VMware VMC on AWS: I may be biased, but this follows a very similar method to TGW and is considerably easier to set up. Dead simple, up and running in minutes. https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws.networking-security/GUID-5AF45CE6-FA53-45C0-83E5-25F8E3A055E9.html
    - Azure VNet: Indirect BGP-over-IPSec, with multi-hop set to 2. The trick here is to create a loopback on each side, set static routes to reach it, and then bring up the peering between the loopbacks. This is pretty nerdy and overly elegant in my opinion, but it runs fairly well. https://docs.microsoft.com/en-us/azure/vpn-gateway/tutorial-site-to-site-portal
    - Site-to-Site VPNs between security appliances: 10/10 would recommend. Nearly every appliance vendor can do it instead of policy VPN nowadays - I do have a snippet https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws.networking-security/GUID-5AF45CE6-FA53-45C0-83E5-25F8E3A055E9.html
  - MPLS: Pretty much any major enterprise that uses MPLS already does this. Enabling Layer 2 carrier services introduces a great deal of complexity and opportunities for failure, but even if you're consuming Layer 2 services...
    - BUM: Broadcast, Unknown-Unicast, and Multicast traffic all introduce inefficient flooding and disproportionately tax the carrier networks your business depends on. Most carrier Ethernet services do not have an explicit method for handling this type of traffic (EVPN excluded)! https://en.wikipedia.org/wiki/Broadcast,_unknown-unicast_and_multicast_traffic Since BGP is completely unicast, it's not as much of an issue here.
  - SD-WAN: As you'd probably expect, SD-WAN is highly dynamic, and typically runs better when you can make frequent, incremental changes to pathing (this may be less true in small-scale implementations where a lone SD-WAN appliance is a single point of failure).
    - Velocloud: OSPF/BGP https://docs.vmware.com/en/VMware-SD-WAN-by-VeloCloud/3.3/velocloud-admin-guide-33/GUID-7A080D7A-C527-433C-96CA-534D1418A3E0.html
    - Cisco: OSPF/BGP https://www.cisco.com/c/en/us/td/docs/routers/sdwan/configuration/routing/vEdge-20-x/routing-book/m-unicast-routing.html
    - Aryaka: This one's interesting because they support only eBGP or RIPv2 https://www.aryaka.com/docs/smartmanage-ANAP-product-brief.pdf
  - SASE: With all SASE offerings, static routing is officially off the table, because you can't automatically move between circuits. Most offerings will leverage BGP-over-IPSec (above included) and the ability to do the same over diverse connectivity. If you're looking at SASE, you'll want to consider BGP more seriously than someone who isn't.
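To make the address-family point from the top of this list a bit more concrete, here's a minimal Python sketch of how a BGP speaker announces AFI/SAFI pairs using the Multiprotocol Extensions capability (RFC 4760) in its OPEN message. The AFI/SAFI numbers are the IANA-assigned values as I understand them; the function names and the `wanted` list are made up for illustration, and a real OPEN wraps these capabilities in an optional parameter that I've left out.

```python
import struct

# IANA-assigned numbers for the address families listed above.
AFI = {"ipv4": 1, "ipv6": 2, "l2vpn": 25, "link-state": 16388}
SAFI = {"unicast": 1, "multicast": 2, "evpn": 70, "link-state": 71, "flowspec": 133}

MP_CAPABILITY_CODE = 1  # Multiprotocol Extensions capability (RFC 4760)


def mp_capability(afi: str, safi: str) -> bytes:
    """Encode one Multiprotocol Extensions capability (hypothetical helper)."""
    # Capability value: AFI (2 bytes), one reserved byte, SAFI (1 byte).
    value = struct.pack("!HBB", AFI[afi], 0, SAFI[safi])
    # Capability TLV: code (1 byte), length (1 byte), then the value.
    return struct.pack("!BB", MP_CAPABILITY_CODE, len(value)) + value


def advertise(families) -> bytes:
    """Concatenate one capability per (afi, safi) pair we want to negotiate."""
    return b"".join(mp_capability(afi, safi) for afi, safi in families)


# Illustrative selection: adding a new use case is just one more pair.
wanted = [("ipv4", "unicast"), ("ipv6", "unicast"), ("l2vpn", "evpn")]
print(advertise(wanted).hex(" "))
```

The takeaway is that a new address family is one more six-byte capability on an existing session, not a new protocol to deploy.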
I'll keep this short to avoid beating a dead horse too much.
- There's no such thing as flooding. Network changes aren't that big of a deal anymore due to this - but in many cases, that's the entire point of dynamic routing.
- Security appliances either tend not to support them, or weren't designed with OSPF support in mind. Palo Alto's skunkworks division is building a separate route engine that will be BGP-only https://docs.paloaltonetworks.com/pan-os/10-0/pan-os-new-features/networking-features/advanced-route-engine.html
- There are ways to limit route flapping in BGP, such as route flap dampening.
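Route flap dampening (RFC 2439) is one mechanism for limiting flapping, and it's easier to reason about as code. Here's a rough Python sketch: every flap adds a penalty, the penalty decays exponentially, and the route is suppressed until the penalty falls back under a reuse threshold. The thresholds below are illustrative assumptions, not any particular vendor's defaults, and `DampenedRoute` is a made-up class name.

```python
# Illustrative parameters (assumptions; check your platform's actual defaults).
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_S = 15 * 60


def decayed(penalty: float, elapsed_s: float) -> float:
    """Exponential decay: the penalty halves every HALF_LIFE_S seconds."""
    return penalty * 0.5 ** (elapsed_s / HALF_LIFE_S)


class DampenedRoute:
    """Tracks the dampening state of a single prefix (hypothetical model)."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _refresh(self, now: float) -> None:
        # Age the penalty, then release the route once it drops below reuse.
        self.penalty = decayed(self.penalty, now - self.last_update)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now: float) -> None:
        # Each withdraw/re-announce cycle adds a fixed penalty.
        self._refresh(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        self._refresh(now)
        return not self.suppressed


route = DampenedRoute()
for t in (0, 10, 20):  # three flaps in twenty seconds
    route.flap(t)
print(route.usable(30), route.usable(20 + 3600))
```

A single flap never suppresses the route in this model; it takes a burst, and the route comes back on its own once things calm down.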
The following advantages are specific to NSX-T/V or virtualized routers:
- Link status doesn't change when a virtual system goes down. If the hypervisor hosting a VM stays up while the VM goes down, the physical link stays up, so the adjacency only drops once the dead interval expires, and you eat that entire interval as an outage.
  - I'll repeat this: with virtualized network functions, link-state adjacencies fail over only after their full dead interval, nullifying the primary advantage of these routing protocols!
- Interlinks from a physical network must be specifically engineered to prevent non-determinism. If you're multi-homing a virtual router via the same Layer 2 domain, LSAs will not only flow between the physical network endpoints and your desired Virtual Network Function (VNF) but also between physical network devices.
  - This can be designed around, but you lose the ability to scale multi-pathed machines easily and automatically.
- You can get the dynamic adjacency capability with the BGP Neighbor Range feature on nearly all datacenter network equipment today.
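The neighbor range idea is simple enough to sketch in a few lines of Python: instead of configuring every peer by hand, the router accepts a session from any source address inside a configured range, as long as the peer's AS number matches what's expected. The ranges, AS numbers, and the `accept_peer` name are all hypothetical; actual syntax and behavior differ per vendor.

```python
import ipaddress

# Hypothetical listen ranges, each mapped to the AS expected from peers in it.
LISTEN_RANGES = {
    ipaddress.ip_network("10.20.0.0/24"): 65010,
    ipaddress.ip_network("2001:db8:20::/64"): 65010,
}


def accept_peer(source_ip: str, remote_as: int) -> bool:
    """Accept an unconfigured peer only if its address and AS match a range."""
    addr = ipaddress.ip_address(source_ip)
    for network, expected_as in LISTEN_RANGES.items():
        # Compare IP versions first so IPv4 and IPv6 objects are never mixed.
        if addr.version == network.version and addr in network:
            return remote_as == expected_as
    return False


print(accept_peer("10.20.0.15", 65010))   # inside the range, right AS
print(accept_peer("10.30.0.1", 65010))    # outside every range
```

This is what lets new virtual routers come up and peer without anyone touching the physical switch configuration.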
This is where most people get hung up - if a network doesn't currently use BGP, it'll potentially introduce problems by adding a new thing for engineers to maintain, major forklifts to pick up hardware support, and so on.
These are all very valid concerns. Instead of shutting down the argument, I'd recommend trying some of these solutions on for size:
Most of the complicated stuff is iBGP loop prevention or pro-grade tuning. Cisco education mechanisms have failed the community somewhat here and with IS-IS (you can only test on SO MUCH in the CCNA/CCNP!). These advanced capabilities are rarely necessary for most typical enterprise deployments. The typical enterprise BGP deployment responsibilities will consist of:
- eBGP to a vendor or provider
- Import/Export Filters
- TTL-Security / Authentication
- Timers: The default hold time (BGP's version of a dead interval) is typically 180 seconds, which is pretty high. My rule of thumb is 30/90 (keepalive/hold, in seconds) for WAN, 4/12 for the datacenter. https://docs.vmware.com/en/VMware-Validated-Design/5.0/com.vmware.vvd.sddc-design.doc/GUID-46A773E1-38F7-4F14-B158-489BEB90F44E.html
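One thing worth knowing about the timers: you only control your side. Per RFC 4271, each peer proposes a hold time in its OPEN message and the session uses the smaller of the two. Here's a small Python sketch of that negotiation. Capping the keepalive at one third of the hold time is my assumption about common implementation behavior (the RFC only suggests that ratio), and `negotiate_timers` is a made-up name.

```python
def negotiate_timers(local_keepalive: int, local_hold: int, peer_hold: int):
    """Return (keepalive, hold) in seconds for a session."""
    # The session uses the smaller of the two proposed hold times.
    hold = min(local_hold, peer_hold)
    if hold == 0:
        return 0, 0  # a zero hold time means no keepalives are sent at all
    if hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    # Assumption: keep the keepalive at or below one third of the hold time.
    keepalive = min(local_keepalive, hold // 3)
    return keepalive, hold


# Datacenter profile (4/12) against a peer left at 180 seconds.
print(negotiate_timers(4, 12, 180))
# WAN profile (30/90) against a peer that was tuned down to 12 seconds.
print(negotiate_timers(30, 90, 12))
```

The practical upshot: tuning one side down is enough to speed up failure detection, but you can't force a slower timer on a peer that asks for a faster one.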
This can be either really difficult and complex or really simple depending on needs. If it's not an enterprise-wide deployment of BGP (usually it won't be), just plan it out on paper before implementing - there will be learning experiences, so accept that they'll happen and maximize the end results. You can't get this education without getting your hands dirty, so make sure it won't hurt the business, and use a lab if you can.
If you can't, contain the deployment: set up a prefix for whatever workload is being used, and redistribute that prefix instead of extending BGP any further until you've hit maturity. In many cases, it'll just stay there, and that's OK.
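One way to picture "containing" the deployment is as an import filter: the physical network only accepts routes that sit inside the prefix set aside for the workload, and nothing else the virtual side might send. A minimal Python sketch, where the aggregate, the maximum prefix length, and the `import_filter` name are all assumptions for illustration:

```python
import ipaddress

# Hypothetical aggregate reserved for the virtual workload.
WORKLOAD_AGGREGATE = ipaddress.ip_network("10.64.0.0/16")
MAX_PREFIX_LEN = 24  # assumed longest prefix we're willing to accept


def import_filter(prefix: str) -> bool:
    """Permit a received prefix only if it falls inside the workload aggregate."""
    net = ipaddress.ip_network(prefix)
    if net.version != WORKLOAD_AGGREGATE.version:
        return False
    return net.subnet_of(WORKLOAD_AGGREGATE) and net.prefixlen <= MAX_PREFIX_LEN


received = [
    "10.64.10.0/24",    # workload segment
    "10.64.0.0/16",     # the aggregate itself
    "0.0.0.0/0",        # a default route we never want from the virtual side
    "10.64.10.128/25",  # too specific
    "192.168.1.0/24",   # not ours
]
accepted = [p for p in received if import_filter(p)]
print(accepted)
```

On real gear this would be a prefix list or route map applied to the neighbor; the logic is the same.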
This is probably where I'd start - it's got the highest value-to-effort ratio. Given your vendor choices, it's probably not that complicated and doesn't necessarily need to be redistributed to campus or other internet edge modules. For most enterprise deployments, this is totally cool. If you're me, you'll start getting annoyed by NLRIs not propagating across sites, which brings you to...
This is actually how most Service Provider networks work! BGP isn't designed to synchronize - by default, iBGP doesn't modify next-hop addresses for the prefixes it advertises, and it relies on another routing protocol to make those next hops reachable. There are some applications where you can go all-BGP https://tools.ietf.org/html/rfc7938 but they're usually reserved for hyperscaler applications or shops that are already very familiar with BGP. Physical network routes can continue to propagate in this scenario just like they always did, and you're using BGP for the virtual ones. The only redistribution required would be a default (all-zeroes) route from your point of origin to keep things nice and intuitive.
This is pretty complicated unless you contain the use cases. In the two scenarios above, you're mostly off the hook on this one - at most, you'll be installing a default route.
- VMware NSX-T
- VMware Cloud on AWS
- Avi Networks Load Balancer
- Amazon AWS
- Project Calico (Kubernetes!)
- Vyatta / VyOS
- F5 LTM
- Microsoft Azure
- All SD-WAN
- All firewalls
Needless to say, if you're a shop that consumes more than vCenter and ESXi, you probably should be dipping your toes in the water. How far is up to you, but it cannot be avoided.
- If it's providing value, you're doing well.
- If you don't know something, that's OK. We're in an ever-changing industry.