The final step of the “VXLAN EVPN DCI” journey covers host reachability and fault scenarios. In this post I’ll verify reachability between two hosts in different sites and see how traffic behaves when the network suffers some faults.
Host reachability
Once the overlay configuration is up and running, we can connect a host or device to our BGWs and try to reach the remote side across the VXLAN EVPN DCI. In my lab I use the SiteA-1 and SiteB-1 switches on VLAN 10.
e.g. SiteA-1
interface Vlan10
no shutdown
ip address 10.10.0.5/24
e.g. SiteB-1
interface Vlan10
no shutdown
ip address 10.10.0.7/24
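Both test SVIs live in the same 10.10.0.0/24 subnet. The two switches therefore expect to reach each other directly at Layer 2, and VLAN 10 has to be stretched across the DCI. Here is a minimal Python sketch of that check. It uses only the standard `ipaddress` module and the addresses configured above:

```python
import ipaddress

# SVI addresses configured on the two test switches (from the configs above)
site_a = ipaddress.ip_interface("10.10.0.5/24")
site_b = ipaddress.ip_interface("10.10.0.7/24")

# Same network: the hosts resolve each other with ARP instead of routing,
# so VLAN 10 must be extended between the sites by the DCI.
same_segment = site_a.network == site_b.network
print(site_a.network, site_b.network, same_segment)
```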
All the necessary information is populated on the BGW:
SiteABGW1# sh l2route mac all
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan
Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ----------------------------------------
10          5005.0000.1b08 Local  L,            0          Po3
10          5007.0000.1b08 BGP    Rcv           0          172.16.1.200 (Label: 10010)
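Here is how to read this table. The Local entry is the MAC learned on Po3. The BGP entry was received from the remote site and points at VTEP 172.16.1.200 with label 10010, which is the L2VNI.

The sketch below parses the rows. Treat it as a rough sketch: the regular expression is my own assumption about the column layout, not something NX-OS guarantees. In particular, it assumes the Flags column contains no spaces.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class L2Route:
    topology: int
    mac: str
    producer: str          # "Local" or "BGP"
    next_hop: str          # local port, or remote VTEP for BGP entries
    label: Optional[int]   # VNI label, present only on BGP-learned entries


# Rows from the "show l2route mac all" output above (wrapped lines rejoined)
ROWS = """\
10 5005.0000.1b08 Local L, 0 Po3
10 5007.0000.1b08 BGP Rcv 0 172.16.1.200 (Label: 10010)
"""

# Assumed layout: topology, MAC, producer, flags (no spaces), seq, next hop, optional label
ROW_RE = re.compile(
    r"^(?P<topo>\d+)\s+(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<prod>\S+)\s+(?P<flags>\S+)\s+(?P<seq>\d+)\s+(?P<nh>\S+)"
    r"(?:\s+\(Label:\s*(?P<label>\d+)\))?"
)


def parse_l2route(text: str) -> list:
    routes = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if not m:
            continue  # skip legend, header and separator lines
        label = m.group("label")
        routes.append(L2Route(int(m.group("topo")), m.group("mac"), m.group("prod"),
                              m.group("nh"), int(label) if label else None))
    return routes


routes = parse_l2route(ROWS)
for r in routes:
    kind = "remote" if r.producer == "BGP" else "local"
    print(f"{r.mac} {kind} via {r.next_hop} label={r.label}")
```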
SiteA-1# sh mac address-table vlan 10
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
+ 10 5005.0000.1b08 dynamic 0 F F Po3
C 10 5007.0000.1b08 dynamic 0 F F nve1(172.16.1.200)
SiteABGW1# sh bgp l2vpn evpn vni-id 10010
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 23, Local Router ID is 172.16.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 172.16.1.1:32777 (L2VNI 10010)
*>l[2]:[0]:[0]:[48]:[5005.0000.1b08]:[0]:[0.0.0.0]/216
172.16.1.100 100 32768 i
*>i[2]:[0]:[0]:[48]:[5007.0000.1b08]:[0]:[0.0.0.0]/216
172.16.1.200 100 0 i
* i 172.16.1.200 100 0 i
*>l[3]:[0]:[32]:[172.16.1.100]/88
172.16.1.100 100 32768 i
*>i[3]:[0]:[32]:[172.16.1.200]/88
172.16.1.200 100 0 i
* i 172.16.1.200 100 0 i
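Two route types in this table carry the DCI:

- Type-2 routes (MAC/IP advertisement) carry the host MACs.
- Type-3 routes (inclusive multicast Ethernet tag) advertise each VTEP for BUM replication in the VNI.

Below is a small Python decoder for the bracketed prefixes as NX-OS prints them. The field order is my reading of the display format, so treat it as an assumption:

- Type-2: ESI, Ethernet tag, MAC length, MAC, IP length, IP.
- Type-3: Ethernet tag, IP length, originator IP.

The helper name `decode_evpn_prefix` is hypothetical.

```python
import re


def decode_evpn_prefix(prefix: str) -> dict:
    """Split an NX-OS style EVPN prefix such as
    '[2]:[0]:[0]:[48]:[5005.0000.1b08]:[0]:[0.0.0.0]/216' into named fields."""
    body, _, length = prefix.partition("/")
    fields = re.findall(r"\[([^\]]*)\]", body)
    rtype = int(fields[0])
    if rtype == 2:    # MAC/IP advertisement
        names = ["esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"]
    elif rtype == 3:  # inclusive multicast Ethernet tag: one per VTEP per VNI
        names = ["eth_tag", "ip_len", "originator_ip"]
    else:
        raise ValueError(f"route type {rtype} is not handled in this sketch")
    decoded = dict(zip(names, fields[1:]))
    decoded["type"] = rtype
    decoded["display_len"] = int(length)  # the /N suffix NX-OS prints, not interpreted here
    return decoded


local_mac = decode_evpn_prefix("[2]:[0]:[0]:[48]:[5005.0000.1b08]:[0]:[0.0.0.0]/216")
remote_vtep = decode_evpn_prefix("[3]:[0]:[32]:[172.16.1.200]/88")
print(local_mac["mac"], local_mac["ip_len"])  # IP length 0: a MAC-only route
print(remote_vtep["originator_ip"])
```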
and reachability works. The first echo request fails while SiteA-1 is still resolving ARP for 10.10.0.7:
SiteA-1# ping 10.10.0.7
PING 10.10.0.7 (10.10.0.7): 56 data bytes
36 bytes from 10.10.0.5: Destination Host Unreachable
Request 0 timed out
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=13.195 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=7.646 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=7.078 ms
64 bytes from 10.10.0.7: icmp_seq=4 ttl=254 time=7.247 ms
After the ping, SiteA-1 has learned the remote MAC on its uplink Po1:
SiteA-1# sh mac address-table vlan 10
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 10 5007.0000.1b08 dynamic 0 F F Po1
LACP local fault
Suppose that one link of the port-channel facing the BGWs develops a problem and stops working. Unfortunately, that particular link is the one selected by the port-channel load-balancing algorithm (so lucky :) ). In this case, the LACP fast-rate option helps reconverge the traffic within a few seconds and restore connectivity.
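The timer arithmetic behind the fast-rate option can be shown in a few lines of Python. The sketch assumes the standard IEEE 802.1AX values:

- With fast rate, LACPDUs are sent every 1 s.
- With the default slow rate, they are sent every 30 s.
- A partner is timed out after three missed PDUs.

These timers only matter when the member stays physically up, for example with a unidirectional fault. A hard link-down is normally acted on immediately.

```python
# Assumed IEEE 802.1AX LACP timers, in seconds
PERIODIC_INTERVAL = {"fast": 1, "slow": 30}
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_detection_time(rate: str) -> int:
    """Approximate time before a silent member link is timed out."""
    return PERIODIC_INTERVAL[rate] * MISSED_PDUS_BEFORE_TIMEOUT


for rate in ("fast", "slow"):
    print(rate, lacp_detection_time(rate), "s")
```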
In our lab, Eth1/3 on SiteABGW2 (the forwarding port) simulates the fault. Thanks to the fast-rate feature, connectivity is restored within a few seconds:
SiteA-1# ping 10.10.0.7 count 20 interval 1
PING 10.10.0.7 (10.10.0.7): 56 data bytes
64 bytes from 10.10.0.7: icmp_seq=0 ttl=254 time=8.29 ms
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=8.248 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=8.581 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=9.074 ms
Request 4 timed out
Request 5 timed out
64 bytes from 10.10.0.7: icmp_seq=6 ttl=254 time=9.842 ms
64 bytes from 10.10.0.7: icmp_seq=7 ttl=254 time=8.688 ms
64 bytes from 10.10.0.7: icmp_seq=8 ttl=254 time=8.124 ms
64 bytes from 10.10.0.7: icmp_seq=9 ttl=254 time=9.13 ms
64 bytes from 10.10.0.7: icmp_seq=10 ttl=254 time=9.45 ms
64 bytes from 10.10.0.7: icmp_seq=11 ttl=254 time=9.501 ms
64 bytes from 10.10.0.7: icmp_seq=12 ttl=254 time=9.652 ms
64 bytes from 10.10.0.7: icmp_seq=13 ttl=254 time=8.833 ms
64 bytes from 10.10.0.7: icmp_seq=14 ttl=254 time=7.852 ms
64 bytes from 10.10.0.7: icmp_seq=15 ttl=254 time=9.886 ms
64 bytes from 10.10.0.7: icmp_seq=16 ttl=254 time=9.547 ms
64 bytes from 10.10.0.7: icmp_seq=17 ttl=254 time=8.924 ms
64 bytes from 10.10.0.7: icmp_seq=18 ttl=254 time=9.378 ms
64 bytes from 10.10.0.7: icmp_seq=19 ttl=254 time=9.166 ms
--- 10.10.0.7 ping statistics ---
20 packets transmitted, 18 packets received, 10.00% packet loss
round-trip min/avg/max = 7.852/9.009/9.886 ms
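We can put a number on the outage. The following Python sketch counts the timed-out requests in the ping output and scales them by the probe interval, which is 1 s here. The helper name `outage_from_ping` is hypothetical, and the estimate is only as precise as the interval:

```python
import re

# Excerpt of the ping run above: the lines around the fault plus the statistics line
PING = """\
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=9.074 ms
Request 4 timed out
Request 5 timed out
64 bytes from 10.10.0.7: icmp_seq=6 ttl=254 time=9.842 ms
20 packets transmitted, 18 packets received, 10.00% packet loss
"""


def outage_from_ping(output: str, interval_s: float = 1.0):
    """Return (lost sequence numbers, estimated outage in seconds, loss %).

    Each timed-out request stands for about one probe interval without connectivity."""
    lost = [int(s) for s in re.findall(r"Request (\d+) timed out", output)]
    stats = re.search(r"(\d+) packets transmitted, (\d+) packets received", output)
    sent, received = map(int, stats.groups())
    loss_pct = 100.0 * (sent - received) / sent
    return lost, len(lost) * interval_s, loss_pct


lost, outage_s, loss_pct = outage_from_ping(PING)
print(lost, outage_s, loss_pct)
```

With the excerpt above, it reports requests 4 and 5 as lost, about 2 s of outage and 10% loss. This matches the statistics printed by NX-OS.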
Zig-Zag fault
Now suppose that the link to SiteABGW2 is down (as in the previous example). At the same time, all the links on SiteABGW1 facing the external transport fail as well.
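In this scenario, SiteABGW1 still has the routed adjacency with its vPC peer. It uses that adjacency to keep its BGP sessions with the remote BGWs established and the whole infrastructure up and running. Traffic from SiteA-1 now enters only through SiteABGW1, which has no DCI uplink left, so it zig-zags across to SiteABGW2 to leave the site: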
SiteABGW1# sh bgp l2 evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 172.16.1.1, local AS number 65535
BGP table version is 32, L2VPN EVPN config peers 3, capable peers 3
12 network entries and 15 paths using 2568 bytes of memory
BGP attribute entries [7/1204], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
172.16.0.3 4 65535 101 99 32 0 0 01:33:03 3 <---SiteBBGW1
172.16.0.4 4 65535 101 98 32 0 0 01:31:35 3 <---SiteBBGW2
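Both sessions towards the Site B BGWs stay Established. NX-OS shows a received-prefix count in the State/PfxRcd column only for Established sessions, and otherwise shows the state name, such as Idle or Active.

Here is a Python sketch of that check. `established_peers` is a hypothetical helper, and it assumes the default column order shown above:

```python
# Neighbor rows from "show bgp l2vpn evpn summary" above, annotations included
SUMMARY = """\
172.16.0.3 4 65535 101 99 32 0 0 01:33:03 3 <---SiteBBGW1
172.16.0.4 4 65535 101 98 32 0 0 01:31:35 3 <---SiteBBGW2
"""


def established_peers(summary: str) -> dict:
    """Map neighbor -> prefixes received, for Established sessions only."""
    peers = {}
    for line in summary.splitlines():
        cols = line.split()
        if len(cols) < 10:
            continue  # not a neighbor row
        # Column 9 is State/PfxRcd; anything after it (annotations) is ignored
        neighbor, state_or_pfx = cols[0], cols[9]
        if state_or_pfx.isdigit():
            peers[neighbor] = int(state_or_pfx)
    return peers


print(established_peers(SUMMARY))
```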
SiteABGW1# sh ip route 172.16.0.3
172.16.0.3/32, ubest/mbest: 1/0
*via 192.168.99.2, Vlan999, [110/81], 00:01:40, ospf-UNDERLAY, intra <----SiteABGW2
SiteABGW1# sh run inte vlan 999
interface Vlan999
description Peering-VPC
ip address 192.168.99.1/30
ip ospf network point-to-point
ip router ospf UNDERLAY area 0.0.0.0
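Vlan999 is a /30 point-to-point OSPF link, so the next hop 192.168.99.2 in the routing table can only be the other end of it, on SiteABGW2. Below is a Python sketch of that addressing check. It assumes SiteABGW2 holds the other usable address of the /30, since its configuration is not shown here:

```python
import ipaddress

# Backup underlay peering between the vPC BGWs (Vlan999 on SiteABGW1)
local_svi = ipaddress.ip_interface("192.168.99.1/30")
next_hop = ipaddress.ip_address("192.168.99.2")  # next hop via Vlan999 in the routing table above

# A /30 has exactly two usable addresses: one for each vPC peer
usable = list(local_svi.network.hosts())
peer = next(h for h in usable if h != local_svi.ip)
print(usable, peer == next_hop)
```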
SiteA-1# ping 10.10.0.7
PING 10.10.0.7 (10.10.0.7): 56 data bytes
64 bytes from 10.10.0.7: icmp_seq=0 ttl=254 time=11.448 ms
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=14.807 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=6.659 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=7.601 ms
64 bytes from 10.10.0.7: icmp_seq=4 ttl=254 time=7.622 ms
--- 10.10.0.7 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 6.659/9.627/14.807 ms