The final step of the “VXLAN EVPN DCI” journey covers host reachability and fault scenarios. In this post I’ll verify reachability between two hosts in different sites and observe how they behave when the network experiences some faults.

Host reachability

Once the overlay configuration is up and running, we can connect a host or a device to our BGW (in my lab I use the SiteA-1 and SiteB-1 switches on VLAN 10) and try to reach the remote side across our VXLAN EVPN DCI.

LAB Topology

Example configuration on SiteA-1:

interface Vlan10
  no shutdown
  ip address 10.10.0.5/24


Example configuration on SiteB-1:

interface Vlan10
  no shutdown
  ip address 10.10.0.7/24
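Both SVIs are addressed in the same 10.10.0.0/24 subnet, so SiteA-1 and SiteB-1 expect to talk to each other directly at Layer 2: the DCI has to stretch VLAN 10 between the sites rather than route between them. A trivial check with Python's standard `ipaddress` module (the helper name `same_subnet` is just illustrative):

```python
import ipaddress

# SVI addresses copied from the SiteA-1 and SiteB-1 configurations above
site_a = ipaddress.ip_interface("10.10.0.5/24")
site_b = ipaddress.ip_interface("10.10.0.7/24")


def same_subnet(a: ipaddress.IPv4Interface, b: ipaddress.IPv4Interface) -> bool:
    """True when both interface addresses fall in the same IP subnet."""
    return a.network == b.network


print(site_a.network)               # 10.10.0.0/24
print(same_subnet(site_a, site_b))  # True
```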

All the necessary information is populated on the BGW:

SiteABGW1# sh l2route mac all 

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ---------------------------------------
10          5005.0000.1b08 Local  L,            0          Po3
10          5007.0000.1b08 BGP    Rcv           0          172.16.1.200 (Label: 10010)
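The table distinguishes the locally learned MAC (Prod `Local`, next hop is the port-channel Po3) from the MAC received through BGP (Prod `BGP`, next hop is the remote VTEP 172.16.1.200, and the label is the VNI 10010). If you want to check this from a script, here is a minimal Python sketch that parses rows in the layout shown above; the regular expression and the `L2Route`/`parse_l2route` names are my own assumptions, not a Cisco API, and other NX-OS releases may format the table differently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class L2Route:
    topology: int          # VLAN / topology ID
    mac: str
    producer: str          # "Local" (learned on a port) or "BGP" (received via EVPN)
    next_hop: str          # local interface, or remote VTEP address
    label: Optional[int]   # VNI carried in the EVPN route, when shown


# Assumed row layout: topology, MAC, producer, flags, seq no, next hop [(Label: N)]
ROW = re.compile(
    r"^(?P<topo>\d+)\s+(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<prod>\S+)\s+\S+\s+\d+\s+(?P<nh>\S+)(?:\s+\(Label:\s*(?P<label>\d+)\))?"
)


def parse_l2route(text: str) -> List[L2Route]:
    """Extract MAC rows from 'show l2route mac all' output, skipping headers and legend."""
    routes = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            label = m.group("label")
            routes.append(L2Route(int(m.group("topo")), m.group("mac"), m.group("prod"),
                                  m.group("nh"), int(label) if label else None))
    return routes


OUTPUT = """\
10          5005.0000.1b08 Local  L,            0          Po3
10          5007.0000.1b08 BGP    Rcv           0          172.16.1.200 (Label: 10010)
"""

for r in parse_l2route(OUTPUT):
    where = r.next_hop if r.producer == "Local" else f"VTEP {r.next_hop}, VNI {r.label}"
    print(f"VLAN {r.topology} {r.mac} {r.producer}: {where}")
```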


SiteA-1# sh mac address-table vlan 10
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
+   10     5005.0000.1b08   dynamic  0         F      F    Po3
C   10     5007.0000.1b08   dynamic  0         F      F    nve1(172.16.1.200)


SiteABGW1# sh bgp l2vpn evpn vni-id 10010
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 23, Local Router ID is 172.16.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 172.16.1.1:32777    (L2VNI 10010)
*>l[2]:[0]:[0]:[48]:[5005.0000.1b08]:[0]:[0.0.0.0]/216
                      172.16.1.100                      100      32768 i
*>i[2]:[0]:[0]:[48]:[5007.0000.1b08]:[0]:[0.0.0.0]/216
                      172.16.1.200                      100          0 i
* i                   172.16.1.200                      100          0 i
*>l[3]:[0]:[32]:[172.16.1.100]/88
                      172.16.1.100                      100      32768 i
*>i[3]:[0]:[32]:[172.16.1.200]/88
                      172.16.1.200                      100          0 i
* i                   172.16.1.200                      100          0 i
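Two route types show up for L2VNI 10010: type 2 (MAC/IP Advertisement) routes carry the host MACs (5005.0000.1b08 local, 5007.0000.1b08 learned from 172.16.1.200), and type 3 (Inclusive Multicast Ethernet Tag) routes announce each VTEP (172.16.1.100 locally, 172.16.1.200 remotely) for BUM traffic handling. The remote routes appear twice, most likely once from each remote BGW session. Below is a Python sketch that labels the bracketed fields, assuming they follow the RFC 7432 NLRI order; the field names are mine, and the sketch deliberately does not interpret the trailing /216 or /88 length.

```python
import re
from typing import Dict

# Assumption: NX-OS prints the NLRI fields in RFC 7432 order, with the Route
# Distinguisher shown separately on the "Route Distinguisher:" line.
FIELDS = {
    2: ["esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"],  # MAC/IP Advertisement
    3: ["eth_tag", "ip_len", "originator_ip"],                # Inclusive Multicast Ethernet Tag
}


def decode_evpn_prefix(prefix: str) -> Dict[str, str]:
    """Split an NX-OS EVPN prefix such as '[3]:[0]:[32]:[172.16.1.100]/88' into named fields."""
    body, _, shown_len = prefix.partition("/")
    values = re.findall(r"\[([^\]]*)\]", body)
    if not values:
        raise ValueError(f"not an EVPN prefix: {prefix}")
    route_type = int(values[0])
    names = FIELDS.get(route_type)
    if names is None or len(names) != len(values) - 1:
        raise ValueError(f"unsupported or unexpected EVPN prefix: {prefix}")
    decoded = {"route_type": str(route_type), "shown_len": shown_len}
    decoded.update(zip(names, values[1:]))
    return decoded


for p in ("[2]:[0]:[0]:[48]:[5007.0000.1b08]:[0]:[0.0.0.0]/216",
          "[3]:[0]:[32]:[172.16.1.200]/88"):
    print(decode_evpn_prefix(p))
```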

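With all this in place, the two hosts can reach each other (the first echo request is lost, most likely while ARP for 10.10.0.7 is being resolved):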

SiteA-1# ping 10.10.0.7
PING 10.10.0.7 (10.10.0.7): 56 data bytes
36 bytes from 10.10.0.5: Destination Host Unreachable
Request 0 timed out
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=13.195 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=7.646 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=7.078 ms
64 bytes from 10.10.0.7: icmp_seq=4 ttl=254 time=7.247 ms

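After the ping, SiteA-1 has learned the remote MAC address 5007.0000.1b08 on Po1, its port-channel toward the BGWs:

SiteA-1# sh mac address-table vlan 10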
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     5007.0000.1b08   dynamic  0         F      F    Po1

LACP local fault

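Suppose that one link of the port-channel facing the BGW runs into a problem and stops working, and, unfortunately, that particular link is the one selected by the port-channel load-balancing algorithm (so lucky :) ). In this case the LACP fast-rate option helps reconverge the traffic within a few seconds and restore connectivity: with fast rate, LACPDUs are exchanged every second instead of every 30 seconds, so a member that stops responding is removed from the bundle after about 3 seconds instead of 90.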
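The gain from fast rate is simple arithmetic on the IEEE 802.1AX timers: LACPDUs are sent every 1 s (fast) or every 30 s (slow, the default), and a partner is timed out after three missed LACPDUs. A Python sketch of the calculation (the dictionary and function names are illustrative only). Note that when the physical link actually goes down, the member is typically removed as soon as the link-down is detected, so these timers matter mostly for failures that leave the link up.

```python
# IEEE 802.1AX LACPDU transmit intervals in seconds: fast and slow (default) periodic time
PERIODIC_S = {"fast": 1, "slow": 30}
# A partner's information expires after three consecutive missed LACPDUs
TIMEOUT_MULTIPLIER = 3


def lacp_timeout(rate: str) -> int:
    """Seconds without LACPDUs after which a bundle member is considered failed."""
    return PERIODIC_S[rate] * TIMEOUT_MULTIPLIER


for rate in ("fast", "slow"):
    print(f"{rate}: LACPDU every {PERIODIC_S[rate]} s, timeout after {lacp_timeout(rate)} s")
# fast: LACPDU every 1 s, timeout after 3 s
# slow: LACPDU every 30 s, timeout after 90 s
```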

LAB Topology

In our lab, Eth1/3 on SiteABGW2 (the forwarding port) simulates the fault, and connectivity is restored after a few seconds thanks to the fast-rate feature:

SiteA-1# ping 10.10.0.7 count 20 interval 1
PING 10.10.0.7 (10.10.0.7): 56 data bytes
64 bytes from 10.10.0.7: icmp_seq=0 ttl=254 time=8.29 ms
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=8.248 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=8.581 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=9.074 ms
Request 4 timed out
Request 5 timed out
64 bytes from 10.10.0.7: icmp_seq=6 ttl=254 time=9.842 ms
64 bytes from 10.10.0.7: icmp_seq=7 ttl=254 time=8.688 ms
64 bytes from 10.10.0.7: icmp_seq=8 ttl=254 time=8.124 ms
64 bytes from 10.10.0.7: icmp_seq=9 ttl=254 time=9.13 ms
64 bytes from 10.10.0.7: icmp_seq=10 ttl=254 time=9.45 ms
64 bytes from 10.10.0.7: icmp_seq=11 ttl=254 time=9.501 ms
64 bytes from 10.10.0.7: icmp_seq=12 ttl=254 time=9.652 ms
64 bytes from 10.10.0.7: icmp_seq=13 ttl=254 time=8.833 ms
64 bytes from 10.10.0.7: icmp_seq=14 ttl=254 time=7.852 ms
64 bytes from 10.10.0.7: icmp_seq=15 ttl=254 time=9.886 ms
64 bytes from 10.10.0.7: icmp_seq=16 ttl=254 time=9.547 ms
64 bytes from 10.10.0.7: icmp_seq=17 ttl=254 time=8.924 ms
64 bytes from 10.10.0.7: icmp_seq=18 ttl=254 time=9.378 ms
64 bytes from 10.10.0.7: icmp_seq=19 ttl=254 time=9.166 ms

--- 10.10.0.7 ping statistics ---
20 packets transmitted, 18 packets received, 10.00% packet loss
round-trip min/avg/max = 7.852/9.009/9.886 ms
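The ping runs with a 1-second interval, so the two lost echo requests (4 and 5) bound the outage: it lasted at least 1 second and less than 3 seconds, consistent with the LACP fast-rate timers. A small Python sketch of that reckoning from saved ping output (helper names are illustrative; the bounds ignore round-trip time and the per-probe timeout):

```python
import re
from typing import List, Tuple


def lost_sequences(ping_output: str) -> List[int]:
    """Sequence numbers reported as 'Request N timed out' in the ping output."""
    return [int(n) for n in re.findall(r"Request (\d+) timed out", ping_output)]


def outage_bounds(lost: List[int], interval_s: float) -> Tuple[float, float]:
    """Rough bounds on an outage that caused one run of consecutive lost probes.

    With one probe every interval_s, k consecutive losses mean the outage lasted
    at least (k - 1) * interval_s and less than (k + 1) * interval_s.
    """
    if not lost:
        return (0.0, 0.0)
    if lost != list(range(lost[0], lost[0] + len(lost))):
        raise ValueError("losses are not a single consecutive run")
    k = len(lost)
    return ((k - 1) * interval_s, (k + 1) * interval_s)


# Excerpt of the 'ping 10.10.0.7 count 20 interval 1' output above
EXCERPT = """\
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=9.074 ms
Request 4 timed out
Request 5 timed out
64 bytes from 10.10.0.7: icmp_seq=6 ttl=254 time=9.842 ms
"""

lost = lost_sequences(EXCERPT)
print(lost, outage_bounds(lost, 1.0))  # [4, 5] (1.0, 3.0)
```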

Zig-Zag fault

Now suppose that the link to SiteABGW2 is down (as in the previous example) and, at the same time, all the links on SiteABGW1 facing the external transport have failed as well:

LAB Topology

In this scenario, the routing adjacency between the vPC peers (the Vlan999 SVI) keeps the BGP sessions with the remote BGWs established and the whole infrastructure up and running:

SiteABGW1# sh bgp l2 evpn summary 
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 172.16.1.1, local AS number 65535
BGP table version is 32, L2VPN EVPN config peers 3, capable peers 3
12 network entries and 15 paths using 2568 bytes of memory
BGP attribute entries [7/1204], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd       
172.16.0.3      4 65535     101      99       32    0    0 01:33:03 3           <---SiteBBGW1
172.16.0.4      4 65535     101      98       32    0    0 01:31:35 3           <---SiteBBGW2
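Both sessions, toward SiteBBGW1 and SiteBBGW2, are still Established (the State/PfxRcd column shows a prefix count rather than a state such as Idle or Active), even though SiteABGW1 has lost all its transport-facing links. A minimal Python check over this output, assuming the column layout shown above (helper names are hypothetical):

```python
import re
from typing import Dict

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def bgp_neighbor_states(summary: str) -> Dict[str, str]:
    """Map each neighbor to its State/PfxRcd column (10th field of a neighbor row)."""
    states = {}
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) >= 10 and IPV4.fullmatch(fields[0]):
            states[fields[0]] = fields[9]
    return states


def is_established(state_or_pfx: str) -> bool:
    # A number means the session is Established and counts received prefixes;
    # anything else is a BGP FSM state such as Idle or Active.
    return state_or_pfx.isdigit()


SUMMARY = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.16.0.3      4 65535     101      99       32    0    0 01:33:03 3
172.16.0.4      4 65535     101      98       32    0    0 01:31:35 3
"""

for neighbor, state in bgp_neighbor_states(SUMMARY).items():
    print(neighbor, "Established" if is_established(state) else state)
```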


SiteABGW1# sh ip route 172.16.0.3
172.16.0.3/32, ubest/mbest: 1/0
    *via 192.168.99.2, Vlan999, [110/81], 00:01:40, ospf-UNDERLAY, intra   <----SiteABGW2
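SiteABGW1 now reaches SiteBBGW1's address 172.16.0.3 through 192.168.99.2 on Vlan999, that is, through SiteABGW2. In `[110/81]`, 110 is the OSPF administrative distance and 81 the OSPF cost of the path. If you monitor BGWs with scripts, a hypothetical helper like this one (written only against the line format shown here) can flag when the zig-zag path is in use:

```python
import re
from typing import NamedTuple, Optional


class NextHop(NamedTuple):
    address: str
    interface: str
    admin_distance: int
    metric: int
    source: str


# Matches lines like: *via 192.168.99.2, Vlan999, [110/81], 00:01:40, ospf-UNDERLAY, intra
VIA = re.compile(
    r"\*via (?P<addr>[\d.]+), (?P<intf>[^,]+), \[(?P<ad>\d+)/(?P<metric>\d+)\], "
    r"[^,]+, (?P<src>[^,\s]+)"
)


def parse_via(line: str) -> Optional[NextHop]:
    """Parse one '*via' line of 'show ip route'; return None if it does not match."""
    m = VIA.search(line)
    if m is None:
        return None
    return NextHop(m["addr"], m["intf"], int(m["ad"]), int(m["metric"]), m["src"])


nh = parse_via("    *via 192.168.99.2, Vlan999, [110/81], 00:01:40, ospf-UNDERLAY, intra")
if nh and nh.interface == "Vlan999":
    print(f"remote BGW reached via the vPC peer {nh.address} (OSPF cost {nh.metric})")
```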


SiteABGW1# sh run inte vlan 999
interface Vlan999
  description Peering-VPC
  ip address 192.168.99.1/30
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
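Vlan999 is a /30 between the two vPC peers, with OSPF running in point-to-point mode (no DR/BDR election on the link). A /30 has exactly two usable addresses, so the OSPF neighbor of 192.168.99.1 can only be 192.168.99.2, the next hop seen in the routing table. A quick Python check of that arithmetic (the helper name is illustrative):

```python
import ipaddress

# SiteABGW1's Vlan999 address from the configuration above
local = ipaddress.ip_interface("192.168.99.1/30")


def p2p_peer(intf: ipaddress.IPv4Interface) -> ipaddress.IPv4Address:
    """Return the other usable address of a two-host point-to-point subnet."""
    hosts = list(intf.network.hosts())
    if len(hosts) != 2 or intf.ip not in hosts:
        raise ValueError(f"{intf} is not one end of a two-host subnet")
    return hosts[1] if hosts[0] == intf.ip else hosts[0]


print(p2p_peer(local))  # 192.168.99.2
```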


SiteA-1# ping 10.10.0.7
PING 10.10.0.7 (10.10.0.7): 56 data bytes
64 bytes from 10.10.0.7: icmp_seq=0 ttl=254 time=11.448 ms
64 bytes from 10.10.0.7: icmp_seq=1 ttl=254 time=14.807 ms
64 bytes from 10.10.0.7: icmp_seq=2 ttl=254 time=6.659 ms
64 bytes from 10.10.0.7: icmp_seq=3 ttl=254 time=7.601 ms
64 bytes from 10.10.0.7: icmp_seq=4 ttl=254 time=7.622 ms

--- 10.10.0.7 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 6.659/9.627/14.807 ms
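The zig-zag scenario works because the underlay is still routed: once SiteABGW1 loses its transport-facing links, OSPF simply finds another path to the remote BGW loopbacks through the vPC peer over Vlan999. The following Python toy model illustrates this with a plain Dijkstra shortest-path computation; the `Transport` node (standing for the whole external network) and the link costs are invented for the example and are not the lab's OSPF metrics.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, int]]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Plain Dijkstra: return the lowest-cost node list from src to dst, or None."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[int, str]] = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if cost > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return None


def link(graph: Graph, a: str, b: str, cost: int) -> None:
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


def unlink(graph: Graph, a: str, b: str) -> None:
    graph.get(a, {}).pop(b, None)
    graph.get(b, {}).pop(a, None)


# Hypothetical underlay with made-up costs
underlay: Graph = {}
link(underlay, "SiteABGW1", "Transport", 10)
link(underlay, "SiteABGW2", "Transport", 10)
link(underlay, "SiteABGW1", "SiteABGW2", 10)  # Vlan999 point-to-point OSPF peering
link(underlay, "Transport", "SiteBBGW1", 10)

print(shortest_path(underlay, "SiteABGW1", "SiteBBGW1"))  # direct via Transport
unlink(underlay, "SiteABGW1", "Transport")                # SiteABGW1 loses its DCI links
print(shortest_path(underlay, "SiteABGW1", "SiteBBGW1"))  # detour through SiteABGW2
```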