This year I plan to study datacenter protocols and technologies in depth, because more and more customers are engaging me to support datacenter refreshes or enhancements, and I want to become at least a CCNP Data Center so that I can offer them better solutions.
One of the trending datacenter topics is certainly VXLAN, so I read “Building Data Centers with VXLAN BGP EVPN” and I’m setting up a VXLAN lab, starting from the simplest scenario: VXLAN Flood and Learn.
VXLAN is an encapsulation (MAC-in-UDP) used to stretch an Ethernet (L2) segment between two different sites across a routed transport network. To do this we need devices called VTEPs (VXLAN Tunnel EndPoints) that encapsulate the classic dot1q frame into a “VXLAN packet” at the source and decapsulate it at the destination. The goal of this lab is to let two clients in the same VLAN communicate with each other.
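To make the encapsulation a bit more concrete, here is a minimal Python sketch of the 8-byte VXLAN header defined in RFC 7348 (flags, reserved bits, 24-bit VNI). The function names are my own, and the outer Ethernet/IP/UDP headers that a real VTEP also adds are deliberately left out. VNI 30010 is the one used later in this lab.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to the original Ethernet frame (hypothetical helper)."""
    return build_vxlan_header(vni) + inner_frame


header = build_vxlan_header(30010)
print(header.hex())       # 0800000000753a00
print(parse_vni(header))  # 30010
```

The result of `encapsulate()` is what would travel as the UDP payload towards the remote VTEP on port 4789.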
Lab topology is:
In a Clos (Spine-Leaf) topology the Leaf nodes act as VTEPs: they handle the dot1q frame on the client-side port, encapsulate it with a VXLAN header, and send it through the routed underlay to the destination VTEP, where the packet is decapsulated and forwarded to the destination device using a MAC address table lookup.
VXLAN Flood and Learn is very similar to classic switching flood and learn: when a VTEP receives a frame with an unknown destination, it floods it to all the other VTEPs. When a remote VTEP receives this “broadcast”, it learns the source MAC address against the IP of the sending VTEP and floods the frame to all the ports that belong to that VLAN. When the destination device replies, its VTEP stores the device's MAC address and port in its MAC address table and forwards the reply to the source VTEP, which in turn learns the remote MAC address from the encapsulated reply.
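The learning logic can be modelled in a few lines of Python. This is only a toy model, not how NX-OS implements it: the `Vtep` class and its methods are hypothetical, a single VNI is assumed, and the `peers` list stands in for the multicast group. The point it shows is that a VTEP learns a remote MAC address from the data plane, by pairing the inner source MAC with the outer source IP of the packet it decapsulates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Vtep:
    """Toy VTEP for one VNI, with a MAC table learned from the data plane."""
    ip: str
    peers: list[Vtep] = field(default_factory=list)          # stands in for the multicast group
    mac_table: dict[str, str] = field(default_factory=dict)  # MAC -> "local" or remote VTEP IP

    def send(self, src_mac: str, dst_mac: str) -> str:
        """Forward a frame received from a local client; return how it was forwarded."""
        self.mac_table[src_mac] = "local"
        target = self.mac_table.get(dst_mac)
        if target is None:
            # Unknown (or broadcast) destination: flood to every VTEP in the group.
            for peer in self.peers:
                peer.receive(self.ip, src_mac)
            return "flooded"
        if target != "local":
            # Known remote destination: send only to the VTEP that owns it.
            next(p for p in self.peers if p.ip == target).receive(self.ip, src_mac)
        return f"unicast to {target}"

    def receive(self, outer_src_ip: str, src_mac: str) -> None:
        """Decapsulate: learn the inner source MAC against the outer source IP."""
        self.mac_table[src_mac] = outer_src_ip


leaf1, leaf3 = Vtep("10.0.0.1"), Vtep("10.0.0.3")
leaf1.peers, leaf3.peers = [leaf3], [leaf1]
first = leaf1.send("aabb.cc00.6000", "ffff.ffff.ffff")   # ARP request: "flooded"
second = leaf3.send("aabb.cc00.7000", "aabb.cc00.6000")  # ARP reply: "unicast to 10.0.0.1"
```

After these two frames, `leaf1.mac_table` maps `aabb.cc00.7000` to `10.0.0.3` and `leaf3.mac_table` maps `aabb.cc00.6000` to `10.0.0.1`, without any control-plane exchange between the two VTEPs.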
Due to the routed nature of the underlay, broadcast behavior is emulated with multicast: a classic VLAN is mapped to a VNI (also called vn-segment, VXLAN segment or L2VNI), and the VNI is in turn associated with a multicast group.
In this lab, the Spines act as backbone routers (running OSPF) and as PIM Anycast RPs, in order to manage the shared and source trees created by the VTEPs:
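The VLAN, VNI and multicast group chain can be sketched as a small lookup structure. The `Segment` class and `flood_group` helper below are hypothetical names of my own; the values (VLAN 10, VNI 30010, group 239.0.0.1) are the ones used in this lab, and the VNI range check only reflects the 24-bit header field, not any platform-specific limit.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One L2 segment: a local VLAN, its fabric-wide VNI, and the group used for flooding."""
    vlan: int
    vni: int
    mcast_group: str

    def __post_init__(self) -> None:
        if not 1 <= self.vlan <= 4094:
            raise ValueError("VLAN ID must be between 1 and 4094")
        if not 1 <= self.vni < 2**24:
            raise ValueError("VNI must fit in 24 bits")
        if not ipaddress.ip_address(self.mcast_group).is_multicast:
            raise ValueError("flood group must be a multicast address")


def flood_group(segments: list[Segment], vlan: int) -> str:
    """Return the multicast group a broadcast frame in this VLAN is sent to."""
    for segment in segments:
        if segment.vlan == vlan:
            return segment.mcast_group
    raise KeyError(f"VLAN {vlan} is not mapped to any VNI")


segments = [Segment(vlan=10, vni=30010, mcast_group="239.0.0.1")]
print(flood_group(segments, 10))  # 239.0.0.1
```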
e.g. Spine1
feature ospf
feature pim

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface loopback100
  description RP
  ip address 10.0.0.100/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface loopback101
  description ANYCAST RP
  ip address 10.0.0.254/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

ip pim rp-address 10.0.0.254
ip pim anycast-rp 10.0.0.254 10.0.0.100
ip pim anycast-rp 10.0.0.254 10.0.0.200
The Leaf nodes act as both classic Ethernet switches and VTEPs, so you need to enable the VXLAN features and configure the NVE interface (which provides the VTEP function):
e.g. Leaf1
feature ospf
feature pim
feature vn-segment-vlan-based
feature nv overlay

ip pim rp-address 10.0.0.254

vlan 10
  vn-segment 30010

interface nve1
  no shutdown
  source-interface loopback100
  member vni 30010 mcast-group 239.0.0.1

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10

interface loopback100
  description VTEP
  ip address 10.0.0.1/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
At the beginning, every VTEP only knows its directly attached devices, so Leaf1's MAC address table contains only the entry for Client1. When Client1 sends an ICMP echo request (ping) to Client2, it first has to resolve Client2's MAC address with an ARP request. The ARP request is flooded over the multicast tree for group 239.0.0.1, associated with VNI 30010 (and VLAN 10), and reaches the destination VTEP. Each VTEP is itself a source for the group, so both Leaf1 and Leaf3 show an (S,G) entry rooted at their own loopback100:
Leaf1# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"
(10.0.0.1/32, 239.0.0.1/32), uptime: 03:58:02, nve mrib ip pim
  Incoming interface: loopback100, RPF nbr: 10.0.0.1
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 02:50:48, pim
---------------------------------------------------
Leaf3# sh ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"
(10.0.0.3/32, 239.0.0.1/32), uptime: 03:52:27, nve mrib ip pim
  Incoming interface: loopback100, RPF nbr: 10.0.0.3
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 02:52:35, pim
Once Leaf1 has received the ARP reply, it adds the information needed to reach the remote destination to its MAC address table:
Leaf1# show mac address-table
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     aabb.cc00.6000   dynamic  0         F      F    Eth1/3
*   10     aabb.cc00.7000   dynamic  0         F      F    nve1(10.0.0.3)
G    -     5003.0000.1b08   static   -         F      F    sup-eth1(R)
--------------------------------------------------------------------------
Leaf3# show mac address-table
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     aabb.cc00.6000   dynamic  0         F      F    nve1(10.0.0.1)
*   10     aabb.cc00.7000   dynamic  0         F      F    Eth1/3
G    -     5005.0000.1b08   static   -         F      F    sup-eth1(R)
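If you want to check this from a script, the MAC addresses learned over the NVE interface can be pulled out of the output with a few lines of Python. This is a rough sketch that assumes the eight-column layout shown above; `remote_macs` is a hypothetical helper of my own, and how you collect the output from the switch is left out.

```python
from __future__ import annotations

import re

# Matches a port column such as "nve1(10.0.0.3)" and captures the remote VTEP IP.
NVE_PORT = re.compile(r"^nve\d+\((?P<peer>[\d.]+)\)$")

SAMPLE = """\
* 10 aabb.cc00.6000 dynamic 0 F F Eth1/3
* 10 aabb.cc00.7000 dynamic 0 F F nve1(10.0.0.3)
G - 5003.0000.1b08 static - F F sup-eth1(R)
"""


def remote_macs(show_output: str) -> dict[str, str]:
    """Map each MAC learned over the NVE interface to its remote VTEP IP."""
    learned = {}
    for line in show_output.splitlines():
        fields = line.split()
        # Entry rows have 8 whitespace-separated columns and start with "*".
        if len(fields) != 8 or fields[0] != "*":
            continue
        match = NVE_PORT.match(fields[7])
        if match:
            learned[fields[2]] = match.group("peer")
    return learned


print(remote_macs(SAMPLE))  # {'aabb.cc00.7000': '10.0.0.3'}
```

Because the rows are split on whitespace, the same function works whether the columns are aligned or collapsed to single spaces.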
Finally, all the information needed to complete the path from source to destination is installed and the clients can talk to each other:
Client1-Vlan10#ping 192.168.10.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/17/27 ms
Client1-Vlan10#show ip arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 192.168.10.1 - aabb.cc00.6000 ARPA Ethernet0/0
Internet 192.168.10.2 89 aabb.cc00.7000 ARPA Ethernet0/0
--------------------------------------------
Client2-Vlan10#ping 192.168.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms
Client2-Vlan10#show ip arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 192.168.10.1 90 aabb.cc00.6000 ARPA Ethernet0/0
Internet 192.168.10.2 - aabb.cc00.7000 ARPA Ethernet0/0
Flood and Learn works well, but it is not an efficient way to achieve our goal. To optimize the VXLAN data plane we need to introduce a control plane, BGP EVPN, which I’ll test in a future post.