This year I plan to dig deep into datacenter protocols and technologies, because more and more customers are engaging me to support them in datacenter refreshes or enhancements, and I want to become at least a CCNP Data Center so I can keep offering them better solutions.

VXLAN is certainly one of the trending datacenter topics, so I read “Building Data Centers with VXLAN BGP EVPN” and I’m setting up a VXLAN lab, starting from the simplest scenario: VXLAN Flood and Learn.

VXLAN is an encapsulation (MAC in UDP) used to carry Ethernet frames (L2) between two sites across a routed transport network. This requires devices called VTEPs (VXLAN Tunnel EndPoints), which encapsulate the classic dot1q frame into a “VXLAN packet” at the source and decapsulate it at the destination. The goal of this lab is to let two clients in the same VLAN communicate with each other.
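
To make the "MAC in UDP" idea concrete, here is a minimal Python sketch of the 8-byte VXLAN header as I understand it from RFC 7348: a flags byte with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI and 8 more reserved bits. The function names are my own for illustration, and the outer Ethernet, IP and UDP headers (destination port 4789) that a real VTEP adds are left out.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789         # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Return (vni, inner_frame) from the payload of a VXLAN UDP datagram."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word2 >> 8, payload[8:]


# Illustrative inner frame: broadcast destination, Client1 as source, ARP EtherType.
inner = bytes.fromhex("ffffffffffff" "aabbcc006000" "0806")
packet = vxlan_encapsulate(30010, inner)
vni, frame = vxlan_decapsulate(packet)
print(vni, frame == inner)
```

The VNI used here is the one configured later in this lab (30010); everything else in the frame is only illustrative.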

The lab topology is:

[Figure: LAB Topology]

In a Clos (Spine-Leaf) topology the Leaf nodes act as VTEPs: they handle the dot1q frame on the client-facing port, encapsulate it with a VXLAN header, and send it to the destination VTEP over the routed underlay. There the packet is decapsulated and forwarded to the destination device using a MAC address table lookup.

VXLAN Flood and Learn is very similar to Flood and Learn in classic switching: when a VTEP receives a frame with an unknown destination, it floods it to all the other VTEPs. When a remote VTEP receives this “broadcast”, it learns the source MAC address against the IP address of the sending VTEP and floods the frame out of all the ports that belong to that VLAN. When the destination device responds, its VTEP stores the device’s MAC address and port in its MAC address table and sends the reply back to the source VTEP, which in turn learns the remote MAC address against the IP address of the destination VTEP.
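
The learning logic can be sketched with a toy model in Python. This is not how NX-OS implements it; the class and method names are hypothetical, and the sketch assumes data-plane learning only: a VTEP learns local MAC addresses against the access port and remote MAC addresses against the outer source IP of the VXLAN packet it received. The MAC addresses, port and VTEP IPs are the ones used in this lab.

```python
BROADCAST = "ffff.ffff.ffff"


class Vtep:
    """Toy VTEP that learns MAC addresses from the data plane only."""

    def __init__(self, ip):
        self.ip = ip
        self.peers = []      # other VTEPs in the same flood domain
        self.mac_table = {}  # MAC -> ("local", port) or ("remote", VTEP IP)

    def from_client(self, src_mac, dst_mac, port):
        """Handle a frame received on a local access port."""
        self.mac_table[src_mac] = ("local", port)
        kind, where = self.mac_table.get(dst_mac, ("unknown", None))
        if kind == "local":
            return "switched locally"
        if kind == "remote":
            # Known remote MAC: send a unicast VXLAN packet to that VTEP.
            peer = next(p for p in self.peers if p.ip == where)
            peer.from_fabric(src_mac, self.ip)
            return "unicast to " + where
        # Broadcast or unknown unicast: flood to every other VTEP.
        for peer in self.peers:
            peer.from_fabric(src_mac, self.ip)
        return "flooded"

    def from_fabric(self, src_mac, outer_src_ip):
        """Handle a decapsulated frame: learn the inner source MAC
        against the outer source IP (the sending VTEP)."""
        self.mac_table[src_mac] = ("remote", outer_src_ip)


leaf1, leaf3 = Vtep("10.0.0.1"), Vtep("10.0.0.3")
leaf1.peers, leaf3.peers = [leaf3], [leaf1]

# Client1 sends an ARP request (broadcast), then Client2 replies (unicast).
step1 = leaf1.from_client("aabb.cc00.6000", BROADCAST, "Eth1/3")
step2 = leaf3.from_client("aabb.cc00.7000", "aabb.cc00.6000", "Eth1/3")
print(step1, "/", step2)
print(leaf1.mac_table)
```

After the two steps each toy VTEP holds one local and one remote entry, which is the same shape as a MAC address table where the remote client appears behind the NVE interface.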

Because the underlay is routed, the broadcast behavior is emulated with IP multicast: a classic VLAN is associated with a VNI or vn-segment (VXLAN segment or L2VNI), and the VNI is in turn associated with a multicast group.
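
The VLAN to VNI to multicast group chain can be sketched in Python as two lookups plus a group membership table. VLAN 10, VNI 30010 and group 239.0.0.1 are the values used in this lab; VLAN 20, VNI 30020, group 239.0.0.2 and the VTEP 10.0.0.2 are hypothetical and only there to show that a flooded frame reaches just the VTEPs that joined the group of that VNI. All names are my own.

```python
import ipaddress
from collections import defaultdict

VLAN_TO_VNI = {10: 30010, 20: 30020}                      # VLAN 20 is hypothetical
VNI_TO_GROUP = {30010: "239.0.0.1", 30020: "239.0.0.2"}   # 239.0.0.2 is hypothetical


def flood_group(vlan):
    """Resolve a VLAN to its (VNI, multicast group) pair."""
    vni = VLAN_TO_VNI[vlan]
    group = VNI_TO_GROUP[vni]
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(group + " is not a multicast address")
    return vni, group


class MulticastUnderlay:
    """Tracks which VTEP IPs joined which multicast group."""

    def __init__(self):
        self.members = defaultdict(set)

    def join(self, vtep_ip, vlan):
        _, group = flood_group(vlan)
        self.members[group].add(vtep_ip)

    def flood(self, sender_ip, vlan):
        """Return the VTEPs that receive a frame flooded in this VLAN."""
        _, group = flood_group(vlan)
        return sorted(self.members[group] - {sender_ip})


underlay = MulticastUnderlay()
underlay.join("10.0.0.1", 10)
underlay.join("10.0.0.3", 10)
underlay.join("10.0.0.2", 20)   # hypothetical VTEP that only serves VLAN 20

print(flood_group(10))
print(underlay.flood("10.0.0.1", 10))
```

In the real fabric the membership is of course handled by PIM and the rendezvous point, not by a dictionary; the sketch only shows why one group per VNI limits the flooding scope.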

In this lab the Spines act as backbone routers (using OSPF) and as PIM Anycast RPs, in order to manage the shared and source trees created by the VTEPs:

e.g. Spine1

feature ospf
feature pim

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface loopback100
  description RP
  ip address 10.0.0.100/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface loopback101
  description ANYCAST RP
  ip address 10.0.0.254/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

ip pim rp-address 10.0.0.254
ip pim anycast-rp 10.0.0.254 10.0.0.100
ip pim anycast-rp 10.0.0.254 10.0.0.200

The Leaf nodes act both as classic Ethernet switches and as VTEPs, so you need to enable the VXLAN features and configure the NVE interface (for VTEP capabilities):

e.g. Leaf1

feature ospf
feature pim
feature vn-segment-vlan-based
feature nv overlay

ip pim rp-address 10.0.0.254

vlan 10
  vn-segment 30010

interface nve1
  no shutdown
  source-interface loopback100
  member vni 30010 mcast-group 239.0.0.1

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10

interface loopback100
  description VTEP
  ip address 10.0.0.1/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

At the beginning, every VTEP only knows its directly attached devices, so Leaf1 has only the entry for Client1 in its MAC address table. When Client1 pings Client2, it first sends an ARP request. Leaf1 floods that request to multicast group 239.0.0.1, which is associated with VNI 30010 (and VLAN 10), and the destination VTEP receives it over the multicast tree. Each VTEP is the source of its own tree for the group, so Leaf1 and Leaf3 each show an (S,G) entry with their own loopback100 address as the source:

Leaf1# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(10.0.0.1/32, 239.0.0.1/32), uptime: 03:58:02, nve mrib ip pim 
  Incoming interface: loopback100, RPF nbr: 10.0.0.1
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 02:50:48, pim

---------------------------------------------------

Leaf3# sh ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(10.0.0.3/32, 239.0.0.1/32), uptime: 03:52:27, nve mrib ip pim 
  Incoming interface: loopback100, RPF nbr: 10.0.0.3
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 02:52:35, pim

Once Leaf1 receives the ARP reply, it adds the remote MAC address to its MAC address table, pointing at the nve1 interface and the IP address of the remote VTEP:

Leaf1# show mac address-table 
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     aabb.cc00.6000   dynamic  0         F      F    Eth1/3
*   10     aabb.cc00.7000   dynamic  0         F      F    nve1(10.0.0.3)
G    -     5003.0000.1b08   static   -         F      F    sup-eth1(R)

--------------------------------------------------------------------------

Leaf3# show mac address-table 
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     aabb.cc00.6000   dynamic  0         F      F    nve1(10.0.0.1)
*   10     aabb.cc00.7000   dynamic  0         F      F    Eth1/3
G    -     5005.0000.1b08   static   -         F      F    sup-eth1(R)

Finally, all the information needed to complete the path from source to destination is installed and the clients can talk to each other:

Client1-Vlan10#ping 192.168.10.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/17/27 ms
Client1-Vlan10#show ip arp      
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.10.1            -   aabb.cc00.6000  ARPA   Ethernet0/0
Internet  192.168.10.2           89   aabb.cc00.7000  ARPA   Ethernet0/0

--------------------------------------------

Client2-Vlan10#ping 192.168.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms
Client2-Vlan10#show ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.10.1           90   aabb.cc00.6000  ARPA   Ethernet0/0
Internet  192.168.10.2            -   aabb.cc00.7000  ARPA   Ethernet0/0

Flood and Learn works well, but it is not an efficient way to achieve our goal. To optimize the VXLAN data plane we need to introduce a control plane, BGP EVPN, but I’ll test that in a future post.