Showing posts with label Data Center Interconnect. Show all posts

Monday, November 24, 2014

vMotion over a VXLAN overlay

In an earlier post I showed how to create a simple VXLAN tunnel to stretch Layer 2 over a Layer 3 network. Now I'll show how VMware vMotion works over the same type of setup. Note: what I read while researching this warned that VXLAN is not a recommended way to carry vMotion across a DCI. Latency over a DCI can be high, so vMotion may not always work. This is just a proof of concept.


From what I've read, vMotion over VXLAN on the hypervisor is not supported in ESXi 5.1 or earlier, and I'm not sure about ESXi 5.5. In any case, the ESXi version doesn't matter here, because in this scenario the VXLAN tunnel starts and stops at the spine layer. The hypervisor is NOT using VXLAN at all and communicates with the underlay through a normal VLAN tag. Even the leaf switch communicates with the spine over a VLAN tag.

In this setup I have a NetApp SAN that serves as the NFS datastore for the VMs. The SAN network uses VLAN 10, while the VM data path uses VLAN 100.

In VMware I created an NFS datastore. The IP address of the NetApp SAN is 100.10.1.254.


The host servers are separated by an L3 network, as in the topology above.

Host 1 has a VMkernel IP of 100.10.1.4 and Host 2 has a VMkernel IP of 100.10.1.5 for communicating with the NetApp.

 

vMotion is enabled on both dvSwitches.


I created a distributed switch for the NFS network and added a 1G NIC as an uplink. This was done on both Hosts.
 
A second dvSwitch was created for the VM data path. It uses a 10G NIC for the uplink.

On NetApp


Vol1 was created and allows read/write access from the VMware hosts.

The NetApp network interface is configured with the IP 100.10.1.254 and uses a VLAN tag of 10.


On Juniper Leaf switch.

vlan 10 and vlan 100 are created

jnpr@QFX5100-48S-5# show vlans
v10 {
    vlan-id 10;
}
v100 {
    vlan-id 100;
}

Interfaces are created and vlans added.

set interfaces xe-0/0/2 description TO-VMW-145-vmnic9
set interfaces xe-0/0/2 unit 0 family ethernet-switching interface-mode trunk
set interfaces xe-0/0/2 unit 0 family ethernet-switching vlan members v100
set interfaces ge-0/0/47 description TO-VMW-145-vmnic3-for-NETAPP
set interfaces ge-0/0/47 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-0/0/47 unit 0 family ethernet-switching vlan members v10
set interfaces et-0/0/50 description TO-SPINE1
set interfaces et-0/0/50 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-0/0/50 unit 0 family ethernet-switching vlan members all

The 1GE link is used for NFS and the 10G link is used for data between the Leaf and Host1. The uplink to Spine1 is et-0/0/50 (the et- prefix indicates a 40G port, not 10G) and it is a member of all VLANs.

On the Juniper Spine1:

Again two vlans are created, but here is where the mapping of vlan to VXLAN tunnel is created.


jnpr@EX9200-1# show vlans
v10 {
    vlan-id 10;
    l3-interface irb.10;
    vxlan {
        vni 10;
        multicast-group 239.1.1.10;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}
v100 {
    vlan-id 100;
    l3-interface irb.0;
    vxlan {
        vni 100;
        multicast-group 239.1.1.100;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}

Core interface is L3.
set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso

Link towards Leaf1 is a trunk with the 2 vlans.
set interfaces et-2/2/1 description TO-QFX5100-48S-5
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v10

The NetApp sits only off of Spine 1 and traffic is switched there from Leaf1.

set interfaces ge-7/0/0 description TO-NETAPP
set interfaces ge-7/0/0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-7/0/0 unit 0 family ethernet-switching vlan members v10

IRBs are created for the two VLANs. The IRB for VLAN 100 is the default gateway for the VM data path. The IRB for VLAN 10 is used only to make sure we can ping the NetApp and the VMware VMkernel IPs.

set interfaces irb unit 0 family inet address 100.1.1.1/24
set interfaces irb unit 10 family inet address 100.10.1.200/24

PIM and an IGP (IS-IS here) are configured for connectivity.

set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all

On the remote Spine2 switch the configuration is almost the same. The only difference is that there is no direct NetApp connection, so NFS needs to be tunneled through VXLAN so the VMkernel on Host 2 can access the storage.

jnpr@EX9200-2# show vlans       
v10 {
    vlan-id 10;
    l3-interface irb.10;
    vxlan {
        vni 10;
        multicast-group 239.1.1.10;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}
v100 {
    vlan-id 100;
    l3-interface irb.0;
    vxlan {
        vni 100;
        multicast-group 239.1.1.100;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}


jnpr@EX9200-2# show protocols | display set
set protocols isis reference-bandwidth 40g
set protocols isis interface et-2/0/0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all

LLDP is enabled on the Juniper switches and on the dvSwitches so we can verify how the underlay is connected.

On Leaf1 we can see both dvSwitches

jnpr@QFX5100-48S-5# run show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/2           -                   00:05:33:48:70:b9   port1
xe-0/0/2           -                   00:50:56:b9:0b:b3   eth1               South-VM           
et-0/0/50          -                   4c:96:14:6b:bb:c0   TO-QFX5100-48S-5   EX9200-1           
ge-0/0/47          -                                       port 2 on dvSwitch dvSwitch-NFS-v10 (etherswitch) localhost.jnpr.net 
xe-0/0/2           -                                       port 2 on dvSwitch dvSwitch-v100 (etherswitch) localhost.jnpr.net

We can also see a host running lldpd, as well as the uplink to Spine1.


jnpr@QFX5100-48S-5# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 5 entries, 5 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v10                 00:50:56:6f:2c:31   D             -   ge-0/0/47.0         
    v10                 02:a0:98:2c:06:cc   D             -   et-0/0/50.0  

The things to note here are the two MACs. The NetApp uses 02:a0:98:2c:06:cc, which is learned from the uplink to the Spine, and 00:50:56:6f:2c:31 is the NIC of Host 1.

On Spine 1 

jnpr@EX9200-1# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 5 entries, 5 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v10                 00:50:56:69:37:db   D             -   vtep.32769          
    v10                 00:50:56:6f:2c:31   D             -   et-2/2/1.0          
    v10                 02:a0:98:2c:06:cc   D             -   ge-7/0/0.0          
    v10                 4c:96:14:f2:b6:e0   D             -   vtep.32769          
    v10                 a8:d0:e5:f7:bf:f0   D             -   vtep.32769

We can see that MACs are being learned over the VTEP, that is, through the VXLAN tunnel.

MAC 00:50:56:69:37:db is the NIC of Host 2.
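
The MAC tables shown here can be modeled with a few lines of Python. This is only a sketch of flood-and-learn behavior, not how Junos implements it; the MacTable class and its method names are hypothetical, while the MAC addresses and interface names are taken from the Spine 1 output above.

```python
# Hypothetical model of flood-and-learn MAC learning on a switch with a VTEP.
class MacTable:
    def __init__(self):
        self.entries = {}  # (vlan, mac) -> logical interface

    def learn(self, vlan, mac, interface):
        """Record the interface a source MAC was seen on.

        Returns the previous interface, so a MAC move (for example after
        a vMotion) is visible to the caller. Returns None on first learn.
        """
        key = (vlan, mac.lower())
        previous = self.entries.get(key)
        self.entries[key] = interface
        return previous

    def lookup(self, vlan, mac):
        """Return the egress interface, or None, which means 'flood'."""
        return self.entries.get((vlan, mac.lower()))


spine1 = MacTable()
spine1.learn("v10", "00:50:56:6f:2c:31", "et-2/2/1.0")  # Host 1, via Leaf1
spine1.learn("v10", "02:a0:98:2c:06:cc", "ge-7/0/0.0")  # NetApp, local port
spine1.learn("v10", "00:50:56:69:37:db", "vtep.32769")  # Host 2, via VXLAN

print(spine1.lookup("v10", "00:50:56:69:37:db"))  # vtep.32769
print(spine1.lookup("v10", "00:00:00:00:00:01"))  # None, so the frame is flooded
```

A frame for Host 2 is sent into the tunnel, while a frame for an unknown MAC is flooded, which in this setup means it is sent to the multicast group of the VLAN.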

On Spine 2 we can see that the NetApp MAC is being learned through the VXLAN tunnel.

# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 3 entries, 3 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v10                 00:50:56:69:37:db   D             -   et-2/2/1.0          
    v10                 02:a0:98:2c:06:cc   D             -   vtep.32769           <<< NETAPP
    v10                 4c:96:14:6b:bb:f0   D             -   vtep.32769 



Leaf 2

jnpr@QFX5100-48S-6> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 4 entries, 4 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v10                 00:50:56:69:37:db   D             -   ge-0/0/47.0        
    v10                 02:a0:98:2c:06:cc   D             -   et-0/0/50.0        
    v10                 4c:96:14:6b:bb:f0   D             -   et-0/0/50.0        
    v10                 a8:d0:e5:f7:bf:f0   D             -   et-0/0/50.0

Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v100                00:50:56:b9:55:58   D             -   xe-0/0/2.0          
    v100                4c:96:14:6b:bb:f0   D             -   et-0/0/50.0    

Now we are ready to do vMotion.

On the Ubuntu VM you can see that the MAC address is on eth1, which connects to VLAN 100 on Leaf2.


jnpr@vmotion-ubuntu:~$ ifconfig

eth1      Link encap:Ethernet  HWaddr 00:50:56:b9:55:58 
          inet addr:100.1.1.30  Bcast:100.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:5558/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8633 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7023 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1200614 (1.2 MB)  TX bytes:905670 (905.6 KB)

On Spine2 it's a local mac.
jnpr@EX9200-2# run show ethernet-switching table vlan-id 100
Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   et-2/2/1.0          



In VMware I choose the vMotion Ubuntu VM.




Then select Migrate, choose Change Host, and select Host1 as the destination. Select the priority of your choice and click Finish.



Migration completed!

On Leaf1 I can see the mac move.
jnpr@QFX5100-48S-5# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v100                00:50:56:b9:55:58   D             -   xe-0/0/2.0           
    v100                4c:96:14:6b:bb:f0   D             -   et-0/0/50.0

On Spine 1 it is local.

jnpr@EX9200-1# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 1 entries, 1 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   et-2/2/1.0          

And on Spine 2 it now sits across the VTEP.


jnpr@EX9200-2# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   vtep.32769          
    v100                4c:96:14:6b:bb:f0   D             -   vtep.32769          

One caveat I found is that vMotion sends large frames, and the VXLAN encapsulation (outer IP, UDP and VXLAN headers) adds overhead on top of each one. With the default MTU in my network I could not initiate vMotion. After I changed the MTU on the Spine switches to 9000, it worked.
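
To get a feel for the overhead, here is a rough calculation in Python. It is a sketch that assumes an IPv4 underlay and the header sizes from the VXLAN specification (RFC 7348); the function name is made up for this example, and the extra 4 bytes apply when the inner VLAN tag is kept, as with encapsulate-inner-vlan in the configs above.

```python
# Header sizes in bytes, assuming an IPv4 underlay (RFC 7348 encapsulation).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14
INNER_VLAN_TAG = 4  # present only when the inner VLAN tag is preserved


def underlay_ip_mtu(inner_ip_mtu, keep_inner_vlan=True):
    """Smallest underlay IP MTU that carries the inner packet unfragmented."""
    inner_frame = inner_ip_mtu + INNER_ETHERNET
    if keep_inner_vlan:
        inner_frame += INNER_VLAN_TAG
    return inner_frame + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


print(underlay_ip_mtu(1500))                         # 1554
print(underlay_ip_mtu(1500, keep_inner_vlan=False))  # 1550
```

So a host sending full 1500-byte IP packets already needs more than a 1500-byte IP MTU in the core, which is consistent with vMotion failing until the MTU was raised.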

So the conclusion is that you don't need VXLAN support on the hypervisor if you want to do vMotion over a Layer 2 stretch technology. You can run VXLAN tunnels between two Juniper EX9200 Spine switches and map VLANs to these tunnels. The hypervisors are unaware of the tunnels running in the network underneath them.


Monday, August 4, 2014

VXLAN for Layer 2 stretch over L3 network

In a previous post I showed how EVPN with MPLS is used to stretch Layer 2 across Data Centers. Now I'll show how to stretch Layer 2 using VXLAN as the tunneling protocol. I'm not going to set up EVPN with VXLAN, as that is a different technical method. This is a simple point-to-point VTEP setup to show how it works.

With Juniper EX9200s you can map VLANs to a specific VXLAN tunnel, which terminates on a VTEP (VXLAN Tunnel End Point). The original L2 frame is encapsulated with a VXLAN header, and the outer header is an IP/UDP packet.


This allows the frame to cross an L3 network while retaining the original L2 frame. Communication to set up the tunnel is done through multicast.
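
As an illustration of the encapsulation, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header defined in RFC 7348. The function names are hypothetical. The outer UDP and IP headers are left out; in a real deployment the VTEP adds them, and the IANA-assigned UDP destination port is 4789.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prepend the VXLAN header to the original L2 frame."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN payload."""
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, payload[8:]


print(vxlan_header(100).hex())  # 0800000000006400
vni, frame = decapsulate(encapsulate(b"original L2 frame", 100))
print(vni, frame)  # 100 b'original L2 frame'
```

The inner frame comes out of the tunnel byte for byte as it went in, which is what lets the two sites behave like one Layer 2 segment.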





Note: VXLAN tunnels can also originate from the hypervisor itself, using vShield in VMware. The setup here is the other method, for when you want the tunnels to originate in the underlay.

The EXs need to be configured for PIM, and an RP is needed to build the multicast tree. Multicast is used to interconnect the different VTEPs and to optimize network traffic: only end points that have joined the multicast group are forwarded frames, and other devices in the network do not receive this traffic.

Each VTEP needs two things.

1) A VXLAN Network Identifier (aka VNI), which is like a DLCI in Frame Relay or a vc-id in point-to-point pseudowires in MPLS.

2) An IP multicast address

When an L2 frame that has to be flooded (broadcast, unknown unicast or multicast) hits the switch, it is encapsulated with a VXLAN header and sent to the IP multicast address. This packet then goes to the RP, which replicates it to all the "receivers". In our case we only have 2 end points, so the RP will only see 2 receivers.

Here are config snippets showing how this is built.

EX1

First build the interface connecting to the LEAF switch.

set interfaces et-2/2/1 description TO-LEAF1
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100

Add the core facing interface

set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso

Set up the EX to use a tunneling resource

set chassis fpc 9 pic 1 tunnel-services

Add your IGP flavor of choice and its related config to exchange L3 information through the network.

set interfaces lo0 unit 0 family inet address 4.4.4.4/32
set interfaces lo0 unit 0 family iso address 49.0001.0040.0400.4004.00
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive

Configure PIM and point it to the RP

set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse

Then map your VLAN into a VTEP

set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.0
set vlans v100 vxlan vni 1
set vlans v100 vxlan multicast-group 239.1.1.1
set vlans v100 vxlan encapsulate-inner-vlan
set vlans v100 vxlan decapsulate-accept-inner-vlan

Then set up the switch to use its loopback IP address as the source of the tunnel

set switch-options vtep-source-interface lo0.0

On the RP you only need to set up your IP addresses and the PIM configuration

set chassis fpc 1 pic 2 tunnel-services
set interfaces et-2/0/0 description TO-EX2
set interfaces et-2/0/0 unit 0 family inet address 192.168.35.3/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-3/2/0 description TO-CORE1
set interfaces et-3/2/0 unit 0 family inet address 192.168.23.3/24
set interfaces et-3/2/0 unit 0 family iso
set interfaces et-3/2/1 description TO-CORE1
set interfaces et-3/2/1 unit 0 family inet address 192.168.123.3/24
set interfaces et-3/2/1 unit 0 family iso
set interfaces lo0 unit 0 family inet address 3.3.3.3/32 primary
set interfaces lo0 unit 0 family inet address 192.168.0.1/32
set interfaces lo0 unit 0 family iso address 49.0001.0030.0300.3003.00
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp local family inet address 192.168.0.1
set protocols pim interface all mode bidirectional-sparse
set protocols pim interface fxp0.0 disable


You would create a similar VTEP on the remote EX

set chassis fpc 3 pic 0 tunnel-services
set interfaces et-2/0/0 description TO-CORE2
set interfaces et-2/0/0 unit 0 family inet address 192.168.35.5/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100
set interfaces irb unit 0 family inet address 100.1.1.2/24
set interfaces lo0 unit 0 family inet address 5.5.5.5/32
set interfaces lo0 unit 0 family iso address 49.0001.0050.0500.5005.00
set protocols isis reference-bandwidth 40g
set protocols isis interface et-2/0/0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all
set switch-options vtep-source-interface lo0.0
set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.0
set vlans v100 vxlan vni 1
set vlans v100 vxlan multicast-group 239.1.1.1
set vlans v100 vxlan encapsulate-inner-vlan
set vlans v100 vxlan decapsulate-accept-inner-vlan

-----------------------

Once that is done you can check the connectivity

EX1

Check the PIM state to see whether the multicast join was sent to the RP and whether the multicast route is seen by the PIM neighbor


user@EX1# run show pim join detail
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.1.1.1
    Source: *
    RP: 192.168.0.1
    Flags: sparse,rptree,wildcard
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:
        Interface: Pseudo-VXLAN          

Group: 239.1.1.1
    Source: 4.4.4.4
    Flags: sparse,spt
    Upstream interface: Local                
    Downstream neighbors:
        Interface: pe-9/1/0.32770        
        Interface: Pseudo-VXLAN  
        

Group: 239.1.1.1
    Source: 5.5.5.5
    Flags: sparse,spt
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:
        Interface: Pseudo-VXLAN      

Once traffic is flowing from the LEAF switches,  you can then check the vtep to see traffic statistics

user@EX1# run show vlans   

Routing instance        VLAN name             Tag          Interfaces
default-switch          v100                  100     
                                                                                  et-2/2/1.0*
                                                                                  vtep.32768*
  
user@EX1# run show interfaces vtep.32768 detail
  Logical interface vtep.32768 (Index 324) (SNMP ifIndex 604) (Generation 239)
    Flags: Up SNMP-Traps Encapsulation: ENET2
    VXLAN Endpoint Type: Remote, VXLAN Endpoint Address: 5.5.5.5, L2 Routing Instance: default-switch, L3 Routing Instance: default
    Traffic statistics:
     Input  bytes  :            508486320
     Output bytes  :            509589960
     Input  packets:               498516
     Output packets:               499598
    Local statistics:
     Input  bytes  :                    0
     Output bytes  :                    0
     Input  packets:                    0
     Output packets:                    0
    Transit statistics:
     Input  bytes  :            508486320              8158304 bps
     Output bytes  :            509589960              8158280 bps
     Input  packets:               498516                  999 pps
     Output packets:               499598                  999 pps
    Protocol eth-switch, MTU: 1600, Generation: 331, Route table: 6
      Flags: Trunk-Mode

The EX is acting as a switch so you can see the mac table and find out where the macs are learned


user@EX1# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:00:05:ed:ad:49   D             -   et-2/2/1.0           <<<< local
    v100                00:00:05:ed:ae:01   D             -   vtep.32768         <<< over the vxlan tunnel 

Here are some useful VXLAN commands to check when the number of VTEPs grows larger

user@EX1# run show ethernet-switching vxlan-tunnel-end-point source   
Logical System Name       Id  SVTEP-IP         IFL   L3-Idx
<default>                 0   4.4.4.4          lo0.0    0 
    L2-RTT                   Bridge Domain              VNID     MC-Group-IP
    default-switch           v100+100                   1        239.1.1.1     

user@EX1# run show ethernet-switching vxlan-tunnel-end-point remote   
Logical System Name       Id  SVTEP-IP         IFL   L3-Idx
<default>                 0   4.4.4.4          lo0.0    0 
 RVTEP-IP         IFL-Idx   NH-Id
 5.5.5.5          324       597     
    VNID          MC-Group-IP     
    1             239.1.1.1      

user@EX1# run show ethernet-switching vxlan-tunnel-end-point remote mac-table

MAC flags (S -static MAC, D -dynamic MAC, L -locally learned, C -Control MAC
           SE -Statistics enabled, NM -Non configured MAC, R -Remote PE MAC)

Logical system   : <default>
Routing instance : default-switch
 Bridging domain : v100+100, VLAN : 100, VNID : 1
   MAC                 MAC      Logical          Remote VTEP
   address             flags    interface        IP address
   00:00:05:ed:ae:01   D        vtep.32768       5.5.5.5     

On the RP, you'll need to make sure Multicast is working.

CORE2
user@CORE2# run show pim join detail
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.1.1.1
    Source: *
    RP: 192.168.0.1
    Flags: sparse,rptree,wildcard
    Upstream interface: Local                
    Downstream neighbors:
        Interface: et-2/0/0.0            
        Interface: et-3/2/0.0            

Group: 239.1.1.1
    Source: 4.4.4.4
    Flags: sparse
    Upstream interface: et-3/2/1.0           
    Downstream neighbors:
        Interface: et-3/2/0.0 (pruned)

Group: 239.1.1.1
    Source: 5.5.5.5
    Flags: sparse,spt
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:              
        Interface: et-3/2/0.0            


Now the question you might ask is why use EVPN + MPLS at all, given how much configuration is involved. Well, that's like comparing apples to oranges. VXLAN and MPLS are the fairer comparison, as they are both transport mechanisms. MPLS is far superior to VXLAN as a transport: it can do traffic engineering, bandwidth reservation and ~50 ms convergence on failure, while VXLAN relies on the underlying IGP for much of the decision making.

Thursday, July 31, 2014

EVPN for Layer 2 stretch between Data Centers Pt.1

EVPN (Ethernet VPN) is a great technology for stretching Layer 2 between Data Centers (aka Data Center Interconnect or DCI). It uses MP-BGP for control plane exchange of tenant information and MAC addresses. Data plane traffic is carried inside a tunneling protocol such as MPLS, VXLAN or PBB. EVPN is used in lieu of VPLS because it provides better control over BUM traffic (Broadcast, Unknown Unicast, and Multicast). It also supports multihoming and forwarding traffic over multiple active paths. EVPN used over MPLS provides the benefits of traffic engineering and fast convergence.

In part I, I've created a small single-homed setup to show how this works.


The first step is to create the trunk port facing the Leaf switch. The leaf switch is a standard TOR switch with no special config.

set interfaces et-2/2/1 description TO-LEAF1
set interfaces et-2/2/1 flexible-vlan-tagging
set interfaces et-2/2/1 encapsulation flexible-ethernet-services
set interfaces et-2/2/1 unit 100 encapsulation vlan-bridge
set interfaces et-2/2/1 unit 100 vlan-id 100
 
I created a sub-interface and placed it into a routing instance.

set routing-instances evpn100 instance-type evpn
set routing-instances evpn100 vlan-id 100
set routing-instances evpn100 interface et-2/2/1.100
set routing-instances evpn100 route-distinguisher 4.4.4.4:100
set routing-instances evpn100 vrf-target target:65000:100
set routing-instances evpn100 protocols evpn interface et-2/2/1.100
set routing-instances evpn100 protocols evpn label-allocation per-instance


Instance configuration looks like a normal VPLS configuration except for the instance-type and evpn protocol parameters.

Next I configure BGP to exchange control plane info.

set protocols bgp group IBGP type internal
set protocols bgp group IBGP local-address 4.4.4.4
set protocols bgp group IBGP family inet unicast
set protocols bgp group IBGP family evpn signaling
set protocols bgp group IBGP neighbor 5.5.5.5

A new address family called evpn is used.
After that comes the normal MPLS configuration: MPLS itself, your flavor of MPLS signaling (LDP here), the IGP, and the core-facing MPLS interfaces.

set protocols mpls interface all
set protocols mpls interface fxp0.0 disable
set protocols mpls interface lo0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols ldp interface all
set protocols ldp interface fxp0.0 disable
set protocols ldp interface lo0.0

set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-2/0/0 unit 0 family mpls


Once configured, MP-BGP exchanges "control plane" information.

# run show bgp summary
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0              
                       0          0          0          0          0          0
bgp.evpn.0          
                       2          2          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
5.5.5.5               65000        137        136       0       0       57:42 Establ
  inet.0: 0/0/0/0
  bgp.evpn.0: 2/2/2/0
  evpn100.evpn.0: 2/2/2/0

  __default_evpn__.evpn.0: 0/0/0/0


# run show route receive-protocol bgp 5.5.5.5

inet.0: 24 destinations, 24 routes (24 active, 0 holddown, 0 hidden)

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)

iso.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

mpls.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)

bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
  Prefix          Nexthop           MED     Lclpref    AS path
  2:5.5.5.5:100::100::00:00:05:ed:ae:01/304                  
*                         5.5.5.5                      100        I
  3:5.5.5.5:100::100::5.5.5.5/304                  
*                         5.5.5.5                      100        I

evpn100.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
  Prefix          Nexthop           MED     Lclpref    AS path
  2:5.5.5.5:100::100::00:00:05:ed:ae:01/304                  
*                         5.5.5.5                      100        I
  3:5.5.5.5:100::100::5.5.5.5/304                  
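
The prefixes in this output pack several fields into one string. The Python sketch below splits them apart for the two route types shown here. It is an illustration only: the function name is made up, the field names are my own labels, and it assumes the display format seen above with an IPv4 route distinguisher. The type names come from the EVPN specification (RFC 7432), where type 2 is a MAC/IP advertisement and type 3 is an inclusive multicast route.

```python
def parse_evpn_prefix(prefix):
    """Split a prefix string as displayed above into its fields."""
    body, _, length = prefix.rpartition("/")
    # Fields are separated by "::" -> "<type>:<RD>", "<tag>", "<MAC or IP>"
    head, tag, value = body.split("::")
    route_type, _, rd = head.partition(":")
    kinds = {"2": "MAC/IP advertisement", "3": "inclusive multicast"}
    return {
        "type": int(route_type),
        "kind": kinds.get(route_type, "other"),
        "route_distinguisher": rd,
        "ethernet_tag": int(tag),
        "value": value,
        "prefix_length": int(length),
    }


mac_route = parse_evpn_prefix("2:5.5.5.5:100::100::00:00:05:ed:ae:01/304")
print(mac_route["kind"], mac_route["route_distinguisher"], mac_route["value"])
# MAC/IP advertisement 5.5.5.5:100 00:00:05:ed:ae:01
```

Read this way, the first route says that PE 5.5.5.5 has learned MAC 00:00:05:ed:ae:01 in the instance with route distinguisher 5.5.5.5:100 and tag 100.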

You can also check the status of the EVPN and its MAC table


# run show evpn mac-table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : evpn100
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    __evpn100__         00:00:05:ed:ad:49   D             -   et-2/2/1.100        
    __evpn100__         00:00:05:ed:ae:01   DC            -   pip-13.010010000000    1048577   1048577


This shows you locally learned macs and macs learned over the WAN.

# run show evpn statistics   
Instance: evpn100
   Local interface: et-2/2/1.100, Index: 338
     Broadcast packets:                     1
     Broadcast bytes  :                    60
     Multicast packets:                     0
     Multicast bytes  :                     0
     Flooded packets  :                  4240
     Flooded bytes    :               6341604
     Unicast packets  :               3292539
     Unicast bytes    :            3528822524
     Current MAC count:                     1 (Limit 0)

In Part II I'll go more into configuring Gateway information to prevent the trombone effect.


Tuesday, March 4, 2014

Use (EVPN) Ethernet Virtual Private Network for Data Center Interconnections (DCI)

As Enterprises build Data Centers at different locations for disaster recovery and traffic distribution, there is a need to interconnect them transparently. Stretching Layer 2 across a WAN poses some challenges.

1) Workload Mobility aka VM migration from one DC to another.

2) Fast convergence in a multihomed environment.

3) Load balancing across multiple active paths between data centers.

The Trombone effect when migrating VMs across a WAN.



When VM1 is moved from one Hypervisor in DC1 to the other Hypervisor in DC2, the default GW for VM1 still resides in DC1. When VM1 sends traffic to VM2, the traffic has to cross the core to reach that GW before tromboning back to DC2.

EVPN solves this. EVPN is similar to VPLS, except that MAC addresses are learned and exchanged in the control plane, with BGP carrying the advertisements between PEs. A new BGP address family called EVPN is introduced for this.

bgp {
    group IBGP {
        local-address 1.1.1.1;        
        family evpn {
            signaling;  
        }
        neighbor 2.2.2.2;
    }
}

First, an overview of how EVPN works.

In a multi-tenant environment, each tenant corresponds to an EVPN instance (EVI). Route Distinguishers keep the routes of each EVI unique, and Route Targets control which EVIs import the advertised MAC addresses.
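To make the Route Target part concrete, here is a rough Python sketch of the import decision. It is only a model of the idea, not how any router implements it. The class names and the target:100:100 / target:200:200 values are made up for illustration; the RD and MAC are borrowed from the outputs in the other post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacRoute:
    rd: str                   # Route Distinguisher: keeps routes of different EVIs unique
    mac: str
    next_hop: str
    route_targets: frozenset  # Route Targets attached by the advertising PE


@dataclass
class Evi:
    name: str
    import_targets: set
    mac_table: dict = field(default_factory=dict)

    def receive(self, route):
        """Install the route only if it carries a target this EVI imports."""
        if self.import_targets & route.route_targets:
            self.mac_table[route.mac] = route.next_hop
            return True
        return False


if __name__ == "__main__":
    # Hypothetical tenants and route-target values.
    tenant_a = Evi("tenant-a", {"target:100:100"})
    tenant_b = Evi("tenant-b", {"target:200:200"})
    route = MacRoute("5.5.5.5:100", "00:00:05:ed:ae:01", "5.5.5.5",
                     frozenset({"target:100:100"}))
    print(tenant_a.receive(route), tenant_a.mac_table)
    print(tenant_b.receive(route), tenant_b.mac_table)
```

Only tenant-a installs the MAC, because only its import list matches the target on the route.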

For MAC learning, each PE router snoops DHCP and/or ARP (IPv4)/ND (IPv6) packets for a particular EVI. The PE can then advertise the locally learned MAC addresses to remote PE nodes through MP-iBGP. Early EVPN drafts describe aggregating MAC addresses and advertising a MAC prefix rather than every single MAC address in order to scale to thousands of MAC addresses; RFC 7432 fixes the MAC length at 48 bits, so each MAC address is advertised in its own route. When a remote PE receives this BGP update, it extracts the MAC address and builds a table with the next hop pointing to the LSP of the advertising PE. Because this is BGP, policies can be created to filter and manipulate forwarding decisions.

When a local PE router sees an ARP request for an IP address and already holds the MAC binding for that IP address from across the WAN, it performs proxy ARP: it answers the ARP request itself and makes the forwarding decision locally. This reduces BUM (Broadcast, Unknown unicast and Multicast) flooding across WAN links.
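The proxy ARP decision boils down to a lookup. Here is a small Python sketch of it, under the assumption that the PE keeps an IP-to-MAC table filled from BGP. ProxyArpPe, its methods and the 100.10.100.20 address are made up for illustration.

```python
class ProxyArpPe:
    """Toy model of a PE that answers ARP on behalf of remote hosts."""

    def __init__(self):
        # IP -> MAC bindings learned from remote PEs through BGP.
        self.remote_bindings = {}

    def learn_remote(self, ip, mac):
        self.remote_bindings[ip] = mac

    def handle_arp_request(self, target_ip):
        """Reply locally when the binding is known, otherwise flood."""
        mac = self.remote_bindings.get(target_ip)
        if mac is not None:
            return ("reply", mac)   # answered at the local PE, nothing crosses the WAN
        return ("flood", None)      # unknown target: falls back to BUM flooding


if __name__ == "__main__":
    pe = ProxyArpPe()
    pe.learn_remote("100.10.100.20", "00:00:05:ed:ae:01")  # hypothetical host IP
    print(pe.handle_arp_request("100.10.100.20"))
    print(pe.handle_arp_request("100.10.100.99"))
```

Only the second request would have to be flooded across the WAN.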

Syncing gateway IP and MAC addresses in EVPN lets a host use the nearest gateway to route traffic. You do this by creating IRBs on both PEs using different GW IP addresses. The IRB IP and MAC addresses are then advertised using a BGP extended community. When VM1 migrates to DC2, it still sends packets to the MAC address associated with the GW IP address of DC1. The IRB in DC2 notices that the destination MAC address for these packets is across the WAN, so it does the routing locally. When the ARP entry for the GW in VM1 expires, the VM will ARP again and the IRB in DC2 will send a reply to VM1 with its updated MAC address.

Something else happens when a VM migrates in an EVPN network: the MAC address of the VM is now advertised from DC2, so the PE in DC2 updates its MAC table while the PE in DC1 withdraws the entry.
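Here is a toy Python model of that MAC move. A shared dictionary stands in for BGP, which is a big simplification, and the Fabric class and PE names are made up for illustration.

```python
class Fabric:
    """Toy model of two or more PEs sharing MAC reachability."""

    def __init__(self, pe_names):
        self.local = {name: set() for name in pe_names}  # MACs attached to each PE
        self.advertised = {}                             # MAC -> PE advertising it

    def learn(self, pe, mac):
        """A PE sees the MAC on a local port and advertises it."""
        old = self.advertised.get(mac)
        if old is not None and old != pe:
            self.local[old].discard(mac)   # the previous PE withdraws its entry
        self.local[pe].add(mac)
        self.advertised[mac] = pe

    def next_hop(self, pe, mac):
        """Where this PE would send traffic for the MAC."""
        owner = self.advertised.get(mac)
        if owner is None:
            return None
        return "local" if owner == pe else owner


if __name__ == "__main__":
    fabric = Fabric(["PE-DC1", "PE-DC2"])
    vm1 = "00:00:05:ed:ad:49"
    fabric.learn("PE-DC1", vm1)
    print(fabric.next_hop("PE-DC2", vm1))  # PE-DC1
    fabric.learn("PE-DC2", vm1)            # VM1 migrates to DC2
    print(fabric.next_hop("PE-DC1", vm1))  # PE-DC2
```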

To address fast convergence in a multihomed environment, a concept called an Ethernet Segment is introduced. The set of links connecting to two or more local PE routers is called an Ethernet Segment. Each segment has a unique identifier called an ESI. An Ethernet tag is also used to identify each broadcast domain, such as a VLAN. When an Ethernet Segment fails, the local PE withdraws the corresponding Ethernet "route" from BGP, which triggers all remote PE routers to update their forwarding tables so that the corresponding next hop points to the backup PE.
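The failover can be sketched in Python from the point of view of a remote PE. This is a simplified model that assumes the remote PE just keeps an ordered list of PEs per segment and uses the first one. RemotePe, the PE names and the ESI value are made up for illustration.

```python
class RemotePe:
    """Toy model of a remote PE tracking which PEs can reach a segment."""

    def __init__(self):
        self.segment_pes = {}  # ESI -> ordered list of PEs advertising it

    def advertise(self, esi, pe):
        pes = self.segment_pes.setdefault(esi, [])
        if pe not in pes:
            pes.append(pe)

    def withdraw(self, esi, pe):
        """Called when a PE withdraws its route for a failed segment link."""
        pes = self.segment_pes.get(esi, [])
        if pe in pes:
            pes.remove(pe)

    def next_hop(self, esi):
        pes = self.segment_pes.get(esi, [])
        return pes[0] if pes else None


if __name__ == "__main__":
    esi = "00:11:22:33:44:55:66:77:88:99"  # hypothetical ESI
    remote = RemotePe()
    remote.advertise(esi, "PE1")
    remote.advertise(esi, "PE2")
    print(remote.next_hop(esi))   # PE1
    remote.withdraw(esi, "PE1")   # PE1 loses its link to the segment
    print(remote.next_hop(esi))   # PE2
```

A single withdraw moves every MAC behind that segment to the backup PE, which is why convergence does not depend on how many MACs sit behind the segment.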

EVPN introduces Split Horizon. BUM (Broadcast, Unknown unicast or Multicast) traffic is encapsulated in an MPLS packet that carries the Ethernet Segment Identifier. This allows the egress PE to make a forwarding decision and prevents loops, because the PEs know where the packet originated. This in turn makes it possible to forward traffic over multiple active links through the WAN and allows for load balancing.
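The split horizon check itself is tiny. Here is a Python sketch of it, assuming the egress PE knows which segment a BUM frame came from. egress_segments and the ESI-1 / ESI-2 names are made up for illustration.

```python
def egress_segments(local_segments, source_esi):
    """Return the local segments a BUM frame may be flooded to.

    The frame is never sent back to the Ethernet Segment it came from,
    which is what prevents the loop.
    """
    return [esi for esi in local_segments if esi != source_esi]


if __name__ == "__main__":
    # Hypothetical PE attached to two segments; the frame originated on ESI-1.
    print(egress_segments(["ESI-1", "ESI-2"], "ESI-1"))
```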

With these advantages, EVPN is a viable choice for interconnecting Data Centers.