Monday, November 24, 2014

vMotion over a VXLAN overlay.

In an older post I showed how to create a simple VXLAN tunnel to stretch Layer 2 over a Layer 3 network. Now I'll show how VMware vMotion works over this same type of setup. Note: when I researched this, the material I found warned that VXLAN is not a solution for doing vMotion. The latency could be high over a DCI, so vMotion may not always work. This is just a proof of concept.


From what I've read, vMotion over VXLAN on the hypervisor is not supported in ESXi 5.1 or earlier, and I'm not sure about ESXi 5.5. In any case, the ESXi version doesn't matter here, because in this scenario the VXLAN tunnel starts and stops at the Spine layer. The hypervisor is NOT using VXLAN at all and communicates with the underlay through a normal VLAN tag. Even the Leaf switch communicates with the Spine over a VLAN tag.

In this setup I have a NetApp SAN that serves as the NFS datastore for the VMs. The SAN network uses VLAN 10, while the VM data path uses VLAN 100.

In VMware I created an NFS datastore. The IP address of the NetApp SAN is 100.10.1.254.


The host servers are separated by an L3 network, as in the topology above.

Host 1 has a VMkernel IP of 100.10.1.4 and Host 2 has a VMkernel IP of 100.10.1.5 for communicating with the NetApp.

 

vMotion is enabled on both dvSwitches.


I created a distributed switch for the NFS network and added a 1G NIC as an uplink. This was done on both hosts.

A second dvSwitch was created for the VM data path. This uses a 10G NIC for the uplink.

On the NetApp:


Vol1 was created with read/write access for the VMware hosts.

The NetApp's network interface is configured with the IP 100.10.1.254 and uses a VLAN tag of 10.


On the Juniper Leaf switch:

VLAN 10 and VLAN 100 are created.

jnpr@QFX5100-48S-5# show vlans
v10 {
    vlan-id 10;
}
v100 {
    vlan-id 100;
}

Interfaces are created and vlans added.

set interfaces xe-0/0/2 description TO-VMW-145-vmnic9
set interfaces xe-0/0/2 unit 0 family ethernet-switching interface-mode trunk
set interfaces xe-0/0/2 unit 0 family ethernet-switching vlan members v100
set interfaces ge-0/0/47 description TO-VMW-145-vmnic3-for-NETAPP
set interfaces ge-0/0/47 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-0/0/47 unit 0 family ethernet-switching vlan members v10
set interfaces et-0/0/50 description TO-SPINE1
set interfaces et-0/0/50 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-0/0/50 unit 0 family ethernet-switching vlan members all

Between the Leaf and Host1, the 1GE link is used for NFS and the 10G link is used for VM data. The uplink to Spine1 is over a 10G link and carries all VLANs (vlan members all).

On the Juniper Spine1:

Again two VLANs are created, but this is where each VLAN is mapped to a VXLAN tunnel.


jnpr@EX9200-1# show vlans
v10 {
    vlan-id 10;
    l3-interface irb.10;
    vxlan {
        vni 10;
        multicast-group 239.1.1.10;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}
v100 {
    vlan-id 100;
    l3-interface irb.0;
    vxlan {
        vni 100;
        multicast-group 239.1.1.100;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}
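
The hierarchy above can also be written as flat set commands. The following is a minimal Python sketch that renders the same two VLAN-to-VNI mappings in set format. The `VxlanVlan` class and the `to_set_commands` and `render` helpers are names made up for this illustration; the statements they emit are taken directly from the `show vlans` output above and nothing else is assumed about the platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanVlan:
    """One VLAN-to-VNI mapping as shown in the spine 'show vlans' output."""
    name: str
    vlan_id: int
    irb_unit: int
    vni: int
    mcast_group: str


def to_set_commands(v: VxlanVlan) -> list[str]:
    """Flatten the vlans hierarchy for one VLAN into set-style lines."""
    base = f"set vlans {v.name}"
    return [
        f"{base} vlan-id {v.vlan_id}",
        f"{base} l3-interface irb.{v.irb_unit}",
        f"{base} vxlan vni {v.vni}",
        f"{base} vxlan multicast-group {v.mcast_group}",
        f"{base} vxlan encapsulate-inner-vlan",
        f"{base} vxlan decapsulate-accept-inner-vlan",
    ]


# Values copied from the Spine1 configuration in this post.
SPINE_VLANS = [
    VxlanVlan("v10", 10, 10, 10, "239.1.1.10"),
    VxlanVlan("v100", 100, 0, 100, "239.1.1.100"),
]


def render(vlans: list[VxlanVlan]) -> str:
    """Join the set lines for every VLAN into one block of text."""
    return "\n".join(cmd for v in vlans for cmd in to_set_commands(v))


if __name__ == "__main__":
    print(render(SPINE_VLANS))
```

Because both spines use the same VNI and multicast group per VLAN, the same list can be rendered for Spine1 and Spine2.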

Core interface is L3.
set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso

Link towards Leaf1 is a trunk with the 2 vlans.
set interfaces et-2/2/1 description TO-QFX5100-48S-5
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v10

The NetApp sits only off of Spine 1 and traffic is switched there from Leaf1.

set interfaces ge-7/0/0 description TO-NETAPP
set interfaces ge-7/0/0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ge-7/0/0 unit 0 family ethernet-switching vlan members v10

IRBs are created for the two VLANs. VLAN 100's IRB is the default gateway for the VM data path. The IRB for VLAN 10 is only there so we can ping the NetApp and the VMware VMkernel IPs.

set interfaces irb unit 0 family inet address 100.1.1.1/24
set interfaces irb unit 10 family inet address 100.10.1.200/24
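
Since the whole point of the stretch is that every endpoint in a VLAN shares one subnet, it is worth a quick sanity check that the addresses used in this post really are on-link for their IRB. This is a small sketch using Python's standard `ipaddress` module. The dictionary keys are just labels I chose; the addresses are the ones quoted in this post.

```python
import ipaddress

# IRB addresses from the Spine1 configuration.
IRB = {
    "v10": ipaddress.ip_interface("100.10.1.200/24"),
    "v100": ipaddress.ip_interface("100.1.1.1/24"),
}

# Endpoints mentioned in the post, grouped by the VLAN they live in.
ENDPOINTS = {
    "v10": {
        "netapp": "100.10.1.254",
        "host1-vmkernel": "100.10.1.4",
        "host2-vmkernel": "100.10.1.5",
    },
    "v100": {
        "vmotion-ubuntu": "100.1.1.30",
    },
}


def on_link(vlan: str, address: str) -> bool:
    """True if the address falls inside the IRB subnet for that VLAN."""
    return ipaddress.ip_address(address) in IRB[vlan].network


if __name__ == "__main__":
    for vlan, hosts in ENDPOINTS.items():
        for label, addr in hosts.items():
            print(vlan, label, addr, "on-link" if on_link(vlan, addr) else "OFF-LINK")
```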

PIM and an IGP (IS-IS) are configured for connectivity.

set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all

On the remote Spine2 switch the configuration is almost the same. The only difference is that there is no direct NetApp connection, so NFS needs to be tunneled through VXLAN so the VMkernel on Host 2 can access the storage.

jnpr@EX9200-2# show vlans       
v10 {
    vlan-id 10;
    l3-interface irb.10;
    vxlan {
        vni 10;
        multicast-group 239.1.1.10;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}
v100 {
    vlan-id 100;
    l3-interface irb.0;
    vxlan {
        vni 100;
        multicast-group 239.1.1.100;
        encapsulate-inner-vlan;
        decapsulate-accept-inner-vlan;
    }
}


jnpr@EX9200-2# show protocols | display set
set protocols isis reference-bandwidth 40g
set protocols isis interface et-2/0/0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all

LLDP is turned on for the Juniper switches and the dvSwitches so we can monitor the underlay.

On Leaf1 we can see both dvSwitches:

jnpr@QFX5100-48S-5# run show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info          System Name
xe-0/0/2           -                   00:05:33:48:70:b9   port1
xe-0/0/2           -                   00:50:56:b9:0b:b3   eth1               South-VM           
et-0/0/50          -                   4c:96:14:6b:bb:c0   TO-QFX5100-48S-5   EX9200-1           
ge-0/0/47          -                                       port 2 on dvSwitch dvSwitch-NFS-v10 (etherswitch) localhost.jnpr.net 
xe-0/0/2           -                                       port 2 on dvSwitch dvSwitch-v100 (etherswitch) localhost.jnpr.net

The output also shows a host running lldpd and the uplink to Spine1.


jnpr@QFX5100-48S-5# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 5 entries, 5 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v10                 00:50:56:6f:2c:31   D             -   ge-0/0/47.0         
    v10                 02:a0:98:2c:06:cc   D             -   et-0/0/50.0  

The things to note here are the two MACs. The NetApp uses 02:a0:98:2c:06:cc, which is learned from the uplink to the Spine, and 00:50:56:6f:2c:31 is the NIC of Host 1.

On Spine 1 

jnpr@EX9200-1# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 5 entries, 5 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v10                 00:50:56:69:37:db   D             -   vtep.32769          
    v10                 00:50:56:6f:2c:31   D             -   et-2/2/1.0          
    v10                 02:a0:98:2c:06:cc   D             -   ge-7/0/0.0          
    v10                 4c:96:14:f2:b6:e0   D             -   vtep.32769          
    v10                 a8:d0:e5:f7:bf:f0   D             -   vtep.32769

We can see that MACs are being learned over the VTEP, that is, the VXLAN tunnel.

MAC 00:50:56:69:37:db is the NIC of Host 2.

On Spine 2 we can see the NetApp is being learned through the VXLAN tunnel.

# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 3 entries, 3 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v10                 00:50:56:69:37:db   D             -   et-2/2/1.0          
    v10                 02:a0:98:2c:06:cc   D             -   vtep.32769           <<< NETAPP
    v10                 4c:96:14:6b:bb:f0   D             -   vtep.32769 



Leaf 2

jnpr@QFX5100-48S-6> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 4 entries, 4 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v10                 00:50:56:69:37:db   D             -   ge-0/0/47.0        
    v10                 02:a0:98:2c:06:cc   D             -   et-0/0/50.0        
    v10                 4c:96:14:6b:bb:f0   D             -   et-0/0/50.0        
    v10                 a8:d0:e5:f7:bf:f0   D             -   et-0/0/50.0

Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v100                00:50:56:b9:55:58   D             -   xe-0/0/2.0          
    v100                4c:96:14:6b:bb:f0   D             -   et-0/0/50.0    

Now we are ready to do vMotion.

From the Ubuntu VM you can see the MAC address on eth1, which is connected to VLAN 100 on Leaf2.


jnpr@vmotion-ubuntu:~$ ifconfig

eth1      Link encap:Ethernet  HWaddr 00:50:56:b9:55:58 
          inet addr:100.1.1.30  Bcast:100.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:5558/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8633 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7023 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1200614 (1.2 MB)  TX bytes:905670 (905.6 KB)

On Spine2 it's a local mac.
jnpr@EX9200-2# run show ethernet-switching table vlan-id 100
Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   et-2/2/1.0          



In VMware I choose the vmotion-ubuntu VM.




Then select Migrate, choose Change Host, and select Host1 as the destination. Select the priority of your choice and click Finish.



Migration completed!

On Leaf1 I can see the mac move.
jnpr@QFX5100-48S-5# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical
    name                address             flags              interface
    v100                00:50:56:b9:55:58   D             -   xe-0/0/2.0           
    v100                4c:96:14:6b:bb:f0   D             -   et-0/0/50.0

On Spine 1 it is local.

jnpr@EX9200-1# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 1 entries, 1 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   et-2/2/1.0          

And on Spine 2 it now sits across the VTEP.


jnpr@EX9200-2# run show ethernet-switching table vlan-id 100

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:50:56:b9:55:58   D             -   vtep.32769          
    v100                4c:96:14:6b:bb:f0   D             -   vtep.32769          
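
The before and after tables can be compared with a small script instead of by eye. This is a rough Python sketch that parses the text of `show ethernet-switching table` and reports MACs whose logical interface changed. The function names are my own, and the parser assumes data rows have the column layout shown in this post (VLAN name, MAC, flags, age, logical interface); other Junos releases may format the output differently.

```python
def parse_mac_table(output: str) -> dict:
    """Map (vlan, mac) -> logical interface from pasted CLI output."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: v100 00:50:56:b9:55:58 D - vtep.32769
        if len(fields) >= 5 and fields[1].count(":") == 5:
            vlan, mac, interface = fields[0], fields[1].lower(), fields[4]
            entries[(vlan, mac)] = interface
    return entries


def is_remote(interface: str) -> bool:
    """MACs learned on a vtep.* interface sit behind the VXLAN tunnel."""
    return interface.startswith("vtep.")


def mac_moves(before: dict, after: dict) -> dict:
    """Return entries present in both tables whose interface changed."""
    moves = {}
    for key, new_if in after.items():
        old_if = before.get(key)
        if old_if is not None and old_if != new_if:
            moves[key] = (old_if, new_if)
    return moves


# Spine2, VLAN 100, before and after the migration (rows from this post).
SPINE2_BEFORE = """
    v100                00:50:56:b9:55:58   D             -   et-2/2/1.0
"""
SPINE2_AFTER = """
    v100                00:50:56:b9:55:58   D             -   vtep.32769
    v100                4c:96:14:6b:bb:f0   D             -   vtep.32769
"""

if __name__ == "__main__":
    moved = mac_moves(parse_mac_table(SPINE2_BEFORE), parse_mac_table(SPINE2_AFTER))
    for (vlan, mac), (old_if, new_if) in moved.items():
        where = "remote (VXLAN)" if is_remote(new_if) else "local"
        print(f"{vlan} {mac}: {old_if} -> {new_if} [{where}]")
```

On Spine2 this flags the VM's MAC as having moved from the local trunk to the VTEP, which matches what the tables show.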

One caveat I found was that vMotion sends large frames, and IP multicast plus VXLAN adds overhead. So when I set up my network with the default configuration, I could not initiate vMotion. I had to change the MTU size on the Spine switches to 9000, and then it worked.
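
To put rough numbers on that overhead, here is a back-of-the-envelope Python sketch. The header sizes are the standard ones for VXLAN over IPv4 (outer IP 20, UDP 8, VXLAN 8, inner Ethernet 14, plus 4 for the inner 802.1Q tag that encapsulate-inner-vlan keeps). The function names are mine, and the sketch treats every MTU as an IP MTU; how a given platform counts the Layer 2 header in its MTU setting varies, so treat the results as approximate.

```python
OUTER_IPV4 = 20      # outer IPv4 header, no options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header carrying the VNI
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
DOT1Q_TAG = 4        # inner VLAN tag, kept when encapsulate-inner-vlan is set


def required_underlay_ip_mtu(inner_ip_mtu: int, inner_vlan_tag: bool = True) -> int:
    """IP MTU the L3 underlay needs to carry one full-size inner packet."""
    inner_frame = inner_ip_mtu + INNER_ETHERNET + (DOT1Q_TAG if inner_vlan_tag else 0)
    return inner_frame + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def fits(inner_ip_mtu: int, underlay_ip_mtu: int, inner_vlan_tag: bool = True) -> bool:
    """True if the encapsulated packet fits without fragmentation."""
    return required_underlay_ip_mtu(inner_ip_mtu, inner_vlan_tag) <= underlay_ip_mtu


if __name__ == "__main__":
    for underlay in (1500, 9000):
        need = required_underlay_ip_mtu(1500)
        verdict = "fits" if fits(1500, underlay) else "does not fit"
        print(f"inner 1500 needs {need}; underlay {underlay}: {verdict}")
```

A full-size 1500-byte packet from the host needs about 1554 bytes of underlay IP MTU once it is encapsulated with its VLAN tag, which would explain why the default configuration failed and a 9000-byte MTU on the Spine worked.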

So the conclusion is that you don't need VXLAN support on the hypervisor if you want to do vMotion over a Layer 2 stretch technology. You can run VXLAN tunnels between two Juniper EX9200 Spine switches and map VLANs to these tunnels. The Overlay is unaware of what is happening in the Underlay.

