Showing posts with label Data Center. Show all posts

Thursday, October 2, 2014

Is it the end of the TOR as we know it?

There are two driving forces that may break the TOR (Top of Rack) switch market.

The first one is the spiraling fall of switch prices. Competition among white box switch players such as Cumulus and Pica8, incumbents such as Juniper and Cisco, and newer entrants such as Arista is pushing margins ever thinner. Pretty much all the vendors sell the same exact TOR switch. Currently it’s a 40x10GE port switch with 4x40G uplinks and a Broadcom chipset. The only real difference is the operating system. All this competition basically means a race to the bottom. I see a few outcomes for this market.

If you’re an enterprise like Facebook or Google, you have the technical savvy and economies of scale to OEM just the physical hardware components and put on them a stripped-down version of Linux that does specifically what you need. Facebook already has a blueprint for one, called the Wedge. This means none of the vendors can sell to these customers.

Next you have a customer that is DevOps savvy, but doesn’t want to build their own switches. These can be startups or large customers that have a good engineering team. They probably buy a TOR switch such as Pica8 or Cumulus and automate on top of these boxes. They don’t need much support.

Last you have a customer who just needs support, so they buy a brand-name switch. These can be large enterprises that are not technical enough to program their switches. These are the Coca-Colas and traditional companies where technology is not in their wheelhouse. I mean, does Coca-Cola need SDN?

The second driving force is on the physical hardware side. Intel has announced the roadmap of the Rack Scale Architecture (RSA). They want to dominate the cloud computing infrastructure by providing mega scale data centers with a highly scalable solution. To understand what this means we have to see the trends of the current data center.




Currently in a typical rack you have several servers. Each server is usually kept in a server tray or server blade. If you break it down, a server has the following components: a CPU, memory, storage (HDD or SSD) and a network card. You might have some other peripherals, but you don’t really need them. All of these components interconnect via a PCIe fiber connection. Now this PCIe fiber connection is the key component.






If you disaggregate servers, you can create pools of resources. You can have a pool or tray of CPUs, a pool of memory and a pool of storage. When you need to upgrade, you pull out the tray and swap in a new tray of resources. To connect all these together, Intel created silicon photonics PCIe (SiPh). These links run at 40-50Gbps over a single optical fiber. If you think about it, a majority of traffic runs east and west in a data center. With SiPh you can send intra-rack traffic at 50Gbps.


You don’t need a TOR switch because you can do this using a virtual switch and SiPh. You can bypass Ethernet connections by connecting directly to another logical server over the PCIe connection. Why waste time going out an Ethernet port, up to a ToR switch and back down to another NIC to reach a server when you can connect directly? You can shave even more nanoseconds off the network latency.

Now you can say that traffic going between logical servers could eat up CPU, and therefore you need a ToR switch to speed this up. If you need hardware acceleration, then Intel could just place a Fulcrum switch chip in the rack as a resource. North/South or inter-rack traffic would egress the RSA via uplink Ethernet connections. Again with RSA, you could create a separate Ethernet resource pool of NICs for interconnects. Some people may wonder why. I mean, having separate servers does the job just fine. Also, with all the resources pooled together, you really need a DevOps team that can program the rack. Well, if I were Intel I'd target the financial industry. They would pay for the latency that RSA could shave off. They have an army of DevOps engineers who can do this.

Saturday, August 30, 2014

Data Center and Lab best practices

I saw my buddies Mark Honer of VMware and Dan Luderville from Gigamon speak at a Mirapath-sponsored Data Center event. Now while I'm not much of a Data Center infrastructure technologist specializing in the physical underlay (Layer 1), I do like to sit in on events like these to see where the trend is going and what problems these guys are seeing. Mark is a very funny guy and likes to add humor to his presentations. One of the topics of interest was Data Center best practices. We do these in our lab, so while this may seem like common sense, you'd be surprised at how many other labs and data centers I've seen that don't.


I'll illustrate a few of them.

1. Use copper and fiber cables with a unique serial number and the length printed at both ends for tracing connections. The serial numbers aren't for inventory purposes; they let you match the two ends of a cable. Most cables don't come like this, but Mirapath can do this as part of your order. It saves a lot of time, especially when the distance is more than 1 meter. The cool thing is that Mirapath keeps a record of your serial numbers, so when you order more you don't have to remember where you last left off.


2. Use blanks in spaces where equipment is not installed. First, it keeps the airflow going, and second, it prevents people from using your equipment as a shelf.

3. Use colored cables.

You should standardize around a particular color for each kind of connection.

We use white for our management connections and red for our console connections. Whenever we see some other color connected to the switch, we know someone has thrown in a rogue device and can easily track down who did it.
One of these kids don't look like the other

4. Buy special colored power cables for critical devices. If all your power cables are the same, which usually come in black, someone might pull your critical device from the network because they might not know which cable goes to which device. Now if you have some that are red, then they know not to unplug that device.

We call these the 'you're fired' cables. We've had no incidents of accidental pulling of cables since we installed these.

5. Invest in a nice labeler. You don't know how much time you can save if you just label things correctly. I used to spend many hours physically going back and forth between my desk and the rack just to figure out which hardware device I was trying to find in the lab. If I had had a labeler at that time to mark the devices with names, IPs and basic information, I wouldn't have wasted my time.

These techniques are good for small, young companies that are growing. The right best practices can prevent your company from turning into one of these.





Monday, August 4, 2014

VXLAN for Layer 2 stretch over L3 network

I showed how EVPN with MPLS is used to stretch Layer 2 across Data Centers. Now I'll show how to stretch Layer 2 using VXLAN as the tunneling protocol. I'm not going to set up EVPN with VXLAN, as that is a different technical method. This is a simple point-to-point VTEP setup to show how it works.

With Juniper EX9200s you can map VLANs to a specific VXLAN segment on a VTEP (VXLAN Tunnel End Point). The original L2 frame gets a VXLAN header prepended and is then carried inside outer UDP and IP headers.


This allows it to cross an L3 network while retaining the original L2 frame. Communication to set up the tunnel is done through multicast.
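To make the encapsulation concrete, here is a minimal Python sketch of the 8-byte VXLAN header as defined in RFC 7348, plus the per-frame overhead it implies. The function and constant names are my own for illustration, and the overhead figure assumes an untagged outer Ethernet header and an outer IPv4 header without options.

```python
import struct

# "I" flag from RFC 7348: set when the VNI field is valid.
VXLAN_FLAG_VNI_VALID = 0x08


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI (illustrative helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


# Bytes added in front of the original L2 frame (assumptions noted above).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
VXLAN_OVERHEAD_BYTES = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER

header = build_vxlan_header(1)  # VNI 1, as used in this post
print(header.hex())             # 0800000000000100
print(VXLAN_OVERHEAD_BYTES)     # 50
```

The extra 50 bytes per frame are why the L3 network between the VTEPs has to carry packets larger than the original L2 frame.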





Note: VXLAN tunnels can also originate from the hypervisor itself, using vShield in VMware. The method shown here is the alternative for when you want the tunnels to originate in the underlay.

The EXs will need to be configured for PIM, and an RP will be needed to build the multicast tree. Multicast is used to interconnect the different VTEPs and to optimize network traffic. Frames are forwarded only to endpoints that have joined the multicast group. Other devices in the network will not receive this traffic.

Each VTEP will need two things:

1) A VXLAN Network Identifier (VNI), which is like a DLCI in Frame Relay or a vc-id in point-to-point pseudowires in MPLS.

2) An IP multicast address

When an L2 frame hits the switch, it will be encapsulated with a VXLAN header and sent to the IP multicast group address. This packet will then go to the RP, which replicates it to all the "receivers". In our case we only have 2 end points, so the RP will only see 2 receivers.
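The forwarding decision a VTEP makes in this kind of multicast setup can be sketched in a few lines of Python. This is a simplified model, not how the EX implements it: the `Vtep` class and its method names are hypothetical, and it only treats broadcast and unknown destinations as flooded traffic. The addresses and MAC are the ones used in this lab.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Vtep:
    """Toy model of a multicast-based VTEP (hypothetical, for illustration)."""
    source_ip: str
    vni: int
    mcast_group: str
    remote_macs: dict = field(default_factory=dict)  # inner MAC -> remote VTEP IP

    def outer_destination(self, dst_mac: str) -> str:
        """Pick the outer IP destination for a frame entering the tunnel."""
        dst_mac = dst_mac.lower()
        if dst_mac == BROADCAST or dst_mac not in self.remote_macs:
            # Broadcast or unknown unicast: send to the group, the RP replicates.
            return self.mcast_group
        # Known MAC: unicast straight to the VTEP it was learned from.
        return self.remote_macs[dst_mac]

    def learn(self, inner_src_mac: str, outer_src_ip: str) -> None:
        """Record which remote VTEP a MAC address sits behind."""
        self.remote_macs[inner_src_mac.lower()] = outer_src_ip


ex1 = Vtep(source_ip="4.4.4.4", vni=1, mcast_group="239.1.1.1")
first = ex1.outer_destination("00:00:05:ed:ae:01")   # not learned yet
ex1.learn("00:00:05:ed:ae:01", "5.5.5.5")            # reply arrives from EX2
second = ex1.outer_destination("00:00:05:ed:ae:01")
print(first, second)  # 239.1.1.1 5.5.5.5
```

Once the remote MAC is learned, traffic to it no longer needs the multicast tree.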

Here are config snippets showing how this is built.

EX1

First build the interface connecting to the LEAF switch.

set interfaces et-2/2/1 description TO-LEAF1
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100

Add the core facing interface

set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso

Set up the EX to use a tunneling resource

set chassis fpc 9 pic 1 tunnel-services

Add your IGP flavor of choice and its related config to exchange L3 information through the network.

set interfaces lo0 unit 0 family inet address 4.4.4.4/32
set interfaces lo0 unit 0 family iso address 49.0001.0040.0400.4004.00
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive

Configure PIM and point it to the RP

set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse

Then map your VLAN into a VTEP

set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.0
set vlans v100 vxlan vni 1
set vlans v100 vxlan multicast-group 239.1.1.1
set vlans v100 vxlan encapsulate-inner-vlan
set vlans v100 vxlan decapsulate-accept-inner-vlan

Then set up the switch to use its loopback IP address as the source of the tunnel

set switch-options vtep-source-interface lo0.0

On the RP you would only need to set up your IP addresses and PIM configuration

set chassis fpc 1 pic 2 tunnel-services
set interfaces et-2/0/0 description TO-EX2
set interfaces et-2/0/0 unit 0 family inet address 192.168.35.3/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-3/2/0 description TO-CORE1
set interfaces et-3/2/0 unit 0 family inet address 192.168.23.3/24
set interfaces et-3/2/0 unit 0 family iso
set interfaces et-3/2/1 description TO-CORE1
set interfaces et-3/2/1 unit 0 family inet address 192.168.123.3/24
set interfaces et-3/2/1 unit 0 family iso
set interfaces lo0 unit 0 family inet address 3.3.3.3/32 primary
set interfaces lo0 unit 0 family inet address 192.168.0.1/32
set interfaces lo0 unit 0 family iso address 49.0001.0030.0300.3003.00
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp local family inet address 192.168.0.1
set protocols pim interface all mode bidirectional-sparse
set protocols pim interface fxp0.0 disable


You would create a similar VTEP on the remote EX

set chassis fpc 3 pic 0 tunnel-services
set interfaces et-2/0/0 description TO-CORE2
set interfaces et-2/0/0 unit 0 family inet address 192.168.35.5/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-2/2/1 unit 0 family ethernet-switching interface-mode trunk
set interfaces et-2/2/1 unit 0 family ethernet-switching vlan members v100
set interfaces irb unit 0 family inet address 100.1.1.2/24
set interfaces lo0 unit 0 family inet address 5.5.5.5/32
set interfaces lo0 unit 0 family iso address 49.0001.0050.0500.5005.00
set protocols isis reference-bandwidth 40g
set protocols isis interface et-2/0/0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols pim rp static address 192.168.0.1
set protocols pim interface lo0.0 mode bidirectional-sparse
set protocols pim interface et-2/0/0.0 mode bidirectional-sparse
set protocols lldp interface all
set switch-options vtep-source-interface lo0.0
set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.0
set vlans v100 vxlan vni 1
set vlans v100 vxlan multicast-group 239.1.1.1
set vlans v100 vxlan encapsulate-inner-vlan
set vlans v100 vxlan decapsulate-accept-inner-vlan

-----------------------

Once that is done you can check the connectivity

EX1

Check the PIM state and see if the Mcast Join was sent to the RP and if the multicast route is seen by the pim neighbor


user@EX1# run show pim join detail
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.1.1.1
    Source: *
    RP: 192.168.0.1
    Flags: sparse,rptree,wildcard
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:
        Interface: Pseudo-VXLAN          

Group: 239.1.1.1
    Source: 4.4.4.4
    Flags: sparse,spt
    Upstream interface: Local                
    Downstream neighbors:
        Interface: pe-9/1/0.32770        
        Interface: Pseudo-VXLAN  
        

Group: 239.1.1.1
    Source: 5.5.5.5
    Flags: sparse,spt
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:
        Interface: Pseudo-VXLAN      

Once traffic is flowing from the LEAF switches, you can then check the VTEP to see traffic statistics

user@EX1# run show vlans   

Routing instance        VLAN name             Tag          Interfaces
default-switch          v100                  100     
                                                                                  et-2/2/1.0*
                                                                                  vtep.32768*
  
user@EX1# run show interfaces vtep.32768 detail
  Logical interface vtep.32768 (Index 324) (SNMP ifIndex 604) (Generation 239)
    Flags: Up SNMP-Traps Encapsulation: ENET2
    VXLAN Endpoint Type: Remote, VXLAN Endpoint Address: 5.5.5.5, L2 Routing Instance: default-switch, L3 Routing Instance: default
    Traffic statistics:
     Input  bytes  :            508486320
     Output bytes  :            509589960
     Input  packets:               498516
     Output packets:               499598
    Local statistics:
     Input  bytes  :                    0
     Output bytes  :                    0
     Input  packets:                    0
     Output packets:                    0
    Transit statistics:
     Input  bytes  :            508486320              8158304 bps
     Output bytes  :            509589960              8158280 bps
     Input  packets:               498516                  999 pps
     Output packets:               499598                  999 pps
    Protocol eth-switch, MTU: 1600, Generation: 331, Route table: 6
      Flags: Trunk-Mode

The EX is acting as a switch so you can see the mac table and find out where the macs are learned


user@EX1# run show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    v100                00:00:05:ed:ad:49   D             -   et-2/2/1.0           <<<< local
    v100                00:00:05:ed:ae:01   D             -   vtep.32768         <<< over the vxlan tunnel 

Here are some useful VXLAN commands to check when the number of VTEPs grows larger

user@EX1# run show ethernet-switching vxlan-tunnel-end-point source   
Logical System Name       Id  SVTEP-IP         IFL   L3-Idx
<default>                 0   4.4.4.4          lo0.0    0 
    L2-RTT                   Bridge Domain              VNID     MC-Group-IP
    default-switch           v100+100                   1        239.1.1.1     

user@EX1# run show ethernet-switching vxlan-tunnel-end-point remote   
Logical System Name       Id  SVTEP-IP         IFL   L3-Idx
<default>                 0   4.4.4.4          lo0.0    0 
 RVTEP-IP         IFL-Idx   NH-Id
 5.5.5.5          324       597     
    VNID          MC-Group-IP     
    1             239.1.1.1      

user@EX1# run show ethernet-switching vxlan-tunnel-end-point remote mac-table

MAC flags (S -static MAC, D -dynamic MAC, L -locally learned, C -Control MAC
           SE -Statistics enabled, NM -Non configured MAC, R -Remote PE MAC)

Logical system   : <default>
Routing instance : default-switch
 Bridging domain : v100+100, VLAN : 100, VNID : 1
   MAC                 MAC      Logical          Remote VTEP
   address             flags    interface        IP address
   00:00:05:ed:ae:01   D        vtep.32768       5.5.5.5     

On the RP, you'll need to make sure Multicast is working.

CORE2
user@CORE2# run show pim join detail
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.1.1.1
    Source: *
    RP: 192.168.0.1
    Flags: sparse,rptree,wildcard
    Upstream interface: Local                
    Downstream neighbors:
        Interface: et-2/0/0.0            
        Interface: et-3/2/0.0            

Group: 239.1.1.1
    Source: 4.4.4.4
    Flags: sparse
    Upstream interface: et-3/2/1.0           
    Downstream neighbors:
        Interface: et-3/2/0.0 (pruned)

Group: 239.1.1.1
    Source: 5.5.5.5
    Flags: sparse,spt
    Upstream interface: et-2/0/0.0           
    Downstream neighbors:              
        Interface: et-3/2/0.0            


Now the question you might ask is why use EVPN + MPLS when there is so much configuration involved. Well, that's like comparing apples to oranges. VXLAN and MPLS are the fairer comparison, as they are both transport mechanisms. MPLS is far superior to VXLAN because MPLS can do traffic engineering, bandwidth reservation and ~50 ms convergence on failure, while VXLAN relies on the underlying IGP for much of the decision making.

Thursday, July 31, 2014

EVPN for Layer 2 stretch between Data Centers Pt.1

EVPN (Ethernet VPN) is a great technology for stretching Layer 2 between Data Centers (aka Data Center Interconnect or DCI). It uses MP-BGP for control plane exchange of tenant information and MAC addresses. Data plane traffic is carried inside a tunneling protocol such as MPLS, VXLAN or PBB. EVPN is used in lieu of VPLS because it provides better control over BUM traffic (Broadcast, Unknown Unicast, and Multicast). It also supports multihoming and forwarding traffic over multiple active paths. EVPN used over MPLS provides the benefits of traffic engineering and fast convergence.

In part I, I've created a small single-homed setup to show how this works.


The first step is to create the trunk port facing the Leaf switch. The leaf switch is a standard TOR switch with no special config.

set interfaces et-2/2/1 description TO-LEAF1
set interfaces et-2/2/1 flexible-vlan-tagging
set interfaces et-2/2/1 encapsulation flexible-ethernet-services
set interfaces et-2/2/1 unit 100 encapsulation vlan-bridge
set interfaces et-2/2/1 unit 100 vlan-id 100
 
I created a sub-interface and placed it into a routing instance.

set routing-instances evpn100 instance-type evpn
set routing-instances evpn100 vlan-id 100
set routing-instances evpn100 interface et-2/2/1.100
set routing-instances evpn100 route-distinguisher 4.4.4.4:100
set routing-instances evpn100 vrf-target target:65000:100
set routing-instances evpn100 protocols evpn interface et-2/2/1.100
set routing-instances evpn100 protocols evpn label-allocation per-instance


Instance configuration looks like a normal VPLS configuration except for the instance-type and evpn protocol parameters.

Next I configure BGP to exchange control plane info.

set protocols bgp group IBGP type internal
set protocols bgp group IBGP local-address 4.4.4.4
set protocols bgp group IBGP family inet unicast
set protocols bgp group IBGP family evpn signaling
set protocols bgp group IBGP neighbor 5.5.5.5

A new address family called evpn is used.
After that, the normal MPLS, MPLS signaling (your flavor of choice) and IGP configuration is added, along with the core-facing MPLS interfaces.

set protocols mpls interface all
set protocols mpls interface fxp0.0 disable
set protocols mpls interface lo0.0
set protocols isis interface all
set protocols isis interface fxp0.0 disable
set protocols isis interface lo0.0 passive
set protocols ldp interface all
set protocols ldp interface fxp0.0 disable
set protocols ldp interface lo0.0

set interfaces et-2/0/0 description TO-CORE1
set interfaces et-2/0/0 unit 0 family inet address 192.168.24.4/24
set interfaces et-2/0/0 unit 0 family iso
set interfaces et-2/0/0 unit 0 family mpls


Once configured, MP-BGP exchanges "control plane" information.

# run show bgp summary
Groups: 1 Peers: 1 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0              
                       0          0          0          0          0          0
bgp.evpn.0          
                       2          2          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
5.5.5.5               65000        137        136       0       0       57:42 Establ
  inet.0: 0/0/0/0
  bgp.evpn.0: 2/2/2/0
  evpn100.evpn.0: 2/2/2/0

  __default_evpn__.evpn.0: 0/0/0/0


# run show route receive-protocol bgp 5.5.5.5

inet.0: 24 destinations, 24 routes (24 active, 0 holddown, 0 hidden)

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)

iso.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

mpls.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)

bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
  Prefix          Nexthop           MED     Lclpref    AS path
  2:5.5.5.5:100::100::00:00:05:ed:ae:01/304                  
*                         5.5.5.5                      100        I
  3:5.5.5.5:100::100::5.5.5.5/304                  
*                         5.5.5.5                      100        I

evpn100.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
  Prefix          Nexthop           MED     Lclpref    AS path
  2:5.5.5.5:100::100::00:00:05:ed:ae:01/304                  
*                         5.5.5.5                      100        I
  3:5.5.5.5:100::100::5.5.5.5/304                  
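The prefixes in the output above pack several fields into one string. As a rough aid for reading them, here is a small Python sketch that splits the two forms seen here. The route type names come from RFC 7432 (type 2 is a MAC/IP Advertisement route, type 3 is an Inclusive Multicast Ethernet Tag route); the function name is hypothetical, and reading the middle field as the Ethernet tag (VLAN 100 in this lab) is my interpretation of the display format.

```python
# Route type names as defined in RFC 7432 (only the two seen in this output).
ROUTE_TYPES = {
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag",
}


def parse_evpn_prefix(prefix: str) -> dict:
    """Split a prefix such as '2:5.5.5.5:100::100::00:00:05:ed:ae:01/304'."""
    body, _, length = prefix.rpartition("/")
    # Fields are separated by '::'; a MAC address only contains single colons.
    head, tag, last = body.split("::")
    route_type, rd = head.split(":", 1)
    route_type = int(route_type)
    # Type 2 ends with a MAC address, type 3 with the originating router's IP.
    last_key = "mac" if route_type == 2 else "originator"
    return {
        "type": route_type,
        "type_name": ROUTE_TYPES.get(route_type, "unknown"),
        "route_distinguisher": rd,
        "ethernet_tag": int(tag),
        last_key: last,
        "prefix_length": int(length),
    }


for p in ("2:5.5.5.5:100::100::00:00:05:ed:ae:01/304",
          "3:5.5.5.5:100::100::5.5.5.5/304"):
    print(parse_evpn_prefix(p))
```

The type 2 route is what carries the remote MAC address, and the type 3 route is how the remote PE signals that it wants to receive BUM traffic for this instance.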

You can also check the status of the EVPN and its MAC table


# run show evpn mac-table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : evpn100
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    __evpn100__         00:00:05:ed:ad:49   D             -   et-2/2/1.100        
    __evpn100__         00:00:05:ed:ae:01   DC            -   pip-13.010010000000    1048577   1048577


This shows you locally learned macs and macs learned over the WAN.

# run show evpn statistics   
Instance: evpn100
   Local interface: et-2/2/1.100, Index: 338
     Broadcast packets:                     1
     Broadcast bytes  :                    60
     Multicast packets:                     0
     Multicast bytes  :                     0
     Flooded packets  :                  4240
     Flooded bytes    :               6341604
     Unicast packets  :               3292539
     Unicast bytes    :            3528822524
     Current MAC count:                     1 (Limit 0)

In Part II I'll go more into configuring Gateway information to prevent the trombone effect.


Sunday, July 27, 2014

Juniper MC-LAG configuration and behavior

A customer had an unusual requirement. Their spine switches didn't have any out-of-band management connectivity and they were not yet going to run any IP protocols, so we couldn't use a loopback and redistribute that into an IGP. Their spine switches were also using MC-LAG. The problem was how to access the switches to manage them. We basically set up in-band IP addresses on the MC-LAG. These addresses would have to be reachable through static routes. The question we ran into was which MC-LAG member would be seen as the owner of the IP address. There is an option called status-control which does this. I ran a test and found that it seems to work the opposite way from what we expected.

First EX1's status-control is set to standby and EX2 is active.

jnpr@EX1# set interfaces ae0 aggregated-ether-options mc-ae status-control standby

jnpr@EX2# set interfaces ae0 aggregated-ether-options mc-ae status-control active

I put IRBs on both the MC-LAG spine switches and the QFX leaf on vlan-id 100. 100.1.1.1 is the MC-LAG's IP and 100.1.1.100 is the QFX, just for this test. From the QFX I try to access the spine.

jnpr@QFX5100-LEAF# run show arp no-resolve  
MAC Address       Address         Interface     Flags
00:00:5e:00:01:02 10.161.39.254   vme.0                none
4c:96:14:6b:bb:f0 100.1.1.1       ae0.0                none
4c:96:14:f2:b6:e3 192.168.1.1     em2.32768            none
Total entries: 4

{master:0}[edit]
jnpr@QFX5100-LEAF# run telnet 100.1.1.1
Trying 100.1.1.1...
Connected to 100.1.1.1 
Escape character is '^]'.

EX1 (ttyp1)

login: ^C
Client aborted login
Connection closed by foreign host.

I'm in EX1?!


Then I change the status control

jnpr@EX1# set interfaces ae0 aggregated-ether-options mc-ae status-control active 

jnpr@EX2# set interfaces ae0 aggregated-ether-options mc-ae status-control standby 

{master:0}[edit]
jnpr@QFX5100-LEAF# run show arp no-resolve  
MAC Address       Address         Interface     Flags
00:00:5e:00:01:02 10.161.39.254   vme.0                none
a8:d0:e5:f7:bf:f0 100.1.1.1       ae0.0                none
4c:96:14:f2:b6:e3 192.168.1.1     em2.32768            none
Total entries: 6

{master:0}[edit]
jnpr@QFX5100-LEAF# run telnet 100.1.1.1      
Trying 100.1.1.1...
Connected to 100.1.1.1
Escape character is '^]'.

EX2 (ttyp1)

login: ^C
Client aborted login

Weird. Not sure why this behavior seems backwards.

So the next issue is how do you access the other MC-LAG member? There are two ways. You can access it via the IP address used on the ICCP link. Or, if you have the resources, you can have two MC-LAGs per spine switch and make one of them standby on one IRB and the other standby on a different IRB, say 101, so both chassis are IP reachable for management.

MC-LAG configuration example


EX1
set chassis redundancy graceful-switchover
set chassis aggregated-devices ethernet device-count 2
set interfaces et-2/0/1 description TO-LEAF
set interfaces et-2/0/1 ether-options 802.3ad ae0
set interfaces et-2/2/1 description TO-LEAF
set interfaces et-2/2/1 ether-options 802.3ad ae0
set interfaces xe-3/1/0 description ICCP
set interfaces xe-3/1/0 unit 0 family inet address 200.1.1.1/30
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-priority 100
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:05
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control standby
set interfaces ae0 unit 0 multi-chassis-protection 200.1.1.2 interface xe-9/1/1.0
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members all
set interfaces irb unit 100 family inet address 100.1.1.1/24
set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.100


EX2
set chassis redundancy graceful-switchover
set chassis aggregated-devices ethernet device-count 2
set interfaces et-2/0/1 description TO-LEAF
set interfaces et-2/0/1 ether-options 802.3ad ae0
set interfaces et-2/2/1 description TO-LEAF
set interfaces et-2/2/1 ether-options 802.3ad ae0
set interfaces xe-3/1/0 description ICCP
set interfaces xe-3/1/0 unit 0 family inet address 200.1.1.2/30
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-priority 100
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:05
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 1
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set interfaces ae0 unit 0 multi-chassis-protection 200.1.1.1 interface xe-3/1/1.0
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members all
set interfaces irb unit 100 family inet address 100.1.1.1/24
set vlans v100 vlan-id 100
set vlans v100 l3-interface irb.100


EX1
# run show interfaces ae0 extensive
Physical interface: ae0 (MC-AE-1, active), Enabled, Physical link is Up
  Interface index: 186, SNMP ifIndex: 561, Generation: 189
  Link-level type: Ethernet, MTU: 1518, Speed: 40Gbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
  Source filtering: Disabled, Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 1bps
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Current address: 4c:96:14:6b:bb:c0, Hardware address: 4c:96:14:6b:bb:c0
  Last flapped   : 2014-07-25 17:13:08 PDT (1d 06:39 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :                 2141                    0 bps
   Output bytes  :             22808481                 2616 bps
   Input  packets:                   30                    0 pps
   Output packets:               313340                    5 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Dropped traffic statistics due to STP State:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0
  Ingress queues: 8 supported, 4 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    0                    0
    1                                0                    0                    0
    2                                0                    0                    0
    3                                0                    0                    0
  Egress queues: 8 supported, 4 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                            39562                39562                    0
    1                                0                    0                    0
    2                                0                    0                    0
    3                           253116               253116                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    1                   expedited-forwarding
    2                   assured-forwarding
    3                   network-control

  Logical interface ae0.0 (Index 348) (SNMP ifIndex 563) (Generation 177)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :             0          0             0            0
        Output:        309312          4      28634313         2008
    Adaptive Statistics:
        Adaptive Adjusts:          0
        Adaptive Scans  :          0
        Adaptive Updates:          0
    Link:
      et-2/0/1.0
        Input :             0          0             0            0
        Output:        313340          4      29939385         2008
    LACP info:        Role     System             System      Port    Port  Port
                             priority          identifier  priority  number   key
      et-2/0/1.0     Actor        100  00:00:00:00:00:05       127       1     1
      et-2/0/1.0   Partner        127  4c:96:14:f2:b6:e0       127       2     1
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      et-2/0/1.0            111233      111816            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      et-2/0/1.0                 0           0            0            0
    Protocol eth-switch, MTU: 1518, Generation: 229, Route table: 6
      Flags: Trunk-Mode

EX2

show interfaces ae0 extensive
Physical interface: ae0 (MC-AE-1, active), Enabled, Physical link is Up
  Interface index: 219, SNMP ifIndex: 501, Generation: 351
  Link-level type: Ethernet, MTU: 1518, Speed: 40Gbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
  Source filtering: Disabled, Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 1bps
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Current address: a8:d0:e5:f7:bf:c3, Hardware address: a8:d0:e5:f7:bf:c3
  Last flapped   : 2014-07-25 17:13:10 PDT (1d 06:39 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :               153427                    0 bps
   Output bytes  :             13628185                 1832 bps
   Input  packets:                 2414                    0 pps
   Output packets:               193271                    1 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Dropped traffic statistics due to STP State:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0
  Ingress queues: 8 supported, 4 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    0                    0
    1                                0                    0                    0
    2                                0                    0                    0
    3                                0                    0                    0
  Egress queues: 8 supported, 4 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                      90356345631          90356345631                    0
    1                                0                    0                    0
    2                                0                    0                    0
    3                           402965               402965                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    1                   expedited-forwarding
    2                   assured-forwarding
    3                   network-control

  Logical interface ae0.0 (Index 343) (SNMP ifIndex 18551) (Generation 128345)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :             0          0             0            0
        Output:        189232          1      13150169          512
    Adaptive Statistics:
        Adaptive Adjusts:          0
        Adaptive Scans  :          0
        Adaptive Updates:          0
    Link:
      et-2/2/1.0
        Input :             0          0             0            0
        Output:        193272          1      14459095          512
    LACP info:        Role     System             System      Port    Port  Port
                             priority          identifier  priority  number   key
      et-2/2/1.0     Actor        100  00:00:00:00:00:05       127   32769     1
      et-2/2/1.0   Partner        127  4c:96:14:f2:b6:e0       127       1     1
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      et-2/2/1.0            111288      111983            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      et-2/2/1.0                 0           0            0            0
    Protocol eth-switch, MTU: 1518, Generation: 37237, Route table: 3
      Flags: Trunk-Mode



LEAF

{master:0}[edit]
jnpr@QFX5100-LEAF#run show interfaces ae0 extensive
Physical interface: ae0, Enabled, Physical link is Up
  Interface index: 659, SNMP ifIndex: 550, Generation: 150
  Description: TO-EX2
  Link-level type: Ethernet, MTU: 1514, Speed: 80Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Current address: 4c:96:14:f2:b7:a0, Hardware address: 4c:96:14:f2:b7:a0
  Last flapped   : 2014-07-25 16:49:38 PDT (1d 07:04 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :       12286985366334                 2832 bps
   Output bytes  :       12315040186254                 2208 bps
   Input  packets:         179102021645                    3 pps
   Output packets:         179048324703                    2 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0,
    Resource errors: 0
  Output errors:
    Carrier transitions: 4, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 best-effort                    0             95319185                    0
    3 fcoe                           0                    0                    0
    4 no-loss                        0                    0                    0
    7 network-cont                   0               242393                    0
    8 mcast                          0         178948686348                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast

  Logical interface ae0.0 (Index 557) (SNMP ifIndex 553) (Generation 167)
    Flags: SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :             0          0             0            0
        Output:          3338          0        183668            0
    Link:
      et-0/0/50.0
        Input :             0          0             0            0
        Output:          5685          0       1470451            0
      et-0/0/51.0
        Input :             0          0             0            0
        Output:          4057          0       1386350            0
    LACP info:        Role     System             System      Port    Port  Port
                             priority          identifier  priority  number   key
      et-0/0/50.0    Actor        127  4c:96:14:f2:b6:e0       127       1     1
      et-0/0/50.0  Partner        100  00:00:00:00:00:05       127   32769     1
      et-0/0/51.0    Actor        127  4c:96:14:f2:b6:e0       127       2     1
      et-0/0/51.0  Partner        100  00:00:00:00:00:05       127       1     1
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      et-0/0/50.0           111855      111309            0            0
      et-0/0/51.0           111853      111308            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      et-0/0/50.0                0           0            0            0
      et-0/0/51.0                0           0            0            0
    Protocol eth-switch, MTU: 1514, Generation: 181, Route table: 3
      Flags: Trunk-Mode




Monday, July 21, 2014

Use Zero Touch Provisioning (ZTP) to auto-configure and upgrade new or replacement switches in a datacenter.

A typical data center can host tens, if not hundreds, of Top of Rack (TOR) switches. Managing and configuring each one can become a tedious task, and replacing a switch that goes out of service is just as time consuming. ZTP is an automation method that reduces provisioning time, minimizes errors, and removes the need for a network engineer to be on location. You only need a junior engineer or technician to rack the units, re-cable the links, and power them on, without having to console in or add any configuration.


HOW ZTP WORKS



ZTP uses a combination of DHCP and TFTP/FTP/HTTP servers to dynamically allocate IP addresses, deliver configuration files, and upgrade switch software images. Juniper EX and QFX switches default to ZTP on boot up and become DHCP clients.

To start, configure a DHCP server by adding a few options to the dhcpd.conf file: DHCP option 43, which carries vendor-specific sub-options, and DHCP option 150 or 66, which contains the address of the TFTP server. On the TFTP or FTP server, store all of your switches' configurations and software images.

On your Linux server, the dhcpd.conf file would look similar to this:

host <EX SWITCH NAME> {
  # MAC address of the management interface. You can also use dynamic IP
  # allocation, or any network port's MAC address (chassis MAC + 1), for ZTP.
  hardware ethernet 4c:96:14:e5:a3:41;
  fixed-address 100.1.1.90;     # Switch's irb ip address
  option option-150 100.1.1.1; # TFTP Server address to download config and image
  option host-name "EX4300-1";
  option VENDOR_OP.image-file-name "jinstall-ex-4300-13.2X51-D20.2-domestic-signed.tgz";
  option VENDOR_OP.config-file-name "PE3713320070.conf";
  option VENDOR_OP.transfer-mode "tftp";
  option VENDOR_OP.image-file-type "symlink";
}
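
Option 43 does not send the VENDOR_OP values as separate DHCP options. It packs them into one payload of code/length/value triples, which is the standard encapsulation for vendor-specific information. The following is a minimal Python sketch of that packing, assuming the sub-option codes 0 to 3 that the full dhcpd.conf at the end of this post defines. The function names are made up for illustration, and the order in which dhcpd actually emits the sub-options may differ.

```python
# Sub-option codes, as defined under "option space VENDOR_OP" in the full
# dhcpd.conf at the end of this post.
IMAGE_FILE_NAME = 0
CONFIG_FILE_NAME = 1
IMAGE_FILE_TYPE = 2
TRANSFER_MODE = 3


def encode_suboption(code: int, value: str) -> bytes:
    """Encode one sub-option as: code (1 byte), length (1 byte), value."""
    data = value.encode("ascii")
    if len(data) > 255:
        raise ValueError("sub-option value does not fit a one-byte length field")
    return bytes([code, len(data)]) + data


def encode_option_43(image_file: str, config_file: str,
                     image_type: str, transfer_mode: str) -> bytes:
    """Concatenate the four sub-options into a single option 43 payload."""
    payload = b"".join([
        encode_suboption(IMAGE_FILE_NAME, image_file),
        encode_suboption(CONFIG_FILE_NAME, config_file),
        encode_suboption(IMAGE_FILE_TYPE, image_type),
        encode_suboption(TRANSFER_MODE, transfer_mode),
    ])
    # A single DHCP option also has a one-byte length field. Longer payloads
    # would need option concatenation, which this sketch does not handle.
    if len(payload) > 255:
        raise ValueError("option 43 payload exceeds 255 bytes")
    return payload


if __name__ == "__main__":
    payload = encode_option_43(
        "jinstall-ex-4300-13.2X51-D20.2-domestic-signed.tgz",
        "PE3713320070.conf",
        "symlink",
        "tftp",
    )
    print(payload.hex())
```

Comparing the printed hex string with option 43 in a packet capture of the DHCP offer is a quick way to check that the server is sending what you expect.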

ZTP works over untagged interfaces on any port of the switch (data ports or management ports).

If you were to console into the box during a ZTP sequence it would look like this:

---------------------------------------

Committing autoinstall config                                                 
                                                                              
First, the switch tries DHCP over the management interface (vme or me).
                                                                              
Auto Image Upgrade: DHCP OFFER Client vme.0: Invalid config, no file server information. OFFER REJECTED.                                                                              

If no server is reachable, it tries every interface that is up on the switch, using the default VLAN and a temporary irb interface.

It will then check the DHCP options that are passed between the server and the switch, noting the TFTP server IP address, the configuration file name, and the software image to be installed.

Auto Image Upgrade: DHCP Options for client interface irb.0:                  
ConfigFile: PE3713320070.conf ImageFile: jinstall-ex-4300-13.2X51-D20.2-domesti
c-signed.tgz Gateway: 100.1.1.1DHCP Server: 100.1.1.20 File Server: 100.1.1.1 O
ptions state: All options set                                                                              
                                                                              
Auto Image Upgrade: DHCP Client Bound interfaces: irb.0   vme.0                                                                                 
                                                                              
Auto Image Upgrade: DHCP Client Unbound interfaces: ge-0/0/0.0   ge-0/0/1.0   g
e-0/0/2.0   ge-0/0/3.0   ge-0/0/4.0   ge-0/0/5.0   ge-0/0/6.0   ge-0/0/7.0   ge
-0/0/8.0   ge-0/0/9.0   ge-0/0/10.0   ge-0/0/11.0   ge-0/0/12.0   ge-0/0/13.0 
 ge-0/0/14.0   ge-0/0/15.0   ge-0/0/16.0   ge-0/0/17.0   ge-0/0/18.0   ge-0/0/1
9.0   ge-0/0/20.0   ge-0/0/21.0   ge-0/0/22.0   ge-0/0/23.0   ge-0/0/24.0   ge-
0/0/25.0   ge-0/0/26.0   ge-0/0/27.0   ge-0/0/28.0   ge-0/0/29.0   ge-0/0/30.0
  ge-0/0/31.0   ge-0/0/32.0   ge-0/0/33.0   ge-0/0/34.0   ge-0/0/35.0   ge-0/0/
36.0   ge-0/0/37.0   ge-0/0/38.0   ge-0/0/39.0   ge-0/0/40.0   ge-0/0/41.0   ge
-0/0/42.0   ge-0/0/43.0   ge-0/0/44.0   ge-0/0/45.0   ge-0/0/46.0   ge-0/0/47.0
                                                                                 
                                                                              
Auto Image Upgrade: To stop, on CLI apply "delete chassis auto-image-upgrade" 
and commit                                                                              

The EX switch will then parse the DHCP response.
                                                                              
Auto Image Upgrade: Active on client interface: irb.0                                                                              
                                                                              
Auto Image Upgrade: Interface::   "irb"                                       

Auto Image Upgrade: Server::      "100.1.1.1"                                 

Auto Image Upgrade: Image File::  "jinstall-ex-4300-13.2X51-D20.2-domestic-sign
ed.tgz"                                                                       

Auto Image Upgrade: Server File:: "PE3713320070.conf"                         

Auto Image Upgrade: Gateway::     "100.1.1.254"                                 

Auto Image Upgrade: Protocol::    "tftp"        
                              
                                                                            
The EX switch will then download the config file and the software image. 
                                                                              
Auto Image Upgrade: Start fetching PE3713320070.conf file from server 100.1.1.1
 through irb using tftp                                                       
                                                                              
                                                                              
Auto Image Upgrade: File PE3713320070.conf fetched from server 100.1.1.1 throug
h irb                                                                         
                                                                              
                                                                              
Auto Image Upgrade: Start fetching jinstall-ex-4300-13.2X51-D20.2-domestic-sign
ed.tgz file from server 100.1.1.1 through irb using tftp              

If the installed version on the switch and the version on the TFTP server are the same, the upgrade process aborts.

Auto Image Upgrade: Aborting image installation of jinstall-ex-4300-13.2X51-D21
.1-domestic-signed.tgz received from 100.1.1.1 through irb: Installed and fetch
ed image version same                                                         
                       
If the images are not the same, the EX switch will auto upgrade.

Auto Image Upgrade: File jinstall-ex-4300-13.2X51-D20.2-domestic-signed.tgz fet
ched from server 100.1.1.1 through irb   
                                                                              
Auto Image Upgrade: To install /var/tmp/jinstall-ex-4300-13.2X51-D20.2-domestic
-signed.tgz image fetched from server 100.1.1.1 through irb                   
                                                                                                                                                             
WARNING!!! On successful image installation, system will reboot automatically 

Auto Image Upgrade: Installation of /var/tmp/jinstall-ex-4300-13.2X51-D20.2-dom
estic-signed.tgz image fetched from server 100.1.1.1 through irb is done, proce
eding for reboot of system                                                    
                                                                              
                                                                              
Broadcast Message from root@EX4300-1                                          
        (no tty) at 5:47 UTC...                                               
                                                                              
Auto image Upgrade: Stopped                                                   
                                                                              
                                                                              
*** System shutdown message from root@EX4300-1 ***                          

System going down in 1 minute                                                 

*** FINAL System shutdown message from root@EX4300-1 ***                    

System going down IMMEDIATELY     

### AFTER REBOOT
EX4300-1 (ttyu0)

login: jnpr
Password:

--- JUNOS 13.2X51-D20.2 built 2014-04-29 08:43:38 UTC
{master:0}
jnpr@EX4300-1>
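
The version check in the transcript above (abort when the installed and fetched versions are the same, upgrade otherwise) can be illustrated with a short Python sketch. This is not how Junos implements the check. The regular expression and the function names are assumptions made for illustration, and the pattern only targets release strings shaped like the ones in this post.

```python
import re

# Assumed pattern for release strings such as "13.2X51-D20.2" embedded in an
# image file name. Other naming schemes may need a different pattern.
_VERSION_RE = re.compile(r"(\d+\.\d+[A-Z]\d+(?:-D\d+)?(?:\.\d+)?)")


def image_version(file_name: str):
    """Return the release string found in an image file name, or None."""
    match = _VERSION_RE.search(file_name)
    return match.group(1) if match else None


def should_upgrade(installed_version: str, image_file_name: str) -> bool:
    """Mirror the behavior described above: install only if versions differ."""
    fetched = image_version(image_file_name)
    if fetched is None:
        raise ValueError(f"no version found in {image_file_name!r}")
    return fetched != installed_version


if __name__ == "__main__":
    image = "jinstall-ex-4300-13.2X51-D21.1-domestic-signed.tgz"
    for installed in ("13.2X51-D21.1", "13.2X51-D20.2"):
        action = "install" if should_upgrade(installed, image) else "abort"
        print(f"installed {installed}, fetched {image_version(image)}: {action}")
```

Note that the comparison is for equality only, so an older image on the file server would also be installed.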

The EX is now running the new version of code and the downloaded configuration file and is ready for production.
Here's the config I see on the switch after reboot. It matches the config I saved on the TFTP server.

------------
jnpr@EX4300-1# show
## Last changed: 2014-07-20 07:56:01 UTC
version 13.2X51-D21.1;
/*
 * dhcpd-generated /var/etc/dhcpd.options.conf
 * Version: JDHCPD release 13.2X51-D21.1 built by builder on 2014-05-29 13:06:11 UTC
 * Written: Sun Jul 20 07:49:45 2014
 */

system {
    host-name EX4300-1;
    root-authentication {
        encrypted-password "$1$byLFhlG6$my6QnZANcF7DqD9m9Op5s."; ## SECRET-DATA
    }
    login {
        user jnpr {
            uid 2005;
            class super-user;
            authentication {
                encrypted-password "$1$FNz57vVN$lQYXYBuxDKlPwtTBFQXWa0"; ## SECRET-DATA
            }
        }
    }
    services {                         
        ssh;
        telnet;
        web-management {
            http;
        }
    }
    syslog {
        user * {
            any emergency;
        }
        host 100.1.1.72 {
            any any;
        }
        file messages {
            any notice;
            authorization info;
        }
        file interactive-commands {
            interactive-commands any;
        }
    }
    ntp {
        server 100.1.1.73;            
    }
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family ethernet-switching {
                vlan {
                    members default;
                }
            }
        }
    }
    irb {
        unit 0 {
            family inet {
                address 100.1.1.90/24;
            }
        }
    }
}
vlans {
    default {
        vlan-id 1;
        l3-interface irb.0;
    }
}

dhcpd.conf file on your unix box
--------------------------------

#STARTING OPTIONS
option subnet-mask 255.255.255.0;
option routers 100.1.1.1;   # Default GW
option option-150 code 150 = ip-address;

#Vendor Specific Option
option space VENDOR_OP;        #Define the Vendor Specific Option called VENDOR_OP
option VENDOR_OP-encapsulation code 43 = encapsulate VENDOR_OP;
option VENDOR_OP.image-file-name code 0 = text;
option VENDOR_OP.config-file-name code 1 = text;
option VENDOR_OP.image-file-type code 2 = text;
option VENDOR_OP.transfer-mode code 3 = text;

# DHCP IP Pool for your PCs, etc.

subnet 100.1.1.0 netmask 255.255.255.0 {
  range 100.1.1.50 100.1.1.60;
  option routers 100.1.1.1;
  option broadcast-address 100.1.1.255;
  option subnet-mask 255.255.255.0;
  option domain-name-servers 8.8.8.8;
  option domain-name "mydomain.net";
}

### EX Switch entries

host EX4300-1 {
  # MAC address of the management interface. You can also use dynamic IP
  # allocation, or any network port's MAC address (chassis MAC + 1), for ZTP.
  hardware ethernet 4c:96:14:e5:a3:41;
  fixed-address 100.1.1.90;     # Switch's irb ip address to be assigned
  option routers 100.1.1.1;     # Default GW in case tftp is on another subnet
  option option-150 200.1.1.1; # TFTP Server address to download config and image
  option host-name "EX4300-1";
  option VENDOR_OP.image-file-name "jinstall-ex-4300-13.2X51-D21.1-domestic-signed.tgz";
  option VENDOR_OP.config-file-name "PE3713320070.conf";
  option VENDOR_OP.transfer-mode "tftp";
  option VENDOR_OP.image-file-type "symlink";

  option log-servers 100.1.1.72;
  option ntp-servers 100.1.1.73;
}
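
With tens or hundreds of switches, writing each host entry by hand is error-prone. Here is a minimal Python sketch that generates entries in the same shape as the one above from a small inventory. The Switch record and the render_host helper are hypothetical names for illustration and are not part of dhcpd or Junos.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """One row of a hypothetical switch inventory."""
    name: str         # DHCP host name, e.g. "EX4300-1"
    mac: str          # MAC address the switch uses for its DHCP request
    ip: str           # fixed address to hand out
    image_file: str   # software image file on the file server
    config_file: str  # configuration file on the file server


def render_host(sw: Switch, file_server: str, transfer_mode: str = "tftp") -> str:
    """Render one dhcpd.conf host entry using the VENDOR_OP option space."""
    lines = [
        f"host {sw.name} {{",
        f"  hardware ethernet {sw.mac};",
        f"  fixed-address {sw.ip};",
        f"  option option-150 {file_server};",
        f'  option host-name "{sw.name}";',
        f'  option VENDOR_OP.image-file-name "{sw.image_file}";',
        f'  option VENDOR_OP.config-file-name "{sw.config_file}";',
        f'  option VENDOR_OP.transfer-mode "{transfer_mode}";',
        '  option VENDOR_OP.image-file-type "symlink";',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    inventory = [
        Switch("EX4300-1", "4c:96:14:e5:a3:41", "100.1.1.90",
               "jinstall-ex-4300-13.2X51-D21.1-domestic-signed.tgz",
               "PE3713320070.conf"),
    ]
    for sw in inventory:
        print(render_host(sw, file_server="100.1.1.1"))
        print()
```

The generated entries still rely on the option definitions at the top of the file (option-150 and the VENDOR_OP space), so append them below that section.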