Thursday, July 17, 2014

In-service software upgrade (ISSU) on a Juniper QFX5100 leaf switch with minimal traffic disruption.

QFX5100 ISSU

The Juniper QFX5100 switch can be upgraded while in service (in production) with minimal impact. This is useful when, for example, the QFX is used as a leaf switch with servers directly attached to it. Since most TOR (top-of-rack) switches do not have redundant CPUs, i.e. a redundant control plane, this feature is a necessity.

Without ISSU, data centers have to migrate the VMs on host servers to other hosts attached to other TOR switches. The switch is then taken out of service and upgraded, and the VMs are migrated back afterwards. This can take several hours, and it can be a problem if resources on the other switches are limited.

How ISSU works

Just like virtualization in the server world, the QFX switch has a hypervisor (KVM) and a VM that runs JUNOS (the JVM).
You need to configure GRES (graceful Routing Engine switchover), nonstop routing and nonstop bridging before starting.
From the JVM (running the current code) you issue the ISSU upgrade command: request system software in-service-upgrade <image version>
A second JVM is launched as a backup, using the new code image.
Next, a master-backup election happens between the two VMs, and the VM running the current version of code is elected master.
All the state tables are then synced between the two VMs.
The backup VM then connects to the Packet Forwarding Engine (i.e. the ASIC).
The device drivers detach from the old master and simultaneously attach to the backup JVM.
Mastership switches over, and the backup becomes the new master.
The old JVM running the old version of code is killed.
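
Since ISSU depends on GRES, nonstop routing and nonstop bridging being configured first, it can be worth checking for them before issuing the upgrade command. The following is only a minimal sketch, assuming the configuration has been saved as text in set format (for example from show configuration | display set). The function name and the dictionary are hypothetical helpers for illustration, not part of any Juniper tool.

```python
# Hypothetical pre-flight check; names and structure are illustrative only.
# The three statements mirror the GRES / NSR / NSB prerequisites, written
# in Junos "set" format.
REQUIRED_STATEMENTS = {
    "GRES": "set chassis redundancy graceful-switchover",
    "nonstop routing": "set routing-options nonstop-routing",
    "nonstop bridging": "set protocols layer2-control nonstop-bridging",
}


def missing_issu_prerequisites(config_set_text):
    """Return the names of ISSU prerequisites absent from set-format config text."""
    # Normalise whitespace so indentation or trailing spaces do not matter.
    present = {" ".join(line.split()) for line in config_set_text.splitlines()}
    return [name for name, stmt in REQUIRED_STATEMENTS.items() if stmt not in present]


# Example: a config that has NSR and GRES but not nonstop bridging.
sample = """
set routing-options nonstop-routing
set routing-options autonomous-system 100
set chassis redundancy graceful-switchover
"""
print(missing_issu_prerequisites(sample))  # -> ['nonstop bridging']
```

An empty list means all three statements were found. The check is a plain text match, so it says nothing about whether the switch is actually ready for switchover.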


In this test I set up IXIA traffic testers that injected 10K OSPF routes, 2K BGP routes and 10K MAC addresses. Traffic was then sent to these destination addresses.

jnpr> show route summary
Autonomous system number: 100
Router ID: 1.1.1.1

inet.0: 12015 destinations, 12015 routes (12015 active, 0 holddown, 0 hidden)
              Direct:      7 routes,      7 active
               Local:      6 routes,      6 active
                OSPF:  10001 routes,  10001 active
                 BGP:   2000 routes,   2000 active
              Static:      1 routes,      1 active

{master:0}
jnpr> show ethernet-switching table summary
Total dynamic and static MAC addresses learned globally : 10004
Configured static MAC addresses learned globally       : 0

{master:0}
jnpr> edit  
Entering configuration mode

{master:0}[edit]
jnpr# show routing-options
nonstop-routing;
autonomous-system 100;

{master:0}[edit]
jnpr# show protocols layer2-control
nonstop-bridging;

{master:0}[edit]
jnpr# show chassis
redundancy {
    graceful-switchover;
}



jnpr# show interfaces
xe-0/0/0:0 {
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members all;
            }
        }
    }
}
xe-0/0/0:1 {
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members all;
            }
        }
    }
}
irb {
    unit 100 {
        family inet {
            address 100.1.1.1/24;
        }
    }
    unit 101 {
        family inet {
            address 101.1.1.1/24;
        }
    }
    unit 200 {                         
        family inet {
            address 200.1.1.1/24;
        }
    }
    unit 201 {
        family inet {
            address 201.1.1.1/24;
        }
    }
    unit 300 {
        family inet {
            address 110.1.1.254/24;
        }
    }
}
lo0 {
    unit 0 {
        family inet {
            address 1.1.1.1/32;
        }
    }
}

{master:0}[edit]
jnpr# show protocols
bgp {
    group EBGP {
        type external;
        neighbor 200.1.1.2 {
            peer-as 200;
        }
        neighbor 201.1.1.2 {
            peer-as 201;
        }
    }
}
ospf {
    area 0.0.0.0 {
        interface irb.100;
        interface irb.101;
    }
}
layer2-control {
    nonstop-bridging;
}


UPGRADE PROCESS:


{master:0}
jnpr> show version
fpc0:
--------------------------------------------------------------------------
Model: qfx5100-24q-2p
JUNOS Base OS Software Suite [13.2X51-D15.5]
JUNOS Base OS boot [13.2X51-D15.5]
JUNOS Crypto Software Suite [13.2X51-D15.5]
JUNOS Online Documentation [13.2X51-D15.5]
JUNOS Kernel Software Suite [13.2X51-D15.5]
JUNOS Packet Forwarding Engine Support (qfx-x86-32) [13.2X51-D15.5]
JUNOS Routing Software Suite [13.2X51-D15.5]
JUNOS Enterprise Software Suite [13.2X51-D15.5]
JUNOS py-base-i386 [13.2X51-D15.5]
JUNOS Host Software [13.2X51-D15.5]


{master:0}
jnpr> request system software in-service-upgrade ftp://jnpr:pass123@192.168.1.1:/jinstall-qfx-5-13.2X51-D21.1-domestic-img.tgz  
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
Fetching package...
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Spawning the backup RE
Spawn backup RE, index 0 successful
GRES in progress
GRES done in 13 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: FPC Warm Booting
ISSU: FPC Warm Booted
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
  Item           Status                  Reason
  FPC 0          Online (ISSU)       
Send ISSU done to chassisd on backup RE
Chassis ISSU Completed
ISSU: IDLE
Initiate em0 device handoff
Connection closed by foreign host.
[jnpr-laptop:~] jnpr%
[jnpr-laptop:~] jnpr% expect telly 10.161.33.53
spawn telnet -K 10.161.33.53
Trying 10.161.33.53...
Connected to 10.161.33.53.
Escape character is '^]'.

 (ttyp0)

login: jnpr
Password:

--- JUNOS 13.2X51-D21.1 built 2014-05-29 11:41:16 UTC
{master:0}
jnpr> show version
fpc0:
--------------------------------------------------------------------------
Model: qfx5100-24q-2p
JUNOS Base OS Software Suite [13.2X51-D21.1]
JUNOS Base OS boot [13.2X51-D21.1]
JUNOS Crypto Software Suite [13.2X51-D21.1]
JUNOS Online Documentation [13.2X51-D21.1]
JUNOS Kernel Software Suite [13.2X51-D21.1]
JUNOS Packet Forwarding Engine Support (qfx-ex-x86-32) [13.2X51-D21.1]
JUNOS Routing Software Suite [13.2X51-D21.1]
JUNOS Enterprise Software Suite [13.2X51-D21.1]
JUNOS py-base-i386 [13.2X51-D21.1]
JUNOS Host Software [13.2X51-D15.5]
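
Note that in the output above every package reports 13.2X51-D21.1 except JUNOS Host Software, which still shows 13.2X51-D15.5. A quick way to spot this kind of leftover is to parse the show version text. The sketch below is a rough illustration only: the function name and the regular expression are my own assumptions about the line format shown in this post, not a Juniper-provided parser.

```python
import re

# Matches lines such as "JUNOS Kernel Software Suite [13.2X51-D21.1]".
# ASSUMPTION: package lines start with "JUNOS " and end with "[version]",
# as in the transcript above.
PACKAGE_LINE = re.compile(r"^(JUNOS .+?) \[([^\]]+)\]\s*$")


def packages_not_at(show_version_text, target_version):
    """Return {package name: version} for packages not at target_version."""
    stale = {}
    for line in show_version_text.splitlines():
        match = PACKAGE_LINE.match(line.strip())
        if match and match.group(2) != target_version:
            stale[match.group(1)] = match.group(2)
    return stale


after_upgrade = """
Model: qfx5100-24q-2p
JUNOS Kernel Software Suite [13.2X51-D21.1]
JUNOS Packet Forwarding Engine Support (qfx-ex-x86-32) [13.2X51-D21.1]
JUNOS Host Software [13.2X51-D15.5]
"""
print(packages_not_at(after_upgrade, "13.2X51-D21.1"))
# -> {'JUNOS Host Software': '13.2X51-D15.5'}
```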


IXIA Results

The IXIA results showed 52 frames lost, which corresponds to a service disruption of 2.6 ms (milliseconds).
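
For reference, a disruption time can be derived from a lost-frame count by dividing by the transmit rate. The rate used in this test is not stated above, so the 20,000 frames per second below is an assumption inferred from the two reported numbers (52 frames and 2.6 ms), and the function name is hypothetical.

```python
def disruption_ms(frames_lost, frames_per_second):
    """Convert a lost-frame count into an outage duration in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    # Multiply before dividing to keep the arithmetic simple and exact here.
    return frames_lost * 1000 / frames_per_second


# ASSUMPTION: 20,000 frames/s is back-calculated from 52 frames / 2.6 ms;
# it is not a rate given in the test description.
ASSUMED_RATE_FPS = 20_000
print(disruption_ms(52, ASSUMED_RATE_FPS))  # -> 2.6
```

This treats the loss as a single continuous gap at a constant rate, which is the usual way traffic testers turn frame loss into a duration.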
