Friday, February 28, 2014

SDN - A look at OpenFlow running on white box switches

In my quest to understand SDN, I attended a presentation by Pica8 Open Networking on OpenFlow running on white box switches. Their goal is to commoditize switching hardware and let a controller make the control plane decisions using OpenFlow. Pica8 has a cheap white box switch that runs Open vSwitch. The switches can be set up through zero-touch provisioning, meaning you take an unconfigured box, drop it into the network, and it will communicate with a server and automagically configure itself. They run a very lightweight open source operating system with hardly any functionality, which keeps costs low. I've read on the internet that their switches sell at half the cost of other vendors' equipment.

While this may sound enticing, you get what you pay for. The size of their TCAM is around 1K-2K entries for some of their switches and 10K entries for others. This is pretty low! Why is the TCAM important? Think of it as the maximum number of ACL-style entries the switch can hold in hardware. Each entry describes a 5-tuple flow, where you "program" the switch based on the source and destination IP address, source and destination port, and protocol. This is the gist of how OpenFlow programs the switches in your datacenter.
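To make the TCAM limit concrete, here is a minimal sketch in Python of a fixed-capacity flow table keyed on the 5-tuple. The names `FiveTuple` and `FlowTable` and the action strings are my own placeholders, not anything from Pica8 or the OpenFlow spec, and a real TCAM also supports wildcard (ternary) matches, which this toy model leaves out.

```python
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    """The five fields a flow entry matches on."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class FlowTable:
    """Toy stand-in for a TCAM: exact match only, fixed number of entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[FiveTuple, str] = {}

    def add_flow(self, match: FiveTuple, action: str) -> bool:
        # A new entry is refused once the table is full; updating an
        # existing entry is always allowed.
        if match not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[match] = action
        return True

    def lookup(self, packet: FiveTuple) -> Optional[str]:
        # None means "no matching flow"; what the switch does next
        # (drop or punt to the controller) is outside this sketch.
        return self.entries.get(packet)


# Hypothetical example values, chosen only for illustration.
table = FlowTable(capacity=2)
web = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
dns = FiveTuple("10.0.0.1", "10.0.0.3", 40001, 53, "udp")
ssh = FiveTuple("10.0.0.1", "10.0.0.4", 40002, 22, "tcp")

added_web = table.add_flow(web, "output:2")
added_dns = table.add_flow(dns, "drop")
added_ssh = table.add_flow(ssh, "output:3")  # table is already full
```

With a capacity of 2, the third `add_flow` call is refused, which is the situation a 1K-2K entry TCAM puts you in once the flow count grows.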

The low number of entries means that Pica8 has to limit the size of a data center. They work around this limitation by recommending that switches be grouped into different "clusters". (Note: "cluster" is my name for it; they called it a unit of calculation.) They've calculated that a typical cluster can scale to a maximum of 12 racks of servers with two TOR switches, two AGG switches and two CORE boxes. When you want to expand, you create another "cluster" of switches/servers and interconnect the clusters through the CORE boxes. This seems to be a waste of ports at the AGG layer.

I believe the reason for this is the TCAM problem. If the AGG layer runs out of TCAM space, you are forced to build another cluster. I can imagine that each cluster is managed by a separate SDN controller. Theoretically a single SDN controller could manage all the clusters, but they didn't really talk about it.
Pica8 at the time of writing has 200 customers, but ZERO deployments. That tells me that even though SDN and OpenFlow are cool technologies, no customer in their right mind is going to put this in their production environment until the technology is more mature.

Now, one of the problems with this implementation is that you literally have to program your network. You first have to create drop flow profiles. If you don't want IPv6 traversing your switch, create a drop flow. No multicast? Create a drop flow. Then you create your forwarding entries. Need to go from a VM on port 1 to a VM on port 2? Create a forwarding entry. Now imagine that you have 48 ports per switch and two TOR switches: that right there means 96 entries in a single direction. Add the reverse direction and you have 192 entries to program. As you can see, this could get very tedious.
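The arithmetic above can be sketched in a few lines of Python. The dictionary layout and the `tor1`/`tor2` names are made up for illustration and are not an OpenFlow or Pica8 format; the sketch assumes one forwarding entry per port per direction, plus the two drop flows (IPv6 and multicast) on each switch.

```python
from typing import Dict, List

# The two drop flows mentioned above, created once per switch.
DROP_MATCHES = ["ipv6", "multicast"]


def build_flow_plan(ports_per_switch: int, switches: int) -> List[Dict[str, str]]:
    """List every entry that would have to be programmed by hand."""
    plan: List[Dict[str, str]] = []
    for sw in range(1, switches + 1):
        name = f"tor{sw}"  # hypothetical switch name
        for match in DROP_MATCHES:
            plan.append({"switch": name, "kind": "drop", "match": match})
        for port in range(1, ports_per_switch + 1):
            # One entry per port for each direction of traffic.
            for direction in ("forward", "reverse"):
                plan.append({
                    "switch": name,
                    "kind": "forward",
                    "match": f"port{port}/{direction}",
                })
    return plan


plan = build_flow_plan(ports_per_switch=48, switches=2)
forwarding = [entry for entry in plan if entry["kind"] == "forward"]
one_way = [entry for entry in forwarding if entry["match"].endswith("/forward")]

print(len(one_way))     # 96 entries in a single direction
print(len(forwarding))  # 192 entries with the reverse direction
print(len(plan))        # 196 once the drop flows are included
```

Even this small example eats a noticeable share of a 1K-entry TCAM, and it is only two TOR switches.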
Hopefully someone has a controller that can do this automatically. Once you run out of TCAM space you'll have to move VMs to a new cluster, which brings me to the second issue: support for VMotion. Moving VMs between servers requires you to reprogram flows, so VMotion will have to integrate with the SDN controller for this to work. This is also a flow-based mechanism, which means it can be susceptible to DDoS attacks. Just send a bunch of ARPs and the switch will punt them to the controller. Send enough ARPs and you can overwhelm a controller and bring down a cluster.

Next they discussed network diagnostics. This was a very interesting topic. Where do you put this? On a normal switch you have counters and can retrieve them, typically through SNMP. But on an OpenFlow switch, the hardware is supposed to be dumb, so you need to put this on the controller. But how do you access a switch's counters without compromising performance? Do you retrieve them through OpenFlow? Is there another northbound connection that will be both lightweight and scalable? Pica8 also mentioned that some counters, such as ingress and egress port statistics, were not easily accessible. Another issue came up when an upstream AGG switch did not have a proper flow entry: it blackholed the packet and sent flow control packets downstream to the TOR, which filled up the buffers and prevented any packets from being forwarded from the TOR. They had to drop down to the switch's debug level to figure this out. In this scenario, where do you troubleshoot? You think it's a TOR issue, but in fact it's an AGG issue. This is a big concern I have about SDN: the lack of diagnostics to troubleshoot issues. There is not enough visibility into the network to trace down the problem.

While OpenFlow is an interesting technology, this implementation is not yet mature and requires a lot of customization. Because of the limitations in a switch's hardware, this is not a scalable solution. You also need an intelligent controller that can automate your flow entries in a simple manner.

Now, to resolve this hardware issue, I can imagine building a switch like building a bare metal server. Make the parts swappable. Running out of TCAM? Pull out the current one and install a new one, just like you can swap out RAM and CPUs. White boxes need to be built so that their network connections stay in place and you swap out the FRUs around them. There also needs to be a way to get the optics to the point where they are tri-rate like copper links. Need to upgrade from 1G to 10G to 40G to 100G? Just update the flash. However, this will also need some kind of backplane to upgrade the switch fabric. Commoditized hardware needs to be built modularly. But this may end up being a chassis-based switch, not a TOR. However, technology is constantly shrinking things down while packing more punch, so I can imagine that eventually this will happen.
