Tuesday, August 2, 2011

What's worse than the dreaded "Works on my setup"?

Or... why do you care about DevOps (and why you want DevOps++)



If you've ever written (or tested) a piece of software, you've heard this all before - "but... it works in my setup". Then comes the endless back and forth of finger pointing until, way too often, the culprit turns out to be some silly environment setting that causes the application or service to misbehave.

DevOps is there to help. No more manual steps to deploy an app. No more 13-page project plans requiring coordination of at least 5 geo-distributed teams (half of which are in a timezone where the sun has long stopped shining).

The DevOps motto is simple - "if it hurts and it's dangerous, do it often". The industry has accepted this in the Continuous Integration case - builds that integrate disparate development streams are painful. Assumptions made in isolation prove to be wrong, and all hell breaks loose when you try to put different, independently developed pieces together. This is a painful process... so CI forces you to do it often. It spreads the pain into smaller pieces, and makes sure that you address the pimple before it requires an amputation. The main enabler of this quick iteration (and the associated pain reduction) is automation - build, integration and tests are automated to the extreme. Any breakage is found quickly by drones of build/test servers, which exercise the application (you ARE writing automated tests, right?) to flush out the kinks.


DevOps takes the same model (well, this is at least part of what DevOps addresses) to the next painful point in delivering applications to customers, at least those that are consumed as SaaS. Namely: if deploying into production is a painful, risky process, the logic goes, then you should do it all the time!

Pushing bits into production systems should not be the first time you're kicking the tires of your deployment process. Dev and QA (and CI) should be using it all the time, so when the proverbial rubber hits the road, you're not faced with an octagonal wheel, but rather with an automated, tested process. And since this process is tested and proven, you should have no fear of deploying to production on a regular basis. DevOps believes that no rings need be kissed or blessings sought to deploy to production - the whole process is a well-oiled machine that clicks.


To work, the process must take care of all facets affecting the correct operation of the application - be it the bits of code, the application's configuration, or even OS-level configuration tweaks (out of sockets, anyone?). There are leading solutions out there that make this all possible, and dare I say even simple. (Search for puppetlabs.com or OpsCode.com for some nice tools.)

So far so good. Your app works in production as it did in your QA environment. Before you get too happy, remember there's hardware involved (still), as much as software folks would like to ignore it. Your code works happily in production, but it doesn't perform nearly as well... now what? Are you sure the hardware is configured the same? Are the disks in the RAID config? Is the BIOS set to the recommended settings?


Software settings are (they better be!) easy to validate - diff the config file(s) and see what's different. If you're orderly, you have them all handy in the production repository (maybe even in a production Git repo). But what do you do about HW settings?
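
To make the "diff the config file(s)" step concrete, here is a minimal sketch in Python using the standard library's difflib. The function name config_drift and the sample settings are made up for illustration; in practice you would read the two files from wherever you keep your QA and production configs.

```python
import difflib


def config_drift(qa_text: str, prod_text: str) -> list[str]:
    """Return only the lines that differ between two config files.

    Lines prefixed with "- " exist only in the QA config, lines prefixed
    with "+ " exist only in the production config.
    """
    diff = difflib.ndiff(qa_text.splitlines(), prod_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop them.
    return [line for line in diff if line[:2] in ("- ", "+ ")]


# Hypothetical config contents, purely for illustration.
qa = "max_connections = 100\nlog_level = debug\n"
prod = "max_connections = 500\nlog_level = debug\n"

for line in config_drift(qa, prod):
    print(line)
```

An empty result means the two environments agree, which is exactly the answer you want before you start blaming the code.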


Enter Crowbar.

Crowbar is the system I've been working on at Dell. Among its many functions, it makes hardware configuration behave like config files. The idea is simple: when you provision a system for use, you want it to look head-to-toe (or BIOS/RAID to OS and application bits) like its mirror image in the QA environment. Obviously, different machines have different roles - just as their OS is tweaked for their role, so should their hardware configuration be. Think of a Hadoop storage node (disks are better as JBOD) vs. a file server (disks are better as RAID10... ok.. could be better). As Crowbar provisions a system, if it detects that the BIOS version, BIOS parameters or RAID configuration are out of compliance, it adjusts the system intelligently to match its currently assigned role.
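
To illustrate the "hardware settings as config" idea, here is a rough sketch in Python of a role-based compliance check. This is not Crowbar's code or API; the role names, setting keys, version numbers and the compliance_gaps function are all hypothetical, and a real tool would read the detected values from the machine rather than from a hand-written dict.

```python
# Hypothetical desired hardware settings per role (illustrative values only).
ROLE_PROFILES = {
    "hadoop-storage": {"bios_version": "2.1.0", "raid_level": "JBOD"},
    "file-server": {"bios_version": "2.1.0", "raid_level": "RAID10"},
}


def compliance_gaps(role: str, detected: dict) -> dict:
    """Map each out-of-compliance setting to a (detected, desired) pair."""
    desired = ROLE_PROFILES[role]
    return {
        key: (detected.get(key), want)
        for key, want in desired.items()
        if detected.get(key) != want
    }


# A machine that was set up as a file server and is now assigned to Hadoop.
detected = {"bios_version": "1.9.0", "raid_level": "RAID10"}

for setting, (have, want) in compliance_gaps("hadoop-storage", detected).items():
    print(f"{setting}: detected {have}, role requires {want}")
```

The point of the sketch is only the shape of the check: the role declares what the hardware should look like, and anything that differs becomes a list of adjustments instead of a late-night debugging session.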

No more chasing this tweak or that. This is truly "set it and forget it" ...

That said, Crowbar is much more than a hardware config utility. It is a system to provision large scale (as in cloud scale) hardware environments, configuring application clusters soup-to-nuts.


Crowbar is open source (on GitHub). With this version you get from bare metal (or, if you really want to tweak, even bare cloud) to deployed application clusters (at this point we're focusing on OpenStack, but there's more to come). If you want to unbox some metal... see the GitHub wiki for some quick instructions.

P.S:

The BIOS and RAID configuration features only work on Dell hardware at this point (and they use proprietary bits), so they're not in the wild. They're currently only distributed as part of the Dell OpenStack solution. (See dell.com/OpenStack)