Wednesday, August 31, 2011

Agile process

or... what actually matters in an agile process?


The holy grail of any software outfit is: deliver high-value, high-quality results, consistently and predictably. Talk about a loaded sentence! But those are the requirements for making a software team valuable to the business (or is the business). Breaking it down:
  • Team - A single developer doesn't need process... and finding a unified process that works for a huge conglomerate corp is not likely. A process, then, is meant for a TEAM of some reasonable size.
  • High value - if you deliver the wrong thing at the right time... well, I've been at too many (now deceased) startups not to recognize that value is key.
  • High quality - need I say more? If you want to deliver 2.0 but you're stuck bug-fixing 1.0 into existence, you will have two customers - one unhappy with 1.0 and one still waiting for the features in 2.0. Neither will keep the revenue coming in.
  • Results - activities by themselves, with no results, provide little value.
  • Consistently - it's hard to build a successful software product with just superheroes. A 1.0 release that leaves the team burnt out is not very useful over the long run...
  • Predictably - to plan ahead for the next set of features, and to hit the market at a relevant time, the team must be predictable in delivering against its estimates.
While the details above could be nitpicked, in broad strokes every good programmer and software manager aspires to be on a team that achieves them - it's fun to be on such a team. But how do you achieve these goals?

Many managers and development teams I've worked with make the same process choice - first define the process, then figure out what you're actually trying to achieve with it. This is not done just because of pointy-haired-boss whims. It's the reality in many organizations, especially large ones, that a process is required. It could be a business requirement - ISO 9000 certification, SAS 70 or any other industry norm demanded by customers as a result of legal requirements. It could be simply the sheer size of the organization and the many moving parts required to achieve business goals, where informal communication is just not sufficient.

What makes agile processes more adept at getting there is the closed-loop feedback system built into them. The general idea is that everything can be improved on, and should be. The team continuously adjusts the way it works, striving to achieve the holy grail.

Like any other closed-loop system, the adjustments to the activities are determined by the metric being measured. If you're using a thermostat that measures temperature, you can make adjustments to the temperature... but not to the humidity, because you're not measuring that. The idea is simple - choose what you want to optimize/control, and measure the metrics relevant to achieving that control.

Some of the key entities in agile development are (see the sketch after the list for how they combine):

  • user stories - end-user-meaningful features to be delivered
  • tasks - the activities (leading to work products) required to fulfill a user story
  • story points - the estimated amount of effort to deliver a user story
  • velocity - how many story points the team completes in a sprint, or another given unit of time
  • release burndown - a measure of velocity over the lifetime of a release
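To make these concrete, here's a toy sketch (in Python, with made-up numbers) of how the entities combine into the "predictably" part:

    # Hypothetical numbers - the point is the arithmetic, not the data.
    completed = [18, 22, 19, 24]    # story points finished in each sprint so far
    release_scope = 200             # total story points planned for the release

    velocity = sum(completed) / float(len(completed))   # points per sprint
    remaining = release_scope - sum(completed)          # the release burndown
    sprints_left = remaining / velocity                 # the forecast

    print("velocity: %.1f points/sprint" % velocity)
    print("remaining: %d points, ~%.1f sprints left" % (remaining, sprints_left))

The closed-loop part: re-measure every sprint, and the forecast corrects itself as the team's real velocity drifts.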
The question is - what do you measure and optimize for? How much overhead are you willing to accept as the price of getting accurate information?

Some more pondering required here... but the editor has been open for many days now...

Sunday, August 14, 2011

Desktop sized "cloud"

Or, here's my pet mini-cloud

I first used VMware Workstation as a development tool somewhere around 2003. It was a cost-effective way to test the software I was working on against the quirks of different operating systems. On one Linux desktop I could have all my development tools, including a few Windows 95 installations.
So, now you're asking - "what does that have to do with clouds?" Well, not much yet.

For my current project (Crowbar) I found myself needing a setup that includes:

  • Multiple OS flavors
    • Win XP for access to corporate resources
    • Ubuntu 10.10 and 11.04 - the main testing targets
    • Red Hat and CentOS - for some dev work and testing
  • A varying number of machines, pretty much depending on the day
    • For basic development: 3 machines - development, admin node and a target node
    • For Swift- and Nova-related development (components of OpenStack): at least 5, sometimes a few more
  • Weird-ish network configurations
    • Network segments to simulate public, private and internal networks for Nova and Swift
    • An isolated network for provisioning work (to avoid nuking other folks' machines)
    • Corporate network... must have the umbilical cord.

Oh, and the mix changes at least twice a week, if not a bit more.
That list is probably already too long (though if I were more thorough it would run to 2 pages). Imagine how you would build that with physical gear. How much would it cost to set up? How many cables would you have to re-run to make a change?

The most relevant quote I could find (thanks, Google) is from Walter Chrysler:

"Whenever there is a hard job to be done I assign it to a lazy man; he is sure to find an easy way of doing it."


And I am lazy. And VMware ESXi is a great solution for lazy people... standing up a new server by cloning an existing one is so much easier than racking one and running all the cables (not to mention the paperwork to buy it). Making networking changes is a cinch too - just create the appropriate virtual switch and add virtual NICs on that network to all the machines that need access.
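And when even the clicking gets old, the vSphere SDK can script it. Here's a rough, untested sketch using the Python bindings (pyVmomi) - the host name, credentials and labels are all placeholders:

    import ssl
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    # Connect to the ESX host (self-signed cert, hence the unverified context).
    si = SmartConnect(host="esx.example.com", user="root", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    host = si.RetrieveContent().searchIndex.FindByDnsName(
        dnsName="esx.example.com", vmSearch=False)
    net = host.configManager.networkSystem

    # A virtual switch with no physical uplink is an isolated network.
    net.AddVirtualSwitch(vswitchName="vSwitch-nova-private",
                         spec=vim.host.VirtualSwitch.Specification(numPorts=64))

    # A port group on that switch; the VLAN ID is what the lab access switch keys on.
    net.AddPortGroup(portgrp=vim.host.PortGroup.Specification(
        name="nova-private", vlanId=200, vswitchName="vSwitch-nova-private",
        policy=vim.host.NetworkPolicy()))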

Here's a simplified setup, and some notes:



A few observations, and attempts to bring some logic to the madness:

  • Corporate IT does not really like having machines they're not familiar with (i.e. not running the corp image) on their network. On a network as large as Dell's, it's hard to complain about that. This mandates isolating all but one machine from the corp net.
  • The WinXP image is a bastion host of sorts - it is there to provide access to the environment. It is also used to manage the ESX server itself.
  • Access to multiple physical labs is achieved in two ways:
    • Multiple NICs in the ESX server (6 1GigE ports in total)
    • VLAN tagging on the virtual NICs. The access switch to the isolated labs uses the incoming VLAN ID to select the correct environment.

ESX configuration

The test VMs are configured as the equivalent of AWS tiny to small instances (i.e. 1-2 GB RAM, 2 virtual cores), depending on their workloads. The actual development VMs are beefier, more like a large (7 GB RAM, 4 cores).
The server is configured with two resource pools - one for "Access" and one for pretty much everything else. The intent is to ensure that whatever crazy things are going on in the test VMs (anyone ever peg a CPU at 100% for a bit?), I can keep working.
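In SDK terms the split looks roughly like this (again pyVmomi, again an untested sketch, with host fetched as in the networking sketch above):

    from pyVmomi import vim

    def pool_spec(level):
        # Relative shares only - no reservations or hard limits.
        alloc = vim.ResourceAllocationInfo(
            reservation=0, expandableReservation=True, limit=-1,
            shares=vim.SharesInfo(level=level, shares=0))
        return vim.ResourceConfigSpec(cpuAllocation=alloc, memoryAllocation=alloc)

    root_pool = host.parent.resourcePool   # the host's root resource pool

    # "Access" gets high shares so I can keep working; the test VMs fight
    # over whatever is left.
    root_pool.CreateResourcePool(name="Access", spec=pool_spec("high"))
    root_pool.CreateResourcePool(name="Everything-else", spec=pool_spec("normal"))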

As was famously said - clouds still run on metal... so here are the metal specs:
  • 2-socket, 4-core Xeons @ 2.4GHz
  • 24 GB RAM
  • 500 GB 15k SAS disks
With this hardware I've been running up to 12 VMs while staying somewhat productive. My biggest complaint is disk IO performance, especially since DVD-sized images fly all over the place. To solve that, I'll be migrating this setup to a Dell PowerEdge C2100 box with 12 spindles in a RAID 1E configuration.




But is it a cloud?

Some of the popular definitions of cloud involve the following criteria:

  1. Pay-per-use
  2. On-demand access to resources, or at least the ability of the resource user to self-provision resources
  3. Unlimited resources, or at least the perception thereof
For a desktop-sized cloud, money is not a factor, so scratch that.

Since I pretty much configure ESX at will, I think I get a check on the "on-demand" aspect. Yes, I mostly use the vCenter user interface - nothing as snazzy as the AWS APIs. In a previous life I used the VMware SDK (the Perl version) and was pretty happy with the results. It's just that the changes to this environment fall into too many different buckets to justify spending the time to automate them all.
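The one task I do repeat often enough to have scripted is cloning; something like this (pyVmomi again, all names placeholders) covers most of my "give me another target node" moments:

    import ssl
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host="esx.example.com", user="root", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Find the base image to copy.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    base = next(vm for vm in view.view if vm.name == "ubuntu-10.10-base")

    # Clone it into the same folder and power the copy on.
    spec = vim.vm.CloneSpec(location=vim.vm.RelocateSpec(), powerOn=True)
    base.Clone(folder=base.parent, name="target-node-05", spec=spec)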

Now, my current server is far from having unlimited resources... but unless you deal with abstract math, there is no real unlimited-ness. What actually matters is the ratio of "how much I need" to "how much I have". The oceans are bounded, but if all you need is a few teacups' worth of water, you're probably comfortable considering the ocean an unlimited resource.
Oops... back to servers. Over the past few weeks I've been taking snapshots of performance metrics. CPU utilization is around 50%, disk capacity is under 30% used, and memory is my current barrier at around 80%.

If I needed to double the size of my virtual setup today, I could probably do it. The physical resources are pretty limited, but the headroom they afford me makes me think oceans.

This is by no means AWS scale. But I think I'll still name my server "cloudi - the friendly desktop mini-cloud".


Tuesday, August 2, 2011

What's worse than the dreaded "works on my setup"?

Or... why you should care about DevOps (and why you want DevOps++)



If you've ever written (or tested) a piece of software, you've heard this all before - "but... it works on my setup". That endless back-and-forth of finger pointing until, way too often, it turns out to be some silly environment setting that causes the application or service to misbehave.

DevOps is there to help. No more manual steps to deploy an app. No more 13-page project plans requiring the coordination of at least 5 geo-distributed teams (half of which are in a timezone where the sun has long stopped shining).

The DevOps motto is simple - "if it hurts and it's dangerous, do it often". The industry has already accepted this in the Continuous Integration case - builds that integrate disparate development streams are painful. Assumptions made in isolation prove to be wrong, and all hell breaks loose when you try to put the different, independently developed pieces together. It's a painful process... so CI forces you to do it often. It spreads the pain into smaller pieces and makes sure you address the pimple before it requires an amputation. The main enabler of the quick iteration (and associated pain reduction) CI provides is automation - build, integration and tests are automated to the extreme. Any breakage is found quickly by droves of build/test servers, which exercise the application (you ARE writing automated tests, right?) to flush out the kinks.


DevOps takes the same model (well, this is at least part of what DevOps addresses) to the next painful point in delivering applications to customers - at least those consumed as SaaS: if deploying into production is a painful, risky process, the logic goes, then you should do it all the time!

Pushing bits into production systems should not be the first time you kick the tires of your deployment process. Dev and QA (and CI) should be using it all the time, so that when the proverbial rubber hits the road you're facing not the unknown but an automated, tested process. And since this process is tested and proven, you should have no fear of deploying to production on a regular basis. DevOps believes that no rings need be kissed nor blessings sought to deploy to production - the whole process is a well-oiled machine that just clicks.


To work, the process must take care of all the facets affecting the correct operation of the application - the bits of code, the application's configuration and even OS-level configuration tweaks (out of sockets, anyone?). There are leading solutions out there that make this all possible, and dare I say even simple (search for puppetlabs.com or OpsCode.com for some nice tools).
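The trick these tools share is declarative, idempotent convergence: you describe the desired state, and the tool makes reality match it - doing nothing when it already does. A toy Python sketch of the idea (emphatically not Puppet's or Chef's actual API):

    import os

    def ensure_file(path, content, mode=0o644):
        # Converge a file to the desired content and permissions;
        # a no-op when it already matches - that's the idempotence.
        current = open(path).read() if os.path.exists(path) else None
        if current != content:
            with open(path, "w") as f:
                f.write(content)
        if os.stat(path).st_mode & 0o777 != mode:
            os.chmod(path, mode)

    def ensure_sysctl(key, value):
        # OS-level tweaks are just another resource.
        with open("/proc/sys/" + key.replace(".", "/")) as f:
            if f.read().strip() != value:
                os.system("sysctl -w %s=%s" % (key, value))  # crude, but illustrative

    ensure_file("/etc/myapp/app.conf", "workers = 8\nlisten = 0.0.0.0:8080\n")
    ensure_sysctl("net.core.somaxconn", "1024")

Run it once, twice, a hundred times - the end state is the same. That property is what makes "do it often" safe.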

So far so good. Your app works in production as it did in your QA environment. Before you get too happy, remember there's hardware involved (still), as much as software folks would like to ignore it. Your code runs happily in production, but it doesn't perform nearly as well... now what? Are you sure the hardware is configured the same? Are the disks in the same RAID config? Is the BIOS set to the recommended settings?


Software settings are (they had better be!) easy to validate - diff the config file(s) and see what's different. If you're orderly, you have them all handy in the production repository (maybe even in a production Git repo).
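And the diff really can be that literal - Python's standard library does it in a handful of lines (the paths are placeholders):

    import difflib

    qa = open("qa/app.conf").read().splitlines()
    prod = open("prod/app.conf").read().splitlines()

    # Anything printed here is a candidate for the "works in QA, not in prod" bug.
    for line in difflib.unified_diff(qa, prod, "qa/app.conf", "prod/app.conf",
                                     lineterm=""):
        print(line)

But what do you do about hardware settings?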


Enter Crowbar.

Crowbar is the system I've been working on at Dell. Among its many functions, it makes hardware configuration behave like config files. The idea is simple: when you provision a system for use, you want it to look head-to-toe (or BIOS/RAID to OS and application bits) like its mirror image in the QA environment. Obviously, different machines have different roles - and in the same way their OS is tweaked for their role, so should their hardware configuration be. Think a Hadoop storage node (disks are better as JBOD) vs. a file server (disks are better as RAID 10... ok, could be better). As Crowbar provisions a system, if it detects that the BIOS version, BIOS parameters or RAID configuration is out of compliance, it adjusts the system intelligently to match its currently assigned role.
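In spirit - and only in spirit; the real implementation is Chef recipes plus some proprietary BIOS/RAID plumbing - the compliance loop looks something like this, with every name below made up:

    # Desired hardware state, keyed by role - just like a config file.
    DESIRED = {
        "hadoop-storage": {"bios_version": "2.1.5", "raid": "jbod"},
        "file-server":    {"bios_version": "2.1.5", "raid": "raid10"},
    }

    def converge(node, role, probe, remediate):
        # Compare the node's detected hardware state to its role's desired
        # state, and fix only the settings that are out of compliance.
        actual = probe(node)                    # e.g. query the BIOS/RAID controller
        for setting, want in DESIRED[role].items():
            if actual.get(setting) != want:
                remediate(node, setting, want)  # flash the BIOS, rebuild the array, ...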

No more chasing this tweak or that. This is truly "set it and forget it" ...

That said, Crowbar is much more than a hardware config utility. It is a system for provisioning large-scale (as in cloud-scale) hardware environments, configuring application clusters soup to nuts.


Crowbar is open source (on GitHub). With this version you get everything from bare metal (or, if you really want to tweak, even bare cloud) to deployed application clusters (at this point we're focusing on OpenStack, but more is to come). If you want to unbox some metal... see the GitHub wiki for some quick instructions.

P.S.:

The BIOS and RAID configuration bits only work on Dell hardware at this point (and they use proprietary components), so they're not in the wild. They're currently distributed only as part of the Dell OpenStack solution (see dell.com/OpenStack).