
Monday, March 10, 2014

agility with accountability

Or, how to retain fiscal sanity and control over cloud usage

The buzz about Lean, Agility, Cloud and DevOps converging has been in the air for a while now. Sometimes it feels like teenagers on prom night discussing their upcoming exploits - many talk about it, some fumble around experimenting, and few are actually doing anything with it.

The affinity between the movements is evident - minimize upfront investment until value is proven, while ensuring you can grow your investment and capacity as that happens. Lean applies it to the business model, Agile to the development process, and the combination of DevOps and the public cloud's "resources at the point of the mouse" brings it to resource provisioning.

The combination of DevOps and public cloud is what allows companies such as AirBnB to prove their business and scale it up. There's one big fly in this otherwise delicious soup - the bean counters. They want to exert the level of control over budgets and spend that they're used to from the world of data centers and 6-month server provisioning cycles. When an Auto Scaling group launches a handful of additional instances to accommodate load, there's no Excel sheet circulated in triplicate for approval... where's the control?

Even scarier, AWS offers a huge diversity of instances, costing anywhere from $14 a month to just about $5000. A simple miscalculation or misconfiguration, say using an i2.8xlarge instance ($6.82 an hour) instead of a c3.8xlarge ($2.40 an hour), can be really costly if left uncorrected for a whole month. It's the difference between the expected ~$1700 and the actual ~$5000.
And that's just 1 instance. In a whole zoo full of such pricey items... all the beans are spilling out of control!
(To keep to my main point, I'm not even touching the various purchasing options that AWS offers, ranging from Spot to 3-year reservations!)
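
For the curious, here is the back-of-the-envelope arithmetic behind those monthly figures, as a rough Python sketch. It assumes a 30-day month with the instance running around the clock, and uses only the two hourly rates quoted above; the function name is mine, not anything AWS provides.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance runs 24/7

# Hourly on-demand rates as quoted in this post (2014 prices)
HOURLY_RATE = {"c3.8xlarge": 2.40, "i2.8xlarge": 6.82}

def monthly_cost(instance_type, count=1):
    """Cost in dollars of running `count` instances for a full month."""
    return HOURLY_RATE[instance_type] * HOURS_PER_MONTH * count

expected = monthly_cost("c3.8xlarge")
actual = monthly_cost("i2.8xlarge")
print(f"expected ${expected:,.0f}, actual ${actual:,.0f}, "
      f"overspend ${actual - expected:,.0f}")
```

The exact numbers come out slightly different from the rounded ~$1700 and ~$5000 used in the text, but the gap is the same order of magnitude.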

One approach, which stodgy companies apply (see my CV), is to restrict access and limit agility. Control is retained and the world is happy. Well... part of the world.

Another approach is to retain the benefits of agility, but add a layer of accountability on top of it.
Let's say that you could look across an organization that runs 500 instances (or 5000) and quickly answer these questions:
  • What changed over the last week?
  • Who performed those changes? In what business unit? Supporting what product?
  • Are those changes justified by the environment they're in? By the performance profile of the workload?
  • What is the cost impact of those changes?
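
To make these questions a bit more concrete, here is a minimal sketch of such a weekly audit in Python. Everything in it is hypothetical: the `Instance` record, its fields (owner, business unit, product, environment) and the two inventory snapshots are made up for illustration, and a real audit would pull this data from your cloud provider's inventory and tags. It only covers new and resized instances, and it does not try to judge whether a change is justified.

```python
from dataclasses import dataclass

# Hourly rates quoted in this post; other instance types would need adding.
HOURLY_RATE = {"c3.8xlarge": 2.40, "i2.8xlarge": 6.82}

@dataclass(frozen=True)
class Instance:  # hypothetical inventory record, not an AWS type
    instance_id: str
    instance_type: str
    owner: str          # who launched or changed it
    business_unit: str
    product: str
    environment: str    # e.g. "dev", "prod"

def weekly_changes(last_week, this_week):
    """Yield (instance, old_type, hourly_cost_delta) for new or resized instances.

    Terminated instances are ignored in this sketch.
    """
    before = {i.instance_id: i for i in last_week}
    for inst in this_week:
        old = before.get(inst.instance_id)
        old_type = old.instance_type if old else None
        if old_type == inst.instance_type:
            continue  # unchanged since last week
        old_rate = HOURLY_RATE[old_type] if old_type else 0.0
        yield inst, old_type, HOURLY_RATE[inst.instance_type] - old_rate

# Made-up snapshots for illustration only
last_week = [Instance("i-001", "c3.8xlarge", "alice", "search", "indexer", "dev")]
this_week = [
    Instance("i-001", "i2.8xlarge", "alice", "search", "indexer", "dev"),
    Instance("i-002", "c3.8xlarge", "bob", "payments", "checkout", "prod"),
]

for inst, old_type, delta in weekly_changes(last_week, this_week):
    print(f"{inst.instance_id}: {old_type or 'new'} -> {inst.instance_type} "
          f"by {inst.owner} ({inst.business_unit}/{inst.product}, "
          f"{inst.environment}), {delta:+.2f} $/hour")
```

Each line of the report says what changed, who changed it, where it belongs, and what it costs per hour, which is enough to start a conversation about whether it was intended.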
In the i2.8xlarge example above, the $3300 mistake ($5000-$1700) would be much smaller, because the error is caught within the week rather than when the monthly bill arrives.
If the auditing is daily, or hourly... even if a few dozen of these instances were launched, the mistake has an almost negligible financial impact!
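
A quick sketch of how the size of the mistake shrinks with the audit interval, again in Python. It reuses the two hourly rates quoted earlier and assumes a 30-day month; the audit intervals and the "three dozen instances" case are illustrative choices, not measurements.

```python
# Hourly on-demand rates quoted in this post (2014 prices)
INTENDED_RATE = 2.40  # c3.8xlarge
ACTUAL_RATE = 6.82    # i2.8xlarge

def overspend(hours_until_caught, instances=1):
    """Extra dollars spent before the wrong instance type is noticed and fixed."""
    return (ACTUAL_RATE - INTENDED_RATE) * hours_until_caught * instances

# Assumption: a 30-day month; the audit intervals are illustrative.
for label, hours in [("monthly bill", 24 * 30), ("weekly audit", 24 * 7),
                     ("daily audit", 24), ("hourly audit", 1)]:
    print(f"{label:>12}: ${overspend(hours):,.2f} for one instance, "
          f"${overspend(hours, instances=36):,.2f} for three dozen")
```

With these rates, a weekly audit caps the single-instance mistake at roughly $740, and an hourly one at under $5.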

The solution to the conundrum of agility or control is, at least in my mind, obvious - let innovation roam, accept the risk that mistakes will happen, and famously "move fast and break things" - others have proven it might just be worth it. But don't forget to keep a watchful eye on the cost of mistakes, and catch them while they're still negligible.