Thursday, December 1, 2011

Agile Ops

or, what's this DevOps thing everybody is talking about


Agile in software development is an attempt to be more real - rather than attempting to predict the future, the methodology is all about taking stabs and correcting with a fast feedback cycle. Start producing real value quickly, and keep chasing the value fanatically.

In a recent OpenStack meetup discussing Swift, an attendee asked - so Swift has all these parameters, do you have good guidance on how to set them?
A traditional approach would involve enormous effort: attempting to predict workloads, methodically simulating them in a lab environment, and laboriously tuning parameter after parameter to find the optimum setting for those workloads. Can you predict what would happen? You will end up deploying a system optimized for the lab results driven by the simulated workloads. By the second week in production you realize that the actual workloads are substantially different from what you predicted, and performance can gently be described as "sub-optimal". All the ops guys are now running around trying to adjust parameters, reacting to users' complaints.

An agile ops attitude would address this scenario differently. To deliver value early, a beta deployment would be stood up quickly, with the expectation set that glitches will happen. The deployment would be configured in a manner that allows ops folks to quickly modify cluster-wide configuration and to deploy updated software. A second point of emphasis would be deep monitoring of the environment, and the ability to quickly add further monitoring to diagnose suspected symptoms.
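To make the feedback loop concrete, here is a minimal sketch in Python of tuning one parameter from what monitoring actually reports, rather than from a lab prediction. Everything in it is an assumption made for illustration: the function name `next_worker_count`, the idea of a "worker count" parameter, the 200 ms latency target, and the thresholds are all hypothetical, and none of them are Swift settings or part of any real tool.

```python
from statistics import mean


def next_worker_count(current_workers, latencies_ms, target_ms=200.0,
                      min_workers=1, max_workers=64):
    """Propose a new worker count from observed request latencies.

    Hypothetical example: the parameter, the target and the thresholds
    are illustrative assumptions, not settings of any real system.
    """
    if not latencies_ms:
        return current_workers          # no feedback yet, so change nothing

    observed = mean(latencies_ms)
    if observed > target_ms * 1.2:
        proposed = current_workers * 2  # clearly too slow: scale up
    elif observed < target_ms * 0.5:
        proposed = current_workers // 2  # lots of headroom: scale down
    else:
        proposed = current_workers      # close enough: leave it alone

    # Never leave the range the operators consider safe.
    return max(min_workers, min(max_workers, proposed))


if __name__ == "__main__":
    workers = 4
    # Each inner list stands in for one monitoring window in production.
    for window in ([450, 510, 480], [230, 210, 190], [60, 70, 80]):
        workers = next_worker_count(workers, window)
        print(workers)
```

In a real deployment the proposed value would be pushed out through whatever configuration management is in place, and the latencies would come from the monitoring system instead of hard-coded lists. The point is only the shape of the loop: observe, adjust a little, observe again.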

Such an approach to operations is just another manifestation of agile core principles - quick value delivery and a quick feedback loop. Delight your users quickly, and make sure you keep them happy. As in agile software development, the pivotal element is the attitude - embrace (or just accept the inevitability of) change and uncertainty, and be prepared to adjust to reality as it may come.

That said, it helps to have the right tools, not just the attitude, to succeed. For any decently sized system, trying to manage tens or hundreds of servers manually borders on insanity (at least it's certain to produce insanity quickly). Employing automated configuration management systems such as Chef or Puppet is a must (or you could step up to Crowbar, which happens to be the project I'm working on). These systems empower operations teams to quickly inspect the status of their deployment and, if need be (based on feedback produced by deep monitoring), quickly take system-wide action by applying configuration changes or deploying patches in minutes.

So, next time you're tempted to hide out in the lab, simulating what you think reality will unleash on your next deployment - stay on your toes and keep agile!