Thursday, December 22, 2011

OpenStack in the community

or, dissipating the fog in the cloud


Update (Feb 3, 2012): read the meetup retrospective here

OpenStack is creating lots of buzz in the IT industry. Its promise is to revolutionize the cloud market in the same way that Linux revolutionized the operating system market. Part of the reason that Linux was successful is the community of devotees that saw the potential and ushered the code base to where it is today.

But communities don't always just happen; they're made with sweat and pizza. My group has taken on the goal of turning the OpenStack users in the Boston area into a vibrant community. The promise of OpenStack and its quick pace of activity have spurred a worldwide wave of excitement, with local groups spanning the globe - from Argentina to Japan, Australia to British Columbia.

I half stumbled into organizing the Boston meetup group. As an engineer, it gave me a new appreciation for what it takes to make an event happen - from getting relevant technical content down to the salad dressing. The general feedback from the folks who attended the previous meetup (see agenda and such) was positive. And so - on with the next meetups!

Thursday, December 1, 2011

Agile Ops

or, what's this DevOps thing everybody is talking about


Agile in software development is an attempt to be more realistic - rather than trying to predict the future, the methodology is all about taking stabs and correcting with a fast feedback cycle. Start producing real value quickly, and keep chasing that value fanatically.

In a recent OpenStack meetup discussing Swift, an attendee asked: Swift has all these parameters - do you have good guidance on how to set them?
A traditional approach would involve enormous effort attempting to predict workloads, methodically simulating them in a lab environment, and laboriously tuning parameter after parameter to find the optimum settings for those workloads. Can you predict what would happen? You end up deploying a system optimized for the lab results driven by the simulated workloads. By the second week in production you realize that the actual workloads are substantially different from what you predicted, and performance can gently be described as "sub-optimal". All the ops folks are now running around adjusting parameters, reacting to users' complaints.

An agile ops attitude would address this scenario differently. To deliver value quickly, a beta deployment would be stood up fast, with expectations set that glitches are to be expected. The deployment would be configured in a manner that allows ops folks to quickly modify cluster-wide configuration and deploy updated software. A second point of emphasis would be deep monitoring of the environment, and the ability to quickly add additional monitoring to diagnose suspected symptoms.

Such an approach to operations is just another manifestation of agile's core principles - quick value delivery and a quick feedback loop. Delight your users quickly, and make sure you keep them happy. As in agile software development, the pivotal element is the attitude - embrace (or at least accept the inevitability of) change and uncertainty, and be prepared to adjust to reality as it comes.

That said, it helps to have the right tools, not just the right attitude. For any decently sized system, trying to manage tens or hundreds of servers manually borders on insanity (at the very least, it's certain to produce insanity quickly). Employing an automated configuration management system such as Chef or Puppet is a must (or you could step up to Crowbar, which happens to be the project I'm working on). These systems empower operations teams to quickly inspect the status of their deployment and, if need be (based on feedback produced by deep monitoring), take system-wide action by applying configuration changes or deploying patches in minutes.
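
To make "system-wide action in minutes" concrete, here's a minimal sketch of the fan-out idea - deliberately much cruder than what Chef, Puppet or Crowbar actually do. The hostnames, config file path and service name are made-up placeholders:

```python
# Illustrative only - not Chef/Puppet/Crowbar code. Push one desired config
# file out to every node in a (hypothetical) cluster and restart the service
# that consumes it, using plain ssh/scp.
import subprocess

HOSTS = ["node-%02d.example.com" % i for i in range(1, 21)]  # hypothetical inventory
LOCAL_CONFIG = "proxy-server.conf"                # the config as you want it everywhere
REMOTE_PATH = "/etc/swift/proxy-server.conf"      # where it lives on each node
SERVICE = "swift-proxy"                           # service to bounce after the change

def push_and_restart(host):
    subprocess.check_call(["scp", LOCAL_CONFIG, "%s:%s" % (host, REMOTE_PATH)])
    subprocess.check_call(["ssh", host, "sudo service %s restart" % SERVICE])

for host in HOSTS:
    push_and_restart(host)
    print("updated %s" % host)
```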

So, next time you're tempted to hide out in the lab, simulating what you think reality will unleash on your next deployment - stay on your toes and keep agile!



Wednesday, August 31, 2011

Agile process

or... what actually matters in an agile process?


The holy grail of any software outfit is to deliver high-value, high-quality results, consistently and predictably. Talk about a loaded sentence! But those are the requirements for a software team to be valuable to the business (or to be the business). Breaking it down:
  • Team - A single developer doesn't need process... and finding a unified process that works for a huge conglomerate corp is unlikely. A process, then, is meant for a TEAM of some reasonable size.
  • High value - if you deliver the wrong thing at the right time... well, I've been at too many (now deceased) startups not to recognize that value is key.
  • High quality - need I say more? If you want to deliver 2.0 but you're stuck bug-fixing 1.0 into existence, you will have two customers - the one unhappy with 1.0 and the one waiting for the features in 2.0. Neither will keep the revenue coming in.
  • Results - activities by themselves, with no results, provide little value.
  • Consistently - it's hard to build a successful software product with just superheroes. A 1.0 release that leaves the team burnt out is not very useful over the long run... 
  • Predictably - To be able to plan ahead for the next set of features, to hit the market at a relevant time, the team must be predictable in delivering against its estimates.
While the details above could be nitpicked, in broad strokes every good programmer and software manager aspires to be in a team that achieves them - it's fun to be on such a team. But how do you achieve these goals?

Many managers and development teams I've worked with make a similar process choice - first define the process, then figure out what you're actually trying to achieve with it. This is not done just because of pointy-haired-boss whims. It's the reality in many organizations, especially large ones, that a process is required. It could be a business requirement - ISO 9000 certification, SAS 70 or any other industry norm demanded by customers as a result of legal requirements. It could be simply the sheer size of the organization and the many moving parts required to achieve business goals, where informal communication is just not sufficient.

What makes agile processes more adept at getting there is the closed-loop feedback system built into them. The general idea is that everything can be improved on, and should be. The team continuously adjusts the way it works, striving to achieve that holy grail.

Like any other closed-loop system, the adjustments to the activities are determined by the metrics being measured. If you're using a thermostat that measures temperature, you can make adjustments to the temperature... but not to the humidity, because you're not measuring that. The idea is simple - you choose what you want to optimize or control, and measure the relevant metrics to achieve that control.
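
As a toy illustration (my own, not part of any agile canon): a controller can only correct the quantity it actually measures.

```python
# Toy closed-loop controller: it reads temperature, so it can correct
# temperature drift - humidity is invisible to it. Numbers are made up.
def control_step(measured, target, setting, gain=0.5):
    error = target - measured
    return setting + gain * error          # nudge the setting toward the target

setting = 18.0
for measured in [18.0, 19.2, 19.8, 20.1]:  # simulated readings
    setting = control_step(measured, target=20.0, setting=setting)
    print("next setting: %.1f" % setting)
```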

Some of the key entities in agile development are:

  • user stories - features, meaningful to the end user, to be delivered
  • tasks - the activities (leading to work products) required to fulfill a user story
  • story points - estimated amount of effort to deliver a user story
  • velocity - how many story points the team completes in a sprint or a given unit of time
  • release burn down - a measure of velocity over the lifetime of a release
The question is - what do you measure and optimize for? How much overhead are you willing to accept as the price for getting accurate information?
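
To make the entities above concrete, here's a small sketch (with made-up numbers) that derives velocity and a release burn down from the story points completed in each sprint:

```python
# Made-up numbers: derive velocity and a release burn down from the
# story points completed in each sprint.
release_scope = 120                        # story points planned for the release
completed_per_sprint = [18, 22, 15, 20]    # story points finished, sprint by sprint

remaining = release_scope
for sprint, done in enumerate(completed_per_sprint, start=1):
    remaining -= done
    print("sprint %d: velocity %d, %d points remaining" % (sprint, done, remaining))

velocity = sum(completed_per_sprint) / float(len(completed_per_sprint))
print("average velocity: %.1f points/sprint" % velocity)
print("sprints left at this pace: %.1f" % (remaining / velocity))
```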

Some more pondering required here... but the editor has been open for many days now...



Sunday, August 14, 2011

Desktop sized "cloud"

Or, here's my pet mini-cloud

I first used VMware Workstation as a development tool somewhere around 2003. It was a cost-effective way to test the software I was working on against the quirks of different operating systems. On one Linux desktop I could have all my development tools, including a few Windows 95 installations.
So, now you're asking - "what does that have to do with clouds?" Well, not much yet.

For my current project (Crowbar) I found myself needing a setup that includes:

  • Multiple OS flavors 
    • Win XP for access to corporate resources
    • Ubuntu 10.10 and 11.04 - main testing targets
    • Red Hat and CentOS - for some dev work and testing
  • Varying numbers of machines, pretty much depending on the day
    • For basic development, 3 machines - development, admin node and a target node
    • For Swift and Nova (components of OpenStack) related development - at least 5, sometimes a few more
  • Weird-ish network configurations
    • Network segments to simulate public, private and internal networks for nova and swift
    • An isolated network for provisioning work (avoid nuking other folks' machines)
    • Corporate network... must have the umbilical cord.

Oh, and the mix changes at least twice a week, if not a bit more.
That list is probably a bit too long already (though if I were more thorough it would take two pages). Imagine how you would build that. How much would it cost to set up? How many cables would you have to re-run to make a change?

The most relevant quote I could find (thanks, Google) is from Walter Chrysler:

"Whenever there is a hard job to be done I assign it to a lazy man; he is sure to find an easy way of doing it."


And I am lazy. And VMware ESXi is a great solution for lazy people... standing up a new server by cloning an existing one is so much easier than racking one up and running all the cables (not to mention the paperwork to buy it). Making networking changes is a cinch too - just create the appropriate virtual switch and add virtual NICs on that network to all the machines that need access.

Here's a simplified setup, and some notes:



A few observations and attempts to provide logic to the madness:

  • Corporate IT does not really like having machines they're not familiar with (i.e. non-corp images) on their network. On a network as large as Dell's it's hard to blame them. That mandates isolating all but one machine from the corp net.
  • The WinXP image is a bastion host of sorts - it is there to provide access to the environment. It is also used to manage the ESX server itself.
  • Access to multiple physical labs is achieved in 2 ways:
    • Multiple NICs in the ESX server (six 1 GigE ports in total)
    • VLAN tagging on the virtual NICs. The access switch to the isolated labs uses the incoming VLAN ID to select the correct environment

ESX configuration

The test VMs are configured with the equivalent of AWS tiny-to-small instances (i.e. 1-2 GB RAM, 2 virtual cores) depending on their workloads. The actual development VMs are beefier, more like a large (7 GB RAM, 4 cores).
The server is configured with two resource pools - one for "Access" and one for pretty much everything else. The intent is to ensure that whatever crazy things are going on in the test VMs (anyone ever peg a CPU at 100% for a bit?), I can keep working.

As was famously said - clouds still run on metal ... so here are the metal specs:
  • 2-socket, 4-core Xeon @ 2.4 GHz
  • 24 GB RAM
  • 500 GB 15k SAS disks
With this hardware I've been running up to 12 VMs while staying somewhat productive. My biggest complaint is disk I/O performance, especially since DVD-sized images fly all over the place. To solve that, I'll be migrating this setup to a Dell PEC 2100 box with 12 spindles configured in a RAID 1E setup.




But is it a cloud?

Some of the popular definitions for cloud involve the following criteria:

  1. Pay-per-use
  2. On-demand access to resources, or at least the ability of the resource user to self-provision resources
  3. Unlimited resources, or at least the perception thereof
For the desktop-sized cloud, money is not a factor, so scratch that.

Since I pretty much configure ESX at will, I think I get a check on the "on-demand" aspect. Yes, I mostly use the vCenter user interface - nothing as snazzy as the AWS APIs. In a previous life I've used the VMware SDK (the Perl version) and was pretty happy with the results. It's just that most of the changes to the environment fall into too many buckets to justify spending the time trying to automate them.

Now, my current server is far from having unlimited resources... but unless you deal with abstract math, there is no real unlimited-ness. What actually matters is the ratio of "how much I need" to "how much I have". The oceans are bounded, but if all you need is a few teacups' worth of water, you're probably comfortable considering the ocean an unlimited resource.
Oops... back to servers. Over the past few weeks, I've been taking snapshots of performance metrics. CPU utilization is around 50%, disk capacity is < 30% used. Memory is my current barrier, at around 80%.

If I needed to double the size of my virtual setup today, I could probably do that. The physical resources are pretty limited, but the headroom they afford me is making me think oceans.

This is by no means AWS scale. But, I think I'll still name my server "cloudi - the friendly desktop mini-cloud"



Tuesday, August 2, 2011

What's worse than the dreaded "Works on my setup"?

Or... why do you care about DevOps (and why you want DevOps++)



If you've ever written (or tested) a piece of software, you've heard this all before - "but... it works in my setup". That endless back-and-forth of finger pointing until, way too often, it turns out to be some silly environment setting that causes the application or service to misbehave.

DevOps is there to help. No more manual steps to deploy an app. No more 13-page project plans requiring coordination of at least five geo-distributed teams (half of which are in a timezone where the sun has long stopped shining).

The DevOps motto is simple - "if it hurts and it's dangerous, do it often". The industry has accepted this in the continuous integration case - builds that integrate disparate development streams are painful. Assumptions made in isolation prove to be wrong, and all hell breaks loose when you try to put different, independently developed pieces together. This is a painful process... so CI forces you to do it often. It spreads the pain into smaller pieces, and makes sure you address the pimple before it requires an amputation. The main enabler of the quick iteration (and associated pain reduction) provided by CI is automation - builds, integration and tests are automated to the extreme. Any breakage is found quickly by drones of build/test servers, which exercise the application (you ARE writing automated tests, right?) to flush out the kinks.
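
At its core, each of those "drones of build/test servers" is doing something like the sketch below - the build and test commands are placeholders for whatever your project uses, and real CI servers add scheduling, history and notifications on top:

```python
# Skeleton of what a CI drone does for every change: fetch the latest code,
# build it, run the automated tests, and fail loudly at the first breakage.
# The commands are placeholders for your project's build system.
import subprocess
import sys

STEPS = [
    ["git", "pull"],     # bring in the latest integrated code
    ["make", "build"],   # compile / package
    ["make", "test"],    # run the automated test suite
]

for step in STEPS:
    if subprocess.call(step) != 0:
        print("BROKEN at: %s" % " ".join(step))
        sys.exit(1)      # fail fast, while the breakage is still small
print("green build")
```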


DevOps takes the same model (well, this is at least part of what DevOps addresses) to the next painful point in delivering applications to customers, at least those consumed as SaaS: if deploying into production is a painful, risky process, the logic goes, then you should do it all the time!

Pushing bits into production systems should not be the first time you're kicking the tires of your deployment process. Dev and QA (and CI) should be using it all the time, so when the proverbial rubber hits the road, you're not stepping into the octagon but relying on an automated, tested process. And since this process is tested and proven, you should have no fear of deploying to production on a regular basis. DevOps folks believe that no rings need be kissed nor blessings sought to deploy to production - the whole process is a well-oiled machine that just clicks.


To work, the process must take care of all facets affecting the correct operation of the application - the bits of code, the application's configuration, and even OS-level configuration tweaks (out of sockets, anyone?). There are leading solutions out there that make this all possible, and dare I say even simple (search for puppetlabs.com or OpsCode.com for some nice tools).
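
As a tiny example of treating an OS-level knob the same way as a config file - a sketch of the declare-and-converge idea, not how Puppet or Chef actually implement it; the kernel setting and value are just an example:

```python
# Declare the kernel setting you want, check what the host actually has,
# and converge if they differ - the same idea the config management tools
# apply to files, packages and services.
import subprocess

DESIRED = {"net.core.somaxconn": "1024"}   # example: listen backlog for a busy service

def current(key):
    return subprocess.check_output(["sysctl", "-n", key]).decode().strip()

for key, want in DESIRED.items():
    have = current(key)
    if have != want:
        print("converging %s: %s -> %s" % (key, have, want))
        subprocess.check_call(["sysctl", "-w", "%s=%s" % (key, want)])
    else:
        print("%s already compliant (%s)" % (key, have))
```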

So far so good. Your app works in production as it did in your QA environment. Before you get too happy, remember there's hardware involved (still), as much as software folks would like to ignore it. Your code works happily in production, but it doesn't perform nearly as well... now what? Are you sure the hardware is configured the same? Are the disks in the right RAID config? Is the BIOS set to the recommended settings?


Software settings are (they better be!) easy to validate - diff the config file(s) and see what's different. If you're orderly, you have them all handy in the production repository (maybe even in a production Git repo). But what do you do about HW settings?


Enter Crowbar.

Crowbar is the system I've been working on at Dell. Among its many functions, it makes hardware configuration behave like config files. The idea is simple: when you provision a system for use, you want it to look head-to-toe (or BIOS/RAID to OS and application bits) like its mirror image in the QA environment. Obviously, different machines have different roles - and just as their OS is tweaked for their role, so should their hardware configuration be. Think of a Hadoop storage node (disks are better as JBOD) vs. a file server (disks are better as RAID 10... ok, could be better). As Crowbar provisions a system, if it detects that the BIOS version, BIOS parameters or RAID configuration is out of compliance, it adjusts the system intelligently to match its currently assigned role.
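
This is not Crowbar's actual code (the real thing drives vendor BIOS/RAID tooling), just a sketch of the compliance idea - role names, profile fields and the detection stub are all hypothetical:

```python
# Sketch: each role carries a desired hardware profile; provisioning compares
# it to what is detected on the node and flags (or fixes) any drift.
DESIRED_PROFILES = {
    "hadoop-storage": {"raid": "jbod",   "bios_virt": "enabled"},
    "file-server":    {"raid": "raid10", "bios_virt": "enabled"},
}

def detect_hardware(node):
    # stand-in for vendor utilities that read the node's BIOS/RAID state
    return {"raid": "raid10", "bios_virt": "disabled"}

def converge(node, role):
    desired = DESIRED_PROFILES[role]
    actual = detect_hardware(node)
    for key, want in desired.items():
        if actual.get(key) != want:
            print("%s: %s out of compliance (%s -> %s)" % (node, key, actual.get(key), want))
            # the real system would now drive the BIOS/RAID configuration tools
        else:
            print("%s: %s compliant" % (node, key))

converge("node-07", "hadoop-storage")
```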

No more chasing this tweak or that. This is truly "set it and forget it" ...

That said, Crowbar is much more than a hardware config utility. It is a system to provision large scale (as in cloud scale) hardware environments, configuring application clusters soup-to-nuts.


Crowbar is open source (on GitHub). With this version you get bare metal (or, if you really want to tweak, even bare cloud) to deployed application clusters (at this point we're focusing on OpenStack, but more is to come). If you want to unbox some metal, see the GitHub wiki for some quick instructions.

P.S:

The BIOS and RAID configuration pieces only work on Dell hardware at this point (and they use proprietary bits), so they're not in the wild. They're currently only distributed as part of the Dell OpenStack solution (see dell.com/OpenStack).

Wednesday, June 29, 2011

Cocktails and clouds, or how to explain SaaS, PaaS and IaaS



Recently I've been finding myself in the same awkward moment repeatedly: a social gathering of one sort or another, introductions, platitudes, and the inevitable two questions: "what do you do?" and "oh, software, what kind of software?" I have been experimenting with different explanations and examples, but judging by the faces looking at me I might as well have said I herd cats for a living.
Using familiar examples seems to resonate, it goes something like this:
A: Do you use Gmail or Yahoo?
WEP: "Yes... doesn't everyone?"
A: Why, then you're using "the cloud"! That is just an example of SaaS.

Ok. That gets the ball rolling, a bit. But WEP (wide-eyed person) is left somewhat unconvinced about this whole cloud thing. Heck, web-based email has been around since long before Cisco started advertising the cloud on TV.
Setting up this blog triggered a thought – blogging is almost as common as ATM machines and Oprah. What if this could be the linchpin in explaining cloud to my mom?

So, you want a blog, you say. Why, it's simple. Get yourself a cable modem, a decent server, tinker with some good old open source tools (Linux, Apache, PHP and WordPress) for a bit, and presto - your ramblings are now out there for the world to enjoy.
That’s so last century.

Let's go up the stack - why do you need a server? And what happens if you're in the Northeast, where we get snow once in a while and the power goes out? Not to mention that arranging for the world to reach your server is not that trivial either (static IP addresses, DNS setup and all sorts of headaches). Assuming the gods of shipping and the cable companies cooperate, you'd be up and running in a week or two for less than $1,000 for the server and $100 for power and a net connection.

Rather than having a physical server with a dedicated net connection in your basement, the cloud offers you IaaS, or Infrastructure as a Service.
Head over to Amazon EC2, sign up for an account, and within a few minutes you have a server hovering in low-altitude clouds for you to tinker with. This server comes with guarantees for availability, even if you're snowed in. You log in as superuser and start setting things up - web server and all. A decent system administrator could get this set up in less than a couple of days.
But wait, now you need to set up security and worry about the ongoing feeding of the beast (patching). That sysadmin is no longer a one-time shot.
The cloud comes to the rescue again. After all, you just want a blog. www.0php.com can host anything that runs PHP - like the open source blog engine http://www.s9y.org/. You've just run into PaaS - in a nascent way. 0php gives you a platform - OS, web server, database and the like - that your favorite software can use (as long as it only uses what it is allowed to use).

But me, I was lazy. I don't want to run a blog engine... just ramble on and on, with not a worry (or maybe just worry about color schemes and such).
Enter Blogger.com – you guessed it, there’s a cloudy name for them too. They’re SaaSy – the blog engine, the software I actually care to use, is offered for me as a service.
So on I ramble. Maybe these XaaS make more sense now? 

Tuesday, June 28, 2011

Inaugural post

I have succumbed to the web completely - so here's a blog. Working on open source projects and interacting with the world seems to absolutely mandate that I have a blog.  Well - here it is.

Some of the topics I'm involved with include, in no particular order:

  • Open source cloud platforms, mostly OpenStack but poking at things
  • SOA and Cloud - best buddies?
  • Java, Python, Ruby, PHP and their ilk seem to be part of the territory...
For the occasional excursion from the excitement I try to go hiking, camping, and, when the weather in the US Northeast is like today, riding. I'm still hoping to get (back) around to flying and to fly my homebuilt Q200 (at which point I'd probably need a pig-radar).

... that's it for today. I've got me some RAIDs to configure!