Friday, June 8, 2012

Object Store in OpenStack–the secret weapon


Or, why Swift rocks

Swift is the Object Storage component of OpenStack. Object storage is a new paradigm for storage. It’s not files (as in a filesystem) or blocks (as in SCSI variants), but rather objects. Objects are immutable; once written, their contents can’t be changed, only replaced. Why would you want that? Think Facebook, Tumblr or Flickr – you’re not likely to ever update the content of an image… and the benefits that Swift brings are worth the loss of this capability.
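To make the "replace, never update" semantics concrete, here's a toy in-memory sketch (illustrative only, not Swift code): a PUT with an existing name swaps in a whole new object, and there is no way to modify bytes in place.

```python
# Toy illustration of object-store semantics (not Swift code):
# objects are written and read whole. There is no "seek and modify"
# as with a file; a PUT under an existing name replaces the object
# entirely.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = bytes(data)   # whole-object replace

    def get(self, name):
        return self._objects[name]

store = ToyObjectStore()
store.put("photos/cat.jpg", b"v1 bytes")
store.put("photos/cat.jpg", b"v2 bytes")    # replaces; never edits in place
print(store.get("photos/cat.jpg"))
```

For workloads like photo uploads, where an object is written once and then only read (or replaced wholesale), this restriction costs almost nothing.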

To clear up some confusion: Swift is SOFTWARE. It is not a piece of hardware or an appliance, like a NAS box. It is not a service (like the S3 offering from Amazon, or CloudFiles from Rackspace). Swift is software, and free open-source software at that. The cost to deploy it is driven by the choice of hardware (physical or virtual) and by operational choices. But what considerations drive those decisions, and what are the tradeoffs?

A coarse understanding of what magic Swift performs, and how, is required to make intelligent deployment design decisions. The magic is to convert a bunch of cheap, unreliable machines and disks into a very reliable, high-performing storage system. To improve reliability, Swift is designed to expect failures. Data is replicated to multiple disks, so that a single disk failure doesn’t lead to data loss. Replicas are located in different “zones” (different machines, different racks, etc.) to further isolate failures. The secret of Swift’s performance is the ability to add resources selectively to match the workload, in a scale-out manner. To see how this works, it is useful to describe the main components of Swift.

Swift has the following cooperative servers:

  • Proxy server – Exposes the user-facing API. This server accepts requests, validates them, and generates requests to the other servers.
  • Account and Container servers – These are the metadata servers in Swift. Accounts represent different tenants or users in a shared environment. Each account can have containers, and containers hold objects. There are no limits on the number of accounts, containers within an account, or objects within a container.
  • Object server – This is where the real data is stored.

(Each server has an associated set of processes which together fulfill the server’s mission.)

To spread the load among the different locations where data is stored, Swift uses a consistent hashing algorithm. ‘Ring’ files contain the consistent hash mapping from object names to disks (well, actually from ‘partitions’ to ‘disks’), and need to be synchronized across all the servers by an outside process. All of Swift’s metadata (i.e. the list of containers for an account, or of objects in a container) is stored in Swift itself, and is effectively treated as objects. The more disks (in more machines, obviously) are added, the more capacity is available, and the more potential performance is available.
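To make the ring idea concrete, here is a toy Python sketch of the scheme (illustrative only – the partition count, replica placement and hash details are simplified assumptions, not Swift's actual code): hash the object name, take the top bits as a partition number, and look the partition up in a precomputed partition-to-disks table.

```python
import hashlib

# Toy "ring": every partition is assigned REPLICAS distinct disks.
# Here placement is simple round-robin; real Swift also spreads the
# replicas across zones (machines, racks) to isolate failures.
PART_POWER = 4                      # 2**4 = 16 partitions
DISKS = ["disk0", "disk1", "disk2", "disk3"]
REPLICAS = 3

ring = {p: [DISKS[(p + r) % len(DISKS)] for r in range(REPLICAS)]
        for p in range(2 ** PART_POWER)}

def disks_for(obj_name):
    # Hash the full object path; the top PART_POWER bits of the
    # 128-bit digest select the partition.
    digest = hashlib.md5(obj_name.encode()).hexdigest()
    partition = int(digest, 16) >> (128 - PART_POWER)
    return ring[partition]

print(disks_for("/account/container/cat.jpg"))
```

Because the mapping is just a table lookup keyed by a hash, any proxy holding a copy of the ring can locate an object's replicas without consulting a central directory – which is why the ring files only need to be distributed, not queried.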

While server roles can be co-located on the same physical machine, they don’t have to be. Both the software and hardware configurations of the servers can be tuned to reflect the needs of the environment, and the tradeoffs between them. Beyond the obvious cost vs. performance tradeoff, others include:

  • Speed vs. resiliency – is ensuring no data loss more or less important than serving requests fast?
  • Read vs. Write – performance tuning and system design can favor one or the other.
  • Distributed environment – are there dispersed users (across slow links) ?
  • Cheap or expensive hardware – in other words, lots of high-failure-rate devices vs. fewer, less flaky ones

This post is getting long, so the next one (or ones…?) will discuss tradeoffs and how to reflect them in Swift.

Wednesday, May 23, 2012

Stacked for Business

Or, my impressions from the OpenStack Folsom Summit

OpenStack proved again that it's a community effort. The number of developers in the design half, and of overall participants in the conference part, was amazing. Most design sessions (I focused on Quantum and a few Nova sessions) were standing room only. Literally. The representation of big-name vendors was palpable, though it was nice to see that big names didn't necessarily get undue influence on the conversation.
(For my part, the fun started on the way there, on an RV for 1800 miles...but that's a story told somewhere else)

My major takeaway is that OpenStack is open for business, in multiple ways. The first is that OpenStack is production ready. This was manifested even prior to the summit, with dueling announcements from HP and Rackspace about deploying the Essex release into their respective production public clouds. A second way in which business friendliness is achieved is by creating opportunities for vendors. More supporting evidence for this conclusion came from the design sessions, especially the Quantum and Nova-Volume discussions (more below).

A common trend across sessions and projects is shrinkage (no, not like George) - which is a Good Thing. Like many software projects (especially open-source ones), many of the OpenStack projects have accumulated features and adjuncts that deviate from their core mission. This bloats the code, adds complexity, and occasionally requires specialized hardware to fully test the code. Even more importantly, it hinders innovation by newcomers - APIs and semantics are complex, and providing alternative, innovative implementations becomes much costlier. A lean'n'mean API, focused on a core mission, can encourage vendors of specialized technologies to adapt their products to fit into OpenStack.

Two good examples are Quantum and Nova-Volume. Quantum is being designed from the ground up to separate the API layer from the layer that manifests the virtualized networks into the environment. The focus during the two days of design sessions was on ensuring a quality open-source experience for common use cases, while ensuring that commercial offerings can be easily implemented (the fact that the two main vendors behind Quantum are Nicira and Cisco helps...). Nova-Volume, or Cinder, is a spin-off into its own project of capabilities previously hidden within Nova (the compute virtualization component). As the new APIs are being defined, storage vendors are early to the table, to represent their considerations in the API.
In both cases, a pure open-source solution will be available - this is part of the OpenStack mission. However, having vendors provide differentiated solutions for more specialized use cases is a Good Thing.

One of the best proofs that the need for these differentiated solutions is real is the work done to support Tilera boards as compute resources. Tilera is not your run-of-the-mill type of computing machine... with hundreds of cores, it's currently a bit off the mainstream. The folks at USC-ISI, however, went ahead and built an extension to Nova to provision their TILEmpower boards. Just imagine mapping RFC 2325 to OpenStack ;)

Sunday, March 18, 2012

Cloud and AI

Or, Running Lisp Probabilistically, backwards.

It started at a talk I attended, mentioned in the previous post. Yes, there's been lots of talk about analytics and the cloud, even made famous in popular media... But the "whats" and the "hows" are in constant flux. The curiosity triggered by the subtitle led to a weekend filled with AI, and to this post.

AI is a big space. I'm focusing on one small part, Machine Learning - and specifically on translation and categorization - in exploring how the cloud supports AI, and AI supports the cloud...

What does the cloud have to do with machine translation? The early attempts at machine translation (circa the 1950s) went down the path of Natural Language Processing. They failed for various reasons. Current approaches (do the quizzes!!), as in Google Translate, play a matching game. Use the cloud to collect ridiculous amounts of sample translations, e.g. a newspaper which publishes in more than one language, restaurant menus, and other sources of correlated text in multiple languages. These examples "teach" the machine, by example, what words in one language correspond to in other languages. The results are so surprisingly good that you might be tempted to believe the machine actually knows what the words mean...

The "standard" proof that the meaning of the words was lost on the machines was translating this phrase: "The spirit is willing, but the flesh is weak". Translating to a foreign language, and then back to English, typically resulted in compliments for the wine and complaints about the meat... However, the genius of Google Translate is the continuous learning, augmented by humans. If it got your translation wrong - you can teach Translate about the true meaning of the phrase. So far, it seems that French, Hebrew and even Korean round-trip correctly to English. Learning is an ongoing process... even for machines. Who knows, we might have those electric monks sooner rather than later...

So, the cloud can provide a huge source for learning.... not just for tweens doing homework, but for machines too.

How does AI help the cloud? And where's that backward LISP thing?

The Target story involved lots of human intuition, and lots of data. A set of problems in AI, unsupervised categorization learning, aims at replicating the human intuition component. The CrossCat algorithm from MIT's CSAIL is a really interesting (and successful) attempt at replicating the brain's ability to find patterns in apparently random data. Given objects and features, it can discover different, even orthogonal, systems of categories to which these objects belong.
The core idea is simple - pick objects' features to use for a classification system, randomly. Then classify the objects into the resulting categories. Using Bayesian inference, score the probability of the resulting classification. In English: assume that the random set of features and categories is correct, and calculate how likely it is for the objects to fit into their assigned categories.
To estimate the probabilities (exact computation is really expensive), Markov Chain Monte Carlo (MCMC) methods are used to randomly pick values for variables, based on the current best estimate of the probability of the result.
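As an illustration of the MCMC idea (a toy Metropolis sampler on a coin-bias problem - my own sketch, not CrossCat itself): propose a random tweak to the current hypothesis, and keep it with probability proportional to how much more likely it makes the observed data.

```python
import random

# Minimal Metropolis sampler (an MCMC method), illustrating the idea
# of "pick values randomly, keep them in proportion to how well they
# explain the data". Toy problem: infer a coin's heads-probability
# from observed flips.
def likelihood(p, heads, tails):
    return (p ** heads) * ((1 - p) ** tails)

def metropolis(heads, tails, steps=20000, seed=0):
    rng = random.Random(seed)
    p = 0.5                          # starting hypothesis
    samples = []
    for _ in range(steps):
        # Propose a small random tweak, clipped to [0, 1].
        proposal = min(1.0, max(0.0, p + rng.gauss(0, 0.1)))
        # Accept with probability = likelihood ratio (capped at 1).
        ratio = likelihood(proposal, heads, tails) / max(
            likelihood(p, heads, tails), 1e-300)
        if rng.random() < ratio:
            p = proposal
        samples.append(p)
    return sum(samples) / len(samples)   # posterior mean estimate

# 80 heads out of 100 flips: the estimate should land near 0.8.
print(metropolis(heads=80, tails=20))
```

The same accept/reject machinery, applied to proposals over feature groupings and category assignments instead of a single number, is what lets systems like CrossCat search an astronomically large hypothesis space.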
Ok, that last sentence was a painful introduction to the concept of probabilistic programming, and specifically the Church language... which runs Lisp probabilistically, backwards. A program in Church provides answers to questions like "how likely is this to happen?", or "what's the most likely categorization?". This is not your daddy's Lisp...

A probabilistic algorithm basically has normal logic, interspersed with random functions. Whenever a random function is encountered, the system produces a... random value for it, based on a probability distribution function. E.g. if you model a coin flip (the 'flip' function in Church), you'll get Heads half the time and Tails the other half (assuming a fair coin). The algorithm keeps running with the selected value(s), computing the overall result. Once a result is computed, the "query" function (the workhorse of Church) walks the evaluation history that led to this result, and computes the joint probability of the random choices made along the way. This overall probability is the desired outcome - how likely, overall, it is to arrive at this particular outcome. The system runs the algorithm repeatedly, making random choices and evaluating their results, and the probability of each result. Depending on how it was written, the program either produces the most likely result, or a distribution of the possible results.
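Here's a sketch of that run-repeatedly-and-tally idea in Python rather than Church (a simple rejection-sampling version of 'query'; the names flip and model mirror Church's vocabulary, but this is my own toy code, not Church's implementation):

```python
import random

# Run the generative program many times, keep only the runs that are
# consistent with what we observed (the condition), and read the
# answer off the surviving runs.
rng = random.Random(42)

def flip(p=0.5):
    return rng.random() < p

def model():
    # Generative story: two coin flips.
    a, b = flip(), flip()
    condition = a or b          # what we observed: at least one Heads
    query = a                   # what we want to know: was the first Heads?
    return query, condition

def rejection_query(model, runs=100000):
    kept = [q for q, c in (model() for _ in range(runs)) if c]
    return sum(kept) / len(kept)   # estimate of P(query | condition)

# Analytically, P(first is Heads | at least one Heads) = 2/3.
print(rejection_query(model))
```

Real Church implementations use far smarter strategies than brute-force rejection (MCMC over execution traces, for one), but the contract is the same: write the forward story, state what you observed, and ask for the distribution over what you didn't.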

Back to Target... applying CrossCat, Target could mine their shoppers' buying habits, and identify categories of shoppers based on their historical "features". Shoppers that share enough of the same "features" would probably fall into the same category, and are thus likely to share behaviors not yet exhibited...

AI can extract that information from all the data produced in the cloud, using human-like intuition!

Monday, March 12, 2012

Cloudy data

Or, if you have the bits, but not the information....

I'm sitting at an event which ties together a thread of thoughts that has been floating in my head - Hadoop deals with lots of data... but how do I get to the information contained there?
Ta-da... Big Data and machine learning are made for each other.

I've just learned about Mahout - the tool to make information out of data by using machine learning! Definitely something to look at.

I've spent some time reading and digesting some of the AI topics, with the results in this follow-up post.

Friday, February 3, 2012

OpenStack and Pizza!!!

Or ... some thoughts about organizing OpenStack Meetups

On February 1st I had the joy (and hassle) of coordinating another Boston OpenStack meetup. This time, using Harvard facilities, which are quite different from our previous venue - suffice it to say that Harvard is very different from the Lexington Historical Society (you don't need to know where the vacuum is). But I digress.

If you were in the room, you know all about the value of a community, so skip ahead to the "closing notes" for links to presos, future events, acknowledgements and such.

What I find exciting about open source projects is the sense of shared mission and destiny. An open source project fails or succeeds to a large extent based on how well it builds a community.  To be successful a project needs to create a dedicated community of users, developers, vendors and service providers.

Users have real problems to solve. Real use cases, real businesses, real money.
Developers want to write cool code, which solves real problems and delights users.
Vendors want to solve the "interesting" problems that address the concerns of paying customers who benefit from the "special sauce" vendors bring to the table (as opposed to table-stakes features and capabilities).
Service providers perfect their delivery capabilities and differentiate on customer service and operational excellence.

To be successful... each constituent of the community needs to see the value derived from the community, to justify the investment in the success of the community.

OpenStack in particular is great about bringing the above stakeholders together twice yearly for the Expo / design summit (the next one is in April - be there or be square!). It's a great forum for the community to interact. Developers hash out issues. Users provide unfiltered descriptions of their problems, looking for solutions. You get the drift. But I digress again - where's the pizza?

Pizza brings all the stakeholders above into a room, to share in between design summits. Share the good, the bad, and the ugly. The capabilities that are there, and the ones that are missing (and the ones that are redundant...).

In our meetup tonight (yup... here's the pizza) we had folks from major vendors (can't drop names here... but my whole team from Dell was present, as well as other 2-, 3- and 4-letter companies); service providers; users; and obviously some developers (me included). The "formal agenda" revolved around Quantum and where the OpenStack Foundation is headed.

Closing notes

David Lapsley's presentation, introducing OpenStack's Quantum Network-as-a-Service can be found here.
The discussion about the OpenStack Foundation was... a discussion, hence no preso. A few resources are available online.

While we didn't officially talk about Crowbar (my project) this time around, there were quite a few questions and comments about it. Crowbar is our approach to applying DevOps principles to deploying OpenStack. You can find the code on github. Make sure to check the wiki and Rob Hirschfeld's crowbar posts (but poke around... there's lots of good cloudiness there).

Some of the ideas for future meetups we discussed (at the end of the meetup and in 1-1 conversations) included:
  • Hacking on OpenStack - getting started
  • Hackday for Essex - once the Essex-4 milestone is ready (feature frozen), let's start working on deployments
  • Putting OpenStack to use - users (potential and current) discussing how, and to what, OpenStack is best applied
If you have additional suggestions, or want to participate in the forums or events - please contact me (the simplest method is via the Meetup Page Contact Us link. Sorry - it does require you to sign up to Meetup).

Last but not least, thanks to the folks that help sponsor the evening:
  • I work for Dell... and not only do I spend working hours (in between coding mini-vans) organizing this meetup, Dell also foots the bill for logistics.
  • Rackspace fed us delicious pizza and salad, and sent us some cool T-shirts explaining how free OpenStack is ("Free as in Beer Speech & Love") - they definitely stand out in a crowd.
  • The School of Engineering and Applied Sciences at Harvard provided the space - specifically IACS, the Institute for Applied Computational Science. (Thanks to Tricia at the Student Affairs office for being a great team to work with!) Oh yes - SEAS offers some interesting courses (some available remotely online), and events worth checking out.
See you all at the next event!

Friday, January 20, 2012

To Be (HA) or Not to Be

Or, what does it really mean to be highly available in the cloud

Good IT practices try to maximize SLA conformance, especially around availability. Lessons learned from a disk failure in the Exchange server leading to mail outages, and the inevitable fire drills, have been deeply embedded into our minds. REDUNDANCY EVERYWHERE. Power supplies, network connections, disks - if you can put 2 of them suckers in there, you do. Just to keep that machine running. That machine should never fail.

The web has mitigated things somewhat. Rather than relying on hardware redundancy (where you don't use half your equipment), deployment strategies have evolved. A large pool of web servers can sustain SLAs even with some servers failing, by utilizing load-balancers to direct traffic only to live web servers. This scheme brings with it worries about the availability of session state and other shared information (e.g. the database), but nonetheless it's progress. Since hardware is now allowed to fail, software developers came up with schemes to work around the failures. Distributed clustered session stores, MySQL clusters or just replicas gained lots of traction (circa 2000). Shared Nothing became a new mantra.

The Shared Nothing revolution got into full swing, and was formalized in various best-practice architectures that span the whole application stack, not just the web-server front end. These architectures rely on distributing both load and the risk of failure; rather than a single big, expensive server, many small, cheap and coordinated ones are used. If more capacity is required, more (small & cheap) servers are added to match the load. If one machine fails, the load is redistributed among the survivors. If data is persisted, it's never on just one node; it's replicated to a redundant one.
These principles obviously add various complexities (e.g. the CAP Theorem, which captures the available trade-offs succinctly: Consistency, Availability or Partition tolerance - you can have any 2, but not all 3 in any solution). But they provide benefits too (below).

Enter cloud.
If your application has followed the architecture evolution curve, the cloud is your friend. You can scale out as load increases, and obviously pay for just the capacity you need. Amazon goes so far as to provide guides (pdf) on how to optimize both your architecture and your cost.

But what if your application is still in the stone age? What if your application is designed to run on a single server, but you still want to use the cloud?

  • If you need more capacity, you need to resize your server to the next size up. Based on published pricing, every step up is pretty painful ($/hr): 0.50, 1.00, 2.00 and on. If your app were scaling out, you'd go from $1 to $1.50 rather than to $2.
  • If your provider decided to reboot your instance, you'd be scrambling to stand up another server where instances are not being rebooted (and you probably didn't really build deployment automation, did you?), and then take care of the plumbing (move IPs or update DNS and all that fun). With an evolved architecture, you'd care about a few of your instances, but only to the extent that not all of the instances for the same function are restarted at the same time. Your auto-scaling infrastructure could potentially just make magic happen.
  • That availability figure (99.95% for Amazon) could actually get put to the test, and you hit that 0.05% chance. Those ~20 minutes a month, or ~4.4 hours a year, hit and your server goes poof... together with your app. The refrain is probably familiar by now, so I won't repeat it a third time.
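The scale-up vs. scale-out arithmetic from the first bullet, spelled out (using the illustrative $/hr tiers from that bullet, not any provider's current prices):

```python
# Back-of-the-envelope comparison: you're at the $1.00/hr tier and
# need roughly 50% more capacity. A single-server app must jump a
# whole tier; a scaled-out app just adds a half-size instance.
SMALL, MEDIUM, LARGE = 0.50, 1.00, 2.00   # illustrative $/hr tiers

scale_up = LARGE                 # single-server app: next tier up
scale_out = MEDIUM + SMALL       # evolved app: current box + one small box

print(f"scale up: ${scale_up:.2f}/hr, scale out: ${scale_out:.2f}/hr")
```

And the gap compounds: the scale-out path also buys you the failure isolation described in the other two bullets, since losing one small instance costs a third of your capacity rather than all of it.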
While these are obviously risks present in your own data center, not just in the cloud, they're out of your control in the cloud.
The takeaway is probably pretty clear - but I like to be explicit. To be happy and prosperous in the cloud, you have to evolve, and forget your traditional notions of HA.