Friday, June 8, 2012

Object Store in OpenStack–the secret weapon


Or, why Swift rocks

Swift is the Object Storage component of OpenStack. Object Storage is a new paradigm for storage. It’s not files (as in a filesystem) or blocks (as in SCSI variants), but rather objects. Objects are immutable: once written, their contents can’t be changed, only replaced. Why would you want that? Think Facebook, Tumblr or Flickr – you’re not likely to ever update the content of an image… and the benefits that Swift brings are worth the loss of this capability.

To clear up some confusion, Swift is SOFTWARE. It is not a piece of hardware or an appliance, like a NAS box. It is not a service (like the S3 offering from Amazon, or Cloud Files from Rackspace). Swift is software, and free open-source software at that. The cost to deploy it is driven by the choice of hardware (physical or virtual) and by the operational choices made. But what considerations drive those decisions, and what are the tradeoffs?

A coarse understanding of what magic Swift performs, and how, is required to make intelligent deployment design decisions. The magic is to convert a bunch of cheap, unreliable machines and disks into a very reliable and well-performing storage system. To improve reliability, Swift is designed to expect failures. Data is replicated to multiple disks, so that no single disk failure leads to data loss. Replicas are located in different “zones” (different machines, different racks, etc.) to further isolate failures. The secret to Swift’s performance is the ability to add resources selectively to match the workload, in a scale-out manner. To see how this works, it is useful to describe the main components of Swift.

Swift has the following cooperative servers:

  • Proxy server – Exposes the user-facing API. This server accepts requests, validates them and generates requests to the other servers.
  • Account and Container servers – These are the metadata servers in Swift. Accounts represent different tenants or users in a shared environment. Each account can have containers. Containers hold objects. There are no limits on the number of accounts, containers within an account, or objects within a container.
  • Object server – This is where the real data is stored.

(Each server has an associated set of processes which together fulfill the server’s mission.)
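The account, container and object hierarchy above shows up directly in the request paths the proxy accepts, which take the form /v1/<account>/<container>/<object>. The following is only a rough sketch of how a path maps onto the three storage server types. The function names and the AUTH_demo account are made up for illustration, and the real proxy does far more (authentication, ring lookups, talking to several replicas).

```python
def split_path(path):
    """Split '/v1/<account>[/<container>[/<object>]]' into its three parts.

    Missing trailing parts are returned as None. Object names may
    themselves contain slashes, so we split at most three times.
    """
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 2 or parts[0] != "v1" or not parts[1]:
        raise ValueError("expected a path like /v1/<account>/...")
    account = parts[1]
    container = parts[2] if len(parts) > 2 and parts[2] else None
    obj = parts[3] if len(parts) > 3 and parts[3] else None
    if obj is not None and container is None:
        raise ValueError("an object must live inside a container")
    return account, container, obj


def target_server(path):
    """Name the kind of storage server that would handle this path."""
    account, container, obj = split_path(path)
    if obj is not None:
        return "object"
    if container is not None:
        return "container"
    return "account"


if __name__ == "__main__":
    # AUTH_demo, photos and the object name are hypothetical examples.
    for p in ("/v1/AUTH_demo",
              "/v1/AUTH_demo/photos",
              "/v1/AUTH_demo/photos/2012/cat.jpg"):
        print(p, "->", target_server(p))
```

A path naming only an account is a request to list containers, a path naming a container is a request to list objects, and a full path is a request for the data itself.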

To spread the load among the different locations where data is stored, Swift uses a consistent hashing algorithm. ‘Ring’ files contain the consistent hash mapping from object names to disks (well, actually from ‘partitions’ to ‘disks’), and need to be synchronized across all the servers by an outside process. All the metadata for Swift (i.e. the list of containers for an account, or of objects in a container) is stored in Swift itself, and is effectively treated as objects. The more disks (obviously in machines) are added, the more capacity is available, and the more (potential) performance is available.
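To make the name-to-partition-to-disk idea concrete, here is a toy sketch in Python. It is not Swift’s actual ring builder: the device list, the partition count and the round-robin placement are all assumptions chosen for brevity. It only illustrates the two steps described above, namely hashing an object’s name to a partition, then looking up which disks (one per zone) hold that partition’s replicas.

```python
import hashlib
import struct

PART_POWER = 8   # assumption: 2**8 = 256 partitions
REPLICAS = 3     # assumption: three copies of everything

# Hypothetical disks, two in each of three zones.
DEVICES = [
    {"id": 0, "zone": 1}, {"id": 1, "zone": 1},
    {"id": 2, "zone": 2}, {"id": 3, "zone": 2},
    {"id": 4, "zone": 3}, {"id": 5, "zone": 3},
]


def build_ring(devices, part_power=PART_POWER, replicas=REPLICAS):
    """Return a table: partition number -> list of device ids.

    Toy placement: each replica of a partition goes to a different
    zone, rotating through zones and through the disks in each zone.
    """
    zones = sorted({d["zone"] for d in devices})
    if len(zones) < replicas:
        raise ValueError("need at least one zone per replica")
    by_zone = {z: [d["id"] for d in devices if d["zone"] == z] for z in zones}
    ring = []
    for part in range(2 ** part_power):
        row = []
        for r in range(replicas):
            disks = by_zone[zones[(part + r) % len(zones)]]
            row.append(disks[(part // len(zones)) % len(disks)])
        ring.append(row)
    return ring


def partition_for(account, container, obj, part_power=PART_POWER):
    """Hash the full object name and keep the top `part_power` bits."""
    name = f"/{account}/{container}/{obj}".encode("utf-8")
    top32 = struct.unpack(">I", hashlib.md5(name).digest()[:4])[0]
    return top32 >> (32 - part_power)


def devices_for(ring, account, container, obj, part_power=PART_POWER):
    """Look up the disks holding every replica of the named object."""
    return ring[partition_for(account, container, obj, part_power)]


if __name__ == "__main__":
    ring = build_ring(DEVICES)
    print(devices_for(ring, "AUTH_demo", "photos", "cat.jpg"))
```

Because the lookup depends only on the object’s name and the ring table, any server holding the same ring file computes the same answer without asking anyone else, which is why the ring files have to be kept in sync across all the servers.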

While server roles can be co-located on the same physical machine, they don’t have to be. Both the software and hardware configurations of the servers can be tuned to reflect the needs of the environment, and the tradeoffs between them. Beyond the obvious cost vs. performance tradeoff, others include:

  • Speed vs. resiliency – is ensuring no data loss more or less important than serving requests fast?
  • Read vs. write – performance tuning and system design can favor one or the other.
  • Distributed environment – are there dispersed users (across slow links)?
  • Cheap or expensive hardware – in other words, lots of high-failure-rate devices vs. fewer, less flaky ones.

This post is getting long, so the next one (or ones…?) will discuss the tradeoffs and how to reflect them in Swift.