SaaSy Cloudy SSDs

Or, should you abandon old wisdom?

In the world of public cloud, little is stable, especially common wisdom. Following accepted wisdom blindly means missing opportunities to capitalize on new and enhanced offerings. Case in point: databases on EC2.

Databases are demanding beasts, which presents a few challenges:

  • Databases tend to be mission critical.
  • Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) are very stringent.

These demands are somewhat in conflict with the fickle nature of the public cloud: your servers might disappear or fail with little notice.

In the datacenter, this implied highly redundant hardware and expensive scale-up architectures. You have more data? You get a forklift to deliver a bunch more disks for your SAN (or whatever your storage solution is), and a few more blades/chassis to increase the number of cores in your Oracle RAC cluster.

The first-generation mapping of the old datacenter architecture into the public cloud had these general guidelines:

  • Store your data on highly available storage: EBS volumes. To achieve the required storage capacity and IOPS performance, RAID as many volumes as you can manage.
  • To get better performance, shell out extra for Provisioned IOPS. In many cases the cost of provisioned IOPS dominates the actual storage costs.
  • EBS, considered the most reliable online storage available, helps ensure RPO. In case of volume failures (which used to be much more frequent), recovery from volume snapshots plus binlogs meets the RPO.
  • RTO is achieved by one of the options below (ordered from most to least expensive):
    • a fully replicated hot standby, effectively running 2x the server capacity
    • a warm standby, for sub-minute RTO
    • no standby, but automated launching of a new instance and rebuilding of the RAID set, for an RTO under 10 minutes
These strategies became the common wisdom, and any change to this blueprint was considered taboo (e.g., see https://aws.amazon.com/articles/1663).

This blueprint has a few shortcomings:

  • While EBS is very cost-effective, the blueprint requires provisioning large amounts of storage upfront, negating the benefit of consumption-based pricing.
  • Scaling up requires changing instance types, and is inherently limited by the available cloud provider offerings.

For those willing to upend common wisdom, there are better options enabled by new storage offerings from AWS (SSD instance storage, Dense Storage instances, GP2 volumes).

The superior blueprint has these characteristics:

  • Store the database on instance storage, leveraging SSDs or dense magnetic storage.
  • Scale out rather than up. This is partly required because instance storage offers smaller capacities than EBS.
  • Leverage other storage options for achieving RTO and RPO objectives.


Conventional wisdom didn't consider instance storage suitable for a database because of its ephemeral nature: its contents are lost if the instance is destroyed. The need for resiliency outweighed performance and cost considerations.
This approach forgoes the very cost-effective performance that new storage offerings enable: up to 120K IOPS. Achieving this performance on EBS (if it were achievable at all) would be much costlier.

How, then, is cost-effective resilience achieved when using instance store? Simple: have a backup strategy that:

  • Frequently snapshots the database (e.g., every 4-24 hours)
  • Stores sufficient binlogs on EBS to cover 2 full snapshot periods (at least 2 days' worth)
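
The retention rule above is easy to get wrong when the snapshot interval changes, so it can help to check it mechanically. This is a minimal sketch, assuming a hypothetical BackupPolicy record (the names are illustrative, not from any backup tool): binlogs must span two snapshot periods and never less than 48 hours, presumably so that if the newest snapshot proves unusable you can restore the previous one and still replay forward.

```python
from dataclasses import dataclass


# Hypothetical policy record; field names are illustrative, not from any tool.
@dataclass(frozen=True)
class BackupPolicy:
    snapshot_interval_hours: float  # time between full snapshots (e.g. 4-24h)
    binlog_retention_hours: float   # how long binlogs are kept on the EBS volume


MIN_BINLOG_RETENTION_HOURS = 48.0  # "at least 2 days' worth"


def required_binlog_retention(snapshot_interval_hours: float) -> float:
    """Binlogs must cover two full snapshot periods, and never less than two days."""
    return max(2 * snapshot_interval_hours, MIN_BINLOG_RETENTION_HOURS)


def validate(policy: BackupPolicy) -> list[str]:
    """Return a list of problems with the policy (empty if it looks sound)."""
    problems = []
    if policy.snapshot_interval_hours <= 0:
        problems.append("snapshot interval must be positive")
        return problems
    needed = required_binlog_retention(policy.snapshot_interval_hours)
    if policy.binlog_retention_hours < needed:
        problems.append(
            f"binlog retention {policy.binlog_retention_hours}h is below the "
            f"{needed}h needed for a {policy.snapshot_interval_hours}h snapshot interval"
        )
    return problems


if __name__ == "__main__":
    print(validate(BackupPolicy(snapshot_interval_hours=6, binlog_retention_hours=48)))  # []
    print(validate(BackupPolicy(snapshot_interval_hours=24, binlog_retention_hours=24)))
```

With short snapshot intervals the 2-day floor dominates; with daily snapshots the two-period rule and the floor coincide at 48 hours.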


In case of a database instance failure, recovery involves loading the last snapshot and applying the binlogs from the resilient EBS volume.
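
The replay step can be planned automatically once the snapshot has been restored onto the new instance. The sketch below assumes MySQL, and assumes each snapshot records the binlog file and position it corresponds to (for example, the coordinates mysqldump --master-data writes out). The Snapshot type, the /mnt/ebs/binlogs directory and the zero-padded file names are hypothetical; it only builds the mysqlbinlog command and does not run it.

```python
import posixpath
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime
    location: str         # hypothetical: where the snapshot is stored
    binlog_file: str      # binlog coordinates recorded when the snapshot was taken
    binlog_position: int


def plan_recovery(snapshots, binlog_files, binlog_dir="/mnt/ebs/binlogs"):
    """Pick the newest snapshot and the binlogs to replay on top of it.

    Binlog names are assumed to be zero-padded (mysql-bin.000042), so
    lexical order matches creation order.
    """
    if not snapshots:
        raise ValueError("no snapshot available")
    latest = max(snapshots, key=lambda s: s.taken_at)
    files = sorted(binlog_files)
    if latest.binlog_file not in files:
        # Retention on EBS did not reach back to the snapshot: RPO is at risk.
        raise ValueError(f"{latest.binlog_file} is missing from the retained binlogs")
    replay = [f for f in files if f >= latest.binlog_file]
    paths = [posixpath.join(binlog_dir, f) for f in replay]
    # --start-position applies only to the first file named on the command line.
    command = (
        f"mysqlbinlog --start-position={latest.binlog_position} "
        + " ".join(paths)
        + " | mysql"
    )
    return latest, replay, command


if __name__ == "__main__":
    snaps = [
        Snapshot(datetime(2015, 1, 1, 0), "snap-a", "mysql-bin.000010", 4),
        Snapshot(datetime(2015, 1, 1, 6), "snap-b", "mysql-bin.000012", 1234),
    ]
    retained = ["mysql-bin.000013", "mysql-bin.000011", "mysql-bin.000012"]
    print(plan_recovery(snaps, retained)[2])
```

Failing loudly when the snapshot's starting binlog is gone is deliberate: that situation means the retention window was shorter than the snapshot interval, and the RPO can no longer be met from that snapshot.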

Another consideration when switching from EBS to instance storage is scaling the storage capacity. Instance storage options range from 40 GB (c3.large) to 2 TB (d2.xl). With EBS you can grow capacity by using larger or multiple volumes; that option is not available for instance store, whose capacity is fixed by the instance type.

Databases larger than the available storage require a sharding strategy, whereby different database instances are deployed to house fragments of the whole dataset, partitioned on logical boundaries.
In the world of SaaS offerings, these boundaries are often apparent: a user, a tenant (in a multi-tenant environment), etc.
If your current application is not set up to work with shards, there are options that avoid changing your code, e.g., Tesora's Database Virtualization Engine.
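
If you do shard in the application itself, one common approach is a directory that maps each tenant to the database instance holding its data. This is a minimal sketch of that directory-based technique, not how Tesora's engine works; the ShardDirectory class, the shard names and the least-full placement rule are all hypothetical. Because instance store capacity is fixed per instance, placement refuses any tenant that would overflow a shard.

```python
class ShardDirectory:
    """Hypothetical tenant -> shard directory for instance-store databases.

    Each shard is one database instance whose capacity is fixed by its
    instance store, so new tenants go to the least-full shard that still fits.
    """

    def __init__(self, shard_capacity_gb: float):
        self.shard_capacity_gb = shard_capacity_gb
        self.used_gb: dict[str, float] = {}
        self.tenant_to_shard: dict[str, str] = {}

    def add_shard(self, name: str) -> None:
        self.used_gb.setdefault(name, 0.0)

    def place_tenant(self, tenant_id: str, size_gb: float) -> str:
        # Existing tenants stay where they are.
        if tenant_id in self.tenant_to_shard:
            return self.tenant_to_shard[tenant_id]
        fits = [s for s, used in self.used_gb.items()
                if used + size_gb <= self.shard_capacity_gb]
        if not fits:
            raise RuntimeError("no shard has room; launch a new database instance")
        shard = min(fits, key=lambda s: self.used_gb[s])
        self.used_gb[shard] += size_gb
        self.tenant_to_shard[tenant_id] = shard
        return shard

    def lookup(self, tenant_id: str) -> str:
        """Route a query: which database instance holds this tenant's data?"""
        return self.tenant_to_shard[tenant_id]
```

For example, with two 100 GB shards, placing a 60 GB tenant and then two 30 GB tenants puts the first on db1 and both others on db2. A fourth 50 GB tenant fits nowhere, which is the signal to add a shard. This sketch assumes no single tenant outgrows one instance.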

As opposed to the "traditional" scaling strategy, this approach has the following benefits:

  • Storage (and compute) capacity is provisioned as the need arises, leveraging consumption based pricing.
  • As the data set grows, more compute resources are deployed together with storage
  • Scale is not bounded by the capacity of the largest single instance.
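
The "provision as the need arises" point comes down to simple arithmetic: the number of instances tracks the dataset, instead of being sized upfront. A minimal sketch, assuming a hypothetical 75% fill ceiling per instance store as headroom (the growth figures are illustrative, not measurements):

```python
import math


def instances_needed(dataset_gb: float, instance_store_gb: float,
                     max_fill: float = 0.75) -> int:
    """Number of database instances (shards) needed to hold the dataset.

    max_fill is an assumed headroom factor so no instance store runs full.
    """
    if instance_store_gb <= 0 or not 0 < max_fill <= 1:
        raise ValueError("capacity and fill factor must be positive")
    usable_gb = instance_store_gb * max_fill
    return max(1, math.ceil(dataset_gb / usable_gb))


if __name__ == "__main__":
    # Hypothetical dataset growth (GB) on instances with 2 TB of instance store.
    for month, size in enumerate([500, 1500, 1600, 4500], start=1):
        print(month, size, instances_needed(size, 2000))
```

Under these assumptions the fleet grows from one instance to three only as the data crosses each 1.5 TB boundary, and compute grows along with storage.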


Comparing performance: when using EBS, an instance is limited to 48k IOPS, even with EBS optimization enabled. To realize this I/O performance, at least 3 volumes must be attached, because each volume is limited to 20k IOPS.
Compare this to instance store: up to 315k IOPS provided by SSD instance storage on i2 instances. For EBS, on top of the storage charges ($0.10 per GB-month), expect to add up to $1,300 per volume per month for provisioned IOPS, and up to $30 in EBS-Optimized charges for the instance to realize the throughput to the EBS backend.
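
The EBS side of that comparison can be worked out from the figures quoted above. A rough sketch, assuming provisioned IOPS are billed linearly per IOPS (so $1,300 for a 20k-IOPS volume implies $0.065 per IOPS-month) and that the $30 EBS-Optimized charge is monthly; both are assumptions, and prices are 2015-era, not current.

```python
import math

# Figures quoted in the text above (2015-era EBS limits and prices).
INSTANCE_EBS_IOPS_CAP = 48_000      # per-instance cap, even with EBS optimization
VOLUME_IOPS_CAP = 20_000            # per-volume provisioned IOPS limit
STORAGE_USD_PER_GB_MONTH = 0.10
PIOPS_USD_PER_MAXED_VOLUME = 1_300  # monthly PIOPS charge for one 20k-IOPS volume
EBS_OPTIMIZED_USD_PER_MONTH = 30    # assumed monthly; upper bound from the text


def ebs_monthly_cost(target_iops: int, storage_gb: float) -> tuple[int, float]:
    """Return (volumes to stripe, estimated monthly EBS cost in USD)."""
    if target_iops > INSTANCE_EBS_IOPS_CAP:
        raise ValueError("target exceeds what one instance can drive over EBS")
    volumes = math.ceil(target_iops / VOLUME_IOPS_CAP)
    # Assumption: PIOPS is billed per provisioned IOPS, so scale linearly.
    piops_usd = target_iops * PIOPS_USD_PER_MAXED_VOLUME / VOLUME_IOPS_CAP
    storage_usd = storage_gb * STORAGE_USD_PER_GB_MONTH
    return volumes, piops_usd + storage_usd + EBS_OPTIMIZED_USD_PER_MONTH
```

For example, `ebs_monthly_cost(48_000, 1_000)` gives 3 volumes and roughly $3,250 per month ($3,120 PIOPS, $100 storage, $30 EBS-Optimized), before paying for the instance itself. None of these line items exist for instance store.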

For an instance-store-based solution, all of these costs are included in the hourly instance charge!

We have been successfully running with this setup ever since AWS released SSD-based instance store on the C3 family, and have not lost a single bit of data.

As a side note, RDS unfortunately does not yet offer the option of using instance storage; you are forced to attach EBS storage of some type. If you want the price-effective performance, you have to roll your own database. That said, RDS has a good track record of catching up with new practices.


New realities in any realm require re-evaluating historic dogmas. In the world of cloud, reality changes often, so take a step back and evaluate whether you're squeezing out all the performance that's available to you.





