Wednesday, September 3, 2014

Thinking in Gigabytes and Cents

AWS offers a variety of storage options that fit different usage patterns, retention needs and cost profiles. When choosing storage for a cloud architecture, you have a multitude of options for achieving your technical goals. For example, you can use RDS, or stand up your own database using EC2 and EBS. While the two options provide an almost identical service, they differ in flexibility and cost profile - the right one for you depends on your specific use case.
Below is a summary of the different categories of storage options available from AWS, with a short note on the price drivers behind each:

  • Functional storage: This category includes storage exposed as a functional component. These are databases, traditional or modern. While these services store data, their main pricing driver is the speed of access - the faster the required access, the higher the price. In this category you’ll find:
    • Relational Database Service (RDS) in various flavors
    • Redshift - a highly scalable, clustered data warehouse based on PostgreSQL.
    • DynamoDB - a highly scalable NoSQL database that builds on Amazon’s Dynamo design (which also influenced Cassandra), and trades query flexibility for speed and scale
    • Elastic MapReduce - a managed Hadoop service, which can leverage different underlying storage options.
    • ElastiCache - managed Redis or Memcached - for smaller datasets that need memory-speed access.
    • CloudSearch - a fully managed service similar to Elasticsearch.
  • Block storage: These services look and feel like traditional SAN services, and are priced in similar terms - namely: data volume used.  In this category:
    • Elastic Block Store (EBS) provides 3 types of volumes (Magnetic, General Purpose SSD and Provisioned IOPS SSD) with differences in pricing and performance. EBS Snapshots are an add-on to EBS that provides highly resilient backups (on top of S3, see below).
    • Storage Gateway - A path into the cloud for the traditional datacenter. It offers a transparent hybrid on-prem/cloud tiering option where backups or less frequently accessed data can be stored in the cloud on top of either EBS, S3 or Glacier (below).
  • Object Storage: Simple Storage Service (S3) was a game changer in terms of $/GB stored when first introduced in 2006, and prices have been dropping since. To provide more flexibility in $/GB, AWS has also introduced:
    • Reduced Redundancy storage option for S3. This option keeps the same API, but a lower level of redundancy (with the possibility of rarely losing objects) for a 20% price cut.
    • Glacier long term archiving. With a 66% lower price for storage than S3, it’s worth considering, but only if your data access patterns match the intended use case - rarely accessing large percentages of data, and having the patience to get it.

Choosing the Right Storage for Your Need

The options are many, and target different use cases - from huge amounts of archive data kept mostly offline, to a highly available database serving tens of thousands of requests per second. The selection criteria must obviously start with suitability to the problem at hand: you can’t use Object Storage as a backend for a database. Once suitability is settled, though, you shouldn’t neglect the details of the pricing model.

Practical Example

As a simple example, consider the options in using EBS volumes:

  • At current pricing, Magnetic volumes at $0.05 per GB/month are 50% cheaper than SSD storage at $0.10 per GB/month.
  • For a 10GB volume that sustains 200 IOPS throughout the month, the total charges will be $37.20 for magnetic but only $1 for SSD. SSD volumes do not incur an additional cost for IOPS, while magnetic ones do. What appears to be more expensive at first look is actually about 37 times cheaper (roughly 97% less).
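
The shape of that comparison can be sketched in a few lines of Python. This is only a rough model: the per-GB prices are the ones quoted above, but the per-request I/O price and the 730-hour month are assumptions (marked as such in the code), so the magnetic total it prints will not necessarily match the figure quoted above.

```python
HOURS_PER_MONTH = 730  # assumption: average month length for estimates


def ebs_monthly_cost(size_gb, price_per_gb_month, sustained_iops=0.0,
                     price_per_million_io=0.0, hours=HOURS_PER_MONTH):
    """Estimate a monthly charge: storage plus an optional per-request I/O charge."""
    storage = size_gb * price_per_gb_month
    requests = sustained_iops * hours * 3600
    io_charge = requests / 1_000_000 * price_per_million_io
    return storage + io_charge


# Hypothetical I/O request price in $ per million requests (NOT from the post).
HYPOTHETICAL_IO_PRICE = 0.05

# Magnetic: cheaper per GB, but every I/O request is billed.
magnetic = ebs_monthly_cost(10, 0.05, sustained_iops=200,
                            price_per_million_io=HYPOTHETICAL_IO_PRICE)
# SSD: pricier per GB, no per-request charge.
ssd = ebs_monthly_cost(10, 0.10, sustained_iops=200)

print(f"magnetic: ${magnetic:.2f}  ssd: ${ssd:.2f}  ratio: {magnetic / ssd:.1f}x")
```

The point of the model is that the I/O term grows with usage while the storage term does not, so a busy small volume is dominated by request charges.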

This example highlights the need to understand the price drivers for the services you are planning to consume, and how they map to your workloads. It’s often hard to predict the exact behavior of different workloads, and in an agile world you should ship first, provide the value, and optimize for cost on an ongoing basis. At least that’s the oft-repeated excuse. ;) There is, however, no excuse for not performing the ongoing monitoring and required adjustments.

How CloudHealth Can Help

As a founding engineer at CloudHealth, I can tell you first hand the value of an analytics platform that helps customers understand the drivers behind their spend, justify it and optimize it. Having deep visibility across your cloud infrastructure, usage and spend provides the ability to make well-informed tactical and strategic decisions around architecture.

Small changes can lead to big savings… if you can find them.

Monday, March 10, 2014

agility with accountability

Or,  How to retain fiscal sanity and control over cloud usage

The buzz of Lean, Agility, Cloud and DevOps converging has been in the air for a while now. Sometimes it feels like teenagers on prom night discussing their upcoming exploits - many talk about it, some fumble around experimenting, and few are actually doing anything with it.

The affinity between the movements is evident - minimize upfront investment until value is proven, while ensuring you can grow your investment and capacity as this occurs. Lean applies it to the business model, Agile to the development process, and the combination of DevOps and the public cloud's "resources at the point of the mouse" brings it to resource provisioning.

The combination of DevOps and public cloud produces the mixture that allows companies such as Airbnb to prove their business and scale it up. There's one big fly in this otherwise delicious soup - the bean counters. They want to be able to exert the level of control over budgets and spend that they're used to in the world of data centers and six-month server provisioning cycles. When an Auto Scaling group launches a handful of additional instances to accommodate load, there's no Excel sheet circulated in triplicate to provide approvals... where's the control?

Even more scary, AWS offers a huge diversity of instances, costing anywhere from $14 a month to just about $5000. A simple miscalculation or misconfiguration, say using an i2.8xlarge instance ($6.82 an hour) instead of a c3.8xlarge ($2.40 an hour), can be really costly if left uncorrected for a whole month. It's the difference between the expected ~$1700 and the actual ~$5000.
And that's just 1 instance. In a whole zoo full of such pricey items... all the beans are spilling out of control!!!
(To keep to my main point, I'm not even touching the various purchasing options that AWS offers, ranging from Spot to 3-year reservations!)

One approach, which stodgy companies apply (see my CV), is to restrict access and limit agility. Control is retained and the world is happy. Well... part of the world.

Another approach is to retain the benefits of agility, but add a layer of accountability on top of it.
Let's say that you could look across an organization that runs 500 instances (or 5000) and quickly answer these questions:
  • What changed over the last week?
  • Who performed those changes? In what business unit? Supporting what product?
  • Are those changes justified by the environment they're in? By the performance profile of the workload?
  • What is the cost impact of those changes?
In the example above, the $3300 mistake ($5000-$1700) will be much smaller, because the error is caught within the week rather than waiting for the monthly bill.
If the auditing is daily, or hourly... even if a few dozen of these instances were launched, the mistake has an almost negligible financial impact!
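
To put rough numbers on that, here is a small Python sketch of how the cost of the mistake shrinks with the detection window. The hourly prices are the ones quoted above; the 730-hour month and the specific review windows are assumptions chosen for illustration.

```python
I2_8XLARGE_HOURLY = 6.82  # $/hour, from the text
C3_8XLARGE_HOURLY = 2.40  # $/hour, from the text


def mistake_cost(hours_until_caught, instances=1):
    """Extra spend from running the pricier instance until someone notices."""
    hourly_overspend = I2_8XLARGE_HOURLY - C3_8XLARGE_HOURLY
    return hourly_overspend * hours_until_caught * instances


# Detection windows in hours; 730 approximates one month (an assumption).
windows = {
    "monthly bill": 730,
    "weekly review": 7 * 24,
    "daily audit": 24,
    "hourly audit": 1,
}

for label, hours in windows.items():
    print(f"{label:>14}: ${mistake_cost(hours):,.2f}")
```

The overspend is $4.42 per instance-hour, so with hourly auditing even three dozen mislaunched instances (`mistake_cost(1, instances=36)`) cost about $159, compared with several thousand dollars for a single instance left running until the monthly bill.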

The solution to the conundrum - agility or control - is, at least in my mind, obvious: let innovation roam, take the risk that mistakes will happen, and famously "move fast and break things" - others have proven it might just be worth it. But don't forget to keep a watchful eye on the cost of mistakes, and catch them while they're negligible.

Sunday, February 23, 2014

When the doors are locked too tightly...

Or, Letting Power users change their password

  This is a short post, venting some frustration with a silly AWS default, with the hopes of sparing someone else the joys.

  Best security practices call for frequently changing your passwords, that's just common sense. AWS Identity and Access Management goes to the extent of providing some really cool tools to ensure that happens. IAM Roles provide a mechanism to allow software running on designated EC2 instances to retrieve "frequently" rotated access credentials. Seems like a well thought out solution to a common problem - how to let your software in EC2 securely access AWS resources, without embedding credentials in your AMI or code.

  That said, AWS does not by default allow users to change their own console password, not even users whose policy is Power User. True, you should probably not really use the console... there's an API, but the default Power User template prevents any and all IAM calls, with this policy statement:

I started reading the "trivial" IAM policy evaluation documentation and, well... I didn't finish. What I did end up playing with is the very useful (though not extremely intuitive) Policy Simulator (it requires logging in to the management console with very high permissions, enough to see users and policies; I used a root account). It allows simulating a set of actions (from any service) performed by a given principal in a given context (i.e. you set values for the variables mentioned in your policies). After the fact, I ended up finding its documentation, which consists of a single page with a video. Love the tool... wish it was linked into the console, and a bit easier to find. Venting over.

  To reveal the end of the story, this policy allows users to modify their password:

After authoring what I thought was the right policy, I tested it on 1 user. And it didn't work. The missing incantation (no small animals harmed) was the "Version" (line 2 above). While the policy evaluation document doesn't mention it, apparently the declared version matters a lot! Without the version, the permission didn't take effect.
Where do you get the right version, you ask? Not from the API docs you don't... those still claim a version of 2010-05-08.
Fortunately, there's an example page that lends a hand.
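
As a minimal sketch (not necessarily the exact policy used above), a self-service password policy can be built and printed like this in Python. The account ID is a placeholder, and the use of the `${aws:username}` policy variable is an assumption; policy variables are only recognized when the policy declares Version 2012-10-17, which is one reason that line can matter so much.

```python
import json

# Hypothetical placeholder; substitute your own 12-digit AWS account ID.
ACCOUNT_ID = "123456789012"

change_own_password_policy = {
    # Policy variables such as ${aws:username} need this version to be honored.
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["iam:ChangePassword"],
            # ${aws:username} resolves to the calling IAM user's own name,
            # so each user may only change their own password.
            "Resource": f"arn:aws:iam::{ACCOUNT_ID}:user/${{aws:username}}",
        }
    ],
}

# Pretty-print the JSON document that would be attached to a user or group.
policy_json = json.dumps(change_own_password_policy, indent=2)
print(policy_json)
```

Because the dictionary keeps insertion order, the printed document has the "Version" element on its second line, right after the opening brace.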