Pricey Architecture

Or, how architecting in the cloud is different. When designing a cloud-scale, always-on system, system architects are expected to be experienced in the core system requirements - scale, security and high availability. In this day and age, this art is pretty well understood. The public cloud is a great help in driving solutions to those core concerns by providing the hard-to-acquire and hard-to-build foundational elements:
- An apparently infinite amount of compute and storage capacity, for scale
- Fine-grained control at the network and API level, for security
- Fault-zone isolation in the form of independent zones and regions, for HA
However, the tax the cloud imposes on these building blocks, even if not apparent at first, is its own complexity. If you don't consider the pricing models of the underlying building blocks and misapply them, that tax is converted to $'s. Here are a few examples from recent design mishaps I've witnessed. An expensive scaling story: What with a

Dude I'm (not) Getting fired

Or, how to make the conversation about $200 mistakes rather than $20,000 mistakes. It appears that the common wisdom about the cloud has finally caught up - the main benefit of leveraging the cloud is agility; other factors (e.g. cost) are secondary. The ability to go to market quickly, with new prototypes or actual solutions, is critical for competitiveness. The evidence supporting these statements is most visible in the movement of large enterprise organizations into the cloud, and in the growing ecosystem of MSPs and supporting businesses. However, agility while ignoring costs is sometimes risky and... pricey. Here are some horror stories I have heard (and committed) while enjoying the benefits of agility in the cloud: Volumes of Couch Potatoes: To support overnight backend processing in an economical fashion and leverage the dynamic nature of the cloud, we set up an Auto Scaling Group - we automatically provisioned instances & storage to process 100’s of GB’s o
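The "couch potato" trap is easy to quantify. Here's a minimal sketch of what Auto Scaling leftovers can cost - the per-GB rate below is a hypothetical placeholder, not an actual AWS price:

```python
# Estimate the monthly bill for EBS volumes that outlived their instances.
# The price is a hypothetical placeholder, not a current AWS rate.
PRICE_PER_GB_MONTH = 0.10  # assumed $/GB-month

def orphaned_volume_cost(volumes):
    """volumes: list of (volume_id, size_gb, attached) tuples.
    Returns (orphan count, estimated monthly cost of orphans)."""
    orphans = [v for v in volumes if not v[2]]
    total_gb = sum(size for _, size, _ in orphans)
    return len(orphans), total_gb * PRICE_PER_GB_MONTH

volumes = [
    ("vol-1", 500, False),  # left behind after scale-in
    ("vol-2", 500, False),  # ditto - nobody deleted it
    ("vol-3", 100, True),   # still attached, still doing work
]
count, cost = orphaned_volume_cost(volumes)
print(f"{count} orphaned volumes costing ${cost:.2f}/month")
```

Two forgotten 500 GB volumes quietly add up to real money every month - exactly the kind of $200 mistake that, unnoticed, grows into a $20,000 one.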

Why is this blog so UGLY

AND, hard to read to boot. The short answer: intentionally. The rumor: because I can't create a decent UX even if my life depended on it. (Don't believe it.) So why so ugly and hard to read? Stats and tracking, and selective user targeting. Lots of people will read a site with a catchy headline and picture-rich, attractive-looking pages. I do that while waiting in the checkout line, looking for something to pass the time with. The marketing industry has a name for it - ClickBait. I, however, am not looking for clicks. I'm looking to find which ideas resonate with people. I'm looking to see which entries get passed hand to hand and have an elevated readership. So, I keep it ugly intentionally. If you tend to judge books by their cover, please move on. Ugly cover here. Please move off this page in < 5 seconds so as not to skew my stats. If, on the other hand, you find the ideas intriguing, by all means, drop me a note, sing me a song or just enjoy and c

SaaSy Cloudy SSD's

Or, should you abandon old wisdom? In the world of the public cloud little is stable, especially Common Wisdom. Following accepted Common Wisdom blindly leads to lost opportunities to capitalize on these enhanced offerings. Case in point - databases on EC2. Databases are demanding beasts, which present a few challenges: databases tend to be mission critical, and their Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) are very stringent. These demands are somewhat in conflict with the fickle nature of the public cloud - your servers might disappear or fail with little notice. In the datacenter this implied highly redundant hardware and expensive scale-up architectures. You have more data, you get a forklift to deliver a bunch more disks for your SAN (or whatever your storage solution is), and a few more blades/chassis to increase the number of cores in your Oracle RAC cluster. The first generation mapping of the old datacenter architecture into the public cloud had thes

Thinking in Gigabytes and Cents

Originally posted on the CloudHealth Engineering blog. AWS offers a variety of storage options that fit different usage patterns, retention needs and cost profiles. When making an architectural choice of storage in the cloud, today you have a multitude of options to achieve your technical goals. For example, you can use RDS, or stand up your own database using EC2 and EBS. While the two options provide an almost identical service, they differ in flexibility and cost profile - with the right one for you dependent on your specific use case. Below is a summary of the different categories of storage options available from AWS, and a short summary of the price drivers behind them: Functional storage: This category includes storage exposed as a functional component. These are databases, traditional or modern. While these services store data, the main pricing driver is the speed of access - the faster the required access, the higher the price. In this category you’ll find: Relat
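As a toy illustration of how two "almost identical" options diverge on cost profile, consider RDS versus self-managed EC2 + EBS. Every rate below is a hypothetical placeholder chosen for illustration, not an AWS price; the point is the shape of the comparison, not the numbers:

```python
# Toy monthly-cost comparison: managed database (RDS-style) vs.
# self-managed database on EC2 + EBS. All rates are hypothetical.
HOURS_PER_MONTH = 730

def managed_db_monthly(instance_rate, storage_gb, gb_rate=0.115):
    # Managed service: you pay a premium on the instance-hour,
    # but operations are bundled in.
    return instance_rate * HOURS_PER_MONTH + storage_gb * gb_rate

def self_managed_monthly(instance_rate, storage_gb, gb_rate=0.10,
                         ops_hours=5, ops_rate=100):
    # Self-managed: cheaper raw resources, but your own ops time
    # (patching, backups, failover drills) is part of the bill.
    return (instance_rate * HOURS_PER_MONTH + storage_gb * gb_rate
            + ops_hours * ops_rate)

print(f"Managed:      ${managed_db_monthly(0.29, 200):,.2f}/month")
print(f"Self-managed: ${self_managed_monthly(0.23, 200):,.2f}/month")
```

With these assumed rates the self-managed route looks far pricier once operational labor is counted - which is exactly why "almost identical service" does not mean "identical cost profile."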

Agility with Accountability

Or, how to retain fiscal sanity and control over cloud usage. The hum of Lean, Agility, Cloud and DevOps converging has been in the air for a while now. Sometimes it feels like teenagers on prom night discussing their upcoming exploits - many talk about it, some fumble around experimenting, and few are actually doing anything with it. $ per hour... The affinity between the movements is evident - minimize upfront investment until value is proven, while ensuring you can grow your investment and capacity as that occurs. Lean applies this to the business model, Agile to the development process, and DevOps powered by the "resources at the point of the mouse" of the public cloud brings it to resource provisioning. Together, DevOps and the public cloud produce a mixture that allows companies such as AirBnB to prove their business and scale it up. There's one big fly in this otherwise delicious soup - the bean counters. They want to be able to exer

When the doors are locked too tightly...

Or, letting Power Users change their password. This is a short post, venting some frustration with a silly AWS default, in the hopes of sparing someone else the joys. Best security practices call for frequently changing your passwords - that's just common sense. AWS Identity and Access Management goes to the extent of providing some really cool tools to ensure that happens. IAM Roles provide a mechanism to allow software running on designated EC2 instances to retrieve "frequently" rotated access credentials. Seems like a well-thought-out solution to a common problem - how to let your software in EC2 securely access AWS resources, without embedding credentials in your AMI or code. That said, AWS does not, by default, let users change their own console password - even users whose policy is Power User. True, you should probably not really use the console... there's an API, but the default Power User template prevents any and all IAM calls, with this policy statement: Reading
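A sketch of the kind of additional statement that restores self-service password changes. This is from my memory of IAM policy shape - verify the action name and policy grammar against the current IAM documentation before relying on it:

```python
import json

# Hypothetical sketch of an IAM policy that allows a user to change
# their own console password, despite the Power User template's blanket
# denial of IAM actions. Verify action names against the IAM docs.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["iam:ChangePassword"],
            "Resource": "*",
        }
    ],
}

print(json.dumps(policy_document, indent=2))
```

Attached alongside the Power User policy, an explicit Allow like this carves out the one IAM call a user genuinely needs for good password hygiene.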

Cats, Cattle and Zebras

Or, the animal kingdom in the cloud. Something about IT seems to attract references to the animal kingdom. It might be caused by the lively imagination or the unkempt nature of practitioners in the field (I'm thinking about myself). For example, the Agile movement seems to like the story of "the Chicken and the Pig". Just as an indication of how prevalent this meme became, I've been asked (actually, usually I do the asking) who the chickens and the pigs in a meeting are. Only those few unaware of the meme took offense. Like most powerful memes, this one helps communicate briefly a much more complex set of ideas - hence its power. "You promised Cats…" you might say about now, "and all you've talked about are pigs. And when did this blog become veterinarian focused?" Bear with me for a bit... Ok. Cats. A more recent meme I've been hearing about is "Cats vs Cattle", more frequently known as "Pet

Your Customer's pain is not always yours

Or, the one and the many. The inspiration for this post was a discussion with QA folks about how Crowbar should behave when failures are encountered while configuring the storage subsystem on a node. Well, that and a binge of reading and listening to folks talking about Lean Startups and the importance of solving real customer issues. The QA engineer was adamant that on a server with 24 drives, Crowbar should just ignore a single failed drive and use the other 23. For the use case he was trying to solve, this might make sense. He had limited resources (only a handful of servers) and needed to quickly turn up a cluster. The fact that Crowbar flagged the server with the bad disk as having a problem, and refused to use it, was nothing but an annoyance to him. Crowbar was designed to enable DevOps operations at very large scale. In a recent customer install (more about it in another post, I hope), the customer purchased 5 racks of servers, rather than 5 servers,

Democratizing Storage

Or, you control your bits. Traditional storage solutions gravitated towards some central bank of disks - SAN, NAS, Fiber Channel, take your pick. They share a few traits that are not very democratic:
- They cost a lot, and a large part of the cost is the Intellectual Property embedded in the solution (i.e. the markup on the underlying hardware is huge)
- The OEM makes lots of trade-off decisions for you - e.g. the ratio of controllers to disks, replication policies and lots of others (most OEMs graciously expose some options that the user can control, but those are just a tiny fraction)
- They typically require 'forklift updates' - if you use up your capacity, call the forklift to install the next increment, which typically requires a forklift worth of equipment
On the plus side, in general those types of systems provide you with a reliable, performant storage solution (based on the $$$ you spend, you get more of either quality). But, in the world of large scale deployments ba

Openstack 'secret sauce'

Or, some less-than-obvious reasons why refactoring is "A Good Thing". At a meetup tonight, someone challenged me to explain what's really good about OpenStack. This was in the context of an OpenStack-Boston / Chef-Boston discussion about OpenStack, the effort around community deployment cookbooks, and an approach that uses Pull From Source (which I'll post about at a later date). While I could have spent lots of time describing the CI testing infrastructure and the great work done by Monty and his team, frankly that's not unique to OpenStack. It's an enabler for lots of other things. To me, one of the primary sources of excellence in OpenStack is the courage to refactor. Not too long ago, there were only 2 services - Nova for Compute, and Swift for Object Storage. In Grizzly, through large efforts, there are dedicated services with clear focus, each with a dedicated team passionate about its technology area. One of the first refactors was Key

Object Store in OpenStack–the secret weapon

Or, why Swift rocks. Swift is the Object Storage component of OpenStack. Object Storage is a new paradigm for storage. It's not files (as in a filesystem) or blocks (as in SCSI variants), but rather Objects. Objects are immutable; once written, their contents can't be changed, only replaced. Why would you want that? Think Facebook, or Tumblr or Flickr - you're not likely to ever update the content of an image... and the benefits that Swift brings are worth the loss of this capability. To clear some confusion, Swift is SOFTWARE. It is not a piece of hardware or an appliance, like a NAS box. It is not a Service (like the S3 offering from Amazon, or CloudFiles from Rackspace). Swift is software, and free open-source software at that. The cost to deploy it is driven by the choices of hardware (physical or virtual) and the operational choices made. But what considerations drive those decisions, and what are the tradeoffs? A coarse understanding of what magic Swift performs, and how, is requi
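To make the "replace, never update" semantics concrete, here's a toy in-memory model of an object store. This is purely illustrative - it is not the Swift API, just the shape of its semantics:

```python
# Toy model of object-store semantics: objects are immutable, so a PUT
# always stores a complete new copy; there is no partial update.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, container, name, contents):
        # Writing always replaces the whole object wholesale.
        self._objects[(container, name)] = bytes(contents)

    def get(self, container, name):
        return self._objects[(container, name)]

store = ToyObjectStore()
store.put("photos", "cat.jpg", b"v1 image bytes")
# "Updating" an object really means replacing it entirely:
store.put("photos", "cat.jpg", b"v2 image bytes")
print(store.get("photos", "cat.jpg"))
```

Because there is never an in-place edit, replicas only ever need to agree on "which whole copy is newest" - a big part of why the immutability trade-off pays for itself at scale.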

Stacked for Business

Or, my impressions from the OpenStack Folsom Summit. OpenStack proved again that it's a community effort. The number of developers in the design half, and of overall participants in the conference part, was amazing. Most design sessions (I focused on Quantum and a few Nova sessions) were standing room only. Literally. The representation of big-name vendors was palpable, though it was nice to see that big names didn't necessarily get undue influence on the conversation. (For my part, the fun started on the way there, in an RV for 1,800 miles... but that's a story told somewhere else.) My major takeaway is that OpenStack is open for business, in multiple ways. The first is that OpenStack is production ready. This was manifested even prior to the summit, with dueling announcements from HP and Rackspace about deploying the Essex release into their respective production public clouds. A second way in which business friendliness is achieved is by opportunities More supporting evidence

Cloud and AI

Or, Running Lisp Probabilistically, backwards. It started at a talk I attended, mentioned in a previous post. Yes, there's been lots of talk about analytics and the cloud, even made famous in popular media... But the "whats" and the "hows" are in constant flux. The curiosity triggered by the subtitle led to a weekend filled with AI, and to this post. AI is a big space. I'm focusing on one small part, Machine Learning. And specifically, translation and categorization, in exploring how the cloud supports AI, and AI supports the cloud... What does the cloud have to do with machine translation? The early attempts at machine translation (circa the 1950's) went down the path of Natural Language Processing. They failed for various reasons. Current approaches (do the quizzes!!), as in Google Translate, play a matching game: use the cloud to collect ridiculous amounts of sample translations, e.g. a newspaper which publishes in more than one language, restaurant menus and other
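A toy sketch of that matching game - nothing like the real systems' scale or statistical sophistication, but the same shape: count phrase pairs seen in parallel samples, then translate by picking the most frequent observed match. All phrases here are made-up examples:

```python
from collections import Counter, defaultdict

# Toy "statistical" translation: build a phrase table from parallel
# samples, then translate by most-frequent observed match.
def build_phrase_table(parallel_samples):
    table = defaultdict(Counter)
    for source, target in parallel_samples:
        table[source][target] += 1
    return table

def translate(phrase, table):
    if phrase not in table:
        return phrase  # no evidence collected; pass through unchanged
    return table[phrase].most_common(1)[0][0]

# Pretend these pairs were harvested from bilingual menus and newspapers.
samples = [
    ("poulet roti", "roast chicken"),
    ("poulet roti", "roast chicken"),
    ("poulet roti", "roasted chicken"),
    ("soupe du jour", "soup of the day"),
]
table = build_phrase_table(samples)
print(translate("poulet roti", table))  # the majority match wins
```

The interesting part is what the cloud contributes: not a cleverer algorithm, but the ability to collect and index a ridiculous number of these samples.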

Cloudy data

Or, if you have the bits, but not the information... I'm sitting at an event which combines a common thread of thoughts that has been floating in my head - Hadoop deals with lots of data... but how do I get to the information contained there? Ta-da... Big Data and machine learning are made for each other. I've just learned about Mahout - a tool to make information out of data by using machine learning! Definitely something to look at. Update: I've spent some time reading and digesting some of the AI topics, with the results in this follow-up post.

Openstack and Pizza !!!

Or... some thoughts about organizing OpenStack meetups. On February 1st I had the joy (and the hassle) of coordinating another Boston OpenStack meetup. This time, using Harvard facilities, which are quite different from our previous venue - suffice to say that Harvard is very different from the Lexington Historical Society (you don't need to know where the vacuum is). But I digress. If you were in the room, you know all about the value of a community, so skip ahead to the "closing notes" for links to preso's, future events, acknowledgements and such. What I find exciting about open source projects is the sense of shared mission and destiny. An open source project fails or succeeds to a large extent based on how well it builds a community. To be successful, a project needs to create a dedicated community of users, developers, vendors and service providers. Users have real problems to solve. Real use cases, real businesses, real money. Developers want to wri

To Be (HA) or Not to Be

Or, what does it really mean to be highly available in the cloud? Good IT practices try to maximize SLA conformance, especially around availability. Lessons learned from a disk failure in the Exchange server leading to mail outages and the inevitable fire drills have been deeply embedded into our minds. REDUNDANCY EVERYWHERE. Power supplies, network connections, disks - if you can put 2 of them suckers in there, you do. Just to keep that machine running. That machine should never fail. The web has mitigated things somewhat. Rather than relying on hardware redundancy (where you don't use half your equipment), deployment strategies have evolved. A large pool of web servers can sustain SLA's even with some servers failing, by utilizing load balancers to direct traffic only to live web servers. This scheme brings with it worries about session-state availability and other shared information (e.g. a database), but nonetheless it's progress. Since hardware is now allowed to fail, software devel
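The load-balancer scheme above can be sketched in a few lines: keep a health flag per server, and only route to the live ones. A minimal illustration (the class and server names are made up for this sketch), not a real balancer:

```python
import itertools

# Minimal sketch of health-aware round-robin: requests only go to
# servers that passed their last health check.
class TinyBalancer:
    def __init__(self, servers):
        self.health = {s: True for s in servers}
        self._cycle = itertools.cycle(servers)

    def mark_down(self, server):
        # A failed health check takes the server out of rotation.
        self.health[server] = False

    def route(self):
        # Skip dead servers; the SLA holds as long as any server is live.
        for _ in range(len(self.health)):
            server = next(self._cycle)
            if self.health[server]:
                return server
        raise RuntimeError("no healthy servers")

lb = TinyBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.route() for _ in range(4)])  # web2 never appears
```

The pool absorbs individual failures instead of preventing them - which is precisely the shift that lets hardware be allowed to fail.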