Enterprise Grade Cloud Enabled by the Ecosystem

While investing in building new data centers all over the world and creating the management overlay needed to sell their hardware capacity, IaaS operators are also relying on their ecosystem to support the enterprises that are moving to the cloud (i.e. the “Enterprise Grade Cloud”).

API First: The move to the cloud pushes the data center to reinvent itself in the new environment. Although the cloud is a pure revolution (at least IMHO), terms such as SLA, TCO and ROI are still valid in this new IT era. Following industry leaders such as Salesforce.com, which embraced the notion of “API first”, vendors such as Amazon expose new capabilities through their APIs first. In this way, the cloud operator’s platform enables its ecosystem to develop.


5 Key Essentials of Cloud Workloads Migration

The benefits of migrating workloads between different cloud providers, or between private and public clouds, can only truly be realized with an understanding of the cloud business model and of cloud workload management. Cloud adoption seems to have reached the phase where advanced cloud users are creating their own hybrid solutions or migrating between clouds while striving to achieve interoperability across their systems. This article aims to answer some of the questions that arise when managing cloud workloads.


The Cloud in HP’s Cloud (Part 2): HP Discover, the Enterprise and AWS Cloud

Last month I attended HP Discover (disclosure: my participation was funded by Ivy World). The IT war has already started, yet HP is standing still, not taking the initiative or the real risks that true leaders take. At the three-day conference I learned why some companies don’t last and why this IT giant is at great risk of losing this new-era IT battle. This is the story of a long-lasting company that might have already lost.

> > > HP Washes the Cloud


Who Stole My CPU?

One of the most important features of the cloud is that resources are shared by multiple tenants. Without sharing, and without being able to optimize resource utilization, the cloud operator can’t provide scalability or achieve the “economies of scale” its business depends on. Behind its “cloud magic”, a public IaaS cloud consists of real hardware: compute, storage and network devices. To optimize the utilization of these resources by matching demand over time, they must be shared between cloud consumers.

What is Steal Time?
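Steal time is the share of time a virtual CPU was ready to run but had to wait because the hypervisor was serving another tenant’s virtual machine. On a Linux guest it appears as the `st` column in `top` and `vmstat`, and as the eighth counter on the aggregate `cpu` line of `/proc/stat`. Below is a minimal Python sketch, assuming a Linux guest whose kernel reports that counter; the function names are illustrative only and are not part of any cloud provider’s tooling.

```python
# Minimal sketch: estimate CPU steal percentage on a Linux guest from two
# samples of the aggregate "cpu" line of /proc/stat.
# Assumes a Linux kernel that reports the steal counter (8th field).


def parse_cpu_line(line: str) -> list[int]:
    """Return the numeric jiffy counters from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    counters = [int(x) for x in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal counter")
    return counters


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Linux field order: user nice system idle iowait irq softirq steal
    [guest guest_nice]. Only the first eight fields are summed, because
    guest time is already included in user/nice.
    """
    a = parse_cpu_line(before)[:8]
    b = parse_cpu_line(after)[:8]
    deltas = [new - old for old, new in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    return 100.0 * deltas[7] / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

For example, call `read_cpu_line()`, sleep for a second, call it again and pass both lines to `steal_percent()`. A persistently high value suggests that neighbors on the same physical host are competing for its cores.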


Amazon Outage: Is it a Story of a Conspiracy? – Chapter 2

In April 2011, when Amazon’s US East region failed, I posted the first chapter of the Amazon Cloud Outage Conspiracy. It was already very clear that the cloud would fail again, and here it is… Chapter 2.

Let’s first try to understand Amazon’s explanation for this outage.

At approximately 8:44PM PDT, there was a cable fault in the high voltage Utility power distribution system. Two Utility substations that feed the impacted Availability Zone went offline, causing the entire Availability Zone to fail over to generator power. All EC2 instances and EBS volumes successfully transferred to back-up generator power.

Ok. So the AZ power failed over to generator power.

At 8:53PM PDT, one of the generators overheated and powered off because of a defective cooling fan. At this point, the EC2 instances and EBS volumes supported by this generator failed over to their secondary back-up power (which is provided by a completely separate power distribution circuit complete with additional generator capacity).

Ok. So the generator failed over to a separate power circuit.

Unfortunately, one of the breakers on this particular back-up power distribution circuit was incorrectly configured to open at too low a power threshold and opened when the load transferred to this circuit. After this circuit breaker opened at 8:57PM PDT, the affected instances and volumes were left without primary, back-up, or secondary back-up power.

Ok. So the back-up power circuit was misconfigured, and the computing resources were left without any power at all (or something like that).

> > > Did you get that?

It sounds like something as simple as someone tripping over a wire could have led to all that. Anyway, Quora, Heroku, Dropbox and other sites failed again due to the cloud outage and were down for hours. The power outage caused downtime and inconsistent behavior across EC2 services, including instances, EBS volumes and RDS, as well as an unresponsive API.

After about 5 hours, Amazon announced that it had managed to recover most of the EBS (Elastic Block Store) volumes:

“Almost all affected EBS volumes have been brought back online. Customers should check the status of their volumes in the console. We are still seeing increased latencies and errors in registering instances with ELBs.”

Once Quora was back online, I opened the thread “What are the lessons learned from Amazon’s June 2012 us-east-1 outage?”. Among the great answers submitted, I want to point to one particularly interesting piece of feedback about the fragility of EBS volumes, which suggested using instance-store-backed instances instead of EBS-backed ones. The two options differ in cost, availability and performance. It is important to learn these differences and make a smart decision about which one to base your cloud environment on.

> > > Education

Anyway, back to our conspiracy. Unlike the last outage, right after this one new AWS experts were born, repeating the cloud giant’s mantra about its building blocks (Amazon provides the tools and resources to create a robust environment) and proudly tweeting that their AWS-based services didn’t fail. This shows that the April 2011 outage served Amazon well when it comes to customer education, although some mega websites still failed again.

So, does Amazon check whether its customers improved their deployments after last year’s outage? Does the cloud giant keep teaching its customers through outage drills? Is that a conspiracy?

> > > Additional Revenues

The outage once again raised the question of how isolated the availability zones (AZs) really are. Again, it seems that the impacted resources in a single AZ affected the whole AWS US East region, generating API latency and inconsistencies (API errors ranged from 500s to 503s to RequestLimitExceeded). High-availability best practice includes backups, mirroring and distributing traffic across at least two availability zones. The region-wide impact, and hence the apparent dependency between AZs, strengthens the case for maintaining cross-region or even cross-cloud disaster recovery (DR) practices.
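To make the multi-AZ advice concrete, here is a minimal Python sketch of one possible routing policy: spread traffic across healthy endpoints in the primary region, and add the endpoints of a disaster-recovery region once fewer than two AZs remain healthy. The `Endpoint` type, the `pick_targets` and `round_robin` helpers and the two-zone threshold are hypothetical illustrations; this is not how Elastic Load Balancing or Route 53 work internally, and health checking is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Endpoint:
    """A hypothetical backend endpoint; health is assumed to be probed elsewhere."""
    name: str
    region: str  # e.g. "us-east-1"
    zone: str    # e.g. "us-east-1a"
    healthy: bool = True


def pick_targets(endpoints, primary_region, min_zones=2):
    """Choose which healthy endpoints should receive traffic.

    Traffic stays in the primary region while its healthy endpoints still span
    at least `min_zones` availability zones. Below that, the healthy endpoints
    of the other (DR) regions are added, trading extra cost for availability.
    """
    healthy = [e for e in endpoints if e.healthy]
    primary = [e for e in healthy if e.region == primary_region]
    if len({e.zone for e in primary}) >= min_zones:
        return primary
    dr = [e for e in healthy if e.region != primary_region]
    return primary + dr


def round_robin(targets, n_requests):
    """Assign n_requests to the chosen targets in turn; returns endpoint names."""
    if not targets:
        raise RuntimeError("no healthy endpoint in any region")
    ring = cycle(targets)
    return [next(ring).name for _ in range(n_requests)]
```

The design choice here is that losing AZ redundancy, not only losing every endpoint, is what brings the DR region in, reflecting the observation that a single-AZ failure can degrade the whole region.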

These DR practices require more computing resources and more data transfer (between AZs and regions), which means significant additional costs that apparently support the cloud giant’s revenue growth. Is that a conspiracy?

> > > Final words

The cloud giant is a leader and a guide for other IaaS players as well as new PaaS players. Without a doubt, Amazon is the Cloud (for now, anyway).

To clarify, I don’t think there is any conspiracy. This is part of the market’s learning curve, for customers and vendors alike, and specifically for Amazon. Lots of online discussions and articles were published in the last few days explaining what happened and what AWS customers should learn.

There is no doubt that the cloud will fail again. I believe that although customers are ultimately responsible for the high availability of their services, the AWS cloud guys should also take a step back to learn and improve: every additional outage diminishes the cloud’s reliability as a place for everyone.

(Cross-posted on CloudAve)

My View on CloudConnect 2012

Last week I attended one of the most popular cloud technology conferences in the world, CloudConnect, which started about four years ago. Attending the event gave me a clear understanding of the market’s maturity and pace of evolution. The following sections cover the main points I heard and learned:

>  >  >  >  >  Cloud Performance 

Underlying infrastructure performance, round-trip time, bandwidth, caching and rendering are the major factors in an online service’s performance. In an interesting presentation, @joeweinman (known for his famous “Cloudonomics” theory) claimed that latency carries the greatest weight among these factors. I encourage you to check out his new research, “As Time Goes By: The Law of Cloud Response Time”, which presents good formulas, methods and considerations regarding online services’ performance and latency (including simple facts, for example that people tend to prefer choosing from fewer options on a page, so you can put less content on a page and achieve better browsing performance).
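A rough way to see why latency can outweigh the other factors is a simple additive model: response time is roughly server time plus round trips times RTT, plus payload transfer time, plus rendering time. This is a generic back-of-envelope sketch in Python, not a reproduction of Weinman’s formulas; the function name, parameters and example figures are assumptions chosen for illustration.

```python
def response_time_ms(rtt_ms, round_trips, payload_bytes, bandwidth_mbps,
                     server_ms=0.0, render_ms=0.0):
    """Back-of-envelope response time (ms) for one page view.

    Assumed additive model: server processing + round trips x RTT +
    payload transfer time + client-side rendering. Ignores TCP slow start,
    parallel connections and packet loss.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    # 1 Mbps = 1,000 bits per millisecond
    transfer_ms = payload_bytes * 8 / (bandwidth_mbps * 1000)
    return server_ms + round_trips * rtt_ms + transfer_ms + render_ms
```

With an 80 ms RTT, four round trips, a 500,000-byte page on a 10 Mbps link, 50 ms of server time and 100 ms of rendering, the model gives 50 + 320 + 400 + 100 = 870 ms. Halving the page saves 200 ms, while every round trip removed (for example, through caching) saves a full RTT.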

“Multi-tenancy leads to noisy neighbor syndrome,” noted @jungledave, founder and CEO of SolidFire. The lack of SSD storage in cloud offerings (mostly due to its high cost) is known to make cloud storage performance unpredictable. I invite you to listen to @neovise’s recent podcast with Dave, which discusses solid-state disks (SSDs) and cloud computing. FYI, Amazon AWS has already caught on to the need for fast and robust storage and deployed DynamoDB on SSDs, which offer predictable performance and greatly reduced latency across the board.

The best presentations are like movies: they should be based on real cases (keep that message in mind; I talk about it more later). One such case is Netflix. Netflix cloud architect @adrianco presented methods and principles for scaling data in the cloud, including Big Data management, availability, performance, security and more. I suggest checking out his presentation (a cool Prezi one) to get the list of vendors and AWS components Netflix uses to optimize its data delivery over the cloud.

It was funny that only the last session, presented by @lmacvittie, pointed out the “obvious” first step: start by understanding what causes the performance issues, and only then try to solve them. I say “obvious” because the appealing ease of provisioning cloud apps and resources leads to the “unknown cloud” symptom (due to uncontrolled sprawl), which contributes to performance uncertainty. The “unknown cloud” issue found strong support in the next morning’s keynote by @gevaperry, who noted that “a lot has already [been] said about CIOs who don’t know about their own cloud use”. Geva presented a survey that clearly shows that in the enterprise, the decision to adopt cloud computing is made by the development or business units and not by the IT team. Are you surprised?

From my deep familiarity with the market, I can confidently add that despite cloud consumers’ recognition of the need to “cut through the fog” of the cloud, proven ways to actually do so are not really available in today’s young market. 

>  >  >  >  >  DevOps doesn’t exist 

I attended the panel “In Search of Mad Cloud Skills”, led by the cloud-famous @DavidLinthicum and composed of four IT leaders. David asked some great but simple questions that the participants seemed to struggle to answer. One trivial question, “What do you need to find in the candidate for a DevOp?”, brought the discussion around to the obvious need for someone with development skills who also understands the business needs. The session’s title matched the panel members’ actual comments: it is difficult (“mad”) to find the right skills for their DevOps team.

For me, this session brought an end to the NoOps vs. DevOps debate. The “DevOps team” is in fact a development team that plays with virtual blocks in the cloud kindergarten. Integrating the product with the cloud is actually a task for R&D under the auspices of the CTO. That leads to the understanding that the enterprise CIO is actually the new enterprise CTO; if we are talking about an ISV, then the CIO effectively becomes a senior R&D team leader. NoOps rules, and the CIO should look for architects and developers. Learning the cloud’s building blocks and APIs is a task for the R&D (remember: “Research and Development”) team, just like learning the overall software offering and the business workflows it supports.

>  >  >  >  >  The Openness of Cloud 

Wednesday’s keynote included a panel with Red Hat, Citrix and Rackspace, moderated by @acroll (a great moderator and presenter), discussing the perception of “open” in the cloud.

The great discussion about the openness of the cloud actually led to some online #ccevent tweets using the phrase “open washing”, reinforcing the view that some traditional mega vendors are actually “cloud washers” presenting an “enterprise cloud” that is in fact a hosted environment backed by traditional professional services. (You can check out my opinion of HP’s cloud offerings in a past post.)

“An enlightening panel at #ccevent was the ‘open cloud’ conversation but not for the right reasons. ‘Open washing’ season has started,” tweeted @swardley.

These vendors struggle not only with the fact that Amazon is taking big chunks of their main market, but also with the difficulty of proving the profitability of a real cloud delivery offering based on a true pay-per-use model.

“Citrix: we hate VMware. Red Hat: we hate Microsoft. Rackspace: we hate Amazon,” tweeted @acroll once he got off the stage.

The cloud has put the need for “open” on the table. It makes IT consumers (including traditional enterprise IT) look for open systems, including open source ones. The cloud forces IT vendors to prove a low level of lock-in and a robust API that enables their customers to update and customize applications at low cost, with no manual touch. Check out MS Azure’s marketing messages about its efforts to support open source frameworks (though I am not sure they are really “open”).

“Open” is definitely one of the important criteria when choosing a solution vendor. The “open” cloud vendor shares its code with the community in order to help others, including its own customers, come up with better solutions. The “open ISV” isn’t afraid to “lose” its proprietary code to competitors and finds that being “open” actually increases awareness and a positive view of its brand, as well as the maturity of its offering.

>  >  >  >  >  “Amazon is Snow White” said @adrianco 

At first I was not sure why Amazon didn’t exhibit at the famous CloudConnect conference, but after asking several important people, the simple conclusion is that, as the strongest market leader, Amazon can afford to leave the marketing to the crowd. As the beautiful princess in town, you attend only your own parties, and you definitely don’t want to position yourself among the dwarfs.

CloudConnect was really about the major IT market disruption Amazon has been leading for the past few years. In almost every session, the discussion about cloud was actually a discussion about the Amazon AWS offering and its design partner, Netflix. Every other offering, such as OpenStack, Rackspace Cloud and IBM’s cloud offerings, was constantly compared with the AWS cloud. The thought of suggesting they rename CloudConnect to AWSConnect never entirely left my head (although this might make some of the @Clouderati guys really uncomfortable).

>  >  >  >  >  Q: What did CloudConnect miss? A: Real Case Studies

I noted above that great movies are based on real stories; the same goes here. I wasn’t in all the sessions, but as a dedicated follower of #ccevent who listened carefully to some of the industry’s leading thinkers, I think most of the sessions were still more theoretical than practical. You are welcome to check out the conference presentations.

It is not surprising that the best sessions were those presented by organizations that have already found their way to the cloud, whether fully public (Netflix) or mostly private (Zynga’s zCloud). I suggest finding the lecture by Zynga’s CTO of Infrastructure in the conference’s list of recorded videos.

Personally, I think it would have been great to have more sessions and stories based on actual cloud architectures, on shifting legacy applications to the cloud, and on actual ROI optimization. The market is still totally immature and on shaky ground. Vendors don’t really know how to present their offerings, and even the simple phrase “cloud cost” has several interpretations. ISVs and enterprises are misled by the mega vendors, which is one of the major factors slowing down the pace of cloud adoption. Six months ago I would have said 2-3 years to reach market saturation; CloudConnect made me more realistic, and now I think more about 3-5 years.

CloudConnect was a great opportunity for me to meet all the cloud rockstars I had been tweeting with over the last year, great cloud evangelists. Someone said he felt like he was walking through his Twitter home feed. I found the cloud in Twitter: great performance, mobile, open and available. It proves that the cloud serves my actual needs for networking, communication and knowledge.



The Cloud in HP’s Cloud

Last week I was invited to the HP Tech Day at HP’s campus in Houston to learn and hear more about the giant’s cloud offering. I very much appreciate HP and Ivy for the invitation and for a great event where I was able to learn more and see these clouds for real. I had the privilege of meeting savvy and professional guys. It is always great to see people who are enthusiastic about their jobs and proud of their company. Let me share HP’s cloud with you from my point of view.

> > > The EcoPOD

HP’s guys took my fellow bloggers and me on a great journey inside HP’s cloud. The most fascinating adventure for me was the HP EcoPOD, an out-of-the-box, ready-made hosting/cloud infrastructure creature. The product’s finish is a work of art, and without a doubt HP is still a great infrastructure market leader. EcoPOD units serve IaaS providers, huge enterprises and mega websites. The investment in buying this ready-made bank of servers can be spread over a 3- to 5-year commitment, so you can actually consider it a subscription-based service. HP’s private cloud offering dominated the tech day, including support for bursting internally or to a public cloud supported by Savvis. Read more about HP’s cloud bursting on TechTalk by Philip Sellers.

> > > The Cloud In HP’s Cloud

The second part of the IaaS is the software for provisioning, maintaining and controlling the cloud resources. For this, HP conducted a several-hour demonstration of its CloudSystem product. Once the cloud infrastructure is deployed, the enterprise can provision virtual resources, orchestrate them and create a catalog of app stacks using CloudSystem. One of the platform’s main features is Cloud Maps (I really love the name), which enables enterprise IT to plan and create new app stacks or even import ready-made ones straight from the HP web portal. The UI/UX is very compelling, though the management capabilities are very basic. I am not sure I saw a real cloud environment rather than an upgraded virtualization control and provisioning application. Following my questions on that, I was told that there are some implementations of an elastic environment using custom adjustments. HP also revealed that it is working on an OpenStack implementation, though I wasn’t convinced that there are serious plans in this regard. Given the lack of out-of-the-box features such as auto-scaling and elasticity, and the lack of a real cloud perception in which a server is just one atomic unit, I still wonder: where is the cloud in HP’s cloud?
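For readers wondering what out-of-the-box auto-scaling means in practice, a common building block is a proportional target-tracking rule: scale the instance count by the ratio of the observed metric to its target, ignore small deviations to avoid flapping, and clamp the result to a floor and a ceiling. The Python sketch below assumes average CPU utilization as the metric; the function name, defaults and tolerance are hypothetical and do not describe CloudSystem or any specific product.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=60.0,
                      min_n=2, max_n=20, tolerance=0.1):
    """Proportional target-tracking rule for a hypothetical auto-scaler.

    desired = ceil(current * avg_cpu / target_cpu), unless the ratio is within
    `tolerance` of 1.0 (then keep the current size to avoid flapping),
    clamped to [min_n, max_n].
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("need a running fleet and a positive target")
    ratio = avg_cpu / target_cpu
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        desired = math.ceil(current * ratio)
    return max(min_n, min(max_n, desired))
```

For instance, four instances averaging 90% CPU against a 60% target grow to six, and ten instances averaging 15% shrink to three. A real platform would also need cooldown periods and health checks, which this sketch omits; the point is that each server is treated as an interchangeable unit the platform adds or removes on its own.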

In a “cloud security” session, I raised a basic cloud security issue: the enterprise needs to be able to maintain SSO and IAM solutions across its entire application portfolio, including SaaS applications. I asked whether HP supports such features or plans to do so in the future. HP’s response was not satisfying and led me to think again about the extreme separation between infrastructure and applications that the cloud has brought. The answer I expected to hear was really simple: as an IaaS provider, HP focuses on internal network security and on access to on-premise physical and virtual resources, while the SaaS players are responsible for providing extensions that integrate with the enterprise private cloud and support issues such as SSO.

It is evident that the cloud has brought the need to re-position traditional IT vendor offerings and make sure each one is mapped to a specific cloud layer (IaaS, PaaS or SaaS); otherwise it is a confusing play that puts the business’s future at great risk.

> > > Conclusion

It is clear that this veteran market leader, like other IT giants, finds itself re-segmented under a new definition: IaaS vendor. The giant is struggling to reach a leadership position in this emerging market, surrounded by strong competition from old rivals such as IBM and Oracle. Furthermore, I think even greater competition comes from advanced cloud vendors such as Amazon, Rackspace, Salesforce and others that are already taking a large share of the market. I find it exciting to watch the market evolve, to see how new business threats are born and how the industry giants push hard to find their golden path all over again.