Newvem’s April Webinar


Dear IAmOnDemand reader, I would like to personally invite you to join an interesting webinar that will take place this Wednesday, April 3rd.

As part of my job at Newvem, I have assembled a power-house panel of some of the top thought-leaders in cloud computing to discuss the importance of a healthy cloud, focusing on cost efficiency, risk tolerance, and resource optimization.

My 5 Enterprise Cloud Predictions for 2013

I believe that this is the year when the enterprise will find its way to the cloud.

The mega Internet sites and applications are the new-era enterprises. These will become the role models for the traditional enterprise. IT needs remain the same with regard to scale, security, SLA, etc. However, the traditional enterprise CIO has already set the goal for next year: 100% efficiency.

The traditional CIO understands that in order to achieve that goal, IT will need to start adopting the cloud, make sure that IT resources are utilized properly, and that his teams move fast.


Cloud Security Management – Overview and Challenges

What’s your first-priority cloud security concern?

From an attacker’s perspective, cloud providers aggregate access to many victims’ data into a single point of entry. As cloud environments become more and more popular, they will increasingly become the focus of attacks. Some organizations think that liability can be outsourced, but no, it cannot! This presentation will answer questions such as: What are the key security challenges for cloud newcomers? What are the options, and how can you start with a safe cloud deployment?

My presentation includes the following and more:

  • The different Cloud security aspects
  • The cloud vendor versus the cloud customer – the responsibility perception
  • How Newvem helps its customers avoid AWS cloud security vulnerabilities by leveraging an ecosystem of cloud vendors

Newvem partnered with the IGT Cloud meetups and launched a series of cloud management forum conferences. These conferences focus on key aspects of cloud management such as cost, security, compliance and more. Each meetup includes several lectures and real case studies. All the sessions are recorded and published on a shared video channel.

Amazon Outage: Is it a Story of a Conspiracy? – Chapter 2

In April 2011, when the east region of Amazon’s cloud failed, I posted the first chapter of the Amazon Cloud Outage Conspiracy. It was already very clear that the cloud would fail again, and here it is… Chapter 2

Let’s first try to understand Amazon’s explanation for this outage.

At approximately 8:44PM PDT, there was a cable fault in the high voltage Utility power distribution system. Two Utility substations that feed the impacted Availability Zone went offline, causing the entire Availability Zone to fail over to generator power. All EC2 instances and EBS volumes successfully transferred to back-up generator power.

Ok. So the AZ power failed over to generator power.

At 8:53PM PDT, one of the generators overheated and powered off because of a defective cooling fan. At this point, the EC2 instances and EBS volumes supported by this generator failed over to their secondary back-up power (which is provided by a completely separate power distribution circuit complete with additional generator capacity).

Ok. So the generator failed over to a separate power circuit.

Unfortunately, one of the breakers on this particular back-up power distribution circuit was incorrectly configured to open at too low a power threshold and opened when the load transferred to this circuit. After this circuit breaker opened at 8:57PM PDT, the affected instances and volumes were left without primary, back-up, or secondary back-up power.

Ok. So the power circuit was not configured right and the computing resources didn’t get enough power (or something like that).

> > > Did you get that?

Sounds like it might be something as simple as someone tripping over a wire that led to all that. Anyway, Quora, Heroku, Dropbox and other sites failed again due to the cloud outage and were down for hours. The power outage resulted in downtime and inconsistent behavior of EC2 services, including instances, EBS volumes and RDS, as well as an unresponsive API.

After about 5 hours, Amazon announced that they had managed to recover most of the EBS (Elastic Block Store) volumes:

“Almost all affected EBS volumes have been brought back online. Customers should check the status of their volumes in the console. We are still seeing increased latencies and errors in registering instances with ELBs.”

Once Quora was back online, I opened the thread – What are the lessons learned from Amazon’s June 2012 us-east-1 outage? Among the great answers submitted, I want to point to one particularly interesting response about the fragility of EBS volumes, which suggested working with instance store-backed instances instead of EBS-backed instances. The differences between the two include cost, availability and performance considerations. It is important to learn these differences and make a smart decision on which to base your cloud environment.

> > > Education

Anyway, back to our conspiracy. In contrast to the last outage, right after this one new Amazon AWS experts were born, reciting the cloud giant’s mantra about its building blocks: Amazon provides the tools and resources to create a robust environment. They proudly tweeted that their AWS-based services didn’t fail. This suggests that the April outage served Amazon well with regard to customer education, though some mega websites still failed again.

So, does Amazon examine whether its customers improved their deployments following last year’s outage? Does the cloud giant continue to teach its customers using outage drills? Is that a conspiracy?

> > > Additional Revenues

The outage once again raised the discussion about how isolated the availability zones (AZs) really are. Again, it seems that the impacted resources in a specific AZ affected the whole AWS east region, generating API latency and inconsistencies (API errors varied from 500s to 503s to RequestLimitExceeded). High availability best practice includes backup, mirroring and distributing traffic between at least two availability zones. The apparent impact on the whole region, and hence the dependency between AZs, strengthens the need to maintain a cross-region or even cross-cloud disaster recovery (DR) practice.
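Errors like the ones listed above (500s, 503s, RequestLimitExceeded) are the kind that client code usually retries with exponential backoff rather than hammering the API. Here is a minimal Python sketch of that idea. It does not use any real AWS SDK: `ApiError`, `call_with_backoff` and the `RETRYABLE` set are hypothetical names made up for illustration, and treating exactly these three codes as retryable is an assumption.

```python
import random
import time

# Error codes named in the paragraph above; treating them as retryable is an assumption.
RETRYABLE = {"500", "503", "RequestLimitExceeded"}


class ApiError(Exception):
    """Hypothetical error type carrying the code returned by a cloud API."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Run call(); on a retryable ApiError, wait and then try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE or last_attempt:
                raise
            # Exponential backoff with full jitter: a random wait of up to
            # base_delay * 2**attempt seconds, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

You would wrap any API call in it, for example `call_with_backoff(lambda: describe_my_volumes())`, where `describe_my_volumes` stands for whatever client function you already have. The jitter is there so that many clients recovering from the same outage do not all retry at the same moment.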

These DR practices require more computing resources and more data transfer (between AZs and regions), meaning significant additional costs, which apparently support the cloud giant’s revenue growth. Is that a conspiracy?

> > > Final words

The cloud giant is a leader and a guide to other IaaS as well as new PaaS players. Without a doubt – Amazon is the Cloud (for now anyway).

To clarify, I don’t think that there is any conspiracy. This is part of the learning curve of the market, including the customers and the vendors, specifically Amazon. Lots of online discussions and articles were published in the last few days explaining what happened and what the AWS cloud’s customers should learn.

No doubt the cloud will fail again. I believe that although the customers are ultimately responsible for the high availability of their services, the AWS cloud guys should also take a step back to learn and improve. Every additional outage diminishes the cloud’s reliability as a place for all.

(Cross-posted on CloudAve)