Cloud (AWS and Azure) Fault Tolerance for Biotech and Life Science Companies – Part 1.

AWS (Amazon Web Services) and Microsoft Azure are the #1 and #2 public cloud vendors, respectively. There is a good chance that any cloud application a Biotech or Life Science company is considering is hosted by one of these vendors. In an earlier post we provided some questions you can ask cloud application vendors to evaluate their redundancy and fault tolerance, so we won’t repeat those here.

But it is also common for Biotech and Life Sciences data scientists to host their own data analysis and database systems in AWS or Azure. So, it is prudent for your IT team to evaluate what type of redundancy and fault tolerance these “home-grown” systems have in place. As a Boston-area IT support firm with a strong focus on Biotech and Life Sciences, we are often called upon to do such an evaluation. To evaluate fault tolerance, there are some fundamental areas you should look at. Today we will talk about Availability Zones and Regions.

AWS and Azure are made up of thousands of servers housed in data centers scattered throughout the globe. When a company is designing infrastructure in AWS or Azure, they have to think about fault tolerance. In other words, how do they make sure the system still works even if parts of it fail? By default, a solution in AWS or Azure resides in a single data center. This is where Availability Zones and Regions come in.

Availability Zones are groups of data centers within a Region that have some level of shared technology services. This means that it is possible for a technical problem to affect an entire Availability Zone.

For fault tolerance, an application can be replicated across more than one Availability Zone so that it can keep running even if an Availability Zone goes down. There is additional cost in having duplicated infrastructure, of course. The exact way in which the system is duplicated (“hot spares”, “warm spares”, “cold spares”) greatly affects both the cost and the responsiveness of these “fail-over” systems.
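To get a feel for what replication across Availability Zones buys you, here is a minimal Python sketch. It assumes that zones fail independently of one another and that fail-over is instantaneous, which is only an approximation (warm and cold spares in particular take time to come online). The 99.9% single-zone availability figure is a hypothetical number chosen for illustration, not a published AWS or Azure value.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single_zone_availability: float, zones: int) -> float:
    """Chance that at least one of `zones` copies is up.

    Assumes each zone fails independently (a simplification).
    """
    if not 0.0 <= single_zone_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("need at least one zone")
    # The system is down only when every zone is down at the same time.
    return 1.0 - (1.0 - single_zone_availability) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical single-zone availability of 99.9%.
for zones in (1, 2, 3):
    a = combined_availability(0.999, zones)
    print(f"{zones} zone(s): about {expected_downtime_hours(a):.4f} hours down per year")
```

The takeaway is the shape of the curve rather than the exact numbers: each additional zone cuts the expected downtime sharply, while the infrastructure cost grows roughly in step with the number of copies.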

The next higher level of grouping in both AWS and Azure is the Region. Regions are geographical groupings of data centers. Azure, for example, has 54 regions and is available in 140 countries. In the US, Azure has the following general-purpose Regions:

  1. East US
  2. East US 2
  3. Central US
  4. South Central US
  5. West Central US
  6. West US
  7. West US 2

The data centers in each region have some level of shared technology services. This means that it is possible for a technical problem to affect an entire region. For applications that truly cannot afford downtime, the infrastructure would be replicated across different regions so that if an entire region went offline the application would still run in the other regions. Replication across regions provides a higher level of protection than replication across Availability Zones, but at a higher cost. In addition to paying for duplicate infrastructure, you are charged for the transfer of data between regions. Data transfer between Availability Zones within a single region is generally cheaper, but it is not necessarily free: AWS, for example, charges for most traffic that crosses Availability Zones.
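When we evaluate a home-grown system, one of the first questions is simply where its pieces live. The sketch below shows one way to classify a deployment as single-zone, multi-zone, or multi-region. The `Deployment` record and the inventory are hypothetical; in practice you would build the list from your cloud provider's console or API. The region names come from the list above, and the zone labels are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One running copy of an application (hypothetical record format)."""
    name: str
    region: str
    zone: str


def redundancy_level(deployments: list[Deployment]) -> str:
    """Classify how widely an application's copies are spread."""
    if not deployments:
        raise ValueError("no deployments to evaluate")
    regions = {d.region for d in deployments}
    # Zone labels repeat between regions, so pair each one with its region.
    zones = {(d.region, d.zone) for d in deployments}
    if len(regions) > 1:
        return "multi-region"
    if len(zones) > 1:
        return "multi-zone"
    return "single-zone"


# Hypothetical inventory for an analysis database.
inventory = [
    Deployment("analysis-db-primary", "East US", "1"),
    Deployment("analysis-db-replica", "East US", "2"),
]
print(redundancy_level(inventory))
```

A result of "single-zone" for a system the business depends on is usually the first thing worth flagging in an evaluation.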

In Part 2 of this post we will explain what to look for in another important area of system redundancy -- data storage redundancy.

Why should you care? The more fault-tolerant a cloud application is the lower the risk of downtime. More fault tolerance is better, but this comes at a cost. The criticality of the solution will determine if it makes economic sense to replicate across Availability Zones or Regions.
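The economic question can be framed as a simple break-even check, sketched below in Python. All three inputs are assumptions you would have to estimate for your own system: the hours of downtime that replication is expected to prevent each year, what an hour of downtime costs the business, and the extra annual cost of the duplicate infrastructure (including any data transfer charges). The figures in the example are hypothetical.

```python
def replication_pays_off(
    downtime_hours_avoided_per_year: float,
    cost_per_downtime_hour: float,
    extra_annual_cost: float,
) -> bool:
    """True when the downtime cost avoided exceeds the cost of replicating."""
    avoided_cost = downtime_hours_avoided_per_year * cost_per_downtime_hour
    return avoided_cost > extra_annual_cost


# Hypothetical: 8 hours avoided per year, $24,000 per year of extra infrastructure.
print(replication_pays_off(8.0, 5000.0, 24000.0))  # critical system, $5,000 per hour of downtime
print(replication_pays_off(8.0, 500.0, 24000.0))   # less critical system, $500 per hour of downtime
```

This ignores harder-to-price factors such as lost experiment time or regulatory exposure, so treat it as a starting point for the conversation rather than a final answer.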

A managed IT services company like GizmoFish that is experienced with Biotech and Life Sciences companies and knowledgeable about cloud platforms like AWS and Azure can help you either evaluate cloud systems that use these platforms or build your own infrastructure on them.