1.0 Introduction
Most enterprise information technology (IT) organizations have data centers with limited IT staff, data center space, hardware, and budgets. To avoid adding more of these resources, or to use the resources they already have more effectively, many organizations now use external IT services to augment their internal capabilities and services. Examples of such services are Microsoft Office 365 and Microsoft Dynamics CRM Online. Services that are provided by external providers typically exhibit the five essential characteristics of cloud computing (on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service) that are defined in The NIST Definition of Cloud Computing.
In the remainder of this document, the term “cloud services” refers to services that exhibit the United States National Institute of Standards and Technology (NIST) essential characteristics of cloud computing. Services that do not exhibit these characteristics are referred to simply as “services.” Services often don’t exhibit many, if any, of the essential characteristics. Another term that is used throughout this document is “technical capabilities.” Technical capabilities are the functionality that is provided by hardware or software; when they are used together in specific configurations, they provide a service, or even a cloud service.
For example, to provide a messaging service in your environment, you’d use an email server application, network, servers, name resolution, storage, authentication, authorization, and directory technical capabilities, at a minimum. If you wanted to provide that same messaging service as a cloud service in your environment, you’d add capabilities such as a self-service portal, and probably an orchestration capability (to execute the tasks that support the essential cloud characteristic of on-demand self-service).
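The composition described above can be sketched as simple capability sets. The capability names and the set-union model here are illustrative, not an official taxonomy:

```python
# Illustrative sketch: a service as a set of technical capabilities, and a
# cloud service as that set extended with self-service capabilities.
MESSAGING_SERVICE = {
    "email server application", "network", "servers", "name resolution",
    "storage", "authentication", "authorization", "directory",
}

# Capabilities that enable the essential characteristic of self-service.
CLOUD_ADDITIONS = {"self-service portal", "orchestration"}

def as_cloud_service(capabilities: set) -> set:
    """Return the capability set extended for cloud delivery."""
    return capabilities | CLOUD_ADDITIONS

cloud_messaging = as_cloud_service(MESSAGING_SERVICE)
print("self-service portal" in cloud_messaging)  # True
```

The same underlying capabilities remain in place; cloud delivery is modeled as an additive change on top of them.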
Rather than having individuals consume cloud services from external providers independently, a department within the IT organization typically establishes a relationship with an external provider at the organizational level. This department consumes the service at the organizational level, integrates the service with some of its own internal technical capabilities and/or services, and then provides the integrated hybrid service to consumers within the organization. These consumers are often unaware of whether the service is owned and managed by their own IT organization or by an external provider. And they don’t care who owns it, as long as the service meets their requirements.
An important consideration when dealing with a hybrid IT infrastructure is that while the in-house IT department will be seen as a provider of cloud services to the corporate consumer of the hybrid IT solution, it is also true that the IT organization itself is a consumer of cloud services. That means that there are multiple levels of consumers. The corporate consumer might be considered a second-level consumer of the public cloud services, while the IT organization might be considered a first-level consumer of the service. This has important implications when thinking about the architecture of the solution. This issue will be covered later in this document.
This document details the design considerations and configuration options for integrating Windows Azure Infrastructure Services (virtual machines (or “compute”), network, and storage cloud services) with the infrastructure capabilities and/or services that currently exist within typical organizations. This discussion will be driven by requirements and capabilities. Microsoft technologies are mentioned within the context of the requirements and capabilities and not vice versa. It is our expectation that this approach will resonate better with architects and designers who are interested in what problems must be solved and what approaches are available for solving these problems. Only then is the technology discussion relevant.
The primary audience for this document is the enterprise architect or designer who is interested in understanding the issues that need to be considered before engaging in a hybrid IT project, and the options available that enable them to meet the requirements based on the key infrastructure issues. Others that might be interested in this document include IT implementers who are interested in the design considerations that went into the hybrid IT infrastructure they are tasked to build.
The purpose of this document is two-fold. The first purpose is to provide the enterprise architect or designer with a collection of issues, and the questions that need to be answered for each issue, when building a hybrid IT infrastructure. The second purpose is to provide the enterprise architect or designer with a collection of options that can be evaluated and chosen based on the answers to those questions. While the questions and options can be used with any public cloud service provider's solution, examples of available options will focus on Windows Azure. In addition, this document includes:
This document was conceived and written with the assumption that enterprise IT should not simply replicate the current datacenter in the cloud. Instead, it is assumed that enterprise IT would like to base a new solution on new architectural principles specific to a hybrid IT environment. This document focuses on the hybrid IT infrastructure because core infrastructure issues need to be addressed before even considering creating a single virtual machine for production. Issues revolving around security, availability, performance, and scalability need to be considered in the areas of networking, storage, compute, and identity before embarking on a production environment. We recognize that there is a tendency to want to stand up applications as soon as the public cloud infrastructure service account is created, but we encourage you to resist that urge and read this document so that you can avoid unexpected complications that could put your hybrid IT project at risk. Note that the existing environment can be a private cloud or a traditional data center. The goal is to enable you to integrate your current environment with a public cloud provider of infrastructure services (an Infrastructure as a Service [IaaS] provider). This is important to mention, because we do not want you to assume that a hybrid IT infrastructure is synonymous with hybrid cloud. While hybrid cloud is defined as two or more clouds that work together to provide a service, a hybrid IT infrastructure might include an on-premises private cloud, or it might not.
While this document does explain design considerations and the relevant Microsoft technology and configuration options for integrating Windows Azure Infrastructure Services with the existing infrastructure of technical capabilities and/or services in an environment, it does not provide any example designs for doing so. A future document set will address a specific design example. You can find more information about this on the Cloud and Datacenter Solutions Hub at http://technet.microsoft.com/en-US/cloud/dn142895.
If you’re also interested in guidance that includes lab-tested designs that integrate infrastructure cloud services into existing environments, it is available separately. For more information, see http://technet.microsoft.com/en-US/cloud/dn142895.
The following problems or challenges typically drive the need to integrate infrastructure cloud services from external providers into existing environments:
Organizations with a large application portfolio will need to be able to determine hybrid IT infrastructure requirements before starting new applications or moving existing applications into a cloud environment. Different applications will have different demands in the areas of networking, storage, compute, identity, security, availability, and performance. You will need to determine whether the public cloud infrastructure service provider you choose is able to deliver on the requirements you define in each of these areas. In addition, you will need to consider regulatory issues specific to your organization's geo-political alignment.
After clearly defining the problem you’re trying to solve, you can begin to define a solution to the problem that satisfies your consumer’s requirements and fits the constraints of the environment in which you’ll implement your solution.
To solve the problems previously identified, many organizations are beginning to integrate infrastructure cloud services from external providers into their environments. In many organizations today, a department within the organization owns and manages network, compute (virtual machine), and storage technical capabilities. The people in this department may provide these technical capabilities for use by people in other departments within the organization, and/or, with additional technical capabilities, provide these capabilities as services, or even cloud services within their environment. The design considerations in this document are for a solution that enables an organization to:
Before integrating infrastructure cloud services from an external provider with existing infrastructure technical capabilities and/or services to solve the problems that were previously listed, you must first define a number of requirements for doing so, as well as the constraints for integrating the services. Some of the requirements and constraints are defined by the consumers of the capabilities, while others are defined by your existing environment, in terms of existing technical capabilities, services, policies, and processes.
Determining the requirements, constraints, and design for integrating the services is an iterative process. Initial requirements, coupled with the constraints of your environment, may drive an initial design that can’t meet all of the initial requirements, necessitating changes to the initial requirements and subsequent design. Multiple iterations through the requirements definition and the solution design are necessary before finalizing the requirements and the design. Therefore, do not expect that your first run through this document will be the last one; you’ll find that decisions you make early on can exclude options that you might prefer later.
The answers to the questions in this section provide a comprehensive list of requirements for integrating infrastructure cloud services from an external provider with the existing infrastructure technical capabilities and/or services in your environment.
3.2.1 Service Delivery Requirements
Before integrating cloud infrastructure services from an external provider with existing infrastructure technical capabilities and/or services in your environment, you’ll need to work with the consumer(s) of these cloud services in your environment to answer the questions in the sections that follow. The questions are aligned to the Service Delivery processes that are defined in the Cloud Services Foundation Reference Model (CSFRM). The initial answers to these questions that you get from your consumer(s) are the initial Service Delivery requirements for your initial design.
After further understanding the constraints of your environment and the products and technologies that you will ultimately use to extend your existing infrastructure technical capabilities to an external provider, however, you will likely find that not all of the initial requirements can be met. As a result, you’ll need to work with your consumer to adjust the initial requirements and continue iterating until you have a final design that satisfies the requirements and the constraints of your environment.
The outcome of this process is a clear definition of the functionality that will be provided, the service level metrics it will adhere to, and the cost at which the functionality will be provided. The service design applies the outcomes of the following questions.
The following table contains questions that you’ll need to address in these areas.
Service delivery requirements
Questions to ask
Demand and capacity management
Availability and continuity management
Information security management
Regulatory and compliance management
Financial management
3.2.2 Service Operations Requirements
You have a variety of operational processes that are applied to the delivery of all services and technical capabilities in your environment. As a result, you need to answer the questions in the following sections to determine how the hybrid IT infrastructure you’re designing will apply to and comply with your operational processes. The questions are aligned to the service operations processes defined in the CSFRM. The answers to these questions become the service operations requirements for the design of your hybrid IT infrastructure. Questions you need to ask to address these areas are included in the following table.
Service operations requirements
Request fulfillment
Service asset and configuration management
Change management
Release and deployment management
Access management
Systems administration
Knowledge management
Incident and problem management
3.2.3 Management and Support Technical Capability Requirements
While the introduction of this service may require unique changes to the existing capabilities in your environment, such changes start to de-standardize those capabilities, so it’s assumed that they should be avoided whenever possible.
When thinking about management support and technical capabilities, you should ask the questions in the following table.
Management and support technical capability
Service reporting
Service Management
Service Monitoring
Configuration Management
Fabric Management
Deployment and Provisioning
Data Protection
Network Support
Billing
Self-Service
Authentication
Authorization
Directory
Orchestration
3.2.4 Infrastructure Services Capabilities Requirements
When considering infrastructure capability requirements, you should start by asking the questions in the following table.
Infrastructure services requirements
Network
Virtual Machine
Storage
3.2.5 Infrastructure Technical Capability Requirements
One of the core tenets of cloud computing is that the infrastructure should be completely transparent to the user. So the users of the cloud service should never know (nor should they care) what the infrastructure services are that support the cloud infrastructure.
The questions in this section are aligned to the Infrastructure component in the CSFRM. The answers to these questions become the Infrastructure Requirements and constraints for the design of your hybrid IT infrastructure. Note that the CSFRM does not assume that the infrastructure for your environment is provided by your own organization. The infrastructure might be provided by your organization, or it might be provided by an external organization. In a hybrid IT infrastructure, however, infrastructure is provided by both the company’s IT organization and the public cloud infrastructure provider.
While the introduction of public cloud infrastructure may require unique changes to the existing services in your environment, such changes start to de-standardize those services, so it’s assumed that they should be avoided whenever possible.
The following table includes questions you should ask about infrastructure requirements.
Infrastructure requirements
Compute
Virtualization
3.2.6 Platform Requirements
The questions in this section are aligned to the Platform Services that are defined in the CSFRM. The answers to these questions become the Platform Services Requirements and constraints for the design of your hybrid IT infrastructure. Note that the CSFRM does not assume that the platform services for your environment are provided by your own organization. The platform services might be provided by your organization, or they might be provided by an external organization.
While the introduction of this service may require unique changes to the existing services in your environment, such changes start to de-standardize those services, so it’s assumed that they should be avoided whenever possible.
The following table includes questions you should ask about platform service requirements.
Platform service requirements
Structured data
Unstructured data
Application server
Middleware server
Service bus
After determining the requirements and constraints for integrating cloud infrastructure services from a public cloud infrastructure provider into your environment, you can begin to design your solution. Before creating a physical design, it’s helpful to first define a conceptual model (commonly referred to as a “reference model”), and some principles that will work together as a foundation for further design.
A reference model is a vendor-agnostic depiction of the high level components of a solution. A reference model can provide common terminology when evaluating different vendors’ product capabilities. A reference model also helps to illustrate the relationship of the problem domain it was created for to other problem domains within your environment. As a starting point, we can use the previously mentioned Cloud Services Foundation Reference Model (CSFRM).
We won’t include a detailed explanation of the CSFRM in this document, but if you’re interested in understanding it further, you’re encouraged to read the Microsoft Cloud Services Foundation Reference Model document. It will be available as part of the Microsoft Cloud Services Foundation Reference Architecture guidance set. To stay abreast of the work in this area, see http://aka.ms/Q6voj9.
Although it is from Microsoft, this reference model is vendor-agnostic. It can serve as a foundation for hosting cloud services and can be extended, as appropriate, by anyone. If you decide to use it in your environment, you’re encouraged to adjust it appropriately for your own use. Figure 1 illustrates the CSFRM.
Figure 1: Microsoft Cloud Services Foundation Reference Model
Recall from the Solution Definition section of this document that the solution to the problems defined in the Problem Definition section is to host virtual machines with an external provider such that the consumers within the organization can provision new virtual machines in a manner similar to how they provision virtual machines that are hosted on premises today.
The solution also requires that the virtual machines that are hosted by an external provider have capabilities that are similar to the capabilities of the on-premises virtual machines. As mentioned previously, the components, or boxes, in the reference model either change the way existing technical capabilities and/or services are provided in an environment, or introduce new services into an environment.
The Physical Design Considerations section of this document will discuss the design considerations for all of the black-bordered boxes in Figure 1.
After you’ve defined a reference model, you can establish some principles for integrating infrastructure cloud services from an external provider. Principles serve as “guidelines” for physical designs to adhere to. You can use the principles that follow as a starting point for defining your own. They are a combination of principles from the Cloud Services Foundation Reference Architecture (CSFRA) and principles unique to integrating infrastructure cloud services from an external provider.
The Microsoft Private Cloud Reference Architecture (PCRA) provides a number of vendor-agnostic principles, patterns, and concepts to consider before designing a private cloud. Although they were defined with private clouds in mind, they are in fact applicable to any cloud based solution. You are encouraged to read through the document, Private Cloud Principles, Concepts and Patterns, in full, as the information in it contains valuable insight for almost any type of cloud infrastructure planning, including the hybrid IT infrastructure that is discussed in this document.
As mentioned previously, designing a cloud infrastructure may be different from how you’ve historically designed infrastructure. In the past, you often purchased and managed individual servers with specific hardware specifications to meet the needs of specific workloads. Because these workloads and servers were unique, automation was often difficult, if for no other reason than the sheer volume of variables within the environment. You may have also had different service level requirements for the different workloads you were planning infrastructure for, often causing you to plan for redundancy in every hardware component.
When designing a cloud infrastructure for a mixture of workload types with standardized cost structures and service levels, you need to consider a different type of design process. Consider the following differences between how you planned for and designed unique, independent infrastructures for specific workloads in the past, and how you might plan for and design a highly standardized infrastructure that supports a mixture of workloads for the future.
A hybrid IT infrastructure introduces new variables, because even if you currently host a private cloud infrastructure on premises, you are not responsible for enabling the essential cloud characteristics in the public cloud infrastructure service provider’s side of the solution. And if you don’t have a private cloud on premises, you can still have a hybrid IT infrastructure. In that case, you’re not at all responsible for providing any of the essential characteristics of cloud computing, because the only cloud you’re working with is the one on the public cloud infrastructure side.
The following table provides some perspective on some specific design aspects of a cloud based solution versus how you have done things in a traditional data center environment.
Design aspect
Non-cloud infrastructure
Cloud infrastructure
Hardware acquisition
Purchase individual servers and storage with unique requirements to support unique workload requirements.
Private Cloud: Purchase a collection (in a blade chassis or a rack) of servers, storage, and network connectivity devices pre-configured to act as one large single unit with standardized hardware specifications for supporting multiple types of workloads. These are referred to as scale units. Adding capacity to the data center by purchasing scale units, rather than individual servers, lowers the setup and configuration time and costs when acquiring new hardware, although it needs to be balanced with capacity needs, acquisition lead time, and the cost of the hardware.
Public Cloud: No hardware acquisition costs other than possible gateway devices that are required to connect the corporate network to the cloud infrastructure service provider’s network.
Hardware management
Manage individual servers and storage resources, or potentially aggregations of hardware that collectively support an IT service.
Private Cloud: Manage an infrastructure fabric. To illustrate this simplistically, think about taking all of the servers, storage, and networking that support your cloud infrastructure and managing them like one computer. While most planning considerations for fabric management are not addressed in this guide, it does include considerations for homogenization of fabric hardware as a key enabler for managing it like a fabric.
Public Cloud: No need to manage new in-house servers or storage devices—host servers and storage infrastructure are managed by the public cloud infrastructure service provider.
Hardware utilization
Acquire and manage separate hardware for every application and/or business unit in the organization.
Private Cloud: Consolidate hardware resources into resource pools to support multiple applications and/or business units as part of a general-purpose cloud infrastructure.
Public Cloud: Set up virtual machines and virtual networks for specific applications and business units. No hardware acquisition required.
Infrastructure availability and resiliency
Purchase infrastructure with redundant components at many or all layers. In a non-cloud infrastructure, this was typically the default approach, as workloads were usually tightly coupled with the hardware they ran on, and having redundant components at many layers was generally the only way to meet service level guarantees.
Private Cloud: With a fabric that is designed to run a mixture of workloads that can move dynamically from physical server to physical server, and a clear separation between consumer and provider responsibilities, the fabric can be designed to be resilient, and doesn’t require redundant components at as many layers, which can decrease the cost of your infrastructure. This is referred to as designing for resiliency over redundancy. To illustrate, if a workload running in a virtual machine can be migrated from one physical server to another with little or no downtime, how necessary is it to have redundant NICs and/or redundant storage adapters in every server, as well as redundant switch ports to support them? To design for resiliency, you’ll first need to determine what the upgrade domain (portion of the fabric that will be upgraded at the same time) and physical fault domain (portion of the fabric that is most likely to fail at the same time) are for your environment. This will help you determine the reserve capacity necessary for you to meet the service levels you define for your cloud infrastructure.
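To make the reserve-capacity idea concrete, here is a minimal sketch. The sizing rule (hold in reserve at least the capacity of the largest fault or upgrade domain) is an illustrative assumption, not a prescribed formula:

```python
def reserve_fraction(total_hosts: int, fault_domain_hosts: int,
                     upgrade_domain_hosts: int) -> float:
    """Fraction of fabric capacity to keep in reserve so that losing the
    largest physical fault domain, or draining an upgrade domain during
    maintenance, still leaves enough capacity for running workloads.
    """
    largest_domain = max(fault_domain_hosts, upgrade_domain_hosts)
    return largest_domain / total_hosts

# Example: a 40-host fabric where a fault domain is a 10-host rack and an
# upgrade domain is 8 hosts; keep 25% of capacity in reserve.
print(reserve_fraction(40, 10, 8))  # 0.25
```

Larger fault or upgrade domains drive a larger reserve, which is one reason homogenized, smaller scale units can lower the cost of meeting a given service level.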
Public Cloud: Infrastructure resiliency is built into the public cloud infrastructure service provider’s offering. You don’t need to purchase additional equipment or add redundancy.
A hybrid IT infrastructure shares all of the principles of a private cloud infrastructure. Principles provide general rules and guidelines to support the evolution of a cloud infrastructure. They are enduring, seldom amended, and inform and support the way a cloud fulfills its mission and goals. They should also be compelling and aspirational in some respects because there needs to be a connection with business drivers for change. These principles are often interdependent, and together they form the basis on which a cloud infrastructure is planned, designed, and created.
After you’ve defined a reference model, you can then define principles for integrating infrastructure cloud services from a public provider with your on-premises services and technical capabilities. Principles serve as “guidelines” for physical designs to adhere to, and are oftentimes aspirational, as fully achieving them often takes time and effort. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists several principles that can be used as a starting point when defining principles for both private and hybrid cloud services. While all of the Microsoft Cloud Services Foundation Reference Architecture principles are relevant to designing hybrid cloud services, the principles listed below are the most relevant and are applied specifically to hybrid cloud services.
4.2.1 Perception of Infinite Capacity
Statement: From the consumer’s perspective, a cloud service should provide capacity on demand, only limited by the amount of capacity the consumer is willing to pay for.
Rationale:
The rationale for applying each of the following principles is the same as the rationale for each principle listed in the Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns, so each rationale is not restated in this article.
Implications:
Combining capacity from a public cloud with your own existing private cloud capacity can typically help you achieve this principle more quickly, easily, and cost-effectively than by adding more capacity to your private cloud alone. Among other reasons, this is because you don't need to manage the physical acquisition process and its delays; that process is now the public provider's responsibility.
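A minimal sketch of this idea, assuming a simple burst-to-public placement rule (both the rule and the numbers are hypothetical):

```python
def place_request(vcpus_needed: int, private_vcpus_free: int) -> str:
    """Satisfy requests from private cloud capacity first, and burst to
    the public provider when private capacity is exhausted, preserving
    the consumer's perception of infinite capacity."""
    return "private" if vcpus_needed <= private_vcpus_free else "public"

print(place_request(8, 16))   # private
print(place_request(32, 16))  # public
```

The consumer sees a single pool that never runs out; the provider decides behind the scenes where each request actually lands.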
4.2.2 Perception of Continuous Service Availability
Statement:
From the consumer’s perspective, a cloud service should be available on demand from anywhere, on any device, and at any time.
Designing for availability and continuity often requires some amount of normally unused resources. These resources are utilized only in the event of failures. Utilizing on-demand resources from a public provider in service availability and continuity designs can typically help you achieve this principle more cost-effectively than with private cloud resources alone. To illustrate this point, if your organization doesn't currently have its own physical disaster recovery site and is evaluating whether to build one, consider the costs in real estate, additional servers, and software that a disaster recovery site would require. Compare that cost against utilizing a public provider for disaster recovery. In most cases, the cost savings of using a public provider for disaster recovery could be significant.
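The comparison can be sketched with straight-line annualization. All of the figures below are hypothetical placeholders, to be replaced with your own estimates:

```python
def annualized(capex: float, years: int) -> float:
    """Straight-line annualization of an up-front capital cost."""
    return capex / years

# Build-your-own DR site: up-front capital (real estate, servers, software)
# plus yearly operating costs, amortized over a planning horizon.
site_capex = 2_000_000
site_opex_per_year = 300_000
years = 5
build_cost_per_year = annualized(site_capex, years) + site_opex_per_year

# Public cloud DR: mostly dormant resources, paid for only as consumed.
cloud_dr_cost_per_year = 120_000

print(build_cost_per_year)                           # 700000.0
print(cloud_dr_cost_per_year < build_cost_per_year)  # True
```

Even rough placeholder numbers like these make the shape of the trade-off visible: the dedicated site carries fixed costs whether or not a disaster ever occurs.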
4.2.3 Optimization of Resource Usage
The cloud should automatically make efficient and effective use of infrastructure resources.
Some service components may have requirements that allow them to be hosted only within a private cloud. Specific security or regulatory requirements are two examples of such requirements. Other service components may have requirements that allow them to be hosted on public clouds. Individual service components may support several different services within an organization. Each service component may be hosted on a private or public cloud. According to Microsoft's The Economics of the Cloud whitepaper, hosting service components on a private cloud can cost up to 10 times more than hosting them with a public cloud provider. As a result, utilizing public cloud resources can help organizations optimize usage of their private cloud resources by augmenting them with public cloud resources.
4.2.4 Incentivize Desired Behavior
Enterprise IT service providers must ensure that their consumers understand the cost of the IT resources that they consume so that the organization can optimize its resources and minimize its costs.
While this principle is important in private cloud scenarios, it's oftentimes a challenge to adhere to if the actual costs to provide services are not fully understood by the IT organization, or if consumers of private cloud services are not actually charged for their consumption, but rather only shown their consumption. When utilizing public cloud resources, however, consumption costs are clear, consumption is measured by the public provider, and the consumer is billed on a regular basis. As a result, the actual cost to an organization for consuming public cloud services may be much more tangible and measurable than for consuming private cloud services. These clear consumption costs may make it easier to incentivize desired behavior from internal consumers.
4.2.5 Create a Seamless User Experience
Within an organization, consumers should be oblivious as to who the provider of cloud services is, and should have similar experiences with all services provided to them.
Many organizations have spent several years integrating and standardizing their systems to provide seamless user experiences for their users, and don't want to go back to multiple authentication mechanisms and inconsistent user interfaces when integrating public cloud resources with their private cloud resources. The myriad of application user interfaces and authentication mechanisms utilized across various applications has made achieving this principle very difficult. The user interfaces and authentication mechanisms utilized across multiple public cloud service providers can make achieving this principle even more difficult. It's important to define clear requirements to evaluate public cloud providers against. These requirements may include specific authentication mechanisms, user interfaces, and other requirements that public providers must adhere to before you incorporate their services into your hybrid service designs.
Patterns are specific, reusable ideas that have proven to be solutions to commonly occurring problems. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists and defines the patterns below. This article does not repeat the definitions; instead, for each pattern it discusses considerations for applying the pattern specifically to hybrid infrastructure and service design.
4.3.1 Resource Pooling
Problem: When dedicated infrastructure resources are used to support each service independently, their capacity is typically underutilized. This leads to higher costs for both the provider and the consumer.
Solution: When designing hybrid cloud services, you may have pools of resources on premises and may treat the resources at a public provider as a separate pool. Further, you may divide public provider resources into separate resource partition pools for reasons such as service class, systems management, or capacity management, just as you might for your on-premises resources. For example, an organization may define two separate service-class resource pools: one within its private cloud, which might host medium and high business impact information, and one in its public cloud, which might host only low business impact information.
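The partitioning decision above can be sketched as a simple placement policy. This is a minimal illustration only; the pool names and business-impact classes are assumptions for the example, not part of any provider's API.

```python
# Illustrative sketch: route a workload to the resource partition pool whose
# policy permits its business-impact classification. Pool names and impact
# classes are assumed for the example.

POOLS = {
    "private-hbi-mbi": {"location": "on-premises", "allowed_impact": {"high", "medium"}},
    "public-lbi":      {"location": "public-cloud", "allowed_impact": {"low"}},
}

def select_pool(business_impact: str) -> str:
    """Return the first resource pool whose policy permits this impact class."""
    for name, policy in POOLS.items():
        if business_impact in policy["allowed_impact"]:
            return name
    raise ValueError(f"no pool accepts impact class {business_impact!r}")

print(select_pool("high"))  # lands in the private cloud pool
print(select_pool("low"))   # lands in the public cloud pool
```

In practice the policy table would be driven by the organization's data classification standard rather than hard-coded.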
4.3.2 Scale Unit
Problem: Purchasing individual servers, storage arrays, network switches, and other cloud infrastructure resources requires procurement, installation, and configuration overhead for each individual resource.
Solution: When designing physical infrastructure, applying this pattern usually means purchasing pre-configured collections of physical servers and storage. While a public provider's scale unit definition strategy is essentially irrelevant to its consumers, you may still choose to define units of scale for the resources you utilize with a public cloud provider. With a public provider, you typically pay for every resource consumed, and there is no wait time for new capacity as there is when adding capacity to your private cloud, so you may decide that your compute scale unit, for example, is an individual virtual machine. As you near capacity thresholds, you simply add or remove individual virtual machines as necessary.
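Treating a single virtual machine as the compute scale unit, as described above, reduces to a small threshold policy. The threshold values in this sketch are assumptions chosen for illustration.

```python
# Illustrative sketch: a scale-unit policy where one virtual machine is the
# unit of compute scale. The utilization thresholds are assumed values.

SCALE_UP_UTILIZATION = 0.80    # add a VM when average utilization exceeds this
SCALE_DOWN_UTILIZATION = 0.30  # remove a VM when it falls below this

def vm_count_adjustment(current_vms: int, avg_utilization: float) -> int:
    """Return the new VM count after applying the scale-unit policy."""
    if avg_utilization > SCALE_UP_UTILIZATION:
        return current_vms + 1          # scale unit = one VM
    if avg_utilization < SCALE_DOWN_UTILIZATION and current_vms > 1:
        return current_vms - 1
    return current_vms

print(vm_count_adjustment(3, 0.90))  # 4 — add one scale unit
print(vm_count_adjustment(3, 0.20))  # 2 — remove one scale unit
```

Compare this with an on-premises scale unit, which might be a whole rack of pre-configured servers and storage rather than one machine.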
4.3.3 Capacity Plan
Problem: Eventually every cloud infrastructure runs out of physical capacity. This can cause performance degradation of services, the inability to introduce new services, or both.
Solution: The capacity plan in a hybrid solution design incorporates all the same elements as a capacity plan for an on-premises-only solution design. Service designers, however, will likely find it much less effort to add or remove capacity on demand when utilizing resources from a public provider, if for no other reason than that doing so doesn't require ordering and waiting for the arrival of new hardware. Meeting spikes in capacity demand will also often prove more cost-effective with public provider resources than with dedicated on-premises resources alone, because when the spike is over, you stop paying for the extra capacity that was required to meet it. Some public providers also offer auto-scaling capabilities, where their systems automatically scale service component tiers based on user-defined thresholds.
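The spike-capacity economics above can be made concrete with a small worked example: holding dedicated on-premises capacity for a spike all year versus renting the same capacity from a public provider only while the spike lasts. All prices, counts, and durations below are illustrative assumptions, not quotes from any provider.

```python
# Worked example (assumed figures): cost of dedicated spike capacity held
# year-round versus public capacity rented only during the spikes.

on_prem_monthly_cost_per_vm = 150.0   # amortized hardware + operations (assumed)
public_hourly_cost_per_vm = 0.50      # pay-as-you-go rate (assumed)

spike_vms = 10
spike_hours = 72                      # a three-day spike
spikes_per_year = 4                   # one spike per quarter

dedicated_yearly = spike_vms * on_prem_monthly_cost_per_vm * 12
public_yearly = spike_vms * public_hourly_cost_per_vm * spike_hours * spikes_per_year

print(f"dedicated capacity held all year: ${dedicated_yearly:,.2f}")  # $18,000.00
print(f"public capacity rented per spike: ${public_yearly:,.2f}")     # $1,440.00
```

The gap narrows as the workload becomes steady-state, which is why the capacity plan, not a rule of thumb, should drive the split between pools.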
4.3.4 Health Model
Problem: If any component used to provide a service fails, it can cause performance degradation or unavailability of services.
Solution: Initially, you might think that defining health models for hybrid services will be more difficult than defining health models for services that only include components hosted on your private cloud. Part of the reason for this may be a fear of the unknown. You understand your private cloud systems, and can do deep troubleshooting on them, if necessary. When using a public provider, however, you have little understanding of the underlying hardware configuration, and no troubleshooting capability. While this might initially be concerning, after your confidence in a public provider grows, you'll likely find that defining health models for service components hosted on a public cloud is even easier than for components hosted on your private cloud, since all of the hardware configuration and troubleshooting responsibility now belongs to the public provider, not you. As a result, your health models will have significantly fewer failure or degradation conditions, which also means fewer conditions that your systems must monitor for and remediate. Some public cloud providers offer service level agreements (SLAs) that include an availability level that they commit to meet each month. As long as your service provider meets this SLA, you no longer need to be concerned with how the provider meets it, only that it did meet it. While this is true when consuming infrastructure as a service functionality from a public provider, it's even more true when consuming platform as a service (PaaS) capabilities from public providers.
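A monthly availability SLA like the one mentioned above translates directly into a downtime allowance that a health model can monitor against. The 99.95% figure below is an example, not a quote from any specific provider's SLA.

```python
# Worked example: convert a monthly availability SLA percentage (assumed
# figure) into the downtime it permits per month.

def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime permitted in a month at the given availability SLA."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - sla_percent / 100)

print(f"{allowed_downtime_minutes(99.95):.1f} minutes")  # 21.6 minutes
print(f"{allowed_downtime_minutes(99.9):.1f} minutes")   # 43.2 minutes
```

Tracking measured downtime against this allowance tells you whether the provider met its commitment, which, per the discussion above, is all you need to know.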
4.3.5 Application
Problem: Not all applications are optimized for cloud infrastructures, and some may not be able to be hosted on cloud infrastructures at all.
Solution: Not all public cloud service providers support the same application patterns. For example, if you have an application that relies upon Microsoft Windows Server Failover Clustering as its high-availability mechanism, this application can be thought of as using the stateful application pattern. This application could be deployed with some public service providers, but not with others. Among other reasons, Windows Server Failover Clustering requires some form of shared storage, a capability that few public service providers currently support. It's important to understand which application patterns are used within the organization. It's also important to identify which application patterns a public provider supports. It's only possible to migrate applications that were designed with patterns supported by the public service provider.
4.3.6 Cost Model
Problem: Consumers tend to use more resources than they really need if there's no cost to them for doing so.
Solution: While a public provider will charge your organization based on consumption, you must decide what costs you'll show or charge your internal consumers for the resources. You will likely show or charge a higher cost to your internal consumers than you were charged by the public provider, largely because you will probably integrate the public cloud provider's functionality with your private cloud functionality, and that integration most likely has a cost. For example, you probably currently show or charge your internal consumers when they use a virtual machine on your private cloud. You may provide some type of single sign-on capability to your internal consumers, and offer that capability with the virtual machines that are hosted on your private cloud. As a result, some portion of the cost that you show or charge internal consumers for that virtual machine is the cost to provide the single sign-on capability. A similar cost should be added to the virtual machines that are hosted with a public provider, if you also offer the same single sign-on capability for them. You may add further costs for additional capabilities such as monitoring and backup for public cloud virtual machines too.
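The showback calculation described above is simple arithmetic: the provider's charge plus the organization's cost to supply each integration capability. The figures below are illustrative assumptions.

```python
# Illustrative sketch of the cost-model rollup: the public provider's monthly
# charge for a VM plus assumed internal costs for the integration capabilities
# layered on top of it.

provider_monthly_charge = 60.00        # what the public provider bills us (assumed)
integration_costs = {
    "single sign-on": 5.00,
    "monitoring": 8.00,
    "backup": 12.00,
}

showback_price = provider_monthly_charge + sum(integration_costs.values())
print(f"monthly price shown to internal consumers: ${showback_price:.2f}")  # $85.00
```

Whether this price is merely shown (showback) or actually billed (chargeback) is the policy decision the pattern asks you to make.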
With an understanding of the requirements detailed in the Envisioning the Hybrid IT Solution section of this document, and the reference model and principles, you can select appropriate products and technologies to implement the hybrid IT infrastructure design. The following table lists the hardware vendor-agnostic and Microsoft products, technologies, and services that can be used to implement various entities from the reference model that is defined in this document.
Reference model entity
Product/technology/external service
Network (support and services)
Authentication (support and services)
Directory (support and services)
Compute (support and services)
Storage (support and services)
Network infrastructure
Compute infrastructure
Storage infrastructure
After selecting the products, technologies, and services to implement the hybrid IT infrastructure, you can continue the design of the hybrid IT infrastructure solution. The sections that follow outline a logical design process for the service, but, as mentioned in the Envisioning the Hybrid IT Solution section of this document, the design and requirements-definition process is iterative until it's complete. As a result, after you make design decisions in earlier sections of this document, you may find that decisions you make in later sections require you to re-evaluate them.
The primary sub-sections of this section form the "functional" design for the service and align to entities in the reference model. Lower-level sub-sections then address specific design considerations, which may range from functional to service-level concerns.
The remainder of the document addresses design considerations and the products, technologies, and services listed in the preceding table. In cases where multiple Microsoft products, technologies, and services can be used to address different design considerations, the trade-offs between them are discussed. In addition to Microsoft products, technologies, and services, relevant vendor-agnostic hardware technologies are also discussed.
The physical design of the hybrid IT infrastructure brings together the answers to the questions that were presented earlier in the document and the technology capabilities and options that are made available to you. The physical design that is discussed in this document uses a Microsoft–based, on-premises infrastructure and a Windows Azure Infrastructure Services–based public cloud infrastructure component. With that said, the design options and considerations can be applied to any on-premises and public cloud infrastructure provider combination. When considering the hybrid IT infrastructure from the physical perspective, the primary issues that you need to address include:
We will discuss each of these topics in detail, along with the advantages and disadvantages of each option. In many cases, you will find that there is only a single option. When this is true, we will discuss its capabilities and possible limitations, and how you can work with or around those limitations.
When designing a hybrid IT infrastructure, the first issue you need to address is how to obtain and provision accounts with the public cloud infrastructure service provider. In addition, if the public cloud infrastructure service provider supports multiple payment options, you will need to determine which payment option best fits your needs now, and whether, in the future, you might want to reconsider the payment options that you’ve selected.
For example, Windows Azure offers several payment plans:
Pay as you go is the most expensive. Discounts are offered for each of the other four plans. You also have the choice to have the service billed to your credit card or your organization can be invoiced.
For more information on Windows Azure pricing plans, see Windows Azure Purchase Options.
You also need to consider whether you want to have the same person who owns the account (and therefore is responsible for paying for the service) to also have administrative control over the services that are running the public side of your hybrid IT infrastructure. In most cases, the payment duties and the administrative duties will be separate. Determine whether your cloud service provider enables this type of role-based access control.
For example, Windows Azure has the notions of accounts and subscriptions. The Windows Azure subscription has two aspects:
A single Windows Azure account can host multiple subscriptions, which can be used by multiple teams responsible for the hybrid IT infrastructure if you need additional partitioning of your services.
It’s important to be aware that using a single subscription for multiple projects can be challenging from an organizational and billing perspective. The Windows Azure management portal provides no method of viewing only the resources used by a single project, and there is no way to automatically break out billing on a per-project basis. While you can somewhat alleviate organizational issues by giving similar names to all services and resources that are associated with a project (for example, HRHostedSvc, HRDatabase, HRStorage), this does not help with billing.
Due to the challenges with granularity of access, organization of resources, and project billing, you may want to create multiple subscriptions and associate each subscription with a different project. Another reason to create multiple subscriptions is to separate the development and production environments. A development subscription can allow administrative access by developers while the production subscription allows administrative access only to operations personnel.
Separate subscriptions provide greater clarity in billing, greater organizational clarity when managing resources, and greater control over who has administrative access to a project. However this approach can be more costly than using a single subscription for all of your projects. You should carefully consider your requirements against the cost of multiple subscriptions.
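The naming-convention workaround described earlier (HRHostedSvc, HRDatabase, HRStorage) can at least support a rough per-project usage rollup if a usage or billing export is available. In this sketch the resource names, charges, and project prefixes are all illustrative assumptions.

```python
# Illustrative sketch: group exported per-resource charges by project using a
# name-prefix convention. Resource names, charges, and prefixes are assumed.

from collections import defaultdict

usage = [
    ("HRHostedSvc", 42.10),
    ("HRDatabase", 18.30),
    ("FinHostedSvc", 55.00),
]

PROJECT_PREFIXES = ("HR", "Fin")

per_project = defaultdict(float)
for resource_name, charge in usage:
    for prefix in PROJECT_PREFIXES:
        if resource_name.startswith(prefix):
            per_project[prefix] += charge
            break

for project, total in per_project.items():
    print(f"{project}: ${total:.2f}")
```

This is an approximation at best, which is why the document recommends separate subscriptions when clean per-project billing is a hard requirement.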
For more information on Windows Azure accounts and subscriptions, see What is an Azure Subscription.
For more information on account acquisition and subscriptions, see Provisioning Windows Azure for Web Applications.
You need to determine how the public cloud service provider partitions the services for which you will be billed. For example:
Note that in all three of these cases, the public cloud service provider would bill based on usage, because measured service is an essential characteristic of cloud computing.
For example, Windows Azure Infrastructure Services is a collection of unique service offerings within the entire portfolio of Azure service offerings. Specifically, Azure Infrastructure Services includes Azure Virtual Machines and Azure Virtual Networks. In addition, Azure Virtual Networks takes advantage of some of the PaaS components of the system to enable the site-to-site and point-to-site VPN gateway. However, when you obtain an Azure account and set up a subscription, all of the Windows Azure services are available to you with the exception of some additional value-added services that you can purchase separately.
This section expands on each of these issues.
5.3.1 On-Premises Physical Network Design
You need to consider the following issues when deciding what changes you might need to make to the current physical network:
5.3.1.1 Network Connection Between On-Premises and Off-Premises Resources
Site-to-Site VPN A site-to-site VPN connection enables you to connect entire networks together. Each side of the connection hosts at least one VPN gateway, which essentially acts as a router between the on-premises and off-premises networks. The routing infrastructure on the corporate network is configured to use the IP address of the local VPN gateway to access the network ID(s) that are located on the public cloud provider’s network that hosts the virtual machines that are part of the hybrid IT solution.
For more information about site-to-site VPNs, see What is VPN?
Windows Azure Virtual Networks Windows Azure enables you to create a virtual network that is contained within the Windows Azure infrastructure and place virtual machines into it. When virtual machines are placed into an Azure Virtual Network, Windows Azure automatically assigns them IP addresses, so all virtual machines must be configured as DHCP clients. However, even though the virtual machines are configured as DHCP clients, they keep their IP addressing information for the lifetime of the virtual machine.
Note: The only time when a virtual machine will not keep an IP address for the life of the virtual machine on an Azure Virtual Network is when a virtual machine might need to be moved as a consequence of “service healing.” If a virtual machine is created in the Windows Azure portal, and it then experiences service healing, that virtual machine is assigned a new IP address. You can avoid this by creating the virtual machine by using PowerShell instead of creating it in the Windows Azure portal. For more information on service healing, please see Troubleshooting Deployment Problems Using the Deployment Properties.
Virtual machines on the same Azure Virtual Network will be able to communicate with one another only if those virtual machines are part of the same cloud service. If the virtual machines are on the same virtual network and are not part of the same cloud service, those virtual machines will not be able to communicate with one another directly over the Azure Virtual Network connection.
You can use an IPsec site-to-site VPN connection to connect your corporate network to one or more Azure Virtual Networks. Windows Azure supports several VPN gateway devices that you can put on your corporate network to connect your corporate network to an Azure Virtual Network. The on-premises gateway device must have a public address and must not be placed behind a NAT device.
For more information on which VPN gateway devices are supported, see About VPN Devices for Virtual Network.
Note: While you can connect your on-premises network to multiple Azure Virtual Networks, you cannot connect a single Azure Virtual Network to multiple on-premises points of presence.
A single Azure Virtual Network can be assigned IP addresses in multiple network IDs. You can obtain a summarized block of addresses that represents the number of addresses you anticipate needing, and then subnet that block. However, connections between the IP subnets are not routed, and therefore there are no router ACLs that you can apply between the IP subnets.
However, you should still consider whether you will want multiple subnets. One reason for multiple subnets is for accounting purposes, where virtual machines that match certain roles within your hybrid IT infrastructure are placed on specific subnets that are assigned to those roles. However, you can use Network ACLs to control traffic between virtual machines in an Azure Virtual Network. For more information on Network ACLs in Azure Virtual Networks, please see Setting an Endpoint ACL on a Windows Azure VM.
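The approach described above — obtaining one summarized address block for the virtual network and carving role-specific subnets from it — can be sketched with the standard-library ipaddress module. The address block, subnet size, and role names are illustrative assumptions.

```python
# Illustrative sketch: carve role-specific /24 subnets from a summarized /16
# block planned for a virtual network. Block, prefix length, and roles are
# assumed values for the example.

import ipaddress

block = ipaddress.ip_network("10.4.0.0/16")      # summarized block (assumed)

subnets = list(block.subnets(new_prefix=24))     # 256 available /24 subnets
role_subnets = {
    "front-end": subnets[0],
    "application": subnets[1],
    "database": subnets[2],
}

for role, subnet in role_subnets.items():
    print(f"{role}: {subnet}")
```

Because the subnets are not routed from one another in an Azure Virtual Network, this layout serves accounting and organizational purposes rather than traffic isolation, as noted above.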
You should also consider the option of using multiple Azure Virtual Networks to support your hybrid IT infrastructure. While different Azure Virtual Networks can’t directly communicate with each other over the Azure network fabric, they can communicate with each other by looping back through the on-premises VPN gateway. Keep in mind that there are egress traffic costs involved with this option, so you need to assess cost issues when considering it. This is also the case when you host some virtual machines in the Windows Azure PaaS services (which are part of a different cloud service than the virtual machines). The virtual machines in the PaaS services need to loop back through the on-premises VPN gateway to communicate with the virtual machines in the Azure Infrastructure Services Virtual Networks.
You should decide on the IP addressing scheme, whether to use subnets, and the number of Azure Virtual Networks you will need before creating any virtual machines. After these decisions are made, you should create or move virtual machines onto those virtual networks.
Another important consideration is that Azure site-to-site VPN uses pre-shared keys to support the IPsec connection. Some enterprises may not consider pre-shared keys an enterprise-ready approach for securing IPsec site-to-site VPN connections, so you will want to confer with your security team to determine whether this approach is consistent with corporate security policy. For more information on this issue, please see Preshared Key Authentication. Note that your IT organization may consider the security and management issues for pre-shared keys to be a remote access VPN client problem only.
For more information on Azure Virtual Networks and how to configure and manage them, see Windows Azure Virtual Network Overview.
Dedicated WAN Link A dedicated WAN link is a permanent telco connection that is established directly between the on-premises network and the cloud infrastructure service provider’s network. Unlike the site-to-site VPN, which represents a virtual link layer connection over the Internet, the dedicated WAN link enables you to create a true link layer connection between your corporate network and the service provider’s network.
For more information on dedicated WAN links, see Wide Area Network.
At the time this document was written, Windows Azure did not support dedicated WAN link connections between the on-premises network and Azure Virtual Networks.
Point-to-Site Connections A point-to-site connection (typically referred to as a remote access VPN client connection) enables you to connect individual devices to the public cloud service provider’s network. For example, suppose you have a hybrid IT infrastructure administrator working from home from time to time. The administrator could establish a point-to-site connection from his computer in his home to the entire public cloud service provider’s network that hosts the virtual machines for his organization.
For more information on remote access VPN connections, see Remote Access VPN Connections.
Windows Azure supports point-to-site connectivity that uses a Secure Socket Tunneling Protocol (SSTP)–based remote access VPN client connection. This VPN client connection is made using the native Windows VPN client. When the connection is established, the VPN client can access any of the virtual machines over the network connection. This enables administrators to connect to the virtual machines using any administrative web interfaces that are hosted on the virtual machines, or by establishing a Remote Desktop Protocol (RDP) connection to the virtual machines. This enables hybrid IT infrastructure administrators to manage the virtual machines at the machine level without requiring them to open publicly accessible RDP ports to the virtual machines.
In order to authenticate VPN clients, certificates must be created and exported. If you have a PKI, you can use an X.509 certificate issued by your CA. If you don’t have a PKI, you must generate a self-signed root certificate and client certificates chained to the self-signed root certificate. You can then install the client certificates with private key on every client computer that requires connectivity.
For more information on point-to-site connections to Windows Azure Virtual Networks, see About Secure Cross-Premises Connectivity.
The following table lists the advantages and disadvantages of each of the approaches that are discussed in this section.
Connectivity options
Advantages
Disadvantages
Site-to-site VPN
Dedicated WAN Link
Point-to-site connection (remote access VPN client connection)
5.3.2 Inbound Connectivity to the Public Cloud Infrastructure Service Network
Inbound connectivity to the public cloud infrastructure provider’s network is about how users will connect to the services that are hosted by the virtual machines within the provider’s network. Important options to consider include:
All Access to Cloud Hosted Services is Through the Internet With the first option, all connections to services in the public cloud infrastructure service provider’s network will be made over the Internet. It doesn’t matter whether the client system is inside the corporate network or outside the corporate network. With this configuration you need to maintain only a single DNS entry for inbound access to the service, because all client machines will be accessing the same IP address. In Windows Azure, this is the address of the VIP that is assigned to the front ends of the service that is hosted in the Azure Infrastructure Services Virtual Network.
All Access to Cloud Hosted Services is Through Site-to-Site VPN or WAN Link The second option represents the opposite of the first, in that all clients that need to connect to parts of the service that are hosted in the public cloud infrastructure provider’s network will need to do it from within the confines of the corporate network. The service will not be available to users on the Internet “at large” and client systems will have to take a path through the corporate network to reach the services.
That doesn’t mean that the client systems must be physically attached to the corporate network (or attached through the corporate wireless). A client system could be off-site, but connected to the corporate network over a remote-access VPN client connection or similar technology, such as Windows DirectAccess. The DNS configuration in this case would require just a single entry, because all access to the resources in the public cloud infrastructure service provider’s network will be to the IP address that is assigned to the virtual machine in the public cloud infrastructure service provider’s network. In an Azure Virtual Network, this would be the DIP that is assigned to the front-end virtual machines of the service.
For more information on DirectAccess in Windows Server 2012, see Remote Access (DirectAccess, Routing and Remote Access) Overview.
Access to Cloud Hosted Services Varies with Client Location The third option allows for hosts that are not connected to the corporate network to connect through the Internet to the service that is hosted in the public cloud infrastructure service provider’s network. Clients that are connected to the corporate network can access the service by going through a site-to-site VPN or dedicated WAN link that connects the corporate network to the public cloud infrastructure service provider’s network.
This option requires that you maintain a DNS record that client systems can use when they are not on the corporate network, which in Azure represents the VIP that is used to access the virtual machine. It also requires a DNS record that clients will use when they are connected to the corporate network, which in Azure represents the DIP that is assigned to the virtual machine. This design requires that you create a split DNS infrastructure.
For more information on a split DNS infrastructure, see You Need A Split DNS!
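The split DNS behavior described above can be reduced to a small resolution rule: the same service name returns the internal DIP for clients on the corporate network and the public VIP for everyone else. The hostname and both addresses in this sketch are illustrative assumptions.

```python
# Illustrative sketch of split DNS: one hostname, two answers depending on the
# client's network location. Hostname and addresses are assumed values.

SPLIT_ZONE = {
    # hostname: (answer for internal clients, answer for external clients)
    "app.contoso.com": ("10.4.1.5", "137.116.0.10"),
}

def resolve(hostname: str, on_corporate_network: bool) -> str:
    """Return the DIP for corporate-network clients, the VIP otherwise."""
    internal_answer, external_answer = SPLIT_ZONE[hostname]
    return internal_answer if on_corporate_network else external_answer

print(resolve("app.contoso.com", on_corporate_network=True))   # DIP via VPN/WAN link
print(resolve("app.contoso.com", on_corporate_network=False))  # VIP via the Internet
```

In production this split is implemented by hosting the same zone on internal and external DNS servers with different records, not in application code.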
The following table describes some of the advantages and disadvantages of the three options for inbound connectivity.
Inbound connectivity option
All inbound access is done over the Internet.
All inbound access is done through the corporate network.
Some inbound access is over the Internet and some is over the corporate network.
5.3.3 Load Balancing of Inbound Connections to Virtual Machines of a Public Infrastructure Service
Services that you place in the public cloud infrastructure service provider’s network may need to be load balanced to support the performance and availability characteristics that you require for a hybrid application running on a hybrid IT infrastructure. There are several ways you can enable load balancing of connections to services that are hosted on the public cloud infrastructure service provider’s network. These include:
Load Balancing Mechanism Provided by the Public Cloud Infrastructure Service Provider The first option requires that the service provider has a built-in load balancing capability that is included with its service offering. In Windows Azure, external communication with virtual machines can occur through endpoints. These endpoints are used for various purposes, such as load-balanced traffic or direct virtual machine connectivity, like RDP or SSH.
Windows Azure provides round-robin load balancing of network traffic to publicly defined ports of a cloud service that is represented by these endpoints. For virtual machines, you can set up load balancing by creating new virtual machines, connecting them under a cloud service, and then adding load-balanced endpoints to the virtual machines.
For more information on load balancing for virtual machines in Windows Azure, see Load Balancing Virtual Machines.
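The round-robin distribution described above can be illustrated in a few lines. Azure's platform load balancer performs this itself; the code below only sketches the algorithm, with assumed back-end addresses.

```python
# Illustrative sketch of round-robin load balancing across load-balanced
# endpoints. Back-end addresses are assumed values; the real distribution is
# performed by the provider's load balancer, not by client code.

import itertools

backends = ["10.4.1.5", "10.4.1.6", "10.4.1.7"]
next_backend = itertools.cycle(backends).__next__

# Five incoming connections are spread evenly across the three back ends.
assignments = [next_backend() for _ in range(5)]
print(assignments)  # ['10.4.1.5', '10.4.1.6', '10.4.1.7', '10.4.1.5', '10.4.1.6']
```

Note that simple round-robin distributes connections, not load: a back end handling one expensive session counts the same as one handling a cheap session.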
Load Balancing Enabled on the Virtual Machines The second option requires that the operating system running on the virtual machines in the public cloud infrastructure service provider’s network must run some kind of software-based load-balancing system.
For example, Windows Server 2012 includes the Network Load Balancing feature, which can be installed on any virtual machine that runs that operating system. There are other load balancing applications that can be installed on virtual machines. The service provider must be able to support these guest-based load balancing techniques, because they often change the characteristics of the MAC address that is exposed to the network. At the time this paper was written, Azure Virtual Networks did not support this type of load balancing.
For more information about Windows Server Network Load Balancing, see Network Load Balancing Overview.
Use an External Network Load Balancer The third option is a relatively specialized one because it requires that you control the path between the client of the service that is hosted in the public cloud infrastructure service provider’s network and the destination virtual machines. The clients must pass through the dedicated hardware load balancer so that it can perform the load balancing for them.
This option is likely not going to be available from the public cloud infrastructure service provider’s side, because public providers in general do not allow you to place your own equipment on their network. This method would work if you are hosting an application in the service provider’s network that is accessible only to clients on the corporate network. Because you have control of what path internal clients will use to reach the service, you can easily put a load balancer in the path.
For more information on external load balancers, see Load Balancing (computing).
The following table describes the advantages and disadvantages of each of these three approaches.
Load-balancing mechanism
Public cloud infrastructure service provider load balancing solution
OS-based or add-on load-balancing solution
External load-balancing solution
5.3.4 Name Resolution for the Public Infrastructure Service Network
Name resolution is a critical activity for any application in a hybrid IT infrastructure. Applications that span on-premises components and those in the public cloud infrastructure provider’s network must be able to resolve names on both sides in order for all tiers of the application to work easily with one another. There are several options for name resolution in a hybrid IT infrastructure:
Name Resolution Services Provided by the Public Cloud Infrastructure Service Provider The public cloud infrastructure service provider may provide some type of DNS services as part of its service offering. The nature of the DNS services will vary. For example, Azure Virtual Networks provide basic DNS services for name resolution of virtual machines that are part of the same cloud service. Be aware that this is not the same as virtual machines that are on the same Azure Virtual Network. If two virtual machines are on the same Azure Virtual Network but are not part of the same cloud service, they will not be able to resolve each other’s names by using the Azure Virtual Network DNS service.
For more information on Azure Virtual Network DNS services, see Windows Azure Name Resolution.
Name Resolution Services Based on On-Premises DNS Infrastructure The second option is the one you’ll typically use in a hybrid IT infrastructure where applications span on-premises networks and cloud infrastructure service provider’s networks. You can configure the virtual machines in the service provider’s network to use DNS servers that are located on premises, or you can create virtual machines in the public cloud infrastructure service provider’s network that host corporate DNS services and are part of the corporate DNS replication topology. This makes name resolution for both on-premises and cloud-based resources available to all machines that support the hybrid application.
Name Resolution Services External to Cloud and On-Premises Systems The third option is less typical, as it would be used when there is no direct link, such as a site-to-site VPN or dedicated WAN link, between the corporate network and the public cloud infrastructure services network. However, in this scenario you still want to enable some components of the hybrid application to live in the public cloud and yet keep some components on premises. Communications between the public cloud infrastructure service provider’s components and those on premises can be done over the Internet. If on-premises components need to initiate connections to the off-premises components, they must use Internet host name resolution to reach those components. Likewise, if components in the public cloud infrastructure service provider’s network need to initiate connections to those that are located on premises, they would need to do so over the Internet by using a public IP address that can forward the connections to the components on the on-premises network. This means that you would need to publish the on-premises components to the Internet, although you could create access controls that limit the incoming connections to only those virtual machines that are located in the public cloud infrastructure services network.
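These three approaches imply a routing decision: for each name a hybrid component needs to resolve, should the query go to the corporate DNS infrastructure or to public Internet DNS? A minimal sketch of that decision logic follows (the domain suffixes and return labels are illustrative assumptions, not part of any Azure or DNS API):

```python
def pick_resolver(hostname, internal_suffixes=("corp.example.com", "internal.example.com")):
    """Choose which name-resolution path a hybrid component should use.

    Returns "on-premises" for names in the corporate namespace and
    "public-internet" for everything else. The suffixes are examples only.
    """
    name = hostname.rstrip(".").lower()
    for suffix in internal_suffixes:
        if name == suffix or name.endswith("." + suffix):
            return "on-premises"
    return "public-internet"

print(pick_resolver("sql01.corp.example.com"))  # on-premises
print(pick_resolver("www.windowsazure.com"))    # public-internet
```

In a split-brain DNS deployment, this kind of suffix-based routing is what a conditional forwarder on your DNS servers performs for you.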
The following table describes some advantages and disadvantages of each of these approaches.
Name resolution approach
Public cloud infrastructure service provider supplies DNS.
DNS is integrated with on-premises DNS infrastructure.
DNS is based on public/external DNS infrastructure.
5.4 Storage Design Considerations
When considering options for storage in a hybrid IT infrastructure scenario, you will need to assess current storage practices and storage options that are available with your public cloud infrastructure service provider.
Storage issues that you might consider include:
5.4.1 Storage Tiering
Storage tiering enables you to place workloads on storage that supports the IOPS (input/output operations per second) requirements of each workload. For example, you might have a database-bound application that needs to handle a large number of transactions per second. You would want the public cloud infrastructure service provider to offer an option to host your database on fast storage, perhaps solid-state disk (SSD) storage. On the other hand, you may have other applications that do not require ultra-fast storage, in which case you could put those applications in a slower storage tier. The assumption is that the public cloud infrastructure service provider will charge more for the high-performance storage and less for the lower-performance storage.
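The tier-selection logic described above can be sketched as follows (the tier names, IOPS ceilings, and per-GB prices are hypothetical placeholders, not actual provider offerings):

```python
# Hypothetical tier catalog, ordered cheapest-first. Names and prices are
# illustrative only, not actual Azure tiers or rates.
TIERS = [
    {"name": "standard-hdd", "max_iops": 500,   "price_per_gb": 0.05},
    {"name": "premium-hdd",  "max_iops": 2000,  "price_per_gb": 0.12},
    {"name": "ssd",          "max_iops": 20000, "price_per_gb": 0.30},
]

def pick_tier(required_iops):
    """Return the cheapest tier whose IOPS ceiling meets the workload."""
    for tier in TIERS:
        if tier["max_iops"] >= required_iops:
            return tier["name"]
    raise ValueError("no tier satisfies %d IOPS" % required_iops)

print(pick_tier(300))    # standard-hdd
print(pick_tier(5000))   # ssd
```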
At the time this document was written, Azure Infrastructure Services does not provide an option for tiered storage. However, the service constantly evolves. Make sure to refer to the Windows Azure documentation pages on a regular basis during your design process.
5.4.2 IaaS Database
There are scenarios in a hybrid IT infrastructure where the front-end and application tiers will be hosted in the public cloud infrastructure service provider’s network and the database tier is hosted on premises. Another possibility is that the front-end, application, and database tiers are all hosted on the public cloud infrastructure service provider’s network. In this scenario, you will need to investigate whether the public cloud infrastructure service provider supports running database applications on virtual machines hosted on its network.
Windows Azure supports placing SQL Server on Azure Infrastructure Services. For applications that need full SQL Server functionality, Azure Infrastructure Services is a viable solution. SQL Server 2012 and SQL Server 2008 R2 images are available, and they include the Standard, Web, and Enterprise editions. If you have an existing SQL Server license with Software Assurance, you can move your existing license to Windows Azure and pay only for compute and storage. Running SQL Server in Azure Infrastructure Services is a viable option in the following scenarios:
5.4.3 PaaS Database and Storage
Windows Azure has a PaaS database as a service offering. For applications that need a full featured relational database-as-a-service, Windows Azure offers SQL Database, formerly known as SQL Azure Database. SQL Database offers a high-level of interoperability, enabling you to build applications using many of the major development frameworks.
Table storage is another option that your public cloud service provider might offer. This can be used to store large amounts of unstructured data. Windows Azure offers table-based storage that is an ISO 27001 certified managed service and can automatically scale to handle massive volumes of up to 200 terabytes. Tables are accessible from virtually anywhere via REST and managed APIs.
Finally, your public cloud infrastructure service provider may offer blob storage for your applications and virtual machines. Blobs are an easy way to store large amounts of unstructured text or binary data such as video, audio, and virtual machine images. Like table storage, Windows Azure blobs are an ISO 27001 certified managed service that can automatically scale to handle massive volumes of up to 200 terabytes. Blobs are accessible from virtually anywhere via REST and managed APIs.
5.5 Compute Design Considerations
Compute design considerations center on the virtual machines that will be hosted on premises and in the public cloud service provider’s network. In some cases, the only virtual machines that are participating in a hybrid IT infrastructure will be on the public cloud infrastructure service provider’s network, since the on-premises resources will be hosted on physical hardware instead of being virtualized. Whether current services run on physical or virtualized hardware, you will need to take into account issues related to the virtual machine offering made available by the public cloud service provider.
Consider the following issues when designing the hybrid IT infrastructure’s compute components:
5.5.1 Operating System and Service Images
An image is a virtual disk file that you use as a template to create a new virtual machine. An image is a template because, unlike a running virtual machine, it doesn't have specific settings such as the computer name and user account settings. When you create a virtual machine from an image, an operating system disk is automatically created for the new virtual machine. Some public cloud infrastructure service providers will provide images that not only contain operating systems, but also contain services that run on top of the operating system. These are sometimes referred to as “service templates,” and such templates can enable you to stand up services more quickly than if you had to first install the operating system and then install the services that you want to run.
Windows Azure makes both operating system and service images available to you. You can either use an image provided by Windows Azure in the Image Gallery, or you can create your own image to use as a template. For example, you can create a virtual machine from an image in the Image Gallery. Windows Azure provides a selection of Windows and Linux images, as well as images that have BizTalk and other applications already installed.
For more information on operating system and service images in Windows Azure, please see Manage Disks and Images.
5.5.2 On-Premises Physical and Virtual Service Images and Disks
Windows Azure enables you to use not only images provided by Azure, but also images that you create on premises. To create a Windows Server image, you must run the Sysprep command on your development server to generalize it and shut it down before you can upload the .vhd file that contains the operating system.
For more information about using Sysprep, see How to Use Sysprep: An Introduction.
To create a Linux image, depending on the software distribution, you must run a set of commands that are specific to the distribution and you must run the Windows Azure Linux Agent.
For more information on creating and moving on premises disk images, please see Manage Disks and Images.
5.5.3 Virtual Disk Formats and Types
Virtual Disk Formats You will need to determine which virtual disk formats are supported by your public cloud infrastructure services provider; each virtualization platform vendor typically supports its own virtual disk container format. If the provider you choose does not support the disk formats you currently have in production for the services you want to move to the infrastructure service provider’s network, then you will need to perform a disk format conversion before uploading those disks to your public cloud infrastructure service provider’s network.
For example, Windows Azure currently supports only the .vhd file format. If you have virtual machines running on a non-Hyper-V virtualization infrastructure, or if you have virtual machines running on a Windows Server 2012 virtualization infrastructure that use the .vhdx format, you will need to convert those disk formats to .vhd. There are a number of tools available for converting disk formats. For one example, please see How to Deploy a Virtual Machine by Converting a Virtual Machine (V2V).
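A pre-flight check of this kind can be sketched in a few lines (the supported-format set reflects the .vhd-only constraint described above; the file names are examples):

```python
import os

# Windows Azure supported only .vhd at the time of writing; adjust the set
# for your provider.
SUPPORTED_FORMATS = {".vhd"}

def needs_conversion(disk_path):
    """Return True if the virtual disk must be converted before upload."""
    ext = os.path.splitext(disk_path)[1].lower()
    return ext not in SUPPORTED_FORMATS

for disk in ["web01.vhdx", "db01.vmdk", "app01.vhd"]:
    print(disk, "convert" if needs_conversion(disk) else "upload as-is")
```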
Virtual Disk Types Some public cloud infrastructure service providers will make different virtual disk types available to you that you can use in your hybrid IT infrastructure. These virtual disk types might be useful in different scenarios, such as disks that can be used as operating system disks or storage disks.
Azure Infrastructure Services supports an operating system disk VHD that you can boot and mount as a running version of an operating system. Any VHD that is attached to virtualized hardware and that is running as part of a service is an operating system disk. After an image is provisioned, it becomes an operating system disk. An operating system disk is always created when you use an image to create a virtual machine. The VHD that is intended to be used as an operating system disk contains the operating system, any operating system customizations, and your applications. Azure Infrastructure Services operating system disks are read-write cache enabled.
Azure Infrastructure Services also supports a VHD that can be used as a data disk to enable a virtual machine to store application data. After you create a virtual machine, you can either attach an existing data disk to the machine, or you can create and attach a new data disk. Whenever you use a data-intensive application in a virtual machine, it’s highly recommended that you use a data disk to store application data, rather than using the operating system disk. Azure Infrastructure Services data disks by default have read-write caching disabled.
A third type of disk, known as a "Caching Disk," is automatically included with any virtual machine created in Azure Infrastructure Services. This disk is used for the pagefile by default. If you have other temporary data that you want to save to local storage, you can place that data on the Caching Disk. The information on the Caching Disk is not persistent and does not survive reboots of the virtual machine.
For more information about Azure Infrastructure Services operating system and data disks, please see Azure Virtual Machines.
5.5.4 Virtual Machine Customization
Different public cloud infrastructure service providers will provide various levels of customization for your virtual machines. Typical customizations at the infrastructure layer include how much memory, how many processors and at what speeds, and how much storage you can make available to a virtual machine. In some cases the public cloud infrastructure service provider will allow granular options for provisioning memory, processors, and storage; in other cases the provider will require you to select from a set of “t-shirt” sized virtual machines, with each size defining the amount of processing, memory, and storage resources available for that size. Windows Azure Infrastructure Services uses this “t-shirt” size model.
For more information on the types of virtual hardware available to you, please see Virtual Machines.
The amount you pay for virtual machines on the public cloud infrastructure service provider’s network is typically proportional to the size and number of virtual machines you choose. Consider in advance what virtual machines you require to support your hybrid IT infrastructure, and investigate whether the public cloud infrastructure service provider has a price calculator that will assist you in estimating the costs of running them.
Windows Azure Infrastructure Services has a pricing calculator to help you assess what your costs will be. Please see Windows Azure Pricing Calculator.
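The underlying arithmetic of such an estimate is straightforward: hourly rate times instance count times hours per month, summed over the fleet. A minimal sketch, with hypothetical hourly rates (actual rates come from the provider’s calculator):

```python
# Hypothetical hourly rates keyed by "t-shirt" size; actual Azure rates
# differ and should come from the provider's pricing calculator.
HOURLY_RATE = {"small": 0.09, "medium": 0.18, "large": 0.36}

def monthly_estimate(fleet, hours_per_month=730):
    """fleet: iterable of (size, count) pairs -> estimated monthly cost."""
    return sum(HOURLY_RATE[size] * count * hours_per_month
               for size, count in fleet)

print(round(monthly_estimate([("small", 2), ("large", 1)]), 2))  # 394.2
```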
5.5.5 Virtual Machine Access
You will need to consider how you will access the virtual machines running on the public cloud infrastructure service provider’s network. The method of access will vary with the operating system running within the virtual machine. For Windows-based operating systems, you have the option to use the Remote Desktop Protocol (RDP) to connect to the virtual machine so that you can manage it. You also have the option of using remote PowerShell commands. If the virtual machine is running a Linux-based operating system, you can use the SSH protocol.
For more information about logging on to a virtual machine running Windows Server in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Windows Server 2008 R2.
For more information about logging on to a virtual machine running Linux in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Linux.
5.5.6 Virtual Machine and Service Availability
Service Availability When designing a hybrid IT infrastructure, you will need to consider how to make both the application and the virtual machines that run it highly available in the public cloud infrastructure service provider’s network.
Load balancing incoming connections to the virtual machines running the application can help increase application availability. Incoming connections can be spread across multiple virtual machines. These virtual machines typically host the front-end stateless component of the application. If one of the virtual machines hosting the front-end component becomes disabled, connections can be load balanced to other front-end virtual machines. Different public cloud service providers will likely use different load balancing algorithms, so you will want to consider the load balancing algorithm used by the provider when designing application high availability into your hybrid IT infrastructure.
Azure Infrastructure Services supports load balancing connections to virtual machines on an Azure Virtual Network. For more information about this, please see Load Balancing Virtual Machines.
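A round-robin dispatcher, one of the simplest load-balancing algorithms, can be sketched as follows (this illustrates the concept of spreading connections across stateless front-end instances and routing around a failed one; it is not the algorithm Azure actually uses):

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin dispatcher over front-end VM instances."""

    def __init__(self, instances):
        self._healthy = list(instances)
        self._cycle = itertools.cycle(self._healthy)

    def next_instance(self):
        """Return the instance that should receive the next connection."""
        return next(self._cycle)

    def mark_down(self, instance):
        """Remove a failed instance so traffic flows to the remaining ones."""
        self._healthy.remove(instance)
        self._cycle = itertools.cycle(self._healthy)

lb = RoundRobinBalancer(["web-0", "web-1", "web-2"])
print([lb.next_instance() for _ in range(4)])  # ['web-0', 'web-1', 'web-2', 'web-0']
lb.mark_down("web-1")
print([lb.next_instance() for _ in range(2)])  # ['web-0', 'web-2']
```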
Virtual Machine Availability The hardware that supports the virtual machines needs to be maintained on a periodic basis. Your public cloud infrastructure service provider will need to schedule times when software and hardware is serviced and upgraded. In order to make sure that the services that run on those virtual machines continue to be available during maintenance and upgrade windows, you need to consider options that the public cloud service provider makes available to you to prevent downtime during these cycles.
For example, Windows Azure periodically updates the operating system that hosts the virtual machines. A virtual machine is shut down when an update is applied to its host server. An update domain is used to ensure that not all of the virtual machine instances are updated at the same time. When you assign multiple virtual machines to an availability set, Windows Azure ensures that the machines are assigned to different update domains. For instance, you might have two virtual machines running Internet Information Services (IIS) in separate update domains and two virtual machines running SQL Server also in separate update domains.
For more information on availability for Azure Infrastructure Services virtual machines, please see Manage the Availability of Virtual Machines.
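The update-domain concept can be illustrated with a small sketch that spreads the members of an availability set across domains round-robin, so that no role has all of its instances in a single domain (domain counts and VM names are illustrative):

```python
def assign_update_domains(vms, domain_count=5):
    """Round-robin the VMs in an availability set across update domains.

    Because assignment cycles through the domains, consecutive instances of
    the same role always land in different domains (as long as the role has
    no more instances than there are domains).
    """
    return {vm: i % domain_count for i, vm in enumerate(vms)}

placement = assign_update_domains(["iis-0", "iis-1", "sql-0", "sql-1"],
                                  domain_count=2)
print(placement)  # {'iis-0': 0, 'iis-1': 1, 'sql-0': 0, 'sql-1': 1}
# While domain 0 is being serviced, iis-1 and sql-1 (domain 1) stay up,
# so both the IIS and SQL Server roles remain available.
```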
From the perspective of basic cloud infrastructure considerations, there are several management and support design issues that you'll want to consider. The primary areas include, but are not limited to, the following:
5.6.1 Consumer and Provider Portal
A few options are available to you to provide a seamless user experience across both your private cloud services and the Windows Azure public cloud services.
Note: Windows Azure Services for Windows Server integrates with Windows Server 2012 and System Center 2012. The next version of Windows Azure Services for Windows Server is the Windows Azure Pack for Windows Server. It will integrate with Windows Server 2012 R2 and System Center 2012 R2.
5.6.2 Usage and Billing
If you’re providing cloud services to your consumers today, then you are already able to track resource consumption by your consumers. You use this data to either charge your customers for their consumption, or simply report back to them on their consumption. Public cloud service providers each have their own pricing and billing options.
Windows Azure Virtual Machines pricing is publicly available, and provides purchase options by credit card or invoicing. Purchase options are connected to a Microsoft Account. When using Windows Azure Services, you’ll need to determine which purchase option you’ll choose, and how those costs will either be charged back or shown back to the individuals within the organization that consumed the resources. As of the writing of this document, Windows Azure billing is provided at the subscription level, and doesn’t provide much granularity for the individual resources consumed within a subscription. Thus, you may decide to set up multiple subscriptions to track resource consumption, or develop strategies for tracking resource consumption through a single subscription.
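A show-back report of this kind boils down to aggregating usage records by subscription (or by whatever grouping key you choose). A minimal sketch, with hypothetical usage records, field names, and rates:

```python
# Hypothetical usage records as they might be exported per subscription;
# the field names and rates are illustrative, not an Azure billing format.
usage = [
    {"subscription": "dev",  "resource": "vm-small", "hours": 300, "rate": 0.09},
    {"subscription": "dev",  "resource": "vm-large", "hours": 100, "rate": 0.36},
    {"subscription": "prod", "resource": "vm-large", "hours": 730, "rate": 0.36},
]

def showback(records):
    """Aggregate cost per subscription for charge-back/show-back reporting."""
    totals = {}
    for rec in records:
        key = rec["subscription"]
        totals[key] = totals.get(key, 0.0) + rec["hours"] * rec["rate"]
    return totals

print({k: round(v, 2) for k, v in showback(usage).items()})
# {'dev': 63.0, 'prod': 262.8}
```

Mapping one subscription per department, as suggested above, makes this aggregation trivial; a single shared subscription would instead require tagging or naming conventions to recover the grouping key.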
5.6.3 Service Reporting
If you’re providing cloud services to your consumers today, then you already provide reports to your consumers as to whether services met their service level agreements (SLAs) in areas such as performance and availability. Public service providers offer SLAs for the services they provide, as well as service reporting so you know whether or not they met their SLAs.
Windows Azure provides availability SLAs for its various services. An example of the availability SLA offered with the Windows Azure Virtual Machines service is defined in the Virtual Machines article. You’ll need to decide whether it’s possible, and whether you want, to integrate the service reporting offered by the public provider with your own service reporting capability. If you’re providing a service to your consumers that has some components running on premises and others running on a public provider’s cloud, you’ll have to integrate the service reporting capabilities so that you can provide service level reporting to your consumers.
When working with a public cloud infrastructure service provider’s system, you need to understand what authentication and authorization/access control options are available to you. In addition, you’ll need to understand how authentication and authorization come together to support your overall account management requirements. In this section we will cover these issues.
5.6.4.1 Authentication
Users need to authenticate to the provider’s system to gain access to system resources. When designing your hybrid IT infrastructure, you need to determine what authentication options are available to you and what the advantages and disadvantages might be to each approach.
There are several options that might be possible for your authentication to the service provider’s system design:
The following table describes the advantages and disadvantages of each of these options by using Active Directory Federation Services and Windows Azure Active Directory as examples of technologies that you can use for direct and indirect federation, respectively.
Option
You authenticate to the service provider’s proprietary authentication mechanism, separately from any you have already on premises.
You can federate your on-premises authentication mechanism with the service provider or use some form of directory synchronization.
You can federate your on-premises authentication mechanism with the service provider’s through a federation service such as Windows Azure Active Directory.
5.6.4.2 Authorization and Access Control
The following table shows advantages and disadvantages of each of these options in authorization and access control.
AuthN and access control option
On-premises role-based administrative access control.
Public cloud infrastructure service role-based administrative access control.
Authorized employees are allowed to acquire hybrid IT infrastructure resources.
Dedicated hybrid IT infrastructure group.
Reflect IT organizational structure to hybrid IT infrastructure.
Allow consumers of the hybrid IT infrastructure to mirror on-premises siloed infrastructure.
For more information on role-based access control in Hyper-V, see Configure Hyper-V for Role Based Access Control.
For more information on role-based access control in System Center Virtual Machine Manager, see Private Cloud in System Center Virtual Machine Manager 2012 - Part 2 – Delegate Control.
At this time granular role-based access control is not available in Azure Infrastructure Services.
5.6.4.3 Account Management
You need to consider workflow issues regarding who has access to both the public cloud service account that is used for billing services and any sub-accounts that might be used for administration of the public infrastructure service components. For example, suppose there is a manager who is responsible for the infrastructure service account. What might happen if that manager were released from the company? It’s possible that if the former manager left on bad terms, that person could potentially cancel the account and thereby block access to all the services. Similarly, what might happen if a member of the hybrid IT infrastructure team were released from the company, and that person’s administrative account were still active? If the administrator who was released left the company on bad terms, that person could delete virtual machines, leave an exploit on the service, and any number of other things that a person with administrative access could achieve.
For these reasons and more, it’s critical that you have a workflow or account provisioning and deprovisioning process that can prevent these problems from happening. You may already have a workflow and account management system in place that performs these actions for you for on-premises accounts. If that is the case, you can investigate the possibilities of connecting your on-premises account management system with the management system that is used by your public cloud infrastructure service provider.
For example, as mentioned in the table in section 5.6.4.1 Authentication, you may have the option to federate your on-premises account system with the service provider’s system. If that is the case, user accounts that are provisioned and deprovisioned on premises will automatically be managed for access to the service provider’s system. You might consider an on-premises solution that is based on Forefront Identity Manager (FIM) to help you with this type of account management and tie it into the federated environment.
At this time in Windows Azure, you have the option of assigning an account to be a Service Administrator or Service Co-Administrator. The difference between these two roles is that the Service Co-Administrator cannot delete the Service Administrator account for a subscription. Only the Windows Azure account owner can delete a Service Administrator.
For more information on administrative roles in Windows Azure, see Provisioning Windows Azure for Web Applications.
In a hybrid IT infrastructure, you will need to consider the options available for authentication and authorization. While there are a number of authentication and authorization options available for the applications that you’ll run in the public cloud infrastructure service provider’s network, in the majority of cases those applications will be dependent to a certain degree on Active Directory. For this reason, it’s important to consider your design options for applications that run some or all of their components in the public cloud infrastructure service provider’s network.
Key issues for consideration include:
The remainder of this section will detail considerations in each of these areas.
5.6.5.1 Active Directory Domain Controllers in the Public Cloud Infrastructure Provider's Network Considerations
Historically, the recommendation has been not to virtualize domain controllers. Many virtualization infrastructure designers have virtualized domain controllers anyway, only to experience failures related to the virtualized domain controllers.
For example, backing up and restoring domain controllers can roll back the state of the domain controller and lead to issues that are related to inconsistencies in the Active Directory database. Restoring snapshots from a virtualized domain controller would have the same effect as restoring from backup—the previous state would be restored and lead to Active Directory database inconsistencies. The same effects are seen when you use more advanced technologies to restore a domain controller, such as creating SAN snapshots and restoring those, or creating a disk mirror and then breaking the mirror and using the version on one side of the mirror at a later time as part of a restore process. Update Sequence Number (USN) “bubbles” create the problems that are most commonly encountered with virtualized domain controllers. USN bubbles can lead to a number of problems, including:
For these reasons and more, it is critical to avoid USN bubbles.
For more information on USN bubbles, see How the Active Directory Replication Model Works.
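The core of the USN rollback condition can be illustrated with a small sketch: a replication partner remembers the highest USN it has seen from a replica, and a replica that later reports a lower current USN has been rolled back, leaving a "bubble" of changes that may never replicate. (This is a deliberate simplification; real detection also involves invocation IDs and up-to-dateness vectors.)

```python
def detect_usn_rollback(partner_highest_seen, replica_current_usn):
    """Simplified USN rollback check.

    A partner has already replicated updates up to partner_highest_seen from
    this replica (under the same invocation ID). If the replica now reports a
    lower current USN, it was rolled back (e.g., restored from a snapshot),
    and changes made between the two values form a USN "bubble".
    """
    return replica_current_usn < partner_highest_seen

# The replica reached USN 5600, then was restored from a snapshot taken
# when it was at USN 5000.
print(detect_usn_rollback(partner_highest_seen=5600, replica_current_usn=5000))  # True
```

VM Generation ID support, described next, lets a restored domain controller recognize this situation itself and resynchronize instead of replicating stale state.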
VM Generation ID Virtualization makes it easier to create a USN bubble scenario, and therefore the recommendation in the past has been that you should not virtualize domain controllers. However, with Windows Server 2012, virtualizing domain controllers is now fully supported.
Full support for virtualizing domain controllers is enabled by a feature in the hypervisor which is called the VM Generation ID. When a domain controller is virtualized on a supported virtualization platform, the domain controller will wait until replication takes place to be told what its state and role is. If the virtualized domain controller is one that was restored from a snapshot, it will wait to be told what the correct state is instead of replicating a previous state and causing a USN bubble.
For more information on VM Generation IDs, see Introduction to Active Directory Domain Services Virtualization.
Note: VM Generation IDs must be supported by both the hypervisor and the guest operating system. Used together, Windows Server 2012 Hyper-V and the Windows Server 2012 operating system acting as a guest will support VM Generation IDs. VMware also supports VM Generation ID when running Windows Server 2012 domain controller guests. Windows Azure Infrastructure Services also supports VM Generation ID and therefore also supports virtualization of domain controllers.
When creating domain controllers in Azure Infrastructure Services, you have the option to create them new on an Azure Virtual Network, or to use one that you created on premises and move it to an Azure Virtual Network.
Note: Do not sysprep domain controllers; Sysprep will generate an error when you try to run it on a domain controller.
Instead of using sysprep, consider moving the VHD file to Azure storage and then create a new virtual machine by using that VHD file. If your on-premises domain controller is running on physical hardware, you have the option to do a physical to virtual conversion and move the resultant .vhd file to Azure storage. Then you can create the new virtual machine from that .vhd file.
You also have the option to create a new domain controller in Azure Infrastructure Services and enable inbound replication to the domain controller. In this case, all the replication traffic is inbound, so there are no bandwidth charges due to egress traffic during the initial inbound replication, but there will be egress traffic costs for outbound replication.
Active Directory Related File Placement When designing Active Directory to support hybrid application authentication, you will need to consider the disk types that are available from the public cloud infrastructure service provider. Some disk types and caching schemes are more or less favorable to specific Active Directory domain controller data types. For example, Windows Azure supports two disk types where you can store information for virtual machines:
As mentioned earlier in this paper, Windows Azure Infrastructure Services also supports a “temporary disk,” but you should avoid storing data on a temporary disk because the information on the temporary disk is not persistent across reboots of the virtual machine. In Windows Azure, the temporary disk is primarily used for the page file and it helps speed up the virtual machine boot process.
In Windows Azure, the main difference between a data disk and an OS disk relates to their caching policies. The default caching policy for an OS disk is read/write. When read/write activity takes place, it will first be performed on a caching disk. After a period of time, it will be written to permanent blob storage. The reason for this is that for the OS disk, which should contain only the core operating system support files, the reads and writes will be small. This makes local caching a more efficient mechanism than making the multiple and frequent small writes directly to permanent storage.
Note: The OS Disk size limit at the time this was written was 127 GB. However, this might change in the future, so watch the support pages on the Windows Azure website for updates.
The default caching policy for Data Disks is “none,” which means that no caching is performed; data is written directly to permanent storage. Unlike OS Disks, which are currently limited to 127 GB, Data Disks currently support up to 1 TB. If you need more storage for a disk, you can span up to 16 disks for up to 16 TB, which is available as part of the current Extra Large, A6, and A7 virtual machine disk offerings.
Note: These are current maximum Data Disk sizes and numbers. This might change in the future. Please check the Windows Azure support pages for updates.
With all this in mind, consider where you want to place the DIT/Sysvol location: on a disk where caching could lead to a failure to write, or on a disk where Active Directory related information is immediately written to permanent storage? The latter is the preferred option. The main reason is that write-behind disk caching invalidates some core assumptions made by Active Directory:
For more information related to Active Directory and Forced Unit Access (FUA), see Things to consider when you host Active Directory domain controllers in virtual hosting environments.
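The placement guidance above can be condensed into a simple rule: only a disk whose default caching policy performs no write-behind caching is a safe home for the DIT and Sysvol. A sketch of that rule follows (the policy labels are simplified descriptions of the defaults discussed above, not Azure API values):

```python
# Simplified mapping of Azure disk types to their default caching behavior,
# per the discussion above. Labels are descriptive, not API constants.
DISK_CACHE_POLICY = {
    "os": "read-write",      # cached; small OS reads/writes benefit
    "data": "none",          # written directly to permanent storage
    "temporary": "volatile", # contents lost on reboot
}

def safe_for_ad_database(disk_type):
    """DIT/Sysvol writes must reach durable storage immediately, so only a
    disk with no write-behind caching qualifies."""
    return DISK_CACHE_POLICY.get(disk_type) == "none"

for disk in ("os", "data", "temporary"):
    print(disk, safe_for_ad_database(disk))
```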
The following table describes some of the advantages and disadvantages of Azure Infrastructure Services disk types in the context of Active Directory domain controllers.
Windows Azure disk type
Advantages in domain-controller scenario
Disadvantages in domain-controller scenario
OS Disk
Data Disk
Temporary Disk
5.6.5.2 Read-Only Domain Controller Considerations
There are several options available to you for putting Active Directory in the Azure Infrastructure Services cloud:
In a hybrid IT environment, you might consider the public cloud infrastructure service provider’s network as being similar to a branch office, or to an off-premises hosted data center. It would therefore seem to make sense to take advantage of read-only domain controllers, because they were designed for branch office deployments. However, there are significant differences between the branch office environment that the creators of the read-only domain controller role envisioned and the environment seen in a public cloud infrastructure service provider’s network. The main difference is that the branch office scenario assumes a low-security environment, where the domain controller might not be in a physically secure location, which makes it vulnerable to theft or tampering. Because of this, the read-only domain controller was designed as a good alternative for branch offices, providing the following benefits:
The following table describes the advantages and disadvantages of deploying a read-only domain controller (RODC) in a public cloud infrastructure provider’s network, such as Windows Azure.
For more information on attribute filtering and credential caching, see RODC Filtered Attribute Set, Credential Caching, and the Authentication Process with an RODC.
5.6.5.3 Domain-Controller Locator Considerations
When putting Active Directory Domain Services in a public cloud infrastructure service provider’s network, you need to think about how to correctly define and connect Active Directory subnets and sites to the off-premises components, because the choices you make here will influence the cost of the overall solution. Sites, site links, and subnets affect where authentication takes place, as well as the topology of domain controller replication. To begin with, here are some definitions:
When creating replication policies, consider the following:
One option is to define the network ID of the Azure Virtual Network (or any public cloud service provider’s network) as a subnet in Active Directory; machines on that subnet will then use a local domain controller for authentication (assuming one is available). This means that services situated in Azure Infrastructure Services won’t have to reach out to on-premises domain controllers for authentication. It also reduces cost, because if a service in Azure Infrastructure Services had to authenticate against on-premises domain controllers, that would generate egress traffic, which you must pay for.
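The subnet-to-site matching that the domain-controller locator performs can be sketched as follows. The site and subnet definitions here are hypothetical: 10.2.0.0/16 stands in for an Azure Virtual Network address space, and Windows matches a client IP to the most specific Active Directory subnet, which in turn maps to a site.

```python
import ipaddress

# Hypothetical Active Directory subnet-to-site definitions.
SUBNET_TO_SITE = {
    ipaddress.ip_network("10.1.0.0/16"): "OnPremises-HQ",
    ipaddress.ip_network("10.2.0.0/16"): "Azure-IaaS",   # Azure Virtual Network range
}

def locate_site(client_ip: str) -> str:
    """Return the AD site for a client, preferring the longest prefix match."""
    ip = ipaddress.ip_address(client_ip)
    matches = [net for net in SUBNET_TO_SITE if ip in net]
    if not matches:
        return "no site (falls back to any domain controller)"
    best = max(matches, key=lambda net: net.prefixlen)
    return SUBNET_TO_SITE[best]

# A VM inside the Azure Virtual Network matches the Azure site and
# authenticates locally, avoiding egress to on-premises domain controllers.
print(locate_site("10.2.4.7"))  # Azure-IaaS
```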
For more information on Active Directory sites, see Active Directory Sites.
Also consider what costs you want to set on the site links. For example, the Azure Infrastructure Services connection represents a much higher-cost link. You’ll also want to ensure that when the “next closest site” behavior comes into play, the domain controllers in Azure Infrastructure Services are not considered to be the next closest (unless that is your intent, as in the case of remote offices that use a domain controller in Azure Infrastructure Services as a backup).
For more information on this issue, see Enabling Clients to Locate the Next Closest Domain Controller.
Active Directory replication also supports compression. The more compressed the data is, the lower the egress costs will be.
For more information on Active Directory compression, see Active Directory Replication Traffic.
Finally, consider putting together your replication schedule based on anticipated latency issues. Remember that domain controllers replicate only the last state of a value, so slowing down replication saves cost if there is sufficient churn in your environment: multiple changes to the same attribute within one replication interval are sent as a single update.
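The coalescing effect of a slower replication schedule can be sketched with a small model. All numbers are illustrative; the point is that last-state replication sends repeated updates to the same attribute within one replication window as a single value.

```python
# Domain controllers replicate only the last state of a value, so repeated
# updates to the same attribute within one replication interval are sent
# as a single update. This model counts the values actually sent.

def replicated_updates(updates, interval_minutes):
    """Count values replicated, given (minute, object_attribute) updates."""
    last_state_buckets = set()
    for minute, attribute in updates:
        # One replicated value per attribute per replication window.
        last_state_buckets.add((minute // interval_minutes, attribute))
    return len(last_state_buckets)

# The same attribute changed three times, plus one unrelated change.
churn = [(0, "userA.pwd"), (3, "userA.pwd"), (7, "userA.pwd"), (9, "userB.mail")]
print(replicated_updates(churn, 15))  # 2: the three userA changes coalesce
print(replicated_updates(churn, 1))   # 4: every change replicates separately
```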
5.6.5.4 Domain, Forest, and Global Catalog Considerations
When considering putting a full read/write domain controller in the public cloud infrastructure service provider’s network, you’ll first want to ask about the provider’s security model and operational principles. Azure Infrastructure Services is a public cloud offering, which means that you’re using a shared compute, networking, and storage infrastructure. In such an environment, isolation is a key operating principle, and the Azure team has ensured that isolation is enforced to the extent that placing a domain controller in Azure Infrastructure Services is a supported and secure deployment model.
For more information on Azure security, see Windows Azure Security Overview.
The next step is to consider what kind of domain/forest configuration you want to deploy. Some of the options are:
The first option might represent the least secure option of the three, because if the domain controller in the cloud is compromised, the entire production directory services infrastructure would be affected. The second and third options can be considered incrementally more secure, because there is only a one-way trust, but the overhead of maintaining trusts might not fit organizational requirements.
The last option might be considered the most secure, but there is administrative overhead that you need to take into account, and not all deployment scenarios will support this kind of configuration. You need to weigh these issues before deciding on a domain and forest model.
Given the Azure security model, the consensus is that the first option is the preferred option when you weigh the options for application compatibility, management overhead, and security.
Another important consideration is regulatory compliance. A large amount of personally identifiable information (PII) can be stored in these read/write domain controllers, and there may be regulatory issues that you need to consider. There are also cost considerations: you’ll generate more egress traffic (depending on authentication load), and there will also be egress replication traffic that you’ll need to factor into the cost equation.
For detailed information about Active Directory security considerations, see Best Practice Guide for Securing Active Directory Installations.
The following table describes some of the advantages and disadvantages of each of the domain and forest models.
Global Catalog Considerations
When designing a hybrid IT infrastructure, you need to consider whether you want to put a Global Catalog domain controller into the off-premises component of your infrastructure. A Global Catalog server is a domain controller that keeps information about all objects in its own domain and partial information about objects in other domains.
To learn more about Global Catalog servers, see What is the Global Catalog.
A Global Catalog enables an application to ask a single domain controller one question that might span multiple domains, even though that domain controller is not a member of every domain the question touches. A Global Catalog server contains a partial copy of the rest of the forest: a defined set of attributes that is replicated to Global Catalog servers, known as the Partial Attribute Set (PAS).
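The idea of a partial copy can be sketched as a filter over an object's attributes. The attribute names below are real Active Directory attributes, but this three-attribute PAS is illustrative, not the actual default set.

```python
# Sketch of Partial Attribute Set (PAS) filtering: a Global Catalog holds
# every object in the forest, but only the attributes in the PAS.
# This PAS subset is illustrative, not the real default set.
PARTIAL_ATTRIBUTE_SET = {"cn", "sAMAccountName", "mail"}

def gc_replica(full_object: dict) -> dict:
    """Return the partial copy of an object that a Global Catalog would hold."""
    return {k: v for k, v in full_object.items() if k in PARTIAL_ATTRIBUTE_SET}

user = {
    "cn": "Jane Doe",
    "sAMAccountName": "jdoe",
    "mail": "jdoe@contoso.com",
    "streetAddress": "1 Main St",  # not in the PAS: stays in the home domain
}
print(gc_replica(user))
```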
For more information on the Partial Attribute Set, see How the Global Catalog Works.
There are some reasons why you might not want your domain controller in Azure Infrastructure Services to be a Global Catalog server. These reasons include:
Those are some reasons why you wouldn’t want to put a Global Catalog in the cloud. With them in mind, when would you put a Global Catalog in the cloud? One answer is when you have a single-domain forest: every domain controller already holds all of the domain’s data, so making it a Global Catalog adds no replication cost.
What should you do if you have two domains in the same forest? For example, suppose that one domain is on premises and the second domain is in the Azure Infrastructure Services cloud. The answer is to make the domain controllers in the cloud Global Catalogs. The reason is that authentication (as the user logs on) requires expanding membership in a group type called a Universal Group, and Universal Group membership can be fully resolved only by a Global Catalog. This means that access to a Global Catalog is required in all authentication scenarios where you have more than a single domain.
Also, consider whether you want the domain controllers in Azure Infrastructure Services to require a round trip to the on-premises network in order to access a Global Catalog at every single authentication attempt. This is a tradeoff, and the decision depends on what the replication requirements would be versus how many authentication attempts are made. You probably don’t think so much about these issues when Active Directory is on premises only, but when you design a hybrid IT infrastructure in which egress traffic is billable, your design considerations must take this factor into account.
Workloads in the cloud that authenticate against a domain controller in the cloud will still generate outbound authentication traffic if you don’t have a Global Catalog in the cloud. It’s difficult to provide hard-and-fast guidance because this scenario is fairly new; you will likely have to work out the relative costs of the different options (authentication traffic versus replication traffic), or wait for guidance based on field experience to become available.
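A back-of-the-envelope comparison of the two egress streams can be sketched like this. Every number here is a made-up planning assumption, including the egress price; substitute your own measurements and your provider's actual rates.

```python
# Hypothetical comparison: outbound authentication traffic (no GC in the
# cloud) versus outbound replication traffic (GC in the cloud). All
# figures are illustrative planning assumptions, not real measurements.

def monthly_egress_gb(auth_attempts, auth_kb_each, repl_changes, repl_kb_each):
    """Return (authentication, replication) egress in GB per month."""
    auth_gb = auth_attempts * auth_kb_each / 1024 / 1024
    repl_gb = repl_changes * repl_kb_each / 1024 / 1024
    return auth_gb, repl_gb

auth_gb, repl_gb = monthly_egress_gb(
    auth_attempts=2_000_000, auth_kb_each=4,   # round trips to on-premises GC
    repl_changes=50_000, repl_kb_each=8)       # outbound replication updates

cheaper = "place a GC in the cloud" if repl_gb < auth_gb else "keep the GC on-premises"
print(f"auth: {auth_gb:.2f} GB, replication: {repl_gb:.2f} GB -> {cheaper}")
```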
What we do know is that Global Catalogs are used to expand Universal Group membership, and because they host a partial copy of every domain, their costs are likely to be even less predictable. Something that might complicate matters further, or at least require more study, is the effect of creating an Internet-facing service that authenticates with Active Directory.
One option is to take advantage of Universal Group Membership Caching, but there are issues with this solution that you will want to consider.
For more information on Universal Group Membership Caching, see Enabling Universal Group Caching for a Site.
Finally, most replication for Global Catalogs in the Azure Infrastructure Services cloud is going to be inbound, and inbound (ingress) traffic is not billed, so cost is not an issue there. Outbound replication is possible, but it can be avoided by configuring the right site links.
The following table summarizes some of the advantages and disadvantages of putting a Global Catalog server in the public cloud infrastructure service provider’s network.
Advantages of a Global Catalog in the cloud
Disadvantages of a Global Catalog in the cloud
5.6.5.5 Active Directory Name Resolution and Geo-Distribution Considerations
Domain controllers and their clients must be able to register and resolve resources within their own domains and forest, as well as across trusts. Because static IP addressing isn’t supported in Azure Virtual Networks, DNS settings must be configured within the Virtual Network definition.
There are several ways to approach the name resolution requirements for Active Directory in a hybrid IT infrastructure. The following is one suggested approach:
Geo-Distribution Considerations
Your hybrid IT infrastructure design might include geo-distributed domain controllers hosted in Azure Virtual Networks. Azure Infrastructure Services can be an attractive option for geo-distributing domain controllers. It can provide:
However, keep in mind that Virtual Networks are isolated from one another. If you want different Virtual Networks to communicate, you must establish a site-to-site link with each of them and then have traffic loop back through the corporate network to reach the other Azure Virtual Networks. This means that all replication traffic will route through your on-premises domain controllers, which will generate some egress traffic. Consider piloting such a configuration to see what your egress numbers look like before deploying a full-blown geo-distributed architecture.
5.6.5.6 Active Directory Federation Services (ADFS) Considerations
Another Active Directory function that might be appropriate to consider when constructing a hybrid IT infrastructure is Active Directory Federation Services, or ADFS. While the scenarios might not be as broad as those for Active Directory Domain Services, there are some scenarios where you will want to consider this option. The three primary advantages of deploying ADFS in a public cloud infrastructure services network are:
Deploying Windows Server ADFS in a public cloud infrastructure service provider’s network is very similar to doing so on premises; however, differences do exist. Any Windows Server ADFS requirement to connect back to the on-premises network depends upon the relative placement of the roles. If Windows Server ADFS is running on a public cloud infrastructure service provider’s network and its domain controllers are deployed only on-premises, then the off-premises side of the solution must connect the virtual machines back to the on-premises network by using the link that connects the public and private sides of the hybrid IT solution. Important issues to consider when designing a hybrid IT infrastructure to support ADFS include:
Note: Machines that need to expose the same set of ports directly to the Internet (such as ports 80 and 443) cannot share the same cloud service. Therefore, we recommend that you create a dedicated cloud service for your Windows Server ADFS servers to avoid potential overlaps between the port requirements of an application and those of Windows Server Active Directory.
For more information on Active Directory Federation Services and Active Directory Domain Services in Azure Infrastructure Services, see Guidelines for Deploying Windows Server Active Directory on Windows Azure Virtual Machines.
5.6.5.7 Windows Azure Active Directory Considerations
This document does not discuss the use of Windows Azure Active Directory, which is a REST-based service that provides identity management and access control capabilities for cloud applications. Windows Azure Active Directory and Windows Server Active Directory Domain Services are designed to work together to provide an identity and access management solution for today’s hybrid IT environments and modern cloud-based applications. This paper focuses on the core infrastructure requirements for a hybrid IT infrastructure that does not include cloud-based PaaS and SaaS applications, which are the key scenario for which Windows Azure Active Directory applies.
To help you understand the differences and relationships between Windows Server AD DS and Windows Azure AD, consider the following:
For more information about Windows Azure Active Directory, please see Identity.
When designing your hybrid IT infrastructure, you will also want to consider backup and disaster recovery options.
5.6.6.1 Backup
Windows Azure offers a backup service that you can use to back up on-premises data. Backup can help you protect important server data offsite with automated backups to Windows Azure, where the data is available for restoration.
You can manage cloud backups from the backup tools in Windows Server 2012, Windows Server 2012 Essentials, or System Center 2012 Data Protection Manager. These tools provide similar experiences when configuring, monitoring, and recovering backups whether to local disk or Windows Azure storage. After data is backed up to Windows Azure, authorized users can recover backups to any server.
Windows Azure Backup also supports incremental backups, in which only changes to files are transferred to the cloud. This helps ensure efficient use of storage, reduced bandwidth consumption, and point-in-time recovery of multiple versions of the data. Configurable data retention policies, data compression, and data transfer throttling also offer you added flexibility and help boost efficiency. Backups are stored in Windows Azure and are “offsite,” which reduces the need to secure and protect onsite backup media.
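The bandwidth savings from incremental backup can be sketched with simple arithmetic. The data sizes and change rates below are illustrative assumptions, not measurements of the Windows Azure Backup service.

```python
# Sketch of why incremental backup reduces bandwidth: after the first
# full backup, only changed data is transferred. All numbers are
# illustrative planning assumptions.

def transferred_gb(full_size_gb, daily_change_gb, days, incremental=True):
    """Total data sent to cloud storage over a backup cycle of `days` days."""
    if incremental:
        # One full backup, then a delta each remaining day.
        return full_size_gb + daily_change_gb * (days - 1)
    # A full backup every day.
    return full_size_gb * days

full_every_day = transferred_gb(500, 5, 30, incremental=False)  # 15000 GB
incrementals = transferred_gb(500, 5, 30, incremental=True)     # 645 GB
print(f"full every day: {full_every_day} GB, incremental: {incrementals} GB")
```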
For more information on Windows Azure Backup, please see Windows Azure Backup Overview.
5.6.6.2 Disaster Recovery
Another important option to consider is the role a public cloud infrastructure service provider can play in disaster recovery and business continuity. Some public cloud infrastructure service providers will make various disaster recovery options available to you.
For example, Windows Azure currently offers Recovery Services. If you are using Hyper-V Recovery Manager, you create Hyper-V Recovery Manager vaults to orchestrate failover and recovery for virtual machines managed by System Center 2012 Virtual Machine Manager (VMM). You configure and store information about the VMM servers, clouds, and virtual machines in a source location that Windows Azure Recovery Services protects, and about the VMM servers, clouds, and virtual machines in a target location that are used for failover and recovery. You can create recovery plans that specify the order in which virtual machines fail over, and customize these plans to run additional scripts or manual actions.
For more information about Windows Azure Recovery services, please see Recovery Services Overview.
After identifying the requirements and constraints in your environment and then evaluating each of the design considerations that are detailed within this document, you can create a hybrid IT infrastructure design that best meets your unique needs. Then, you can implement it in a test environment, test it, and deploy it into production.
To complement this document, Microsoft has created reference implementation (RI) guidance sets for hybrid IT infrastructure solutions that are designed for specific audiences. Each RI guidance set includes the following documents:
Note: The Design document within a Reference Implementation (RI) guidance set uses one of the nearly infinite combinations of design and configuration options presented in this Hybrid IT Infrastructure Design Considerations article. The specific design options chosen in an RI Design document are based on the unique requirements in the Scenario Definition document of that RI guidance set. As a result, many readers of this document will find it helpful to also read the RI guidance set for this domain that targets an audience similar to their own; it shows which design options were chosen for the example organization and helps the reader understand why those options were chosen. Other readers will decide that an RI guidance set is unnecessary for them, and that this Design Considerations document provides all the information they need to create their own custom design.
Although the Design document in an RI guidance set is related to this hybrid IT infrastructure Design Considerations document, there are no dependencies between the documents.
Windows Server 2012 DNS services
Active Directory Domain Services
Windows Azure Active Directory
Windows Azure Virtual Machines
Windows Azure Cloud Services
Windows Azure Storage
Windows Azure Storage Services
Windows Azure Recovery Service
Windows Azure Virtual Network
Authors: Thomas W. Shinder (Microsoft), Jim Dial (Microsoft)
Reviewers: Yuri Diogenes (Microsoft), John Dawson (Microsoft), Cheryl McGuire (Microsoft), Kathy Davies (Microsoft), John Morello (Microsoft), Jamal Malik (Microsoft)
This article is maintained by the Microsoft DDEC Solutions Team.
Version 1.0 (7/1/2013): Initial posting and editing complete.
Version 1.1 (8/22/2013): New hybrid cloud principles and patterns were added. Fixed table entries in multiple tables so that disadvantages are all moved to the disadvantages columns.