These notes describe how operating system virtualization (henceforth simply “virtualization”) is implemented. Two approaches are in common use today: hardware-assisted virtualization and paravirtualization. We’ll discuss them in terms of their most prevalent open-source examples: KVM and Xen.
Virtualization, like many terms associated with cloud computing, is often ill defined. For the purposes of this course (and as a good benchmark definition) virtualization is
the process of abstracting the physical machine hardware to the point where it is possible to run a full operating system using the abstractions only.
That is, a virtualized system is one in which the operating system “thinks” it is running “alone” on a physical machine, but instead it is controlling abstractions in place of the physical hardware.
Even this definition is a bit tricky now that Intel has introduced hardware support for virtualization. These extensions to the x86 architecture implement in hardware “abstract” versions of other hardware features (like page tables), calling into question the definition of the word “abstraction.” From a processor architecture perspective, then, virtualization is the isolation of physical machine resources such that a full machine operating system can use the resources independently, as if the machine were dedicated to it.
From a systems design perspective, virtualization is the notion that the operating system (and not the process) is the resource container. That is, each operating system believes it has full control over a set of machine resources and some other “system” (usually called a hypervisor) is able to arbitrate sharing among operating systems (in a way analogous to the way in which an operating system arbitrates sharing among processes). The Linux containers community might take exception to this last definition but so be it.
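To make the “operating system as resource container” idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class ToyHypervisor and its methods are hypothetical names invented for this example, memory is the only resource modeled, and each guest receives one contiguous range of host pages. Real hypervisors such as KVM and Xen also arbitrate CPU time and devices, and they do not work this simply.

```python
class ToyHypervisor:
    """Hypothetical arbiter that hands disjoint host memory pages to guests."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.next_free = 0   # first host page not yet given to any guest
        self.guests = {}     # guest name -> (first host page, page count)

    def create_guest(self, name, pages):
        """Reserve `pages` host pages for a new guest operating system."""
        if name in self.guests:
            raise ValueError(f"guest {name!r} already exists")
        if self.next_free + pages > self.total_pages:
            raise MemoryError("not enough host pages left")
        self.guests[name] = (self.next_free, pages)
        self.next_free += pages

    def translate(self, name, guest_page):
        """Map a guest's own page number to the host page backing it."""
        base, count = self.guests[name]
        if not 0 <= guest_page < count:
            raise IndexError("guest page lies outside the guest's allocation")
        return base + guest_page


hv = ToyHypervisor(total_pages=8)
hv.create_guest("guest1", 4)
hv.create_guest("guest2", 4)
# Both guests number their pages from 0, as if each owned the machine.
print(hv.translate("guest1", 0), hv.translate("guest2", 0))
```

Each guest sees page numbers starting at zero and cannot name a page outside its own allocation, which is the isolation property described above. The hypervisor plays the role for guests that an operating system plays for processes.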
Cloud computing (as it is defined today) depends on the ability to provision an IP network that reaches all of the resources a user wishes to employ. The challenge, however, is that this network must be provisioned and ultimately deprovisioned (decommissioned) dynamically, and the “normal” IP management protocols were not designed for dynamic reconfiguration. The main goals for IP, when it was designed, were routing robustness and delivery determinism (including reliable delivery). Because latencies were so high and connectivity so intermittent, the protocols were designed to react slowly so that transient changes in the network did not cause instability.
Fiber-based networks (which are more reliable, lower latency and higher bandwidth than wire networks) introduced the possibility of implementing dynamic network reconfiguration without sacrificing network stability. In particular, it became possible to use table-driven forwarding rules at the link level.
Before this time, IP defined network routes at Layer 3, the network routing level. Link-level protocols were “stateless” in that they needed only to manage the transfer of data between two fixed end points. The “state” defining the end points doesn’t change (or it can be rebuilt using a broadcast via ARP). Thus all information pertaining to the route that data must take as it traverses the network was originally confined to Layer 3. This information is managed in a per-hop “routing table” that indicates the point-to-point network link that data must take to make its next hop.
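As an illustration of a per-hop routing table, the following Python sketch picks the outgoing link for a destination address by longest-prefix match. The table contents, the link names, and the function next_hop_link are assumptions made up for this example. Only the standard library ipaddress module is used.

```python
import ipaddress

# Hypothetical routing table for one hop: (destination prefix, outgoing link).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "uplink0"),   # default route
    (ipaddress.ip_network("10.0.0.0/8"), "link-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "link-b"),
]


def next_hop_link(dst, routes=ROUTES):
    """Return the link for the most specific prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, link) for net, link in routes if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the entry with the largest prefix length wins.
    _, link = max(matches, key=lambda entry: entry[0].prefixlen)
    return link


print(next_hop_link("10.1.2.3"))
```

Each router on the path holds its own table of this kind and makes the decision independently, which is why the route as a whole exists only at Layer 3.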
With ATM and SONET, however, table-driven link-level protocols made it possible for an abstract “network link” to be implemented as a routed “path” across intermediate link nodes. These original technology-specific protocols eventually informed a standard for such link-level state management called MPLS.
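The idea of a “network link” implemented as a routed path can be sketched with per-node label tables, in the spirit of label switching. This is a simplified illustration, not the MPLS protocol itself: the node names, label values, table layout, and the function trace_path are all hypothetical, and no packet headers or label distribution are modeled.

```python
# Hypothetical label tables, one per link-level node.
# Each maps an incoming label to (next node, outgoing label).
# A next node of None marks the end of the path.
LABEL_TABLES = {
    "A": {17: ("B", 42)},
    "B": {42: ("C", 99)},
    "C": {99: (None, None)},
}


def trace_path(ingress, label, tables=LABEL_TABLES, max_hops=16):
    """Follow a label through the per-node tables and return the nodes visited."""
    node = ingress
    path = [node]
    for _ in range(max_hops):
        if label not in tables[node]:
            raise KeyError(f"node {node} has no entry for label {label}")
        next_node, out_label = tables[node][label]
        if next_node is None:
            return path
        path.append(next_node)
        # The label is swapped at every hop; only the local table is consulted.
        node, label = next_node, out_label
    raise RuntimeError("hop limit exceeded; the tables may contain a loop")


print(trace_path("A", 17))
```

From the ingress node's point of view, label 17 behaves like a single link to C. Rewriting the entries in the intermediate tables moves that “link” onto a different physical path without touching Layer 3, which is what makes dynamic reconfiguration practical.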
Amazon provides web-service based access to its infrastructure via two separate but interoperating facilities: