iSGTW - International Science Grid This Week
iSGTW - International Science Grid This Week

Home > iSGTW - 18 August 2010 > Feature - Back to Basics - How hardware virtualization works: Part 4

Back to Basics - How hardware virtualization works: Part 4

Greg Pfister
Since retiring from his position as an IBM Distinguished Engineer, Greg Pfister has worked as an independent consultant as well as serving as research faculty at Colorado State University. Pfister is the author of “In Search of Clusters,” and over his 30-year career, he has accrued over 30 patents in parallel computing and computer communications.

It is possible to find many explanations of hardware virtualization on the Internet and, of course, in computer science courses. Nonetheless, there remains a great deal of confusion regarding this increasingly popular technology.

This is the last part of a multi-part series that attempts to provide an approachable explanation of hardware virtualization. To catch up, see Part 1, Part 2, and Part 3.

Drown It in Silicon

In the previous discussion I might have lead you to believe that paravirtualization is widely used in mainframes (IBM zSeries and clones). Sorry. It is used, but in many cases another technique is used, alone or in combination with paravirtualization.

Consider the example of reading the real time clock. All that has to happen is that a silly little offset is added. It is perfectly possible to build hardware that adds an offset all by itself, without any "help" from software. So that's what they did. (See figure below-right.)

They embedded nearly the whole shooting match directly into silicon. This implies that the bag 'o bits I've been glibly referring to becomes part of the hardware architecture: Now it's hardware that has to reach in and know where the clock offset resides. Not everything is as trivial as adding an offset, of course; what happens with the memory mapping gets, to me anyway, a tad scary in its complexity. But, of course, it can be made to work. Nobody else is willing to invest a pound or so of silicon into doing this. Yet.

Greg Pfister
Image courtesy of Greg Pfister

As Moore's Law keeps providing us with more and more transistors, perhaps at some point the industry will tire of providing even more cores, and spend some of those transistors on something that might actually be immediately usable.

A Bit About Input and Output

One reason for all this mainframe talk is that it provides an existence proof: Mainframes have been virtualizing IO basically forever, allowing different virtual machines to think they completely own their own IO devices when in fact they're shared. And, of course, it is strongly supported in yet more hardware. A virtual machine can issue an IO operation, have it directed to its address for an IO device (which may not be the "real" address), get the operation performed, and receive a completion interrupt, or an error, all without involving a hypervisor, at full hardware efficiency. So it can be done.

But until very recently, it could not be readily done with PCI and PCIe (PCI Express) IO. Both the IO interface and the IO devices need hardware support for this to work. As a result, IO operations have for commodity and RISC systems been done interpretively, by the hypervisor. This obviously increases overhead significantly. Paravirtualization can clearly help here: Just ask the hypervisor to go do the IO directly.

However, even with paravirtualization this requires the hypervisor to have its own IO driver set, separate from that of the guest operating systems. This is a redundancy that adds significant bulk to a hypervisor and isn't as reliable as one would like, for the simple reason that no IO driver is ever as reliable as one would like. And reliability is very strongly desired in a hypervisor. Errors within it can bring down all the guest systems running under them.

Another thing that can help is direct assignment of devices to guest systems. This gives a guest virtual machine sole ownership of a physical device. Together with hardware support that maps and isolates IO addresses, so a virtual machine can only access the devices it owns, this provides full speed operation using the guest operating system drivers, with no hypervisor involvement. However, it means you do need dedicated devices for each virtual machine, something that clearly inhibits scaling: Imagine 15 virtual servers, all wanting their own physical network card. This support is also not an industry standard. What we want is some way for a single device to act like multiple virtual devices.

Enter the PCI SIG. It has recently released a collection – yes, a collection – of specifications to deal with this issue. I'm not going to attempt to cover them all here. The net effect, however, is that they allow industry-standard creation of IO devices with internal logic that makes them appear as if they are several, separate, "virtual" devices (the SR-IOV and MR-IOV specifications); and add features supporting that concept, such as multiple different IO addresses for each device.

A key point here is that this requires support by the IO device vendors. It cannot be done just by a purveyor of servers and server chipsets. So its adoption will be gated by how soon those vendors roll this technology out, how good a job they do, and how much of a premium they choose to charge for it. I am not especially sanguine about this. We have done too good a job beating a low cost mantra into too many IO vendors for them to be ready to jump on anything like this, which increases cost without directly improving their marketing numbers (GBs stored, bandwidth, etc.).


There is a joke, or a deep truth, expressed by the computer pioneer David Wheeler, co-inventor of the subroutine, as "All problems in computer science can be solved by another level of indirection."

Virtualization is not going to prove that false. It is effectively a layer of indirection or abstraction added between physical hardware and the systems running on it. By providing that layer, virtualization enables a collection of benefits that were recognized long ago, benefits that are now being exploited by cloud computing. In fact, virtualization is so often embedded in cloud computing discussions that many have argued, vehemently, that without virtualization you do not have cloud computing. As explained previously, I don't agree with that statement, especially when "virtualization" is used to mean "hardware virtualization," as it usually is.

However, there is no denying that the technology of virtualization makes cloud computing tremendously more economic and manageable.

Virtualization is not magic. It is not even all that complicated in its essence. (Of course its details, like the details of nearly anything, can be mind-boggling.) And despite what might first appear to be the case, it is also efficient; resources are not wasted by using it. There is still a hole to plug in IO virtualization, but solutions there are developing gradually if not necessarily expeditiously.

There are many other aspects of this topic that have not been touched on here, such as where the hypervisor actually resides (on the bare metal? Inside an operating system?), the role virtualization can play when migrating between hardware architectures, and the deep relationship that can, and will, exist between virtualization and security. But hopefully this discussion has provided enough background to enable some of you to cut through the marketing hype and the thicket of details that usually accompany most discussions of this topic. Good luck.

This post originally appeared on Greg Pfister’s blog, Perils of Parallel, and in the proceedings of Cloudviews - Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference.


 iSGTW 22 December 2010

Feature – Army of Women allies with CaBIG for online longitudinal studies

Special Announcement - iSGTW on Holiday

Video of the Week - Learn about LiDAR


NeHC launches social media

PRACE announces third Tier-0 machine

iRODS 2011 User Group Meeting

Jobs in distributed computing


Enter your email address to subscribe to iSGTW.


 iSGTW Blog Watch

Keep up with the grid’s blogosphere

 Mark your calendar

December 2010

13-18, AGU Fall Meeting

14-16, UCC 2010

17, ICETI 2011 and ICSIT 2011

24, Abstract Submission deadline, EGI User Forum


January 2011

11, HPCS 2011 Submission Deadline

11, SPCloud 2011

22, ALENEX11

30 Jan – 3 Feb, ESCC/Internet2


February 2011

1 - 4, GlobusWorld '11

2, Lift 11

15 - 16, Cloudscape III

More calendar items . . .


FooterINFSOMEuropean CommissionDepartment of EnergyNational¬†Science¬†Foundation RSSHeadlines | Site Map