iSGTW - International Science Grid This Week

iSGTW - 4 August 2010

Back to Basics - How hardware virtualization works: Part 3

Greg Pfister
Since retiring from his position as an IBM Distinguished Engineer, Greg Pfister has worked as an independent consultant as well as serving as research faculty at Colorado State University. Pfister is the author of “In Search of Clusters,” and over his 30-year career, he has accrued over 30 patents in parallel computing and computer communications.

It is possible to find many explanations of hardware virtualization on the Internet and, of course, in computer science courses. Nonetheless, there remains a great deal of confusion regarding this increasingly popular technology.

This multi-part series attempts to provide an approachable explanation of hardware virtualization. You can see Part 1 here and Part 2 here.

Translate, Trap and Map
The basic trap and map technique described previously depends crucially on a hardware feature: the hardware must be able to trap on every instruction that could affect other virtual machines. Prior to the introduction of Intel's and AMD's specific additional hardware virtualization support, that was not true. For example, setting the real-time clock was, in fact, not a trappable instruction. It wasn't even restricted to supervisors. (Note, not all Intel processors have virtualization support today; this is apparently done to segment the market.)
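
To make the problem concrete, here is a minimal toy sketch in Python. Nothing in it is real hardware or a real hypervisor interface: the instruction names (`ADD`, `SET_CLOCK`), the `ToyMachine` class, and the choice of which instructions trap are all assumptions made for illustration. It only shows what goes wrong when an instruction that touches shared state does not trap.

```python
class Trap(Exception):
    """Raised by the toy CPU when a guest executes a trappable instruction."""

    def __init__(self, op, arg):
        super().__init__(op)
        self.op = op
        self.arg = arg


class ToyMachine:
    """Hypothetical machine: one real clock shared by every virtual machine."""

    def __init__(self, trappable):
        self.trappable = set(trappable)  # instructions the hardware can trap
        self.real_clock = 1000           # shared, real hardware state
        self.vm_clock = {}               # per-VM state kept by the hypervisor

    def cpu_execute(self, op, arg):
        if op in self.trappable:
            raise Trap(op, arg)          # hardware hands control to the hypervisor
        if op == "SET_CLOCK":
            self.real_clock = arg        # not trapped: the real clock changes
        # Any other instruction is harmless in this toy.

    def run_guest(self, vm_id, program):
        for op, arg in program:
            try:
                self.cpu_execute(op, arg)
            except Trap as trap:
                # "Map": apply the effect to this VM's private copy only.
                if trap.op == "SET_CLOCK":
                    self.vm_clock[vm_id] = trap.arg


guest_program = [("ADD", 1), ("SET_CLOCK", 5)]

# Older hardware (in this toy): SET_CLOCK does not trap.
old = ToyMachine(trappable=[])
old.run_guest("vm1", guest_program)

# Hardware with virtualization support (in this toy): SET_CLOCK traps.
new = ToyMachine(trappable=["SET_CLOCK"])
new.run_guest("vm1", guest_program)
```

On the `old` machine the guest silently changes the real clock that every other virtual machine sees. On the `new` machine the real clock is untouched and only `vm1`'s private clock changes.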

Yet VMware and others did provide, and continue to provide, hardware virtualization on such older systems. How? By using a load-time binary scan and patch. (See figure below.) Whenever a section of memory was marked executable – making that marking was, thankfully, trappable – the hypervisor would immediately scan the executable binary for troublesome instructions and replace each one with a trap instruction. In addition, of course, it augmented the bag o' bits for that virtual machine with information saying what each of those traps was supposed to do originally.

Image courtesy of Greg Pfister.
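
The scan-and-patch idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the "binary" is a list of `(opcode, argument)` tuples rather than real machine code, `SET_CLOCK` stands in for any troublesome instruction, and `scan_and_patch` and `run_patched` are hypothetical names. A real implementation has to cope with variable-length instructions, self-modifying code, and much else that is ignored here.

```python
# Assumed set of instructions that could affect other virtual machines
# but that the hardware will not trap on its own.
TROUBLESOME = {"SET_CLOCK"}


def scan_and_patch(code):
    """Replace each troublesome instruction with a TRAP.

    Returns the patched code plus a side table (part of the VM's
    "bag o' bits") recording what each trap originally was.
    """
    patched = []
    side_table = {}
    for address, instr in enumerate(code):
        op = instr[0]
        if op in TROUBLESOME:
            side_table[address] = instr          # remember the original
            patched.append(("TRAP", address))    # trap carries its own address
        else:
            patched.append(instr)
    return patched, side_table


def run_patched(patched, side_table, vm_state):
    """Run patched code; on TRAP, emulate the original against VM state."""
    for op, arg in patched:
        if op == "TRAP":
            orig_op, orig_arg = side_table[arg]
            if orig_op == "SET_CLOCK":
                vm_state["clock"] = orig_arg     # virtual clock, not the real one
        elif op == "ADD":
            vm_state["acc"] += arg               # harmless: runs unmodified
    return vm_state


code = [("ADD", 2), ("SET_CLOCK", 42), ("ADD", 3)]
patched, side_table = scan_and_patch(code)
state = run_patched(patched, side_table, {"acc": 0, "clock": None})
```

After patching, the only instruction that differs from the shipped code is the one at address 1, and the side table is what lets the hypervisor work out what that trap was supposed to do.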

Now, many software companies are not fond of the idea of someone else modifying their shipped binaries, and can even get sticky about things like support if that is done. Also, my personal reaction is that this is a horrendous kluge. But it is a necessary kluge, needed to get around hardware deficiencies, and it has proven to work well in thousands, if not millions, of installations.

Thankfully, it is not necessary on more recent hardware releases.

Whether or not the hardware traps all the right things, there is still unavoidable overhead in hardware virtualization. For example, think back to my prior comments about dealing with virtual memory. You can imagine the complex hoops a hypervisor must repeatedly jump through when the operating system in a client machine is setting up its memory map at application startup, or adjusting the working sets of applications by manipulating its map of virtual memory.

One way around overhead like that is to take a long, hard look at how prevalent you expect virtualization to be, and seriously ask: Is this operating system ever really going to run on bare metal? Or will it almost always run under a hypervisor?

Some operating system development streams decided the answer to that question is: No bare metal. A hypervisor will always be there. Examples: Linux with the Xen hypervisor, IBM AIX, and of course the IBM mainframe operating system z/OS (no mainframe has been shipped without virtualization since the mid-1980s).

If that's the case, things can be more efficient. If you know a hypervisor is always really behind memory mapping, for example, provide an actual call to the hypervisor to do things that have substantial overhead. For example: Don't do your own memory mapping, just ask the hypervisor for a new page of memory when you need it. Don't set the real-time clock yourself, tell the hypervisor directly to do it. (See figure below.)

Image courtesy of Greg Pfister.
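
As a rough sketch of what "just ask the hypervisor" looks like, consider the toy below. Everything in it is an assumption for illustration: `ToyHypervisor`, `ParavirtualGuest`, and the hypercall names `alloc_page` and `set_clock` are invented here and are not the interface of Xen, VMware, or any other real hypervisor.

```python
class ToyHypervisor:
    """Hypothetical hypervisor that owns all real pages and per-VM clocks."""

    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))  # real page numbers
        self.owned = {}                             # vm_id -> real pages
        self.vm_clock = {}                          # vm_id -> virtual clock
        self.calls = 0                              # hypercalls served

    def hypercall(self, vm_id, name, *args):
        """Single explicit entry point, used instead of trapping."""
        self.calls += 1
        if name == "alloc_page":
            if not self.free_pages:
                raise MemoryError("no free real pages")
            page = self.free_pages.pop(0)
            self.owned.setdefault(vm_id, []).append(page)
            return page
        if name == "set_clock":
            self.vm_clock[vm_id] = args[0]
            return None
        raise ValueError(f"unknown hypercall: {name}")


class ParavirtualGuest:
    """Guest OS written on the assumption that a hypervisor is always there."""

    def __init__(self, vm_id, hypervisor):
        self.vm_id = vm_id
        self.hv = hypervisor

    def need_memory(self):
        # No private memory mapping to be trapped and shadowed: just ask.
        return self.hv.hypercall(self.vm_id, "alloc_page")

    def set_time(self, value):
        # No attempt to touch the real clock: tell the hypervisor directly.
        self.hv.hypercall(self.vm_id, "set_clock", value)


hv = ToyHypervisor(total_pages=4)
guest_a = ParavirtualGuest("vm-a", hv)
guest_b = ParavirtualGuest("vm-b", hv)

page_a = guest_a.need_memory()
page_b = guest_b.need_memory()
guest_a.set_time(99)
```

Each request costs one explicit call, and the two guests end up with different real pages without either of them ever manipulating a page table of its own.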

This technique has become known as paravirtualization, and can lower the overhead of virtualization significantly. A set of "para-APIs" invoking the hypervisor directly has even been standardized, and is available in Xen, VMware, and other hypervisors.

The concept of paravirtualization actually dates back to around 1973 and the VM operating system developed at the IBM Cambridge Scientific Center. They had the not-unreasonable notion that the right way to build a time-sharing system was to give every user his or her own virtual machine, a notion somewhat like today's virtual desktop systems. The operating system run in each of those VMs used paravirtualization, but it wasn't called that back in the Computer Jurassic.

Virtualization is, in computer industry terms, a truly ancient art.

The next post covers the lowest-overhead technique used in virtualization, then input/output, and draws some conclusions.

This post originally appeared on Greg Pfister’s blog, Perils of Parallel, and in the proceedings of Cloudviews - Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference.

