iSGTW - International Science Grid This Week

Back to Basics - How hardware virtualization works: Part 2


BY GREG PFISTER
Since retiring from his position as an IBM Distinguished Engineer, Greg Pfister has worked as an independent consultant as well as serving as research faculty at Colorado State University. Pfister is the author of “In Search of Clusters,” and over his 30-year career, he has accrued over 30 patents in parallel computing and computer communications.

It is possible to find many explanations of hardware virtualization on the Internet and, of course, in computer science courses. Nonetheless, there remains a great deal of confusion regarding this increasingly popular technology.

This multi-part series attempts to provide an approachable explanation of hardware virtualization. You can see Part 1 here.

The Goal

The goal of hardware virtualization is to maintain, for all the code running in a virtual machine, the illusion that it is running on its own, private, stand-alone piece of hardware. What a provider is giving you is a lease on your own private computer, after all.

“All code” includes all applications, all middleware like databases or LAMP stacks, and, crucially, your own operating system, including the ability to run different operating systems, like Windows and Linux, on the same hardware simultaneously. Hence: isolation of virtual machines from each other is key. Each should think it still “owns” all of its own hardware.

The illusion isn’t always perfect. With sufficient diligence, operating system code can figure out that it isn’t running on bare metal, but in practice that usually happens only when code is written specifically to find out.

Trap and Map

The basic technique used is often referred to as “trap and map.” Imagine you are a thread of computation in a virtual machine, running on one processor of a multiprocessor that is also running other virtual machines.

So off you go, pounding away, directly executing instructions on your own processor, running directly on bare hardware. There is no simulation or, at this point, software of any kind involved in what you are doing; you manipulate the real physical registers, use the real physical adders, floating-point units, cache, and so on. You are running asfastas thehardwarewillgo. Fastasyoucan. Poundingoncache, playingwithpointers, keepinghardwarepipelinesfull, until…

BAM!

Image courtesy of Greg Pfister

You attempt to execute an instruction that would change the state of the physical machine in a way that would be visible to other virtual machines. (See the figure nearby.)

Just altering the value in your own register file doesn’t do that, and neither does, for example, writing into your own section of memory. That’s why you can do such things at full-bore hardware speed.

Suppose, however, you attempt to do something like set the real-time clock – the one master real time clock for the whole physical machine. Having that clock altered out from under other running virtual machines would not be very good at all for their health. You aren’t allowed to do things like that.

So, BAM, you trap. You are wrenched out of user mode, or out of supervisor mode, up into a new higher privilege mode; call it hypervisor mode. There, the hypervisor looks at what you wanted to do – change the real-time clock – and looks in a bag of bits it keeps that holds the description of your virtual machine. In particular, it grabs the value showing the offset between the hardware real time clock and your real time clock, alters that offset appropriately, returns the appropriate settings to you, and gives you back control. Then you start runningasfastasyoucan again. If you later read the real-time clock, the analogous sequence happens, adding that stored offset to the value in the hardware real-time clock.
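
To make that concrete, here is a toy sketch, in plain C rather than real hypervisor code, of how a per-virtual-machine clock offset might live in that bag of bits. The names (vm_state, hw_clock, trap_set_clock) are invented for illustration; a real hypervisor is entered by the trap on the privileged instruction itself, not by a function call.

```c
/* Toy sketch of "trap and map" for a virtualized real-time clock.
 * All names and values are illustrative, not a real hypervisor interface. */
#include <stdio.h>
#include <stdint.h>

static int64_t hw_clock = 1000;        /* the one physical real-time clock   */

typedef struct {
    int64_t clock_offset;              /* guest clock = hw_clock + offset    */
} vm_state;                            /* part of the hypervisor's "bag of bits" */

/* Guest tried a privileged "set clock" instruction: the hypervisor adjusts
 * only this VM's offset and leaves the hardware clock untouched. */
static void trap_set_clock(vm_state *vm, int64_t guest_value)
{
    vm->clock_offset = guest_value - hw_clock;
}

/* Guest read of the clock: add the stored offset to the hardware value. */
static int64_t trap_read_clock(const vm_state *vm)
{
    return hw_clock + vm->clock_offset;
}

int main(void)
{
    vm_state vm_a = {0}, vm_b = {0};

    trap_set_clock(&vm_a, 5000);       /* VM A thinks it changed "the" clock */

    printf("VM A sees %lld\n", (long long)trap_read_clock(&vm_a)); /* 5000 */
    printf("VM B sees %lld\n", (long long)trap_read_clock(&vm_b)); /* 1000 */
    printf("hardware  %lld\n", (long long)hw_clock);               /* 1000 */
    return 0;
}
```

VM B, and the physical machine itself, never notice that VM A “changed” the clock; only VM A’s private offset moved.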

Not every such operation is as simple as computing an offset, of course. For example, a client virtual machine’s supervisor attempting to manipulate its virtual memory mapping is a rather more complicated case to deal with, a case that involves maintaining an additional layer of mapping (kept in the bag o’ bits): A map from the hardware real memory space to the “virtually real” memory space seen by the client virtual machine. All the mappings involved can be, and are, ultimately collapsed into a single mapping step; so execution directly uses the hardware that performs virtual memory mapping.
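
As a rough illustration of that collapse, the sketch below keeps two mapping layers, the guest’s own page table and the hypervisor’s table from “virtually real” to hardware-real memory, and folds them into a single shadow map that the hardware could walk directly. The table sizes, page numbers, and names are invented for the example; real shadow page tables are far more involved.

```c
/* Toy sketch of collapsing two mapping layers into one "shadow" map.
 * Page numbers and table sizes are made up for illustration. */
#include <stdio.h>

#define PAGES 4

/* Guest OS's own page table: guest-virtual page -> "virtually real" page. */
static int guest_map[PAGES] = { 2, 0, 3, 1 };

/* Hypervisor's table (kept in the bag o' bits): "virtually real" page ->
 * hardware-real page. */
static int host_map[PAGES]  = { 7, 5, 6, 4 };

/* Shadow map the hardware actually uses: guest-virtual -> hardware-real,
 * rebuilt by the hypervisor whenever the guest edits its page table. */
static int shadow_map[PAGES];

static void rebuild_shadow(void)
{
    for (int v = 0; v < PAGES; v++)
        shadow_map[v] = host_map[guest_map[v]];
}

int main(void)
{
    rebuild_shadow();
    for (int v = 0; v < PAGES; v++)
        printf("guest-virtual page %d -> hardware-real page %d\n",
               v, shadow_map[v]);
    return 0;
}
```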

Concerning Efficiency

How often do you BAM? Unhelpfully, this is clearly application-dependent. But the answer in practice, setting aside input/output for the moment, is: not often at all. It’s usually a small fraction of the total time spent in the supervisor, which itself is usually a small fraction of the total run time. As a coarse guide, think in terms of overhead well under 5%, or, in other words, for most purposes, negligible. I/O-intensive programs can see substantially higher numbers, though, unless you have access to the very latest in hardware virtualization support; then it’s negligible again. A little more about that later.

I originally asked you to imagine you were a thread running on one processor of a multiprocessor. What happens when this isn’t the case? You could be running on a uniprocessor, or, as is commonly the case, there could be more virtual machines than physical processors or processor hardware threads. For such cases, hypervisors implement a time-slicing scheduler that switches among the virtual machine clients, as sketched below. It’s usually not as complex as the schedulers in modern operating systems, but it suffices. This might be pointed to as a source of overhead: you’re only getting a fraction of the whole machine! But assuming we’re talking about a commercial server, you were only using 12% or so of it anyway, so that’s not a problem. A more serious problem arises when you have less real memory than all the machines need; virtualization does not reduce aggregate memory requirements. But with enough memory, many virtual machines can be hosted on a single physical system with negligible degradation.
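
The time-slicing itself can be as simple as rotating through the runnable virtual machines. The bare-bones round-robin loop below, with invented names and a fixed slice length, stands in for the hypervisor’s scheduler; it just prints which virtual machine would get the processor next.

```c
/* Bare-bones round-robin sketch of a hypervisor time-slicing virtual
 * machines onto one physical processor. run_slice() only prints; a real
 * hypervisor would restore the VM's state and resume its execution. */
#include <stdio.h>

#define NUM_VMS 3

static void run_slice(int vm, int slice_ms)
{
    printf("VM %d runs for %d ms\n", vm, slice_ms);
}

int main(void)
{
    const int slice_ms = 10;
    int next = 0;

    for (int tick = 0; tick < 6; tick++) {  /* six slices for the demo  */
        run_slice(next, slice_ms);
        next = (next + 1) % NUM_VMS;        /* rotate to the next VM    */
    }
    return 0;
}
```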

The next post covers more of the techniques used to do this, getting around some hardware limitations (translate/trap/map) and efficiency issues (paravirtualization).

This post originally appeared on Greg Pfister’s blog, Perils of Parallel, and in the proceedings of Cloudviews - Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference.
