
Tuesday, November 16, 2010

The Virtualization Theorem Ignored for Three Decades

Today, in my graduate operating systems class, we discussed what I believe is the most important result in computer science ever to be persistently ignored:

Popek and Goldberg, Formal Requirements for Virtualizable Third Generation Architectures, Communications of the ACM, Volume 17, Issue 7, July 1974.

This paper puts forth a very simple principle that must be observed in order for a CPU to be capable of running in a virtual machine. First, two definitions:
  • A sensitive instruction reads or modifies supervisor state.
  • A privileged instruction traps if attempted in user mode.
And this central theorem:
  • All sensitive operations must be privileged.
Here is why this is important. A conventional operating system (OS) is in charge of the whole machine, and is free to modify the processor status, page tables, I/O devices, and other sensitive aspects of the machine in order to run normal processes.

But, if you take that OS and put it in a virtual machine (VM), it is no longer in charge of the whole machine. All of those actions on sensitive state must be translated in some way by the virtual machine monitor. The simplest way to accomplish that translation is to run the OS in user mode, allowing the VMM to execute sensitive operations on its behalf. To make sure that the VMM gets all of the sensitive operations, they must all be forced to trap.
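
The trap-and-emulate scheme described above can be sketched in a few lines of Python. This is a toy machine invented for illustration, not any real ISA: the guest OS is demoted to user mode, every sensitive instruction traps, and the VMM applies the operation to a per-guest shadow copy of the supervisor state instead of the real thing.

```python
# Toy illustration of the Popek-Goldberg condition on a hypothetical machine:
# all sensitive instructions are privileged, so they trap in user mode,
# and the VMM emulates them against per-guest shadow state.

class TrapToVMM(Exception):
    """Raised by the CPU when a sensitive instruction runs in user mode."""
    def __init__(self, op, arg):
        self.op, self.arg = op, arg

class CPU:
    SENSITIVE = {"read_status", "write_status"}   # must all be privileged

    def __init__(self):
        self.mode = "user"        # the guest OS runs demoted to user mode
        self.status = 0           # real (host) supervisor state

    def execute(self, op, arg=None):
        if op in self.SENSITIVE and self.mode == "user":
            raise TrapToVMM(op, arg)          # privileged: trap, don't execute
        if op == "write_status":
            self.status = arg
        elif op == "read_status":
            return self.status

class VMM:
    """Catches traps and applies them to the guest's virtual state instead."""
    def __init__(self, cpu):
        self.cpu = cpu
        self.virtual_status = 0   # shadow copy of the "supervisor state"

    def run_guest(self, op, arg=None):
        try:
            return self.cpu.execute(op, arg)
        except TrapToVMM as t:
            if t.op == "write_status":
                self.virtual_status = t.arg   # emulate on shadow state
            elif t.op == "read_status":
                return self.virtual_status

vmm = VMM(CPU())
vmm.run_guest("write_status", 7)        # traps; real status is untouched
print(vmm.run_guest("read_status"))     # -> 7, read from the shadow copy
print(vmm.cpu.status)                   # -> 0, host state is protected
```

The whole scheme hinges on the theorem: if even one sensitive instruction slips past without trapping, the guest sees or modifies real machine state, and the illusion breaks.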

This principle was articulated very clearly in 1974, when virtualization was already a widely applied technique in the world of mainframe computing. Unfortunately, the principle didn't make the leap into the microcomputer world. In fact, there was an enduring tradition of releasing processors that were not virtualizable, only to realize the mistake and issue a second version with a minor fix.

For example, the venerable Motorola 68000 was first released in 1978, and was heralded as a "mainframe on a chip". Except, it had one little problem: a read from the sensitive status register did not trap, making it impossible to run an operating system correctly inside a virtual machine. So, Motorola issued the 68010, which was almost identical, except that a read from the status register forced a trap, enabling correct virtualization.

Unfortunately, not everybody got the memo.

For nearly three decades, the Intel x86 series of processors did not have this property. In user mode, many instructions could be used to view sensitive state, and many attempts to write sensitive state would fail silently without a trap. From the 1970s until the late 1990s, efficient virtualization was basically impossible on the most widely used processor family.

Around the year 2000, virtualization became of interest as a way to serve the multi-tenancy needs of large internet services. A number of solutions were developed simultaneously to work around the limitations of the Intel chips. One approach, used in VMware, was to translate the guest's binary code at runtime, rewriting sensitive instructions into deliberate traps to the VMM. Another approach, used in the Xen hypervisor, was to modify the operating system source code so that it explicitly called the VMM instead of invoking sensitive instructions.
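
The binary-translation idea can be caricatured in a few lines. This is a deliberately simplified sketch, not how VMware actually works: real translators operate on machine code in basic blocks, cache translations, and handle far more cases. Here the "instructions" are just strings, and any sensitive instruction that would fail silently on the real CPU is rewritten into an explicit call into the VMM.

```python
# Caricature of runtime binary translation: rewrite sensitive x86
# instructions that would NOT trap (e.g. popf, smsw) into explicit
# calls to the VMM, and leave innocuous instructions to run natively.

SENSITIVE_NO_TRAP = {"popf", "smsw"}   # classic silently-failing examples

def binary_translate(code):
    """Return a copy of `code` with sensitive instructions made to trap."""
    out = []
    for instr in code:
        op = instr.split()[0]
        if op in SENSITIVE_NO_TRAP:
            out.append(f"vmm_call {instr}")   # deliberate trap to the VMM
        else:
            out.append(instr)                 # innocuous: run natively
    return out

guest = ["mov ax, 1", "popf", "add ax, 2", "smsw bx"]
print(binary_translate(guest))
# -> ['mov ax, 1', 'vmm_call popf', 'add ax, 2', 'vmm_call smsw bx']
```

Xen's paravirtualization reaches the same end by a different route: instead of rewriting the binary at runtime, the guest kernel's source is patched ahead of time to make those VMM calls itself.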

There are many other approaches to working around the limitation. Suffice to say that they are all rather complicated, but they can be made to work.

Finally, in 2005, both Intel and AMD introduced virtualization extensions to their processors, enabling basic trap-and-emulate virtualization, a mere 31 years after the Popek and Goldberg theorem was widely circulated.

So, what's the moral of the story?

Monday, April 5, 2010

The Forty Tribes of Linux

As I have noted in this column before, a perennial challenge of distributed computing in the real world is dealing with the multiplicity of operating systems and related environments. If you are dealing with an uncontrolled environment like a large university or an 'at home' computing environment, there is no telling what you are going to get. If you have a piece of software that depends exactly on the presence of Linux 19.5.3.4.9.2, it just isn't going to work.

You might think that this could be avoided by having a professionally managed environment. At Notre Dame, we have a site license for Red Hat Linux, and our staff are pretty rigorous in keeping everything up to date and on track. But even then, you can't assume everything is identical: there is no way to upgrade everyone simultaneously, and every machine operates on a different schedule (and discipline) for picking up automatic updates. For example, we are currently in the tail end of a general campus migration from Red Hat 4 to Red Hat 5.

Here is some hard evidence. We recently started using the neat 'cron' feature in Condor to make a daily observation of the operating system version, kernel version, and C library version of each machine. With a few variations on condor_status, we can see the upgrade status of the whole system:

The major release numbers (below) aren't too bad. About 3/4 of our cores are running the latest Red Hat, but another 73 machines are behind by a version or two. And, oops, looks like someone plugged in their own personal CentOS machine. Not too hard to deal with, if you are careful to put 'redhat_version' in your requirements:


% condor_status -format "%s\n" redhat_version | sort | uniq -c | sort -rn

782 Red Hat Enterprise Linux Server release 5.4 (Tikanga)
27 Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
26 Red Hat Enterprise Linux Server release 5.3 (Tikanga)
10 Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
10 Red Hat Enterprise Linux WS release 4 (Nahant Update 7)
4 CentOS release 5.3 (Final)
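
For instance, a submit file might pin jobs to the current release like this. This is a hypothetical fragment (the version string is copied from the table above, and your attribute names may differ depending on how the cron job publishes them):

```
# Hypothetical submit file fragment: only match the latest Red Hat release.
Requirements = (OpSys == "LINUX") && \
               (redhat_version == "Red Hat Enterprise Linux Server release 5.4 (Tikanga)")
```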


If we go a little deeper, the picture gets murkier. Below is the distribution of Linux kernel versions. Interesting to note that a few are hand-modified for some unusual hardware, and only two are Xen virtualized. Hope that you don't have any code sensitive to the kernel version.


% condor_status -format "%s\n" kernel_version | sort | uniq -c | sort -rn

342 2.6.18-164.9.1.el5
294 2.6.18-164.el5
94 2.6.18-164.10.1.el5
32 2.6.18-164.11.1.el5
32 2.6.9-78.0.13.ELsmp
14 2.6.18-128.7.1.el5
12 2.6.18-164.6.1.el5
10 2.6.18-128.2.1.el5
6 2.6.18-164.2.1.el5
5 2.6.9-78.0.17.ELsmp
4 2.6.27.8-md-microway
4 2.6.9-89.0.20.ELsmp
2 2.6.18-128.4.1.el5
2 2.6.18-164.9.1.el5xen
2 2.6.9-78.0.5.ELsmp
2 2.6.9-89.0.16.ELsmp
2 2.6.9-89.0.9.ELsmp


For completeness, here is the distribution of glibc versions, which has much the same story:


% condor_status -format "%s\n" glibc_version | sort | uniq -c

452 glibc-2.5-42.el5_4.2
296 glibc-2.5-42
34 glibc-2.5-42.el5_4.3
24 glibc-2.3.4-2.41
16 glibc-2.5-34.el5_3.1
14 glibc-2.5-34
13 glibc-2.3.4-2.41.el4_7.1
6 glibc-2.3.4-2.43
4 glibc-2.3.4-2.43.el4_8.1


In the good old days, you could just indicate that a program required OpSys=="LINUX" and more or less expect it to run. That certainly isn't possible now. Perhaps we are misleading users by talking about this thing called Linux, which doesn't really exist in any consistent form. Instead, we should be telling our users that a new operating system gets invented every week, and is usually named after a team on Survivor.

The good folks at Sun tried to solve this problem almost 20 years ago with Java. The idea was that they would create a stable platform that could be implemented on any machine. Then, you could write programs that would be universally portable. The problem was, well...


% condor_status -format "%s " JavaVendor -format "%s\n" JavaVersion | sort | uniq -c | sort -rn
308 Sun Microsystems Inc. 1.6.0
222 Sun Microsystems Inc. 1.6.0_15
174 Sun Microsystems Inc. 1.6.0_17
52 Free Software Foundation, Inc. 1.4.2
28 Sun Microsystems Inc. 1.6.0_18
3 Sun Microsystems Inc. 1.5.0_17
2 Apple Computer, Inc. 1.5.0_19


Many people think the grand solution to this problem is virtual machines. Perhaps, but more on that next time.