Write emulator-friendly Linux code

Published: 2007-11-29 09:59:40 | Source: 红联 | Author: golenuort
Computers have been emulating other computers for a long time, often to access a legacy application or to use applications written for a popular OS on a system with a more stable, responsive OS. As Linux™ grows in popularity, developers need to examine their options when planning binaries that will run on non-Linux systems. This article examines what emulators do and looks at hardware and software emulation issues in detail.
For years, computers have been emulating other computers. A common reason to emulate older computers is nostalgia, and indeed, many emulators can run a broad variety of video games with perfect fidelity. Another reason to emulate another computer is to access application software that exists only on a specific platform.

In general, application emulation targets the platforms with the largest market shares. For instance, the WINE project attempts to provide a way to run Windows® binaries, because -- let's face it -- there are many more cool applications for Windows than there are for Linux (although, as they point out, WINE Is Not an Emulator).

However, in recent years Linux has proven to be a stable and versatile operating system; consequently, its market share has grown. And along with the growth of market share has come a spike in interest in emulating Linux. This article reviews the current state of Linux binary emulation on other systems and highlights some of the issues that Linux developers should keep in mind to make life easier for the people running their binaries in emulation.

The basic emulator

The idea of an emulator is simple. Computers are predictable enough. If you want to know exactly what a computer would do if it were given a certain piece of code, you can find out by making a model of that computer. Of course, there's a certain amount of overhead involved, but if the computer you're emulating is much older than the computer doing the emulation, the emulation may well be faster than the original.

Some emulation layers, such as NetBSD's Linux emulation layer, emulate only the software part of an environment: they take system calls from a Linux binary and hand back results that look as though they came from a Linux kernel. Others, such as VirtualPC, may emulate the whole computer, including the processor. Emulating the processor is slower but can produce better compatibility.

Emulators as a distribution format

Although this article focuses on ways to run Linux binaries on other platforms, distribution of compiled binaries has its place as well. As Linux emulation becomes more widespread, the Linux binary format becomes a viable way to distribute simple programs without giving out source code. Linux binaries can be run on a broad variety of systems, admittedly sometimes at a cost -- there are challenges in using the Linux binary format as a general distribution format.

Emulation usually isn't enough to let you run a shared object built for one system in a program built for another. If your product is mostly distributed as a shared library object, it probably can't be loaded on other platforms.

There are those who would argue that using the Linux binary format for distribution of code to other platforms is crazy. It may be crazy, but it works. For a few years, my primary Web browser was running under emulation (to say nothing of word processors, document converters, and even credit-card processing software).

Many of the software applications we like to use are commercial, and commercial software vendors benefit greatly from being able to distribute a single binary that runs on a great number of platforms. Given the variety of Linux emulation available, the Linux binary format is starting to look like a real software distribution option.

Oh, and porting source code is a much different task than distribution; frequently, porting is a much easier task.

Full hardware emulators

A full hardware emulator simulates an entire machine; not just the processor but the rest of the machine as well. For instance, an emulated computer will act as though it has its own keyboard controller and video card.

Full hardware emulation is especially common for accessing older-machine programs. A popular example is the MAME arcade game emulator, which emulates the hardware of various old arcade machines.

Full hardware emulators are in some ways the simplest way to do emulation. A lot of work goes into building a full hardware emulator, but once you've got it, everything should just work. For instance, VirtualPC on the Macintosh started supporting Linux in version 3.

Hardware emulation can get you around problems you can't easily bypass otherwise. For instance, I once had a BIOS flash utility that was distributed only in the format of a self-extracting image file for DOS. Worse, it only ran on a machine with an actual floppy on a traditional ISA floppy controller (my Windows desktop machine had an LS-120 drive). Emulation to the rescue! I ran the program under an emulator, writing the data to a USB floppy drive plugged into a Mac.

Hardware emulation has its downside, too. A lot of effort goes into making everything work. If you want a network, you need to emulate a network chip well enough for the operating system to run on it. Furthermore, emulating foreign instructions can be very expensive. Often, a system like this will work nearly perfectly, but timing-related functionality may be unreliable.

Full hardware emulators have been in use for a long time. They are at their best when handling legacy systems and code that can tolerate the speed hit from emulation.

Nonetheless, users who want to run x86 Linux binaries on a Macintosh or any other non-x86 machine may well rely on one of the currently available x86 emulators to get them running. Most utility programs will run perfectly well (if slowly, perhaps) on systems like this. The only major concern is that users of such systems may install smaller or older Linux distributions in the hope of improving performance. Someone running an emulated machine with 32 MB of memory is unlikely to run the latest version of KDE.

Partial hardware emulators

Partial hardware emulators are an intermediate solution: they emulate a computer, but only a computer of the type they're actually hosted on. Programs like this reduce the cost of emulation and generally perform at speeds comparable to the host machine. Examples include the Serenity Virtual Station and VMware.

These systems are most useful when you have applications for a variety of systems and need to run them all at once. Like full hardware emulators, systems like this will be running a full Linux OS environment, and your program should be fine as long as it's reasonably portable across Linux systems. However, once again, portability to older versions of Linux will help a lot. People using a virtual machine may want to run an older, smaller version of Linux on it.

Software emulators

In the world of emulation, software emulators are where life gets interesting. A software emulator is not running your program on a virtual machine -- it's running it on the fly without a virtual machine. These programs work by setting up an environment in which a program's code can run normally, but attempts by the program to access the operating system get routed through an emulation layer of some sort. WINE is a great example (albeit for Windows), although it is officially not an emulator.

Some software emulators are explicitly invoked by the user, like the lxrun program available for SCO and Solaris systems. Others are built into a UNIX® kernel's support for loading binary images -- if a program doesn't look like a valid native binary, the kernel can check it against a table of possible emulators, each of which examines the image to see whether it can run it.
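As a rough illustration of that table lookup, the sketch below probes the first bytes of a file and returns the name of the first loader that accepts it. The table, the loader names, and the function names are hypothetical. The check on the ELF OS/ABI byte is also a simplification: many Linux binaries carry the generic System V value (0) there, so a real kernel typically needs additional hints.

```python
ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF image
EI_OSABI = 7             # index of the OS/ABI byte in the ELF identification
ELFOSABI_LINUX = 3       # OS/ABI value assigned to Linux (GNU)


def probe_linux_elf(header: bytes) -> bool:
    """Hypothetical probe: an ELF image explicitly marked for Linux."""
    return (
        len(header) > EI_OSABI
        and header[:4] == ELF_MAGIC
        and header[EI_OSABI] == ELFOSABI_LINUX
    )


def probe_script(header: bytes) -> bool:
    """Hypothetical probe: a text script with an interpreter line."""
    return header.startswith(b"#!")


# Hypothetical table of loaders, consulted in order.
LOADER_TABLE = [
    ("linux-elf", probe_linux_elf),
    ("script", probe_script),
]


def pick_loader(header: bytes):
    """Return the name of the first loader whose probe accepts the image."""
    for name, probe in LOADER_TABLE:
        if probe(header):
            return name
    return None
```

Calling `pick_loader` on the first 16 bytes of a file is enough for these two probes; an image that no probe recognizes yields `None`, which corresponds to the kernel refusing to execute the file.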

Software emulators often offer the best user experience. There's no special setup, no large disk images. The programs just run (most of the time). Access to system calls, shared libraries, and file system structures raises a number of issues, though, so we'll cover them next.

System calls

System calls are the easiest and the hardest part of emulation. A system call has a well-defined interface, and the calling mechanism can generally be easily detected and handled -- that's the easy part. The hard part is that the system call may be difficult or impossible to implement reasonably.

Traditionally, the big killer in Linux emulation was the clone() system call. This call provided a brute force way to get simple threading by creating two processes that shared a number of things that could include memory, file descriptors, signal handling -- in other words, anything and everything. Unfortunately, if your operating system didn't provide a good analogue to this, there was simply no way to implement the system call.

Worse, since clone() showed up when POSIX threads were not well or widely supported and was often used as a substitute for them, a lot of programs used it in a variety of exciting, complicated, and (need I say) unexpected ways.

If you want people to run your binaries, try to stay away from OS-specific system calls; favor standard POSIX system calls. This is good software development practice in any case.

A kernel-based emulator traps the system calls when they reach it. A user-space emulator such as lxrun waits for the application to try to make a system call. Because the Linux system call facility is not the same as the system call facility on Solaris or SCO UNIX, the result is a segmentation fault. The lxrun program then acts like a debugger, correcting the fault and continuing -- but in fact, it has intercepted the system call, made a corresponding system call to the underlying operating system, and patched everything up. Clever!
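The translation step can be modeled as a lookup from a Linux system call number to a host-side handler. The sketch below is a toy model, not how lxrun or any kernel is written: a real emulator works on trapped register state rather than Python arguments. The call numbers are those of 32-bit x86 Linux; the handler table and function names are hypothetical. A call with no host equivalent (clone() here) comes back as a negative ENOSYS, following the Linux convention of returning negated error numbers.

```python
import errno
import os

# A few entries from the 32-bit x86 Linux system call table.
LINUX_I386_SYSCALLS = {
    4: "write",
    20: "getpid",
    120: "clone",
}


def host_write(fd, data):
    """Hypothetical handler: forward write() to the host system."""
    return os.write(fd, data)


def host_getpid():
    """Hypothetical handler: forward getpid() to the host system."""
    return os.getpid()


# Only calls with a reasonable host analogue get a handler;
# clone() is deliberately left out.
HOST_HANDLERS = {
    "write": host_write,
    "getpid": host_getpid,
}


def emulate_syscall(number, *args):
    """Dispatch one trapped Linux system call to the host, if possible."""
    name = LINUX_I386_SYSCALLS.get(number)
    handler = HOST_HANDLERS.get(name)
    if handler is None:
        # Unknown or unimplementable call: report "not implemented".
        return -errno.ENOSYS
    return handler(*args)
```

A program that sticks to calls in the handler table runs unchanged; one that depends on the missing call gets an error it was probably never written to handle.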

File system structures

The problem with file systems is often more subtle. It's easy enough to access the file system. What's not easy is finding the files there that you expect.

If your program is running in emulation, the file system you access may be substantially different from the file system you had when you were developing the program. For instance, if your program uses the /proc file system (commonly used to get access to kernel status and information), it's possible that a feature common in more recent kernels will be absent on an older system.
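One defensive pattern is to treat /proc as optional and fall back to a more widely available interface. The sketch below is a minimal example under that assumption: it reads the total memory size from /proc/meminfo when the file is present and otherwise tries `os.sysconf`. The function name is hypothetical, and the fallback itself may be unavailable on some hosts, in which case the function returns `None` instead of failing.

```python
import os


def total_memory_bytes(meminfo_path="/proc/meminfo"):
    """Return total physical memory in bytes, or None if it is unknown."""
    # First choice: the Linux-specific /proc interface, if it exists.
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    # Expected line format: "MemTotal:   16384256 kB"
                    return int(line.split()[1]) * 1024
    except (OSError, ValueError, IndexError):
        pass  # file missing or in an unexpected format; try the fallback

    # Fallback: sysconf, which many UNIX-like systems provide.
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages > 0 and page_size > 0:
        return pages * page_size
    return None
```

The caller then has to cope with `None`, for example by using a conservative default, which is the point: the program keeps working when the file it expected is absent.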

Linux developers have a big advantage here over developers on proprietary systems, because different Linux distributions arrange files differently, so most programmers have a good sense of how to avoid being too dependent on file system layouts. Nonetheless -- sometimes -- a file name will have a perfectly good reason for being encoded in a program.

A solution to this dilemma, adopted in more than one emulator, is to set up an extra layer of interpretation for file system calls. For instance, in NetBSD's Linux emulation code, file accesses are checked first against the files in /emul/linux and only after that against the files in the system's real root directory. This allows the system to provide "overrides" for system files when Linux binaries won't work with the standard files.

In fact, the main use for this is in libraries and other support files, but a number of system binaries are provided as well. For instance, if a Linux binary were to try to call uname to get a kernel version, it would be very confused if it got back a NetBSD version number. Instead, it gets the Linux version numbers it's expecting.
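The lookup order described above can be sketched as a small path-resolution function. This is a model of the behavior only, not NetBSD's code: the function name is hypothetical, the `exists` parameter is there to make the sketch testable, and relative paths are simply passed through.

```python
import os


def resolve_overlay(path, overlay_root="/emul/linux", exists=os.path.lexists):
    """Return the path an emulated binary would actually open.

    Absolute paths are checked under the overlay tree first and fall
    back to the real root when no override is present.
    """
    if not path.startswith("/"):
        return path  # relative path: this sketch leaves it untouched
    candidate = overlay_root.rstrip("/") + path
    if exists(candidate):
        return candidate  # an override exists for this file
    return path  # no override: use the host's own file
```

With a Linux copy of libncurses.so.5 installed under the overlay tree, `resolve_overlay("/lib/libncurses.so.5")` would return the overlay copy, while a file with no override resolves to the host's version.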

Shared libraries

As mentioned above, shared libraries are a good candidate for being found by the emulated binaries but not by system binaries. Because the details of shared library formats and ABIs may vary from one system to another, you can't just assume that all the systems can share a given library. Names will clash -- for instance, the current NetBSD and SUSE 7.3 both have a file called libncurses.so.5. Getting the right one of those is important.

Shared libraries bring up another point for developers. It's important to know what library version the different systems are using. Right now, NetBSD is using SUSE 7.3 shared libraries for its Linux emulation. There's code to grab the 9.1 shared libraries, but there's also a warning that they aren't stable with the kernel-level emulation.

Emulation packages tend to lag a bit behind the rest of the marketplace. Even if you think that most of your prospective users will have reasonably current Linux distributions, the emulator crowd will almost all be a bit behind the times.

Shared libraries bring up another concern -- not every system contains all of them. Emulation packages often don't have every last shared library installed. And, to make it more fun, their users are less likely to be able to easily install a missing package.

In these cases, it's a good idea to minimize dependencies, both on new features and on non-core shared libraries. Emulator users are likely to run into these issues.
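Where a non-core library is useful but not essential, one option is to probe for it at run time and degrade gracefully. The sketch below shows the idea with Python's `ctypes`; the function name and the two mode names are hypothetical, and ncurses is used only because it is the library mentioned earlier.

```python
import ctypes
import ctypes.util


def load_optional_library(name: str):
    """Return a handle to the shared library, or None if it is unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None  # the library is not installed on this system
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None  # found, but could not be loaded


# Hypothetical usage: fall back to plain output when ncurses is missing.
curses_lib = load_optional_library("ncurses")
ui_mode = "full-screen" if curses_lib is not None else "plain-text"
```

A program built this way still starts on a system whose emulation package lacks the library; it just runs with fewer features.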

Don't get tricked into using static libraries as insurance against these problems. A static library can introduce its own new dependencies, and you can't check them as easily. It doesn't do any good to rework an algorithm to avoid an unportable system call if you statically link with a library that uses it. Dynamic linking allows you to build a program that will run on a much broader variety of systems.

Programs calling other programs

There is one special case that seems to bite people more than any other, especially with installers. On many systems, the default shell you get by calling /bin/sh is not bash. This means that scripts that assume bash extensions may not work on other systems.

This gets into an especially tricky bit of logic in the emulator. The operating system probably knows enough to check the Linux path for relevant Linux binaries when a binary is executed, and it will likely have a copy of bash installed there. But when you run a script, the kernel doesn't see this as a Linux binary; it sees a script with an interpreter path, and it's no longer running in emulation mode when it tries to load the interpreter.

Portable shell scripting techniques pay off here. This is one of the most common issues users face when running emulated applications. The installer may fail to run because it's a nonportable shell script.
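A quick pre-release check can catch the most obvious cases. The sketch below reads a script's interpreter line and flags a few constructs that bash accepts but POSIX sh does not define. It is a heuristic under stated assumptions: the pattern list is illustrative and far from complete, the function names are hypothetical, and a clean result does not prove a script is portable.

```python
import re

# Heuristic patterns for a few bash-only constructs (not exhaustive).
BASHISMS = [
    ("[[ ... ]] test", re.compile(r"\[\[\s")),
    ("'function' keyword", re.compile(r"^\s*function\s+\w+", re.MULTILINE)),
    ("${var//pattern/replacement}", re.compile(r"\$\{\w+//")),
    ("process substitution", re.compile(r"<\(")),
    ("'source' builtin", re.compile(r"^\s*source\s", re.MULTILINE)),
]


def interpreter_of(script: str):
    """Return the interpreter path from the '#!' line, or None."""
    first = script.splitlines()[0] if script else ""
    if not first.startswith("#!"):
        return None
    parts = first[2:].split()
    return parts[0] if parts else None


def find_bashisms(script: str):
    """Return labels of the bash-only constructs found in the script text."""
    return [label for label, pattern in BASHISMS if pattern.search(script)]
```

A script that names /bin/sh as its interpreter and still produces a non-empty list from `find_bashisms` is a likely candidate for failing on a system where /bin/sh is not bash.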