Next: From Alpha to Portability Up: The Linux Edge Previous: Amiga and the Motorola |
When I began to write the Linux kernel, there was an accepted school of thought about how to write a portable system. The conventional wisdom was that you had to use a microkernel-style architecture.
With a monolithic kernel such as the Linux kernel, memory is divided into user space and kernel space. Kernel space is where the actual kernel code is loaded, and where memory is allocated for kernel-level operations. Kernel operations include scheduling, process management, signaling, device I/O, paging, and swapping: the core operations that other programs rely on to be taken care of. Because the kernel code includes low-level interaction with the hardware, monolithic kernels appear to be specific to a particular architecture.
A microkernel performs a much smaller set of operations, and in more limited form: interprocess communication, limited process management and scheduling, and some low-level I/O. Microkernels appear to be less hardware-specific because many of the system specifics are pushed into user space. A microkernel architecture is basically a way of abstracting the details of process control, memory allocation, and resource allocation so that a port to another chipset would require minimal changes.
So at the time I started work on Linux in 1991, people assumed portability would come from a microkernel approach. You see, this was sort of the research darling at the time for computer scientists. However, I am a pragmatic person, and at the time I felt that microkernels (a) were experimental, (b) were obviously more complex than monolithic Kernels, and (c) executed notably slower than monolithic kernels. Speed matters a lot in a real-world operating system, and so a lot of the research dollars at the time were spent on examining optimization for microkernels to make it so they could run as fast as a normal kernel. The funny thing is if you actually read those papers, you find that, while the researchers were applying their optimizational tricks on a microkernel, in fact those same tricks could just as easily be applied to traditional kernels to accelerate their execution.
In fact, this made me think that the microkernel approach was essentially a dishonest approach aimed at receiving more dollars for research. I don't necessarily think these researchers were knowingly dishonest. Perhaps they were simply stupid. Or deluded. I mean this in a very real sense. The dishonesty comes from the intense pressure in the research community at that time to pursue the microkernel topic. In a computer science research lab, you were studying microkernels or you weren't studying kernels at all. So everyone was pressured into this dishonesty, even the people designing Windows NT. While the NT team knew the final result wouldn't approach a microkernel, they knew they had to pay lip service to the idea.
Fortunately I never felt much pressure to pursue microkernels. The University of Helsinki had been doing operating system research from the late 60s on, and people there didn't see the operating system kernel as much of a research topic anymore. In a way they were right: the basics of operating systems, and by extension the Linux kernel, were well understood by the early 70s; anything after that has been to some degree an exercise in self-gratification.
If you want code to be portable, you shouldn't necessarily create an abstraction layer to achieve portability. Instead you should just program intelligently. Essentially, trying to make microkernels portable is a waste of time. It's like building an exceptionally fast car and putting square tires on it. The idea of abstracting away the one thing that must be blindingly fast -- the kernel -- is inherently counter-productive.
Of course there's a bit more to microkernel research than that. But a big part of the problem is a difference in goals. The aim of much of the microkernel research was to design for a theoretical ideal, to come up with a design that would be as portable as possible across any conceivable architecture. With Linux I didn't have to aim for such a lofty goal. I was interested in portability between real world systems, not theoretical systems.