The LINUX.COM Article Archive
Originally Published: Thursday, 19 July 2001 | Author: Subhasish Ghosh
Published to: learn_articles_firststep/General
Bootstrapping a Linux system - An Analysis
Ever wonder what happens between powering on your system and the logon prompt? You see all the screen messages, but what do they mean? Linux.com writer Subhasish Ghosh wondered the same thing, and went to find out in Bootstrapping a Linux System.
As many readers will have noticed, a lot of messages scroll up the screen while the computer is bootstrapping itself. These can be viewed later by issuing the command:
cat /var/log/dmesg | more
(the output is piped through more because there is usually a lot of it). Now, the question is: Hey, what do these messages mean? That's easy to answer: look into any Linux textbook, and you will find it calls them something like "the Kernel Boot messages" and leaves it at that. But is that all? And what is meant by "Kernel Boot messages"?
A complete understanding of the internal workings of Linux requires a lot of patience and even sacrifice, because it requires a complete understanding of the architecture of the Linux Kernel. Most Linux users either don't have that much study time available or are not that interested in it, while some may have other important things to do in life. But for those of you who, like me, are interested in what happens at boot time, this article will attempt to cast a little more light on the issue.
I am not going to explain the "Linux Kernel Architecture" in this article, because it would require a whole book to do so. Rather, I explain (or at least try to explain), in detail, one of the most fundamental concepts of a computer system: Bootstrapping, the process of starting the system, from turning on the power to seeing the login prompt.
- A very basic understanding of the internal workings and operations of a computer kernel is assumed on behalf of the readers.
- All the files mentioned in this article refer to Linux Kernel 2.4.2-2. The files are common to all Linux Kernels and can be found on any Linux system with the Kernel source installed; I have used Red Hat Linux 7.1.
- In this article we must limit our discussion to IBM PC architectures.
- I have a friend who lives nearby who kicks his CPU to start what he calls "bootstrapping". I usually call it "bootslapping". But the process mentioned here also applies to his machine!
Then, the code found at physical address 0xfffffff0 is executed. This address is mapped by the hardware to a read-only, persistent memory chip, a kind of memory that is usually called ROM (Read-Only Memory). The BIOS (Basic Input/Output System) is a set of programs stored in ROM. It consists of several interrupt-driven low-level procedures used by various operating systems to handle the hardware devices that make up the computer system. Microsoft DOS is one such OS.
The question that now comes up is: does Linux also use the BIOS to initialize the hardware devices attached to the computer system? Or does something else perform that task, and if so, what? Well, the answer is not that simple, because it needs to be understood carefully. Starting with the 80386 model, Intel microprocessors perform address translation (from Logical Address -> Linear Address -> Physical Address) in two different ways, called "Real mode" and "Protected mode". Real mode exists mainly to maintain processor compatibility with older models. In fact, all BIOS procedures are executed in Real mode. But the Linux Kernel executes in Protected mode and NOT in Real mode. Thus, once initialized, Linux does NOT make any use of the BIOS but provides its own device drivers for every hardware device on the computer.
The question that now comes up is: when Linux uses Protected mode, why can't the BIOS use the same mode? The BIOS uses Real mode because it relies on Real mode addresses for its operation, and Real mode addresses are the only ones available when the computer is switched on. A Real mode address is composed of a segment seg and an offset off; the corresponding physical address is given by seg*16+off. (Please note: Protected mode works differently. There, a Segment Selector picks a Segment Descriptor, and since a Segment Descriptor is 8 bytes long, its relative address inside the GDT or the LDT is obtained by multiplying the most significant 13 bits of the Segment Selector by 8.)
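The two address calculations described above can be tried out in a few lines. This is only a minimal sketch: the function names real_mode_physical and descriptor_offset are made up for illustration and do not come from the Kernel source.

```python
def real_mode_physical(seg: int, off: int) -> int:
    """Physical address of a Real mode seg:off pair, i.e. seg * 16 + off."""
    if not (0 <= seg <= 0xFFFF and 0 <= off <= 0xFFFF):
        raise ValueError("seg and off must each fit in 16 bits")
    return (seg << 4) + off  # shifting left by 4 bits multiplies by 16


def descriptor_offset(selector: int) -> int:
    """Byte offset of a Segment Descriptor inside the GDT or the LDT.

    The most significant 13 bits of the 16-bit Segment Selector form the
    index; each Segment Descriptor is 8 bytes long.
    """
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a Segment Selector must fit in 16 bits")
    index = selector >> 3  # drop the 3 least significant bits
    return index * 8


if __name__ == "__main__":
    # Segment 0x07c0 with offset 0 is one way to name the address
    # 0x00007c00, where the BIOS places the boot loader.
    print(hex(real_mode_physical(0x07C0, 0x0000)))
    print(descriptor_offset(0x0010))
```

Note that several seg:off pairs can name the same physical address; 0x0000:0x7c00 and 0x07c0:0x0000 both give 0x00007c00.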
So, does this mean Linux never uses the BIOS during the entire process of bootstrapping? Well, the answer here is no: Linux is forced to use the BIOS in the bootstrapping phase, when it has to retrieve the Kernel image from disk or some other external device.
To sum up this section, let's look closely at the main operations that the BIOS performs during the bootstrapping sequence. They are as follows:
That's all. These are the operations that the BIOS is scheduled to perform. Once this is over, it's the Boot Loader that takes over. So, let's now move on to the next section.
The Linux Kernel fits onto a single 1.44-MB floppy disk. (In fact, there exists a type of Red Hat Linux installation known as the "stripped-off" type, which needs approx. 2 MB of physical RAM and approx. 1.44 MB of disk space to run a Red Hat Linux system.) But the only way to store a Linux Kernel on a single floppy disk is to compress the "Linux Kernel Image". The point to remember here is that compressing is done at compile time, while decompressing is done at boot time by the loader.
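The compress-at-build-time, decompress-at-boot-time idea can be illustrated with a rough sketch. Everything here is an assumption made for illustration: Python's zlib module stands in for whatever the real build and boot code do, and the "kernel" is just a block of repetitive dummy bytes.

```python
import zlib

# A 1.44-MB floppy holds 2880 sectors of 512 bytes each.
FLOPPY_BYTES = 2880 * 512


def compress_image(raw: bytes) -> bytes:
    """Stand-in for the compile-time step: compress the kernel image."""
    return zlib.compress(raw, 9)


def decompress_image(packed: bytes) -> bytes:
    """Stand-in for the boot-time step: recover the original image."""
    return zlib.decompress(packed)


def fits_on_floppy(image: bytes) -> bool:
    """True if the image is no larger than one 1.44-MB floppy."""
    return len(image) <= FLOPPY_BYTES


if __name__ == "__main__":
    # Hypothetical 2 MiB "kernel" made of repetitive dummy data.
    dummy_kernel = bytes(range(256)) * 8192
    packed = compress_image(dummy_kernel)
    print(fits_on_floppy(dummy_kernel), fits_on_floppy(packed))
    print(decompress_image(packed) == dummy_kernel)
```

Real kernel code compresses far less well than this dummy data, so the sketch shows only the principle, not realistic sizes.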
In the case of booting Linux from a floppy disk, the boot loader's job is very simple. It is coded in the /usr/src/linux-2.4.2/arch/i386/boot/bootsect.S assembly language file. When we compile the Linux Kernel source and obtain a new Kernel image, the executable code yielded by this assembly language file is placed at the beginning of the Kernel image file. This makes it easy to produce a floppy disk containing the Linux Kernel.
The floppy can be created simply by copying the Kernel image onto the disk, starting from the first sector. When the BIOS loads the first sector of the floppy disk, it is actually copying the code of the boot loader. The boot loader, which is invoked by the BIOS (by jumping to the physical address 0x00007c00), performs the following operations:
Usually the Linux Kernel is loaded from a hard disk. This requires a two-stage boot loader. On Intel systems, the most commonly used Linux boot loader is named LILO. For other architectures, other Linux boot loaders exist. LILO may be installed either on the MBR or in the boot sector of an active disk partition. (Please note: during Red Hat Linux installation there comes a step where the user has to choose between writing LILO to the MBR and putting it in the boot sector.)
LILO is broken into two parts, because otherwise it would be too large to fit into the MBR. The MBR (or the disk partition boot sector) includes a small boot loader, which is loaded into RAM starting from address 0x00007c00 by the BIOS. This small program moves itself to the address 0x0009a000, then sets up the Real Mode stack, and finally loads the second part of the LILO boot loader. (Please note: the Real Mode stack ranges from address 0x0009b000 down to 0x0009a200.)
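A little arithmetic on these addresses shows how the pieces sit next to each other in RAM. The sketch below only restates the numbers given above; the one added assumption is that the small boot loader occupies a single 512-byte sector, and the constant and function names are invented for illustration.

```python
SECTOR_SIZE = 512            # assumption: the small boot loader is one sector

BIOS_LOAD_ADDR = 0x00007C00  # where the BIOS loads the small boot loader
LILO_RELOCATED = 0x0009A000  # where the small boot loader moves itself
STACK_TOP = 0x0009B000       # Real Mode stack starts here...
STACK_BOTTOM = 0x0009A200    # ...and may grow down to here


def relocated_loader_end() -> int:
    """First address past the relocated small boot loader."""
    return LILO_RELOCATED + SECTOR_SIZE


def stack_size_bytes() -> int:
    """Number of bytes available to the Real Mode stack."""
    return STACK_TOP - STACK_BOTTOM


if __name__ == "__main__":
    print(hex(relocated_loader_end()))  # compare with STACK_BOTTOM
    print(stack_size_bytes())
```

Under that one-sector assumption, the relocated loader ends exactly where the stack area begins, so the stack can grow downward without overwriting the loader's code.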
The second part of LILO reads the list of available operating systems from disk and offers the user a prompt, so that he or she can choose one of them. After the user has chosen a Kernel to be loaded (on my system, one can opt for any one of 8 custom Linux Kernels), the boot loader may either copy the boot sector of the corresponding partition into RAM and execute it, or directly copy the Kernel image into RAM.
Since it is the Linux Kernel image that must be booted, the Linux boot loader performs essentially the same operations as the boot loader integrated into the Kernel image. The boot loader invoked by the BIOS (by jumping to the physical address 0x00007c00) performs the following operations:
The setup( ) function can be found in the file /usr/src/linux-2.4.2/arch/i386/boot/setup.S. The code of the setup( ) assembly language function is placed by the linker immediately after the integrated boot loader of the Kernel, that is, at offset 0x200 of the Kernel Image file. This allows the boot loader to locate the code easily and copy it into RAM starting from the physical address 0x00090200.
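The relationship between a position in the Kernel Image file and a position in RAM can be sketched as follows. The two constants come from the text above; the helper names, and the assumption that the setup( ) code is copied as one contiguous block, are mine.

```python
from typing import Tuple

SETUP_FILE_OFFSET = 0x200      # setup( ) starts here in the Kernel Image file
SETUP_LOAD_ADDR = 0x00090200   # and is copied to this physical address


def split_image(image: bytes) -> Tuple[bytes, bytes]:
    """Split an image into the integrated boot loader and everything after it."""
    return image[:SETUP_FILE_OFFSET], image[SETUP_FILE_OFFSET:]


def ram_address_of(file_offset: int) -> int:
    """RAM address of a byte of setup( ) code, given its offset in the file.

    Assumes the code is copied contiguously, so the byte at file offset
    0x200 lands at 0x00090200, the next one at 0x00090201, and so on.
    """
    if file_offset < SETUP_FILE_OFFSET:
        raise ValueError("offset lies inside the integrated boot loader")
    return SETUP_LOAD_ADDR + (file_offset - SETUP_FILE_OFFSET)


if __name__ == "__main__":
    # Hypothetical 1 KB image used only to exercise split_image().
    fake_image = bytes(1024)
    loader, rest = split_image(fake_image)
    print(len(loader), len(rest))
    print(hex(ram_address_of(0x200)))
```

The offset 0x200 is 512 in decimal, which is why the integrated boot loader fits exactly into the first sector of the floppy.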
Now the question that comes up is: What does this setup( ) function do? As its name suggests, it's supposed to set up something. But what? And how?
As we all know, for the Kernel to operate properly, all the hardware devices in the computer must be detected and then initialized in an orderly fashion. The setup( ) function initializes the hardware devices and thus creates an environment for the Kernel to operate in.
But hang on a second. Didn't we see a few minutes earlier that the BIOS was supposed to do all this stuff? Yeah, you are right. Although the BIOS has already initialized most of the hardware, the Linux Kernel does NOT rely on it and initializes all of the hardware in its own fashion. And if someone asks why Linux operates in such a way, the answer is both very easy and extremely difficult to explain. The Linux Kernel was designed this way to enhance portability and robustness. This is one of the many features that make the Linux Kernel the best of all the Unix and Unix-like Kernels available and make it unique in so many ways. A proper understanding of why and exactly how the Linux Kernel implements this feature is beyond the scope of this article and would require extremely detailed coverage of the essential features of the Linux Kernel Architecture.
The setup( ) code mainly performs the following tasks:
From here on the going gets a bit tougher as the bootstrap process gets a bit more complicated.
There are two different startup_32( ) functions. The first one is coded in the /usr/src/linux-2.4.2/arch/i386/boot/compressed/head.S file. By the time the setup( ) code has been executed, this function has been moved either to physical address 0x00100000 or to physical address 0x00001000, depending on whether the Kernel Image was loaded "high" or "low" in RAM.
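The choice between the two addresses for the first startup_32( ) function is simple enough to write down as a tiny sketch. The function name and the loaded_high flag are hypothetical; only the two addresses come from the text.

```python
LOW_ADDR = 0x00001000    # used when the Kernel Image is loaded "low"
HIGH_ADDR = 0x00100000   # used when the Kernel Image is loaded "high"


def startup_32_address(loaded_high: bool) -> int:
    """Physical address of the first startup_32( ) function."""
    return HIGH_ADDR if loaded_high else LOW_ADDR


if __name__ == "__main__":
    for loaded_high in (False, True):
        addr = startup_32_address(loaded_high)
        # Show each address both in hex and in KB for orientation.
        print(loaded_high, hex(addr), addr // 1024, "KB")
```

In plain numbers, "low" means 4 KB from the start of RAM and "high" means exactly the 1 MB mark.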
When executed this function performs the following operations:
Now, after this 4th operation, code execution is taken over by the other startup_32( ) function. In other words, the second one takes over the bootstrapping process.
At this point you might be tempted to ask: Hey, using two different functions having the same name... Doesn't this cause a problem? The answer is: well, no, it doesn't at all. Both functions are executed by jumping to their initial physical addresses, and hence each is executed in its own execution environment. No problem at all!
Now, let's look at what the second startup_32( ) function does. When executed, this function essentially sets up the execution environment for the first Linux process (process 0). The function performs the following operations:
The following takes place when this function executes:
The "Linux version 2.4.2 ..." message is displayed right after the beginning of start_kernel( ). Many other messages are displayed also. At the very end, the very familiar login prompt appears on the console. This tells the user that the Linux Kernel is up and running, and just raring to go... and dominate the world!
About the Author: My name is Subhasish Ghosh. I am 20, currently a computer-systems engineering student in India. I am a Microsoft Certified Professional (MCP), MCP certified on NT 4.0 and recently completed Red Hat Linux Certified Engineer (RHCE) Training. I have been working with Linux for a long time now, have programmed using C, C++, VC++, COM, MFC, DCOM, ATL 3.0, Perl, Python and Linux programming using GTK+. Currently I'm busy learning the Linux Kernel Architecture in detail, writing articles for Linux.com and most importantly practicing non-geek talk with my girlfriend, Hanna. E-mail: firstname.lastname@example.org
PLEASE NOTE: I like hearing from and helping Linux users from all over the world. Anyone interested in e-mailing me should feel free to do so. However, I request computer-freaks, people who complain too much about everything and specifically egoistic Indians pretending to know everything trying to point out mistakes in my articles please do not e-mail me. I don't have much time for that. Thank you!