Originally Published: Thursday, 19 July 2001 Author: Subhasish Ghosh
Published to: learn_articles_firststep/General

Bootstrapping a Linux system - An Analysis

Ever wonder what happens between powering on your system and the logon prompt? You see all the screen messages, but what do they mean? Linux.com writer Subhasish Ghosh wondered the same thing, and went to find out in Bootstrapping a Linux System.

Every day, millions of Linux users all over the world switch on their computers, wait a few seconds (or minutes, depending on their CPU speeds) for their favorite operating system to boot, and finally get the "login" prompt. It is an immense pleasure just to log into your favorite operating system and work, isn't it? Well, for me, it surely is.

As many readers must have noticed, a lot of messages come up on the screen while the computer is bootstrapping itself. These can be viewed later by issuing the command: cat /var/log/dmesg | more (piped through more because there is usually a lot of output). Now, the question is: Hey, what do these messages mean? That's easy to answer: Look into any Linux textbook, and you will find it says something like this: "the Kernel Boot messages" and so on. But is that all? And what is meant by "Kernel Boot messages"?

A complete understanding of the internal workings of Linux requires a lot of patience and even sacrifice because it requires a complete understanding of the architecture of the Linux Kernel. Most Linux users either don't have that much study time available or are not that interested in it, while some may have other important things to do in life. But for those of you who, like me, are interested in what happens at boot time this article will attempt to cast a little more light on the issue.

I am not going to explain the "Linux Kernel Architecture" in this article because it would require a whole book to do so. Rather, in this article, I explain (or at least try to explain), in detail, one of the most fundamental concepts of a computer system: bootstrapping, the process of starting the system, from turning on the power to seeing the login prompt.

Bootstrapping. What's that?

Traditionally, the term "to bootstrap" refers to a person who tries to stand up (usually while lying down) by pulling on his or her own boots. In operating systems the term refers to the process by which a part of the operating system is brought into main memory and the processor begins executing its instructions. During bootstrapping the internal data structures of the Linux Kernel are initialized, variables are given their initial values, and processes are created (which usually spawn other significant processes later). Computer bootstrapping is a long and complicated task because, when the computer is switched on, all the hardware devices are in an unpredictable state and the RAM contains random data. The thing to keep in mind is that the process called "bootstrapping" is highly dependent on the computer architecture in question.

Please note:
  1. A very basic understanding of the internal workings and operations of a computer kernel is assumed on the part of the reader.
  2. All the files mentioned in this article refer to Linux Kernel 2.4.2-2. Though the files are common to all Linux Kernels and can be found on any Linux system, I have used Red Hat Linux 7.1.
  3. In this article we limit our discussion to the IBM PC architecture.
  4. I have a friend who lives nearby who kicks his CPU to start what he calls "bootstrapping". I usually call it "bootslapping". But the process mentioned here also applies to his machine!

BIOS. What's that? What does it do?

When a computer is first powered on it's practically useless. The RAM chips contain random data, nothing is initialized, and there's no operating system present. To begin the bootstrapping process a special hardware circuit raises the logical value of the RESET pin of the CPU. Then some CPU registers are set to fixed values. These include cs (the code segment register, a segmentation register that points to the segment containing program instructions) and eip (the instruction pointer, which holds the offset, within that segment, of the next instruction to be executed).

Then, the code found at physical address 0xfffffff0 is executed. This address is mapped by the hardware to a read-only, persistent memory chip, a special kind of memory that is usually called ROM (Read-Only Memory). The BIOS (Basic Input/Output System) is a set of programs stored in ROM. It consists of several interrupt-driven low-level procedures used by various operating systems to handle the hardware devices that make up the computer system. Microsoft DOS is one such OS.

The question that now comes up is: does Linux also use the BIOS to initialize the hardware devices attached to the computer system? Or does something else perform that task? If so, what? Well, the answer is not that simple, because it needs to be understood carefully. Starting with the 80386 model, Intel microprocessors perform address translation (from Logical Address -> Linear Address -> Physical Address) in two different ways called "Real mode" and "Protected mode". Real mode exists mainly to maintain processor compatibility with older models. In fact, all BIOS procedures are executed in Real mode. But the Linux Kernel executes in Protected mode and NOT in Real mode. Thus, once initialized, Linux does NOT make any use of the BIOS but provides its own device drivers for every hardware device on the computer.

The question that now comes up is: When Linux uses "Protected mode", why can't the BIOS use the same mode? The BIOS uses Real mode because it relies on Real mode addresses for its operation, and Real mode addresses are the only ones available when the computer is switched on. A Real mode address is composed of a segment seg and an offset off; the corresponding physical address is given by seg*16+off. (Please note: Protected mode works differently. There, a Segment Selector picks a Segment Descriptor out of the GDT or the LDT, and since a Segment Descriptor is 8 bytes long, its relative address inside the table is obtained by multiplying the most significant 13 bits of the Segment Selector by 8.)
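
The two address computations in the paragraph above can be tried out with a few lines of Python. This is only a minimal sketch: the function names real_mode_physical and descriptor_offset are made up for this illustration and do not come from the Kernel source.

```python
def real_mode_physical(seg: int, off: int) -> int:
    """Physical address of a Real mode seg:off pair, that is seg*16 + off."""
    return (seg << 4) + off


def descriptor_offset(selector: int) -> int:
    """Byte offset of a Segment Descriptor inside the GDT or the LDT.

    The index is held in the most significant 13 bits of the 16-bit
    Segment Selector, and every descriptor is 8 bytes long.
    """
    index = (selector & 0xFFFF) >> 3
    return index * 8


# Two different seg:off pairs can name the same physical byte.
print(hex(real_mode_physical(0x0000, 0x7C00)))  # 0x7c00
print(hex(real_mode_physical(0x07C0, 0x0000)))  # 0x7c00
print(descriptor_offset(0x0010))                # 16
```

Shifting left by 4 bits is the same as multiplying by 16, which is why a Real mode segment value is often read as "the address divided by 16".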

So, does this mean Linux never uses the BIOS during the entire process of bootstrapping? No. Linux is forced to use the BIOS in the bootstrapping phase, when it has to retrieve the Kernel image from disk or some other external device.

To sum up this section, let's look closely at the main operations that the BIOS performs during the bootstrapping sequence. They are as follows:

  1. The BIOS carries out an exhaustive series of tests on the hardware. This is to check which devices are present and whether they are working properly. This step is usually called POST (Power-On Self-Test). The version banner and a series of messages are displayed during this step.
  2. Next the BIOS initializes the hardware. This step is a very significant one, because it guarantees that all hardware devices operate without conflicts on the IRQ lines and I/O ports. At the end of this step, it displays a table of installed PCI devices.
  3. Then comes the "operating system". The BIOS will search for an operating system to boot. Depending on the BIOS setting it may access the boot sector of a floppy disk, any hard disk or any CD-ROM attached to the system.
  4. As soon as a valid device is found the BIOS copies the contents of its first sector into RAM, starting from the physical address 0x00007c00, then jumps to that address and executes the code just loaded.
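
Steps 3 and 4 can be pictured with a small simulation. This is a hedged sketch, not BIOS code: the RAM is just a bytearray, the device list is invented, and the test for a "valid" device uses the 0x55 0xAA signature in the last two bytes of the sector, a PC convention that this article does not otherwise discuss.

```python
BOOT_SECTOR_SIZE = 512         # one disk sector
LOAD_ADDRESS = 0x00007C00      # where the BIOS copies the first sector
BOOT_SIGNATURE = b"\x55\xaa"   # assumption: PC boot signature, not from the article


def is_bootable(sector: bytes) -> bool:
    """A sector counts as bootable here if it ends with the boot signature."""
    return len(sector) == BOOT_SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


def bios_boot(devices, ram: bytearray):
    """Copy the first sector of the first bootable device into ram.

    devices is a list of (name, first_sector) pairs in BIOS search order.
    Returns the name of the device booted from, or None if none is valid.
    """
    for name, first_sector in devices:
        if is_bootable(first_sector):
            ram[LOAD_ADDRESS:LOAD_ADDRESS + BOOT_SECTOR_SIZE] = first_sector
            return name  # a real BIOS would now jump to LOAD_ADDRESS
    return None


ram = bytearray(1024 * 1024)                  # 1 MB of simulated RAM
blank_floppy = bytes(BOOT_SECTOR_SIZE)        # no signature: skipped
hard_disk = bytes(510) + BOOT_SIGNATURE       # hypothetical bootable sector
print(bios_boot([("floppy", blank_floppy), ("hard disk", hard_disk)], ram))
```

With the blank floppy first in the search order, the sketch falls through to the hard disk, which is what a BIOS configured with that boot order would do.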

That's all. These are the operations that the BIOS is scheduled to perform. Once this is over, it's the Boot Loader that takes over. So, let's now move on to the next section.

Boot Loader. What's that? What does it do?

The BIOS invokes (note: does NOT execute) a special program whose only task is to load the image of an operating system Kernel into RAM. This program is called the Boot Loader. Before we proceed any further let's take a brief look at the different ways a system can be booted.

  1. Booting Linux from a Floppy disk
  2. Booting Linux from a Hard disk

Booting Linux from Floppy Disk.

When booting from a floppy disk, the instructions stored in the first sector of the floppy disk are loaded into RAM and executed. These instructions then copy all the remaining sectors containing the Kernel image into RAM.

The Linux Kernel fits into a single 1.44-MB floppy disk. (In fact, there exists a type of Red Hat Linux installation known as "stripped-off" type, where it requires approx. 2 MB physical RAM and approx. 1.44 MB hard disk space for running a Red Hat Linux system.) But the only way to store a Linux Kernel on a single floppy disk is to compress the "Linux Kernel Image". The point to remember here is that compressing is done at compile time, while decompressing is done at boot time by the loader.

In the case of booting Linux from a floppy disk the boot loader's job is very simple. It is coded in the /usr/src/linux-2.4.2/arch/i386/boot/bootsect.S assembly language file. When we compile the Linux Kernel source and obtain a new kernel image, the executable code yielded by this assembly language file is placed at the beginning of the Kernel image file. This makes it easy to produce a floppy disk containing the Linux Kernel.

The floppy can be created by copying the kernel image onto the disk, starting from the first sector. When the BIOS loads the first sector of the floppy disk, it actually copies the code of the boot loader. The boot loader, which is invoked by the BIOS (by jumping to the physical address 0x00007c00), performs the following operations:

  1. Moves itself from address 0x00007c00 to address 0x00090000.
  2. Using address 0x00003ff4, sets up the "Real Mode" stack.
  3. Sets up the disk parameter table. This is used by BIOS to handle the floppy device driver.
  4. Displays the message "Loading" by invoking a BIOS procedure.
  5. Then the boot loader invokes a BIOS procedure to load the setup( ) code of the Kernel Image from the floppy disk. It puts this into RAM starting from address 0x00090200.
  6. Next the boot loader invokes a BIOS procedure. This procedure loads the rest of the Kernel image from the floppy disk and puts the image in RAM starting from either address 0x00010000 (called "low address" for small Kernel Images compiled with "make zImage") or address 0x00100000 (called "high address" for big Kernel Images compiled with "make bzImage").
  7. Then, it finally jumps to the setup( ) code.
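
The addresses in the list above are easier to keep apart when written down as a small memory map. The following is a minimal sketch: the constant and function names are invented for the illustration, and the two derived figures are plain subtractions on the addresses quoted in the list, not values taken from the Kernel source.

```python
SECTOR_SIZE = 512

BOOTSECT_LOADED_AT = 0x00007C00  # where the BIOS puts the boot loader
BOOTSECT_MOVED_TO = 0x00090000   # where the boot loader moves itself
SETUP_LOADED_AT = 0x00090200     # where the setup( ) code is loaded
LOW_ADDRESS = 0x00010000         # small images, "make zImage"
HIGH_ADDRESS = 0x00100000        # big images, "make bzImage"


def kernel_load_address(image_type: str) -> int:
    """Return where the rest of the Kernel image is loaded."""
    if image_type == "zImage":
        return LOW_ADDRESS
    if image_type == "bzImage":
        return HIGH_ADDRESS
    raise ValueError("unknown image type: " + image_type)


# setup( ) starts exactly one sector after the relocated boot loader.
print(SETUP_LOADED_AT - BOOTSECT_MOVED_TO)          # 512

# Room between the "low address" and the relocated boot loader, in KB.
print((BOOTSECT_MOVED_TO - LOW_ADDRESS) // 1024)    # 512

print(hex(kernel_load_address("bzImage")))          # 0x100000
```

The second figure suggests why a big image has to go "high": loaded "low", it would run into the boot loader sitting at 0x00090000.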

Booting Linux from Hard Disk.

When booting from the hard disk, the booting procedure is different. The first sector of the hard disk, called the Master Boot Record (MBR), includes the partition table and a small program. This program loads the first sector of the partition containing the operating system to be started. Linux is a highly flexible and sophisticated piece of software, thus it replaces this small program in the MBR with a sophisticated program called LILO (LInux boot LOader). LILO allows users to select the operating system to be booted.

Usually the Linux Kernel is loaded from a hard disk. This requires a two-stage boot loader. On Intel systems, the most commonly used Linux boot loader is named LILO. For other architectures, other Linux boot loaders exist. LILO may be installed either on the MBR or in the boot sector of an active disk partition. (Please note: During Red Hat Linux installation there comes a step where the user has to either write LILO to the MBR or put it in the boot sector.)

LILO is broken into two parts because otherwise it would be too large to fit into the MBR. The MBR (or the disk partition boot sector) includes a small boot loader, which is loaded into RAM starting from address 0x00007c00 by the BIOS. This small program moves itself to the address 0x0009a000, then sets up the Real Mode stack, and finally loads the second part of the LILO boot loader. (Please note: The Real Mode stack ranges from address 0x0009b000 to 0x0009a200.)
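
To make the idea of "a partition table and a small program" in one sector more concrete, here is a hedged sketch that reads the partition table out of a 512-byte MBR. The layout used (four 16-byte entries starting at byte 446, a boot flag of 0x80 for the active partition, type 0x83 for a Linux partition) is the classic PC convention and is not described in this article; the function name and the sample MBR are invented for the illustration.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # the small program occupies the bytes before this
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80


def parse_partition_table(mbr: bytes):
    """Return the non-empty partition entries of an MBR as dictionaries."""
    if len(mbr) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    entries = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        raw = mbr[start:start + ENTRY_SIZE]
        part_type = raw[4]
        first_sector, sector_count = struct.unpack_from("<II", raw, 8)
        if part_type != 0:  # type 0 marks an unused slot
            entries.append({
                "index": index,
                "active": raw[0] == ACTIVE_FLAG,
                "type": part_type,
                "first_sector": first_sector,
                "sector_count": sector_count,
            })
    return entries


# Build a made-up MBR with one active Linux partition in the first slot.
mbr = bytearray(SECTOR_SIZE)
struct.pack_into("<B3xB3xII", mbr, PARTITION_TABLE_OFFSET, 0x80, 0x83, 63, 204800)
for entry in parse_partition_table(bytes(mbr)):
    print(entry["index"], entry["active"], hex(entry["type"]), entry["first_sector"])
```

On a real system the same function could be fed the first 512 bytes of a disk device, but reading a raw disk requires root privileges, so the sketch sticks to a synthetic sector.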

The second part of LILO reads the list of available operating systems from disk and offers the user a prompt so that he or she can choose one of them. After the user has chosen a Kernel (on my system, one can pick any one of 8 custom Linux Kernels) to be loaded, the boot loader may either copy the boot sector of the corresponding partition into RAM and execute it, or directly copy the Kernel image into RAM.

To boot a Linux Kernel image, the Linux boot loader performs essentially the same operations as the boot loader integrated into the Kernel image. The boot loader invoked by the BIOS (by jumping to the physical address 0x00007c00) performs the following operations:

  1. Moves itself from address 0x00007c00 to address 0x00090000.
  2. Using address 0x00003ff4, sets up the "Real Mode" stack.
  3. Sets up the disk parameter table. This is used by BIOS to handle the hard disk device driver.
  4. Displays the message "Loading Linux" by invoking a BIOS procedure.
  5. Then, invokes a BIOS procedure to load the setup( ) code of the Kernel Image. It puts this into RAM starting from address 0x00090200.
  6. Finally it invokes a BIOS procedure. This procedure loads the rest of the Kernel image and puts the image in RAM starting from either address 0x00010000 (called "low address" for small Kernel Images compiled with "make zImage") or address 0x00100000 (called "high address" for big Kernel Images compiled with "make bzImage").
  7. Then, it finally jumps to the setup( ) code.

The setup( ) function. What does it do?

Now the time has come to take a deeper look into some of the essential assembly language functions that are indispensable for the bootstrapping process.

The setup( ) function can be found in the file /usr/src/linux-2.4.2/arch/i386/boot/setup.S. The code of the setup( ) assembly language function is placed by the linker immediately after the integrated boot loader of the Kernel, that is, at offset 0x200 of the Kernel Image file. This allows the boot loader to locate the code easily and copy it onto the RAM starting from the physical address 0x00090200.

Now the question that comes up is: What does this setup( ) function do? As its name suggests, it's supposed to set up something. But what? And how?

As we all know, for the Kernel to operate properly all the hardware devices in the computer must be detected and then initialized in an orderly fashion. The setup( ) function initializes the hardware devices and thus creates an environment for the Kernel to operate in.

But, hang on a second. Didn't we see a few minutes earlier that the BIOS was supposed to do all this stuff? Yeah, you are right. Although the BIOS has already initialized most of the hardware, the Linux Kernel does NOT rely on it and initializes the hardware again in its own fashion. But why does Linux operate in such a way? The answer to this question is both very easy and extremely difficult to explain. The Linux Kernel was designed this way to enhance portability and robustness. This is one of the many features that make the Linux Kernel the best of all the Unix and Unix-like Kernels available and make it unique in so many ways. A proper understanding of why and exactly how the Linux Kernel implements this feature is beyond the scope of this article and would require extremely detailed coverage of the essential features of the Linux Kernel Architecture.

The setup( ) code mainly performs the following tasks:

  1. First, the total amount of physical RAM available to the system is detected by invoking a BIOS procedure.
  2. Sets the Keyboard repeat delay and rate.
  3. The Video adapter card is detected.
  4. The Disk Controller is reinitialized and hard disk parameters are determined.
  5. Checks for an IBM Micro Channel bus (MCA).
  6. Checks for a PS/2 pointing device (bus mouse).
  7. Checks for Advanced Power Management (APM) BIOS support.
  8. Checks the position of the Kernel Image loaded in RAM. If it was loaded "low" in RAM (when using zImage, at physical address 0x00010000), it is moved down to physical address 0x00001000. But if the Kernel image is a "bzImage" that was loaded "high" in RAM (at physical address 0x00100000), it is not moved anywhere.
  9. Sets up the Interrupt Descriptor Table (IDT) and a Global Descriptor Table (GDT).
  10. If a floating-point unit (FPU) is present, it's now reset.
  11. The PIC (Programmable Interrupt Controller) is reprogrammed at this step.
  12. The CPU is switched from "Real mode" to "Protected mode" by setting the PE bit in the cr0 control register.
  13. Jumps to the startup_32( ) assembly language function.
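
Two of the steps above, the relocation check and the switch to Protected mode, boil down to very little logic. The sketch below models them on plain integers; the names are invented for the illustration, and the only fact added to the article is that PE (Protection Enable) is bit 0 of cr0.

```python
LOW_LOAD_ADDRESS = 0x00010000    # where a zImage is loaded
LOW_FINAL_ADDRESS = 0x00001000   # where setup( ) moves a zImage
HIGH_LOAD_ADDRESS = 0x00100000   # where a bzImage is loaded, and stays

CR0_PE = 1 << 0                  # Protection Enable: bit 0 of cr0


def relocate_image(load_address: int) -> int:
    """Return the address of the Kernel image after the relocation check."""
    if load_address == LOW_LOAD_ADDRESS:
        return LOW_FINAL_ADDRESS
    if load_address == HIGH_LOAD_ADDRESS:
        return HIGH_LOAD_ADDRESS
    raise ValueError("unexpected load address: " + hex(load_address))


def enable_protected_mode(cr0: int) -> int:
    """Return the cr0 value with the PE bit set, leaving other bits alone."""
    return cr0 | CR0_PE


def in_protected_mode(cr0: int) -> bool:
    return bool(cr0 & CR0_PE)


print(hex(relocate_image(LOW_LOAD_ADDRESS)))    # 0x1000
print(hex(relocate_image(HIGH_LOAD_ADDRESS)))   # 0x100000

cr0 = 0x00000010                                # made-up starting value, PE clear
print(in_protected_mode(cr0))                   # False
print(in_protected_mode(enable_protected_mode(cr0)))  # True
```

In the real setup( ) code the switch is done with a handful of assembly instructions; the point of the sketch is only that a single bit separates the two modes.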

From here on the going gets a bit tougher as the bootstrap process gets a bit more complicated.

The startup_32( ) function - 1st function. What does it do?

Okay, let's get to the confusing part straight away. There are two functions called startup_32( ). Though both are assembly language functions and both are required for the bootstrap process, they are totally different functions. The one we refer to here is coded in the /usr/src/linux-2.4.2/arch/i386/boot/compressed/head.S file. By the time the setup( ) code has finished, this function sits either at physical address 0x00100000 or at physical address 0x00001000, depending on whether the Kernel Image was loaded "high" or "low" in RAM.

When executed this function performs the following operations:

  1. The segmentation registers are initialized along with a provisional stack.
  2. The area of uninitialized data in the Kernel is filled with zeroes. This area is delimited by the symbols _edata and _end.
  3. It then executes a function decompress_kernel( ). This function is used to decompress the Linux Kernel image. As a result, the "Uncompressing Linux . . ." message is displayed on the screen. After the Linux Kernel image has been decompressed correctly, the "OK, booting the kernel." message is shown. One very significant question here is: Okay, we understand the Linux Kernel image is decompressed, but where is it loaded? The answer is: If the Linux Kernel image was loaded "low", then the decompressed kernel is placed at physical address 0x00100000. Otherwise, if the Linux Kernel image was loaded "high", the decompressed kernel is placed in a temporary buffer located after the compressed image. The decompressed kernel image is then moved to its final position, which starts at physical address 0x00100000.
  4. Finally code execution jumps to the physical address 0x00100000.
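
The zero-filling and decompression steps above can be imitated in user space. This is only a simulation under loud assumptions: RAM is a bytearray, the "kernel" is a made-up byte string, Python's zlib module stands in for the Kernel's own decompression routine, and the two offsets used for _edata and _end are invented.

```python
import zlib

FINAL_ADDRESS = 0x00100000   # final position of the decompressed Kernel


def zero_uninitialized_data(ram: bytearray, edata: int, end: int) -> None:
    """Fill the area between _edata and _end with zeroes."""
    ram[edata:end] = bytes(end - edata)


def decompress_kernel(ram: bytearray, compressed: bytes) -> int:
    """Decompress the image into ram at FINAL_ADDRESS and return that address."""
    image = zlib.decompress(compressed)
    ram[FINAL_ADDRESS:FINAL_ADDRESS + len(image)] = image
    return FINAL_ADDRESS


ram = bytearray(b"\xff" * (2 * 1024 * 1024))    # 2 MB of "random" RAM
fake_kernel = b"startup_32" + bytes(1000)       # stand-in for the real image
compressed = zlib.compress(fake_kernel)         # done at compile time

zero_uninitialized_data(ram, 0x2000, 0x3000)    # hypothetical _edata and _end
print("Uncompressing Linux...")
entry = decompress_kernel(ram, compressed)      # done at boot time
print("OK, booting the kernel.")
print(hex(entry), bytes(ram[entry:entry + 10]))
```

The sketch skips the temporary buffer used when the image was loaded "high"; it only shows the end result, a decompressed image starting at physical address 0x00100000.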

Now, after this 4th operation code execution is taken over by the other startup_32( ) function. In other words, the second one takes over the bootstrapping process.

The startup_32( ) function - 2nd function. What does it do?

The decompressed Linux kernel image begins with another startup_32( ) function. This function exists in the /usr/src/linux-2.4.2/arch/i386/kernel/head.S file.

At this point you might be tempted to ask: Hey, two different functions having the same name... Doesn't this cause a problem? The answer is: Well, no, it doesn't at all. Both functions are executed by jumping to their initial physical addresses, and hence each is executed in its own execution environment. No problem at all!

Now, let's look at the second startup_32( ) function's functionality. What does it do? When executed this function essentially sets up the execution environment for the first Linux process (process 0). The function performs the following operations:

  1. The segmentation registers are initialized with their final values.
  2. Sets up the Kernel Mode stack for process 0.
  3. It then invokes the setup_idt( ) function, which fills the IDT (Interrupt Descriptor Table) with null interrupt handlers.
  4. The system parameters obtained from the BIOS are put in the first page frame.
  5. The "Model" of the processor is identified.
  6. Loads the gdtr and idtr registers with the addresses of the GDT and IDT tables.
  7. It finally makes a jump to the start_kernel( ) function.
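
The idea behind step 3, filling the whole table with a do-nothing handler first and installing the real handlers later, can be sketched with an ordinary Python list. This is an illustration, not Kernel code: the table size of 256 entries is the x86 vector count and is not stated in this article, and the helper set_gate and the handlers are written here only to make the example run.

```python
IDT_ENTRIES = 256   # number of interrupt vectors on x86 (not from the article)


def ignore_int(vector: int) -> str:
    """Null handler: every vector points here until a real handler is installed."""
    return "unknown interrupt " + str(vector)


def setup_idt():
    """Return a table in which every entry is the null handler."""
    return [ignore_int] * IDT_ENTRIES


def set_gate(idt, vector: int, handler) -> None:
    """Install a real handler for one vector (done later in the boot)."""
    idt[vector] = handler


def divide_error(vector: int) -> str:
    return "divide error"


idt = setup_idt()
print(idt[0](0))      # unknown interrupt 0
set_gate(idt, 0, divide_error)
print(idt[0](0))      # divide error
print(idt[32](32))    # unknown interrupt 32
```

Starting from a table full of null handlers means that an interrupt arriving early in the boot lands somewhere harmless instead of at a random address.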

The start_kernel( ) function. What does it do?

The start_kernel( ) function completes the "initialization" of the Linux Kernel. All the essential Kernel components are initialized when this function executes. This is the last step of the entire "bootstrapping" process.

The following takes place when this function executes:

  1. The paging_init( ) function is executed; it initializes the Page Tables.
  2. The mem_init( ) function is executed; it initializes the Page Descriptors.
  3. The trap_init( ) and init_IRQ( ) functions are executed; they initialize the IDT for the final time.
  4. The kmem_cache_init( ) and kmem_cache_sizes_init( ) functions are executed; they initialize the Slab Allocator.
  5. The time_init( ) function is executed; it initializes the System Date & Time.
  6. The Kernel thread for process 1 is created by invoking the kernel_thread( ) function. This thread in turn creates the other kernel threads and executes the /sbin/init program.
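
The shape of start_kernel( ), a long run of initialization calls ending with the creation of process 1, can be mirrored in a few lines. This is a toy model and nothing more: the function names are the ones listed above, they are run in the order of that list (an assumption of this sketch, since the real function calls many more routines), and each "call" merely records its name.

```python
INIT_STEPS = [
    ("paging_init", "Page Tables"),
    ("mem_init", "Page Descriptors"),
    ("trap_init", "IDT"),
    ("init_IRQ", "IDT"),
    ("kmem_cache_init", "Slab Allocator"),
    ("kmem_cache_sizes_init", "Slab Allocator"),
    ("time_init", "System Date & Time"),
]


def start_kernel():
    """Run every initialization step in order, then create process 1."""
    called = []
    for name, component in INIT_STEPS:
        print(name + "( ): initializes " + component)
        called.append(name)
    print("kernel_thread( ): creates process 1, which executes /sbin/init")
    called.append("kernel_thread")
    return called


order = start_kernel()
print(len(order), "steps, last one:", order[-1])
```

The one thing the toy model gets right is the ordering constraint: process 1 is created only after the components it depends on have been initialized.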

The "Linux version 2.4.2 ..." message is displayed right after the beginning of start_kernel( ). Many other messages are displayed also. At the very end, the very familiar login prompt appears on the console. This tells the user that the Linux Kernel is up and running, and just raring to go... and dominate the world!

Conclusion

This sums up our overview of the entire bootstrapping process of a Linux system. As readers may rightly note, I have not explained many of the components and terms that I have used, such as the IDT, the GDT, the eip register and the cs register. A full explanation of all these terms would make it impossible to complete the article in just a few pages, and would make the entire topic rather boring to some people. In this article I provide just a glimpse of the processes and other things that take place when a Linux system boots. If you want more in-depth coverage of all the associated functions like paging_init( ) and mem_init( ), go buy a book.

About the Author: My name is Subhasish Ghosh. I am 20, currently a computer-systems engineering student in India. I am a Microsoft Certified Professional (MCP), MCP certified on NT 4.0 and recently completed Red Hat Linux Certified Engineer (RHCE) Training. I have been working with Linux for a long time now, have programmed using C, C++, VC++, COM, MFC, DCOM, ATL 3.0, Perl, Python and Linux programming using GTK+. Currently I'm busy learning the Linux Kernel Architecture in detail, writing articles for Linux.com and most importantly practicing non-geek talk with my girlfriend, Hanna. E-mail: auspicious_blessingsindia@hotmail.com

PLEASE NOTE: I like hearing from and helping Linux users from all over the world. Anyone interested in e-mailing me should feel free to do so. However, I request computer-freaks, people who complain too much about everything and specifically egoistic Indians pretending to know everything trying to point out mistakes in my articles please do not e-mail me. I don't have much time for that. Thank you!