Originally Published: Monday, 26 February 2001 Author: Matt Michie

Surfing Kernel Code

Matt Michie descends into the depths of the kernel source and shows that reading code isn't just for coders or gurus.

Even though everyone knows the Linux kernel is "free software" and that its source is open, most beginning and even intermediate Linux users don't take the time to read it. This article will give a couple of tips and point out some highlights worth checking out. It will also give you a small taste of the wealth of information embedded in the kernel, even if you aren't a C or assembly programmer.

Generally, the default location to install the kernel source is /usr/src/linux. Some distributions include the source as an optional package. Check your distribution documentation for exact details. One can always download the source from ftp.kernel.org.

For someone who has never looked through the source, 145 megabytes of C and assembly sounds daunting. Once you look inside, though, it is easy to see how Linus has maintained his benevolent dictatorship over the kernel. The source is wonderfully organized into modules and directories, which makes things easy to find and understand.

The first directory everyone should be familiar with is Documentation. It holds a wealth of plain text files on everything from how Linus wants code submitted to writing device drivers for the Amiga's Zorro bus. If you are a coder, or plan to learn to code someday, start with Linus' CodingStyle file. If you aren't a coder, read it anyway for an interesting glimpse into the mind of Linus Torvalds. Someone could probably write a pulp psychology book analyzing programmers through their coding styles, and CodingStyle would make a fine case study.

My two favorite quotes come from Linus himself: "First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture. [..] You've probably been told by your long-time Unix user helper that 'GNU emacs' automatically formats the C sources for you, and you've noticed that yes, it does do that, but the defaults it uses are less than desirable (in fact, they are worse than random typing - a infinite number of monkeys typing into GNU emacs would never make a good program)."

Be sure to read the whole thing for more witty and insightful comments. Other highlights include pci.txt (how to write Linux PCI drivers), oops-tracing.txt (how to track down kernel bugs), and mtrr.txt (setting the Memory Type Range Registers on the Intel P6 family). See what other gems you can dig up. It's amazing how much a little documentation can teach you.

From here, let's dig into the source code itself. The first stop should be linux/init, home of main.c and version.c. This is where the kernel starts to set itself up, and it is also where the infamous BogoMIPS value gets calculated:


void __init calibrate_delay(void)
{
	unsigned long ticks, loopbit;
	int lps_precision = LPS_PREC;

	loops_per_jiffy = (1<<12);

	printk("Calibrating delay loop... ");

	/* [... some more code ...] */

	/* Round the value and print it */
	printk("%lu.%02lu BogoMIPS\n",
		loops_per_jiffy/(500000/HZ),
		(loops_per_jiffy/(5000/HZ)) % 100);
}
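The elided middle of calibrate_delay() works out loops_per_jiffy, the number of delay-loop iterations that fit into one timer tick. As I understand it, the search happens in two passes. A coarse pass keeps doubling the guess until one delay takes at least a full tick. A fine pass then backs off to the last power of two that fit and refines the lower bits one at a time. The user-space sketch below is simplified and is not the kernel's code. A real kernel times __delay() against jiffies, but here the hypothetical helper delay_spans_tick() fakes that by pretending the CPU runs exactly true_lpj loops per jiffy. HZ = 100 and LPS_PREC = 8 are assumptions. The final formatting reuses the integer arithmetic from the printk() above.

```c
#include <stdio.h>

#define HZ       100 /* assumed tick rate (100 on i386 in 2.4) */
#define LPS_PREC 8   /* assumed number of bits refined by the fine pass */

/*
 * Hypothetical stand-in for "run __delay(loops) and check whether a
 * timer tick went by". We pretend the CPU runs exactly true_lpj delay
 * loops per jiffy, which makes the simulation deterministic.
 */
static int delay_spans_tick(unsigned long loops, unsigned long true_lpj)
{
	return loops >= true_lpj;
}

static unsigned long simulate_calibration(unsigned long true_lpj)
{
	unsigned long lpj = 1UL << 12;
	unsigned long loopbit;
	int precision = LPS_PREC;

	/* Coarse pass: double until one delay takes a whole tick. */
	while ((lpj <<= 1) != 0) {
		if (delay_spans_tick(lpj, true_lpj))
			break;
	}

	/* Fine pass: start from the last power of two that fit, then try
	 * each lower bit and keep it only if the delay still fits. */
	lpj >>= 1;
	loopbit = lpj;
	while (precision-- && (loopbit >>= 1)) {
		lpj |= loopbit;
		if (delay_spans_tick(lpj, true_lpj))
			lpj &= ~loopbit;
	}
	return lpj;
}

/* Same integer arithmetic as the printk() in calibrate_delay(). */
static void bogomips_parts(unsigned long lpj,
			   unsigned long *whole, unsigned long *hundredths)
{
	*whole = lpj / (500000 / HZ);
	*hundredths = (lpj / (5000 / HZ)) % 100;
}

static void print_bogomips(unsigned long lpj)
{
	unsigned long whole, hundredths;

	bogomips_parts(lpj, &whole, &hundredths);
	printf("%lu.%02lu BogoMIPS\n", whole, hundredths);
}
```

Take a simulated machine with true_lpj = 4997120. The coarse pass stops at 8388608 and backs off to 4194304. The fine pass then settles on 4980736, so print_bogomips(simulate_calibration(4997120UL)) prints "996.14 BogoMIPS". The result always lands at or just below the true value, because any bit that pushes the delay past a tick is cleared.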

Though you may not understand all the code, the comments are quite readable and are sprinkled nicely throughout. For instance:


/*
* Ok, the machine is now initialized. None of the devices
* have been touched yet, but the CPU subsystem is up and
* running, and memory and process management works.
*
* Now we can finally start doing some real work..
*/

One of the more interesting things I never knew about the kernel is what happens when it tries to start init (run man init if you don't know what init does). If one attempt fails, it gracefully falls back to the next:


execve("/sbin/init",argv_init,envp_init);
execve("/etc/init",argv_init,envp_init);
execve("/bin/init",argv_init,envp_init);
execve("/bin/sh",argv_init,envp_init);
panic("No init found. Try passing init= option to kernel.");

This simple piece of code makes it obvious where the kernel expects to find init. Because execve() only returns when it fails, each line runs only if every earlier attempt failed. More interesting still, if init can't be found, the kernel tries to drop you into a shell (/bin/sh). Only if that also fails will it panic.
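The same "try, and fall through on failure" chain can be reproduced in user space. This is a minimal sketch, not kernel code. The function names are hypothetical, and exiting with status 127 is my own stand-in for panic(). A successful execve() replaces the calling process, so the chain runs in a forked child and the parent reports how the child exited.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try each candidate in order. execve() only returns on failure, so the
 * next path is attempted only if the previous one could not be run.
 * If every candidate fails, exit with 127 in place of the kernel's
 * panic(). Call this only in a child process: success replaces it.
 */
static void try_init_candidates(const char *const paths[], size_t n,
				char *const argv[], char *const envp[])
{
	for (size_t i = 0; i < n; i++)
		execve(paths[i], argv, envp);

	fprintf(stderr, "No init found.\n");
	_exit(127);
}

/* Run the chain in a forked child and return its exit status, or -1
 * if fork()/waitpid() fails or the child did not exit normally. */
static int run_chain_in_child(const char *const paths[], size_t n,
			      char *const argv[], char *const envp[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		try_init_candidates(paths, n, argv, envp); /* does not return */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

As an example, take paths of { "/nonexistent/sbin/init", "/nonexistent/etc/init", "/bin/sh" } and an argv of { "sh", "-c", "exit 3", NULL }. The first two execve() calls fail, the shell runs, and run_chain_in_child() returns 3. If no candidate can be executed, it returns 127.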

From this same C file, one can also discover that Linus and crew expect you to compile with at least gcc 2.91. There are many interesting tidbits in here, so surf through and see what you can uncover.
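A minimum-compiler requirement like this is usually enforced by the preprocessor. GCC predefines __GNUC__ and __GNUC_MINOR__, so gcc 2.91 reports 2 and 91. A file can therefore refuse to compile on anything older before a single line of real code is built. The sketch below shows a check of that general shape. It is not a verbatim copy of the lines in main.c, and the helper gcc_version_code() is hypothetical.

```c
/*
 * Refuse to build with a GCC older than 2.91. Non-GCC compilers skip
 * the check because they do not define __GNUC__.
 */
#if defined(__GNUC__) && \
    (__GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 91))
#error "Sorry, this compiler is too old; gcc 2.91 or newer is required."
#endif

/* Encode the compiler version as major * 100 + minor (291 for gcc 2.91).
 * Returns 0 when the compiler does not claim GCC compatibility. */
static int gcc_version_code(void)
{
#ifdef __GNUC__
	return __GNUC__ * 100 + __GNUC_MINOR__;
#else
	return 0;
#endif
}
```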

Moving on, it is always interesting to try random recursive greps through the source tree. Not too long ago, an interesting discussion broke out on the Linux Kernel Mailing List when someone tried to submit a patch to clean up all the 'dirty words' embedded in comments. Several notable developers pointed out that a 13-year-old probably wasn't going to pick up cussing from reading the kernel source. Why not check it out yourself? Type grep -r 'dirtyword' /usr/src/linux, substituting the word of your choice. You'll probably get a chuckle out of some of the comments. The kernel even prints some amusing error messages with printk().

The best error message has to be from linux/drivers/char/lp.c:

printk(KERN_INFO "lp%d on fire\n", minor);

This would print out, for example, "lp0 on fire".
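The KERN_INFO in that line is not a separate argument. In 2.4, include/linux/kernel.h defines the log levels as short string prefixes, and KERN_INFO is "<6>". The compiler joins adjacent string literals, so KERN_INFO "lp%d on fire\n" becomes a single format string that starts with "<6>". The kernel's logging code then reads that prefix as the message's log level. The user-space sketch below only mimics the formatting step. format_lp_fire() is a hypothetical name, and snprintf() stands in for printk().

```c
#include <stdio.h>

/* Same definition as the 2.4 kernel header: a string prefix, not an int. */
#define KERN_INFO "<6>"

/* Format the message the way printk() would see it, including the
 * log-level prefix. Returns snprintf()'s result (the full length). */
static int format_lp_fire(char *buf, size_t len, int minor)
{
	/* KERN_INFO "lp%d on fire\n" is concatenated into "<6>lp%d on fire\n" */
	return snprintf(buf, len, KERN_INFO "lp%d on fire\n", minor);
}
```

With minor = 0, the buffer holds "<6>lp0 on fire\n" (15 characters). What ends up in the log is the text after the prefix.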

Another interesting directory to surf through is kernel/, which holds the heart of many Linux system calls. If you would like to see the source for printk(), for instance, or browse fork.c or panic.c, this is the place to look.

One thing you may notice is that gotos are common, and they are accepted as a necessary tool for systems programming in code like the kernel. In fact, goto appears over 11,000 times in the 2.4.2 kernel source tree!
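A large share of those gotos follow one pattern: unwinding on error. Each resource a function acquires gets a cleanup label, and the labels sit in reverse order of acquisition. When a step fails, the code jumps to the label that releases only what has already been set up. The sketch below shows the idiom in user space with malloc(). The function setup_buffers() and its fail_at parameter are hypothetical and exist only so the failure paths can be exercised.

```c
#include <stdlib.h>

/*
 * Acquire three buffers; fail_at (1-3) simulates an allocation failure
 * at that step, 0 means no failure. *released counts the buffers freed
 * on the way out, which shows which cleanup labels actually ran.
 */
static int setup_buffers(int fail_at, int *released)
{
	char *a, *b, *c;
	int err = -1;

	*released = 0;

	a = (fail_at == 1) ? NULL : malloc(64);
	if (!a)
		goto out;	/* nothing acquired yet */
	b = (fail_at == 2) ? NULL : malloc(64);
	if (!b)
		goto free_a;	/* undo only a */
	c = (fail_at == 3) ? NULL : malloc(64);
	if (!c)
		goto free_b;	/* undo b, then a */

	/* ... real work with a, b and c would go here ... */
	err = 0;

	/* On success we fall through too, since the buffers are only
	 * needed inside this function. */
	free(c);
	(*released)++;
free_b:
	free(b);
	(*released)++;
free_a:
	free(a);
	(*released)++;
out:
	return err;
}
```

The alternative is nested if blocks or cleanup code copied into every failure branch. The goto version keeps one exit path and makes the reverse-order release easy to check.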

Our last quick stop in the source is linux/drivers/char/random.c, the driver the kernel uses to generate strong random numbers. It was written by Theodore Ts'o in 1994 and last modified in 1999. If you rely on any encryption programs that get their random numbers from the kernel (through /dev/random or /dev/urandom), it is in your best interest to read through this file.
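From user space, programs reach this driver by reading its device files. Below is a minimal sketch of how a program might fill a buffer from /dev/urandom; read_urandom() is a hypothetical name. One design choice is worth noting. In kernels of this era, /dev/random could block when the driver's entropy estimate ran low, while /dev/urandom never blocks. The sketch uses /dev/urandom and loops until the whole buffer is filled, because read() may return fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fill buf with len bytes from /dev/urandom. Returns 0 on success,
 * -1 on error. Retries reads interrupted by a signal (EINTR). */
static int read_urandom(unsigned char *buf, size_t len)
{
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t n = read(fd, buf, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {	/* error, or an unexpected end of file */
			close(fd);
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}
```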

The first 15% of the file is devoted to comments and explanations of the algorithms, how to set up the random seed, and even some shell scripts. This could easily be used in a textbook as an example of how to write comments. Beautiful!
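The seed advice in those comments boils down to two steps. At shutdown, save some pool output to a file. At boot, write that file back into /dev/urandom, so the pool doesn't start from a predictable state. The comments express this as shell scripts. The C sketch below is my own translation of the idea, not code from random.c. The 512-byte seed size, the function names, and whatever path you pass in are assumptions. Writing to /dev/urandom mixes the data into the pool but does not credit any entropy.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define SEED_BYTES 512 /* assumed seed size: one 512-byte block */

/* Copy up to max bytes from fd in to fd out.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_fd(int in, int out, size_t max)
{
	unsigned char buf[SEED_BYTES];
	size_t total = 0;

	while (total < max) {
		size_t want = max - total;
		ssize_t n;

		if (want > sizeof buf)
			want = sizeof buf;
		n = read(in, buf, want);
		if (n < 0)
			return -1;
		if (n == 0)
			break;	/* end of input */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w < 0)
				return -1;
			off += w;
		}
		total += (size_t)n;
	}
	return (long)total;
}

/* Shutdown: save SEED_BYTES of pool output to a private (0600) file. */
static int save_seed(const char *path)
{
	int in, out;
	long n;

	in = open("/dev/urandom", O_RDONLY);
	if (in < 0)
		return -1;
	out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out < 0) {
		close(in);
		return -1;
	}
	n = copy_fd(in, out, SEED_BYTES);
	close(in);
	if (close(out) != 0)
		n = -1;
	return n == SEED_BYTES ? 0 : -1;
}

/* Boot: write the saved bytes into /dev/urandom, mixing them into the
 * pool without crediting entropy. */
static int restore_seed(const char *path)
{
	int in, out;
	long n;

	in = open(path, O_RDONLY);
	if (in < 0)
		return -1;
	out = open("/dev/urandom", O_WRONLY);
	if (out < 0) {
		close(in);
		return -1;
	}
	n = copy_fd(in, out, SEED_BYTES);
	close(in);
	close(out);
	return n < 0 ? -1 : 0;
}
```

Creating the seed file with mode 0600 matters. Anyone who can read it learns part of what the pool was mixed with at the next boot.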

This ends our brief sojourn into the kernel source. We've hardly scratched the surface of what is actually in the code, but hopefully I've demonstrated that reading kernel source is not just for kernel hackers and C gurus. All Linux users can benefit from the knowledge contained within.

Matt Michie exists in the New Mexican desert. Please visit his web site at http://daimyo.org.