Originally Published: Thursday, 23 August 2001 Author: Subhasish Ghosh
Published to: develop_articles/Development Articles Page: 3/5

Understanding Linux Kernel Inter-process Communication: Pipes, FIFO & IPC (Part 1)

In this article, part one of two, the prolific and talented Subhasish returns to give Linux.com readers another trip into understanding Linux kernel behavior and programming. There's a lot of information covered here for free, so hang up your hat and have fun. Part 2 of Understanding Linux Kernel Inter-process Communication will be published tomorrow.

More on Pipes  << Page 3 of 5  >>

Readers should note that when a program creates a new process using the fork() system call, file descriptors that were previously open remain open in the new process. By creating a pipe in the original process and then forking to create a new process, we can pass data from one process to the other down the pipe. This is how an ordinary pipe works.

Let's now take a detailed look at the pipe data structures in GNU/Linux. At the system call level, once a pipe has been created, a process uses the read() and write() VFS (Virtual File System) system calls to access it. Therefore, for each pipe, the Linux kernel creates an inode object plus two file objects, one for reading and the other for writing. When a process wants to read from or write to the pipe (but NOT both with the same descriptor, in POSIX and Linux), it must use the proper file descriptor. When the inode object refers to a pipe, its u field consists of a pipe_inode_info data structure. The pipe_inode_info data structure has the following fields:

Type                  Field        Description
--------------------  -----------  ------------------------------------------
char *                base         Address of Linux kernel buffer
unsigned int          start        Read position in Linux kernel buffer
unsigned int          lock         Locking flag utilized for exclusive access
struct wait_queue *   wait         Pipe/FIFO wait queue
unsigned int          readers      Flag for reading processes
unsigned int          writers      Flag for writing processes
unsigned int          rd_openers   Used while opening a FIFO for reading
unsigned int          wr_openers   Used while opening a FIFO for writing
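
To make the table easier to picture, here is a minimal user-space sketch of a structure with the same fields. It is only an illustration: the name pipe_inode_info_sketch, the field order, and the helper new_pipe_info() are assumptions made for this example and are not taken from the kernel sources.

```c
#include <stddef.h>

/* Forward declaration only: the real wait-queue type lives in the kernel. */
struct wait_queue;

/* User-space mirror of the fields listed in the table.
 * Field order is an assumption; the kernel's actual layout may differ. */
struct pipe_inode_info_sketch {
    char *base;               /* address of the kernel buffer (one page frame) */
    unsigned int start;       /* read position in the buffer */
    unsigned int lock;        /* locking flag for exclusive access */
    struct wait_queue *wait;  /* pipe/FIFO wait queue */
    unsigned int readers;     /* flag for reading processes */
    unsigned int writers;     /* flag for writing processes */
    unsigned int rd_openers;  /* used while opening a FIFO for reading */
    unsigned int wr_openers;  /* used while opening a FIFO for writing */
};

/* Hypothetical helper: models the state of a freshly created pipe, which
 * starts with one reader and one writer (the process that called pipe()). */
static struct pipe_inode_info_sketch new_pipe_info(char *page)
{
    struct pipe_inode_info_sketch info = {0};

    info.base = page;   /* buffer address goes into 'base' */
    info.readers = 1;
    info.writers = 1;
    return info;
}
```

Calling new_pipe_info() with the address of any buffer returns a structure whose 'base' field points at that buffer and whose 'readers' and 'writers' fields are both 1.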

Each pipe also has its own pipe buffer. A 'pipe buffer' may be defined as a single page frame containing the data that has been written into the pipe and is yet to be read. The address of this page frame is stored in the 'base' field of the pipe_inode_info data structure.

Okay, a question that now comes up is: what about 'race conditions'? (For a definition of this term and other associated terms, readers are requested to read the earlier articles in the series.) To avoid race conditions on the pipe's data structures, the Linux kernel forbids concurrent accesses to the pipe buffer. This brings into play the 'lock' field in the pipe_inode_info data structure. Is that all? No, definitely NOT. The lock field alone is not enough to handle more complex situations, and this is where POSIX comes to the rescue. The POSIX standard allows a writing process to be suspended when the pipe is full, so that readers can empty the buffer. This requirement is met with an additional i_atomic_write semaphore, found in the inode object, which keeps a process from starting a write operation while another writer is suspended because the buffer is full.

The process that issues a pipe() system call is initially the only process that can access the new pipe, both for reading and writing. To represent that the pipe has both a reader and a writer, the 'readers' and 'writers' fields of the pipe_inode_info data structure are initialized to 1. It is vital that everyone reading this article notes that the 'readers' and 'writers' fields behave differently for pipes and for FIFOs: they act as flags when applied to pipes, and as counters, NOT flags, when associated with FIFOs.
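
The "pipe is full" condition can be observed from user space. The sketch below is only an illustration and does not touch the kernel structures discussed here: it puts the write end into non-blocking mode, so that instead of being suspended the writer gets EAGAIN once the buffer is full. The function name fill_pipe_until_full() and the 512-byte chunk size are choices made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Fills a fresh pipe through a non-blocking write end and returns the number
 * of bytes accepted before the kernel reported EAGAIN, or -1 on error.
 * If writable_after_drain is non-NULL, it is set to 1 when a write succeeds
 * again after the reader side has emptied the buffer, and to 0 otherwise. */
long fill_pipe_until_full(int *writable_after_drain)
{
    int fds[2];
    char chunk[512];
    long total = 0;
    long left;
    int flags;

    if (pipe(fds) == -1)
        return -1;

    /* Non-blocking mode turns "the writer would be suspended" into EAGAIN. */
    flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EAGAIN)
            break;              /* buffer is full */
        goto fail;
    }

    /* Act as the reader: drain exactly what was written. */
    for (left = total; left > 0; ) {
        ssize_t n = read(fds[0], chunk, sizeof chunk);
        if (n <= 0)
            goto fail;
        left -= n;
    }

    /* With the buffer empty, the writer can make progress again. */
    if (writable_after_drain)
        *writable_after_drain = (write(fds[1], chunk, 1) == 1);

    close(fds[0]);
    close(fds[1]);
    return total;

fail:
    close(fds[0]);
    close(fds[1]);
    return -1;
}
```

The byte count returned depends on the kernel: this article describes a buffer of a single page frame, while later kernels may report a larger capacity.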
Now that we have seen what a "pipe" is, what it does, and how it operates, including a sample program, let's look into pipes in more minute detail.
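
As a brief reminder of the fork-and-pipe pattern described at the top of this page, here is a minimal sketch. It is not the sample program from the earlier page; the function name send_to_child() and the 255-byte message limit are assumptions made for this example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg from the parent to a forked child through a pipe.
 * The child exits with status 0 if it received exactly msg, 1 otherwise.
 * Returns the child's exit status, or -1 on error. */
int send_to_child(const char *msg)
{
    int fds[2];
    int status;
    size_t len = strlen(msg);
    ssize_t written;
    pid_t pid;

    if (len > 255 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: both descriptors stayed open across fork(); it only reads. */
        char buf[256];
        size_t got = 0;
        ssize_t n;

        close(fds[1]);
        while (got < sizeof buf &&
               (n = read(fds[0], buf + got, sizeof buf - got)) > 0)
            got += (size_t)n;
        close(fds[0]);
        _exit(got == len && memcmp(buf, msg, len) == 0 ? 0 : 1);
    }

    /* Parent: it only writes, so it closes the read end first. */
    close(fds[0]);
    written = write(fds[1], msg, len);
    close(fds[1]);              /* the child now sees end of file */

    if (waitpid(pid, &status, 0) == -1 || written != (ssize_t)len)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each process closes the end it does not use, so that the child's read() loop ends once the parent has closed the last write descriptor.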

Creating and Destroying a Pipe: A pipe is implemented as a set of VFS objects. The point to note is that a pipe remains in the system as long as some process owns a file descriptor referring to it. The low-level pipe() system call is serviced by the sys_pipe() function, which in turn invokes the do_pipe() function. In order to create a new pipe, the do_pipe() function performs the following operations:

  1. It allocates a file object and a file descriptor for the read channel of the pipe, sets the "flag" field of the file object to O_RDONLY, and initializes the f_op field with the address of the read_pipe_fops table.
  2. It allocates a file object and a file descriptor for the write channel of the pipe, sets the "flag" field of the file object to O_WRONLY, and initializes the f_op field with the address of the write_pipe_fops table.
  3. It invokes the get_pipe_inode() function, which allocates and initializes an inode object for the pipe. get_pipe_inode() also allocates a page frame for the pipe buffer and stores its address in the "base" field of the pipe_inode_info data structure (mentioned above).
  4. It allocates a dentry object and uses it to link together the two file objects and the inode object.
  5. Finally, it returns the two file descriptors to the User Mode process.
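
Some results of the steps above are visible from User Mode. The following sketch is only an illustration: it cannot show the kernel objects themselves, but it asks the kernel about the two descriptors returned by pipe(). The struct pipe_view and the function inspect_new_pipe() are names invented for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* What User Mode can observe about a freshly created pipe. */
struct pipe_view {
    int read_mode;   /* access mode of the read descriptor  */
    int write_mode;  /* access mode of the write descriptor */
    int is_fifo;     /* 1 if both descriptors report a pipe/FIFO file type */
    int same_inode;  /* 1 if both descriptors report the same inode number */
};

/* Creates a pipe, records what fcntl() and fstat() report about its two
 * descriptors, and closes it again. Returns 0 on success, -1 on error. */
int inspect_new_pipe(struct pipe_view *out)
{
    int fds[2];
    int rflags, wflags;
    int rc = -1;
    struct stat rs, ws;

    if (pipe(fds) == -1)
        return -1;

    rflags = fcntl(fds[0], F_GETFL);
    wflags = fcntl(fds[1], F_GETFL);

    if (rflags != -1 && wflags != -1 &&
        fstat(fds[0], &rs) == 0 && fstat(fds[1], &ws) == 0) {
        out->read_mode  = rflags & O_ACCMODE;   /* steps 1 and 2 */
        out->write_mode = wflags & O_ACCMODE;
        out->is_fifo    = S_ISFIFO(rs.st_mode) && S_ISFIFO(ws.st_mode);
        /* Step 4 links both file objects to one inode; on Linux the two
         * descriptors would be expected to report the same inode number. */
        out->same_inode = (rs.st_dev == ws.st_dev && rs.st_ino == ws.st_ino);
        rc = 0;
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The read_mode and write_mode values correspond to the O_RDONLY and O_WRONLY settings of steps 1 and 2. The same_inode check reflects step 4 and is left as an observation rather than a guarantee, since it depends on the system.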

So, every time one issues a pipe() system call, the five steps above are carried out automatically, thereby creating a new pipe.

Now, let's look at how a pipe is destroyed. Whenever a process invokes the close() system call on a file descriptor associated with a pipe, the Linux kernel executes the fput() function on the corresponding file object, which decrements the usage counter. If the counter becomes zero, the function invokes the 'release' method of the file operations. The pipe_read_release() and pipe_write_release() functions implement the 'release' method of the pipe's file objects. They set to 0 the 'readers' and 'writers' fields, respectively, of the pipe_inode_info data structure. Each function then invokes the pipe_release() function, which wakes up any processes sleeping in the pipe's wait queue so that they can recognize the change in pipe state. It then checks whether both the 'readers' and 'writers' fields of the pipe_inode_info data structure are equal to 0; only in that case does it release the page frame containing the pipe buffer.

So, this is the summary of the various things that take place within the Linux kernel every time a pipe is created and later destroyed. Interesting enough, right? Let's now move on to the next interesting section, FIFOs.
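
Before moving on, the "change in pipe state" caused by close() can be seen from user space. The sketch below is an illustration only, with function names chosen for this example: one function closes the only write end and reports what read() returns, and the other closes the only read end and reports the error from write().

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Closes the only write end of an empty pipe and returns what read()
 * reports on the read end afterwards. Returns -2 if pipe() fails. */
long read_after_writer_closed(void)
{
    int fds[2];
    char c;
    ssize_t n;

    if (pipe(fds) == -1)
        return -2;

    close(fds[1]);              /* the last writer goes away */
    n = read(fds[0], &c, 1);    /* no data and no writer: end of file */
    close(fds[0]);
    return (long)n;
}

/* Closes the only read end of a pipe and returns the errno produced by a
 * following write(), or 0 if the write unexpectedly succeeds.
 * Returns -1 if pipe() fails. */
int errno_after_reader_closed(void)
{
    int fds[2];
    int err;
    ssize_t n;
    void (*old_handler)(int);

    if (pipe(fds) == -1)
        return -1;

    /* Ignore SIGPIPE for a moment; by default it would end the process. */
    old_handler = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);              /* the last reader goes away */
    n = write(fds[1], "x", 1);
    err = (n == -1) ? errno : 0;

    close(fds[1]);
    signal(SIGPIPE, old_handler);
    return err;
}
```

A read() on a pipe with no writers and no data returns 0, and a write() to a pipe with no readers fails with EPIPE. Both are ways in which a process on the other end recognizes that its peer has closed the pipe.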




