Linux.com Article DB: Understanding Linux Kernel Inter-process Communication: Pipes, FIFO & IPC (Part 2)

This is part two of Understanding Linux Kernel Inter-process Communication; the first part was published yesterday. You'll probably want to read the first part, well, first.

This article will cover:

System V (AT&T System V.2 release of UNIX) IPC Resources: Semaphores, Message Queues & Shared Memory segments (implemented in terms of GNU/Linux).

A few code examples to chew on (for the brave-hearted!).

Please Note:

For explanation of words such as "kernel control paths", "semaphores", "race conditions" and related features, please refer to earlier articles in the series.

All readers should note that although this article explores the depths of the Linux Kernel, no discussion would be complete without the IPC features and facilities of the AT&T System V release of UNIX. Thus, several System V UNIX features will be discussed too.

I used Red Hat Linux 7.1, Linux Kernel 2.4.2-2 for compiling all the code included.

In earlier articles we have already encountered some exciting features of the Linux Kernel. This article explains how User Mode processes can synchronize themselves and exchange data. We have already covered a lot of synchronization topics, especially in "Linux Kernel Synchronization", but as readers must have noticed the main protagonist of the story there was a "Kernel Control Path" acting within the Linux Kernel and NOT User Mode programs. Thus, we are now ready to discuss synchronization of User Mode processes. These processes rely on the Linux Kernel to synchronize themselves and exchange data.

System V IPC Facilities

In this section, we are going to look at a set of inter-process communication facilities that were introduced in the AT&T System V.2 Release of UNIX. Since all these facilities appeared in the same release and have a similar programmatic interface, they are often referred to as System V IPC. As mentioned in part 1 of this article, IPC data structures are created dynamically when a process requests an IPC resource, that is either a semaphore, or a message queue or a shared memory segment. Each IPC resource is persistent; i.e. unless explicitly released by a process, it is always kept in memory. An IPC resource may be used by any process, including those that do not share the ancestor that created the resource.

Now the question that comes up is: A particular process may require several IPC resources of the same type, so how on earth is someone supposed to identify each one of these resources? The answer is simple: Each new resource is identified by a 32-bit IPC Key, which is similar to the file pathname in the system's directory tree. In addition to the IPC Key, each newly allocated IPC resource also has a 32-bit IPC Identifier, which is somewhat similar to the file descriptor associated with an open file. But one very important point to note is: IPC Identifiers are assigned to IPC resources by the Kernel and are unique within the system, but IPC Keys can be freely chosen by application programmers. But, what does this "IPC Identifier" do? When two or more processes wish to communicate through an IPC resource, they all refer to the IPC Identifier of the resource. OK, it gets a little tricky from here on in.

When I was studying the Linux Kernel architecture and its associated features from a number of books, professors' notes, library manuals, online magazines, HOWTOs and other official and unofficial sources, I always wanted the answer to one simple question, which unfortunately no one could give me. Readers will have noted that in the paragraph just above this one, I mentioned "...IPC Identifiers are assigned to IPC resources by the Kernel and are unique within the system...". My question was: How on earth is an IPC Identifier computed by the Linux Kernel, and how come every one it produces HAS to be unique? I did manage to find the answer. In order to minimize the risk of incorrectly referencing the wrong resource, the Linux Kernel does NOT recycle IPC identifiers as soon as they become free. Instead, the IPC identifier assigned to a resource is almost always larger than the identifier assigned to the previously allocated resource of the same type.

Each IPC identifier is computed by combining a "slot usage sequence number" relative to the resource type, an arbitrary "slot index" for the allocated resource, and the value chosen in the Linux Kernel for the maximum number of allocatable resources. Let s represent the "slot usage sequence number", M the maximum number of allocatable resources, and i the arbitrary "slot index", where i is greater than or equal to zero but less than M. Each IPC resource's ID is then computed by the formula:

IPC Identifier = s × M + i

The "slot usage sequence number" s is initialized to 0 and is incremented by 1 at every resource deallocation. In two consecutive resource allocations, the slot index i can only increase; it can decrease only when a resource has been deallocated, but then the increased "slot usage sequence number" ensures that the new IPC identifier for the new allocated resource is larger than the previous one, thus, ensuring that each time an IPC Identifier is produced (allocated to a resource), it's a UNIQUE one. As simple as that! See, I told you, understanding Linux kernel features is so easy!
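The formula above can be tried out with a few lines of C. This is only a sketch of the arithmetic, not the kernel's own code: the helper names and the value 128 chosen for M are assumptions made for illustration.

```c
/* Hypothetical value for M, the maximum number of allocatable
 * resources of one type. The real kernel constant may differ. */
#define IPC_MAX_SLOTS 128

/* Build an identifier from the slot usage sequence number s and
 * the slot index i (0 <= i < M): id = s * M + i. */
int ipc_build_id(int seq, int slot)
{
    return seq * IPC_MAX_SLOTS + slot;
}

/* Recover the slot index i from an identifier. */
int ipc_id_to_slot(int id)
{
    return id % IPC_MAX_SLOTS;
}

/* Recover the slot usage sequence number s from an identifier. */
int ipc_id_to_seq(int id)
{
    return id / IPC_MAX_SLOTS;
}
```

With these assumptions, slot 3 first gets identifier 3 (s = 0). Once a deallocation has bumped s to 1, the same slot yields 1 × 128 + 3 = 131, so a stale identifier 3 can no longer be mistaken for the new resource.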

Before we go on to look at the different inter-process communication facilities in more detail, there is one very important topic I think I should cover. So, let's discuss the ipc_perm data structure, the structure associated with each and every 'IPC Resource'. But all readers must realize that beyond a certain point, understanding the ipc_perm data structure and its functionality becomes very difficult and complicated, and I really feel uncomfortable dealing with it myself. The fields in the ipc_perm data structure are:

`Type`	`Field`	`Description`
`int`	`key`	`IPC key`
`unsigned short`	`uid`	`Owner user ID`
`unsigned short`	`gid`	`Owner group ID`
`unsigned short`	`cuid`	`Creator user ID`
`unsigned short`	`cgid`	`Creator group ID`
`unsigned short`	`mode`	`Permission bit mask`
`unsigned short`	`seq`	`Slot usage sequence number`

Thus, when represented in its natural form, the ipc_perm data structure would look like this:

    #include <sys/types.h>
    #include <sys/ipc.h>

    key_t ftok(char *pathname, char proj);

    IPC_PRIVATE

    struct ipc_perm {
        key_t  key;
        ushort uid;
        ushort gid;
        ushort cuid;
        ushort cgid;
        ushort mode;
        ushort seq;
    };

    IPC_CREAT IPC_EXCL

Readers should note the importance of the ftok(char *pathname, char proj) function (current glibc declares it as ftok(const char *pathname, int proj_id) and uses only the low 8 bits of proj_id). The ftok() function attempts to create a new key from a file pathname and an 8-bit project identifier passed as parameters. It does not guarantee a unique key, however, since there is a small chance that it will return the same IPC key to two different applications using different pathnames and project identifiers.

As mentioned earlier, each IPC resource is associated with an ipc_perm data structure. The uid and gid fields store the user and group identifiers of the resource's current owner, while the cuid and cgid fields store the user and group identifiers of the resource's creator. The "mode" bit mask includes six flags, which store the read and write access permissions for the resource's owner, the resource's group and other users. The "key" field (of type int) contains the IPC Key (discussed earlier) of the corresponding resource, and the "seq" field stores the slot usage sequence number 's' used to compute the IPC Identifier of the resource. Now, the question that readers might be asking is: Hey, Subhasish, okay, we understand the ipc_perm data structure. But what on earth does it do?
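To make the relationship between an IPC Key and an IPC Identifier concrete, here is a minimal sketch that derives a key with ftok() and then looks the same message queue up twice. The function name same_key_same_id and the project letter 'S' are assumptions made for this example, and the sketch assumes no other user already owns a queue with the same key.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Derive a key from `path` (which must name an existing file or
 * directory) and look the same queue up twice.
 * Returns 1 if both lookups yield the same identifier, 0 if they
 * do not, and -1 on error.
 */
int same_key_same_id(const char *path)
{
    key_t key = ftok(path, 'S');      /* 'S' is an arbitrary project id */
    if (key == (key_t)-1) {
        perror("ftok");
        return -1;
    }

    /* First call creates the queue if it does not exist yet. */
    int first = msgget(key, IPC_CREAT | 0600);
    if (first == -1) {
        perror("msgget");
        return -1;
    }

    /* Second call only opens the queue that already has this key. */
    int second = msgget(key, 0600);
    int same = (second == first);

    printf("key 0x%lx -> identifier %d\n", (unsigned long)key, first);

    /* Release the queue so the example leaves nothing behind. */
    msgctl(first, IPC_RMID, NULL);
    return same;
}
```

Calling same_key_same_id("/") should report that both lookups agree: the key is what unrelated processes share in advance, while the identifier is what the Kernel hands back for actual use.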

The semctl(), msgctl() and shmctl() functions are used to handle IPC Resources; which one applies depends on whether the resource is a semaphore, a message queue or a shared memory segment. The IPC_SET subcommand allows a process to change the owner's user and group identifiers and the permission bit mask in the ipc_perm data structure. The IPC_STAT and IPC_INFO subcommands retrieve some information concerning a resource. Finally, the IPC_RMID subcommand releases an IPC resource. Thus, different subcommands, a few of which have been mentioned above, act on different fields of the ipc_perm data structure, which in turn directly affects how IPC resources behave. How, why and in what way exactly these subcommands act on the Linux Kernel is out of the scope of this article. Moreover, it really gets very complicated from here on, and discussing it would mean a lot of problems for me (and you, the readers!). Trust me, I know it!

Okay, now that we have seen what an IPC Resource is and studied a few things related to them, let's move on to the next section, where we discuss the three IPC Resources one by one, namely semaphores, message queues and shared memory segments, and see a few code segments in action. So, catch your breath and move on...
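As a small illustration of these subcommands, the sketch below applies IPC_STAT, IPC_SET and IPC_RMID to a message queue through msgctl(). The function name change_queue_mode and the permission values are assumptions chosen for the example; the same pattern applies to semctl() and shmctl().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Create a private queue with mode 0600, change the mode to 0640
 * with IPC_SET, read it back with IPC_STAT, then release the queue
 * with IPC_RMID.
 * Returns the permission bits read back, or -1 on error.
 */
int change_queue_mode(void)
{
    struct msqid_ds ds;
    int result = -1;

    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1) {
        perror("msgget");
        return -1;
    }

    /* IPC_STAT copies the descriptor, including msg_perm, to ds. */
    if (msgctl(id, IPC_STAT, &ds) == 0) {
        ds.msg_perm.mode = 0640;   /* new permission bit mask */
        /* IPC_SET writes the owner, group and mode back. */
        if (msgctl(id, IPC_SET, &ds) == 0 &&
            msgctl(id, IPC_STAT, &ds) == 0)
            result = ds.msg_perm.mode & 0777;
    }

    /* IPC_RMID releases the resource. */
    if (msgctl(id, IPC_RMID, NULL) == -1)
        perror("msgctl IPC_RMID");
    return result;
}
```

The final IPC_RMID matters: without it the queue would outlive the process, because IPC resources are persistent.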

IPC Semaphores

In my last article, entitled "Linux Kernel Synchronization", I did talk about "Semaphores". Right? But readers must NOT confuse "POSIX Realtime Extension Kernel Thread Semaphores" with "System V Semaphores", more specifically referred to as IPC Semaphores. They are totally different entities. Moreover, I have seen people trying their level best to translate the former interface functions into the latter. This is not only dangerous (for the application using it), but personally I feel it's illegal. Thus, I use a simple distinction between them to avoid confusion: I like POSIX semaphores, I hate IPC Semaphores. As simple as that.

Here in this section, we talk about IPC Semaphores. IPC semaphores are counters used to provide controlled access to shared data structures for multiple processes. The semaphore value is positive if the protected resource is available, and 0 if the protected resource is not currently available (a System V semaphore value never goes negative). A process that wants to access the resource tries to decrement the semaphore value by 1. It is allowed to use the resource if and only if the old value was positive. If not, the process waits until the semaphore becomes positive. When a process releases a protected resource, it increments the semaphore value by 1; in doing so, any other process waiting for the semaphore is woken up. This is how IPC Semaphores are designed to operate, thereby locking critical sections of code. Before we move on any further, let's look at a particular term: "primitive semaphores". What are they?

In Kernel semaphores (i.e. "POSIX Realtime Extension Kernel Thread Semaphores"), a semaphore is JUST a single value. But each IPC semaphore is a set of one OR more semaphore values. This means that the same IPC resource can protect several independent shared data structures. The number of semaphore values in each IPC semaphore must be specified as a parameter of the semget() function when the resource is being allocated. But readers must note that this value mustn't exceed SEMMSL, whose value was 32 in older kernels (2.4 kernels default to 250). Each counter inside an IPC semaphore is called a "primitive semaphore". Let's now consider a code snippet that illustrates a System V semaphore in action.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    int semget(key_t key, int nsems, int semflg);
    int semop(int semid, struct sembuf *sops, unsigned nsops);

    struct sembuf {
        unsigned short sem_num;  /* semaphore number: 0 = first */
        short          sem_op;   /* semaphore operation */
        short          sem_flg;  /* operation flags */
    };

    int semctl(int semid, int semnum, int cmd, union semun arg);

    union semun {
        int                 val;    /* value for SETVAL */
        struct semid_ds    *buf;    /* buffer for IPC_STAT, IPC_SET */
        unsigned short int *array;  /* array for GETALL, SETALL */
        struct seminfo     *__buf;  /* buffer for IPC_INFO */
    };

    IPC_STAT IPC_SET IPC_RMID GETALL SETALL SETVAL
    struct semid_ds;

Once semget() has returned the semaphore set's ID, one can perform operations on this set using semop(2). The struct sembuf has fields that are filled in to represent the requested action on the semaphore. The first field (sem_num) specifies the number of the semaphore in the set you wish to operate on. The second field (sem_op) contains the operation you wish to do. If sem_op is positive, the value is added to the semaphore (i.e. resources are released). If sem_op is negative, semop(2) blocks until the semaphore value is at least abs(sem_op) and then subtracts abs(sem_op) from it. If sem_op is zero, semop(2) blocks until the semaphore value becomes zero. You may also specify various options in sem_flg, including IPC_NOWAIT, which tells semop(2) NOT to block. When SEM_UNDO is specified, the operation is undone when the current process exits. Undo is guaranteed with private semaphores only.

Notice that, unlike the other IPC control calls, the fourth argument to semctl is a union, not a pointer to a struct. This is because the various commands must operate on different kinds of data. Various operations can be performed using semctl(2). IPC_STAT can be used to fill in the struct semid_ds structure. You can then set various fields (including the ownership and permissions in ipc_perm) using IPC_SET. But the most important control operation is IPC_RMID. When all of your programs are done using a semaphore, you MUST semctl it with IPC_RMID (the fourth argument is ignored for this command), or else the semaphore will remain in memory until the system is rebooted. You may also get and set specific values of the semaphore using GETVAL, SETVAL, GETALL, and SETALL.
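The behaviour of sem_op and IPC_NOWAIT described above can be checked with a short sketch. It sets a one-element semaphore set to 0 and then tries a non-blocking decrement, which should be refused with EAGAIN instead of putting the process to sleep. The function name is hypothetical, and the sketch assumes Linux with glibc, where the program has to define union semun itself.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: on Linux/glibc the caller defines union semun. */
union semun {
    int val;                  /* value for SETVAL */
    struct semid_ds *buf;     /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;    /* array for GETALL, SETALL */
};

/*
 * Returns 1 if a non-blocking decrement of a zero-valued semaphore
 * fails with EAGAIN, 0 if it does not, and -1 on setup errors.
 */
int nowait_on_busy_semaphore(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1) {
        perror("semget");
        return -1;
    }

    union semun arg;
    arg.val = 0;                          /* resource not available */
    if (semctl(id, 0, SETVAL, arg) == -1) {
        perror("semctl SETVAL");
        semctl(id, 0, IPC_RMID);
        return -1;
    }

    /* sem_op = -1 would normally block; IPC_NOWAIT prevents that. */
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };
    int rc = semop(id, &op, 1);
    int refused = (rc == -1 && errno == EAGAIN);

    semctl(id, 0, IPC_RMID);              /* always release the set */
    return refused;
}
```

Dropping IPC_NOWAIT from sem_flg would make the same call sleep until some other process increments the semaphore.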

Now that we have seen how a semaphore works, let's get down into the Linux Kernel, and see what exactly happens inside the Kernel. The typical steps performed by a process wishing to access one or more resources protected by an IPC semaphore are as follows:

It invokes the semget() wrapper function to get the IPC semaphore identifier. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT flag (or passes IPC_PRIVATE as the key) and the number of "primitive semaphores" required.
It invokes the semop() wrapper function to test and decrement all primitive semaphore values involved. If all tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources.
When relinquishing the protected resources, it invokes the semop() function again, this time to atomically increment all primitive semaphores involved.
Optionally, it invokes the semctl() wrapper function, specifying the IPC_RMID flag in its parameters to remove the IPC semaphore from the system.
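The four steps above can be sketched as follows, using one IPC semaphore that holds two primitive semaphores, both initialized to 1. The function name use_two_resources is hypothetical, and union semun is defined by hand on the assumption of a Linux/glibc system.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: on Linux/glibc the caller defines union semun. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns 0 if all four steps behave as described, -1 otherwise. */
int use_two_resources(void)
{
    int status = -1;

    /* Step 1: get an IPC semaphore with two primitive semaphores. */
    int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (id == -1) {
        perror("semget");
        return -1;
    }

    unsigned short initial[2] = { 1, 1 };  /* both resources free */
    union semun arg;
    arg.array = initial;

    struct sembuf acquire[2] = {
        { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO },
        { .sem_num = 1, .sem_op = -1, .sem_flg = SEM_UNDO },
    };
    struct sembuf release[2] = {
        { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO },
        { .sem_num = 1, .sem_op = 1, .sem_flg = SEM_UNDO },
    };

    if (semctl(id, 0, SETALL, arg) == -1) {
        perror("semctl SETALL");
    } else if (semop(id, acquire, 2) == -1) {
        /* Step 2: both decrements are applied together or not at all. */
        perror("semop acquire");
    } else {
        /* Inside the critical section both values are 0. */
        int held = semctl(id, 0, GETVAL) + semctl(id, 1, GETVAL);

        /* Step 3: give both resources back atomically. */
        if (semop(id, release, 2) == -1)
            perror("semop release");
        else if (held == 0 &&
                 semctl(id, 0, GETVAL) == 1 &&
                 semctl(id, 1, GETVAL) == 1)
            status = 0;
    }

    /* Step 4: remove the IPC semaphore from the system. */
    semctl(id, 0, IPC_RMID);
    return status;
}
```

Passing both sembuf entries in a single semop() call is what makes the test-and-decrement atomic across the two primitive semaphores.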

This is how an IPC semaphore is created, acts, and then is deleted from a system. So, let's now move on to the next exciting section, IPC Messages.

IPC Messages

Another IPC resource that processes can use to communicate with each other is IPC Messages. Each message generated by a process is sent to an "IPC Message Queue", where it stays until another process reads it. So, the question now is: What is this so-called "IPC message" made up of? A message is made up of a fixed-size "header" and a variable-length "text"; it can be labeled with an integer value (the message type), which allows a process to selectively retrieve messages from the message queue. Readers should note that a message queue is in reality implemented by means of a linked list. Once a process has read a message from an IPC message queue, the Kernel destroys it. This is why only one process can receive a given message. So, let's now see what functions a process needs to invoke in order to send a message and retrieve one.

For sending a message, a process invokes the msgsnd() function, passing as parameters:

The 'IPC Identifier' of the destination message queue.
The message 'text' size.
The address of the User Mode buffer that contains the message type immediately followed by the message text.

For retrieving a message, a process invokes the msgrcv() function, passing as parameters:

The 'IPC Identifier' of the message queue resource to read from.
The pointer to a User Mode buffer to which the message type and message text ought to be copied.
The User Mode buffer size.
What message should be retrieved, denoted by a value t.

Now, the integer t can take one of three kinds of value: zero, positive or negative. If t is zero, the first message in the queue is returned. If t is positive, the first message in the queue with its type equal to t is returned. Finally, if t is negative, the function returns the first message whose message type is the lowest value less than or equal to the absolute value of t. Up to this point, it's just fine. Beyond this, make sure you get ready for a real roller-coaster ride. Why? Read on.
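The three cases for t can be exercised with a small sketch. It sends three messages of types 3, 1 and 2 (in that order) to a private queue and then retrieves them with t = 2, t = -3 and t = 0. The struct demo_msg layout, the helper names and the 32-byte text size are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout: type followed by a fixed-size text. */
struct demo_msg {
    long mtype;          /* message type, must be > 0 */
    char mtext[32];      /* message text */
};

/* Send one message of the given type; returns 0 on success. */
static int send_typed(int qid, long type, const char *text)
{
    struct demo_msg m;
    m.mtype = type;
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';
    return msgsnd(qid, &m, sizeof m.mtext, 0);
}

/* Receive with selector t; returns the type received, or -1. */
static long receive_type(int qid, long t)
{
    struct demo_msg m;
    if (msgrcv(qid, &m, sizeof m.mtext, t, IPC_NOWAIT) == -1)
        return -1;
    return m.mtype;
}

/*
 * Fills order[] with the message types returned for t = 2, t = -3
 * and t = 0, in that order. Returns 0 on success, -1 on error.
 */
int selective_receive(long order[3])
{
    int rc = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1) {
        perror("msgget");
        return -1;
    }

    if (send_typed(qid, 3, "three") == 0 &&
        send_typed(qid, 1, "one") == 0 &&
        send_typed(qid, 2, "two") == 0) {
        order[0] = receive_type(qid, 2);   /* exact type match */
        order[1] = receive_type(qid, -3);  /* lowest type <= 3 */
        order[2] = receive_type(qid, 0);   /* first message left */
        rc = 0;
    }

    msgctl(qid, IPC_RMID, NULL);           /* release the queue */
    return rc;
}
```

With the queue holding types 3, 1, 2, the selector t = 2 picks the type 2 message, t = -3 then picks type 1 (the lowest type not above 3), and t = 0 returns the remaining type 3 message.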

Now, let's talk a little about the data structures associated with IPC messages. There exists an "IPC Message Queue Descriptor", whose address is very important since it is used for many critical purposes. Understanding what this "IPC Message Queue Descriptor" is, is where our roller-coaster ride begins. Make sure you hold on tight! The "IPC Message Queue Descriptor" is a msqid_ds structure, whose fields are shown below. The most significant fields are msg_first and msg_last, which point to the first and to the last message in the linked list, respectively. The rwait field points to a "wait queue" that includes all processes currently waiting for some message in the queue. The wwait field points to a "wait queue" that includes all processes currently waiting for some free space in the queue so they can add a new message. The total size of the header and the text of all messages in the queue cannot exceed the value stored in the msg_qbytes field. The default maximum size is MSGMNB, which is 16,384 bytes.

`Type`	`Field`	`Description`
`struct ipc_perm`	`msg_perm`	`ipc_perm data structure`
`struct msg *`	`msg_first`	`First message in queue`
`struct msg *`	`msg_last`	`Last message in queue`
`long`	`msg_stime`	`Time of last msgsnd()`
`long`	`msg_rtime`	`Time of last msgrcv()`
`long`	`msg_ctime`	`Last change time`
`struct wait_queue *`	`wwait`	`Processes waiting for free space`
`struct wait_queue *`	`rwait`	`Processes waiting for messages`
`unsigned short`	`msg_cbytes`	`Current number of bytes in queue`
`unsigned short`	`msg_qnum`	`Number of messages in queue`
`unsigned short`	`msg_qbytes`	`Maximum number of bytes in queue`
`unsigned short`	`msg_lspid`	`PID of last msgsnd()`
`unsigned short`	`msg_lrpid`	`PID of last msgrcv()`

Let's now look at a code snippet illustrating an IPC Message queue in action.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    int msgget(key_t key, int msgflg);
    int msgsnd(int msgid, struct msgbuf *msgp, int msgsz, int msgflg);
    int msgrcv(int msgid, struct msgbuf *msgp, int msgsz, long msgtyp, int msgflg);

    struct msgbuf {
        long mtype;         /* message type, must be > 0 */
        char mtext[msgsz];  /* message data */
    };

    MSGMAX MSGMNB

    int msgctl(int msqid, int cmd, struct msqid_ds *buf);

    IPC_STAT IPC_SET IPC_RMID
    struct msqid_ds;

Now that we have seen what an IPC message is and how it relates to an IPC Message Queue, looked at the most important data structure involved, and gone through a code snippet illustrating a message queue, it's time for all of us to move on to the next and last section of this article, IPC Shared Memory.

IPC Shared Memory

Amongst the three IPC Resources, the most useful is the shared memory segment. Shared memory allows two or more processes to access some common data structures by placing them in a shared memory segment. Each process that wants to access the data structures included in a shared memory segment must add to its address space a new memory region, which maps the page frames associated with the shared memory segment. Each such frame can thus be easily handled by the Kernel through demand paging.

The shmget() function is invoked to get the IPC Identifier of a shared memory segment, optionally creating it if it does not already exist. Then the shmat() function is invoked to "attach" a shared memory segment to a process; it receives as its parameter the identifier of the IPC shared memory resource and tries to add a shared memory region to the address space of the calling process. The calling process can require a specific starting linear address for the memory region, but the address is usually unimportant, and each process accessing the shared memory segment can use a different address in its own address space. The shmat() function, however, leaves the process's page tables unchanged. Another function, shmdt(), is invoked to "detach" a shared memory segment, specified by the address at which it was attached, that is, to remove the corresponding memory region from the process's address space. All readers should note the following very important points:

Detaching a shared memory segment using the shmdt() function DOES NOT delete the shared memory segment! It just makes it unavailable to the current process. When I started learning all this, I made this mistake: I assumed that using the shmdt() function deletes the shared memory segment. So, I want to warn all readers at this point. No, it does NOT ever "delete" the shared memory segment. Invoking shmdt() detaches a shared memory segment. That's it.
By itself, shared memory does NOT provide any sort of synchronization facilities. In other words, there are no automatic facilities to prevent a second process starting to read the shared memory before the first process has finished writing to it. It's the sole responsibility of the programmer to synchronize access to a shared memory segment.
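The first point is easy to verify. The following sketch writes a string into a segment, detaches it, attaches it again and checks that the data is still there; only the final IPC_RMID actually deletes the segment. The function name and the 4096-byte size are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Returns 1 if data written before shmdt() is still present after
 * attaching the segment again, 0 if not, and -1 on error.
 */
int data_survives_detach(void)
{
    int result = -1;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1) {
        perror("shmget");
        return -1;
    }

    char *p = shmat(id, NULL, 0);          /* let the Kernel pick the address */
    if (p != (void *)-1) {
        strcpy(p, "still here");
        shmdt(p);                          /* detach: segment stays allocated */

        p = shmat(id, NULL, 0);            /* attach the same segment again */
        if (p != (void *)-1) {
            result = (strcmp(p, "still here") == 0);
            shmdt(p);
        }
    } else {
        perror("shmat");
    }

    shmctl(id, IPC_RMID, NULL);            /* this is what deletes it */
    return result;
}
```

The second attach may well return a different address than the first, which is harmless here because the code only uses the pointer returned by the most recent shmat().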

An IPC shared memory segment descriptor, used primarily for identifying shared memory segments, is a shmid_kernel data structure. So, let's see what this shmid_kernel data structure has in store for us. The shmid_kernel data structure may be represented as:

`Type`	`Field`	`Description`
`struct ipc_perm`	`u.shm_perm`	`ipc_perm data structure`
`int`	`u.shm_segsz`	`Size of shared memory region (in bytes)`
`long`	`u.shm_atime`	`Last attach time`
`long`	`u.shm_dtime`	`Last detach time`
`long`	`u.shm_ctime`	`Last change time`
`unsigned short`	`u.shm_cpid`	`PID of creator`
`unsigned short`	`u.shm_lpid`	`PID of last accessing process`
`unsigned short`	`u.shm_nattch`	`Number of current attaches`
`unsigned long`	`shm_npages`	`Size of shared memory region (pages)`
`unsigned long *`	`shm_pages`	`Pointer to array of page frame PTEs`
`struct vm_area_struct *`	`attaches`	`Pointer to VMA descriptor list`

The fields that are accessible to User Mode processes are included within a shmid_ds data structure named u inside the descriptor. Their contents are accessed using the shmctl() function. The u.shm_segsz and shm_npages fields store the size of the shared memory segment in bytes and in pages, respectively. Although User Mode processes can require a shared memory segment of any length, the length of the allocated segment is a multiple of the page size, since the Kernel must map the shared memory segment with a memory region. The shm_pages field points to an array that contains one element for each page of the segment. The attaches field points to the first element of a doubly linked list that includes the vm_area_struct descriptors of all memory regions associated with the shared memory segment. When mapping IPC shared memory segments, some fields of vm_area_struct descriptors have a special meaning. Let's now look at a code snippet illustrating shared memory segments in action.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int shmget(key_t key, int size, int shmflg);
    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);
    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

    IPC_STAT IPC_SET IPC_RMID

    struct shmid_ds;

For creating a shared memory segment, we use the shmget system call. It takes a key, a size, and a flag parameter and returns an integer id for our segment. The flag can be a combination of IPC_CREAT and IPC_EXCL, as well as the permission bits on the segment as described above. Once the segment is created and we have its id, we must attach it, or map it into memory. This is done with shmat. shmat works much the same as mmap (in BSD) in that we can specify an address to map to, but the address must be a multiple of SHMLBA, unless we specify SHM_RND in the flags parameter. Various operations can be performed using shmctl. IPC_STAT can be used to fill in the struct shmid_ds structure. We can then set various fields (including the ownership and permissions in ipc_perm) using IPC_SET. When all of our programs are done using a segment, we MUST shmctl it with IPC_RMID (the shmid_ds pointer may be NULL), or else the segment will remain in memory until the system is rebooted. This is how a shared memory segment operates. This brings us to the end of this section, which in turn brings us to the end of this article too.
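Putting the calls together, here is a minimal sketch of two processes exchanging data through a segment. A child process writes an integer into shared memory and the parent reads it after the child has exited. The function name is hypothetical, and waiting for the child with waitpid() is a deliberately crude stand-in for the synchronization that shared memory itself does not provide; real programs would normally use a semaphore.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child stores `value` in a shared int; the parent waits for the
 * child to exit and then reads it.
 * Returns the value read by the parent, or -1 on error.
 */
int pass_through_shared_memory(int value)
{
    int result = -1;

    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id == -1) {
        perror("shmget");
        return -1;
    }

    int *shared = shmat(id, NULL, 0);
    if (shared == (void *)-1) {
        perror("shmat");
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *shared = -1;                    /* marker: nothing written yet */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the attached region is inherited across fork(). */
        *shared = value;
        shmdt(shared);
        _exit(0);
    }

    if (pid > 0) {
        waitpid(pid, NULL, 0);       /* crude synchronization */
        result = *shared;
    } else {
        perror("fork");
    }

    shmdt(shared);                   /* detach from this process */
    shmctl(id, IPC_RMID, NULL);      /* then delete the segment */
    return result;
}
```

Note the order at the end: shmdt() only unmaps the region from the parent, and the segment itself disappears only because of the IPC_RMID that follows.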

Thus, in this article we have seen what we mean by terms like IPC, IPC Resources, System V IPC, and the different IPC mechanisms and resources; and, more importantly, how Unix systems (more specifically GNU/Linux systems) implement them in reality. As the famous English poet John Keats once wrote: "A thing of beauty is a joy for ever". He was right. Linux, which forms the core of the GNU (GNU's Not Unix) operating system (which unfortunately everyone nowadays refers to as simply the "Linux OS"), is definitely a thing of beauty. And for enjoying this beauty, one needs to be passionate about 'Open Source Technologies' and especially about Linux. Linux is far more advanced than its many commercial competitors in so many ways. Just to name one, it is possible to fit both the Linux Kernel image and a full root filesystem, including all fundamental system programs, on just one 1.44 MB floppy disk! No commercial Unix variant, to date, as far as I know, is able to boot from a single floppy diskette. In my next article, I will deal with Linux Memory Management, in other words, how and in what ways Linux implements memory and other related features. So, get a Linux distribution today, learn some C coding; and if you are lucky and passionate, and like doing everything in life with all your heart and soul, becoming a Linux expert is just a matter of time... Take care all...

About the Author: My name is Subhasish Ghosh. I'm 20 years old, currently a computer-systems engineering student in India; a Microsoft Certified Professional (MCP), MCSD, MCP certified on NT 4.0; I recently completed Red Hat Linux Certified Engineer (RHCE) Training and cleared the Brainbench.com "Linux General Administration" certification exam. I have been installing, configuring and developing on the Linux platform for a long time now, and have programmed using C, C++, VC++, VB, COM, DCOM, MFC, ATL 3.0, PERL, Python, POSIX Threads and Linux Kernel programming; I currently hold a total of 8 International Industry Certifications.
Latest News: My new friend from St. Petersburg, Russia, Annette (I call her Ann!), unfortunately refers to "Linux" as "Lunix"; my room resembles an O'Reilly & Wrox warehouse, my mom has warned me, either to clean my room by next week, or get out of the house along with my Compaq PC, Linux CDs and books! Looks like I need a new place real soon! E-mail: subhasish_ghosh@linuxmail.org

Originally published on Linux.com. Released under the Open Content License unless otherwise stated. Notify Gareth Watts of any errors or copyright violations.

Originally Published: Friday, 24 August 2001	Author: Subhasish Ghosh
Published to: develop_articles/Development Articles
The Linux kernel is a thing of great beauty and learning to understand and appreciate its facets and edges is a worthy and noble pursuit. Take our hand as Linux.com offers this second part of Subhasish Ghosh's look at Inter-Process Communication in the Linux Kernel. Together we will find the grok, sooner or later.