Originally Published: Friday, 24 August 2001 | Author: Subhasish Ghosh
Understanding Linux Kernel Inter-process Communication: Pipes, FIFO & IPC (Part 2)
The Linux kernel is a thing of great beauty and learning to understand and appreciate its facets and edges is a worthy and noble pursuit. Take our hand as Linux.com offers this second part of Subhasish Ghosh's look at Inter-Process Communication in the Linux Kernel. Together we will find the grok, sooner or later.
This article will cover:
- System V (AT&T System V.2 release of UNIX) IPC Resources: Semaphores, Message Queues & Shared Memory segments, as implemented in GNU/Linux.
- A few code examples to chew on (for the brave-hearted!).
Please Note:
- For explanation of words such as "kernel control paths", "semaphores", "race conditions" and related features, please refer to earlier articles in the series.
- All readers must note that though this article explores the depths of the Linux Kernel, no discussion would ever be complete without covering the IPC features and facilities of the AT&T System V release of UNIX. Thus, several System V UNIX features will be discussed too.
- I have used Red Hat Linux 7.1 with Linux Kernel 2.4.2-2 for compiling all the code included.
In earlier articles we have already encountered some exciting features of the Linux Kernel. This article explains how User Mode processes can synchronize themselves and exchange data. We have already covered a lot of synchronization topics, especially in "Linux Kernel Synchronization", but as readers must have noticed, the main protagonist of the story there was a "Kernel Control Path" acting within the Linux Kernel, and NOT a User Mode program. Thus, we are now ready to discuss synchronization of User Mode processes. These processes rely on the Linux Kernel to synchronize themselves and exchange data.
In this section, we are going to look at a set of inter-process communication facilities that were introduced in the AT&T System V.2 Release of UNIX. Since all these facilities appeared in the same release and have a similar programmatic interface, they are often referred to as System V IPC. As mentioned in part 1 of this article, IPC data structures are created dynamically when a process requests an IPC resource, that is either a semaphore, or a message queue or a shared memory segment. Each IPC resource is persistent; i.e. unless explicitly released by a process, it is always kept in memory. An IPC resource may be used by any process, including those that do not share the ancestor that created the resource.
Now the question that comes up is: A particular process may require several IPC resources of the same type, so how on earth is someone supposed to identify each one of these resources? The answer is simple: Each new resource is identified by a 32-bit IPC Key, which is similar to the file pathname in the system's directory tree. In addition to the IPC Key, each newly allocated IPC resource also has a 32-bit IPC Identifier, which is somewhat similar to the file descriptor associated with an open file. But one very important point to note is: IPC Identifiers are assigned to IPC resources by the Kernel and are unique within the system, but IPC Keys can be freely chosen by application programmers. But, what does this "IPC Identifier" do? When two or more processes wish to communicate through an IPC resource, they all refer to the IPC Identifier of the resource. OK, it gets a little tricky from here on in.
When I was studying the linux kernel architecture and other associated features from a number of books, professor's notes, library manuals, online magazines, HowTo's and other such official and/or unofficial sources, I always wanted to seek the answer to one simple question, which unfortunately no one could answer. Readers must have noted that in the paragraph just above this one, I did mention "...IPC Identifiers are assigned to IPC resources by the Kernel and are unique within the system...". My question was: How on earth is an IPC Identifier computed by the Linux Kernel and how come every time it produces one, it HAS to be unique? I did manage to find the answer to this question. The answer is: In order to minimize the risk of incorrectly referencing the wrong resource, the Linux Kernel does NOT recycle IPC identifiers as soon as they become free. Instead, the IPC identifier assigned to a resource is almost always larger than the identifier assigned to the previously allocated resource of the same type. Each IPC identifier is computed by combining a "slot usage sequence number" relative to the resource type, an arbitrary "slot index" for the allocated resource, and the value chosen in the Linux Kernel for the maximum number of allocatable resources. Choosing s to represent the "slot usage sequence number", M to represent the maximum number of allocatable resources, i to represent the arbitrary "slot index", where i is "either greater than or equal to zero" but "less than M", then each IPC resource's ID is computed by the formula:
IPC Identifier = (s × M) + i
The "slot usage sequence number" s is initialized to 0 and is incremented by 1 at every resource deallocation. In two consecutive resource allocations, the slot index i can only increase; it can decrease only when a resource has been deallocated, but then the increased "slot usage sequence number" ensures that the new IPC identifier for the new allocated resource is larger than the previous one, thus, ensuring that each time an IPC Identifier is produced (allocated to a resource), it's a UNIQUE one. As simple as that! See, I told you, understanding Linux kernel features is so easy!
Before we go on to look at the different inter-process communication facilities in more detail, there is one very important topic I think I should cover. So, let's discuss the ipc_perm data structure; the structure associated with each and every IPC Resource. But all readers must realize that beyond a certain point, understanding the ipc_perm data structure and its functionality becomes very difficult and complicated, and I really feel uncomfortable myself dealing with it. The fields in the ipc_perm data structure are:
Type | Field | Description
---|---|---
int | key | IPC key
unsigned short | uid | Owner user ID
unsigned short | gid | Owner group ID
unsigned short | cuid | Creator user ID
unsigned short | cgid | Creator group ID
unsigned short | mode | Permission bit mask
unsigned short | seq | Slot usage sequence number
Thus, when represented in its natural form, the ipc_perm data structure would look like this:

struct ipc_perm
{
    key_t key;
    ushort uid;
    ushort gid;
    ushort cuid;
    ushort cgid;
    ushort mode;
    ushort seq;
};

The key field can hold the special value IPC_PRIVATE, or a key produced with the ftok() function:

#include <sys/types.h>
#include <sys/ipc.h>

key_t ftok(char *pathname, char proj);

(The related flags IPC_CREAT and IPC_EXCL are passed to the resource "get" functions when a new resource is requested.)
Readers should note the importance of the ftok(char *pathname, char proj) function. The ftok() function attempts to create a new key from a file pathname and an 8-bit project identifier passed as parameters. It does not, however, guarantee a unique key number, since there is a small chance that it will return the same IPC key to two different applications using different pathnames and project identifiers.
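As a quick illustration, here is a minimal sketch of ftok() in use. The pathname /tmp/ipcdemo and the project identifier 'A' are arbitrary choices for this example, and the file must already exist for the call to succeed.

/* Minimal sketch: derive an IPC key from a pathname and a project id. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>

int main(void)
{
    key_t key = ftok("/tmp/ipcdemo", 'A');   /* assumes /tmp/ipcdemo exists */
    if (key == (key_t) -1) {
        perror("ftok");
        return 1;
    }
    printf("IPC key: 0x%x\n", (unsigned int) key);
    return 0;
}

The resulting key would then be passed to semget(), msgget() or shmget() by every process that wants to share the same resource.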
As mentioned earlier, each IPC resource is associated with an ipc_perm data structure. The uid and gid fields store the user and group identifiers of the resource's current owner, while the cuid and cgid fields store those of the resource's creator. The mode bit mask includes six flags, which store the read and write access permissions for the resource's owner, the resource's group and other users. The key field (of type int) contains the IPC Key of the corresponding resource, which we discussed earlier, and the seq field stores the slot usage sequence number s used to compute the IPC Identifier of the resource. Now, the question that readers might be asking is: Hey, Subhasish, okay we understand the ipc_perm data structure. But what on earth does it do?
There exists a set of functions named semctl(), msgctl() and shmctl(), depending on whether the resource is a semaphore, a message queue or a shared memory segment, which may be used to handle IPC Resources. The IPC_SET subcommand allows a process to change the owner's user and group identifiers and the permission bit mask in the ipc_perm data structure. The IPC_STAT and IPC_INFO subcommands retrieve some information concerning a resource. Finally, the IPC_RMID subcommand releases an IPC resource. Thus, different subcommands, a few of which have been mentioned above, act on different fields of the ipc_perm data structure, which in turn directly affects how IPC resources act, which in turn affects IPC facilities at their core. How, why and in what way exactly these subcommands act and thus affect the Linux Kernel is out of the scope of this topic. Moreover, it really gets very complicated from here on, and discussing it would mean a lot of problems for me (and you, the readers!). Trust me, I know it!
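Still, a quick taste does no harm. Here is a minimal sketch that peeks ahead and uses a message queue (msgget() is covered later in this article) as the example resource: it reads the ipc_perm fields with IPC_STAT, tightens the permission bit mask with IPC_SET, and finally releases the resource with IPC_RMID. Error checking is kept to a minimum.

/* Sketch: how the *ctl() subcommands touch the ipc_perm fields.
 * A message queue is used here only as a convenient example resource. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int main(void)
{
    struct msqid_ds buf;
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0666);   /* allocate a resource */
    if (id == -1) { perror("msgget"); return 1; }

    msgctl(id, IPC_STAT, &buf);                       /* read owner ids, mode, ... */
    printf("owner uid=%d mode=%o\n", (int) buf.msg_perm.uid, (unsigned) buf.msg_perm.mode);

    buf.msg_perm.mode = 0600;                         /* tighten the permission bit mask */
    msgctl(id, IPC_SET, &buf);                        /* write it back */

    msgctl(id, IPC_RMID, NULL);                       /* release the IPC resource */
    return 0;
}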
Okay, now that we have seen what an IPC Resource is and studied a few things related to them, let's move on to the next section, where we will discuss all three IPC Resources one by one, namely semaphores, message queues and shared memory segments, and see a few code segments in action. So, catch your breath and move on...
In my last article, entitled "Linux Kernel Synchronization", I did talk about "Semaphores". Right? But readers must NOT confuse "POSIX Realtime Extension Kernel Thread Semaphores" with "System V Semaphores", more specifically referred to as IPC Semaphores. They are totally different entities, and MUST NOT be confused. Moreover, I have seen people trying their level best to translate the former interface's functions into the latter's. This is not only dangerous (for the application using it), but personally I feel it's illegal. Thus, I use a simple distinction between them to avoid confusion: I like POSIX semaphores, I hate IPC Semaphores. As simple as that. Here in this section, we talk about IPC Semaphores. IPC semaphores are counters used to provide controlled access to shared data structures for multiple processes. The semaphore value is positive if the protected resource is available, and negative or 0 if the protected resource is not currently available. A process that wants to access the resource decrements the semaphore value by 1. It is allowed to use the resource if and only if the old value was positive. If not, the process waits until the semaphore becomes positive. When a process releases a protected resource, it increments its semaphore value by 1, and in doing so, any other process waiting for the semaphore is woken up. This is how IPC Semaphores are designed to operate, thereby locking critical sections of code. Before we move on any further, let's look at a particular term: "primitive semaphores". What are they?
In Kernel semaphores (i.e. "POSIX Realtime Extension Kernel Thread Semaphores"), a semaphore is JUST a single value. But in the case of IPC semaphores, each IPC semaphore is a set of one OR more semaphore values. This means that the same IPC resource can protect several independent shared data structures. A function named semget() exists, and the number of semaphore values in each IPC semaphore must be specified as a parameter of the semget() function when the resource is being allocated. But readers must note that this value mustn't exceed SEMMSL, whose value is 32. The counters inside an IPC semaphore are called "primitive semaphores". Let's now consider a code snippet that illustrates a System V semaphore in action.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int semget ( key_t key, int nsems, int semflg );

int semop ( int semid, struct sembuf *sops, unsigned nsops);

struct sembuf
{
    short sem_num; /* semaphore number: 0 = first */
    short sem_op;  /* semaphore operation */
    short sem_flg; /* operation flags */
};

int semctl (int semid, int semnum, int cmd, union semun arg);

union semun
{
    int val;                   /* value for SETVAL */
    struct semid_ds *buf;      /* buffer for IPC_STAT, IPC_SET */
    unsigned short int *array; /* array for GETALL, SETALL */
    struct seminfo *__buf;     /* buffer for IPC_INFO */
};

The cmd argument of semctl() takes subcommands such as IPC_STAT, IPC_SET, IPC_RMID, GETALL, SETALL and SETVAL, several of which operate on a struct semid_ds.
When semget() is called and it acquires the semaphore set's ID, one can perform operations on this set using semop(). The struct sembuf has fields that are filled in to represent the requested action on the semaphore. The first field (sem_num) specifies the number of the semaphore in the set you wish to operate on. The second field (sem_op) contains the operation you wish to do. If sem_op is positive, the value is added to the semaphore (i.e. a release of resources). If sem_op is less than or equal to zero, semop(2) will block until the semaphore reaches abs(sem_op). You may also specify various options in sem_flg, including IPC_NOWAIT, which tells semop(2) NOT to block. When SEM_UNDO is specified, all actions are undone when the current process exits. Undo is guaranteed with private semaphores only. Notice that, unlike the other IPC operations, the third argument to semctl() is a union, not a simple struct. This is because the various options must occur on different data. Various operations can be performed using semctl(2). IPC_STAT can be used to fill in the struct semid_ds structure. You can then set various fields (including changing the ownership and permissions held in ipc_perm) using IPC_SET. But the most important ctl operation is IPC_RMID. When all of your programs are done using a semaphore, you MUST semctl it with IPC_RMID (the semid_ds structure may be NULL), or else the semaphore will remain in memory forever. You may also set and get specific values of the semaphore using GETVAL, SETVAL, GETALL and SETALL.
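Here is a small sketch of semctl() together with the union semun shown above: it initializes a single primitive semaphore with SETVAL and reads it back with GETVAL. Note that on Linux the calling program defines union semun itself; the definition below simply mirrors the one given earlier.

/* Sketch: initializing and inspecting a primitive semaphore with semctl(). */
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0666);  /* a set with one primitive semaphore */
    if (id == -1) { perror("semget"); return 1; }

    arg.val = 1;                                        /* resource initially available */
    semctl(id, 0, SETVAL, arg);

    printf("current value: %d\n", semctl(id, 0, GETVAL));

    semctl(id, 0, IPC_RMID);                            /* release the set */
    return 0;
}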
Now that we have seen how a semaphore works, let's get down into the Linux Kernel, and see what exactly happens inside the Kernel. The typical steps performed by a process wishing to access one or more resources protected by an IPC semaphore are as follows:

1. The semget() wrapper function is invoked to get the IPC semaphore identifier. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT or IPC_PRIVATE flag and the number of "primitive semaphores" required.
2. The process invokes the semop() wrapper function to test and decrement all the primitive semaphore values involved. If all the tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources.
3. When relinquishing the protected resources, the process invokes the semop() function again, this time to atomically increment all the primitive semaphores involved.
4. Optionally, the process invokes the semctl() wrapper function, specifying the IPC_RMID flag in its parameter, to remove the IPC semaphore from the system.

This is how an IPC semaphore is created, acts, and then is deleted from a system; a minimal sketch putting these steps together follows. After that, let's move on to the next exciting section, IPC Messages.
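Here is that sketch, kept in a single process for brevity. In real use the key passed to semget() would be shared between the cooperating processes (for instance via ftok()), and every call would be checked for errors.

/* Sketch of the full life cycle of an IPC semaphore: create, wait ("P"),
 * signal ("V"), remove. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    union semun arg;
    struct sembuf op;
    int id;

    /* Step 1: get (here: create) a set with one primitive semaphore. */
    id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0666);
    if (id == -1) { perror("semget"); return 1; }

    arg.val = 1;                        /* mark the protected resource as available */
    semctl(id, 0, SETVAL, arg);

    /* Step 2: test and decrement (the "P" operation). */
    op.sem_num = 0;
    op.sem_op  = -1;
    op.sem_flg = SEM_UNDO;              /* undo the operation if we exit early */
    semop(id, &op, 1);

    /* ... critical section: the protected data structure is ours ... */

    /* Step 3: increment (the "V" operation) to release the resource. */
    op.sem_op = 1;
    semop(id, &op, 1);

    /* Step 4: remove the semaphore from the system. */
    semctl(id, 0, IPC_RMID);
    return 0;
}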
Another IPC resource which processes can use to communicate with each other is IPC Messages. Each message generated by a process is sent to an "IPC Message Queue", where it stays until another process reads it. So, the question now is: What is this so-called "IPC message" made up of? A message is made up of a fixed-size "header" and a variable-length "text". A message can also be labeled with an integer value (the message type), which allows a process to selectively retrieve messages from the message queue. Readers must note that a message queue is in reality implemented by means of a linked list. Once a process has read a message from an IPC message queue, the Kernel destroys it. This confirms the well-known fact: only one process can receive a given message. So, let's now see what functions a process needs to invoke in order to send a message and retrieve one.
For sending a message, a process invokes the msgsnd() function, passing as parameters: the IPC identifier of the destination message queue, the size of the message text, and the address of a User Mode buffer that contains the message type followed by the message text.
For retrieving a message, a process invokes the msgrcv() function, passing as parameters: the IPC identifier of the message queue, the address of a User Mode buffer where the message type and the message text should be copied, the size of this buffer, and an integer value t that specifies which message should be retrieved.
Now, the integer t can take any one of three kinds of values: either t is null, t is positive, or t is negative. If the value of t is null, the first message in the queue is returned. If t is positive, the first message in the queue whose type equals t is returned. Finally, if t is negative, the function returns the first message whose message type is the lowest value less than or equal to the absolute value of t. Up to this point, it's just fine. Beyond this, make sure you get ready for a real roller-coaster ride. Why? Read on.
Now, let's talk a little about the data structures associated with IPC messages. There exists an "IPC Message Queue Descriptor", whose address is very important since it is used for many critical purposes. Understanding what this "IPC Message Queue Descriptor" is, is where our roller-coaster ride begins. Make sure you hold on tight! The "IPC Message Queue Descriptor" is a msqid_ds structure, whose fields are shown below. The most significant fields are msg_first and msg_last, which point to the first and to the last message in the linked list, respectively. The rwait field points to a "wait queue" that includes all processes currently waiting for some message in the queue. The wwait field points to a "wait queue" that includes all processes currently waiting for some free space in the queue so they can add a new message. The total size of the header and the text of all messages in the queue cannot exceed the value stored in the msg_qbytes field. The default maximum size is MSGMNB, which is 16,384 bytes.
Type | Field | Description
---|---|---
struct ipc_perm | msg_perm | ipc_perm data structure
struct msg * | msg_first | First message in queue
struct msg * | msg_last | Last message in queue
long | msg_stime | Time of last msgsnd()
long | msg_rtime | Time of last msgrcv()
long | msg_ctime | Last change time
struct wait_queue * | wwait | Processes waiting for free space
struct wait_queue * | rwait | Processes waiting for messages
unsigned short | msg_cbytes | Current number of bytes in queue
unsigned short | msg_qnum | Number of messages in queue
unsigned short | msg_qbytes | Maximum number of bytes in queue
unsigned short | msg_lspid | PID of last msgsnd()
unsigned short | msg_lrpid | PID of last msgrcv()
Let's now look at a code snippet illustrating an IPC Message queue in action.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgget(key_t key, int msgflg);

int msgsnd(int msgid, struct msgbuf *msgp, int msgsz, int msgflg);

int msgrcv(int msgid, struct msgbuf *msgp, int msgsz, long msgtyp, int msgflg);

struct msgbuf
{
    long mtype;        /* message type, must be > 0 */
    char mtext[msgsz]; /* message data */
};

int msgctl(int msqid, int cmd, struct msqid_ds *buf);

The size of a single message is bounded by MSGMAX and the size of the whole queue by MSGMNB, while the msgctl() subcommands IPC_STAT, IPC_SET and IPC_RMID operate on a struct msqid_ds.
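And here, as promised, is a minimal sketch of a message queue in action: a single process creates a private queue, sends one message labeled with type 2, retrieves it back by that type (the t parameter of msgrcv()), and then destroys the queue. Error handling is abbreviated, and the 64-byte text buffer is an arbitrary size chosen for the example.

/* Sketch: send a message to an IPC message queue and read it back by type. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct my_msgbuf {
    long mtype;        /* message type, must be > 0 */
    char mtext[64];    /* message text */
};

int main(void)
{
    struct my_msgbuf msg;
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0666);
    if (id == -1) { perror("msgget"); return 1; }

    msg.mtype = 2;                                     /* label the message */
    strcpy(msg.mtext, "hello from System V IPC");
    msgsnd(id, &msg, sizeof(msg.mtext), 0);            /* place it on the queue */

    /* t = 2: retrieve the first message whose type equals 2. */
    if (msgrcv(id, &msg, sizeof(msg.mtext), 2, 0) != -1)
        printf("received: %s\n", msg.mtext);

    msgctl(id, IPC_RMID, NULL);                        /* destroy the queue */
    return 0;
}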
Now that we have seen what an IPC message is and in what way it relates to an IPC Message Queue, looked at the most important data structure involved, and seen a code snippet illustrating a message queue, it's now time for all of us to move on to the next and last section of this article, IPC Shared Memory.
Amongst the three IPC Resources, the most useful is the shared memory segment. Shared memory allows two or more processes to access some common data structures by placing them in a shared memory segment. Each process that wants to access the data structures included in a shared memory segment must add to its address space a new memory region, which maps the page frames associated with the shared memory segment. Each such frame can thus be easily handled by the Kernel through demand paging. The shmget() function is invoked to get the IPC Identifier of a shared memory segment, optionally creating it if it does not already exist. Then, the shmat() function is invoked to "attach" a shared memory segment to a process: it receives as its parameter the identifier of the IPC shared memory resource and tries to add a shared memory region to the address space of the calling process. The calling process can require a specific starting linear address for the memory region, but the address is usually unimportant, and each process accessing the shared memory segment can use a different address in its own address space. The shmat() function, however, leaves the process's page tables unchanged. Another function, shmdt(), is invoked to "detach" a shared memory segment specified by its IPC Identifier, that is, to remove the corresponding memory region from the process's address space. All readers should note the following very important point:
- The shmdt() function DOES NOT delete the shared memory segment! It just makes it unavailable for the current process. When I started learning all this, I made this "mistake": I assumed that using the shmdt() function deletes the shared memory segment. So, I want to warn all readers at this point. No, it does NOT ever "delete" the shared memory segment. Invoking shmdt() "detach"-es a shared memory segment. That's it.

An IPC shared memory segment descriptor, used primarily for identifying shared memory segments, is a shmid_kernel data structure. So, let's see what this shmid_kernel data structure has in store for us. The shmid_kernel data structure may be represented as:
Type | Field | Description
---|---|---
struct ipc_perm | u.shm_perm | ipc_perm data structure
int | u.shm_segsz | Size of shared memory region (in bytes)
long | u.shm_atime | Last attach time
long | u.shm_dtime | Last detach time
long | u.shm_ctime | Last change time
unsigned short | u.shm_cpid | PID of creator
unsigned short | u.shm_lpid | PID of last accessing process
unsigned short | u.shm_nattch | Number of current attaches
unsigned long | shm_npages | Size of shared memory region (pages)
unsigned long * | shm_pages | Pointer to array of page frame PTEs
struct vm_area_struct * | attaches | Pointer to VMA descriptor list
Those fields which are accessible to User Mode processes are included within a shmid_ds data structure named u inside the descriptor. Their contents are accessed using the shmctl() function. The u.shm_segsz and shm_npages fields store the size of the shared memory segment in bytes and in pages, respectively. Although User Mode processes can require a shared memory segment of any length, the length of the allocated segment is a multiple of the page size, since the Kernel must map the shared memory segment with a memory region. The shm_pages field points to an array that contains one element for each page of the segment. The attaches field points to the first element of a doubly linked list that includes the vm_area_struct descriptors of all memory regions associated with the shared memory segment. When mapping IPC shared memory segments, some fields of the vm_area_struct descriptors have a special meaning. Let's now look at a code snippet illustrating shared memory segments in action.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmget(key_t key, int size, int shmflg);

void *shmat(int shmid, const void *shmaddr, int shmflg);

int shmdt(const void *shmaddr);

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

Here too, the shmctl() subcommands IPC_STAT, IPC_SET and IPC_RMID operate on a struct shmid_ds.
For creating a shared memory segment, we use the shmget() system call. It takes a key, a size, and a flag parameter and returns an integer id for our segment. The flag can be a combination of IPC_CREAT and IPC_EXCL, as well as the permission bits on the segment as described above. Once the segment is created and we have its id, we must attach it, or map it into memory. This is done with shmat(). With shmat() we can specify an address to map to, but the address must be a multiple of SHMLBA, unless we specify SHM_RND in the flags parameter. Various operations can be performed using shmctl(). IPC_STAT can be used to fill in the struct shmid_ds structure. We can then set various fields (including changing the ownership and permissions held in ipc_perm) using IPC_SET. When all of our programs are done using a segment, we MUST shmctl it with IPC_RMID (the shmid_ds structure may be NULL), or else the segment will remain in memory forever. This is how a shared memory segment operates; a minimal sketch putting these calls together appears below.
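In this sketch a parent process creates and attaches a segment, a forked child (which inherits the attachment) reads what the parent wrote, and only after both have detached does the parent remove the segment with IPC_RMID. Error handling is abbreviated and the 4096-byte size is an arbitrary choice for the example.

/* Sketch: a parent and a child share one memory segment. The parent writes,
 * the child reads, then the segment is detached and removed. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
    if (id == -1) { perror("shmget"); return 1; }

    char *mem = shmat(id, NULL, 0);     /* attach; the kernel picks the address */
    if (mem == (char *) -1) { perror("shmat"); return 1; }

    strcpy(mem, "shared between parent and child");

    if (fork() == 0) {                  /* the child inherits the attached region */
        printf("child sees: %s\n", mem);
        shmdt(mem);                     /* detach only; the segment survives */
        return 0;
    }

    wait(NULL);
    shmdt(mem);
    shmctl(id, IPC_RMID, NULL);         /* now actually delete the segment */
    return 0;
}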
Thus, this brings us to the end of this section, which in turn
brings us to the end of this article too.
Thus, in this article we have seen what we mean by terms like IPC, IPC Resources, System V IPC, and different IPC mechanisms and resources; and more importantly how Unix systems (more specifically GNU/Linux systems) implement them in reality. As the famous English poet John Keats once wrote: "A thing of beauty is a joy for ever". He was right. Linux, which forms the core of the GNU (GNU's Not Unix) operating system (which unfortunately everyone nowadays refers to as simply the "Linux OS"), is definitely a thing of beauty. And to enjoy this beauty, one needs to be passionate about Open Source technologies, and especially about Linux. Linux is far more advanced than its many commercial competitors in so many ways. Just to name one, it is possible to fit both the Linux Kernel image and a full root filesystem, including all fundamental system programs, on just one 1.44 MB floppy disk! No commercial Unix variant, to date, as far as I know, is able to boot from a single floppy diskette. In my next article, I will deal with Linux Memory Management, in other words, how and in what ways Linux implements memory and other related features. So, get a Linux distribution today, learn some C coding; and if you are lucky and passionate, and like doing everything in life with all your heart and soul, becoming a Linux expert is just a matter of time... Take care all...
About the Author: My name is Subhasish Ghosh. I'm 20 years old, currently a computer-systems engineering student in India; a Microsoft Certified Professional (MCP), MCSD, MCP certified on NT 4.0; I recently completed Red Hat Certified Engineer (RHCE) training & cleared the Brainbench.com "Linux General Administration" certification exam. I have been installing, configuring and developing on the Linux platform for a long time now, and have programmed using C, C++, VC++, VB, COM, DCOM, MFC, ATL 3.0, PERL, Python, POSIX Threads and Linux Kernel programming; I currently hold a total of 8 International Industry Certifications. For a list of all my articles at Linux.com (and other sites), click here.
Latest News: My new friend from St.
Petersburg, Russia, Annette (I call her Ann!), unfortunately
refers to "Linux" as "Lunix"; my room
resembles an O'Reilly & Wrox warehouse, my mom has warned me,
either to clean my room by next week, or get out of the house
along with my Compaq PC, Linux CDs and books! Looks like I need a
new place real soon! E-mail: subhasish_ghosh@linuxmail.org