Concurrent Server Design fork(), SIGCHLD Handler & Zombie Process Prevention

 

Chapter 46 · TLPI · EmbeddedPathashala
Part: 2 of 4
Topic: Concurrency
Key Calls: fork, waitpid
Signal: SIGCHLD

Why This Topic Matters

The server must handle many clients at the same time. If it served one client fully before accepting the next, a client requesting a large file would block all others — unacceptable in practice. The solution is a concurrent server that spawns a child process for each incoming request.

But forking introduces a new problem: zombie processes. This part explains what they are, why they happen, and how the server uses a SIGCHLD signal handler with waitpid() to clean them up automatically.

Key Terms

Concurrent Server, Iterative Server, fork(), SIGCHLD, waitpid(), WNOHANG, Zombie Process, SA_RESTART, EINTR, sigaction(), savedErrno

Iterative vs Concurrent Server

There are two classic server designs:

Design | How It Works | Problem
Iterative | Server handles one request fully, then moves to the next. | A slow request (large file) blocks all other clients waiting in the queue.
Concurrent | Server forks a child for each request. Parent immediately loops back to accept the next request. | Zombie processes accumulate if children are not reaped. (Solved with SIGCHLD.)

The file-server in TLPI Chapter 46 uses the concurrent design specifically to avoid one large-file request blocking all other clients.

How the Server Uses fork() Per Request

The flow inside the server main loop:

  1. Server main process: msgrcv() blocks here, waiting.
  2. A client request arrives in the server queue.
  3. fork() creates a child process.
       PARENT: loops back to msgrcv() and waits for the next client.
       CHILD: calls serveRequest(), opens the file, sends the data, then calls _exit().
  4. SIGCHLD fires when the child exits → grimReaper() calls waitpid() → zombie removed.

The parent uses fork() to create a child, then immediately loops back to msgrcv(). The child independently calls serveRequest() and then calls _exit() (not exit()), so that it does not flush its own copy of the stdio buffers it inherited from the parent.

What Is a Zombie Process?

When a child process exits, it does not immediately disappear from the kernel’s process table. The kernel keeps a small record of it — its exit status — until the parent calls wait() or waitpid() to collect it. This dead-but-not-yet-collected process is called a zombie.

State | Description
Child running | Normal process, consuming CPU and memory
Child exited, parent has not waited | ZOMBIE: holds no memory or CPU, but still occupies a PID slot
Parent calls waitpid() | Zombie reaped: PID slot freed, entry removed
Why zombies are dangerous: Each zombie holds a process ID (PID) and a process-table entry. The number of processes is finite, both system-wide and per user (the RLIMIT_NPROC resource limit). A server that handles thousands of requests without reaping its children will eventually hit that limit, at which point fork() fails with EAGAIN and the server can no longer serve new requests.
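
The life cycle in the table above can be observed directly. The following is a minimal sketch, not part of the TLPI server: the names zombieResult and runZombieDemo() are hypothetical, and it assumes a system where waitid() supports the WNOWAIT flag (Linux does). It waits for a child to exit without reaping it, checks that the PID still exists, then reaps the child and checks that the PID is gone.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record for this demo (not from TLPI). */
struct zombieResult {
    int existsBeforeReap;   /* 1 if the PID was still allocated while a zombie */
    int exitStatus;         /* exit status collected by waitpid(), or -1 */
    int goneAfterReap;      /* 1 if kill() reports ESRCH after reaping */
};

static struct zombieResult
runZombieDemo(void)
{
    struct zombieResult r = { 0, -1, 0 };
    siginfo_t info;
    int status;
    pid_t pid = fork();

    assert(pid != -1);              /* demo only: give up if fork() fails */
    if (pid == 0)
        _exit(7);                   /* child terminates at once */

    /* Wait until the child has exited, but leave it unreaped (WNOWAIT).
       From here until waitpid() below, the child is a zombie. */
    if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOWAIT) == -1)
        return r;

    /* Signal 0 only tests for existence; it succeeds for a zombie. */
    r.existsBeforeReap = (kill(pid, 0) == 0);

    /* Reap: collect the exit status and release the process-table entry. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        r.exitStatus = WEXITSTATUS(status);

    /* After reaping, the PID no longer refers to any process. */
    r.goneAfterReap = (kill(pid, 0) == -1 && errno == ESRCH);
    return r;
}
```

Calling runZombieDemo() from a small main() and printing the three fields should show that the exit status survives the child's death and that the PID is released only by the wait call.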

Preventing Zombies with SIGCHLD and waitpid()

When a child process terminates, the kernel sends SIGCHLD to the parent. The server installs a signal handler called grimReaper() for SIGCHLD. This handler calls waitpid() to reap any finished children.

The grimReaper Handler

static void
grimReaper(int sig)
{
    int savedErrno;

    savedErrno = errno; /* Save errno — waitpid() may change it */

    /* Loop until all finished children are reaped.
       WNOHANG: do not block if no child has exited yet.
       Returns 0 when no more zombies, -1 on error. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        continue;

    errno = savedErrno; /* Restore errno */
}

Three critical points in this handler:

Detail | Why It Matters
waitpid(-1, NULL, WNOHANG) | -1 means “any child”. WNOHANG makes the call return immediately if no child has exited, so the handler never blocks.
Loop instead of single call | Several children can exit close together, and standard signals are not queued: SIGCHLD may be delivered only once for all of them. A single waitpid() reaps only one child, so the loop keeps going until no zombies remain.
Save and restore errno | The handler can interrupt the main program between a failed system call and its errno check. waitpid() itself can modify errno, so without the save/restore the interrupted code might see the wrong error number.
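
To see why the loop matters, the sketch below forks several children while SIGCHLD is blocked, so that their exits can pile up behind one pending signal, and then lets a grimReaper-style handler collect them. It is an illustration under assumptions, not TLPI code: countingReaper(), reapAll(), and the two counters are hypothetical names added for the demo.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;        /* children collected so far */
static volatile sig_atomic_t handlerCalls = 0;  /* times the handler ran */

/* Same shape as grimReaper(), plus counters for the demo. */
static void
countingReaper(int sig)
{
    int savedErrno = errno;

    (void) sig;
    handlerCalls++;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = savedErrno;
}

/* Fork numChildren short-lived children; return how many were reaped,
   or -1 on a setup error. The handler stays installed afterwards. */
static int
reapAll(int numChildren)
{
    struct sigaction sa;
    sigset_t block, old, waitMask;
    int i;

    assert(numChildren >= 0);
    reaped = 0;
    handlerCalls = 0;

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = countingReaper;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD while forking so several exits can share one signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    for (i = 0; i < numChildren; i++) {
        pid_t pid = fork();
        if (pid == -1) {            /* wait only for children that exist */
            numChildren = i;
            break;
        }
        if (pid == 0)
            _exit(EXIT_SUCCESS);
    }

    /* sigsuspend() atomically unblocks SIGCHLD and sleeps until a
       handler has run; repeat until every child has been reaped. */
    waitMask = old;
    sigdelset(&waitMask, SIGCHLD);
    while (reaped < numChildren)
        sigsuspend(&waitMask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

After reapAll(5), handlerCalls may well be smaller than 5 because pending SIGCHLD signals are merged, yet reaped is still 5. With a single waitpid() call per handler invocation, some children could stay zombies.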

SA_RESTART and EINTR — Handling Interrupted System Calls

The server main loop calls msgrcv(), which blocks. When SIGCHLD fires (because a child died), the signal handler runs and then control returns to the interrupted msgrcv(). What happens next depends on the SA_RESTART flag and on the system call that was interrupted:

Flag | Behavior after signal handler returns
SA_RESTART set | Many blocking system calls (for example read() or wait()) restart automatically. On Linux, msgrcv() is not one of them: it still fails with EINTR.
No SA_RESTART | The interrupted system call returns -1 with errno == EINTR.

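The server sets SA_RESTART when installing the SIGCHLD handler. However, not every system call honors SA_RESTART; on Linux, msgrcv() is one of the calls that always fails with EINTR when a handler interrupts it. The server therefore also wraps msgrcv() in a retry loop: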

/* Install SIGCHLD handler with SA_RESTART */
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;     /* restart slow syscalls after handler */
sa.sa_handler = grimReaper;
if (sigaction(SIGCHLD, &sa, NULL) == -1)
    errExit("sigaction");     /* errExit(): TLPI error helper */

/* Main server loop with EINTR restart logic */
for (;;) {
    msgLen = msgrcv(serverId, &req, REQ_MSG_SIZE, 0, 0);
    if (msgLen == -1) {
        if (errno == EINTR)   /* interrupted by SIGCHLD handler */
            continue;          /* restart msgrcv() manually */
        errMsg("msgrcv");
        break;
    }
    /* fork child to handle request ... */
}
Belt-and-suspenders approach: The server uses both SA_RESTART and an explicit EINTR check. SA_RESTART covers the system calls that can restart automatically, while the EINTR check covers those that cannot, which on Linux includes msgrcv() itself. The explicit check is therefore required here, not optional.
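
The retry pattern is not specific to msgrcv(). As a self-contained sketch, the code below applies the same EINTR loop to read() on a pipe. Everything in it is an assumption made for the demo and not part of the file server: a periodic SIGALRM timer stands in for SIGCHLD, the handler is installed without SA_RESTART on purpose so that every interruption surfaces as EINTR, and the names onAlarm(), readWithRetry(), and TICKS_BEFORE_DATA are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define TICKS_BEFORE_DATA 5     /* handler supplies data on this tick */

static volatile sig_atomic_t ticks = 0;
static int pipeFds[2];

static void
onAlarm(int sig)
{
    int savedErrno = errno;

    (void) sig;
    if (++ticks == TICKS_BEFORE_DATA) {
        ssize_t w = write(pipeFds[1], "x", 1);  /* async-signal-safe */
        (void) w;
    }
    errno = savedErrno;
}

/* Block in read() while a timer keeps interrupting it. Returns the byte
   read, or -1 on error; *eintrCount receives the number of EINTR retries. */
static int
readWithRetry(int *eintrCount)
{
    struct sigaction sa;
    struct itimerval tv, off;
    ssize_t n;
    char c = 0;

    assert(eintrCount != NULL);
    *eintrCount = 0;
    ticks = 0;
    if (pipe(pipeFds) == -1)
        return -1;

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    sa.sa_handler = onAlarm;
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = 50000;     /* one signal every 50 ms */
    tv.it_value = tv.it_interval;
    if (setitimer(ITIMER_REAL, &tv, NULL) == -1)
        return -1;

    for (;;) {
        n = read(pipeFds[0], &c, 1);
        if (n == -1 && errno == EINTR) {    /* interrupted by the handler */
            (*eintrCount)++;
            continue;                        /* restart read() manually */
        }
        break;                               /* data, EOF, or a real error */
    }

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);     /* stop the timer */
    close(pipeFds[0]);
    close(pipeFds[1]);
    return (n == 1) ? c : -1;
}
```

Without the continue branch, the first timer tick would make the function fail even though nothing was actually wrong, which is exactly what would happen to the server's msgrcv() each time a child exits.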

Why the Child Uses _exit() Instead of exit()

After fork(), the child has its own copy of the parent’s stdio buffers, including any output the parent had buffered but not yet written. If the child calls exit():

  • exit() flushes all stdio buffers in the child, including that copied data.
  • The parent later flushes the same data from its own buffers, so it is written twice.
  • The result would be duplicated or interleaved output on stdout or any other open stream.

_exit() terminates immediately without flushing stdio buffers, so the buffered data is written only once, by the parent.

if (pid == 0) {              /* We are the child */
    serveRequest(&req);      /* Do the work         */
    _exit(EXIT_SUCCESS);     /* Do NOT flush stdio! */
}
/* Parent falls through and loops back to msgrcv() */
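
The duplication can be reproduced in a few lines. This is a hedged sketch and not part of the server: copiesWritten() is a hypothetical helper that buffers one line in a temporary file stream, forks, lets the child terminate with either exit() or _exit(), and then counts how many copies of the line reached the file. It assumes the stream returned by tmpfile() is fully buffered, which is the usual behavior for regular files.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many copies of the message end up in the file,
   or -1 on error. childUsesExit selects exit() (1) or _exit() (0). */
static int
copiesWritten(int childUsesExit)
{
    static const char msg[] = "buffered line\n";
    struct stat st;
    pid_t pid;
    int copies;
    FILE *fp = tmpfile();

    assert(childUsesExit == 0 || childUsesExit == 1);
    if (fp == NULL)
        return -1;

    /* The text stays in the user-space buffer; nothing is written yet. */
    if (fputs(msg, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    pid = fork();                   /* child gets a copy of that buffer */
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        if (childUsesExit)
            exit(EXIT_SUCCESS);     /* flushes the child's buffer copy */
        _exit(EXIT_SUCCESS);        /* discards it */
    }

    if (waitpid(pid, NULL, 0) == -1) {
        fclose(fp);
        return -1;
    }

    /* Parent writes its own copy, then measures the file. */
    if (fflush(fp) == EOF || fstat(fileno(fp), &st) == -1) {
        fclose(fp);
        return -1;
    }
    copies = (int) (st.st_size / (off_t) strlen(msg));
    fclose(fp);
    return copies;
}
```

With these assumptions, copiesWritten(0) reports one copy and copiesWritten(1) reports two, because parent and child share the file offset and each flushes the same buffered bytes.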

Code Example — Full Server main() Skeleton

This skeleton shows all the concurrency and signal handling pieces together:

#include <sys/msg.h>
#include <signal.h>
#include <errno.h>
#include <sys/wait.h>
#include "svmsg_file.h"

static void
grimReaper(int sig)
{
    int savedErrno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        continue;
    errno = savedErrno;
}

int
main(void)
{
    int serverId;
    struct requestMsg req;
    ssize_t msgLen;
    pid_t pid;
    struct sigaction sa;

    /* 1. Create well-known server queue */
    serverId = msgget(SERVER_KEY,
                  IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR | S_IWGRP);
    if (serverId == -1)
        errExit("msgget");      /* errExit(): TLPI helper from tlpi_hdr.h */

    /* 2. Install SIGCHLD handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags   = SA_RESTART;
    sa.sa_handler = grimReaper;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        errExit("sigaction");

    /* 3. Main request loop */
    for (;;) {
        msgLen = msgrcv(serverId, &req, REQ_MSG_SIZE, 0, 0);
        if (msgLen == -1) {
            if (errno == EINTR) continue; /* SIGCHLD interrupted us */
            errMsg("msgrcv");             /* some other error */
            break;
        }

        pid = fork();
        if (pid == -1) {
            errMsg("fork");
            break;
        }

        if (pid == 0) {            /* Child */
            serveRequest(&req);
            _exit(EXIT_SUCCESS);
        }
        /* Parent: fall through, loop to get next request */
    }

    /* 4. Remove server queue on exit */
    msgctl(serverId, IPC_RMID, NULL);
    return 0;
}

What the Child Inherits from the Parent via fork()

This is a very common interview topic. After fork(), the child gets:

Inherited | Description
Stack contents | A copy of the parent’s stack, including the req variable holding the request message
Open file descriptors | Copies of all open fds (they share underlying kernel file descriptions)
Signal dispositions | Same handlers as parent
Message queue IDs | SysV IPC identifiers are kernel-level, so the child can use the same IDs
NOT inherited | Parent’s pending signals, timers, and PID

In the file server, the child inherits the req structure that was read by the parent’s msgrcv(). This is how the child knows which file the client wants and where to send the reply — without any extra communication between parent and child.
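
A small sketch can make the copy semantics concrete. The struct demoRequest and the function childSeesRequest() below are hypothetical stand-ins for the server's request message and serveRequest(); the child reports what it saw through its exit status, so the value must fit in 0 to 255.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the server's request message. */
struct demoRequest {
    int  clientId;          /* where the reply should go */
    char pathname[64];      /* which file the client asked for */
};

/* Fork a child that reads *req. Returns the clientId the child saw
   (passed back via its exit status), or -1 on error. */
static int
childSeesRequest(struct demoRequest *req)
{
    int status;
    pid_t pid;

    assert(req != NULL && req->clientId >= 0 && req->clientId <= 255);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        int seen = req->clientId;   /* filled in by the parent before fork() */
        req->clientId = 0;          /* changes only the child's private copy */
        _exit(seen);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child reads the request without any message from the parent, and its write to clientId never reaches the parent, because each process works on its own copy of the memory after fork().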

Summary of This Part
  • Iterative vs Concurrent server
  • fork() per request model
  • Zombie process definition
  • grimReaper SIGCHLD handler
  • waitpid(-1, NULL, WNOHANG)
  • SA_RESTART + EINTR handling
  • _exit() vs exit() after fork
  • Child inheritance from parent

Interview Questions — Concurrent Server & Signals
  1. What is the difference between an iterative and a concurrent server? When would you choose each?
  2. What is a zombie process? How does it differ from an orphan process?
  3. Why does the SIGCHLD handler use a while loop around waitpid() instead of calling it once?
  4. What is the purpose of WNOHANG in waitpid(-1, NULL, WNOHANG)?
  5. Why does the grimReaper handler save and restore errno?
  6. What does SA_RESTART do and why is it still not enough — why does the server also check for EINTR?
  7. Why does the child call _exit() instead of exit()?
  8. What does the child process inherit from the parent after fork()?
  9. How does the child process know which client to respond to without extra IPC with the parent?
  10. What would happen if the server did NOT install a SIGCHLD handler and how would you detect the resulting problem?
  11. Can two child processes from the same server both try to send messages to the same client queue simultaneously? What would happen?

Continue Learning

Next: Server-Side Request Handling — serveRequest(), reading files, sending response messages
