Linux Processes, Memory & Libraries – free linux system programming course

 

Chapter 04: Processes, Memory & Libraries
How programs become processes, how memory is laid out, how fork and exec work, process credentials, capabilities, init, daemons, mmap, and static vs shared libraries.

Keywords

Process, PID, fork(), exec(), Copy-on-Write, Text Segment, Heap, Stack, RUID / EUID, SUID bit, Capabilities, init / PID 1, Daemon, mmap(), Shared Library

What Is a Process?

A process is an instance of an executing program. When a program is run, the kernel: (1) loads the program’s code into virtual memory, (2) allocates memory for variables, (3) creates kernel data structures recording the PID, termination status, UIDs, GIDs, and more. From the kernel’s view, processes are the entities among which it shares hardware resources.

Process Memory Layout

A process’s virtual address space is divided into the following logical segments:

Text Segment (Code)

Compiled machine instructions of the program. Read-only to prevent accidental overwrites. Shared between all processes running the same program — only one physical copy in RAM regardless of how many instances run.

Data Segment

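Global and static variables that were explicitly initialised in source code (e.g., int x = 42; at file scope). Their initial values are loaded from the program file at startup. Global and static variables without an explicit initialiser go in a separate uninitialised data segment (the BSS), which is zero-filled at startup and therefore takes no space in the program file.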

Heap

Area for dynamic memory allocation via malloc(). Grows upward toward higher addresses. Memory persists until freed with free() or the process exits. Memory leaks happen when allocated heap memory is never freed.

Stack

Grows and shrinks as functions are called and return. Each call creates a stack frame holding local variables, function arguments, and the return address. Grows downward toward lower addresses. Stack memory is automatically reclaimed when functions return.

HIGH ADDRESS
┌─────────────────────┐
│ Stack ↓             │  local variables, return addresses
│                     │
│   (free space)      │
│                     │
│ Heap ↑              │  malloc() / free()
├─────────────────────┤
│ BSS                 │  uninitialised global/static variables (zero-filled)
├─────────────────────┤
│ Data Segment        │  initialised global/static variables
├─────────────────────┤
│ Text Segment        │  program code (read-only)
└─────────────────────┘
LOW ADDRESS

Creating Processes: fork()

A process creates a child using fork(). The child is an almost exact copy of the parent: same code, same data, same open file descriptors. fork() returns twice: it returns the child’s PID in the parent and 0 in the child. If no child can be created, it returns -1 in the parent.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork failed");
    } else if (pid == 0) {
        /* CHILD: runs here */
        printf("Child PID: %d\n", (int)getpid());
    } else {
        /* PARENT: runs here, pid = child's PID */
        printf("Parent, child is: %d\n", (int)pid);
        waitpid(pid, NULL, 0);  /* wait for child to finish */
    }
    return 0;
}
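Copy-on-Write (COW): Physically copying all of the parent’s memory at fork time would be very slow. Instead, parent and child initially share the same physical pages, and the kernel marks the writable ones read-only in both processes’ page tables. When either process writes to such a page, the resulting fault makes the kernel copy just that page for the writer. A child that calls exec() straight after fork() touches only a handful of pages before its address space is replaced, so fork-exec incurs very little copying.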

Executing New Programs: exec()

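execve() replaces the calling process’s program image with a new program (library functions such as execl() and execvp() are front ends to it). The PID stays the same; code, data, stack, and heap are all replaced. Open file descriptors remain open unless their close-on-exec flag (FD_CLOEXEC) is set, for example by opening them with O_CLOEXEC. The fork-exec pattern (fork a child, set up any redirections in the child, then exec) is how the shell runs every external command.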

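#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }
    if (pid == 0) {
        /* child: redirect stdout to a file, then exec */
        int fd = open("output.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644);
        if (fd == -1) _exit(1);
        dup2(fd, STDOUT_FILENO);   /* fd 1 now points to file */
        close(fd);
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(1);  /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);
    return 0;
}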

Process Credentials

Real UID / Real GID (RUID / RGID)

Identifies the actual owner of the process. Set at login and inherited by all children. Represents “who started this process.”

Effective UID / Effective GID (EUID / EGID)

What the kernel checks for permission on file access, signals, and other resources. Normally equals RUID but can differ via the SUID mechanism.

Saved Set-User-ID (SSUID)

Set to a copy of the EUID each time a program is exec()ed. It lets a set-user-ID program temporarily switch its EUID to the RUID and later switch back to the privileged EUID, giving precise control over when privileges are in effect.

The SUID Bit — How passwd Works

When an executable file has the SUID bit set, a process that execs it takes the file owner’s UID as its effective UID. For a root-owned SUID program that means EUID=0. Example:

ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root ... /usr/bin/passwd
   ↑
   's' = SUID bit; running this gives the process EUID=0

A regular user runs passwd, and for as long as the program runs the process has EUID=0, which allows it to write to /etc/shadow. The passwd program verifies the user’s identity before changing anything, so the escalation is tightly controlled.

Linux Capabilities

Since kernel 2.2, Linux divides root’s privileges into ~40 independent units called capabilities. A process can be granted only what it needs:

CAP_KILL: signal other users’ processes
CAP_NET_BIND_SERVICE: bind to ports below 1024
CAP_SYS_ADMIN: broad admin operations
CAP_DAC_OVERRIDE: bypass file permissions
CAP_SETUID: change process UIDs

A web server needs only CAP_NET_BIND_SERVICE to use port 80 — granting it full root would be excessive and dangerous. Modern systems use capabilities to follow the principle of least privilege.

The init Process (PID 1)

At the end of booting, the kernel creates init (PID 1) from /sbin/init. Init is the ancestor of every user-space process on the system. It always has PID 1, runs with full root privileges, cannot be killed even by root, and terminates only on system shutdown. Init adopts processes whose parent has exited and reaps them when they terminate, so they do not linger as zombies; it also manages system services. Most modern Linux distributions use systemd as PID 1.

Daemon Processes

A daemon is a long-lived background process with no controlling terminal. It is usually started at boot and runs until shutdown. Examples: sshd, httpd, crond, syslogd. Standard daemonisation steps:

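pid_t pid = fork();
if (pid == -1) exit(EXIT_FAILURE);
if (pid > 0) exit(EXIT_SUCCESS);          /* parent exits, shell gets control back */
if (setsid() == -1) exit(EXIT_FAILURE);   /* child: new session, no controlling terminal */
umask(0);                                 /* clear the inherited file-mode creation mask */
if (chdir("/") == -1) exit(EXIT_FAILURE); /* don't keep any mounted filesystem busy */
/* close any other inherited file descriptors, then point 0, 1, 2 at /dev/null */
int fd = open("/dev/null", O_RDWR);
dup2(fd, STDIN_FILENO); dup2(fd, STDOUT_FILENO); dup2(fd, STDERR_FILENO);
if (fd > STDERR_FILENO) close(fd);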

Memory Mappings: mmap()

mmap() creates a new memory mapping in the process’s virtual address space. Two types:

File Mapping

Maps a region of a file into virtual memory. Reading the mapped bytes returns the file’s contents, and with a shared mapping (MAP_SHARED) writes to that memory are carried through to the file. Pages are loaded from the file on demand. Used to load executables and shared libraries, and for memory-mapped file I/O.

Anonymous Mapping

No backing file; pages are initialised to zero. Used by malloc() for large allocations and, when created with MAP_SHARED before fork(), as shared memory between the parent and child.

Static vs Shared Libraries

Static Libraries (.a files)

At link time the linker extracts needed object modules from the library and copies them into the executable. Every statically linked program contains its own private copy of all library code it uses. Disadvantages: wasted disk space and RAM from duplicate copies, and a bug fix in a library requires relinking every program that uses it.

Shared Libraries (.so files)

The linker only records that the program needs libfoo.so. At runtime, the dynamic linker (ld.so) loads the shared library and resolves function references. Only one copy of the library code lives in RAM, shared by all programs that use it. Advantages: saves disk space and RAM; a bug fix in the library takes effect for every program the next time it starts, without relinking.

ldd /bin/ls               # show shared library dependencies
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1

Interview Questions

Q1: What is the difference between fork() and exec()?

Answer: fork() creates a new process by duplicating the caller. Both parent and child continue running the same code; fork() returns the child’s PID in the parent and 0 in the child. exec() replaces the calling process’s program image with a new program — the PID stays the same but code, data, stack, and heap are entirely replaced. The shell uses both together: fork() to create a child, exec() in the child to run the command, and waitpid() in the parent to collect the exit status.

Q2: What is copy-on-write and how does it make fork() efficient?

Answer: When fork() creates a child, physically copying all parent memory for large processes would be extremely slow. Copy-on-write avoids this: the kernel marks all shared memory pages read-only and both processes point to the same physical pages. A page is only copied to a new location when either process actually writes to it, triggering a protection fault. A child that calls exec() immediately after fork() writes to only a few pages (for example, its stack) before its address space is discarded, so fork-exec incurs almost no memory copy cost.

Q3: What is a zombie process and how do you prevent it?

Answer: After a process exits, the kernel retains a small record of its exit status until the parent collects it with wait(). A process in this state — dead but uncollected — is a zombie. It holds a PID and process table slot but no CPU or significant memory. Prevention: (1) call waitpid() for every child; (2) install a SIGCHLD handler that calls waitpid(-1, NULL, WNOHANG) in a loop; (3) set SIGCHLD to SIG_IGN for auto-reaping; (4) double-fork so the grandchild is adopted and reaped by init.

Q4: What are Linux capabilities and why are they better than running everything as root?

Answer: Capabilities divide root’s privileges into ~40 independent units. A process gets only the capabilities it actually needs. A web server binding to port 80 needs only CAP_NET_BIND_SERVICE; giving it full root is excessive and dangerous. If the process is compromised, the attacker gains only those specific capabilities rather than full root access. Modern service managers like systemd assign per-service capability sets, greatly reducing the attack surface compared to the traditional binary root/non-root model.

Q5: What is the advantage of shared libraries over static libraries?

Answer: Shared libraries save disk space and RAM because only one physical copy of the library code is loaded regardless of how many programs use it. More critically for security: bug fixes to a shared library automatically benefit all programs that use it the next time they run — no recompilation or relinking needed. With static libraries every program containing vulnerable code must be rebuilt and redistributed individually. For a widely used library like libc, shared linking means a single patch fixes every program on the system.

Continue to Chapter 05

Next: All 7 IPC mechanisms, signals as software interrupts, and POSIX threads with mutexes and condition variables.
