Threads vs Processes & Chapter Summary

Part 9 of 9  |  Design Choice · Trade-offs · Key Takeaways · Exercises
Level: All Levels
Topic: Architecture Decisions
Book: TLPI – Ch 29.9–29.11

Before writing a multithreaded application you should ask: should I use threads at all, or would separate processes serve better? This is not always obvious. Both models have clear advantages and disadvantages. This final part of Chapter 29 gives you a framework for making that decision, summarises everything covered, and presents the textbook exercises to test your understanding.

Key Terms

Thread Safety · Address Space · IPC · Fork · Signal Handling · Isolation · Context Switch · Virtual Memory

1. Advantages of Using Threads

Threads shine in scenarios where tasks within an application need to work closely together and share data frequently.

🚀
Fast creation

Thread creation is approximately 10× faster than fork(). On Linux, creating a thread via clone() shares page tables and file descriptors rather than duplicating them. This matters in server applications that handle thousands of connections per second.

📦
Easy, fast data sharing

Threads share the same address space, so sharing data is as simple as writing to a global or heap variable. With processes, you need pipes, shared memory, message queues, or sockets — all much more complex and slower.

Lower context-switch overhead

Switching between threads of the same process can be faster than switching between processes, because threads share the same virtual address space — no TLB (Translation Lookaside Buffer) flush is needed when the kernel switches between threads of the same process.

🔄
I/O concurrency without blocking

If one thread blocks on I/O (e.g., waiting for a network packet), other threads can continue running. This is much simpler to implement than non-blocking I/O callbacks or event loops.

2. Disadvantages and Risks of Using Threads

🔒
Thread safety requirement

Every function called from multiple threads must be thread-safe, or else calls to it must be serialised with appropriate locking. Several standard library functions are not thread-safe because they keep state in static storage (e.g., strtok() and localtime()); use the reentrant variants strtok_r() and localtime_r() instead. A multiprocess design built from single-threaded processes avoids this concern entirely.

💥
A bug in one thread can crash all threads

Since all threads share the same virtual address space, a buggy thread (e.g., writing through a wild pointer, corrupting the heap) can overwrite data used by another thread, causing crashes or silent data corruption across the entire process. With separate processes, a crash in one process does not affect others.

🧩
Virtual address space competition

Each thread consumes a chunk of the process’s virtual address space for its stack and thread-local data. On 32-bit systems (3 GB user space), creating many threads each with 8 MB stacks quickly exhausts the address space. Separate processes each get their own full address space.

📡
Signal handling complexity

Handling signals in multithreaded programs requires careful design. Signal dispositions are process-wide, but each thread has its own signal mask, and a signal sent to the process may be delivered to any thread that does not have it blocked. Getting the right thread to handle a signal without races is non-trivial. The general recommendation is to avoid signals in multithreaded programs where possible.

📜
All threads must run the same program

In a multithreaded application, all threads execute within the same process image — they run the same program binary (though in different functions). With a multiprocess application, each process can run a completely different program via exec().

3. Other Factors in the Decision

Beyond performance, there are practical engineering considerations:

Factor | Threads | Processes
Shared file descriptors | Yes (can be an advantage or a disadvantage) | Child inherits copies at fork(); later descriptors only if passed explicitly (e.g., over a UNIX domain socket)
Shared signal dispositions | Yes (complicates signal design) | Each process has its own
Fault isolation | Poor: one bug can crash all threads | Good: a crash is contained in one process
Run different programs | No (same binary) | Yes (exec() any program)
Data sharing | Trivial (shared memory by default) | Requires IPC
Creation speed | ~10× faster | Slower (fork() must duplicate page tables, even with copy-on-write)
Shared CWD, umask, UID/GID | Yes (can be an advantage or a disadvantage) | Each process has its own

4. Practical Decision Guide

✅ Choose THREADS when…
  • Tasks share a large common dataset
  • You need very fast task creation
  • Tasks need low-latency communication with each other
  • All tasks are part of one logical application
  • Tasks need to coordinate tightly (producer–consumer, etc.)
  • Example: web server, video codec, matrix computation
✅ Choose PROCESSES when…
  • Fault isolation is critical (e.g., browser tabs)
  • Tasks run different programs (exec())
  • Security sandbox needed between tasks
  • Tasks are largely independent
  • Using legacy non-thread-safe libraries
  • Example: shell pipelines, web server worker model

5. Chapter 29 — Complete Summary

Overview (29.1): Threads are independent execution paths within a process. They share code, heap, globals, file descriptors, and most OS resources. Each thread has a private stack. Thread creation is ~10× faster than fork() and sharing data is trivial.
Pthreads API Background (29.2): Key data types: pthread_t, pthread_mutex_t, pthread_cond_t etc. are opaque — never compare with ==. Each thread has its own errno. Pthreads functions return 0 on success and a positive error number on failure. Compile with gcc -pthread.
Thread Creation (29.3): pthread_create(pthread_t*, attr, start_fn, arg) creates a thread. The start function signature is void *fn(void *). No guaranteed scheduling order after creation.
Thread Termination (29.4): Four ways to end: return, pthread_exit(), pthread_cancel(), or process-wide exit(). pthread_exit() from main lets other threads continue. Never return a pointer to a stack-local variable.
Thread IDs (29.5): pthread_self() returns calling thread’s ID. pthread_equal() compares two IDs portably. POSIX TIDs differ from Linux kernel TIDs (gettid()). IDs can be reused after join.
Joining (29.6): pthread_join(tid, &retval) blocks until the thread finishes and cleans up its resources. Without joining (or detaching), terminated threads become zombie threads. Any thread can join any other (peer model) — no hierarchy like processes.
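Detaching (29.7): pthread_detach() marks a thread for automatic cleanup on termination; a detached thread cannot be joined afterwards. The detached state can also be set at creation by calling pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) on the attributes object passed to pthread_create().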
Thread Attributes (29.8): pthread_attr_t controls stack size, detach state, scheduling policy/priority. Init → Set → Create → Destroy. Attribute object can be destroyed immediately after pthread_create().
Threads vs Processes (29.9): Threads win on speed and easy sharing; processes win on isolation, fault containment, and ability to run different programs. Choice depends on the application’s needs.

6. Textbook Exercises (29.11)

Exercise 29-1

Question: What possible outcomes might there be if a thread executes the following code?

pthread_join(pthread_self(), NULL);

Write a program to see what actually happens on Linux. If we have a variable tid containing a thread ID, how can a thread prevent itself from making a call pthread_join(tid, NULL) that is equivalent to the above statement?

▶ Analysis & Answer

What happens:

  • In principle, a thread that joins itself deadlocks: it waits for its own termination, which can never happen while it is blocked in the join. The standard allows an implementation either to deadlock or to detect the situation and fail with EDEADLK.
  • On Linux with NPTL, pthread_join(pthread_self(), NULL) takes the second route: it returns EDEADLK ("Resource deadlock avoided") immediately rather than blocking.

How to prevent a thread from joining itself:

Check using pthread_equal() before calling join:

if (!pthread_equal(tid, pthread_self()))
    pthread_join(tid, NULL);
else
    fprintf(stderr, "Cannot join self!\n");
Exercise 29-2

Question: Aside from the absence of error checking and variable declarations, what is the problem with the following program?

static void *
threadFunc(void *arg)
{
    struct someStruct *pbuf = (struct someStruct *) arg;
    /* Do some work with structure pointed to by 'pbuf' */
}

int
main(int argc, char *argv[])
{
    struct someStruct buf;

    pthread_create(&thr, NULL, threadFunc, (void *) &buf);
    pthread_exit(NULL);
}
▶ Analysis & Answer

The Problem: Dangling Stack Pointer

buf is a local variable on main()'s stack. When main() calls pthread_exit(NULL), the main thread terminates and the lifetime of buf ends: the contents of the main thread's stack become undefined and the memory may be reused.

Meanwhile, the new thread may still be running threadFunc() with a pointer into that stack. Dereferencing pbuf at that point is undefined behaviour, since it reads memory that no longer holds a live object.

Fix options:

  • Option A: Use pthread_join() in main() instead of pthread_exit(). This ensures the thread finishes before main() returns and buf goes out of scope.
  • Option B: Allocate buf on the heap with malloc(). Heap memory lives until explicitly freed.
  • Option C: Make buf a global variable. Globals live for the entire process lifetime.

/* Fix A: join instead of exit */
pthread_create(&thr, NULL, threadFunc, (void *) &buf);
pthread_join(thr, NULL);   /* main waits — buf stays valid */
return 0;

/* Fix B: heap allocation */
struct someStruct *pbuf = malloc(sizeof *pbuf);
pthread_create(&thr, NULL, threadFunc, pbuf);
/* threadFunc() should free(pbuf) when done */
pthread_exit(NULL);

7. Code Example: Exercise 29-1 — Self-Join Test

/* compile: gcc -pthread -o self_join self_join.c */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

static void *thread_fn(void *arg)
{
    int s;

    printf("Thread: attempting to join myself...\n");

    /* Attempt self-join */
    s = pthread_join(pthread_self(), NULL);

    if (s != 0)
        printf("Thread: pthread_join self failed: %s\n", strerror(s));
    else
        printf("Thread: pthread_join self succeeded (unexpected!)\n");

    return NULL;
}

int main(void)
{
    pthread_t tid;
    int s;

    s = pthread_create(&tid, NULL, thread_fn, NULL);
    if (s != 0) {
        fprintf(stderr, "pthread_create failed: %s\n", strerror(s));
        return 1;
    }

    /* Guard against self-join. Compare the IDs before joining: once a
       thread has been joined, its ID is no longer valid and may be reused. */
    if (!pthread_equal(pthread_self(), tid)) {
        s = pthread_join(tid, NULL);
        if (s != 0)
            fprintf(stderr, "pthread_join failed: %s\n", strerror(s));
        else
            printf("\nMain: tid differs from us, joined it safely\n");
    } else {
        printf("\nMain: tid is us, skipping join\n");
    }

    return 0;
}

Expected output on Linux (NPTL):

Thread: attempting to join myself...
Thread: pthread_join self failed: Resource deadlock avoided

Main: tid differs from us, joined it safely

8. Interview Questions — Threads vs Processes & Chapter Review

Q1. What are the main advantages of threads over processes?
(1) Thread creation is much faster (~10×) than fork() because resources like page tables are shared rather than duplicated. (2) Sharing data between threads is trivially easy — just use shared (global/heap) variables, with no IPC overhead. (3) Context switching between threads of the same process can be faster than between processes.
Q2. What are the main disadvantages of threads compared to processes?
(1) All functions must be thread-safe; many legacy functions are not. (2) A bug in one thread (wild pointer write, heap corruption) can crash or corrupt data for all other threads. (3) All threads share the same virtual address space, limiting the total number of threads on 32-bit systems. (4) Signal handling is more complex. (5) All threads must run the same program binary.
Q3. What does “thread-safe” mean?
A function is thread-safe if it can be called concurrently by multiple threads without causing data corruption or incorrect results. This usually means it either: (a) uses only local (stack) variables, (b) uses appropriate locking to protect shared state, or (c) uses thread-local storage. Functions that use global state without locking are NOT thread-safe.
Q4. Give a real-world example where threads are clearly better than processes.
A web server such as Apache httpd in worker MPM mode uses threads within each server process to handle many client requests concurrently while sharing configuration and caches in memory. Creating a new process per request would be slower and use more memory. Another example: a video encoder that splits frames across CPU cores, with all threads working on a shared frame buffer.
Q5. Give a real-world example where separate processes are better than threads.
A web browser using separate processes for each tab (like Chrome/Chromium). If one tab crashes or is compromised (e.g., by malicious JavaScript/WebAssembly), the other tabs continue running safely. The OS provides strong isolation between processes, which is essential for security.
Q6. Write pseudocode showing the complete lifecycle of a joinable thread.
pthread_attr_t attr;
pthread_t tid;
void *result;

pthread_attr_init(&attr);                        // init attributes
pthread_attr_setstacksize(&attr, 512*1024);      // optional: set stack

pthread_create(&tid, &attr, my_func, my_arg);   // create thread
pthread_attr_destroy(&attr);                     // no longer needed

/* ... main does other work ... */

pthread_join(tid, &result);                      // wait + cleanup
// use result...

🎉 Chapter 29 Complete!

You have covered all 9 parts of POSIX Threads Introduction. Next chapters to study:

  • Chapter 30: Mutexes & Condition Variables
  • Chapter 31: Thread Safety & TLS
  • Chapter 32: Thread Cancellation
  • Chapter 33: Threads & Signals
