Beginner–Intermediate
Linux Threading
TLPI – Ch 29
Threads are one of the most important concepts in modern systems programming. If you have ever wondered how a web server handles thousands of users at the same time, or how a music player can download a song while playing another — the answer is threads. This part explains what threads are, how they differ from processes, and what makes them useful.
1. What Is a Thread?
A thread is the smallest unit of execution inside a process. Think of a process as a factory, and threads as the workers inside that factory. All workers share the same building (memory), the same tools (file descriptors), and the same electricity (OS resources) — but each worker does their own job independently.
A process can have one or more threads. When a program starts, it automatically gets one thread called the main thread (or initial thread). You can then create more threads as needed.
The traditional UNIX model creates separate processes to do multiple things at once (using fork()). Threads offer a lighter, faster alternative when all tasks belong to the same application.
2. How Threads Share Memory (Linux x86-32 Layout)
The diagram below shows how four threads coexist inside one process on a Linux x86-32 system. Notice that they share the text, data, heap, and library regions but each has its own private stack.
Key takeaway: Threads share the heap, global variables, and code. Each thread has its own stack for local variables and function call frames. This is why sharing data between threads is so easy — just use a global variable or heap allocation.
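The takeaway above can be checked with a small sketch. It assumes a hypothetical `struct probe` and helper `run_probe()` (neither comes from TLPI): a new thread updates a heap-allocated integer and records the address of one of its own local variables, so the caller can see that the heap write is visible and that the two locals live on different stacks.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record used only for this illustration. */
struct probe {
    int *heap_value;       /* points into the heap shared by all threads */
    uintptr_t local_addr;  /* address of a local on the new thread's stack */
};

static void *probe_fn(void *arg)
{
    struct probe *p = arg;
    int local = 0;                      /* lives on this thread's own stack */

    *p->heap_value += 1;                /* the creator sees this after joining */
    p->local_addr = (uintptr_t)&local;  /* recorded for comparison only */
    return NULL;
}

/* Returns 0 on success, -1 if allocation or thread creation failed. */
static int run_probe(int *heap_result, int *stacks_differ)
{
    int main_local = 0;                 /* lives on the caller's stack */
    struct probe p;
    pthread_t tid;

    p.local_addr = 0;
    p.heap_value = malloc(sizeof *p.heap_value);
    if (p.heap_value == NULL)
        return -1;
    *p.heap_value = 41;

    if (pthread_create(&tid, NULL, probe_fn, &p) != 0) {
        free(p.heap_value);
        return -1;
    }
    pthread_join(tid, NULL);

    *heap_result = *p.heap_value;       /* 42: the thread's heap write is visible */
    *stacks_differ = (p.local_addr != (uintptr_t)&main_local);
    free(p.heap_value);
    return 0;
}
```

Calling `run_probe()` from `main()` should report a heap value of 42 and two different stack addresses. "Private" here is a convention rather than hardware protection: the sketch itself hands the new thread a pointer to `p`, which sits on the caller's stack.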
3. What Threads Share and What They Don’t
| ✅ Shared by ALL Threads | 🔒 Private to EACH Thread |
|---|---|
| Global variables (initialized & BSS) | Stack (local variables) |
| Heap memory (malloc/new) | Thread ID |
| Open file descriptors | Signal mask |
| Process ID (PID) and parent PID | errno variable (thread-local) |
| User & group IDs (credentials) | Floating-point environment |
| Signal dispositions (handlers) | Realtime scheduling policy & priority |
| Current working directory, umask | CPU affinity (Linux-specific) |
| Resource limits, nice value | Thread-specific data |
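
One row of the table, the per-thread errno, can be observed directly. The following is a minimal sketch with a hypothetical `struct errno_report` and helper `errno_is_per_thread()`: a new thread sets errno and records where its errno lives, and the caller compares that location with its own.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical report filled in by the new thread. */
struct errno_report {
    int value;           /* errno value the thread read back */
    uintptr_t location;  /* address of that thread's errno */
};

static void *set_errno_fn(void *arg)
{
    struct errno_report *r = arg;

    errno = EINTR;                    /* writes this thread's errno only */
    r->value = errno;
    r->location = (uintptr_t)&errno;  /* errno expands to a per-thread lvalue */
    return NULL;
}

/* Returns 1 if the new thread used a different errno location than the
   caller, 0 if not, and -1 if the thread could not be created. */
static int errno_is_per_thread(void)
{
    struct errno_report r = { 0, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, set_errno_fn, &r) != 0)
        return -1;
    pthread_join(tid, NULL);

    return r.value == EINTR && r.location != (uintptr_t)&errno;
}
```

The comparison relies on addresses rather than on the caller's errno value, because library calls are allowed to modify errno even when they succeed.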
4. Why Use Threads Instead of Processes?
The traditional UNIX way to do multiple things at once is to call fork() to create child processes. But threads offer two big improvements:
Creating a thread is roughly 10× faster than fork(). When you call fork(), the OS must duplicate page tables, file descriptor tables, and other process attributes. Threads skip all of this — they share those resources. Under the hood, Linux creates threads using the clone() system call with sharing flags, making it much cheaper.
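The speed difference can be measured roughly with a timing sketch like the one below. The helper names (`now_ns`, `time_threads_ns`, `time_forks_ns`) are hypothetical, and each loop also includes the cost of waiting for the thread or child to finish, so treat the numbers as a ballpark for one machine rather than a confirmation of any particular ratio.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void *noop_fn(void *arg)
{
    (void)arg;  /* the thread does nothing and returns immediately */
    return NULL;
}

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Time n create+join cycles; returns elapsed ns, or -1 on failure. */
static long long time_threads_ns(int n)
{
    long long start = now_ns();

    for (int i = 0; i < n; i++) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, noop_fn, NULL) != 0)
            return -1;
        pthread_join(tid, NULL);
    }
    return now_ns() - start;
}

/* Time n fork+wait cycles; returns elapsed ns, or -1 on failure. */
static long long time_forks_ns(int n)
{
    long long start = now_ns();

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
        waitpid(pid, NULL, 0);  /* parent: reap the child */
    }
    return now_ns() - start;
}
```

A caller might run both helpers with the same `n` (say, 1000) and print the two totals. The gap varies with the kernel, the C library, and how much memory the calling process has mapped.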
Threads share memory automatically. If thread A writes a value to a global variable, thread B can read it immediately — no pipes, no shared memory segments, no message queues needed. With processes you would need IPC (Inter-Process Communication) to exchange data.
However, easy sharing is a double-edged sword. If two threads update the same variable at the same time without coordination, the result is unpredictable. This is called a race condition, and it is why synchronisation (mutexes, condition variables) is so important in threaded programs — covered in Chapter 30.
5. Concurrency vs Parallelism
Concurrency: multiple threads are in progress at the same time, but they may take turns on one CPU core. The OS rapidly switches between them, giving the illusion of simultaneous execution.
Parallelism: on a multi-core or multi-CPU system, two or more threads truly run at the same moment on different cores. Threads in a process are eligible for true parallel execution.
If one thread is blocked waiting for disk I/O or a network packet, other threads can continue running. This is one of the biggest advantages of multithreading in I/O-heavy applications like web servers.
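A minimal sketch of this behavior, using a pipe in place of a disk or network: one thread blocks in read() while the calling thread keeps computing, then unblocks the reader by writing a byte. The names `struct reader_state`, `blocking_reader`, and `work_while_reader_blocks` are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state shared with the reader thread. */
struct reader_state {
    int fd;         /* read end of the pipe */
    char received;  /* byte delivered through the pipe */
};

static void *blocking_reader(void *arg)
{
    struct reader_state *st = arg;

    /* read() blocks only this thread until a byte (or EOF) arrives */
    if (read(st->fd, &st->received, 1) != 1)
        st->received = '\0';
    return NULL;
}

/* Returns the sum computed while the reader waits, or -1 on failure. */
static long work_while_reader_blocks(char *received)
{
    int pipefd[2];
    struct reader_state st = { -1, '\0' };
    pthread_t tid;
    long sum = 0;

    if (pipe(pipefd) == -1)
        return -1;
    st.fd = pipefd[0];
    if (pthread_create(&tid, NULL, blocking_reader, &st) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* The calling thread keeps working; the reader cannot finish yet
       because nothing has been written to the pipe. */
    for (long i = 1; i <= 1000; i++)
        sum += i;

    if (write(pipefd[1], "x", 1) != 1)
        sum = -1;
    close(pipefd[1]);  /* on a failed write, EOF still unblocks the reader */

    pthread_join(tid, NULL);
    *received = st.received;
    close(pipefd[0]);
    return sum;
}
```

The reader may not have reached read() by the time the loop ends, but in either case it cannot complete before the write, and the calling thread is never held up by it.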
6. Code Example: Checking Thread vs Process with getpid()
The snippet below shows that all threads inside one process share the same PID. This is a quick way to verify that threads are indeed “inside” the same process.
/* compile: gcc -pthread -o thread_pid thread_pid.c */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

/* This function runs in the new thread */
static void *thread_fn(void *arg)
{
    (void)arg;  /* unused */
    /* getpid() returns the same PID as the main thread */
    printf("Thread: my PID = %d\n", (int)getpid());
    return NULL;
}

int main(void)
{
    pthread_t tid;

    printf("Main: my PID = %d\n", (int)getpid());
    /* pthread_create() returns 0 on success, an error number otherwise */
    if (pthread_create(&tid, NULL, thread_fn, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);  /* wait for thread to finish */
    return 0;
}
Expected output:
Main: my PID = 4321
Thread: my PID = 4321 <-- same PID!
Both lines print the same PID, confirming both belong to the same process.
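The PID is shared, but on Linux each thread also has its own kernel thread ID (TID). The sketch below reads it with the raw `SYS_gettid` system call, which also works on C libraries that lack a gettid() wrapper. The helper names `kernel_tid`, `record_tid`, and `compare_tids` are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread (Linux-specific). */
static pid_t kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void *record_tid(void *arg)
{
    *(pid_t *)arg = kernel_tid();  /* store the new thread's TID */
    return NULL;
}

/* Stores the caller's TID and a new thread's TID.
   Returns 0 on success, -1 if the thread could not be created. */
static int compare_tids(pid_t *caller_tid, pid_t *worker_tid)
{
    pthread_t tid;

    *caller_tid = kernel_tid();
    if (pthread_create(&tid, NULL, record_tid, worker_tid) != 0)
        return -1;
    pthread_join(tid, NULL);
    return 0;
}
```

When called from `main()`, the two TIDs should differ, and the caller's TID should equal getpid(), because the main thread's TID is the process ID.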
7. Code Example: Threads Share Global Variables
This example demonstrates that a global variable modified by one thread is immediately visible to another — because they share the same memory space.
/* compile: gcc -pthread -o shared_global shared_global.c */
#include <stdio.h>
#include <pthread.h>

/* Global variable, shared by all threads */
static int shared_counter = 0;

static void *increment_fn(void *arg)
{
    (void)arg;  /* unused */
    /* NOTE: in real code you MUST protect this with a mutex.
       Here we show the sharing concept only. */
    for (int i = 0; i < 100000; i++)
        shared_counter++;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* pthread_create() returns 0 on success, an error number otherwise */
    if (pthread_create(&t1, NULL, increment_fn, NULL) != 0 ||
        pthread_create(&t2, NULL, increment_fn, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Due to the race condition, the result may not be 200000.
       This intentionally shows WHY synchronization is needed. */
    printf("Final counter = %d (expected 200000)\n", shared_counter);
    return 0;
}
What you will notice: The output is often less than 200,000 because both threads read-modify-write shared_counter at the same time without locking. This is the classic race condition. Chapter 30 (Mutexes) fixes this.
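As a preview of that fix, here is a minimal sketch of the same counter protected by a mutex. The names `safe_counter`, `counter_lock`, `locked_increment_fn`, and `run_locked_counter` are hypothetical and chosen so they do not clash with the example above.

```c
#include <pthread.h>

static int safe_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *locked_increment_fn(void *arg)
{
    (void)arg;  /* unused */
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_lock);    /* one thread at a time */
        safe_counter++;                       /* read-modify-write is now safe */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two incrementing threads; returns the final counter,
   or -1 if a thread could not be created. */
static int run_locked_counter(void)
{
    pthread_t t1, t2;

    safe_counter = 0;
    if (pthread_create(&t1, NULL, locked_increment_fn, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, locked_increment_fn, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return safe_counter;
}
```

With the lock held around every increment, the two threads can no longer interleave inside the read-modify-write, so the result is 200000 on every run. The cost is that the threads now take turns on each increment.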
8. Interview Questions
Why is creating a thread faster than calling fork()?
fork() must duplicate page tables, file descriptor tables, and other process-wide data structures. Thread creation (via clone() internally) shares these structures instead of copying them, making it approximately 10× faster.
Do all threads in a process have the same PID?
Yes. All threads in a process share one PID, which is what getpid() returns. However, each thread has its own unique thread ID (TID) assigned by the kernel, accessible via gettid() (Linux-specific). POSIX thread IDs (from pthread_self()) are separate from kernel TIDs.
Next: Learn about Pthreads data types, errno in threads, return values, and how to compile multithreaded programs.
