Chapter 33 · Sections 33.5 + 33.6
Linux Pthreads: LinuxThreads vs NPTL
How Linux implemented POSIX threads, what went wrong, and how NPTL fixed it
Overview: Two Linux Pthreads Implementations
Linux has had two main implementations of the POSIX Threads (Pthreads) API:
LinuxThreads OBSOLETE
- Original implementation by Xavier Leroy
- Worked but violated many POSIX rules
- Each thread had a different PID (!)
- Not supported in glibc 2.4 and later
- Used an extra manager thread (wasteful)
NPTL (Native POSIX Threads Library) MODERN
- Written by Ulrich Drepper & Ingo Molnar
- Close SUSv3 conformance
- All threads share one PID (correct)
- Required Linux 2.6 kernel changes
- Can create 100,000+ threads
LinuxThreads — How It Worked OBSOLETE
LinuxThreads implemented threads using the clone() syscall with these flags:
clone(CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND, ...)
/*
* CLONE_VM — share virtual memory (heap, globals)
* CLONE_FILES — share file descriptor table
* CLONE_FS — share filesystem info (cwd, root, umask)
* CLONE_SIGHAND — share signal handlers/dispositions
*
* NOT shared: process ID, parent process ID
* This is the core problem — each thread got its own PID!
*/
Other LinuxThreads internals:
- A special manager thread was created automatically to handle thread creation and termination — consuming an extra thread slot.
- Signals were used for internal operations. LinuxThreads consumed the first three real-time signals (or SIGUSR1 and SIGUSR2 on older kernels without real-time signals), and applications could not use them.
LinuxThreads POSIX Nonconformances
LinuxThreads broke many POSIX rules — these are critical to know for interviews:
| # | POSIX Requirement | LinuxThreads Violation |
|---|---|---|
| 1 | All threads share the same PID | getpid() returns a different value in each thread ❌ |
| 2 | Any thread can wait() for a child created by any thread | Only the thread that called fork() can wait() for the child ❌ |
| 3 | All threads share credentials (UID/GID) | Threads don’t share credentials ❌ |
| 4 | Process-directed signal goes to one arbitrary thread | Can only target a specific thread (due to different PIDs) ❌ |
| 5 | Threads share session ID and process group ID | Don’t share session/group ID ❌ |
| 6 | Resource limits are process-wide | Resource limits are per-thread ❌ |
| 7 | times() and getrusage() return process-wide totals | Return per-thread values ❌ |
| 8 | New thread inherits NO alternate signal stack | New thread inherits creator’s alternate signal stack → crash risk ❌ |
| 9 | fcntl() record locks are process-wide | Locks not shared between threads ❌ |
| 10 | Threads share nice value and interval timers | Nice value and interval timers not shared ❌ |
NPTL — How It Fixed Nearly Everything MODERN
NPTL required significant new kernel features (added in Linux 2.6):
New Kernel Features for NPTL
- Thread groups (refined) — threads share PID
- futex — fast user-space mutex mechanism
- get_thread_area / set_thread_area — thread-local storage
- exit_group() — terminate all threads
- Rewritten kernel scheduler, which handles thousands of kernel scheduling entities (KSEs) efficiently
- Extended clone() flags — CLONE_THREAD, CLONE_SYSVSEM, etc.
NPTL creates threads with a much richer clone() call:
clone(CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND |
CLONE_THREAD | CLONE_SETTLS |
CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID |
CLONE_SYSVSEM, ...)
/* Key new flags vs LinuxThreads:
*
* CLONE_THREAD — places new thread in same thread group as creator.
* All threads share the same PID (tgid). ✅
*
* CLONE_SYSVSEM — share System V semaphore undo values ✅
*
* CLONE_SETTLS — set thread-local storage (TLS) descriptor ✅
*
* CLONE_PARENT_SETTID — store the new thread's TID in user space ✅
*
* CLONE_CHILD_CLEARTID — clear TID in user space on exit
* (enables futex-based pthread_join) ✅
*/
NPTL Thread Group: All Threads Share One PID
Other NPTL improvements:
- No manager thread — threads are managed directly.
- Only the first two real-time signals are used internally (LinuxThreads used three).
- One of these signals is used for thread cancellation. The other ensures that all threads keep consistent user and group IDs after setuid() and related calls.
- When ps(1) is run on a multithreaded NPTL process, only one line appears. Use ps -L to see individual threads.
(NPTL still has one nonconformance: threads do not share a common nice value. Early 2.6 kernels also had some setsid()/setpgid() restrictions, but these were fixed.)
What Is a futex?
A futex (Fast User-space muTEX) is a kernel mechanism designed to make common locking cases very fast. The key insight is:
- Uncontended case (no other thread waiting): lock/unlock using just an atomic CPU instruction in user space — no kernel syscall needed at all. Extremely fast.
- Contended case (another thread is waiting): the futex makes a kernel syscall to park/wake threads. This is the slow path.
NPTL uses futexes to implement pthread mutexes and condition variables. Because the uncontended case never enters the kernel, NPTL mutex operations are much faster than LinuxThreads’ signal-based approach.
Figure: futex fast path vs. slow path. Uncontended: no syscall, nanosecond-scale. Contended: the kernel wakes the waiter, which is slower but correct.
Detecting the Threading Implementation at Runtime
On modern systems, NPTL is the only option (glibc 2.4+ dropped LinuxThreads). But you can still query which implementation is present:
# Method 1: shell command (glibc 2.3.2 or later)
$ getconf GNU_LIBPTHREAD_VERSION
NPTL 2.17
/* Method 2: C program using confstr() */
/*
 * ep_detect_threads.c — Detect threading implementation at runtime
 * Compile: gcc -o ep_detect_threads ep_detect_threads.c -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <gnu/libc-version.h>

int main(void)
{
    /* (a) confstr(): glibc-specific; prints e.g. "NPTL 2.17" */
    char buf[256];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf));
    if (n > 0)
        printf("Threading implementation: %s\n", buf);
    else
        printf("confstr(_CS_GNU_LIBPTHREAD_VERSION) not available\n");

    /* (b) glibc version: glibc 2.4 and later provide only NPTL */
    printf("glibc version: %s\n", gnu_get_libc_version());

    /* Not a detection method, just context: CPUs available to threads */
    printf("Online CPUs: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));

    /* PID of the main thread; under NPTL, getpid() in every other thread
     * returns this same value (Code Example 1 below checks this) */
    printf("PID from main thread: %d\n", getpid());

    /* Equivalent shell checks:
     *   getconf GNU_LIBPTHREAD_VERSION
     *   ldd /bin/ls | grep libpthread
     *   /lib/x86_64-linux-gnu/libpthread.so.0   (direct execution)
     */
    return 0;
}
Selecting the Threading Implementation
On older systems that had both LinuxThreads and NPTL, you could force a specific implementation using the LD_ASSUME_KERNEL environment variable:
# Force LinuxThreads (pretend kernel is old enough that NPTL isn't supported)
$ LD_ASSUME_KERNEL=2.2.5 ./my_program
# Verify which implementation is being used
$ export LD_ASSUME_KERNEL=2.2.5
$ $(ldd /bin/ls | grep libc.so | awk '{print $3}') | egrep -i 'threads|nptl'
linuxthreads-0.10 by Xavier Leroy
# Without LD_ASSUME_KERNEL (run 'unset LD_ASSUME_KERNEL' first):
$ $(ldd /bin/ls | grep libc.so | awk '{print $3}') | egrep -i 'threads|nptl'
Native POSIX Threads Library by Ulrich Drepper et al
# Note: Specifying kernel version 2.2.5 is enough to ensure LinuxThreads
# is selected, since NPTL requires Linux 2.6+
Code Example 1 — Verify NPTL: All Threads Share Same PID
This is the simplest way to prove you’re running on NPTL: under NPTL, getpid() returns the same value in all threads. Under LinuxThreads, it would differ.
/*
* ep_nptl_verify.c — Verify NPTL by checking all threads share the same PID
* Compile: gcc -o ep_nptl_verify ep_nptl_verify.c -lpthread
*
* Under NPTL: all getpid() calls return the same value ✅
* Under LinuxThreads: each thread returns a different PID ❌
*/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* gettid() is not in glibc before 2.30, so use the raw syscall */
static pid_t gettid_compat(void)
{
    return (pid_t)syscall(SYS_gettid);
}

void *thread_func(void *arg)
{
    int thread_num = *(int *)arg;
    pid_t pid = getpid();          /* Process ID (same for all threads under NPTL) */
    pid_t tid = gettid_compat();   /* Kernel thread ID (always unique per thread) */

    printf("[Thread %d] PID = %d, TID (kernel) = %d, pthread_self = %lu\n",
           thread_num, pid, tid, (unsigned long)pthread_self());

    /* Only the main thread (the thread-group leader) has TID == PID,
     * so this should never print for a thread made by pthread_create() */
    if (pid == tid)
        printf("[Thread %d] Unexpected: PID == TID\n", thread_num);
    return NULL;
}

int main(void)
{
    pthread_t tids[4];
    int ids[4];

    printf("[Main] PID = %d, TID = %d\n", getpid(), gettid_compat());
    printf("[Main] Creating 4 threads...\n\n");

    for (int i = 0; i < 4; i++) {
        ids[i] = i + 1;
        int s = pthread_create(&tids[i], NULL, thread_func, &ids[i]);
        if (s != 0) {
            fprintf(stderr, "pthread_create failed (error %d)\n", s);
            exit(EXIT_FAILURE);
        }
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tids[i], NULL);

    printf("\n[Main] On NPTL: all threads show the SAME PID ✅\n");
    printf("[Main] Each thread has a UNIQUE kernel TID.\n");
    printf("[Main] Under LinuxThreads (obsolete): each would show a different PID ❌\n");
    return 0;
}
Expected NPTL output (thread order and ID values vary from run to run):
[Main] PID = 4521, TID = 4521
[Main] Creating 4 threads...
[Thread 1] PID = 4521, TID (kernel) = 4522, pthread_self = 140234567...
[Thread 2] PID = 4521, TID (kernel) = 4523, pthread_self = 140234123...
[Thread 3] PID = 4521, TID (kernel) = 4524, pthread_self = 140233789...
[Thread 4] PID = 4521, TID (kernel) = 4525, pthread_self = 140233456...
[Main] On NPTL: all threads show the SAME PID ✅
[Main] Each thread has a UNIQUE kernel TID.
Code Example 2 — NPTL Scale Test: Create Many Threads
NPTL can handle 100,000+ threads. LinuxThreads was limited to a few thousand. This example stress-tests thread creation at a scale that should be safe on most systems.
/*
* ep_nptl_scale.c — Create many threads to demonstrate NPTL scalability
* Compile: gcc -O2 -o ep_nptl_scale ep_nptl_scale.c -lpthread
*
* CAUTION: Don't run with NUM_THREADS too large without checking your
* system's thread limit: cat /proc/sys/kernel/threads-max
* ulimit -u (max user processes)
*/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NUM_THREADS 1000   /* safe for most systems */

static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static long active_threads = 0;
static long max_simultaneous = 0;

void *tiny_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&count_mutex);
    active_threads++;
    if (active_threads > max_simultaneous)
        max_simultaneous = active_threads;
    pthread_mutex_unlock(&count_mutex);

    usleep(1000);   /* 1 ms, so that many threads overlap */

    pthread_mutex_lock(&count_mutex);
    active_threads--;
    pthread_mutex_unlock(&count_mutex);
    return NULL;
}

int main(void)
{
    pthread_t *tids;
    struct timespec t_start, t_end;
    pthread_attr_t attr;
    int created = 0;   /* threads actually created (may be < NUM_THREADS) */
    int ret;

    /* Use a reduced stack (64 KB, above PTHREAD_STACK_MIN) to allow more threads */
    pthread_attr_init(&attr);
    ret = pthread_attr_setstacksize(&attr, 64 * 1024);
    if (ret != 0) {
        fprintf(stderr, "pthread_attr_setstacksize: %s\n", strerror(ret));
        exit(EXIT_FAILURE);
    }

    tids = malloc(NUM_THREADS * sizeof(pthread_t));
    if (!tids) { perror("malloc"); exit(EXIT_FAILURE); }

    printf("Creating %d threads (stack: 64 KB each)...\n", NUM_THREADS);
    clock_gettime(CLOCK_MONOTONIC, &t_start);

    for (int i = 0; i < NUM_THREADS; i++) {
        ret = pthread_create(&tids[i], &attr, tiny_thread, NULL);
        if (ret != 0) {
            fprintf(stderr, "pthread_create failed at thread %d: %s\n",
                    i, strerror(ret));
            break;   /* stop creating; join only the threads that exist */
        }
        created++;
    }

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &t_end);
    double elapsed = (t_end.tv_sec - t_start.tv_sec)
                   + (t_end.tv_nsec - t_start.tv_nsec) / 1e9;

    printf("Created and joined %d threads in %.3f seconds\n", created, elapsed);
    printf("Max simultaneous active threads: %ld\n", max_simultaneous);
    if (created > 0)
        printf("Avg time per thread create+join: %.1f µs\n",
               elapsed * 1e6 / created);
    printf("\nNPTL's developers ran 100,000 threads successfully.\n");
    printf("LinuxThreads' practical limit was a few thousand.\n");

    pthread_attr_destroy(&attr);
    free(tids);
    return 0;
}
Section 33.6 — Advanced Pthreads Features (Overview)
Beyond the basics, the Pthreads API includes advanced synchronisation and scheduling features:
Realtime Scheduling
Set realtime scheduling policies (SCHED_FIFO, SCHED_RR) and priorities for individual threads using pthread_setschedparam(). Similar to process-level sched_setscheduler() but per-thread.
Process-Shared Mutexes
Mutexes and condition variables can be shared between different processes (not just threads in the same process) if placed in shared memory. Set with PTHREAD_PROCESS_SHARED attribute. NPTL supports this.
Advanced Synchronisation
Barriers (pthread_barrier_*): synchronise N threads at a common point. RW locks (pthread_rwlock_*): multiple readers, exclusive writer. Spin locks (pthread_spinlock_*): busy-wait locking for very short critical sections.
Interview Questions
- LinuxThreads and PIDs: LinuxThreads created each thread using clone() without the CLONE_THREAD flag. From the kernel’s perspective, each thread was therefore a separate process with its own unique PID. POSIX requires all threads in a process to share the same PID. This one architectural mistake cascaded into many nonconformances: signals couldn’t be properly delivered process-wide, wait() didn’t work across threads, credentials weren’t shared, and so on.
- CLONE_THREAD: CLONE_THREAD places the new thread in the same thread group as the creator. All threads in a thread group share the same thread group ID (tgid), which is what getpid() returns. As a result, all NPTL threads return the same PID from getpid(), as POSIX requires. The kernel-level thread ID (gettid()) remains unique per thread, but the process-level PID is shared.
- Detecting the implementation: Run getconf GNU_LIBPTHREAD_VERSION in the shell, which prints something like “NPTL 2.17”. Programmatically, use confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf)). On older systems, you could execute the glibc shared library directly ($(ldd /bin/ls | grep libc.so | awk '{print $3}')) and grep the output for “nptl” or “threads”.
- Kernel support for NPTL: NPTL needed refined thread groups, futexes, get_thread_area/set_thread_area for thread-local storage, exit_group() to terminate all threads at once, and a rewritten scheduler. It also needed support for threaded core dumps and debugging, signal management improvements, and an extended clone() syscall with new flags like CLONE_THREAD and CLONE_SYSVSEM.
- Threads in ps output: Under NPTL, ps shows only a single line for the whole process, since all threads share one PID. To see individual threads, use ps -L, which shows each thread’s LWP (lightweight process) ID. Under LinuxThreads, ps showed a separate line for each thread (since they had different PIDs), plus the manager thread. This made it look as if there were multiple processes.
- Moving from LinuxThreads to NPTL: one visible change is that getpid() now returns the same value in all threads, rather than a different value in each.
Chapter 33 — Complete Summary
- Thread stacks: Each thread’s stack size is fixed when the thread is created. On x86-32 the default for non-main threads is 2 MB. Resize with pthread_attr_setstacksize().
- Signals & threads: Dispositions are process-wide; masks are per-thread. Use the sigwait() pattern for safe asynchronous signal handling.
- fork() in threads: Only the calling thread survives in the child. fork() followed by exec() is the safe pattern. Use pthread_atfork() for fork() without exec().
- exec() / exit(): All threads vanish. pthread_exit() ends only the calling thread; exit() ends them all.
- Thread models: M:1 (user-level, fast but limited), 1:1 (kernel-level, used by Linux), M:N (complex, rejected for NPTL).
- LinuxThreads: Obsolete. Each thread had its own PID. Many POSIX violations.
- NPTL: Modern. All threads share PID via CLONE_THREAD. Uses futexes. Handles 100,000+ threads.
EmbeddedPathashala — free embedded systems and Linux programming education for students and freshers.
