Linux Pthreads: LinuxThreads vs NPTL

 


Chapter 33 · Sections 33.5 + 33.6


How Linux implemented POSIX threads, what went wrong, and how NPTL fixed it

LinuxThreads: legacy, obsolete

NPTL: modern, used on all current Linux systems

100,000: threads NPTL can handle

Linux 2.6: kernel series that introduced NPTL

Key Terms

LinuxThreads · NPTL · NGPT · clone() syscall · CLONE_THREAD · CLONE_SYSVSEM · futex · thread group · exit_group() · getconf GNU_LIBPTHREAD_VERSION · LD_ASSUME_KERNEL · ABI compatibility

Overview: Two Linux Pthreads Implementations

Linux has had two main implementations of the POSIX Threads (Pthreads) API:

LinuxThreads OBSOLETE

  • Original implementation by Xavier Leroy
  • Worked but violated many POSIX rules
  • Each thread had a different PID (!)
  • Not supported in glibc 2.4 and later
  • Uses manager thread (wasteful)

NPTL (Native POSIX Threads Library) MODERN

  • Written by Ulrich Drepper & Ingo Molnar
  • Close SUSv3 conformance
  • All threads share one PID (correct)
  • Required Linux 2.6 kernel changes
  • Can create 100,000+ threads
Historical footnote: IBM developed a third option called NGPT (Next Generation POSIX Threads) using the M:N model. NGPT outperformed LinuxThreads, but the NPTL developers built an even better 1:1 implementation. NPTL outperformed NGPT, and NGPT development was discontinued.

LinuxThreads — How It Worked OBSOLETE

LinuxThreads implemented threads using the clone() syscall with these flags:

clone(CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND, ...)
/*
 * CLONE_VM      — share virtual memory (heap, globals)
 * CLONE_FILES   — share file descriptor table
 * CLONE_FS      — share filesystem info (cwd, root, umask)
 * CLONE_SIGHAND — share signal handlers/dispositions
 *
 * NOT shared: process ID, parent process ID
 * This is the core problem — each thread got its own PID!
 */

Other LinuxThreads internals:

  • A special manager thread was created automatically to handle thread creation and termination — consuming an extra thread slot.
  • Signals were used for internal operations: the first three real-time signals (or SIGUSR1/SIGUSR2 on older kernels without real-time signals) were consumed by LinuxThreads and could not be used by applications.

LinuxThreads POSIX Nonconformances

LinuxThreads broke many POSIX rules — these are critical to know for interviews:

#  | POSIX requirement                                        | LinuxThreads violation
---|----------------------------------------------------------|------------------------------------------------------------------------
1  | All threads share the same PID                           | getpid() returns a different value in each thread ❌
2  | Any thread can wait() for a child created by any thread  | Only the thread that called fork() can wait() for the child ❌
3  | All threads share credentials (UID/GID)                  | Threads don't share credentials ❌
4  | A process-directed signal goes to one arbitrary thread   | Signals can only target a specific thread (each has its own PID) ❌
5  | Threads share session ID and process group ID            | Session ID and process group ID are not shared ❌
6  | Resource limits are process-wide                         | Resource limits are per-thread ❌
7  | times() and getrusage() return process-wide totals       | They return per-thread values ❌
8  | A new thread inherits NO alternate signal stack          | A new thread inherits its creator's alternate signal stack (crash risk) ❌
9  | fcntl() record locks are process-wide                    | Record locks are not shared between threads ❌
10 | Threads share the nice value and interval timers         | Nice value and interval timers are not shared ❌

NPTL: How It Fixed (Nearly) Everything MODERN

NPTL required significant new kernel features (added in Linux 2.6):

New Kernel Features for NPTL

  • Thread groups (refined) — threads share PID
  • futex — fast user-space mutex mechanism
  • get_thread_area / set_thread_area — thread-local storage
  • exit_group() — terminate all threads
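  • Rewritten kernel scheduler: handles thousands of kernel scheduling entities (KSEs) efficiently
  • Threaded core dumps and debugging support, plus signal-management changes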
  • Extended clone() flags — CLONE_THREAD, CLONE_SYSVSEM, etc.

NPTL creates threads with a much richer clone() call:

clone(CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND |
      CLONE_THREAD | CLONE_SETTLS |
      CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID |
      CLONE_SYSVSEM, ...)

/* Key new flags vs LinuxThreads:
 *
 * CLONE_THREAD      — places new thread in same thread group as creator.
 *                     All threads share the same PID (tgid). ✅
 *
 * CLONE_SYSVSEM     — share System V semaphore undo values ✅
 *
 * CLONE_SETTLS      — set thread-local storage (TLS) descriptor ✅
 *
 * CLONE_PARENT_SETTID — store the new thread's TID in user space ✅
 *
 * CLONE_CHILD_CLEARTID — clear TID in user space on exit
 *                         (enables futex-based pthread_join) ✅
 */

NPTL Thread Group: All Threads Share One PID

Process PID = 1234 (thread group ID: tgid = 1234)
  Main thread: tid = 1234, tgid = 1234, getpid() → 1234 ✅
  Thread 2:    tid = 1235, tgid = 1234, getpid() → 1234 ✅
  Thread 3:    tid = 1236, tgid = 1234, getpid() → 1234 ✅

Under LinuxThreads, each thread would return a DIFFERENT PID from getpid() ❌

Other NPTL improvements:

  • No manager thread — threads are managed directly.
  • Only 2 real-time signals consumed internally (not 3 like LinuxThreads).
  • One signal is used for thread cancellation; the other ensures all threads have consistent user/group IDs after setuid() etc.
  • When ps(1) is run on a multithreaded NPTL process, only one line appears. Use ps -L to see individual threads.
Remaining NPTL nonconformance (at time of writing): Threads still don’t share a nice value. (Kernels before 2.6.16 also had issues with alternate signal stacks and setsid()/setpgid() restrictions, but these were fixed.)

What Is a futex?

A futex (Fast User-space muTEX) is a kernel mechanism designed to make common locking cases very fast. The key insight is:

  • Uncontended case (no other thread waiting): lock/unlock using just an atomic CPU instruction in user space — no kernel syscall needed at all. Extremely fast.
  • Contended case (another thread is waiting): the futex makes a kernel syscall to park/wake threads. This is the slow path.

NPTL uses futexes to implement pthread mutexes and condition variables. This is why NPTL mutex operations are much faster than LinuxThreads’ signal-based approach.

futex: Fast Uncontended Lock

Uncontended: atomic CAS in user space → no syscall → nanosecond speed ⚡
vs
Contended: futex() syscall to block → kernel wakes waiter → slower but correct 🔒

Detecting the Threading Implementation at Runtime

On modern systems, NPTL is the only option (glibc 2.4+ dropped LinuxThreads). But you can still query which implementation is present:

# Method 1: shell command (glibc 2.3.2 or later)
$ getconf GNU_LIBPTHREAD_VERSION
NPTL 2.17

/* Method 2: C program using confstr() */
/*
 * ep_detect_threads.c: detect the threading implementation at runtime
 * Compile: gcc -o ep_detect_threads ep_detect_threads.c -lpthread
 */
#define _GNU_SOURCE              /* for _CS_GNU_LIBPTHREAD_VERSION */
#include <stdio.h>
#include <unistd.h>              /* confstr(), sysconf(), getpid() */
#include <gnu/libc-version.h>    /* gnu_get_libc_version(): glibc-specific */

int main(void)
{
    /* confstr() with a glibc-specific name; prints e.g. "NPTL 2.17" */
    char buf[256];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf));
    if (n > 0)
        printf("Threading implementation: %s\n", buf);
    else
        printf("confstr(_CS_GNU_LIBPTHREAD_VERSION) not available\n");

    /* glibc version: 2.4 and later ship only NPTL */
    printf("glibc version: %s\n", gnu_get_libc_version());

    /* Context only, not an implementation check: CPUs available to threads */
    printf("Online CPUs: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));

    /* One getpid() call proves nothing on its own; under NPTL every thread
     * would print this same value (compare it across threads to verify) */
    printf("PID from main thread: %d\n", getpid());

    /* Shell equivalents:
     *   getconf GNU_LIBPTHREAD_VERSION
     *   ldd /bin/ls | grep libpthread
     *       (glibc < 2.34 only; libpthread was merged into libc in 2.34)
     *   /lib/x86_64-linux-gnu/libpthread.so.0
     *       (older glibc: executing the library directly prints its banner)
     */
    return 0;
}

Selecting the Threading Implementation

On older systems that had both LinuxThreads and NPTL, you could force a specific implementation using the LD_ASSUME_KERNEL environment variable:

# Force LinuxThreads (pretend kernel is old enough that NPTL isn't supported)
$ LD_ASSUME_KERNEL=2.2.5 ./my_program

# Verify which implementation is being used
$ export LD_ASSUME_KERNEL=2.2.5
$ $(ldd /bin/ls | grep libc.so | awk '{print $3}') | egrep -i 'threads|nptl'
  linuxthreads-0.10 by Xavier Leroy

# Without LD_ASSUME_KERNEL:
$ $(ldd /bin/ls | grep libc.so | awk '{print $3}') | egrep -i 'threads|nptl'
  Native POSIX Threads Library by Ulrich Drepper et al

# Note: Specifying kernel version 2.2.5 is enough to ensure LinuxThreads
# is selected, since NPTL requires Linux 2.6+
On modern systems (glibc 2.4+), LinuxThreads is no longer provided and this distinction is only historically relevant. All modern Linux programs use NPTL.

Code Example 1 — Verify NPTL: All Threads Share Same PID

This is the simplest way to prove you’re running on NPTL: under NPTL, getpid() returns the same value in all threads. Under LinuxThreads, it would differ.

/*
 * ep_nptl_verify.c — Verify NPTL by checking all threads share the same PID
 * Compile: gcc -o ep_nptl_verify ep_nptl_verify.c -lpthread
 *
 * Under NPTL:         all getpid() calls return the same value ✅
 * Under LinuxThreads: each thread returns a different PID ❌
 */
#define _GNU_SOURCE      /* expose syscall() even under strict -std= modes */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* gettid() is not in glibc before 2.30 — use raw syscall */
static pid_t gettid_compat(void)
{
    return (pid_t)syscall(SYS_gettid);
}

void *thread_func(void *arg)
{
    int thread_num = *(int *)arg;
    pid_t pid = getpid();       /* Process ID (should be same for all under NPTL) */
    pid_t tid = gettid_compat(); /* Kernel thread ID (unique per thread always) */

    printf("[Thread %d] PID = %d, TID (kernel) = %d, pthread_self = %lu\n",
           thread_num, pid, tid, (unsigned long)pthread_self());

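    /* Only the thread group leader (the main thread) has TID == PID,
     * so under NPTL this never prints for threads created here. */
    if (pid == tid)
        printf("[Thread %d] PID == TID (this is the thread group leader)\n",
               thread_num);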
    return NULL;
}

int main(void)
{
    pthread_t tids[4];
    int ids[4];

    printf("[Main] PID = %d, TID = %d\n", getpid(), gettid_compat());
    printf("[Main] Creating 4 threads...\n\n");

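    for (int i = 0; i < 4; i++) {
        ids[i] = i + 1;
        int ret = pthread_create(&tids[i], NULL, thread_func, &ids[i]);
        if (ret != 0) {   /* pthread functions return an error number, not -1/errno */
            fprintf(stderr, "pthread_create failed (error %d)\n", ret);
            exit(EXIT_FAILURE);
        }
    }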
    for (int i = 0; i < 4; i++)
        pthread_join(tids[i], NULL);

    printf("\n[Main] On NPTL: all threads show the SAME PID ✅\n");
    printf("[Main] Each thread has a UNIQUE kernel TID.\n");
    printf("[Main] Under LinuxThreads (obsolete): each would show different PID ❌\n");
    return 0;
}

Expected NPTL output (PID, TID and pthread_self values, and the order of thread lines, vary between runs):

[Main] PID = 4521, TID = 4521
[Main] Creating 4 threads...

[Thread 1] PID = 4521, TID (kernel) = 4522, pthread_self = 140234567...
[Thread 2] PID = 4521, TID (kernel) = 4523, pthread_self = 140234123...
[Thread 3] PID = 4521, TID (kernel) = 4524, pthread_self = 140233789...
[Thread 4] PID = 4521, TID (kernel) = 4525, pthread_self = 140233456...

[Main] On NPTL: all threads show the SAME PID ✅
[Main] Each thread has a UNIQUE kernel TID.
[Main] Under LinuxThreads (obsolete): each would show different PID ❌

Code Example 2 — NPTL Scale Test: Create Many Threads

NPTL can handle 100,000+ threads; LinuxThreads was limited to a few thousand. This example stress-tests thread creation at a deliberately modest scale (1000 threads) that should be safe on most systems.

/*
 * ep_nptl_scale.c — Create many threads to demonstrate NPTL scalability
 * Compile: gcc -O2 -o ep_nptl_scale ep_nptl_scale.c -lpthread
 *
 * CAUTION: Don't run with NUM_THREADS too large without checking your
 * system's thread limit: cat /proc/sys/kernel/threads-max
 * ulimit -u   (max user processes)
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>   /* strerror() */
#include <time.h>

#define NUM_THREADS   1000   /* safe for most systems */

static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static long active_threads = 0;
static long max_simultaneous = 0;

void *tiny_thread(void *arg)
{
    pthread_mutex_lock(&count_mutex);
    active_threads++;
    if (active_threads > max_simultaneous)
        max_simultaneous = active_threads;
    pthread_mutex_unlock(&count_mutex);

    usleep(1000);  /* 1ms — let many threads overlap */

    pthread_mutex_lock(&count_mutex);
    active_threads--;
    pthread_mutex_unlock(&count_mutex);
    return NULL;
}

int main(void)
{
    pthread_t *tids;
    struct timespec t_start, t_end;
    pthread_attr_t attr;

    /* Use a reduced stack so that more threads fit in the address space.
     * Some platforms have PTHREAD_STACK_MIN > 64 KB; then keep the default. */
    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, 64 * 1024) != 0)
        fprintf(stderr, "64 KB stacks not allowed here; using default stack size\n");

    tids = malloc(NUM_THREADS * sizeof(pthread_t));
    if (!tids) { perror("malloc"); exit(1); }

    printf("Creating %d threads (stack: 64 KB each)...\n", NUM_THREADS);
    clock_gettime(CLOCK_MONOTONIC, &t_start);

    int created = 0;   /* number of threads actually created */
    for (int i = 0; i < NUM_THREADS; i++) {
        int ret = pthread_create(&tids[i], &attr, tiny_thread, NULL);
        if (ret != 0) {
            fprintf(stderr, "pthread_create failed at thread %d: %s\n",
                    i, strerror(ret));
            break;     /* stop here; join only the threads that exist */
        }
        created++;
    }

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &t_end);
    double elapsed = (t_end.tv_sec - t_start.tv_sec)
                   + (t_end.tv_nsec - t_start.tv_nsec) / 1e9;

    printf("Created and joined %d threads in %.3f seconds\n",
           created, elapsed);
    printf("Max simultaneous active threads: %ld\n", max_simultaneous);
    if (created > 0)
        printf("Average wall-clock time per thread (create + 1 ms sleep + join, overlapped): %.1f µs\n",
               elapsed * 1e6 / created);
    printf("\nNPTL creators ran 100,000 threads successfully.\n");
    printf("LinuxThreads' practical limit was a few thousand.\n");

    pthread_attr_destroy(&attr);
    free(tids);
    return 0;
}

Section 33.6 — Advanced Pthreads Features (Overview)

Beyond the basics, the Pthreads API includes advanced synchronisation and scheduling features:

Realtime Scheduling

Set realtime scheduling policies (SCHED_FIFO, SCHED_RR) and priorities for individual threads using pthread_setschedparam(). Similar to process-level sched_setscheduler() but per-thread.

Process-Shared Mutexes

Mutexes and condition variables can be shared between different processes (not just threads in the same process) if placed in shared memory. Set with PTHREAD_PROCESS_SHARED attribute. NPTL supports this.

Advanced Synchronisation

Barriers (pthread_barrier_*): synchronise N threads at a common point. RW locks (pthread_rwlock_*): multiple readers, exclusive writer. Spin locks (pthread_spinlock_*): busy-wait locking for very short critical sections.

Interview Questions

Q1. What was the fundamental problem with LinuxThreads that caused so many POSIX nonconformances?
The core problem was that LinuxThreads used clone() without the CLONE_THREAD flag. This meant each thread was a separate process from the kernel’s perspective — each got its own unique PID. POSIX requires all threads in a process to share the same PID. This one architectural mistake cascaded into many nonconformances: signals couldn’t be properly delivered process-wide, wait() didn’t work across threads, credentials weren’t shared, and so on.
Q2. What is the CLONE_THREAD flag in NPTL’s clone() call and what problem does it solve?
CLONE_THREAD places the new thread in the same thread group as the creator. All threads in a thread group share the same thread group ID (tgid), which is what getpid() returns. This means all NPTL threads return the same PID from getpid() — conforming to POSIX. The kernel-level thread ID (gettid()) remains unique per thread, but the process-level PID is shared.
Q3. What is a futex and why is it important for NPTL performance?
A futex (Fast User-space muTEX) is a kernel synchronisation primitive that is fast in the common (uncontended) case. When a mutex is unlocked and no other thread is waiting, the lock/unlock can be done with a single atomic CPU instruction entirely in user space — no kernel system call. Only when there is contention (another thread is waiting) does a kernel syscall occur. NPTL uses futexes for mutexes and condition variables, making them much faster than LinuxThreads’ signal-based approach.
Q4. How do you check which Pthreads implementation is in use on a Linux system?
The simplest method is getconf GNU_LIBPTHREAD_VERSION in the shell, which prints something like “NPTL 2.17”. Programmatically, use confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf)); both are glibc-specific. On older systems, you could execute the glibc shared library directly ($(ldd /bin/ls | grep libc.so | awk '{print $3}')) and grep the output for “nptl” or “threads”.
Q5. What kernel changes were needed to support NPTL, and when were they added?
NPTL required Linux 2.6 kernel changes including: refined thread group implementation (so all threads share a PID), futex support (for fast synchronisation), new system calls get_thread_area/set_thread_area for thread-local storage, exit_group() to terminate all threads at once, support for threaded core dumps and debugging, signal management improvements, and an extended clone() syscall with new flags like CLONE_THREAD and CLONE_SYSVSEM.
Q6. What does ps(1) show for a multithreaded process under NPTL vs LinuxThreads?
Under NPTL: ps shows only a single line for the whole process (since all threads share one PID). To see individual threads, use ps -L which shows each thread’s LWP (lightweight process) ID. Under LinuxThreads: ps shows a separate line for each thread (since they had different PIDs), plus the manager thread — making it appear as if there are multiple processes.
Q7. Is NPTL ABI-compatible with LinuxThreads? What does that mean?
Yes, NPTL is designed to be ABI-compatible with LinuxThreads. This means programs compiled and linked against a glibc that used LinuxThreads do not need to be recompiled to run with NPTL. The binary interface (function signatures, library entry points, data structures) is the same. However, some runtime behaviour may change when running with NPTL because NPTL adheres more closely to POSIX — for example, getpid() now returns the same value in all threads rather than different values.

Chapter 33 — Complete Summary

  • Thread stacks: Each non-main thread gets a fixed-size stack (default 2 MB on Linux/x86-32; the default varies by architecture and configuration). Resize with pthread_attr_setstacksize().
  • Signals & threads: Dispositions are process-wide; masks are per-thread. Use sigwait() pattern for safe async signal handling.
  • fork() in threads: Only the calling thread survives. fork+exec is the safe pattern. Use pthread_atfork() for fork without exec.
  • exec() / exit(): All threads vanish. pthread_exit() is per-thread. exit() kills all.
  • Thread models: M:1 (user, fast but limited), 1:1 (kernel, Linux uses this), M:N (complex, rejected for NPTL).
  • LinuxThreads: Obsolete. Each thread had its own PID. Many POSIX violations.
  • NPTL: Modern. All threads share PID via CLONE_THREAD. Uses futexes. Handles 100,000+ threads.

You’ve Completed Chapter 33!

EmbeddedPathashala — free embedded systems and Linux programming education for students and freshers.

