Thread Implementation Models

 

← Process Control Section 33.4 · Implementation Models Next: LinuxThreads vs NPTL →

Chapter 33 · Section 33.4

Thread Implementation Models

M:1 (user-level), 1:1 (kernel-level), and M:N (two-level) thread models explained

M:1User-level threads

1:1Kernel-level (Linux)

M:NTwo-level (complex)

Key Terms

KSE (Kernel Scheduling Entity) M:1 Model 1:1 Model M:N Model User-level threads Kernel-level threads Two-level model Blocking syscall Scheduler scalability

What Is a Kernel Scheduling Entity (KSE)?

A Kernel Scheduling Entity (KSE) is the unit of work that the kernel’s scheduler manages. The kernel allocates CPU time to KSEs. In traditional single-threaded Unix, a KSE and a process were the same thing.

When threading was added to Unix, a new question arose: how should user-space threads (pthreads) map onto these kernel-visible KSEs? The answer defines the thread implementation model.

The Core Question: How do N user threads map to kernel KSEs?

User Space
T1
T2
T3
T4
Kernel Space
KSE
or
KSE
KSE
KSE
KSE
or
KSE
KSE

4 user threads → 1 KSE (M:1) | 4 threads → 4 KSEs (1:1) | 4 threads → 2 KSEs (M:N)

M:1 Model — Many-to-One (User-Level Threads)

In the M:1 model, all threads map onto a single kernel scheduling entity. The kernel sees only one process. All thread management (creation, scheduling, synchronisation) happens entirely in user space via the threading library.

M:1 Model: All threads share one kernel view

USER SPACE — Threading Library
Thread 1
Thread 2
Thread 3
Thread 4
All threads multiplexed on ONE kernel context
↓ single kernel view ↓
KERNEL — 1 KSE (single process)

✅ Advantages

  • Very fast thread operations (no kernel mode switch)
  • Mutex/lock operations happen entirely in user space
  • Easy to port across different OS platforms

❌ Disadvantages

  • One blocking syscall (e.g., read()) blocks ALL threads
  • Cannot use multiple CPUs/cores in parallel
  • Cannot assign different priorities to threads across processes
The blocking syscall problem: When a thread calls read() and it blocks, the kernel sees only one process and puts the entire process to sleep. All other user-level threads in that process are also frozen, even if they have work to do and the CPU is available. This is the fatal flaw of M:1.

1:1 Model — One-to-One (Kernel-Level Threads)

In the 1:1 model, each user thread maps to its own KSE. The kernel knows about every thread and schedules them individually.

1:1 Model: Each thread gets its own KSE

USER SPACE
Thread 1
Thread 2
Thread 3
Thread 4
↓    ↓    ↓    ↓
KERNEL — 4 independent KSEs

KSE 1
KSE 2
KSE 3
KSE 4

✅ Advantages

  • Blocking syscall only blocks the calling thread
  • Kernel can schedule threads on different CPU cores in parallel
  • True parallel execution on multicore systems

❌ Disadvantages

  • Thread creation/switching requires kernel mode switch (slower than M:1)
  • Applications with thousands of threads can overload the kernel scheduler
  • More kernel resources per thread
Linux’s choice: Both LinuxThreads and NPTL use the 1:1 model. NPTL invested significant work rewriting the kernel scheduler to handle thousands of KSEs efficiently, eliminating the main disadvantage of the 1:1 model at scale.

M:N Model — Many-to-Many (Two-Level Model)

The M:N model tries to combine the best of M:1 and 1:1. Multiple user threads map onto fewer KSEs. The library multiplexes user threads onto a pool of KSEs.

M:N Model: Many threads, fewer KSEs

USER SPACE — 6 Threads
T1
T2
T3
T4
T5
T6
Library + kernel cooperate to schedule
KERNEL — 3 KSEs (2 or 4 CPU cores)

KSE 1
KSE 2
KSE 3

✅ Advantages (Theoretical)

  • Fewer KSEs → less kernel overhead for many threads
  • Blocking syscall can block just one KSE; library wakes another
  • Parallel execution across multiple CPUs

❌ Disadvantages

  • Extremely complex to implement correctly
  • Library and kernel must cooperate and share scheduling info
  • Signal management is complicated
  • On Linux, the scheduler was rewritten to handle 1:1 at scale anyway
Why Linux rejected M:N for NPTL: M:N was initially considered for NPTL but rejected because it required too many wide-ranging kernel changes. The Linux kernel scheduler was instead optimised to handle the 1:1 model efficiently even with thousands of threads — making M:N complexity unnecessary.

Comparison Table

Feature M:1 (User) 1:1 (Kernel) M:N (Two-level)
Kernel awareness of threads ❌ None ✅ Full ⚠️ Partial
Thread create/switch speed ⚡ Very fast 🐢 Slower (syscall) ⚡ Fast (user)
Blocking syscall effect Blocks ALL threads Only calling thread One KSE blocked
True multicore parallelism ❌ No ✅ Yes ✅ Yes
Implementation complexity Low Medium Very High
Scalability (many threads) Limited (1 CPU) Good (NPTL optimised) Good (theoretically)
Used by Linux? ❌ No ✅ Yes (NPTL + LinuxThreads) ❌ Rejected for NPTL

Code Example 1 — 1:1 Model: True Parallel Execution on Multiple Cores

On Linux (1:1 model), threads actually run on different CPU cores simultaneously. This example uses CPU affinity to pin threads to specific cores and verifies parallel execution.

/*
 * ep_parallel_threads.c — Show true parallelism (1:1 model, multiple cores)
 * Compile: gcc -o ep_parallel_threads ep_parallel_threads.c -lpthread
 *
 * On a multicore system, you'll see both threads running simultaneously.
 * On M:1, only one would run at a time.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <time.h>

static volatile long counter_a = 0;
static volatile long counter_b = 0;

struct thread_arg {
    int cpu_id;        /* which CPU to run on */
    volatile long *counter; /* which counter to increment */
    const char *name;
};

void *cpu_bound_thread(void *arg)
{
    struct thread_arg *ta = (struct thread_arg *)arg;
    cpu_set_t cpuset;

    /* Pin this thread to a specific CPU */
    CPU_ZERO(&cpuset);
    CPU_SET(ta->cpu_id, &cpuset);
    int ret = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
    if (ret != 0)
        printf("[%s] Warning: could not set CPU affinity\n", ta->name);
    else
        printf("[%s] Pinned to CPU %d\n", ta->name, ta->cpu_id);

    /* Busy-work loop: count as fast as possible for 1 second */
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (1) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (now.tv_sec - start.tv_sec)
                       + (now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed >= 1.0) break;
        (*ta->counter)++;
    }
    printf("[%s] Counted to: %ld\n", ta->name, *ta->counter);
    return NULL;
}

int main(void)
{
    pthread_t tid_a, tid_b;
    struct thread_arg arg_a = { .cpu_id = 0, .counter = &counter_a, .name = "ThreadA" };
    struct thread_arg arg_b = { .cpu_id = 1, .counter = &counter_b, .name = "ThreadB" };

    int ncpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
    printf("[Main] CPUs available: %d\n", ncpus);
    printf("[Main] Creating two CPU-bound threads (Linux 1:1 model)\n");

    pthread_create(&tid_a, NULL, cpu_bound_thread, &arg_a);
    pthread_create(&tid_b, NULL, cpu_bound_thread, &arg_b);

    pthread_join(tid_a, NULL);
    pthread_join(tid_b, NULL);

    printf("[Main] Combined count: %ld\n", counter_a + counter_b);
    printf("[Main] In M:1 model, both threads would share 1 CPU;\n");
    printf("[Main] in 1:1 model they run truly in parallel.\n");
    return 0;
}

Code Example 2 — Blocking Syscall: Why M:1 Fails

This example illustrates conceptually why a blocking syscall is fatal in M:1 but harmless in 1:1. We simulate it by having one thread block on read() (from a pipe with no writer) while another thread should continue working.

/*
 * ep_blocking_syscall.c — Simulate blocking syscall impact
 * In 1:1 (Linux): Thread B keeps running while Thread A blocks.
 * In M:1: Thread B would also freeze (kernel sees 1 process sleeping).
 *
 * Compile: gcc -o ep_blocking_syscall ep_blocking_syscall.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

static int pipe_fds[2];  /* pipe: [0]=read end, [1]=write end */

/* Thread A: blocks on read() — simulates I/O wait */
void *blocking_thread(void *arg)
{
    char buf[64];
    printf("[ThreadA] About to call read() — will block until data arrives\n");
    ssize_t n = read(pipe_fds[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("[ThreadA] Read: '%s'\n", buf);
    }
    printf("[ThreadA] Done\n");
    return NULL;
}

/* Thread B: CPU work — should NOT be affected by Thread A's blocking */
void *cpu_thread(void *arg)
{
    printf("[ThreadB] Doing CPU work (should NOT block even though Thread A is blocked)\n");
    long sum = 0;
    for (long i = 0; i < 200000000L; i++)  /* ~0.5s of work */
        sum += i;
    printf("[ThreadB] CPU work done. sum = %ld\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t tid_a, tid_b;

    if (pipe(pipe_fds) != 0) { perror("pipe"); exit(1); }

    printf("[Main] Creating threads. Linux uses 1:1 model.\n");
    printf("[Main] Thread A will block on read(); Thread B should run freely.\n\n");

    pthread_create(&tid_a, NULL, blocking_thread, NULL);
    pthread_create(&tid_b, NULL, cpu_thread, NULL);

    /* Wait for Thread B to finish */
    pthread_join(tid_b, NULL);
    printf("\n[Main] Thread B finished — proves it was NOT blocked by Thread A.\n");
    printf("[Main] Now writing to pipe to unblock Thread A...\n");

    /* Unblock Thread A */
    write(pipe_fds[1], "wake up!", 8);
    pthread_join(tid_a, NULL);

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    printf("[Main] All done.\n");
    return 0;
}
Run this and observe that Thread B’s CPU work completes fully while Thread A is blocked. In the M:1 model, Thread B would be frozen too because the kernel puts the whole process to sleep on the blocking read.

Interview Questions

Q1. What is a Kernel Scheduling Entity (KSE)?
A KSE is the unit of execution that the kernel’s scheduler sees and allocates CPU time to. In traditional single-threaded Unix, a KSE was synonymous with a process. In threaded systems, the mapping between user-level threads and KSEs defines the threading model (M:1, 1:1, or M:N).
Q2. What is the main disadvantage of the M:1 threading model?
The main disadvantage is that a blocking system call (like read() or write()) blocks the entire process, including all user-level threads. The kernel has no knowledge of individual threads, so it cannot keep other threads running while one is blocked. Additionally, M:1 cannot exploit multiple CPU cores — all threads run on the same single kernel context.
Q3. Why does Linux use the 1:1 threading model instead of M:N?
The M:N model was initially considered for NPTL but rejected because it required extensive, wide-ranging changes to the Linux kernel that were deemed unnecessary. Instead, the Linux kernel scheduler was significantly improved (rewritten) to efficiently handle large numbers of KSEs — achieving good scalability with the simpler 1:1 model. Benchmarks showed that the optimised 1:1 NPTL performed better than the M:N NGPT implementation from IBM.
Q4. Can Linux threads truly run in parallel on a multicore CPU? Under which model?
Yes, on a multicore CPU, Linux threads can run truly in parallel because Linux uses the 1:1 model (each thread has its own KSE). The kernel scheduler can distribute different KSEs to different CPU cores simultaneously. In the M:1 model, true parallel execution is impossible since the kernel sees only one schedulable entity regardless of how many user threads exist.
Q5. Why are M:N threading operations faster than 1:1 for simple thread operations?
In M:N (like M:1 for user-level operations), thread creation, context switching between threads, and mutex/condition-variable operations that don’t require blocking can all be done entirely in user space without a kernel mode switch. This avoids the overhead of a system call trap. In the 1:1 model, every thread operation that touches the KSE (creation, some synchronisation) requires a kernel mode switch, which is more expensive.

Section Summary

  • A KSE is what the kernel schedules. Thread models define the user-thread-to-KSE mapping.
  • M:1: All threads on one KSE. Fast user-level ops, but blocking syscall freezes all threads. No true parallelism.
  • 1:1: Each thread is a KSE. True parallelism, blocking only affects calling thread. Used by Linux (NPTL, LinuxThreads).
  • M:N: N threads on M KSEs. Best of both worlds in theory, but extremely complex. Rejected for Linux NPTL.

Keep Learning — It’s Free

EmbeddedPathashala — free embedded systems education.

Visit EmbeddedPathashala

Leave a Reply

Your email address will not be published. Required fields are marked *