Thread-Local Storage

 

Chapter 31 · File 5 of 5
The __thread Keyword — Simpler Per-Thread Storage Without the TSD API
Topics Covered:

__thread specifier, TLS Concept, TLS vs TSD, Declaration Rules, Initialization of TLS Variables, TLS strerror() Example, Kernel/Compiler Support, Chapter Summary

1. What is Thread-Local Storage (TLS)?

Thread-Local Storage (TLS) is an alternative to Thread-Specific Data (TSD) that achieves the same goal — per-thread private variables — but with a much simpler syntax. Instead of calling API functions to create keys and store/retrieve pointers, you simply add the __thread keyword to a global or static variable declaration.

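When a variable is declared with __thread, every thread automatically gets its own separate copy of that variable. Each thread reads and writes its own copy, and the variable's name always refers to the calling thread's copy, so one thread cannot reach another thread's copy by name (only through a pointer that the owning thread hands out).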

Like TSD, TLS variables are persistent: a thread’s copy of a TLS variable retains its value between multiple calls to the same function. And like TSD, when a thread terminates, its copies of all TLS variables are automatically deallocated.

Figure: TLS gives each thread its own copy automatically. Thread A, Thread B, and Thread C each see the single declaration static __thread char buf[256]; as their own private copy. One declaration, three private copies; the compiler and runtime handle it automatically.
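The idea can be sketched in a few lines. This is a minimal illustration, assuming GCC or Clang with pthreads; the names visits, record_visit, and copies_are_independent are invented for this sketch and do not come from the chapter.

```c
#include <pthread.h>
#include <stdint.h>

/* One declaration; every thread that touches it gets a private copy. */
static __thread int visits;

/* Each call bumps the calling thread's copy. The value persists
   between calls made by the same thread. */
static int record_visit(void)
{
    return ++visits;
}

struct report {
    int final_count;        /* value the worker saw after three calls */
    uintptr_t copy_address; /* where the worker's copy lived (compare only) */
};

static void *worker(void *arg)
{
    struct report *r = arg;

    record_visit();
    record_visit();
    r->final_count = record_visit();
    r->copy_address = (uintptr_t)&visits;
    return NULL;
}

/* Returns 1 if the worker and the calling thread used separate copies. */
static int copies_are_independent(void)
{
    struct report r = { 0, 0 };
    pthread_t t;

    visits = 0;
    record_visit();                     /* caller's copy is now 1 */

    if (pthread_create(&t, NULL, worker, &r) != 0)
        return 0;
    if (pthread_join(t, NULL) != 0)
        return 0;

    /* The worker counted 1, 2, 3 from zero; the caller's copy is untouched. */
    return r.final_count == 3
        && visits == 1
        && r.copy_address != (uintptr_t)&visits;
}
```

Calling copies_are_independent() from any thread should return 1: the worker starts from its own zero-initialized copy, and the caller's count is not affected by the worker's three calls.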

2. The __thread Specifier — Syntax and Rules

The syntax is simple: add __thread to the declaration of a global or static variable:

/* ===== TLS Variable Declarations ===== */

/* Simple TLS variable — each thread has its own copy, initialized to 0 */
static __thread int per_thread_counter;

/* TLS buffer — each thread has its own 256-byte buffer */
static __thread char per_thread_buf[256];

/* TLS with initializer — each thread starts with value 42 */
static __thread int magic = 42;

/* Global TLS (visible across translation units) */
__thread long global_per_thread_value;

/* With extern */
extern __thread int some_per_thread_var;

Important rules for __thread:

  • The __thread keyword must immediately follow the static or extern storage class specifier, if one is used. Example: static __thread int x; is correct. __thread static int x; is wrong.
  • TLS variables can have initializers, just like normal global or static variables: static __thread int x = 10;. The initializer must be a constant expression.
  • You can take the address of a TLS variable using the & operator: int *p = &per_thread_counter;. However, the address is only valid while the thread is alive — do not store it somewhere another thread might use after this thread exits.
  • __thread can only be applied to global or static variables — not to local (auto) variables, because local variables are already per-thread by nature (they live on the stack).
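The initializer and address rules above can be sketched as follows. This is an illustration under the assumption of GCC or Clang with pthreads; retries_left, use_one, and initial_value_in_new_thread are hypothetical names.

```c
#include <pthread.h>

/* The initializer must be a constant expression. It seeds every
   thread's copy, not just the first one. */
static __thread int retries_left = 3;

/* A helper that works through a pointer to the caller's own copy. */
static void use_one(int *counter)
{
    if (*counter > 0)
        (*counter)--;
}

static void *worker(void *arg)
{
    int *seen_at_start = arg;

    *seen_at_start = retries_left;  /* fresh copy, so this reads 3 */
    use_one(&retries_left);         /* address used only inside this thread */
    return NULL;
}

/* Returns the first value a new thread sees, after the caller has
   already changed its own copy. Returns -1 on a pthreads error. */
static int initial_value_in_new_thread(void)
{
    int seen = -1;
    pthread_t t;

    retries_left = 0;               /* affects only the caller's copy */
    if (pthread_create(&t, NULL, worker, &seen) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return seen;
}
```

Here the new thread should still observe 3 even though the creating thread set its own copy to 0 beforehand, and the pointer passed to use_one() never leaves the thread that owns the variable.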

3. Platform and Compiler Requirements

Thread-Local Storage is not part of the POSIX standard — it is a non-standard extension. However, it is widely available on most major UNIX/Linux platforms. On Linux, TLS requires all three of:

Component | Required Version | Notes
Linux kernel | 2.6 or later | The kernel must provide the necessary system call support (set_thread_area on x86)
Pthreads implementation | NPTL (Native POSIX Thread Library) | NPTL replaced the older LinuxThreads; virtually all modern Linux systems use NPTL
GCC compiler | 3.3 or later (for x86-32) | GCC must understand the __thread keyword and generate appropriate TLS access code

C11 Standard: The C11 standard introduced _Thread_local as a standard keyword with the same meaning as GCC’s __thread. The C11 header <threads.h> also provides a macro thread_local as an alias. On modern systems, you can use thread_local instead of __thread for better portability.
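One way to write code that builds with either spelling is a small macro. The sketch below is an assumption about how such a shim might look, not something the chapter prescribes; PER_THREAD, depth, and portable_tls_demo are hypothetical names.

```c
#include <pthread.h>

/* PER_THREAD is a macro name made up for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define PER_THREAD _Thread_local   /* C11 and later */
#elif defined(__GNUC__)
#  define PER_THREAD __thread        /* GCC/Clang extension */
#else
#  error "no thread-local storage specifier known for this compiler"
#endif

static PER_THREAD int depth = 7;

static void *worker(void *arg)
{
    int *out = arg;

    depth += 1;      /* changes only the worker's copy: 7 becomes 8 */
    *out = depth;
    return NULL;
}

/* Stores the worker's and the caller's values of depth.
   Returns 0 on success, -1 on a pthreads error. */
static int portable_tls_demo(int *worker_value, int *caller_value)
{
    pthread_t t;

    if (pthread_create(&t, NULL, worker, worker_value) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    *caller_value = depth;
    return 0;
}
```

The macro uses the _Thread_local keyword directly rather than the thread_local alias, so it does not depend on <threads.h> being available.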

4. Thread-Safe strerror() Using TLS

Compare this to the TSD version from File 4. The logic is identical, but notice how much simpler the implementation is — no keys, no key creation, no pthread_once(), no malloc(), no destructor, no pthread_setspecific/getspecific. The __thread keyword does all the work.

/* ===== Listing 31-4: Thread-safe strerror() using TLS — THE SIMPLE WAY ===== */
/* File: strerror_tls.c */

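#define _GNU_SOURCE   /* Get _sys_nerr and _sys_errlist from <stdio.h>;
                         glibc 2.32 and later no longer declare them */
#include <stdio.h>
#include <string.h>
#include <errno.h>    /* EPERM, EINVAL */
#include <pthread.h>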

#define MAX_ERROR_LEN 256

/* THE KEY LINE: __thread makes this a per-thread variable.
   Each thread automatically gets its own 256-byte buf.
   No API calls needed. No malloc. No destructor.
   Storage is freed automatically when the thread exits. */
static __thread char buf[MAX_ERROR_LEN];

char *strerror(int err)
{
    /* 'buf' here refers to THIS THREAD's own copy — no sharing possible */
    if (err < 0 || err >= _sys_nerr || _sys_errlist[err] == NULL) {
        snprintf(buf, MAX_ERROR_LEN, "Unknown error %d", err);
    } else {
        strncpy(buf, _sys_errlist[err], MAX_ERROR_LEN - 1);
        buf[MAX_ERROR_LEN - 1] = '\0';
    }
    return buf;   /* Returns pointer to THIS THREAD's private buf — safe! */
}

/* === Test === */
static void *threadFunc(void *arg)
{
    char *str = strerror(EPERM);
    printf("Other thread: str (%p) = %s\n", (void *) str, str);
    return NULL;
}

int main(void)
{
    pthread_t t;
    char *str;

    str = strerror(EINVAL);
    printf("Main thread has called strerror()\n");

    pthread_create(&t, NULL, threadFunc, NULL);
    pthread_join(t, NULL);

    printf("Main thread: str (%p) = %s\n", (void *) str, str);
    return 0;
}

/* Expected output:
   Main thread has called strerror()
   Other thread: str (0x40376ab0) = Operation not permitted
   Main thread:  str (0x40175080) = Invalid argument
                      ^^^^^^^^^^^      ^^^^^^^^^^^^^^^^
                      Different addresses! Each thread has its own buf. */
Key Insight: The TLS version of strerror() is functionally identical to the TSD version, but the implementation is dramatically simpler. The static __thread char buf[] declaration replaces roughly 30 lines of TSD API code.
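On systems where the _sys_errlist table used by the listing is not available, the same pattern can still be tried out. The sketch below keeps the TLS buffer and the strerror()-style interface but swaps in a tiny hand-written message table; describe, my_strerror, and message_survives_other_thread are hypothetical names, and the three-entry table is a stand-in for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERROR_LEN 256

/* Per-thread result buffer, as in the listing. */
static __thread char buf[MAX_ERROR_LEN];

/* Stand-in for the system message table: three entries only. */
static const char *describe(int err)
{
    switch (err) {
    case EPERM:  return "Operation not permitted";
    case EINVAL: return "Invalid argument";
    case ENOENT: return "No such file or directory";
    default:     return NULL;
    }
}

/* Same shape as strerror(): returns a pointer the caller does not free. */
char *my_strerror(int err)
{
    const char *msg = describe(err);

    if (msg == NULL)
        snprintf(buf, sizeof(buf), "Unknown error %d", err);
    else
        snprintf(buf, sizeof(buf), "%s", msg);
    return buf;     /* the calling thread's private buffer */
}

static void *worker(void *arg)
{
    (void) arg;
    my_strerror(EPERM);     /* writes the worker's buffer only */
    return NULL;
}

/* Returns 1 if the caller's message is intact after another thread
   has called my_strerror(), 0 otherwise. */
int message_survives_other_thread(void)
{
    char *mine = my_strerror(EINVAL);
    pthread_t t;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;
    if (pthread_join(t, NULL) != 0)
        return 0;
    return strcmp(mine, "Invalid argument") == 0;
}
```

With a single shared static buffer, the worker's call would have overwritten the caller's message; with the per-thread buffer it should not.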

5. TSD vs TLS — Detailed Comparison

Feature | Thread-Specific Data (TSD) | Thread-Local Storage (TLS)
Standard | POSIX (pthread.h) | Non-standard (__thread) / C11 (thread_local)
Syntax complexity | High: requires key creation, set/get calls | Very low: just add __thread to declaration
Memory allocation | Manual (malloc + free via destructor) | Automatic (handled by compiler/runtime)
Destructor/cleanup | Explicit: register function with pthread_key_create() | Automatic: storage freed when thread exits
Dynamic size | Yes: malloc() can allocate any size | No: size must be fixed at compile time
Portability | High: any POSIX platform | Lower: needs compiler + kernel support
Performance | Slower: function call overhead for get/set | Faster: direct variable access
Use with function pointers | Yes: key can be passed around | Limited: variable access is static
Lazy allocation per thread | Yes: allocate on first call from each thread | No: all copies allocated when thread starts
Best used when | Existing API cannot change; dynamic per-thread data | Simple per-thread variables; new code; fixed-size data
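To make the comparison concrete, here is the same per-thread call counter written both ways. It is a sketch under the assumption of GCC or Clang with pthreads, and all names in it (tls_bump, tsd_bump, calls_key, run_in_new_thread, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* --- TLS version: one declaration, direct access --- */
static __thread unsigned long tls_calls;

static unsigned long tls_bump(void)
{
    return ++tls_calls;
}

/* --- TSD version: key, one-time setup, lazy malloc, destructor --- */
static pthread_key_t calls_key;
static pthread_once_t calls_once = PTHREAD_ONCE_INIT;
static int key_ok;                  /* set if the key was created */

static void free_counter(void *p)
{
    free(p);    /* runs at thread exit for threads that stored a value */
}

static void make_key(void)
{
    key_ok = (pthread_key_create(&calls_key, free_counter) == 0);
}

/* Returns the new count, or 0 if setup or allocation failed. */
static unsigned long tsd_bump(void)
{
    unsigned long *count;

    pthread_once(&calls_once, make_key);
    if (!key_ok)
        return 0;

    count = pthread_getspecific(calls_key);
    if (count == NULL) {            /* first call from this thread */
        count = calloc(1, sizeof(*count));
        if (count == NULL)
            return 0;
        if (pthread_setspecific(calls_key, count) != 0) {
            free(count);
            return 0;
        }
    }
    return ++*count;
}

struct counts { unsigned long tls, tsd; };

static void *worker(void *arg)
{
    struct counts *c = arg;

    tls_bump();
    tsd_bump();
    c->tls = tls_bump();            /* second call in this thread */
    c->tsd = tsd_bump();
    return NULL;
}

/* Runs both counters twice in a new thread. Returns 0 on success. */
static int run_in_new_thread(struct counts *out)
{
    pthread_t t;

    if (pthread_create(&t, NULL, worker, out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Both counters behave the same from the caller's point of view; the difference is that the TSD version allocates lazily on a thread's first call and needs the destructor to release that memory, while the TLS version needs neither.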

6. More TLS Examples

/* ===== EXAMPLE: Various uses of __thread ===== */

#include <stdio.h>
#include <pthread.h>

/* Per-thread call counter — each thread counts its own calls independently */
static __thread unsigned long call_count = 0;

/* Per-thread error accumulator */
static __thread int last_errno = 0;

/* Per-thread scratch buffer */
static __thread char scratch[512];

/* Per-thread "initialized" flag */
static __thread int thread_initialized = 0;

void do_work(int value)
{
    call_count++;   /* Increments THIS thread's counter — no mutex needed! */

    if (!thread_initialized) {
        /* First-call-from-this-thread initialization */
        snprintf(scratch, sizeof(scratch), "Thread scratch initialized");
        thread_initialized = 1;
    }

    /* Use scratch buffer — it's per-thread, no races */
    snprintf(scratch, sizeof(scratch), "Processing value %d (call #%lu)",
             value, call_count);
    printf("%s\n", scratch);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    int i;
    for (i = 0; i < 3; i++) {
        do_work(id * 10 + i);
    }
    /* call_count here shows only THIS thread's calls — no shared state */
    printf("Thread %d total calls: %lu\n", id, call_count);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Main thread's call_count is still 0 — it never called do_work() */
    printf("Main thread calls: %lu\n", call_count);
    return 0;
}

/* Important: Taking the address of a TLS variable */
void demonstrate_tls_address(void)
{
    /* Valid: &call_count gives the address of THIS thread's copy */
    unsigned long *p = &call_count;
    (*p)++;   /* Increments this thread's call_count */

    /* WARNING: Do NOT store this address and use it from another thread!
       After this thread exits, the address becomes invalid. */
}
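The warning concerns lifetime, not sharing as such. A pointer to a TLS variable can be handed to another thread as long as the owning thread is guaranteed to outlive every use of it. The sketch below shows one such case, assuming GCC or Clang with pthreads; slot, writer, and lend_slot_to_writer are hypothetical names.

```c
#include <pthread.h>

static __thread int slot;

static void *writer(void *arg)
{
    int *owner_slot = arg;  /* points at the creating thread's copy */

    slot = 5;               /* the writer's own copy */
    *owner_slot = 9;        /* the owner's copy, reached via the pointer */
    return NULL;
}

/* Lends the caller's copy of slot to a short-lived thread.
   Safe because the caller joins the writer before returning, so the
   caller (the owner) is alive for every access through the pointer.
   Returns the caller's slot afterwards, or -1 on a pthreads error. */
static int lend_slot_to_writer(void)
{
    pthread_t t;

    slot = 1;
    if (pthread_create(&t, NULL, writer, &slot) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return slot;            /* written by the writer through the pointer */
}
```

The reverse direction is the dangerous one: if writer() returned &slot, that pointer would refer to storage that is released as soon as the writer thread terminates.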

7. Chapter 31 Summary

Key Takeaways

Thread Safety: A function is thread-safe if it can be safely invoked by multiple threads at the same time. The most common cause of unsafety is using global or static variables. The two main fixes are: protecting critical sections with mutexes (simple, but adds overhead and serializes access to the shared data) or making functions reentrant (no shared state, no mutexes needed).
Non-thread-safe functions: SUSv3 lists functions that are NOT required to be thread-safe, typically those that return pointers to static buffers. Use the _r reentrant variants (like strtok_r, ctime_r) in multithreaded code.
pthread_once(): Ensures an initialization function runs exactly once, regardless of how many threads call it. Essential for library initialization, especially for creating TSD keys.
Thread-Specific Data (TSD): The POSIX API for per-thread storage. Uses keys (pthread_key_create), set (pthread_setspecific), and get (pthread_getspecific) to give each thread its own private copy of data. Destructors handle cleanup. Can make existing functions thread-safe without changing their interface.
Thread-Local Storage (TLS): The simpler alternative. Just add __thread (or C11’s thread_local) to a global or static variable. The compiler and runtime automatically provide a per-thread copy. Much simpler than TSD, but requires fixed size and compiler/kernel support.

Interview Questions

Q1. What is the __thread keyword and what does it do?
__thread is a GCC extension (also thread_local in C11) that makes a global or static variable thread-local, meaning each thread automatically gets its own private copy of the variable. The variable's name always refers to the calling thread's copy, so threads do not see each other's copies unless a pointer is explicitly shared. The storage is automatically created when a thread starts and freed when it exits. This provides per-thread persistent storage without any explicit API calls.

Q2. Where must the __thread keyword be placed in a declaration?
If a storage class specifier (static or extern) is present, __thread must immediately follow it. For example: static __thread int x; is correct, not __thread static int x;. If no storage class specifier is used (plain global), __thread comes first: __thread int y;.

Q3. What are the main advantages of TLS (__thread) over TSD (pthread API)?
TLS is much simpler to use: there is no need to call pthread_key_create, pthread_setspecific, or pthread_getspecific, and no need to write a destructor or call malloc/free. The compiler handles all storage management automatically. TLS access is also faster because it is essentially a direct variable access (no function call overhead for get/set). The main disadvantage is that TLS requires fixed-size declarations and broader system support (specific kernel version, NPTL, and GCC version).

Q4. Can you take the address of a TLS variable? Are there any risks?
Yes, you can take the address using the & operator. The address gives you a pointer to that particular thread's copy of the variable. The risk arises if you store this address somewhere and another thread tries to use it after the original thread has terminated: the storage is freed when the thread exits, leaving the pointer dangling (invalid). A TLS variable's address should never be used beyond the owning thread's lifetime.

Q5. Compare the TLS and TSD implementations of thread-safe strerror(). What are the key differences?
Both achieve the same result: each thread has its own buffer. With TSD, you need about 30 lines of setup code (pthread_once, pthread_key_create, a destructor, pthread_getspecific checking for NULL, malloc, pthread_setspecific). With TLS, you just declare static __thread char buf[256], which is one line, and the function body becomes identical to the original non-thread-safe version. The trade-off is that TSD allows dynamically sized buffers and explicit cleanup logic, while TLS uses a fixed-size compile-time array with automatic lifetime management.

Q6. What system requirements are needed to use __thread on Linux?
Three components must all be present: (1) Linux kernel 2.6 or later, which provides the necessary low-level support (e.g., the set_thread_area syscall on x86); (2) NPTL (Native POSIX Thread Library) as the Pthreads implementation, which virtually all modern Linux systems use; (3) GCC version 3.3 or later for x86-32 (or equivalent versions for other architectures). If using C11 thread_local, you also need a C11-compliant compiler mode.

Q7. Give an example of when you would choose TSD over TLS.
You would choose TSD when: (1) The per-thread data structure is dynamically sized, for example a per-thread linked list or variable-length buffer whose size is not known at compile time. (2) You need to perform non-trivial cleanup beyond just freeing memory, for instance closing file descriptors or sending a "thread done" notification in the destructor. (3) You are writing code that must be portable to older systems that may not support __thread. (4) You need to pass the key around as a parameter to different functions that all share the same per-thread storage.

Q8. What is the complete picture of Chapter 31: how do thread safety, pthread_once, TSD, and TLS fit together?
Chapter 31 presents a progression of solutions for making functions thread-safe. The problem starts with functions using shared static/global data (not thread-safe). Simple solution: protect with a mutex (thread-safe but serialized). Better solution: make reentrant (requires interface change). For existing functions that cannot change their interface: use TSD or TLS to give each thread its own copy. pthread_once() is a supporting tool used with TSD; it ensures the TSD key is created exactly once before any thread uses it. TLS is a simpler replacement for TSD that achieves the same goal with compiler support. Together these tools form a complete toolkit for writing thread-safe library code.
