__thread Keyword: Simpler Per-Thread Storage Without the TSD API

1. What is Thread-Local Storage (TLS)?
Thread-Local Storage (TLS) is an alternative to Thread-Specific Data (TSD) that achieves the same goal — per-thread private variables — but with a much simpler syntax. Instead of calling API functions to create keys and store/retrieve pointers, you simply add the __thread keyword to a global or static variable declaration.
When a variable is declared with __thread, every thread automatically gets its own separate copy of that variable. Each thread reads and writes its own copy, and no thread can see another thread’s copy.
Like TSD, TLS variables are persistent: a thread’s copy of a TLS variable retains its value between multiple calls to the same function. And like TSD, when a thread terminates, its copies of all TLS variables are automatically deallocated.
[Figure: Thread A, Thread B, and Thread C each hold their own copy of the variable]
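Here is a minimal sketch of the two properties above. It assumes a GCC-compatible compiler on Linux with Pthreads, and the names hits, count_hit(), run_tls_demo() and main_thread_hits() are hypothetical. Three threads call the same function a different number of times. Each thread's copy of the __thread counter keeps its value between calls, and no thread touches another thread's copy.

```c
#include <pthread.h>

/* Each thread gets its own copy of 'hits'; every copy starts at 0. */
static __thread int hits;

/* Like an ordinary static variable, the value persists between calls,
   but only within the calling thread. */
static int count_hit(void)
{
    return ++hits;
}

/* Thread body: call count_hit() n times, then report this thread's count
   back through the same int that carried n in. */
static void *hit_n_times(void *arg)
{
    int *slot = arg;
    int n = *slot;
    int last = 0;
    int i;

    for (i = 0; i < n; i++)
        last = count_hit();
    *slot = last;
    return NULL;
}

/* Start three threads that call count_hit() 1, 2 and 3 times.
   On success results[i] holds thread i's final count and 0 is returned. */
int run_tls_demo(int results[3])
{
    pthread_t t[3];
    int started = 0;
    int i;

    for (i = 0; i < 3; i++) {
        results[i] = i + 1;
        if (pthread_create(&t[i], NULL, hit_n_times, &results[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == 3 ? 0 : -1;
}

/* The calling thread's own copy, untouched by the workers. */
int main_thread_hits(void)
{
    return hits;
}
```

After main() calls run_tls_demo(), results holds 1, 2 and 3. main_thread_hits() still returns 0 because the main thread never called count_hit().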
2. The __thread Specifier — Syntax and Rules
The syntax is simple: add __thread to the declaration of a global or static variable:
```c
/* ===== TLS Variable Declarations ===== */

/* Simple TLS variable: each thread has its own copy, initialized to 0 */
static __thread int per_thread_counter;

/* TLS buffer: each thread has its own 256-byte buffer */
static __thread char per_thread_buf[256];

/* TLS with initializer: each thread starts with value 42 */
static __thread int magic = 42;

/* Global TLS (visible across translation units) */
__thread long global_per_thread_value;

/* With extern: declares a TLS variable defined in another translation unit */
extern __thread int some_per_thread_var;
```
Important rules for __thread:
- The __thread keyword must immediately follow the static or extern storage class specifier, if one is used. Example: static __thread int x; is correct. __thread static int x; is wrong.
- TLS variables can have initializers, just like normal global or static variables: static __thread int x = 10;. The initializer must be a constant expression.
- You can take the address of a TLS variable using the & operator: int *p = &per_thread_counter;. However, the address is only valid while the thread is alive, so do not store it somewhere another thread might use after this thread exits.
- __thread can only be applied to global or static variables, not to local (automatic) variables, because local variables are already per-thread by nature (they live on the thread's own stack).
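The following sketch exercises these rules together. It assumes GCC on Linux, and budget, next_ticket(), spend(), rules_worker(), run_rules_demo() and caller_budget() are hypothetical names. It uses a constant initializer and a static __thread variable declared inside a function. It also passes a pointer to a TLS variable, but only the thread that owns that variable ever uses the pointer.

```c
#include <pthread.h>

/* static comes first, __thread immediately after. The initializer is a
   constant expression, so every thread's copy starts at 100. */
static __thread int budget = 100;

/* __thread is also allowed on a static variable inside a function:
   it is still static storage, with one instance per thread. */
static int next_ticket(void)
{
    static __thread int ticket;   /* zero-initialized in each thread */
    return ++ticket;
}

/* An ordinary function that works through a pointer. */
static void spend(int *balance, int amount)
{
    *balance -= amount;
}

/* Thread body: out[0] receives this thread's budget, out[1] its second ticket. */
static void *rules_worker(void *arg)
{
    int *out = arg;

    spend(&budget, 30);       /* &budget is THIS thread's copy, used only here */
    next_ticket();
    out[0] = budget;
    out[1] = next_ticket();
    return NULL;
}

/* Run two workers; returns 0 on success, -1 if a thread could not start. */
int run_rules_demo(int out_a[2], int out_b[2])
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, rules_worker, out_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, rules_worker, out_b) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

/* The calling thread's copy of budget. */
int caller_budget(void)
{
    return budget;
}
```

Each worker ends with out[0] == 70 and out[1] == 2. In the calling thread, caller_budget() still returns 100.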
3. Platform and Compiler Requirements
Thread-Local Storage is not part of the POSIX standard; it is a non-standard extension. However, it is widely available on most major UNIX and Linux platforms. On Linux, using TLS requires all three of the following components:
| Component | Required Version | Notes |
|---|---|---|
| Linux Kernel | 2.6 or later | The kernel must provide the necessary system call support (set_thread_area on x86) |
| Pthreads implementation | NPTL (Native POSIX Thread Library) | NPTL replaced the older LinuxThreads; virtually all modern Linux systems use NPTL |
| GCC compiler | 3.3 or later (for x86-32) | GCC must understand the __thread keyword and generate appropriate TLS access code |
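Here is a minimal sketch of source that compiles with either spelling, assuming a GCC-compatible compiler. The hypothetical THREAD_LOCAL macro selects C11's _Thread_local keyword when __STDC_VERSION__ reports C11 or later, and falls back to __thread otherwise. It uses the keyword directly rather than the thread_local macro, so it does not depend on <threads.h> being installed.

```c
#include <pthread.h>

/* THREAD_LOCAL is a hypothetical macro name used only in this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define THREAD_LOCAL _Thread_local   /* standard keyword since C11 */
#else
#  define THREAD_LOCAL __thread        /* GCC extension for older modes */
#endif

static THREAD_LOCAL int depth;   /* one copy per thread, starts at 0 */

/* Thread body: raise this thread's depth to 5 and report it. */
static void *bump_depth(void *arg)
{
    int i;

    for (i = 0; i < 5; i++)
        depth++;
    *(int *) arg = depth;
    return NULL;
}

/* Returns the depth seen inside a new thread, or -1 on failure. */
int depth_in_new_thread(void)
{
    pthread_t t;
    int seen = 0;

    if (pthread_create(&t, NULL, bump_depth, &seen) != 0)
        return -1;
    pthread_join(t, NULL);
    return seen;
}

/* The calling thread's copy, unaffected by bump_depth(). */
int depth_in_caller(void)
{
    return depth;
}
```

depth_in_new_thread() returns 5, while depth_in_caller() still returns 0.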
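The C11 standard added _Thread_local as a standard keyword with the same meaning as GCC's __thread. The C11 header <threads.h> also provides the macro thread_local as an alias for it, and C23 makes thread_local a keyword in its own right. On modern systems, you can use thread_local (after including <threads.h> when compiling as C11 or C17) instead of __thread for better portability.

4. Thread-Safe strerror() Using TLS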
Compare this to the TSD version from File 4. The logic is identical, but notice how much simpler the implementation is — no keys, no key creation, no pthread_once(), no malloc(), no destructor, no pthread_setspecific/getspecific. The __thread keyword does all the work.
```c
/* ===== Listing 31-4: Thread-safe strerror() using TLS (the simple way) ===== */
/* File: strerror_tls.c */
#define _GNU_SOURCE          /* _sys_errlist (older glibc), strerrordesc_np() (glibc >= 2.32) */
#include <errno.h>           /* EPERM, EINVAL */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

#define MAX_ERROR_LEN 256

/* THE KEY LINE: __thread makes this a per-thread variable.
   Each thread automatically gets its own 256-byte buf.
   No API calls needed. No malloc. No destructor.
   Storage is freed automatically when the thread exits. */
static __thread char buf[MAX_ERROR_LEN];

char *strerror(int err)
{
    const char *msg;

    /* glibc 2.32 stopped declaring _sys_nerr/_sys_errlist; strerrordesc_np()
       replaces them and returns NULL for an unknown error number. */
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 32))
    msg = strerrordesc_np(err);
#else
    msg = (err < 0 || err >= _sys_nerr) ? NULL : _sys_errlist[err];
#endif

    /* 'buf' here refers to THIS THREAD's own copy: no sharing possible */
    if (msg == NULL) {
        snprintf(buf, MAX_ERROR_LEN, "Unknown error %d", err);
    } else {
        strncpy(buf, msg, MAX_ERROR_LEN - 1);
        buf[MAX_ERROR_LEN - 1] = '\0';
    }
    return buf;   /* Pointer to THIS THREAD's private buf: safe! */
}

/* === Test === */
static void *threadFunc(void *arg)
{
    (void) arg;
    char *str = strerror(EPERM);
    printf("Other thread: str (%p) = %s\n", (void *) str, str);
    return NULL;
}

int main(void)
{
    pthread_t t;
    char *str;

    str = strerror(EINVAL);
    printf("Main thread has called strerror()\n");

    pthread_create(&t, NULL, threadFunc, NULL);
    pthread_join(t, NULL);

    /* Still "Invalid argument": the other thread wrote to its own buf */
    printf("Main thread: str (%p) = %s\n", (void *) str, str);
    return 0;
}

/* Expected output (the addresses vary from run to run):
   Main thread has called strerror()
   Other thread: str (0x40376ab0) = Operation not permitted
   Main thread: str (0x40175080) = Invalid argument

   The two addresses differ because each thread has its own buf. */
```
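This design has one caveat. Each thread gets a private buffer, but still only one, so a second call to strerror() in the same thread overwrites the string that the first call returned. The sketch below shows both halves of that behavior using a hypothetical describe_code() helper built the same way. It uses its own name so that it does not redefine the real strerror().

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper with the same design as the TLS strerror():
   one fixed-size buffer per thread, returned to the caller. */
static __thread char code_buf[32];

static const char *describe_code(int code)
{
    snprintf(code_buf, sizeof code_buf, "code %d", code);
    return code_buf;
}

static void *other_thread(void *arg)
{
    (void) arg;
    describe_code(99);            /* fills the OTHER thread's code_buf */
    return NULL;
}

/* 1 if the caller's result survives a call made in another thread. */
int survives_other_thread(void)
{
    const char *mine = describe_code(1);
    pthread_t t;

    if (pthread_create(&t, NULL, other_thread, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return strcmp(mine, "code 1") == 0;
}

/* 1 if a second call in the SAME thread overwrote the first result. */
int overwritten_in_same_thread(void)
{
    const char *first = describe_code(1);

    describe_code(2);
    return strcmp(first, "code 2") == 0;
}
```

Both functions return 1. A caller that needs two results at once from the same thread must copy the first one before making the second call.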
The TLS version of strerror() is functionally identical to the TSD version, but the implementation is dramatically simpler: the single static __thread char buf[] declaration replaces roughly 30 lines of TSD API code.

5. TSD vs TLS: Detailed Comparison
| Feature | Thread-Specific Data (TSD) | Thread-Local Storage (TLS) |
|---|---|---|
| Standard | POSIX (pthread.h) | Non-standard (__thread) / C11 (_Thread_local, or the thread_local macro from <threads.h>) |
| Syntax complexity | High — requires key creation, set/get calls | Very low — just add __thread to declaration |
| Memory allocation | Manual (malloc + free via destructor) | Automatic (handled by compiler/runtime) |
| Destructor/cleanup | Explicit — register function with pthread_key_create() | Automatic — storage freed when thread exits |
| Dynamic size | Yes — malloc() can allocate any size | No — size must be fixed at compile time |
| Portability | High — any POSIX platform | Lower — needs compiler + kernel support |
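| Performance | Slower: a function call for every get/set | Faster: usually a direct variable access with no function call (code in shared libraries may need a call to __tls_get_addr()) |
| Passing the storage around at run time | Yes: the pthread_key_t is a value that can be stored and passed to other functions | No: each TLS variable is named at compile time (its address can still be passed around within the owning thread) |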
| Lazy allocation per thread | Yes — allocate on first call from each thread | No — all copies allocated when thread starts |
| Best used when | Existing API cannot change; dynamic per-thread data | Simple per-thread variables; new code; fixed-size data |
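The "Dynamic size" and "Destructor/cleanup" rows interact. A __thread variable can hold a pointer to heap memory of any size, but nothing frees that memory when the thread exits. Below is a minimal sketch of that hybrid using hypothetical names (dyn_buf, get_buf(), release_buf(), dyn_worker(), run_dyn_demo()), in which each thread must free its own block explicitly.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Only the pointer and its length are thread-local; the buffer itself
   lives on the heap and can be sized at run time. */
static __thread char *dyn_buf;
static __thread size_t dyn_len;

/* Hypothetical helper: return this thread's buffer, growing it if needed. */
static char *get_buf(size_t need)
{
    if (need > dyn_len) {
        char *p = realloc(dyn_buf, need);
        if (p == NULL)
            return NULL;
        dyn_buf = p;
        dyn_len = need;
    }
    return dyn_buf;
}

/* __thread runs no destructor at thread exit, so the thread must release
   the heap block itself; TSD would do this through the destructor
   registered with pthread_key_create(). */
static void release_buf(void)
{
    free(dyn_buf);
    dyn_buf = NULL;
    dyn_len = 0;
}

/* Thread body: fill a buffer of *arg bytes (must be >= 1) with a string
   of length *arg - 1, report that length, then clean up. */
static void *dyn_worker(void *arg)
{
    size_t *io = arg;
    char *b = get_buf(*io);

    if (b != NULL) {
        memset(b, 'x', *io - 1);
        b[*io - 1] = '\0';
        *io = strlen(b);
    } else {
        *io = 0;
    }
    release_buf();
    return NULL;
}

/* Run two workers with different buffer sizes; 0 on success, -1 on failure. */
int run_dyn_demo(size_t sizes[2])
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, dyn_worker, &sizes[0]) != 0)
        return -1;
    if (pthread_create(&b, NULL, dyn_worker, &sizes[1]) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

With sizes {16, 1000}, run_dyn_demo() leaves 15 and 999 in the array. If a thread can exit through several paths (for example pthread_exit() or cancellation), this manual cleanup is easy to miss. TSD's destructor is worth its extra code in exactly that case.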
6. More TLS Examples
```c
/* ===== EXAMPLE: Various uses of __thread ===== */
#include <stdio.h>
#include <pthread.h>

/* Per-thread call counter: each thread counts its own calls independently */
static __thread unsigned long call_count = 0;

/* Per-thread error accumulator */
static __thread int last_errno = 0;

/* Per-thread scratch buffer */
static __thread char scratch[512];

/* Per-thread "initialized" flag */
static __thread int thread_initialized = 0;

void do_work(int value)
{
    call_count++;   /* Increments THIS thread's counter: no mutex needed! */

    if (!thread_initialized) {
        /* First-call-from-this-thread initialization */
        snprintf(scratch, sizeof(scratch), "Thread scratch initialized");
        thread_initialized = 1;
    }

    /* Use scratch buffer: it's per-thread, no races */
    snprintf(scratch, sizeof(scratch), "Processing value %d (call #%lu)",
             value, call_count);
    printf("%s\n", scratch);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    int i;

    for (i = 0; i < 3; i++) {
        do_work(id * 10 + i);
    }

    /* call_count here shows only THIS thread's calls: no shared state */
    printf("Thread %d total calls: %lu\n", id, call_count);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Main thread's call_count is still 0: it never called do_work().
       (The two workers' output lines may interleave in any order.) */
    printf("Main thread calls: %lu\n", call_count);
    return 0;
}

/* Important: Taking the address of a TLS variable */
void demonstrate_tls_address(void)
{
    /* Valid: &call_count gives the address of THIS thread's copy */
    unsigned long *p = &call_count;
    (*p)++;   /* Increments this thread's call_count */

    /* WARNING: Do NOT store this address and use it from another thread!
       After this thread exits, the address becomes invalid. */
}
```
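One common way to respect that warning is to give other threads a copy of the value instead of the address. The sketch below assumes Pthreads and uses hypothetical names (local_ops, total_ops, counting_worker(), run_counting_demo()). It keeps a per-thread counter on the hot path with no locking. Just before each thread exits, it merges that thread's final value into a shared, mutex-protected total.

```c
#include <pthread.h>

#define MAX_WORKERS 8   /* arbitrary limit for this sketch */

static __thread unsigned long local_ops;   /* per-thread: no lock needed */
static unsigned long total_ops;            /* shared: guarded by total_lock */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: count *arg operations privately, then publish the total. */
static void *counting_worker(void *arg)
{
    unsigned long n = *(const unsigned long *) arg;
    unsigned long i;

    for (i = 0; i < n; i++)
        local_ops++;            /* touches only this thread's copy */

    /* Publish the VALUE, not &local_ops: that address stops being
       valid once this thread exits. */
    pthread_mutex_lock(&total_lock);
    total_ops += local_ops;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Start nthreads workers, each counting per_thread operations, and return
   the merged total; returns 0 if nthreads is out of range or a thread
   fails to start. */
unsigned long run_counting_demo(unsigned long per_thread, int nthreads)
{
    pthread_t t[MAX_WORKERS];
    int started = 0;
    int i;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return 0;
    total_ops = 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&t[i], NULL, counting_worker, &per_thread) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == nthreads ? total_ops : 0;
}
```

run_counting_demo(1000, 4) returns 4000. Each thread takes the mutex once rather than once per increment, which is the point of accumulating in TLS first.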
7. Chapter 31 Summary
Key Takeaways
- Prefer the _r reentrant variants (such as strtok_r() and ctime_r()) in multithreaded code.
- Thread-local storage: add __thread (or C11's thread_local) to a global or static variable, and the compiler and runtime automatically provide a per-thread copy. It is much simpler than TSD, but requires a fixed size and compiler/kernel support.

Interview Questions
What is the __thread keyword and what does it do?
__thread is a GCC extension (C11 provides the equivalent _Thread_local keyword, also available as thread_local) that makes a global or static variable thread-local, meaning each thread automatically gets its own private copy of the variable. Threads do not see each other's copies. The storage is created automatically when a thread starts and freed when it exits. This provides per-thread persistent storage without any explicit API calls.
Where must the __thread keyword be placed in a declaration?
If a storage class specifier (static or extern) is present, __thread must immediately follow it. For example, static __thread int x; is correct, but __thread static int x; is not. If no storage class specifier is used (a plain global), __thread comes first: __thread int y;.
What are the advantages of TLS (__thread) over TSD (the pthread API)?
TLS is much simpler to use: there is no need to call pthread_key_create(), pthread_setspecific(), or pthread_getspecific(), and no need to write a destructor or call malloc()/free(). The compiler and runtime handle all storage management automatically. TLS access is also usually faster because it is essentially a direct variable access, with no function call for get/set. The main disadvantages are that TLS requires fixed-size declarations and specific system support (a suitable kernel version, NPTL, and GCC version).
Can you take the address of a TLS variable, and what is the risk?
Yes, with the & operator. The address is a pointer to that particular thread's copy of the variable. The risk arises if you store this address somewhere and another thread uses it after the original thread has terminated: the storage is freed when the thread exits, leaving a dangling (invalid) pointer. TLS variable addresses should never be shared with other threads or stored beyond the thread's lifetime.
Compare the TSD and TLS implementations of a thread-safe strerror(). What are the key differences?
Both achieve the same result: each thread has its own buffer. With TSD you need about 30 lines of setup code (pthread_once(), pthread_key_create(), a destructor, a pthread_getspecific() call checked for NULL, malloc(), and pthread_setspecific()). With TLS you just declare static __thread char buf[256], a single line, and the function body becomes identical to the original non-thread-safe version. The trade-off is that TSD allows dynamically sized buffers and explicit cleanup logic, while TLS uses a fixed-size compile-time array with automatic lifetime management.
What are the system requirements for using __thread on Linux?
Three components must all be present: (1) Linux kernel 2.6 or later, which provides the necessary low-level support (e.g., the set_thread_area() system call on x86); (2) NPTL (Native POSIX Thread Library) as the Pthreads implementation, which virtually all modern Linux systems use; (3) GCC 3.3 or later for x86-32 (or the equivalent versions for other architectures). If you use C11 _Thread_local (or thread_local), you also need a compiler running in C11 mode or later.
When would you still choose TSD over TLS?
… __thread. (4) You need to pass the key around as a parameter to different functions that all share the same per-thread storage.
