The flags Argument — The Power of clone()
The flags bitmask is what makes clone() powerful. Each bit controls whether a specific resource is shared between parent and child, or whether the child gets its own private copy. By combining flags you can create anything from a fully isolated process to a thread that shares everything.
Complete Flags Reference Table
| Flag | Effect When Set | Used By |
|---|---|---|
| CLONE_FILES | Share open file descriptor table | POSIX threads |
| CLONE_FS | Share filesystem info (umask, cwd, root) | POSIX threads |
| CLONE_SIGHAND | Share signal disposition table | POSIX threads |
| CLONE_VM | Share virtual memory (same page tables) | POSIX threads, vfork() |
| CLONE_THREAD | Place child in parent’s thread group (share PID) | NPTL |
| CLONE_SYSVSEM | Share System V semaphore undo values | NPTL (kernel 2.6+) |
| CLONE_SETTLS | Set up thread-local storage from tls arg | NPTL (kernel 2.6+) |
| CLONE_PARENT_SETTID | Write child TID to ptid before returning | NPTL (kernel 2.6+) |
| CLONE_CHILD_SETTID | Write child TID to ctid in child’s memory | Kernel 2.6+ |
| CLONE_CHILD_CLEARTID | Zero ctid on exit, wake futex (pthread_join) | NPTL (kernel 2.6+) |
| CLONE_NEWNS | Child gets copy of parent’s mount namespace | Containers (kernel 2.4.19+) |
| CLONE_NEWIPC | New System V IPC namespace | Containers (kernel 2.6.19+) |
| CLONE_NEWNET | New network namespace | Containers (kernel 2.6.24+) |
| CLONE_NEWPID | New PID namespace (child appears as PID 1) | Containers (kernel 2.6.24+) |
| CLONE_NEWUSER | New user-ID/group-ID namespace | Containers (kernel 2.6.23+) |
| CLONE_NEWUTS | New UTS namespace (hostname, domainname) | Containers (kernel 2.6.19+) |
| CLONE_PARENT | Child’s parent = caller’s parent (same PPID) | Kernel 2.4+ |
| CLONE_VFORK | Suspend parent until child execs or exits | vfork() emulation |
| CLONE_PTRACE | Trace child if parent is being traced | Debuggers |
| CLONE_UNTRACED | Prevent CLONE_PTRACE being forced on child | Kernel threads (kernel 2.6+) |
| CLONE_IO | Share I/O context with parent | Kernel 2.6.25+ |
Section A: Flags for POSIX Thread Creation
CLONE_VM — Share Virtual Memory
When set, parent and child share the same page tables. Any memory write by one is immediately visible to the other. This is the defining characteristic of a thread. Without it, the child gets a copy-on-write copy (like fork()).
CLONE_FILES — Share File Descriptor Table
Parent and child share one file descriptor table. A call to open(), close(), or dup() in either process affects both. POSIX requires all threads in a process to share file descriptors. Without this flag, the child gets its own copy of the fd table; as after fork(), the copied descriptors refer to the same underlying open file descriptions as the parent’s.
CLONE_FS — Share Filesystem Attributes
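Parent and child share the umask, the root directory (changed with chroot()), and the current working directory (changed with chdir()). A umask(), chdir(), or chroot() call by either process affects both. POSIX threads share these attributes. CLONE_FS cannot be combined with CLONE_NEWNS; clone() fails with EINVAL if both are set.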
CLONE_SIGHAND — Share Signal Dispositions
The signal disposition table (what to do for each signal: ignore, default, or handler) is shared. Changing a signal handler in one process via sigaction() changes it for both. Signal masks and pending signals are always separate even when sharing dispositions.
Example 1 — Simulate a Thread (CLONE_VM + CLONE_FILES + CLONE_SIGHAND)
/* thread_like_clone.c: create a thread-like child that shares memory */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Ordinary global; CLONE_VM makes parent and child see the same copy */
int shared_counter = 0;

static int child_func(void *arg)
{
    (void) arg; /* unused */
    printf("[Child] shared_counter before = %d\n", shared_counter);
    shared_counter += 100; /* This modifies PARENT's memory too (CLONE_VM) */
    printf("[Child] shared_counter after = %d\n", shared_counter);
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) { perror("malloc"); exit(1); }
    char *stack_top = stack + STACK_SIZE; /* the stack grows downward */

    shared_counter = 5;
    printf("[Parent] shared_counter = %d (before clone)\n", shared_counter);

    pid_t pid = clone(
        child_func,
        stack_top,
        CLONE_VM | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
        NULL
    );
    if (pid == -1) { perror("clone"); exit(1); }

    waitpid(pid, NULL, 0);

    /* Because CLONE_VM was set, the child's write is visible here */
    printf("[Parent] shared_counter = %d (after child modified it)\n",
           shared_counter);

    free(stack);
    return 0;
}
/*
* Output:
* [Parent] shared_counter = 5 (before clone)
* [Child] shared_counter before = 5
* [Child] shared_counter after = 105
* [Parent] shared_counter = 105 (after child modified it)
*
* Without CLONE_VM, parent would still see 5.
*/
Section B: Thread Groups — CLONE_THREAD, TID, TGID
CLONE_THREAD — Place Child in Parent’s Thread Group
POSIX requires all threads in a process to share the same process ID. Linux achieves this via thread groups. When CLONE_THREAD is set, the child is placed in the same thread group as the parent, meaning getpid() returns the same value in all threads (the TGID).
Requires: CLONE_SIGHAND (which requires CLONE_VM).
Thread Group — TID vs TGID Diagram
Thread Group: TGID = 2001 (the process ID seen by getpid())
| Thread | TID | TGID | Role |
|---|---|---|---|
| Thread A | 2001 | 2001 | Group Leader |
| Thread B | 2002 | 2001 | |
| Thread C | 2003 | 2001 | |
| Thread D | 2004 | 2001 | |

All threads: getpid() returns 2001; gettid() returns the individual TID; PPID = 1900 for all.
Example 2 — CLONE_THREAD: Shared PID (TGID)
/* clone_thread.c: show TID vs TGID with CLONE_THREAD */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* glibc only gained a gettid() wrapper in version 2.30, so invoke the
   system call directly to stay portable to older C libraries */
static pid_t gettid_wrapper(void)
{
    return (pid_t) syscall(SYS_gettid);
}

static int thread_func(void *arg)
{
    (void) arg; /* unused */
    printf("[Thread] PID (TGID) via getpid() = %d\n", getpid());
    printf("[Thread] TID via gettid() = %d\n", gettid_wrapper());
    sleep(1); /* Give parent time to print its info */
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) { perror("malloc"); exit(1); }
    char *stack_top = stack + STACK_SIZE; /* the stack grows downward */

    printf("[Main] PID (TGID) via getpid() = %d\n", getpid());
    printf("[Main] TID via gettid() = %d\n", gettid_wrapper());

    pid_t tid = clone(
        thread_func,
        stack_top,
        /* CLONE_THREAD requires CLONE_SIGHAND which requires CLONE_VM */
        CLONE_VM | CLONE_SIGHAND | CLONE_THREAD,
        NULL
    );
    if (tid == -1) { perror("clone"); exit(1); }
    printf("[Main] clone() returned TID = %d\n", tid);

    /*
     * waitpid() cannot wait for a CLONE_THREAD child.
     * A threading library joins via a futex (see CLONE_CHILD_CLEARTID);
     * this demo simply sleeps until the thread has finished.
     */
    sleep(2);

    free(stack);
    return 0;
}
/*
* Output (example; PIDs differ per run and line order may vary):
* [Main] PID (TGID) via getpid() = 12345
* [Main] TID via gettid() = 12345
* [Main] clone() returned TID = 12346
* [Thread] PID (TGID) via getpid() = 12345 ← SAME as parent!
* [Thread] TID via gettid() = 12346 ← Different
*/
Section C: Namespace Flags — Building Containers
What Are Linux Namespaces?
Namespaces wrap global system resources (PIDs, network, mounts, users) so each namespace has its own private view. Processes in different namespaces cannot see each other’s resources. This is the foundation of Docker, LXC, and other container technologies.
| Flag | Isolates | Container Use |
|---|---|---|
| CLONE_NEWPID | Process IDs | First process in container gets PID 1 |
| CLONE_NEWNET | Network stack | Container gets own IP, interfaces, firewall |
| CLONE_NEWNS | Mount points | Container has its own filesystem view |
| CLONE_NEWIPC | System V IPC (queues, semaphores) | Containers can’t share IPC objects |
| CLONE_NEWUTS | Hostname and NIS domain name | Each container has its own hostname |
| CLONE_NEWUSER | User and group IDs | Unprivileged containers (UID 0 inside ≠ root outside) |
Container Isolation Model
| Linux Kernel | Host (Default Namespace) | Container (New Namespace) |
|---|---|---|
| PID namespace | 1..65535 | 1 = container init |
| Network | eth0, 192.168.1.1 | veth0, 172.17.0.2 |
| Hostname | myserver | webapp-container |
| Mounts | /proc, /sys, /home | container root fs |

Isolated via: CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWUTS, CLONE_NEWNS, CLONE_NEWIPC, CLONE_NEWUSER
Example 3 — New UTS Namespace (Change Hostname in Child)
/* clone_newuts.c: child gets its own hostname via CLONE_NEWUTS */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strlen() */
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

static int child_func(void *arg)
{
    const char *new_hostname = arg;

    /* Set a new hostname; this only affects the child's UTS namespace */
    if (sethostname(new_hostname, strlen(new_hostname)) == -1) {
        perror("sethostname"); return 1;
    }

    char buf[256];
    if (gethostname(buf, sizeof(buf)) == -1) { perror("gethostname"); return 1; }
    printf("[Child] hostname = '%s'\n", buf);
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) { perror("malloc"); exit(1); }
    char *stack_top = stack + STACK_SIZE; /* the stack grows downward */

    char host_before[256];
    gethostname(host_before, sizeof(host_before));
    printf("[Parent] hostname before = '%s'\n", host_before);

    /* CLONE_NEWUTS: child gets its own UTS namespace */
    /* Requires CAP_SYS_ADMIN (root) */
    pid_t pid = clone(child_func, stack_top,
                      CLONE_NEWUTS | SIGCHLD,
                      "my-container-host");
    if (pid == -1) { perror("clone"); exit(1); }

    waitpid(pid, NULL, 0);

    /* Parent's hostname is UNCHANGED */
    char host_after[256];
    gethostname(host_after, sizeof(host_after));
    printf("[Parent] hostname after = '%s' (unchanged)\n", host_after);

    free(stack);
    return 0;
}
/* Run as root: sudo ./clone_newuts
* Output:
* [Parent] hostname before = 'myserver'
* [Child] hostname = 'my-container-host'
* [Parent] hostname after = 'myserver' ← Parent unaffected
*/
Section D: Threading Support Flags — SETTID / CLEARTID
| Flag | When is TID written/cleared? | Why needed? |
|---|---|---|
| CLONE_PARENT_SETTID | Before clone() returns, in parent’s memory at ptid | Race-free TID capture: a signal handler that runs when the thread exits can fire before clone()’s return value has been assigned. |
| CLONE_CHILD_SETTID | After clone, in child’s memory at ctid | Flexibility for other threading implementations. |
| CLONE_CHILD_CLEARTID | When child exits: zeros ctid and wakes the futex at that address | How pthread_join() detects thread termination without polling. |
⚠️ Why CLONE_PARENT_SETTID Solves a Race Condition
/* Without CLONE_PARENT_SETTID — RACE CONDITION */
tid = clone(...);
/* If child exits BEFORE this assignment completes,
the signal handler fires with tid still = 0.
Handler cannot identify the thread. BROKEN! */
/* With CLONE_PARENT_SETTID — SAFE */
/* Kernel writes TID to &new_tid BEFORE clone() returns,
so no matter when the signal fires, new_tid has the right value. */
clone(..., CLONE_PARENT_SETTID, ..., &new_tid, ...);
Section E: Mount Namespaces — CLONE_NEWNS
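CLONE_NEWNS gives the child its own copy of the parent’s mount namespace. Calls to mount() and umount() in the child then do not affect the parent’s view of the filesystem, provided the mount points involved are not marked as shared; with shared mount propagation, changes can still travel between namespaces. This is the basis for filesystem isolation in containers.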
