What is clone()?
clone() is a Linux-specific system call that creates a new process or thread. Unlike fork(), which always gives the child its own copies of the parent’s memory, signal dispositions, and file descriptor table, clone() lets you choose precisely which resources are shared and which are copied.
It is the real primitive: inside the kernel, fork(), vfork(), and clone() all funnel into the same kernel function (historically do_fork(), renamed kernel_clone() in Linux 5.10) and differ only in the flags they pass. The threading libraries (NPTL, LinuxThreads) use clone() directly to create threads.
fork() vs vfork() vs clone() — At a Glance
| System Call | Memory | Starts Executing | Stack | Sharing Control |
|---|---|---|---|---|
| fork() | Copy-on-Write copy | After the fork() call | Copy of parent’s stack | None (fixed behaviour) |
| vfork() | Shares parent memory | After the vfork() call | Shares parent’s stack! | None (fixed behaviour) |
| clone() | Caller decides via flags | At a new function (func) | Caller provides separate stack | Full control via flags |
clone() Prototype
#define _GNU_SOURCE
#include <sched.h>
int clone(
int (*func)(void *), /* Child starts executing here */
void *child_stack, /* Top of stack for child (grows downward) */
int flags, /* Sharing flags + termination signal */
void *func_arg, /* Argument passed to func */
/* Optional: */
pid_t *ptid, /* Store child TID here (before clone() returns) */
struct user_desc *tls, /* Thread-local storage descriptor */
pid_t *ctid /* Store child TID here; cleared on exit */
);
/* Returns in the parent: child's thread ID (its PID for a new process) on success, -1 on error */
Parameters Explained
| Parameter | Role |
|---|---|
| func | Function where the child process begins execution. Its return value becomes the child’s exit status. |
| child_stack | Pointer to the top of a memory block to use as the child’s stack. The stack grows downward on x86, so pass the high end of a malloc’d block. |
| flags | Bitmask of CLONE_* flags (what to share) ORed with the child’s termination signal (lower byte). E.g. CLONE_VM \| SIGCHLD. |
| func_arg | Argument passed to func. Cast to/from void* to pass any type. |
| ptid | If CLONE_PARENT_SETTID is set: the kernel writes the child TID here before returning (race-free thread ID capture). |
| tls | Thread-local storage descriptor. Used by NPTL for per-thread data. |
| ctid | If CLONE_CHILD_CLEARTID is set: the kernel zeroes this location and wakes futex waiters when the child exits. This is how pthread_join() works. |
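The ptid row can be checked with a small experiment. The following is a minimal sketch, assuming a glibc-based Linux system where the optional clone() arguments are passed as trailing variadic arguments; the helper names spawn_and_capture_tid() and exit_immediately() are hypothetical and exist only for this illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SETTID_STACK_SIZE (64 * 1024)

/* Child body: do nothing and exit with status 0. */
static int exit_immediately(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: create a child with CLONE_PARENT_SETTID and report
 * both the value clone() returned and the value the kernel stored via ptid.
 * Returns 0 on success, -1 on failure.
 */
static int spawn_and_capture_tid(pid_t *returned, pid_t *stored)
{
    char *stack = malloc(SETTID_STACK_SIZE);
    int rc = -1;

    if (stack == NULL)
        return -1;

    *stored = 0;
    *returned = clone(exit_immediately, stack + SETTID_STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL,
                      stored,  /* ptid: kernel writes the child TID here */
                      NULL,    /* tls: unused without CLONE_SETTLS */
                      NULL);   /* ctid: unused without CLONE_CHILD_* flags */

    if (*returned != -1 && waitpid(*returned, NULL, 0) == *returned)
        rc = 0;

    free(stack);
    return rc;
}
```

After a successful call, *stored should hold the same ID that clone() returned. The benefit of ptid is that the value is already in place when clone() returns, so a thread library does not have to assign the return value before the new thread can be observed.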
Inside the Kernel — How clone() Works
User calls clone(func, stack, flags, arg) → glibc wrapper sets up registers, calls sys_clone() → Kernel do_fork() creates new KSE → Child starts executing func(arg)
Child Stack Setup — Why Pass the Top?
On x86 and most architectures, the stack grows downward in memory. So you allocate a block and pass a pointer to its high address end:
| stackTop = stack + STACK_SIZE ← pass this to clone() |
| stack[65535] |
| stack[65534] ← stack grows downward as child runs |
| … |
| stack[0] ← malloc’d block starts here |
| stack = malloc(STACK_SIZE) ← low address |
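The diagram uses malloc(), but the same "pass the high end" rule applies to any allocator. As an alternative sketch, the stack can be mapped with mmap() and MAP_STACK, which is the approach taken by the example in the clone(2) manual page. The helper names alloc_child_stack() and free_child_stack() are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `size` bytes for a child stack.
 * Returns the pointer to pass to clone() (one past the highest byte) and
 * stores the base address in *base_out so it can be unmapped later.
 * Returns NULL on failure.
 */
static char *alloc_child_stack(size_t size, char **base_out)
{
    char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *base_out = base;    /* low address: needed for munmap() */
    return base + size;  /* high address: the stack grows down from here */
}

/* Release a stack obtained from alloc_child_stack(). Returns 0 on success. */
static int free_child_stack(char *base, size_t size)
{
    return munmap(base, size);
}
```

A caller would pass the returned pointer as child_stack and keep the base pointer for free_child_stack() once the child has been reaped. Because mmap() returns page-aligned memory, the top of the block is also suitably aligned when size is a multiple of the page size.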
Example 1 — Basic clone() Usage
A minimal example that creates a child process using clone(). The child prints a message and exits; the parent waits for it.
/* basic_clone.c — Minimal clone() example */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h> /* getpid() */
#define STACK_SIZE (64 * 1024) /* 64 KB stack for child */
/* This function runs in the child process */
static int child_func(void *arg)
{
char *message = (char *) arg;
printf("[Child] PID=%d, message='%s'\n", getpid(), message);
return 0; /* exit status of child */
}
int main(void)
{
char *stack;
char *stack_top;
pid_t child_pid;
/* Allocate stack for the child */
stack = malloc(STACK_SIZE);
if (!stack) { perror("malloc"); exit(1); }
/* Stack grows downward: pass the HIGH end */
stack_top = stack + STACK_SIZE;
printf("[Parent] PID=%d, creating child with clone()...\n", getpid());
/* Create the child:
* - child_func: where child starts
* - stack_top: top of child's stack
* - SIGCHLD: signal sent to parent when child exits
* - "hello from clone": argument to child_func
*/
child_pid = clone(child_func, stack_top, SIGCHLD, "hello from clone");
if (child_pid == -1) { perror("clone"); exit(1); }
printf("[Parent] Child PID = %d\n", child_pid);
/* Wait for child to finish */
if (waitpid(child_pid, NULL, 0) == -1) { perror("waitpid"); exit(1); }
printf("[Parent] Child has terminated.\n");
free(stack);
return 0;
}
/* Compile: gcc -o basic_clone basic_clone.c
Run: ./basic_clone */
Example 2 — Sharing File Descriptors with CLONE_FILES
This example shows the difference between cloning with and without CLONE_FILES. When CLONE_FILES is set, the child and parent share the same file descriptor table — closing an fd in the child also closes it in the parent.
/* clone_files.c — Demonstrate CLONE_FILES sharing */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#include <errno.h>
#define STACK_SIZE (64 * 1024)
/* Receive the fd number as argument; close it and exit */
static int child_func(void *arg)
{
int fd = *((int *) arg);
printf("[Child] closing fd %d\n", fd);
close(fd);
return 0;
}
int run_test(int use_clone_files)
{
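char *stack = malloc(STACK_SIZE);
char *stack_top;
int fd, flags;
pid_t child_pid;
if (!stack) { perror("malloc"); return -1; }
stack_top = stack + STACK_SIZE; /* stack grows downward: pass the high end */
/* Open a file; the child will close it */
fd = open("/dev/null", O_RDWR);
if (fd == -1) { perror("open"); free(stack); return -1; }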
/* Choose whether to share fd table */
flags = SIGCHLD;
if (use_clone_files)
flags |= CLONE_FILES;
child_pid = clone(child_func, stack_top, flags, &fd);
if (child_pid == -1) { perror("clone"); close(fd); free(stack); return -1; }
waitpid(child_pid, NULL, 0);
/* Try writing to the fd — did child's close() affect us? */
ssize_t n = write(fd, "x", 1);
if (n == -1 && errno == EBADF)
printf("[Parent] fd %d CLOSED — child's close() affected parent "
"(CLONE_FILES=%s)\n", fd, use_clone_files ? "ON" : "OFF");
else
printf("[Parent] fd %d OPEN — child's close() did NOT affect parent "
"(CLONE_FILES=%s)\n", fd, use_clone_files ? "ON" : "OFF");
close(fd); /* close even if already closed (EBADF is ok) */
free(stack);
return 0;
}
int main(void)
{
printf("=== Without CLONE_FILES ===\n");
run_test(0);
printf("\n=== With CLONE_FILES ===\n");
run_test(1);
return 0;
}
/* Expected output (parent's result lines only; the fd number may differ):
[Parent] fd 3 OPEN — child's close() did NOT affect parent (CLONE_FILES=OFF)
[Parent] fd 3 CLOSED — child's close() affected parent (CLONE_FILES=ON) */
Example 3 — Custom Termination Signal with clone()
The low byte of the flags argument specifies which signal the parent receives when the child terminates. If it is anything other than SIGCHLD, the parent must pass __WCLONE (or __WALL) to waitpid() in order to wait for the child.
/* clone_signal.c — Use SIGUSR1 as child termination signal */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#define STACK_SIZE (64 * 1024)
#define CHILD_SIG SIGUSR1 /* Non-standard termination signal */
static int child_func(void *arg)
{
printf("[Child] running, PID=%d\n", getpid());
sleep(1);
printf("[Child] exiting\n");
return 42; /* exit code */
}
int main(void)
{
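char *stack = malloc(STACK_SIZE);
char *stack_top;
pid_t child_pid;
int status;
if (!stack) { perror("malloc"); exit(1); }
stack_top = stack + STACK_SIZE; /* stack grows downward: pass the high end */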
/* Ignore CHILD_SIG so it doesn't terminate the parent */
if (signal(CHILD_SIG, SIG_IGN) == SIG_ERR) {
perror("signal"); exit(1);
}
/* Lower byte of flags = termination signal (SIGUSR1 here) */
child_pid = clone(child_func, stack_top, CHILD_SIG, NULL);
if (child_pid == -1) { perror("clone"); exit(1); }
printf("[Parent] waiting for child %d (using __WCLONE)...\n", child_pid);
/*
* __WCLONE: wait for children that deliver a signal != SIGCHLD.
* This is required when the termination signal is not SIGCHLD.
*/
if (waitpid(-1, &status, __WCLONE) == -1) {
perror("waitpid"); exit(1);
}
if (WIFEXITED(status))
printf("[Parent] child exited with status %d\n", WEXITSTATUS(status));
free(stack);
return 0;
}
/* Compile: gcc -o clone_signal clone_signal.c
Run: ./clone_signal */
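A related case is a termination signal of 0, in which the parent receives no signal at all when the child exits. Such a child also counts as a "clone child", so a plain waitpid() will not find it. The sketch below uses __WALL, which waits for children regardless of their termination signal. The helper names reap_signal_less_child() and exit_with_7() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WALL_STACK_SIZE (64 * 1024)

/* Child body: exit with a recognisable status. */
static int exit_with_7(void *arg)
{
    (void)arg;
    return 7;
}

/*
 * Hypothetical helper: create a child whose termination signal is 0
 * (the parent is not signalled when it exits), reap it with __WALL,
 * and return its exit status. Returns -1 on failure.
 */
static int reap_signal_less_child(void)
{
    char *stack = malloc(WALL_STACK_SIZE);
    int status = 0;
    int result = -1;
    pid_t pid;

    if (stack == NULL)
        return -1;

    /* flags == 0: nothing shared, no termination signal */
    pid = clone(exit_with_7, stack + WALL_STACK_SIZE, 0, NULL);

    /* __WALL matches both ordinary children and clone children */
    if (pid != -1 && waitpid(pid, &status, __WALL) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    free(stack);
    return result;
}
```

Under these assumptions the helper returns 7. Unlike the SIGUSR1 example, no signal handler or SIG_IGN setup is needed here, because no signal is ever delivered to the parent.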
Kernel Scheduling Entities (KSE) — Threads vs Processes
A useful way to think about Linux: both threads and processes are Kernel Scheduling Entities (KSEs). They differ only in how many attributes they share with other KSEs.
| KSE Type | Shares with siblings | Created by |
|---|---|---|
| POSIX Process (fork) | Almost nothing: independent memory, fds, signals | fork() → clone(SIGCHLD) |
| POSIX Thread (pthread) | Memory, file descriptors, signal dispositions, process ID | pthread_create() → clone(CLONE_VM \| CLONE_FILES \| …) |
| vfork() child | Shares memory temporarily (until exec/exit) | vfork() → clone(CLONE_VM \| CLONE_VFORK \| SIGCHLD) |
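The difference between the first two rows largely comes down to whether CLONE_VM is set. The following is a minimal sketch, assuming a glibc-based Linux system; value_seen_by_parent() and set_to_99() are hypothetical names used only for this illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)

/* Child body: overwrite the int that arg points to, then exit with 0. */
static int set_to_99(void *arg)
{
    *(volatile int *)arg = 99;
    return 0;
}

/*
 * Hypothetical helper: run set_to_99() in a child created with
 * (extra_flags | SIGCHLD), wait for it, and return the value the parent
 * sees afterwards. Returns -1 on failure.
 */
static int value_seen_by_parent(int extra_flags)
{
    volatile int value = 1;  /* lives in the parent's memory */
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;

    pid = clone(set_to_99, stack + DEMO_STACK_SIZE,
                extra_flags | SIGCHLD, (void *)&value);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return value;
}
```

Called as value_seen_by_parent(0), the child behaves like a fork() child: it modifies its own copy-on-write copy and the parent still sees 1. Called as value_seen_by_parent(CLONE_VM), the child writes into the parent's address space and the parent sees 99.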
NPTL vs LinuxThreads — clone() Flag Differences
| Threading Library | clone() Flags Used | Threads share PID? |
|---|---|---|
| LinuxThreads (old) | CLONE_VM \| CLONE_FILES \| CLONE_FS \| CLONE_SIGHAND | ❌ No: each thread has a unique PID |
| NPTL (modern) | The LinuxThreads flags plus CLONE_THREAD \| CLONE_SETTLS \| CLONE_PARENT_SETTID \| CLONE_CHILD_CLEARTID \| CLONE_SYSVSEM | ✅ Yes: all threads share the TGID |
/* How NPTL uses clone() internally (simplified) */
/* When you call pthread_create(), NPTL does roughly this: */
pid_t new_tid; /* thread ID will be written here */
pid_t ctid_loc; /* cleared when thread exits — pthread_join uses this */
clone(
thread_start_fn, /* pthread start routine wrapper */
new_stack_top,
CLONE_VM | /* share memory */
CLONE_FILES | /* share file descriptors */
CLONE_FS | /* share filesystem attributes */
CLONE_SIGHAND | /* share signal dispositions */
CLONE_THREAD | /* place in same thread group (share PID) */
CLONE_SETTLS | /* set up thread-local storage */
CLONE_PARENT_SETTID| /* write TID to &new_tid before return */
CLONE_CHILD_CLEARTID|/* clear &ctid_loc on exit → wakes pthread_join */
CLONE_SYSVSEM, /* share SysV semaphore undo values */
thread_arg,
&new_tid, /* ptid */
tls_descriptor, /* tls */
&ctid_loc /* ctid */
);
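The ctid mechanism in the call above can be tried in isolation. The following is a rough sketch, not NPTL's actual code: it creates a CLONE_THREAD child without CLONE_SETTLS and then "joins" it by waiting on the futex word that the kernel clears at thread exit. The names run_thread_and_join() and record_pid() are hypothetical, and the thread body is deliberately limited to a single raw system call because no separate thread-local storage has been set up for it.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define THREAD_STACK_SIZE (64 * 1024)

/* Thread body: record the PID this thread observes via a raw syscall. */
static int record_pid(void *arg)
{
    *(volatile pid_t *)arg = (pid_t)syscall(SYS_getpid);
    return 0;
}

/*
 * Hypothetical helper: start record_pid() as a thread in the caller's
 * thread group and wait for it the way a join would, using the word
 * that CLONE_CHILD_CLEARTID asks the kernel to zero on exit.
 * Returns 0 on success, -1 on failure.
 */
static int run_thread_and_join(pid_t *pid_seen_by_thread)
{
    pid_t tid_word = 0;  /* kernel stores the TID here, then clears it */
    pid_t current;
    char *stack = malloc(THREAD_STACK_SIZE);

    if (stack == NULL)
        return -1;

    if (clone(record_pid, stack + THREAD_STACK_SIZE,
              CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
              CLONE_THREAD | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID,
              pid_seen_by_thread,
              &tid_word,   /* ptid: set before clone() returns */
              NULL,        /* tls: unused without CLONE_SETTLS */
              &tid_word)   /* ctid: zeroed and futex-woken at thread exit */
        == -1) {
        free(stack);
        return -1;
    }

    /* Sleep until the kernel has cleared the word. FUTEX_WAIT returns
       immediately if the word no longer holds the expected value. */
    while ((current = __atomic_load_n(&tid_word, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &tid_word, FUTEX_WAIT, current, NULL, NULL, 0);

    free(stack);  /* safe: the thread no longer runs on this stack */
    return 0;
}
```

After a successful call, the PID recorded by the thread should equal the caller's getpid(), which is the CLONE_THREAD behaviour described in the table. The non-private FUTEX_WAIT operation is used on the assumption that the kernel's exit-time wake-up is a shared futex wake.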
⚠️ When to Use clone() Directly
In application code: almost never. clone() is not portable and requires careful stack management. Use fork() for processes and pthread_create() for threads. Reach for clone() directly only when writing a threading library or a container runtime. Understanding it is still valuable, both for interview preparation and for seeing what the kernel actually does.
