What is a System Call?
A system call is a controlled entry point into the kernel that allows a user-space process to request services from the operating system. The kernel exposes a set of services through its system call application programming interface (API). These services cover operations such as creating processes, performing file input/output, allocating memory, and communicating between processes.
From a C programming perspective, invoking a system call looks much like calling an ordinary function. Behind the scenes, however, a significant sequence of steps occurs that distinguishes system calls from regular function calls.
Key Concepts
Before examining how a system call executes step by step, it is important to understand three fundamental properties that apply to all system calls on Linux:
A system call changes the processor state from user mode to kernel mode. This transition is necessary so that the CPU can access protected kernel memory regions that are inaccessible from user space. Once the kernel finishes processing the request, the processor switches back to user mode.
The set of system calls available on Linux is fixed. Each system call is identified by a unique number. This numbering scheme is not normally visible to application programs, which refer to system calls by name rather than number. The syscalls(2) manual page lists all available Linux system calls.
Each system call may accept a set of arguments that specify information to be transferred between user space (the process’s virtual address space) and kernel space, and vice versa. This bidirectional data transfer is a core part of every system call interaction.
System Call Execution Steps on x86-32
The following sequence describes how a system call is executed on the x86-32 hardware architecture. This example uses execve() (system call number 11) to illustrate the full path from application code to the kernel service routine and back.
- Application Invokes Wrapper Function
The application program makes a system call by invoking a wrapper function in the C library (glibc). For example, calling execve() in a C program actually calls a glibc wrapper defined in sysdeps/unix/sysv/linux/execve.c.
- Arguments Copied into CPU Registers
The wrapper function must make all system call arguments available to the kernel’s trap-handling routine. Arguments are passed to the wrapper via the stack, but the kernel expects them in specific CPU registers, so the wrapper copies them into those registers.
- System Call Number Loaded into %eax
Since all system calls enter the kernel through the same trap mechanism, the kernel needs a way to identify which system call is being requested. The wrapper function copies the system call number into the CPU register %eax. For execve(), this is number 11 (__NR_execve).
- Trap Instruction Switches to Kernel Mode
The wrapper function executes a trap machine instruction (int 0x80), causing the processor to switch from user mode to kernel mode. The CPU then executes the code pointed to by location 0x80 (128 decimal) in the system’s trap vector. Newer x86-32 architectures use the faster sysenter instruction, supported from Linux kernel 2.6 and glibc 2.3.2 onward.
- Kernel’s system_call() Handler Executes
In response to the trap, the kernel invokes its system_call() routine (in the assembler file arch/i386/entry.S). This handler performs four sub-steps:
(a) Saves register values onto the kernel stack.
(b) Checks the validity of the system call number.
(c) Looks up and invokes the appropriate system call service routine from the sys_call_table; for execve(), this is sys_execve(). The service routine validates arguments, transfers data between user and kernel memory, and returns a result.
(d) Restores register values from the kernel stack and places the system call return value on the stack.
- Wrapper Returns to Caller in User Mode
Control returns to the wrapper function, and the processor simultaneously switches back to user mode. If the service routine returned a negative value indicating an error, the wrapper negates that value, stores the resulting positive error number in the global variable errno, and returns -1 to the caller. On success, it returns a nonnegative value.
On Linux, system call service routines follow a convention where a nonnegative return value indicates success. When an error occurs, the service routine returns a negative number that is the negated value of an errno constant.
The C library wrapper function then:
- Negates the negative return value (making it positive)
- Copies the result into the global errno variable
- Returns -1 to the calling program as the function result
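This convention relies on the assumption that a service routine never returns a negative value on success. A few system calls break that assumption; one example is the F_GETOWN operation of the fcntl() system call (described in advanced sections). For such cases, extra care is needed when checking for errors.
The execve() system call is number 11 on Linux/x86-32 (__NR_execve = 11). Here is the complete call path: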
Application program
    execve(path, argv, envp);    /* User calls standard function */
        |
        v
glibc wrapper function (sysdeps/unix/sysv/linux/execve.c)
    execve(path, argv, envp) {
        /* Copy arguments to registers */
        /* Load system call number 11 into %eax */
        int 0x80                 /* Trap → switch to kernel mode */
        return;
    }
        |
        v
Kernel Trap Handler (arch/x86/kernel/entry_32.S)
    system_call:
        /* Save registers to kernel stack */
        call sys_call_table[__NR_execve]    /* Index into table using number 11 */
        |
        v
Service Routine (arch/x86/kernel/process_32.c)
    sys_execve() {
        /* Validate arguments */
        /* Execute the new program */
        return error_code;       /* 0 on success, negative on error */
    }
        |
        v
Back to wrapper → back to user mode → return value to application
Even for simple system calls, a significant amount of work must be performed during execution. Consider a benchmark using getppid(), which simply returns the process ID of the calling process’s parent.
| Operation | 10 Million Calls | Per Call |
|---|---|---|
| getppid() system call | ~2.2 seconds | ~0.3 microseconds |
| C function returning an integer | ~0.11 seconds | ~0.011 microseconds |
The figures in the table were measured on an x86-32 system running Linux 2.6.25.
The strace command can be used to trace the system calls made by a running program. This is invaluable for debugging or for understanding what a program is doing under the hood.
# Trace all system calls made by a program
strace ./myprogram
# Trace a specific system call (e.g., read)
strace -e trace=read ./myprogram
# Attach to a running process by PID
strace -p 1234
# Save output to a file
strace -o strace_output.txt ./myprogram
Throughout, a phrase such as “invoking the system call xyz()” means “calling the wrapper function that invokes the system call xyz().”
- A system call is a controlled entry point into the kernel, allowing processes to request OS services.
- System calls cause a processor mode switch from user mode to kernel mode and back.
- The Linux system call set is fixed; each call has a unique numeric identifier.
- Arguments pass from user space to kernel space via CPU registers.
- The int 0x80 trap instruction (or the faster sysenter) triggers the kernel’s trap handler.
- The kernel uses a sys_call_table indexed by call number to dispatch to the correct service routine.
- System calls carry an overhead of approximately 0.3 microseconds per call on x86-32 hardware.
- Errors are reported by the service routine as a negative return value; the wrapper stores the corresponding error code in errno and returns -1.
