How Linux System Calls Work – From User Space to Kernel Space

 


What is a System Call?

A system call is a controlled entry point into the kernel that allows a user-space process to request services from the operating system. The kernel exposes a set of services through its system call application programming interface (API). These services cover operations such as creating processes, performing file input/output, allocating memory, and communicating between processes.

From a C programming perspective, invoking a system call looks much like calling an ordinary function. Behind the scenes, however, a significant sequence of steps occurs that distinguishes system calls from regular function calls.

Key Concepts

System Call, Kernel Mode, User Mode, int 0x80, sysenter, sys_call_table, Wrapper Function, x86-32, errno, execve(), getppid(), Trap Handler

General Characteristics of System Calls

Before examining how a system call executes step by step, it is important to understand three fundamental properties that apply to all system calls on Linux:

1. Processor Mode Switch

A system call changes the processor state from user mode to kernel mode. This transition is necessary so that the CPU can access protected kernel memory regions that are inaccessible from user space. Once the kernel finishes processing the request, the processor switches back to user mode.

2. Fixed and Numbered Set

The set of system calls available on Linux is fixed. Each system call is identified by a unique number. This numbering scheme is not normally visible to application programs, which refer to system calls by name rather than number. The syscalls(2) manual page lists all available Linux system calls.

3. Arguments Transfer

Each system call may accept a set of arguments that specify information to be transferred between user space (the process’s virtual address space) and kernel space, and vice versa. This bidirectional data transfer is a core part of every system call interaction.

System Call Execution Steps on x86-32

The following sequence describes how a system call is executed on the x86-32 hardware architecture. This example uses execve() (system call number 11) to illustrate the full path from application code to the kernel service routine and back.

  1. Application Invokes Wrapper Function
    The application program makes a system call by invoking a wrapper function in the C library (glibc). For example, calling execve() in a C program actually calls a glibc wrapper defined in sysdeps/unix/sysv/linux/execve.c.
  2. Arguments Copied into CPU Registers
    The wrapper function makes all system call arguments available to the kernel’s trap-handling routine. Arguments are passed to the wrapper via the stack, but the kernel expects them in specific CPU registers. The wrapper function copies all arguments into those registers.
  3. System Call Number Loaded into %eax
    Since all system calls enter the kernel through the same trap mechanism, the kernel needs a way to identify which system call is being requested. The wrapper function copies the system call number into the CPU register %eax. For execve(), this is number 11 (__NR_execve).
  4. Trap Instruction Switches to Kernel Mode
    The wrapper function executes a trap machine instruction (int 0x80), which causes the processor to switch from user mode to kernel mode and execute the code pointed to by location 0x80 (128 decimal) in the system's trap vector. Newer x86-32 processors provide the sysenter instruction, a faster way of entering kernel mode than int 0x80. Its use is supported by the 2.6 kernel and by glibc 2.3.2 onward.
  5. Kernel’s system_call() Handler Executes
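    In response to the trap, the kernel invokes its system_call() routine, which is written in assembler (arch/x86/kernel/entry_32.S, or arch/i386/kernel/entry.S in kernel trees that predate the merged x86 directory). This handler performs four sub-steps: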

    (a) Saves register values onto the kernel stack.
    (b) Checks the validity of the system call number.
    (c) Uses the system call number as an index into sys_call_table to find the appropriate system call service routine, then invokes it. For execve(), this is sys_execve(). The service routine validates arguments, transfers data between user and kernel memory, and returns a result status to system_call().
    (d) Restores register values from the kernel stack and places the return value on the stack.

  6. Wrapper Returns to Caller in User Mode
    Control returns to the wrapper function, and the processor simultaneously switches back to user mode. If the service routine returned a negative value indicating an error, the wrapper negates that value, stores the resulting positive error number in the global variable errno, and returns -1 to the caller. On success, it returns the nonnegative result.

Error Return Convention

On Linux, system call service routines follow a convention where a nonnegative return value indicates success. When an error occurs, the service routine returns a negative number that is the negated value of an errno constant.

The C library wrapper function then:

  • Negates the negative return value (making it positive)
  • Copies the result into the global errno variable
  • Returns -1 to the calling program as the function result

Special Case: This convention assumes that service routines never return negative values on success. A few do, which is normally harmless because their valid negative results do not overlap the range of negated errno values. The one known problematic case is the F_GETOWN operation of the fcntl() system call (described in advanced sections). For such cases, extra care is needed when checking for errors.

Example: execve() System Call Flow

The execve() system call is number 11 on Linux/x86-32 (__NR_execve = 11). Here is the complete call path:

Application program
    execve(path, argv, envp);           /* User calls standard function */
         |
         v
glibc wrapper function  (sysdeps/unix/sysv/linux/execve.c)
    execve(path, argv, envp) {
        /* Copy arguments to registers */
        /* Load system call number 11 into %eax */
        int 0x80                        /* Trap → switch to kernel mode */
        /* On error, set errno and return -1 */
        return;
    }
         |
         v
Kernel Trap Handler     (arch/x86/kernel/entry_32.S)
    system_call:
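        /* Save registers to kernel stack */
        /* Check that the system call number is valid */
        call sys_call_table[__NR_execve]  /* Index into table using number 11 */
        /* Restore registers; place return value on stack */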
         |
         v
Service Routine         (arch/x86/kernel/process_32.c)
    sys_execve() {
        /* Validate arguments */
        /* Execute the new program */
        return error_code;            /* Negated errno value on failure; on
                                         success the new program runs instead */
    }
         |
         v
Back to wrapper → back to user mode → return value to application

System Call Overhead – Benchmark Example

Even for simple system calls, a significant amount of work must be performed during execution. Consider a benchmark using getppid(), which simply returns the process ID of the calling process’s parent.

Operation                         10 Million Calls   Per Call
getppid() system call             ~2.2 seconds       ~0.3 microseconds
C function returning an integer   ~0.11 seconds      ~0.011 microseconds

Key Takeaway: A system call is approximately 20 times slower than a plain C function call. This overhead is small but appreciable, and most system calls have significantly more overhead than getppid(). This benchmark was performed on an x86-32 system running Linux 2.6.25.

Tracing System Calls with strace

The strace command can be used to trace the system calls made by a running program. This is invaluable for debugging or for understanding what a program is doing under the hood.

# Trace all system calls made by a program
strace ./myprogram

# Trace a specific system call (e.g., read)
strace -e trace=read ./myprogram

# Attach to a running process by PID
strace -p 1234

# Save output to a file
strace -o strace_output.txt ./myprogram

Note: From a C program's perspective, calling the C library wrapper function for a system call is synonymous with invoking the system call itself. Throughout system programming documentation, “invoking the system call xyz()” means “calling the wrapper function that invokes the system call xyz().”

Summary of Key Points
  • A system call is a controlled entry point into the kernel, allowing processes to request OS services.
  • System calls cause a processor mode switch from user mode to kernel mode and back.
  • The Linux system call set is fixed; each call has a unique numeric identifier.
  • Arguments pass from user space to kernel space via CPU registers.
  • The int 0x80 trap instruction (or faster sysenter) triggers the kernel’s trap handler.
  • The kernel uses a sys_call_table indexed by call number to dispatch to the correct service routine.
  • In the benchmark above, getppid() cost approximately 0.3 microseconds per call on an x86-32 system, about 20 times the cost of a plain C function call.
  • The kernel reports an error by returning a negated errno value; the wrapper stores the positive value in errno and returns -1.
