mlock() & mlockall() – Locking Pages in RAM

Chapter 50 – Virtual Memory Operations | Topic 2 of 4

The Swap Problem

Linux uses virtual memory with demand paging. When physical RAM runs low, the kernel moves (“swaps”) pages from RAM to disk. When your program accesses a swapped-out page, the CPU raises a page fault, the kernel reads the page back from disk, and execution resumes. That round trip takes milliseconds, not nanoseconds.

For real-time applications — audio processing, robotics, financial trading engines, industrial control systems — even a single swap-caused latency spike can be catastrophic. The solution is to tell the kernel: “never swap these pages out.” That is exactly what mlock() and mlockall() do.

Without mlock() (normal process): pages in RAM (fast access) ⇄ pages on swap disk (millisecond penalty on access)

With mlock() (real-time process): locked pages stay in RAM (always fast access) and are never moved to swap

Function Signatures

#include <sys/mman.h>

/* Lock a specific region */
int mlock(const void *addr, size_t len);

/* Unlock a specific region */
int munlock(const void *addr, size_t len);

/* Lock/unlock entire address space */
int mlockall(int flags);
int munlockall(void);

/* All return: 0 on success, -1 on error */

mlock() – Lock a Specific Region

mlock(addr, len) locks the pages covering the virtual address range [addr, addr+len) into physical RAM. The kernel will not swap these pages out until they are explicitly unlocked with munlock().

Parameter	Notes
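`addr`	Does not need to be page-aligned on Linux (the kernel rounds it down to a page boundary). POSIX allows an implementation to require a page-aligned address, so portable code should align it.
`len`	The end of the range, addr+len, is rounded up to a page boundary. All pages containing any part of [addr, addr+len) are locked.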

⚠ Side effect of mlock(): Locking a page also makes it resident in RAM immediately. If the page was previously swapped out, the kernel reads it back from disk as part of the lock operation. After mlock() returns, the pages are guaranteed to be in physical memory.

Example 1: Basic mlock() and munlock()

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t len = 4 * pagesize;  /* 4 pages = 16 KB on typical system */

    /* Allocate memory */
    char *buf = mmap(NULL, len,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    printf("Allocated %zu bytes (%zu pages)\n", len, len / (size_t)pagesize);

    /* Lock the entire allocated region into RAM */
    if (mlock(buf, len) == -1) {
        perror("mlock");  /* May fail if not root or RLIMIT_MEMLOCK too small */
        exit(EXIT_FAILURE);
    }

    printf("Memory locked in RAM (will not be swapped out)\n");

    /* Use the memory: the locked pages are resident, so these writes will not fault */
    memset(buf, 0xAB, len);
    printf("Memory write complete, first byte: 0x%X\n", (unsigned char)buf[0]);

    /* Unlock when done with real-time section */
    if (munlock(buf, len) == -1) {
        perror("munlock");
        exit(EXIT_FAILURE);
    }

    printf("Memory unlocked\n");

    munmap(buf, len);
    return 0;
}

/*
 * Compile: gcc -o mlock_ex mlock_ex.c
 * Run:     ./mlock_ex
 * If mlock() fails, the request probably exceeds your RLIMIT_MEMLOCK
 * soft limit: raise the limit for your user, or run as root (sudo ./mlock_ex).
 */

mlockall() – Lock the Entire Address Space

For real-time processes, locking individual regions one by one is tedious and error-prone. mlockall() locks everything — text, data, heap, stack, shared libraries. This is the standard approach for real-time programs (e.g., POSIX RT applications, audio daemons).

#include <sys/mman.h>

int mlockall(int flags);

Flag	Meaning
`MCL_CURRENT`	Lock all pages currently mapped in the address space (text, data, heap, stack, shared libs)
`MCL_FUTURE`	Lock all pages that will be mapped in the future (including future malloc/mmap calls)
`MCL_CURRENT | MCL_FUTURE`	Lock both current and all future mappings; this is the standard choice for real-time programs

Example 2: Real-Time Process Setup (Standard Pattern)

This is a common setup sequence for a real-time Linux process: lock memory, pre-fault the stack, then pre-allocate and touch the heap. The mlock(2) manual page and the Linux real-time (PREEMPT_RT) documentation describe the same approach:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>

/*
 * Pre-fault the stack to avoid page faults during real-time section.
 * Allocate and touch 'size' bytes on the stack so all stack pages
 * are resident before we enter the real-time loop.
 */
static void prefault_stack(size_t size)
{
    volatile char stack_buf[size];  /* VLA placed on the stack */

    /*
     * Write every byte through the volatile lvalue. Unlike a memset()
     * on a cast pointer, these stores cannot be optimized away.
     */
    for (size_t i = 0; i < size; i++)
        stack_buf[i] = 0;
}

int main(void)
{
    /*
     * Step 1: Lock all CURRENT and FUTURE mappings.
     * Must be done before any real-time work begins.
     * Requires CAP_IPC_LOCK (e.g. root), or an RLIMIT_MEMLOCK large enough
     * to cover the process (can be raised via /etc/security/limits.conf).
     */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall");
        fprintf(stderr, "Hint: run as root or increase RLIMIT_MEMLOCK\n");
        exit(EXIT_FAILURE);
    }
    printf("All memory locked (swap disabled for this process)\n");

    /*
     * Step 2: Pre-fault the stack.
     * Even with MCL_FUTURE, stack growth causes page faults as new pages
     * are first accessed. Touch them all now while we can afford the latency.
     */
    prefault_stack(64 * 1024);  /* Touch 64 KB of stack */
    printf("Stack pre-faulted\n");

    /*
     * Step 3: Pre-allocate and touch heap memory you'll need.
     * malloc() + memset forces all pages to be resident NOW.
     */
    size_t buf_size = 1024 * 1024;  /* 1 MB */
    char *rt_buf = malloc(buf_size);
    if (!rt_buf) { perror("malloc"); exit(1); }
    memset(rt_buf, 0, buf_size);    /* Touch every byte = bring into RAM */
    printf("Heap pre-faulted: %zu bytes\n", buf_size);

    /*
     * NOW we are ready for real-time work.
     * All memory is resident. No page faults expected.
     */
    printf("\n=== Entering real-time section ===\n");
    /* ... your time-critical code here ... */
    printf("Real-time work complete\n");

    free(rt_buf);

    /* Unlock everything before exit (optional — happens on exit anyway) */
    munlockall();
    return 0;
}

Example 3: Locking Selective Pages (Partial Lock)

You don’t always need to lock everything. In a large application, you may want to lock only the performance-critical data structures — for example, a frequently-accessed lookup table or a ring buffer used by the real-time audio thread:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

#define TABLE_SIZE  (256 * 1024)  /* 256 KB lookup table */
#define RINGBUF_SIZE (64 * 1024)  /* 64 KB ring buffer */

int main(void)
{
    /* Critical data structures */
    char *lookup_table = malloc(TABLE_SIZE);
    char *ring_buf     = malloc(RINGBUF_SIZE);

    if (!lookup_table || !ring_buf) {
        perror("malloc");
        exit(1);
    }

    /* Initialize data */
    memset(lookup_table, 0xFF, TABLE_SIZE);
    memset(ring_buf,     0x00, RINGBUF_SIZE);

    /* Lock ONLY the critical structures, not the whole address space */
    if (mlock(lookup_table, TABLE_SIZE) == -1) {
        perror("mlock lookup_table");
        exit(1);
    }
    printf("Lookup table (%d KB) locked\n", TABLE_SIZE / 1024);

    if (mlock(ring_buf, RINGBUF_SIZE) == -1) {
        perror("mlock ring_buf");
        exit(1);
    }
    printf("Ring buffer (%d KB) locked\n", RINGBUF_SIZE / 1024);

    /* --- Real-time audio/DSP loop would run here --- */

    /* Unlock after real-time work */
    munlock(lookup_table, TABLE_SIZE);
    munlock(ring_buf, RINGBUF_SIZE);

    free(lookup_table);
    free(ring_buf);
    return 0;
}

RLIMIT_MEMLOCK – Memory Lock Limit

On Linux 2.6.9 and later, unprivileged processes can lock memory up to the RLIMIT_MEMLOCK soft resource limit. This prevents a runaway process from locking all RAM and starving other processes.

Example 4: Setting RLIMIT_MEMLOCK Programmatically

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memset() */
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>
#include <errno.h>

int main(void)
{
    struct rlimit rl;

    /* Get current limits */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1) {
        perror("getrlimit");
        exit(1);
    }
    printf("RLIMIT_MEMLOCK: soft=%lu, hard=%lu bytes\n",
           (unsigned long)rl.rlim_cur,
           (unsigned long)rl.rlim_max);

    /* Try to set soft limit to 1 MB (non-root can do this if hard limit allows) */
    rl.rlim_cur = 1024 * 1024;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) == -1) {
        perror("setrlimit");
        /* This fails if you try to raise above hard limit without root */
    } else {
        printf("Set RLIMIT_MEMLOCK soft limit to 1 MB\n");
    }

    /* Now attempt to lock 512 KB, which should succeed within the new limit */
    size_t lock_size = 512 * 1024;
    char *buf = malloc(lock_size);
    if (!buf) { perror("malloc"); exit(1); }

    /* Touch pages first to bring them in */
    memset(buf, 0, lock_size);

    if (mlock(buf, lock_size) == -1) {
        if (errno == ENOMEM)
            fprintf(stderr, "mlock failed: exceeded RLIMIT_MEMLOCK\n");
        else
            perror("mlock");
    } else {
        printf("Locked %zu KB successfully\n", lock_size / 1024);
        munlock(buf, lock_size);
    }

    free(buf);
    return 0;
}

System-wide configuration: You can also set memory lock limits in /etc/security/limits.conf:

# Allow 'audio' group to lock up to 256 MB
@audio - memlock 262144

This is the standard way to grant real-time audio applications (JACK, PipeWire) the ability to call mlockall() without being root.

Example 5: Full Demo – mlock() + mincore() Visualization

This is inspired by the TLPI memlock.c program. It allocates multiple pages, locks selected ones, then uses mincore() to show which pages are in RAM:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

/*
 * Print residency map: '*' = page in RAM, '.' = page not resident.
 * Uses mincore() to query page status.
 */
static void print_residency(char *addr, size_t len, long pagesize)
{
    size_t num_pages = (len + pagesize - 1) / pagesize;
    unsigned char *vec = malloc(num_pages);
    if (!vec) { perror("malloc vec"); return; }

    if (mincore(addr, len, vec) == -1) {
        perror("mincore");
        free(vec);
        return;
    }

    printf("%p: ", addr);
    for (size_t j = 0; j < num_pages; j++)
        printf("%c", (vec[j] & 1) ? '*' : '.');
    printf("\n");

    free(vec);
}

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int num_pages = 16;           /* Allocate 16 pages */
    int lock_step = 4;            /* Every 4 pages... */
    int lock_len  = 2;            /* ...lock 2 consecutive pages */

    size_t total = num_pages * pagesize;

    char *addr = mmap(NULL, total,
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(1); }

    printf("Allocated %d pages (%zu bytes) at %p\n",
           num_pages, total, addr);

    printf("\nBefore mlock (legend: * = resident, . = not resident):\n");
    print_residency(addr, total, pagesize);

    /* Lock pages in groups: lock 2, skip 2, lock 2, skip 2, ... */
    for (int j = 0; j + lock_len * pagesize <= (int)total;
         j += lock_step * pagesize) {
        if (mlock(addr + j, lock_len * pagesize) == -1) {
            perror("mlock");
            exit(1);
        }
    }

    printf("\nAfter mlock (pattern: lock %d, skip %d):\n", lock_len, lock_step - lock_len);
    print_residency(addr, total, pagesize);

    munmap(addr, total);
    return 0;
}

/*
 * Expected output (16 pages, lock 2 every 4); each map line is
 * printed after the mapping address:
 * Before mlock: ................
 * After mlock:  **..**..**..**..
 */

Important Locking Semantics

Behavior	Details
Lock count	Locks do not stack: a page locked several times by `mlock()` or `mlockall()` is unlocked by a single `munlock()` on that range (or by `munlockall()`)
fork()	Locked pages are not inherited by child processes — children start unlocked
exec()	All locks are released on `exec()`
munmap()	Unmapping automatically unlocks pages
Privilege	Requires `CAP_IPC_LOCK` or locked amount must be within `RLIMIT_MEMLOCK`

Interview Questions & Answers

Q1. What is the purpose of mlock() and when would you use it?

mlock() pins a region of virtual memory into physical RAM, preventing the kernel from swapping those pages out to disk. You would use it in real-time applications (audio processing, robotics, industrial control systems, financial trading engines) where a page fault that blocks the thread on disk I/O would violate timing guarantees.

Q2. What is the difference between mlock() and mlockall()?

mlock(addr, len) locks only a specific region. mlockall(flags) locks the entire address space of the process — text segment, data segment, BSS, heap, stack, and all shared library pages. mlockall(MCL_CURRENT) locks what’s mapped now; MCL_FUTURE ensures future mappings (via malloc(), mmap()) are also locked automatically.

Q3. Why do real-time programs call mlockall() at startup?

Real-time programs call mlockall(MCL_CURRENT | MCL_FUTURE) at startup to ensure that no page faults can occur during the time-critical execution path. A page fault causes the kernel to block the thread while it reads the page from disk — this can take milliseconds, which is unacceptable for real-time deadlines. By locking all memory upfront, the program eliminates this source of jitter. The same program also pre-faults the stack (touches stack pages to trigger faults now, before the RT section) and pre-allocates all heap memory it will need.

Q4. What is RLIMIT_MEMLOCK and why does it exist?

RLIMIT_MEMLOCK is a per-process resource limit that caps how much memory an unprivileged process can lock. Without this limit, a malicious or buggy process could lock all available RAM, causing all other processes to page fault constantly and potentially making the system unusable. On Linux 2.6.9+, processes within their RLIMIT_MEMLOCK limit can call mlock() without root privileges. Root or processes with CAP_IPC_LOCK can lock without restriction.

Q5. Are locked pages inherited by child processes after fork()?

No. Memory locks are not inherited across fork(). The child process starts with no locked pages even if the parent locked everything with mlockall(). If the child also needs real-time guarantees, it must call mlockall() again after forking. Locks are also released when a process calls exec() (the new program starts fresh) or when munmap() is called on a locked region.

Q6. What is “pre-faulting” and why is it needed even after mlockall()?

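mlockall(MCL_FUTURE) ensures that pages are locked as soon as they are mapped. However, pages that do not exist yet are still created on demand: the stack grows only when the program first touches a new stack page, and that first access causes a minor page fault (the page is allocated and zeroed). A malloc() call inside the RT section can likewise force the allocator to request fresh memory from the kernel, which costs a system call plus the work of populating and locking the new pages. In a real-time program, you want to eliminate even these delays during the RT section. Pre-faulting means deliberately touching all the stack and heap memory you will use (for example with memset()) before entering the real-time loop, so all faults happen during initialization rather than during time-critical execution.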

Q7. What happens if mlock() is called multiple times on the same page?

No lock count is kept: on Linux, memory locks do not stack. A page that has been locked several times by mlock() or mlockall() is unlocked by a single munlock() call covering that page, or by munlockall(). The one exception is a page that is mapped at several addresses or by several processes; it stays locked in RAM as long as at least one of those mappings still holds a lock.

Topic Summary

mlock(addr, len) locks a specific region; munlock() releases it.
mlockall(MCL_CURRENT | MCL_FUTURE) locks the entire process address space.
munlockall() unlocks everything.
Locking prevents swapping → eliminates page-fault latency in real-time code.
Requires CAP_IPC_LOCK or locked size within RLIMIT_MEMLOCK.
Locks are NOT inherited by child processes.
Always pre-fault pages after locking to avoid minor page faults in RT section.

embeddedpathashala.com