Memory Locking: mlock() & mlockall()

 

Chapter 50.2 | TLPI Linux System Programming Series

Why Lock Memory?

Linux uses demand paging — physical RAM is a limited resource, so the kernel can move (“page out”) virtual memory pages to the swap area on disk when RAM is scarce. When a process accesses a page that has been swapped out, a page fault occurs and the kernel must read the page back from disk — this can take milliseconds.

There are two situations where you cannot afford page faults:

  • Performance-critical real-time applications — audio processing, trading systems, robotics control loops. A sudden page fault can cause a noticeable glitch or missed deadline.
  • Security-sensitive applications — password managers, crypto key storage. If a page containing a private key or password is swapped to disk, the data persists on the swap device even after the process exits and the page is “freed”.

Memory locking solves both problems by guaranteeing that specified pages always stay in physical RAM.

Normal Paging vs. Locked Memory

Normal (unlocked) page:
  Virtual page → physical frame A
  ↓ (under memory pressure)
  Virtual page → swap device; the physical frame is reclaimed for another process
  ↓ (page accessed again)
  Page fault → read back from disk → slow

Locked page (mlock'd) 🔒:
  Virtual page → physical frame A
  The page stays in RAM; the kernel cannot page it out
  No page faults on access, so no disk latency

RLIMIT_MEMLOCK — The Memory Locking Limit

Memory locking is a privileged operation because a process that locks too much RAM can starve other processes (and the kernel itself) of physical memory. Linux controls this with the RLIMIT_MEMLOCK resource limit.

Condition | Kernel < 2.6.9 | Kernel ≥ 2.6.9 (current)
Privileged process (CAP_IPC_LOCK) | Limited by RLIMIT_MEMLOCK soft limit | No limit (RLIMIT_MEMLOCK ignored)
Unprivileged process | Cannot lock memory at all | Can lock up to RLIMIT_MEMLOCK soft limit

Default soft and hard limit: 8 pages = 32,768 bytes on x86-32.

The RLIMIT_MEMLOCK limit acts as two separate limits depending on context:

Operation | Limit Type
mlock(), mlockall(), mmap(MAP_LOCKED) | Per-process limit on lockable virtual address space bytes
shmctl(SHM_LOCK) | Per-real-user-ID limit on lockable System V shared memory bytes

#include <stdio.h>
#include <sys/resource.h>

/* Check and print the current RLIMIT_MEMLOCK values */
int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1) {
        perror("getrlimit");
        return 1;
    }

    printf("RLIMIT_MEMLOCK soft limit: ");
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("unlimited\n");
    else
        printf("%lu bytes (%lu KB)\n",
               (unsigned long)rl.rlim_cur,
               (unsigned long)rl.rlim_cur / 1024);

    printf("RLIMIT_MEMLOCK hard limit: ");
    if (rl.rlim_max == RLIM_INFINITY)
        printf("unlimited\n");
    else
        printf("%lu bytes (%lu KB)\n",
               (unsigned long)rl.rlim_max,
               (unsigned long)rl.rlim_max / 1024);

    return 0;
}

Run ulimit -l in a shell to see the current soft RLIMIT_MEMLOCK in KB. To lock memory without a limit, run as root or grant the capability with sudo setcap cap_ipc_lock+ep ./prog.
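To compare the limit with what a process currently has locked, Linux reports the locked total in the VmLck field of /proc/self/status. The sketch below parses that field; the function name locked_kb is hypothetical, and the code assumes a Linux /proc filesystem that exposes VmLck.

```c
#include <stdio.h>

/* Return the VmLck value (kB of memory currently locked by this process)
 * from /proc/self/status, or -1 if the file or field is unavailable. */
long locked_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        /* Whitespace in the format matches the tab/space padding */
        if (sscanf(line, "VmLck: %ld kB", &kb) == 1)
            break;
    }

    fclose(f);
    return kb;
}
```

Printing locked_kb() before and after a successful mlock() call is a quick way to confirm that the lock was applied and to see how much of the limit it consumed.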

mlock() and munlock() — Lock/Unlock a Region

#include <sys/mman.h>

/* Lock pages in the range [addr, addr+length) into RAM */
int mlock(const void *addr, size_t length);

/* Unlock (allow paging again) for the range */
int munlock(const void *addr, size_t length);

/* Both return 0 on success, -1 on error */

Rules:

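  • On Linux the range does NOT need to be page-aligned: the kernel rounds addr down to a page boundary and rounds the end of the range up. Portable code should still pass a page-aligned addr, because SUSv3 allows an implementation to require it.
  • Locking applies to complete pages. Locking 1 byte locks the entire page containing it.
  • Memory locks do not nest. Calling mlock() several times on the same or overlapping ranges establishes a single lock, and one munlock() call removes it.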
  • Memory locks are not inherited by child processes across fork().
  • Memory locks are automatically released when the process exits or calls exec().
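The page-rounding rule can be made concrete with a little arithmetic. The sketch below is illustrative only; pages_locked is a hypothetical helper that mirrors the rounding described in the rules above and does not call the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of whole pages an mlock(addr, len) request covers when the start
 * is rounded down and the end is rounded up to page boundaries.
 * `page` must be a power of two, as page sizes are. */
size_t pages_locked(uintptr_t addr, size_t len, size_t page)
{
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(page - 1);
    uintptr_t first = addr & mask;               /* page holding first byte */
    uintptr_t last  = (addr + len - 1) & mask;   /* page holding last byte  */

    return (size_t)((last - first) / page) + 1;
}
```

With 4096-byte pages, pages_locked(0x1fff, 2, 4096) is 2 because the two bytes straddle a page boundary, and a 64 KB buffer that starts 16 bytes into a page covers 17 pages rather than 16. The second case matters when the RLIMIT_MEMLOCK soft limit is exactly 64 KB.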

Example 1 — Lock a Specific Buffer with mlock()

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (64 * 1024)   /* 64 KB */

int main(void)
{
    char *buf;

    /* Allocate buffer on heap */
    buf = malloc(BUF_SIZE);
    if (!buf) { perror("malloc"); return 1; }

    /* Lock it into physical RAM; on success every page in the range is resident */
    if (mlock(buf, BUF_SIZE) == -1) {
        perror("mlock");
        /* Common causes:
         * ENOMEM: request would exceed the RLIMIT_MEMLOCK soft limit
         * EPERM:  soft limit is 0 and the process lacks CAP_IPC_LOCK
         * EAGAIN: some or all of the range could not be locked
         */
        free(buf);
        return 1;
    }
    printf("Buffer locked into RAM: %p (%d KB)\n",
           (void *)buf, BUF_SIZE / 1024);

    /* Use the buffer — guaranteed no page faults */
    memset(buf, 0x55, BUF_SIZE);
    printf("Buffer filled — no page faults occurred.\n");

    /* Unlock before freeing (good practice) */
    if (munlock(buf, BUF_SIZE) == -1)
        perror("munlock");

    free(buf);
    printf("Buffer unlocked and freed.\n");
    return 0;
}

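Note: In an unprivileged shell, mlock() may fail because the default RLIMIT_MEMLOCK is small (historically 32 KB, and 64 KB on many systems). A malloc()'d buffer is also rarely page-aligned, so with 4 KB pages a 64 KB request can span 17 pages and exceed a 64 KB limit. Run as root, or raise the limit with ulimit -l unlimited (in a privileged shell).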

Example 2 — Secure Password Buffer (Security Use Case)

This pattern is used by password managers and cryptographic tools (such as GnuPG) to keep sensitive data in locked memory so that it is not written to the swap device.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a secure buffer:
 * 1. Use mmap (not malloc) so we control the exact pages
 * 2. Lock the pages so they never go to swap
 * 3. Zero memory before freeing (key hygiene)
 */
typedef struct {
    void  *data;
    size_t size;
} SecureBuf;

SecureBuf secure_alloc(size_t size)
{
    SecureBuf sb = {NULL, 0};
    long page_size = sysconf(_SC_PAGESIZE);
    /* Round up to page boundary */
    size_t alloc_size = ((size + page_size - 1) / page_size) * page_size;

    sb.data = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (sb.data == MAP_FAILED) { perror("mmap"); sb.data = NULL; return sb; }

    if (mlock(sb.data, alloc_size) == -1) {
        perror("mlock — running as root may be required");
        munmap(sb.data, alloc_size);
        sb.data = NULL;
        return sb;
    }

    sb.size = alloc_size;
    printf("[secure_alloc] %zu bytes locked in RAM at %p\n",
           alloc_size, sb.data);
    return sb;
}

void secure_free(SecureBuf *sb)
{
    if (!sb || !sb->data) return;
    /* Zero memory BEFORE unlocking — prevents data leak to disk */
    explicit_bzero(sb->data, sb->size);   /* compiler won't optimize away */
    munlock(sb->data, sb->size);
    munmap(sb->data, sb->size);
    printf("[secure_free] Memory zeroed, unlocked, and unmapped.\n");
    sb->data = NULL;
    sb->size = 0;
}

int main(void)
{
    SecureBuf key_buf = secure_alloc(256);
    if (!key_buf.data) return 1;

    /* Store secret key — safely locked, will never swap */
    char *key = (char *)key_buf.data;
    snprintf(key, 256, "super_secret_private_key_12345");
    printf("Key stored securely: [hidden]\n");

    /* ... use key for crypto operations ... */

    /* Zero and free — key never touches disk */
    secure_free(&key_buf);
    return 0;
}

mlockall() and munlockall() — Lock the Entire Process

#include <sys/mman.h>

/* Lock ALL current (and optionally future) pages of this process */
int mlockall(int flags);

/* Unlock all locked pages of this process */
int munlockall(void);

/* Returns 0 on success, -1 on error */

The flags argument is a bitmask:

Flag | Meaning
MCL_CURRENT | Lock all pages that are currently mapped into the process’s virtual address space. This includes the stack, heap, code, data, shared libraries, and any mmap’d regions.
MCL_FUTURE | Lock all pages that are mapped in the future (heap growth from malloc(), new mmap() regions, stack growth, etc.). Each page is locked as soon as it is mapped.
MCL_CURRENT|MCL_FUTURE | Lock everything now AND everything mapped later. Used by real-time processes that want to avoid page faults throughout their lifetime.

Example 3 — mlockall() for Real-Time Process

A real-time audio or control application that must process data every few milliseconds cannot afford even a single page fault. The standard pattern:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sched.h>
#include <unistd.h>

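/*
 * Pre-fault the stack to avoid page faults during RT operation.
 * Stack grows on demand; we need to touch all pages we'll use.
 * Keep this well below RLIMIT_STACK (commonly 8 MB), otherwise
 * the array itself overflows the stack.
 */
#define STACK_PREFAULT_SIZE (512 * 1024)  /* 512 KB */

static void prefault_stack(void)
{
    volatile char buf[STACK_PREFAULT_SIZE];
    long page_size = sysconf(_SC_PAGESIZE);

    /* One volatile write per page is enough to fault it in */
    for (size_t i = 0; i < sizeof(buf); i += (size_t)page_size)
        buf[i] = 0;
}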

int rt_init(void)
{
    /* Step 1: Lock ALL current pages into RAM */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall — need root or CAP_IPC_LOCK");
        return -1;
    }
    printf("All process memory locked into RAM.\n");

    /* Step 2: Pre-fault stack pages (stack grows lazily by default) */
    prefault_stack();
    printf("Stack pages pre-faulted.\n");

    /* Step 3 (optional): Set real-time scheduling */
    struct sched_param sp;
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler (ignoring — not critical for demo)");
    } else {
        printf("SCHED_FIFO priority %d set.\n", sp.sched_priority);
    }

    return 0;
}

void rt_cleanup(void)
{
    munlockall();
    printf("All memory unlocked.\n");
}

/* Simulated real-time processing loop */
void rt_process_loop(int iterations)
{
    volatile long long x = 0;   /* wide enough: an int sum would overflow */
    for (int i = 0; i < iterations; i++) {
        /* Simulate audio/control processing */
        x += (long long)i * 3;
        /* Memory touched here is already resident */
    }
    printf("RT loop complete (%d iterations, result=%lld)\n",
           iterations, x);
}

int main(void)
{
    if (rt_init() != 0) {
        fprintf(stderr, "RT init failed. Try: sudo ./prog\n");
        return 1;
    }

    rt_process_loop(1000000);
    rt_cleanup();
    return 0;
}

Example 4 — MAP_LOCKED Flag with mmap()

Instead of calling mmap() first and then mlock(), you can lock the region at creation time using the MAP_LOCKED flag.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

#define SIZE (256 * 1024)  /* 256 KB */

int main(void)
{
    /*
     * MAP_LOCKED: the kernel attempts to populate and lock the pages
     * as part of the mmap() call itself.
     * mmap() fails if the request exceeds RLIMIT_MEMLOCK, but it can
     * still succeed when not every page could be populated, whereas
     * mlock() reports that case as an error.
     * Use mincore() afterwards to verify that pages are resident.
     */
    char *mem = mmap(NULL, SIZE,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED,
                     -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("mmap with MAP_LOCKED succeeded: %p\n", (void *)mem);
    printf("The kernel attempted to lock the pages at map time.\n");

    /* Verify using mincore() — see if pages are in RAM */
    long   page_size = sysconf(_SC_PAGESIZE);
    size_t num_pages = SIZE / page_size;
    unsigned char *vec = malloc(num_pages);
    if (!vec) { perror("malloc"); munmap(mem, SIZE); return 1; }

    if (mincore(mem, SIZE, vec) == -1) {
        perror("mincore");
    } else {
        int resident = 0;
        for (size_t i = 0; i < num_pages; i++)
            if (vec[i] & 1) resident++;
        printf("Resident pages: %d / %zu (%.0f%%)\n",
               resident, num_pages,
               100.0 * resident / num_pages);
    }
    free(vec);

    /* Touch the memory — now definitely in RAM */
    memset(mem, 0xCC, SIZE);

    munmap(mem, SIZE);
    return 0;
}

Example 5 — Raise RLIMIT_MEMLOCK for Unprivileged Process

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>

#define LOCK_BYTES (256 * 1024)  /* Want to lock 256 KB */

int main(void)
{
    struct rlimit rl;

    /* Check current limit */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1) {
        perror("getrlimit"); return 1;
    }
    printf("Current RLIMIT_MEMLOCK: soft=%lu, hard=%lu\n",
           (unsigned long)rl.rlim_cur,
           (unsigned long)rl.rlim_max);

    /* Try to raise soft limit up to hard limit */
    if (rl.rlim_cur < LOCK_BYTES) {
        if (rl.rlim_max == RLIM_INFINITY || rl.rlim_max >= LOCK_BYTES) {
            rl.rlim_cur = LOCK_BYTES;
            if (setrlimit(RLIMIT_MEMLOCK, &rl) == 0)
                printf("Raised soft limit to %d bytes\n", LOCK_BYTES);
            else
                perror("setrlimit");
        } else {
            printf("Hard limit too low (%lu). Need root to raise.\n",
                   (unsigned long)rl.rlim_max);
            return 1;
        }
    }

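    /* Now try to lock memory. Use mmap() so the buffer is page-aligned:
     * a malloc()'d buffer can straddle one extra page and push the
     * locked total just past the limit set above. */
    char *buf = mmap(NULL, LOCK_BYTES, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    if (mlock(buf, LOCK_BYTES) == 0)
        printf("Successfully locked %d KB\n", LOCK_BYTES / 1024);
    else
        perror("mlock still failed");

    munlock(buf, LOCK_BYTES);
    munmap(buf, LOCK_BYTES);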
    return 0;
}

mlock() vs. mlockall() — Quick Comparison

Feature | mlock() | mlockall()
Scope | Specific address range | Entire process address space
Future allocations | Not locked (must call mlock again) | Locked automatically (with MCL_FUTURE)
Use case | Locking specific sensitive buffers (passwords, keys) | Real-time processes needing zero page faults
Undo | munlock() | munlockall()
RAM requirement | Only for the locked range | All process pages must fit in RAM
fork() inheritance | Memory locks are NOT inherited by children (applies to both)

⚠ Important: Memory Locks and Suspend Mode
On laptops and many desktop systems, when the system suspends to disk (hibernate), the entire contents of RAM — including locked pages — are saved to disk regardless of memory locks. Memory locks only prevent normal paging/swapping, not hibernation. If your security model requires truly preventing data from reaching disk, you must also disable hibernation on the system.

Key Points to Remember

  • Memory locking keeps pages in RAM and prevents page faults — critical for real-time and security
  • mlock() locks a specific range; mlockall(MCL_CURRENT|MCL_FUTURE) locks everything
  • Default RLIMIT_MEMLOCK was historically 8 pages (32 KB on x86-32); many systems configure a larger value, so check it with ulimit -l
  • Since Linux 2.6.9, privileged processes (CAP_IPC_LOCK) have no lock limit
  • Locks apply to full pages — locking 1 byte locks the entire containing page
  • Memory locks are not inherited across fork()
  • Locks are automatically released on munmap(), process exit, or exec()
  • Always zero sensitive buffers before unlocking/freeing (use explicit_bzero())
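  • MAP_LOCKED in mmap() flags locks the region at creation time, but mmap() can succeed without populating every page; use mmap() followed by mlock() when a locking failure must be detected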
  • Memory locking does NOT protect against hibernate (suspend-to-disk)

Interview Questions & Answers

Q1. What is the purpose of memory locking in Linux?

Memory locking prevents specific virtual memory pages from being paged out to the swap device. It serves two purposes: (1) Performance — locked pages are always in RAM, so accesses never cause page faults. Real-time applications (audio, robotics, trading) use this for predictable latency. (2) Security — sensitive data (passwords, cryptographic keys) in locked pages is never written to disk. Data in swap space persists even after a process exits and can potentially be recovered by an attacker.

Q2. What is RLIMIT_MEMLOCK and why does it exist?

RLIMIT_MEMLOCK is a per-process resource limit that caps the total bytes of virtual address space a process can lock into RAM. It exists because an unrestricted lock could allow one process to consume all physical RAM with locked pages, starving other processes and the kernel of memory. The limit protects system stability. Since Linux 2.6.9, the limit is ignored for privileged processes (those with CAP_IPC_LOCK), while unprivileged processes can only lock up to the soft limit (default 32 KB on x86-32).

Q3. What is the difference between mlock() and mlockall()?

mlock(addr, length) locks only the pages in the specified address range. It is used when you want to lock specific sensitive buffers (like a password buffer) while leaving the rest of the process pageable. mlockall(flags) locks all pages in the process’s virtual address space. With MCL_CURRENT, it locks everything currently mapped. With MCL_FUTURE, it also locks any pages mapped in the future (new malloc(), stack growth, etc.). mlockall is used by real-time processes that need guaranteed zero page faults throughout execution.

Q4. Are memory locks inherited by child processes after fork()?

No. Memory locks are not inherited across fork(). The child process starts with no memory locks, even if the parent had locked all its pages. Also, memory locks are automatically released when a process calls exec() or terminates. If a child needs locked memory, it must call mlock() or mlockall() independently.

Q5. What is the MCL_FUTURE flag in mlockall() and what risk does it carry?

MCL_FUTURE tells the kernel to automatically lock any new pages that are mapped into the process address space in the future, including heap growth from malloc(), stack growth, and new mmap() regions. The risk is that once the process would exceed its RLIMIT_MEMLOCK soft limit, or the system runs out of RAM to back the locked pages, later mmap(), sbrk(), or malloc() calls fail, and stack growth that cannot be locked can kill the process with SIGSEGV. A real-time process using MCL_FUTURE should pre-allocate all the memory it will ever need at startup to avoid this failure mode at runtime.

Q6. Why should stack pages be pre-faulted in a real-time process even after mlockall()?

MCL_CURRENT locks pages that are already mapped. Stack pages are allocated lazily — a new stack page is not mapped until the stack actually grows into it. So at the time of mlockall(MCL_CURRENT), future stack frames are not yet mapped and therefore not locked. With MCL_FUTURE set, they will be locked when first accessed — but that first access still causes a minor page fault (a stack extension fault). To truly eliminate all page faults in the RT section, you must pre-fault the stack by calling a function that uses a large local array (touching every page of expected stack depth) before entering the RT loop.

Q7. How does the RLIMIT_MEMLOCK limit differ between mlock() and shmctl(SHM_LOCK)?

For mlock(), mlockall(), and mmap(MAP_LOCKED), RLIMIT_MEMLOCK is a per-process limit on how many bytes of that process’s virtual address space can be locked. For shmctl(SHM_LOCK) (locking System V shared memory segments), RLIMIT_MEMLOCK is a per-user limit — it caps the total bytes of System V shared memory locked by all processes sharing the same real user ID.
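For comparison with the mlock() examples, locking a System V segment looks roughly like the sketch below. It assumes Linux (SHM_LOCK and SHM_UNLOCK are Linux-specific shmctl() commands); the function name try_shm_lock is hypothetical, and the segment is created only for the demonstration and removed again.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private System V segment of `size` bytes, try to lock it,
 * then remove it. Returns 1 if SHM_LOCK succeeded, 0 if it was refused
 * (for example by the per-user RLIMIT_MEMLOCK accounting), and -1 if
 * the segment could not be created. */
int try_shm_lock(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int locked = (shmctl(id, SHM_LOCK, NULL) == 0);
    if (locked)
        shmctl(id, SHM_UNLOCK, NULL);

    shmctl(id, IPC_RMID, NULL);     /* mark the segment for removal */
    return locked;
}
```

Unlike mlock(), the lock here is a property of the segment, not of one process's mapping, which is why the kernel charges it against the user and not against a single process.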

Q8. Does memory locking guarantee data never reaches disk?

Not completely. mlock() prevents normal demand paging (swapping), so the locked pages won’t be written to the swap partition during normal operation. However, if the system hibernates (suspend-to-disk / ACPI S4 state), the entire contents of RAM — including locked pages — are saved to the disk image, bypassing memory locks. For absolute security, hibernation must be disabled at the OS level in addition to locking memory.

Q9. What is CAP_IPC_LOCK and how does it relate to memory locking?

CAP_IPC_LOCK is a Linux capability (a fine-grained privilege) that allows a process to lock memory beyond the RLIMIT_MEMLOCK limit. Processes with this capability can call mlock()/mlockall() without being restricted by RLIMIT_MEMLOCK (since Linux 2.6.9, the limit is completely ignored for such processes). A process can be granted this capability without running as full root using: sudo setcap cap_ipc_lock+ep ./myprogram. This is safer than running the entire program as root.
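A program can check at startup whether it actually holds the capability. The sketch below reads the effective capability mask from the CapEff line of /proc/self/status and tests bit 14, which is the value of CAP_IPC_LOCK in <linux/capability.h>. The helper names are hypothetical, and a program that links against libcap could use that library instead.

```c
#include <stdio.h>

#define CAP_IPC_LOCK_BIT 14   /* CAP_IPC_LOCK in <linux/capability.h> */

/* Pure helper: is capability number `cap` set in `mask`? */
int cap_mask_has(unsigned long long mask, int cap)
{
    return (int)((mask >> cap) & 1ULL);
}

/* Returns 1 if the effective set contains CAP_IPC_LOCK, 0 if it does not,
 * and -1 if /proc/self/status could not be read or parsed. */
int have_cap_ipc_lock(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    unsigned long long mask;
    int result = -1;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        /* CapEff is printed as a hexadecimal bitmask */
        if (sscanf(line, "CapEff: %llx", &mask) == 1) {
            result = cap_mask_has(mask, CAP_IPC_LOCK_BIT);
            break;
        }
    }

    fclose(f);
    return result;
}
```

If have_cap_ipc_lock() returns 0, the process is subject to the RLIMIT_MEMLOCK soft limit and should size its locked regions accordingly or report a clear configuration error.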

Q10. What is explicit_bzero() and why is it preferred over memset() for zeroing sensitive buffers before unlock?

explicit_bzero(ptr, size) zeroes the memory like memset(ptr, 0, size), but is guaranteed not to be optimized away by the compiler. A smart compiler may recognize that a memset() call on memory that is immediately freed or goes out of scope has no visible effect, and remove it as a dead store optimization. This would leave sensitive key/password bytes in memory. explicit_bzero() (or SecureZeroMemory() on Windows) is defined to always perform the zeroing regardless of what comes after, making it essential for secure memory hygiene.
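On platforms where explicit_bzero() is not available, a common fallback is to write the zeros through a volatile pointer so the compiler has to treat each store as observable. The sketch below shows that idiom; secure_zero is a hypothetical name, and the technique is a widely used convention, not a guarantee from the C standard, so explicit_bzero() or memset_s() remain preferable where they exist.

```c
#include <stddef.h>

/* Fallback zeroing routine for platforms without explicit_bzero().
 * Each store goes through a volatile lvalue, so the compiler may not
 * discard it as a dead store. */
void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--)
        *vp++ = 0;
}
```

It is called exactly like explicit_bzero(), for example secure_zero(key, sizeof(key)) just before the buffer is unlocked and released.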
