MAP_NORESERVE & Swap Space Overcommitting
Chapter 49 | Linux Memory Mappings | EmbeddedPathashala
💾 Swap Reservation
📊 overcommit_memory
⚠️ OOM Risk

The Problem: Sparse Large Mappings

Some applications create very large memory mappings but only touch a small fraction of the mapped pages. A scientific simulation might allocate a 1 GB array but only access a few scattered elements. Such an array is called a sparse array.

If the kernel immediately reserved swap space for every byte of such a mapping, most of that swap space would sit unused. Linux solves this with lazy swap reservation: swap space is reserved only when a page is actually accessed.

Key Concepts

MAP_NORESERVE, Lazy Swap Reservation, Swap Overcommitting, overcommit_memory, overcommit_ratio, RLIMIT_AS, Sparse Array, OOM Killer

What is Lazy Swap Reservation?

Eager Reservation (Traditional)
R R R R R R

R = reserved. All pages are reserved on mmap(), which wastes swap on pages that are never used.

Lazy Reservation (Linux Default)
Only the pages that have been accessed are marked.

✓ = accessed (swap reserved), – = not yet accessed (no swap cost)

With lazy reservation, the total virtual memory allocated by all processes can exceed physical RAM + swap. This works as long as not all processes try to access all their mapped pages simultaneously. If they do, memory becomes exhausted and the OOM killer steps in.

The MAP_NORESERVE Flag

You can explicitly request that the kernel not reserve swap space for a mapping by passing MAP_NORESERVE to mmap(). This is the overcommit “opt-in” at the per-mapping level.

/* Create a 1 GB sparse mapping with no swap reservation */
void *big = mmap(NULL, 1024 * 1024 * 1024,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                 -1, 0);
/* mmap() succeeds even if 1GB swap is unavailable */

Without MAP_NORESERVE, whether the kernel reserves swap depends on the system-wide /proc/sys/vm/overcommit_memory setting.

Fork behavior: When a child process inherits a mapping across fork(), it inherits the MAP_NORESERVE setting for that mapping.

overcommit_memory Values: Full Reference

Value        | Without MAP_NORESERVE             | With MAP_NORESERVE                | Notes
0 (default)  | Deny obvious overcommits          | Allow overcommits                 | Heuristic check: small overcommits pass
1            | Always allow overcommits          | Always allow overcommits          | No check. Best for sparse workloads
2 (strict)   | Strict accounting (ALL mappings)  | Strict accounting (ALL mappings)  | Total ≤ swap + RAM * overcommit_ratio/100

Read the current value:

cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio    # default: 50

Set the value (as root):

echo 1 > /proc/sys/vm/overcommit_memory    # always allow overcommits
echo 2 > /proc/sys/vm/overcommit_memory    # strict mode
echo 70 > /proc/sys/vm/overcommit_ratio    # count 70% of RAM toward the strict-mode limit

Strict Overcommitting Formula (value = 2)

Maximum Allowed Virtual Memory Commit
Limit = [Swap Size] + [RAM Size] × (overcommit_ratio / 100)
Example
RAM = 8 GB, Swap = 4 GB, ratio = 50
Limit = 4 GB + 8 GB × 0.5 = 8 GB

What types of mappings are counted under strict overcommitting?

  • Private writable mappings (file-backed or anonymous): cost = mapping size, charged per process
  • Shared anonymous mappings: cost = mapping size, charged once and shared among all processes

What is NOT counted?

  • Read-only private mappings โ€” contents can’t be modified, no swap needed
  • Shared file mappings โ€” the file itself serves as backing store

RLIMIT_AS: Per-Process Address Space Limit

Independent of overcommit settings, a per-process resource limit RLIMIT_AS caps the total virtual address space of a process. Any mmap() or brk() that would push the process beyond this limit returns ENOMEM.

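#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* Check current RLIMIT_AS */
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) == -1) {
        perror("getrlimit");
        return 1;
    }
    printf("RLIMIT_AS: soft=%llu, hard=%llu\n",
           (unsigned long long)lim.rlim_cur,
           (unsigned long long)lim.rlim_max);

    /* Set 512 MB soft limit (hard limit left unchanged) */
    lim.rlim_cur = 512UL * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &lim) == -1) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}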

Or check from the shell:

ulimit -v          # virtual memory limit in KB
ulimit -v 524288   # set 512 MB limit

Example 1: Large Sparse Array with MAP_NORESERVE
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

#define ARRAY_SIZE  (1024L * 1024 * 256)   /* 256M integers = 1GB */
/* One access every 1024 pages (assumes 4 KB pages), so the array stays sparse */
#define STRIDE      (1024L * 4096 / (long)sizeof(int))

int main(void)
{
    /* Allocate 1 GB sparse array with no swap reservation */
    int *arr = mmap(NULL, ARRAY_SIZE * sizeof(int),
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                    -1, 0);
    if (arr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("Mapped 1GB sparse array at %p (no swap reserved)\n", (void *)arr);

    /* Touch one element in every 1024th page; all other pages stay untouched */
    long touched = 0;
    for (long i = 0; i < ARRAY_SIZE; i += STRIDE) {
        arr[i] = (int)i;
        touched++;
    }
    printf("Touched %ld pages (only those pages consume RAM or swap)\n", touched);

    /* Verify a few values */
    printf("arr[0]=%d, arr[%ld]=%d\n",
           arr[0], (long)STRIDE, arr[STRIDE]);

    munmap(arr, ARRAY_SIZE * sizeof(int));
    return 0;
}
Example 2: Reading overcommit_memory at Runtime
#include <stdio.h>
#include <stdlib.h>

int read_overcommit_setting(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (!f) { perror("open overcommit_memory"); return -1; }
    int val = -1;
    if (fscanf(f, "%d", &val) != 1) val = -1;   /* report a parse failure as -1 */
    fclose(f);
    return val;
}

int read_overcommit_ratio(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_ratio", "r");
    if (!f) { perror("open overcommit_ratio"); return -1; }
    int val = -1;
    if (fscanf(f, "%d", &val) != 1) val = -1;   /* report a parse failure as -1 */
    fclose(f);
    return val;
}

int main(void)
{
    int mode  = read_overcommit_setting();
    int ratio = read_overcommit_ratio();

    printf("overcommit_memory = %d\n", mode);
    printf("overcommit_ratio  = %d%%\n", ratio);

    switch (mode) {
        case 0: printf("Mode: Heuristic (deny obvious overcommits)\n"); break;
        case 1: printf("Mode: Always allow overcommits\n"); break;
        case 2: printf("Mode: Strict (ratio-based accounting)\n"); break;
        default: printf("Mode: Unknown\n"); break;
    }
    return 0;
}
Example 3: Demonstrating mmap() Failure Under an RLIMIT_AS Limit
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <errno.h>
#include <string.h>

int main(void)
{
    /* Artificially limit address space to 64MB, keeping the current hard limit */
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) == -1) {
        perror("getrlimit");
        return 1;
    }
    lim.rlim_cur = 64 * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &lim) == -1) {
        perror("setrlimit");
        return 1;
    }

    /* Try to map 128MB: should fail with ENOMEM */
    void *p = mmap(NULL, 128 * 1024 * 1024,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap(128MB) failed as expected: %s\n", strerror(errno));
    } else {
        printf("mmap(128MB) succeeded unexpectedly at %p\n", p);
        munmap(p, 128 * 1024 * 1024);
    }

    /* Try a smaller mapping: should succeed */
    p = mmap(NULL, 8 * 1024 * 1024,
             PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS,
             -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap(8MB) failed: %s\n", strerror(errno));
    } else {
        printf("mmap(8MB) succeeded at %p\n", p);
        munmap(p, 8 * 1024 * 1024);
    }
    return 0;
}

Interview Questions & Answers

Q1. What is lazy swap reservation and why does Linux use it?

A: Lazy swap reservation means the kernel reserves swap space for a mapped page only when that page is first accessed (on page fault), not when mmap() is called. Linux uses it to allow sparse large mappings without wasting swap space, and to allow total virtual allocations to exceed physical RAM + swap.

Q2. What is swap overcommitting?

A: It means the total virtual memory promised to all processes exceeds the total physical RAM plus swap available. This works as long as not all processes need their full allocation simultaneously. If they do, the OOM killer terminates processes.

Q3. What does MAP_NORESERVE do?

A: MAP_NORESERVE tells the kernel not to reserve swap space at mmap() time for this mapping. Swap will only be reserved as pages are actually accessed. Under overcommit_memory=0, specifying MAP_NORESERVE allows the overcommit that would otherwise be denied.

Q4. What are the three values of overcommit_memory?

A: 0 = heuristic mode (deny obvious overcommits, but some are allowed). 1 = always allow all overcommits. 2 = strict mode: total allocations must stay within [swap + RAM * ratio/100].

Q5. What types of mappings are NOT subject to strict overcommit accounting?

A: Read-only private mappings (no swap needed since contents can’t change) and shared file mappings (the file itself is the backing store).

Q6. What is RLIMIT_AS and how is it different from overcommit limits?

A: RLIMIT_AS is a per-process resource limit on total virtual address space. Overcommit limits are system-wide and concern swap/RAM. RLIMIT_AS enforces per-process VA space limits independent of physical memory availability.

Q7. Does a child process inherit MAP_NORESERVE from its parent?

A: Yes. When a child is created via fork(), it inherits the MAP_NORESERVE setting of each mapping.

Q8. What is overcommit_ratio and what is its default?

A: overcommit_ratio is an integer percentage stored in /proc/sys/vm/overcommit_ratio. Under strict mode (overcommit_memory=2) it sets what percentage of physical RAM is added to the swap size to form the commit limit. The default is 50, so the limit is swap plus half of RAM.
