The Problem: Sparse Large Mappings
Some applications create very large memory mappings but touch only a small fraction of the mapped pages. A scientific simulation might allocate a 1 GB array but access only a few scattered elements (a sparse array).
If the kernel immediately reserved swap space for every byte of such a mapping, most of that swap space would sit unused. Linux solves this with lazy swap reservation: swap space is reserved only when a page is actually accessed.
With lazy reservation, the total virtual memory allocated by all processes can exceed physical RAM + swap. This works as long as not all processes try to access all their mapped pages simultaneously. If they do, memory becomes exhausted and the OOM killer steps in.
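To make the sparse-array point concrete, the following sketch counts how many distinct pages a set of accesses falls on, which is the amount of backing a lazily reserved mapping would actually need. The helper name distinct_pages_touched() and the idea of passing accesses as a list of byte offsets are illustrative assumptions, not part of any kernel or library API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Compare two page indices for qsort(). */
static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: given the byte offsets a program accesses within a
 * mapping, return how many distinct pages those accesses fall on.
 * Returns 0 if n or page_size is 0, or if the scratch buffer cannot be
 * allocated.
 */
size_t distinct_pages_touched(const size_t *offsets, size_t n, size_t page_size)
{
    if (n == 0 || page_size == 0)
        return 0;

    size_t *pages = malloc(n * sizeof *pages);
    if (pages == NULL)
        return 0;

    for (size_t i = 0; i < n; i++)
        pages[i] = offsets[i] / page_size;   /* page index of each access */

    qsort(pages, n, sizeof *pages, cmp_size);

    size_t count = 1;                        /* first page is always new */
    for (size_t i = 1; i < n; i++)
        if (pages[i] != pages[i - 1])
            count++;

    free(pages);
    return count;
}
```

With 4 KB pages, six accesses scattered across a 1 GB mapping that land on four distinct pages need only about 16 KB of backing, not the full 1 GB.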
You can explicitly request that the kernel not reserve swap space for a mapping by passing MAP_NORESERVE to mmap(). This is the overcommit “opt-in” at the per-mapping level.
/* Create a 1 GB sparse mapping: no swap reservation */
void *big = mmap(NULL, 1024 * 1024 * 1024,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                 -1, 0);
/* mmap() succeeds even if 1 GB of swap is unavailable */
Fork behavior: When a child process inherits a mapping across fork(), it inherits the MAP_NORESERVE setting for that mapping.
How the kernel treats a mapping, with or without MAP_NORESERVE, depends on the system-wide /proc/sys/vm/overcommit_memory setting:
| overcommit_memory | Without MAP_NORESERVE | With MAP_NORESERVE | Notes |
|---|---|---|---|
| 0 (default) | Deny obvious overcommits | Allow overcommits | Heuristic check: only obvious overcommits are refused |
| 1 | Allow overcommits | Allow overcommits | No check. Suits sparse workloads |
| 2 (strict) | Strict accounting | Strict accounting (MAP_NORESERVE is ignored) | Total committed ≤ swap + RAM * overcommit_ratio/100 |
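The strict-mode limit in the table can be worked through with a few lines of arithmetic. The sketch below is only an illustration of that formula: the function names commit_limit_kb() and request_fits() are made up for this example, and the kernel's own figure (reported as CommitLimit in /proc/meminfo) takes further details into account, so the two may not match exactly.

```c
#include <stdint.h>

/*
 * Illustrative formula from the table (all sizes in kB):
 *     limit = swap + RAM * overcommit_ratio / 100
 * This is a simplification, not the kernel's exact calculation.
 */
uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t swap_kb,
                         unsigned ratio_percent)
{
    return swap_kb + ram_kb * ratio_percent / 100;
}

/*
 * Hypothetical check: would a new request still fit under the limit,
 * given how much is already committed? Returns 1 if it fits, else 0.
 */
int request_fits(uint64_t committed_kb, uint64_t request_kb,
                 uint64_t limit_kb)
{
    if (committed_kb > limit_kb)
        return 0;
    return request_kb <= limit_kb - committed_kb;
}
```

For example, with 8 GB of RAM (8388608 kB), 2 GB of swap (2097152 kB), and the default ratio of 50, the sketch gives a limit of 6291456 kB (6 GB). Raising the ratio to 70 gives 7969177 kB.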
Read the current values:
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio    # default: 50
Set the values (as root):
echo 1 > /proc/sys/vm/overcommit_memory    # allow all overcommits
echo 2 > /proc/sys/vm/overcommit_memory    # strict mode
echo 70 > /proc/sys/vm/overcommit_ratio    # count 70% of RAM toward the strict-mode limit
What types of mappings are counted under strict overcommitting?
- Private writable mappings (file-backed or anonymous): cost = mapping size, charged to each process
- Shared anonymous mappings: cost = mapping size, charged once (shared among all processes)
What is NOT counted?
- Read-only private mappings: contents can’t be modified, so no swap is needed
- Shared file mappings: the file itself serves as backing store
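The two lists above can be condensed into a small predicate over the prot and flags arguments passed to mmap(). This is a sketch of the rule as stated in the text, not of the kernel's actual accounting code, which handles more cases. The function name counted_under_strict_overcommit() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/mman.h>

/*
 * Hypothetical helper: returns 1 if a mapping created with these
 * mmap() arguments would be counted under strict overcommitting
 * according to the simplified rule in the text, otherwise 0.
 */
int counted_under_strict_overcommit(int prot, int flags)
{
    int is_private   = (flags & MAP_PRIVATE) != 0;
    int is_shared    = (flags & MAP_SHARED) != 0;
    int is_anonymous = (flags & MAP_ANONYMOUS) != 0;
    int is_writable  = (prot & PROT_WRITE) != 0;

    /* Private writable mappings (file-backed or anonymous) are counted. */
    if (is_private && is_writable)
        return 1;

    /* Shared anonymous mappings are counted. */
    if (is_shared && is_anonymous)
        return 1;

    /* Read-only private mappings and shared file mappings are not. */
    return 0;
}
```

For instance, PROT_READ | PROT_WRITE with MAP_PRIVATE | MAP_ANONYMOUS is counted, while PROT_READ | PROT_WRITE with MAP_SHARED on a file is not, because the file provides the backing store.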
Independent of the overcommit settings, the per-process resource limit RLIMIT_AS caps the total virtual address space of a process. Any mmap(), mremap(), or brk() call that would push the process beyond this limit fails with ENOMEM.
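#include <stdio.h>
#include <sys/resource.h>
int main(void)
{
    struct rlimit lim;
    /* Check current RLIMIT_AS (RLIM_INFINITY prints as a very large number) */
    if (getrlimit(RLIMIT_AS, &lim) == -1) { perror("getrlimit"); return 1; }
    printf("RLIMIT_AS: soft=%lu, hard=%lu\n",
           (unsigned long)lim.rlim_cur,
           (unsigned long)lim.rlim_max);
    /* Set a 512 MB soft limit; the hard limit is left unchanged */
    lim.rlim_cur = 512UL * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &lim) == -1) { perror("setrlimit"); return 1; }
    return 0;
}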
Or check from the shell:
ulimit -v # virtual memory limit in KB
ulimit -v 524288 # set 512 MB limit
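The RLIMIT_AS rule described above amounts to a simple comparison, sketched here as a pure function. The name fits_within_as_limit() and its parameters are assumptions made for illustration. The kernel performs its own check internally, so this only models the outcome.

```c
#include <sys/resource.h>

/*
 * Hypothetical model of the RLIMIT_AS check: returns 1 if growing the
 * address space from current_size by request bytes stays within
 * soft_limit, otherwise 0 (the case where the call fails with ENOMEM).
 */
int fits_within_as_limit(rlim_t current_size, rlim_t request,
                         rlim_t soft_limit)
{
    if (soft_limit == RLIM_INFINITY)
        return 1;                       /* no limit configured */
    if (current_size > soft_limit)
        return 0;                       /* already over the limit */
    /* Written as a subtraction so the comparison cannot overflow. */
    return request <= soft_limit - current_size;
}
```

With a 64 MB soft limit and 16 MB already in use, a 128 MB request does not fit, while an 8 MB request does.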
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#define ARRAY_SIZE (1024L * 1024 * 256) /* 256M integers = 1 GB */
#define INTS_PER_PAGE (4096L / (long)sizeof(int)) /* assumes 4 KB pages */
#define STRIDE (1024L * INTS_PER_PAGE) /* one access per 1024 pages */
int main(void)
{
    /* Allocate a 1 GB sparse array: no swap reservation */
    int *arr = mmap(NULL, ARRAY_SIZE * sizeof(int),
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                    -1, 0);
    if (arr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("Mapped 1 GB sparse array at %p (no swap reserved)\n", (void *)arr);
    /* Touch one element in every 1024th page so the array stays sparse */
    long touched = 0;
    for (long i = 0; i < ARRAY_SIZE; i += STRIDE) {
        arr[i] = (int)i;
        touched++;
    }
    printf("Touched %ld of %ld pages (only touched pages need RAM or swap)\n",
           touched, ARRAY_SIZE / INTS_PER_PAGE);
    /* Verify a few values */
    printf("arr[0]=%d, arr[%ld]=%d\n", arr[0], STRIDE, arr[STRIDE]);
    munmap(arr, ARRAY_SIZE * sizeof(int));
    return 0;
}
#include <stdio.h>
/* Returns the overcommit mode (0, 1, or 2), or -1 on error */
int read_overcommit_setting(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (!f) { perror("open overcommit_memory"); return -1; }
    int val = -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1; /* unreadable or malformed content */
    fclose(f);
    return val;
}
/* Returns the overcommit ratio as a percentage, or -1 on error */
int read_overcommit_ratio(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_ratio", "r");
    if (!f) { perror("open overcommit_ratio"); return -1; }
    int val = -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1; /* unreadable or malformed content */
    fclose(f);
    return val;
}
int main(void)
{
    int mode = read_overcommit_setting();
    int ratio = read_overcommit_ratio();
    if (mode < 0 || ratio < 0)
        return 1;
    printf("overcommit_memory = %d\n", mode);
    printf("overcommit_ratio = %d%%\n", ratio);
    switch (mode) {
    case 0: printf("Mode: Heuristic (deny obvious overcommits)\n"); break;
    case 1: printf("Mode: Always allow overcommits\n"); break;
    case 2: printf("Mode: Strict (ratio-based accounting)\n"); break;
    default: printf("Mode: Unknown\n"); break;
    }
    return 0;
}
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <errno.h>
#include <string.h>
int main(void)
{
    /* Artificially limit the address space to 64 MB. Only the soft limit
       is changed: an unprivileged process cannot raise its hard limit. */
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) == -1) {
        perror("getrlimit");
        return 1;
    }
    lim.rlim_cur = 64 * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &lim) == -1) {
        perror("setrlimit");
        return 1;
    }
    /* Try to map 128 MB: should fail with ENOMEM */
    void *p = mmap(NULL, 128 * 1024 * 1024,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap(128MB) failed as expected: %s\n", strerror(errno));
    } else {
        printf("mmap(128MB) succeeded unexpectedly at %p\n", p);
        munmap(p, 128 * 1024 * 1024);
    }
    /* Try a smaller mapping: should succeed */
    p = mmap(NULL, 8 * 1024 * 1024,
             PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS,
             -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap(8MB) failed: %s\n", strerror(errno));
    } else {
        printf("mmap(8MB) succeeded at %p\n", p);
        munmap(p, 8 * 1024 * 1024);
    }
    return 0;
}
Q1. What is lazy swap reservation and why does Linux use it?
A: Lazy swap reservation means the kernel reserves swap space for a mapped page only when that page is first accessed (on page fault), not when mmap() is called. Linux uses it to allow sparse large mappings without wasting swap space, and to allow total virtual allocations to exceed physical RAM + swap.
Q2. What is swap overcommitting?
A: It means the total virtual memory promised to all processes exceeds the total physical RAM plus swap available. This works as long as not all processes need their full allocation simultaneously. If they do, the OOM killer terminates processes.
Q3. What does MAP_NORESERVE do?
A: MAP_NORESERVE tells the kernel not to reserve swap space at mmap() time for this mapping. Swap will only be reserved as pages are actually accessed. Under overcommit_memory=0, specifying MAP_NORESERVE allows the overcommit that would otherwise be denied. Under overcommit_memory=2, the flag is ignored and strict accounting applies.
Q4. What are the three values of overcommit_memory?
A: 0 = heuristic mode (deny obvious overcommits, but some are allowed). 1 = always allow all overcommits. 2 = strict mode: total allocations must stay within swap + RAM * overcommit_ratio/100.
Q5. What types of mappings are NOT subject to strict overcommit accounting?
A: Read-only private mappings (no swap needed since contents can’t change) and shared file mappings (the file itself is the backing store).
Q6. What is RLIMIT_AS and how is it different from overcommit limits?
A: RLIMIT_AS is a per-process resource limit on total virtual address space. Overcommit limits are system-wide and concern swap/RAM. RLIMIT_AS enforces per-process VA space limits independent of physical memory availability.
Q7. Does a child process inherit MAP_NORESERVE from its parent?
A: Yes. When a child is created via fork(), it inherits the MAP_NORESERVE setting of each mapping.
Q8. What is overcommit_ratio and what is its default?
A: overcommit_ratio is an integer percentage stored in /proc/sys/vm/overcommit_ratio. It sets the percentage of physical RAM that counts toward the allocation limit under strict mode (overcommit_memory=2). The default is 50, so the limit is swap plus 50% of RAM.
