Advanced mmap
TLPI Ch49
This final part of the Chapter 49 series covers the more advanced operations you can perform on memory mappings: changing protection after creation (mprotect()), locking pages in RAM (mlock()), controlling placement with MAP_FIXED, using madvise() for performance hints, inspecting mappings via /proc/PID/maps, and a bridge to POSIX shared memory (Chapter 54).
mprotect() changes the memory protection of an existing mapping (or any pages in the virtual address space). You can tighten or loosen the PROT flags at any time without remapping.
#include <sys/mman.h>
int mprotect(void *addr, size_t length, int prot);
/* Returns 0 on success, -1 on error */
/* addr must be page-aligned */
Common patterns:
/*
* mprotect_demo.c - Change protection of a mapping at runtime
* Compile: gcc -o mprotect_demo mprotect_demo.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>
#include <sys/mman.h>
static sigjmp_buf jmpbuf;
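void segv_handler(int sig)
{
/* write() is async-signal-safe; printf() is not */
static const char msg[] = "Caught SIGSEGV - write blocked by mprotect!\n";
(void)sig;
write(STDERR_FILENO, msg, sizeof(msg) - 1);
siglongjmp(jmpbuf, 1);
}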
int main(void)
{
char *buf;
size_t size = 4096;
/* Allocate anonymous page, start with read+write */
buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (buf == MAP_FAILED) { perror("mmap"); exit(1); }
/* Write works fine */
strcpy(buf, "Hello mprotect");
printf("Written: %s\n", buf);
/* Make the page read-only */
if (mprotect(buf, size, PROT_READ) == -1) {
perror("mprotect");
exit(1);
}
printf("Protection changed to PROT_READ only\n");
/* Read still works */
printf("Read: %s\n", buf);
/* Attempt write - will raise SIGSEGV */
signal(SIGSEGV, segv_handler);
if (sigsetjmp(jmpbuf, 1) == 0) {
buf[0] = 'X'; /* This will trap */
}
/* Restore write permission */
if (mprotect(buf, size, PROT_READ | PROT_WRITE) == -1) {
perror("mprotect");
exit(1);
}
buf[0] = 'Y'; /* Works again */
printf("After restore write: %c%s\n", buf[0], buf + 1);
munmap(buf, size);
return 0;
}
A JIT compiler follows the same idea: map a page with PROT_READ|PROT_WRITE, write compiled machine code, then switch to PROT_READ|PROT_EXEC before executing. Never keep PROT_WRITE and PROT_EXEC set at the same time (the W^X policy, which many hardened systems enforce).
/* JIT pattern: Write then Execute */
void *code_page = mmap(NULL, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
/* ... emit machine code bytes into code_page ... */
/* Switch to executable - remove write permission (W^X) */
mprotect(code_page, 4096, PROT_READ | PROT_EXEC);
/* Cast to function pointer and call */
typedef int (*fn_t)(void);
fn_t fn = (fn_t)code_page;
int result = fn();
A guard page is a page with PROT_NONE placed adjacent to a buffer. Any overflow into the guard page causes an immediate SIGSEGV, which is far better than silent memory corruption.
| Guard Page (PROT_NONE), 4 KB | Buffer (PROT_READ\|PROT_WRITE), N × 4 KB | Guard Page (PROT_NONE), 4 KB |
|---|---|---|
| overflow -> SIGSEGV | | overflow -> SIGSEGV |
/*
* guard_page.c - Buffer sandwiched between two guard pages
* Compile: gcc -o guard_page guard_page.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#define BUFFER_PAGES 2 /* 2 pages = 8 KB buffer */
int main(void)
{
long page_size;
char *region;
char *buffer;
size_t total;
page_size = sysconf(_SC_PAGESIZE);
total = (BUFFER_PAGES + 2) * page_size; /* guard + buf + guard */
/* Allocate region with no access initially */
region = mmap(NULL, total, PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (region == MAP_FAILED) { perror("mmap"); exit(1); }
/* Make only the buffer pages accessible */
buffer = region + page_size; /* Skip first guard page */
if (mprotect(buffer, BUFFER_PAGES * page_size,
PROT_READ | PROT_WRITE) == -1) {
perror("mprotect"); exit(1);
}
printf("Guard page at: %p (PROT_NONE)\n", (void*)region);
printf("Buffer at: %p (PROT_READ|WRITE, %ld bytes)\n",
(void*)buffer, (long)(BUFFER_PAGES * page_size));
printf("Guard page at: %p (PROT_NONE)\n",
(void*)(buffer + BUFFER_PAGES * page_size));
/* Safe access within buffer */
memset(buffer, 0xCC, BUFFER_PAGES * page_size);
printf("Buffer filled safely\n");
/* Overflow into guard page would cause SIGSEGV here */
/* buffer[BUFFER_PAGES * page_size] = 0; <-- would crash */
munmap(region, total);
return 0;
}
mlock() prevents the pages of a mapping from being swapped out to disk. This is critical for real-time processes (to avoid unpredictable latency from page faults) and for security (to prevent sensitive data from appearing in swap space).
#include <sys/mman.h>
int mlock(const void *addr, size_t length); /* Lock specific range */
int munlock(const void *addr, size_t length); /* Unlock specific range */
int mlockall(int flags); /* Lock entire process address space */
int munlockall(void); /* Unlock entire address space */
| mlockall() Flag | Meaning |
|---|---|
| MCL_CURRENT | Lock all pages currently mapped in the process |
| MCL_FUTURE | Lock all pages that will be mapped in the future |
| MCL_CURRENT\|MCL_FUTURE | Lock both current and future mappings |
/*
* mlock_demo.c - Lock sensitive data in RAM
* Compile: gcc -o mlock_demo mlock_demo.c
* May need: sudo setcap cap_ipc_lock+ep mlock_demo
* or: run as root
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#define KEY_SIZE 32 /* 256-bit key */
int main(void)
{
char *secret;
long page_size = sysconf(_SC_PAGESIZE);
/* Allocate one page for sensitive data */
secret = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (secret == MAP_FAILED) { perror("mmap"); exit(1); }
/* Lock this page in RAM - never swap to disk */
if (mlock(secret, page_size) == -1) {
perror("mlock (need CAP_IPC_LOCK or root)");
/* Non-fatal: continue without lock in demo */
} else {
printf("Sensitive page locked in RAM\n");
}
/* Store the "secret" */
memcpy(secret, "SUPER_SECRET_KEY_12345678901234", KEY_SIZE);
printf("Secret stored (won't appear in swap)\n");
/* When done, zero the memory before freeing */
memset(secret, 0, page_size); /* Scrub the data */
munlock(secret, page_size);
munmap(secret, page_size);
printf("Secret zeroed and memory released\n");
return 0;
}
Since Linux 2.6.9, an unprivileged process can lock up to RLIMIT_MEMLOCK bytes; locking more than that requires the CAP_IPC_LOCK capability (or root).
madvise() lets the application tell the kernel about its expected access pattern for a mapped region. The kernel is free to ignore the hints, but usually uses them to optimise page-in/page-out behaviour.
#include <sys/mman.h>
int madvise(void *addr, size_t length, int advice);
| Advice Flag | Meaning & Effect |
|---|---|
| MADV_NORMAL | Default behaviour (some read-ahead) |
| MADV_SEQUENTIAL | Pages will be accessed sequentially: kernel does aggressive read-ahead and may free pages soon after use |
| MADV_RANDOM | Pages will be accessed randomly: read-ahead is disabled to avoid wasted I/O |
| MADV_WILLNEED | Will need these pages soon: kernel starts reading them in ahead of use (the call does not wait for the I/O) |
| MADV_DONTNEED | Won’t need these pages: kernel may discard them. Next access will page-fault (re-load from file or zero-fill) |
| MADV_FREE | Pages can be freed when needed (Linux 4.5+, private anonymous pages). Like MADV_DONTNEED but lazy: page contents are preserved until reclaimed |
| MADV_HUGEPAGE | Use transparent huge pages for this region (reduces TLB pressure) |
| MADV_NOHUGEPAGE | Do not use huge pages for this region |
/*
* madvise_demo.c - Usage examples of madvise
* Compile: gcc -o madvise_demo madvise_demo.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main(int argc, char *argv[])
{
int fd;
char *map;
struct stat sb;
if (argc != 2) { fprintf(stderr, "Usage: %s file\n", argv[0]); exit(1); }
fd = open(argv[1], O_RDONLY);
if (fd == -1) { perror("open"); exit(1); }
if (fstat(fd, &sb) == -1) { perror("fstat"); exit(1); }
map = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) { perror("mmap"); exit(1); }
close(fd);
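/* madvise() needs a page-aligned start address, so round the
 * half and three-quarter offsets down to a page boundary */
long page = sysconf(_SC_PAGESIZE);
size_t half = ((size_t)sb.st_size / 2) & ~((size_t)page - 1);
size_t three_q = (3 * (size_t)sb.st_size / 4) & ~((size_t)page - 1);
/* CASE 1: We will scan the file sequentially */
if (madvise(map, sb.st_size, MADV_SEQUENTIAL) == -1) perror("madvise SEQUENTIAL");
/* Now do sequential read... */
/* CASE 2: After processing a region, hint we're done with it */
if (madvise(map, half, MADV_DONTNEED) == -1) perror("madvise DONTNEED");
/* Kernel may reclaim those pages - saves RAM */
/* CASE 3: We're about to do random lookups in the second half */
if (madvise(map + half, sb.st_size - half, MADV_RANDOM) == -1) perror("madvise RANDOM");
/* CASE 4: Pre-load the last quarter before we need it */
if (madvise(map + three_q, sb.st_size - three_q, MADV_WILLNEED) == -1) perror("madvise WILLNEED");
printf("madvise hints issued\n");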
munmap(map, sb.st_size);
return 0;
}
The Linux-specific /proc/PID/maps file lists every VMA (Virtual Memory Area) of a process. Each line has the format:
start-end perms offset dev inode pathname
Example:
7f9a3c000000-7f9a3c021000 r--p 00000000 08:01 12345 /usr/lib/libc.so.6
7f9a3c021000-7f9a3c176000 r-xp 00021000 08:01 12345 /usr/lib/libc.so.6
7f9a3c176000-7f9a3c1c4000 r--p 00176000 08:01 12345 /usr/lib/libc.so.6
7f9a3c1c4000-7f9a3c1c8000 rw-p 001c4000 08:01 12345 /usr/lib/libc.so.6
7ffd8b200000-7ffd8b221000 rw-p 00000000 00:00 0 [stack]
7ffd8b300000-7ffd8b304000 r--p 00000000 00:00 0 [vvar]
7ffd8b304000-7ffd8b306000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
Permission field breakdown:
| Position | Char | Meaning |
|---|---|---|
| 1st | r / - | Readable |
| 2nd | w / - | Writable |
| 3rd | x / - | Executable |
| 4th | p / s | p=Private (MAP_PRIVATE), s=Shared (MAP_SHARED) |
/*
* parse_maps.c - Parse /proc/self/maps and print a summary
* Compile: gcc -o parse_maps parse_maps.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
FILE *fp;
char line[512];
char perms[8], pathname[256];
unsigned long start, end, offset;
unsigned int dev_major, dev_minor;
unsigned long inode;
fp = fopen("/proc/self/maps", "r");
if (!fp) { perror("fopen"); exit(1); }
printf("%-20s %-6s %-12s %s\n",
"Address Range", "Perms", "Size", "Mapping");
printf("%-20s %-6s %-12s %s\n",
"--------------------", "------", "------------", "-------");
while (fgets(line, sizeof(line), fp)) {
pathname[0] = '\0';
if (sscanf(line, "%lx-%lx %7s %lx %x:%x %lu %255[^\n]",
&start, &end, perms, &offset,
&dev_major, &dev_minor, &inode, pathname) < 7)
continue; /* skip lines that do not match the expected format */
unsigned long size_kb = (end - start) / 1024;
const char *label = (pathname[0] ? pathname : "[anonymous]");
printf("%012lx-%012lx %-6s %-8lu KB %s\n",
start, end, perms, size_kb, label);
}
fclose(fp);
return 0;
}
MAP_FIXED places the mapping at an exact address. Because it silently destroys overlapping mappings, Linux 4.17 added MAP_FIXED_NOREPLACE as a safer alternative: it fails with EEXIST if the range is already occupied.
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
int main(void)
{
long page_size = sysconf(_SC_PAGESIZE);
void *addr;
void *map1, *map2;
/* Step 1: Get a page at a kernel-chosen address */
map1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (map1 == MAP_FAILED) { perror("mmap1"); exit(1); }
printf("map1 at: %p\n", map1);
/* Step 2: Try MAP_FIXED_NOREPLACE at same address - should fail */
map2 = mmap(map1, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
-1, 0);
if (map2 == MAP_FAILED) {
if (errno == EEXIST)
printf("MAP_FIXED_NOREPLACE: safely rejected (EEXIST)\n");
else
perror("mmap2 unexpected error");
} else {
printf("Unexpected success with MAP_FIXED_NOREPLACE\n");
munmap(map2, page_size);
}
/* Step 3: MAP_FIXED over a range we already own (map1's page).
 * An arbitrary address such as map1 + 16 pages could silently
 * replace a library or heap mapping. */
addr = map1;
map2 = mmap(addr, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);
if (map2 == MAP_FAILED) { perror("mmap MAP_FIXED"); exit(1); }
printf("MAP_FIXED replaced map1 at: %p\n", map2);
/* map2 now occupies map1's range, so one munmap() releases it */
munmap(map2, page_size);
return 0;
}
POSIX shared memory (shm_open()) is the preferred modern way to share memory between unrelated processes without creating a real disk file. Internally it is just mmap() applied to an object in a tmpfs-based virtual filesystem (usually /dev/shm on Linux).
Shared file mapping:
- Requires a real file on disk
- File persists after process exits
- Works between unrelated processes
- No special API: just open() + mmap()
POSIX shared memory:
- No real disk file (tmpfs / RAM)
- Object lives in /dev/shm (by name)
- Works between unrelated processes
- Uses shm_open() + mmap() + shm_unlink()
Shared anonymous mapping:
- No file, no name
- Only between related (forked) processes
- Simplest: just mmap(MAP_SHARED|MAP_ANON)
- Automatic cleanup on process exit
/*
* posix_shm_preview.c - Quick preview of POSIX shared memory API
* (Full coverage in Chapter 54)
* Compile: gcc -o posix_shm posix_shm_preview.c -lrt
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#define SHM_NAME "/ep_demo_shm"
#define SHM_SIZE 1024
int main(void)
{
int fd;
char *ptr;
/* Create/open POSIX shared memory object by name */
fd = shm_open(SHM_NAME,
O_CREAT | O_RDWR, /* Create if not exists, open R/W */
S_IRUSR | S_IWUSR); /* Permissions: owner rw */
if (fd == -1) { perror("shm_open"); exit(1); }
/* Set its size */
if (ftruncate(fd, SHM_SIZE) == -1) { perror("ftruncate"); exit(1); }
/* Map it - exactly the same mmap() call as a file mapping */
ptr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) { perror("mmap"); exit(1); }
close(fd);
/* Use it */
snprintf(ptr, SHM_SIZE, "Hello from PID %d via POSIX shm!", getpid());
printf("Wrote: %s\n", ptr);
/* Cleanup */
munmap(ptr, SHM_SIZE);
shm_unlink(SHM_NAME); /* Delete from /dev/shm */
return 0;
}
| Concept | API / Flag | Key Point |
|---|---|---|
| Create mapping | mmap() | Returns start address or MAP_FAILED |
| Remove mapping | munmap() | Releases virtual address space; does not guarantee disk flush |
| Flush to disk | msync(MS_SYNC) | Required for durability; MS_ASYNC is non-blocking |
| Change protection | mprotect() | addr must be page-aligned; use for guard pages and JIT |
| Lock in RAM | mlock() | Limited by RLIMIT_MEMLOCK unless CAP_IPC_LOCK is held; prevents swap for RT/security |
| Access hints | madvise() | SEQUENTIAL, RANDOM, WILLNEED, DONTNEED |
| Private file map | MAP_PRIVATE | COW; writes not visible to others; used for ELF loading |
| Shared file map | MAP_SHARED | Writes visible to all; flushed to file; mmap I/O and IPC |
| Anon alloc | MAP_ANONYMOUS | fd=-1, offset=0; zeroed; demand-paged |
| Force address | MAP_FIXED | Dangerous: silently unmaps existing mappings at that range |
| Safe force addr | MAP_FIXED_NOREPLACE | Linux 4.17+; fails with EEXIST if range occupied |
| Inspect mappings | /proc/PID/maps | Shows all VMAs: address, perms, offset, file |
mprotect() enables the W^X pattern by allowing a JIT compiler to first make a page writable (PROT_WRITE) to emit code, then switch it to executable (PROT_EXEC) before running it, removing write permission in the process.

MADV_DONTNEED tells the kernel to immediately release the pages: the next access will cause a page fault returning zeroed (anonymous) or re-loaded (file) pages. For private anonymous memory the data is definitively gone. MADV_FREE (Linux 4.5+) is lazy: the kernel marks the pages as available for reclamation but keeps the data in place. If the process accesses the pages before the kernel reclaims them, it still finds the old data. The kernel only discards the pages under memory pressure. MADV_FREE is used by allocators like jemalloc to cheaply “free” pages while retaining them if memory is plentiful.

mlock() pins pages in RAM, eliminating page faults on swapped-out pages as a source of unbounded latency. Real-time programs typically call mlockall(MCL_CURRENT|MCL_FUTURE) at startup to lock their entire address space.

Shared file mappings and POSIX shared memory both use mmap(MAP_SHARED) under the hood. The difference is the backing store: a shared file mapping uses a regular file on a persistent filesystem (data survives reboots if not deleted). POSIX shared memory (shm_open) uses an object in a tmpfs (/dev/shm): it is RAM-backed, vanishes on reboot, and is accessed by name (no directory path needed). POSIX shm is cleaner for IPC because it does not pollute the filesystem; its size is set explicitly with ftruncate().

Memory mappings are not preserved across exec() because the entire virtual address space of the calling process is replaced with the new program’s image. The kernel creates fresh mappings for the new program’s text, data, BSS, and stack segments by loading the ELF. Only file descriptors (not marked FD_CLOEXEC) survive exec; the mappings themselves do not.

mmap() is commonly used to access hardware registers via /dev/mem or a UIO (Userspace I/O) device. For example, to control a GPIO controller: open /dev/mem, call mmap(NULL, BLOCK_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, gpio_base_address) to map the GPIO register block into user space, then use pointer arithmetic to read/write registers directly. This gives zero-copy, low-latency hardware access without custom kernel drivers. The same pattern is used for framebuffers (/dev/fb0), DMA buffers, and SPI/I2C controller registers.

You have covered all of Chapter 49: Memory Mappings.
