Memory Mappings in Linux File Mappings, Private and Shared

 

Chapter 49, Part 3: File Mappings, Private and Shared
File to Memory Mapping
Private File Mapping
Shared File Mapping
Lazy Page Loading

What is a File Mapping?

A file mapping maps the contents of a file (or part of a file) directly into the process’s virtual address space. Once mapped, the file’s contents can be accessed using normal pointer operations, with no read() or write() system calls needed. The kernel handles loading the appropriate file data into physical memory transparently through the virtual memory system.

This is how the Linux dynamic linker (ld-linux.so) loads shared libraries and executable code segments into memory. It is also the mechanism behind many high-performance databases, log-structured storage systems, and IPC frameworks.

Key Terms in This Module

File Mapping, MAP_PRIVATE file, MAP_SHARED file, Page Cache, Demand Paging, Lazy Loading, Text Segment, ftruncate(), offset, length, SIGBUS, msync()

How File Mappings Work: Step by Step

Creating a file mapping requires two steps:

  1. Open the file with open() to get a file descriptor.
  2. Pass the file descriptor as the fd argument to mmap().

The access mode the file must be opened with depends on the prot and flags you specify. The rules are:

mmap() prot, flags                     Required open() flags     Reason
PROT_READ, MAP_PRIVATE                 O_RDONLY                  Read-only mapping; the file is never modified
PROT_READ | PROT_WRITE, MAP_PRIVATE    O_RDONLY is sufficient    Writes are private (CoW) and never reach the file
PROT_READ | PROT_WRITE, MAP_SHARED     O_RDWR required           Writes propagate to the file, so it must be writable
PROT_READ, MAP_SHARED                  O_RDONLY                  Shared read-only mapping (e.g., a read-only data file shared by several processes)
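
The access-mode rules above can be checked empirically. The following is a minimal sketch, not part of the original material: `mmap_errno()` is a hypothetical helper name chosen for illustration. It opens a file with the given flags, attempts a one-byte mapping, and reports whether mmap() accepted the combination.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): open `path` with `oflags`,
 * try to map its first byte with the given protection and mapping flags,
 * and report the outcome.
 * Returns 0 if mmap() succeeded, otherwise the errno value of the
 * call that failed (open() or mmap()).
 */
int mmap_errno(const char *path, int oflags, int prot, int mflags)
{
    assert(path != NULL);

    int fd = open(path, oflags);
    if (fd == -1)
        return errno;

    int err = 0;
    void *addr = mmap(NULL, 1, prot, mflags, fd, 0);
    if (addr == MAP_FAILED)
        err = errno;          /* save before close() can overwrite it */
    else
        munmap(addr, 1);

    close(fd);
    return err;
}
```

On Linux, `mmap_errno(path, O_RDONLY, PROT_READ | PROT_WRITE, MAP_SHARED)` is expected to return EACCES for an existing regular file, while the same call with MAP_PRIVATE is expected to return 0.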

How the kernel maps file pages into virtual memory:

File on disk             Process virtual address space
Bytes 0-4095       <->   addr[0-4095]
Bytes 4096-8191    <->   addr[4096-8191]
Bytes 8192-12287   <->   addr[8192-12287]

Full mapping example, a simple cat implementation:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Minimal cat(1) implementation using mmap().
 * Maps the entire file and writes it to stdout.
 * This can be more efficient than a read() loop for large files because
 * the file data is not first copied into a separate user-space buffer.
 */
int main(int argc, char *argv[])
{
    int fd;
    struct stat sb;
    char *addr;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    /* Get file size */
    if (fstat(fd, &sb) == -1) { perror("fstat"); exit(EXIT_FAILURE); }

    if (sb.st_size == 0) {
        printf("(empty file)\n");
        close(fd);
        return 0;
    }

    /*
     * Map entire file:
     * - PROT_READ: we only need to read
     * - MAP_PRIVATE: we never write, and a private mapping cannot modify the file
     * - offset = 0: start from beginning of file
     * - length = sb.st_size: map entire file
     */
    addr = mmap(NULL, (size_t)sb.st_size,
                PROT_READ,
                MAP_PRIVATE,
                fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    /* fd can be closed now - mapping remains valid */
    close(fd);

    /* Write file contents to stdout; loop because write() may be partial */
    const char *p = addr;
    size_t left = (size_t)sb.st_size;
    while (left > 0) {
        ssize_t nw = write(STDOUT_FILENO, p, left);
        if (nw == -1) {
            perror("write");
            munmap(addr, (size_t)sb.st_size);
            exit(EXIT_FAILURE);
        }
        p += nw;
        left -= (size_t)nw;
    }

    munmap(addr, (size_t)sb.st_size);
    return 0;
}

Mapping Part of a File: offset and length

You don’t have to map an entire file. The offset and length arguments control exactly which region of the file is mapped. This is used by databases to map individual pages, and by the linker to map separate segments from an ELF file.

File on disk (page size 4096):
0-4095         not mapped
4096-8191      not mapped
8192-12287     mapped (offset=8192)
12288-16383    mapped
16384-20479    not mapped
20480+         not mapped

mmap(NULL, 8192, PROT_READ, MAP_PRIVATE, fd, 8192)

Process virtual memory:
addr[0-4095]      = file bytes 8192-12287
addr[4096-8191]   = file bytes 12288-16383

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd;
    struct stat sb;
    char *addr;
    long page_size = sysconf(_SC_PAGESIZE);

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <large_file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    if (fstat(fd, &sb) == -1) { perror("fstat"); exit(EXIT_FAILURE); }

    if (sb.st_size < 2 * page_size) {
        fprintf(stderr, "File too small for this demo\n");
        close(fd); exit(EXIT_FAILURE);
    }

    /*
     * Map only the SECOND page of the file.
     * offset MUST be a multiple of page_size.
     */
    off_t offset = page_size;         /* Skip first page */
    size_t length = (size_t)page_size; /* Map one page */

    addr = mmap(NULL, length,
                PROT_READ,
                MAP_PRIVATE,
                fd, offset);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    close(fd);

    printf("Mapped bytes %ld to %ld of file\n",
           (long)offset, (long)(offset + length - 1));
    printf("First byte of mapped region: 0x%02x ('%c')\n",
           (unsigned char)addr[0],
           (addr[0] >= 32 && addr[0] < 127) ? addr[0] : '?');

    munmap(addr, length);
    return 0;
}

/*
 * Mapping arbitrary byte offsets (not page-aligned):
 * If you need to access file byte 5000 and page_size is 4096:
 *   aligned_offset = (5000 / 4096) * 4096 = 4096
 *   extra          = 5000 - 4096 = 904
 *   addr = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE, fd, aligned_offset);
 *   target_ptr = addr + extra;   // points to file byte 5000
 */
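
The alignment arithmetic in the comment above can be wrapped in a small function. This is a sketch under stated assumptions: `map_unaligned()` and its out-parameters are hypothetical names introduced here, and the mapping is read-only and private. The caller still has to unmap using the real base address and length, not the adjusted pointer.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for callers that need to obtain fd */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `fd` starting at an arbitrary
 * (not necessarily page-aligned) byte offset `file_off`.
 *
 * On success returns a pointer to the requested byte and stores the real
 * mapping base and length in *base_out and *maplen_out; pass those two
 * values to munmap() when finished. Returns NULL on failure.
 */
const char *map_unaligned(int fd, off_t file_off, size_t len,
                          void **base_out, size_t *maplen_out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && file_off >= 0);

    off_t aligned = file_off - (file_off % page);  /* round down to a page */
    size_t extra = (size_t)(file_off - aligned);   /* bytes before target  */
    size_t maplen = len + extra;

    void *base = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED)
        return NULL;

    *base_out = base;
    *maplen_out = maplen;
    return (const char *)base + extra;
}
```

With a 4096-byte page, asking for file offset 5000 maps from offset 4096 and returns a pointer 904 bytes into the mapping, matching the worked numbers in the comment.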

Private File Mappings: Two Major Uses

Private file mappings (MAP_PRIVATE) have two primary use cases in Linux:

Use Case 1: Shared Text / Code Segments

When 10 processes run the same program (e.g., 10 instances of bash), they all share the same physical pages for the code (text) segment. Each process has its own virtual mapping pointing to the same physical frames. Since code is read-only, there is no CoW needed: all processes read the same pages.

Use Case 2: Process Initialization (data segment)

The process’s initialized data segment (.data) is loaded from the executable file using a private mapping. (The .bss segment holds only zeros, so it is not read from the file; it is backed by zero-filled anonymous memory.) Initially all processes share the same physical pages. When a process writes to a variable, CoW creates a private copy of that page. Unmodified pages remain shared.

How Linux loads an executable (simplified):

ELF Segment            mmap() flags                                         Writable?   CoW?
.text (code)           PROT_READ|PROT_EXEC, MAP_PRIVATE                     No          Never (read-only)
.rodata (const data)   PROT_READ, MAP_PRIVATE                               No          Never
.data (init data)      PROT_READ|PROT_WRITE, MAP_PRIVATE                    Yes         Yes (on write)
heap / stack           PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS      Yes         Yes (after fork)

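
One way to observe these mappings in a running process is to read /proc/self/maps, where each line carries a four-character permission field such as "r-xp" (readable, executable, private). The sketch below is Linux-specific and assumes /proc is mounted; `count_mappings()` is a hypothetical helper name, not something from the original text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the mappings of the calling process whose permission field in
 * /proc/self/maps equals `perms` (for example "r-xp").
 * Returns -1 if /proc/self/maps cannot be opened.
 */
int count_mappings(const char *perms)
{
    assert(perms != NULL && strlen(perms) == 4);

    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;

    char line[512];
    char field[5];
    int count = 0;

    /* Each line has the form: start-end perms offset dev inode [path] */
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "%*s %4s", field) == 1 && strcmp(field, perms) == 0)
            count++;
    }

    fclose(fp);
    return count;
}
```

A typical dynamically linked program should report at least one "r-xp" region (its own text and that of each shared library) and at least one "rw-p" region (data, heap, stack), though the exact counts vary by system.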
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/*
 * Demo: Private file mapping - changes are NOT saved to disk.
 * This simulates how the data segment of an executable is loaded.
 */
int main(void)
{
    int fd;
    struct stat sb;
    char *addr;
    char original[64];

    /* Create a small test file */
    fd = open("/tmp/test_private.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

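    const char *content = "Original file content here.\n";
    if (write(fd, content, strlen(content)) != (ssize_t)strlen(content)) {
        perror("write"); exit(EXIT_FAILURE);
    }
    if (fstat(fd, &sb) == -1) { perror("fstat"); exit(EXIT_FAILURE); }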

    /* Create PRIVATE mapping - our writes will NOT change the file */
    addr = mmap(NULL, (size_t)sb.st_size,
                PROT_READ | PROT_WRITE,
                MAP_PRIVATE,
                fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    close(fd);

    /* Remember what the file says (strncpy does not always terminate) */
    strncpy(original, addr, sizeof(original) - 1);
    original[sizeof(original) - 1] = '\0';

    printf("Before modification: [%s]\n", addr);

    /* Modify our private copy */
    memcpy(addr, "MODIFIED IN MEMORY ONLY ", 24);
    printf("After modification:  [%s]\n", addr);

    munmap(addr, (size_t)sb.st_size);

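    /* Re-read file from disk to verify it wasn't changed */
    fd = open("/tmp/test_private.txt", O_RDONLY);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    char buf[64] = {0};
    if (read(fd, buf, sizeof(buf) - 1) == -1) { perror("read"); exit(EXIT_FAILURE); }
    close(fd);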

    printf("File on disk:        [%s]\n", buf);
    printf("File unchanged: %s\n",
           strcmp(buf, content) == 0 ? "YES (MAP_PRIVATE confirmed)" : "NO");

    unlink("/tmp/test_private.txt");
    return 0;
}

Shared File Mappings: IPC and File I/O

Shared file mappings (MAP_SHARED) have two major uses:

  1. Memory-mapped file I/O: Fast reads and writes to a file without explicit read()/write() system calls.
  2. IPC via shared file: Multiple processes map the same file with MAP_SHARED and see each other’s changes without any additional IPC mechanism.

How two processes share data through a file mapping:

Process A: mmap(...MAP_SHARED...), then addr_A[0] = 42;
    <-> kernel page cache: one shared physical page, now containing 42 <->
Process B: mmap(...MAP_SHARED...), then printf("%d", addr_B[0]); // 42

Both processes see the same physical page. A write by Process A is visible to Process B immediately.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

#define SHMFILE  "/tmp/ipc_mmap.bin"
#define MAPSIZE  4096

int main(void)
{
    int fd;
    char *addr;

    /* Create and size the file */
    fd = open(SHMFILE, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    /* File must be at least MAPSIZE bytes */
    if (ftruncate(fd, MAPSIZE) == -1) { perror("ftruncate"); exit(EXIT_FAILURE); }

    /*
     * Create MAP_SHARED mapping.
     * Both parent and child (after fork) will see each other's writes.
     */
    addr = mmap(NULL, MAPSIZE,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd); /* fd not needed after mmap() */

    /* Initialise shared region */
    memset(addr, 0, MAPSIZE);

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(EXIT_FAILURE); }

    if (pid == 0) {
        /* CHILD: write a message to the shared mapping */
        sleep(1); /* Let parent set up */
        snprintf(addr, MAPSIZE, "Hello from child PID %d!", getpid());
        printf("[Child] Wrote: '%s'\n", addr);

        /* Flush to file (good practice even for IPC via file mapping) */
        msync(addr, MAPSIZE, MS_SYNC);
        munmap(addr, MAPSIZE);
        exit(EXIT_SUCCESS);
    }

    /* PARENT: wait then read what child wrote */
    wait(NULL);
    printf("[Parent] Read: '%s'\n", addr);

    munmap(addr, MAPSIZE);
    unlink(SHMFILE);
    return 0;
}

Fast file update using shared mapping:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

/* Write a timestamp to a log file using mmap - no lseek, no write() */
int main(void)
{
    int fd;
    char *addr;
    const size_t SIZE = 256;
    struct tm *tm_info;
    time_t t;
    char tsbuf[64];

    fd = open("/tmp/mmap_log.txt", O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, (off_t)SIZE) == -1) { perror("ftruncate"); exit(EXIT_FAILURE); }

    addr = mmap(NULL, SIZE,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    for (int i = 0; i < 3; i++) {
        t = time(NULL);
        tm_info = localtime(&t);
        strftime(tsbuf, sizeof(tsbuf), "%Y-%m-%d %H:%M:%S", tm_info);

        /* Update file contents through pointer - no write() system call */
        snprintf(addr, SIZE, "Entry %d: %s\n", i + 1, tsbuf);

        /* MS_ASYNC: schedule write but don't block */
        msync(addr, SIZE, MS_ASYNC);

        printf("Written: %s", addr);
        sleep(1);
    }

    /* Final sync to ensure all data is on disk */
    msync(addr, SIZE, MS_SYNC);
    munmap(addr, SIZE);
    return 0;
}

Lazy Page Loading (Demand Paging)

A critical performance characteristic of mmap(): pages are not loaded into RAM immediately when mmap() is called. The call only creates virtual memory mappings. Physical pages are loaded on demand when the process first accesses them. This is called demand paging or lazy loading.

1. mmap() called: VMA created, NO pages in RAM
2. Process accesses addr[0]: page fault triggered
3. Kernel handles fault: reads file page into RAM
4. Process continues: page is now in RAM

Warning: race condition with lazy loading. Because file pages are loaded on first access, a change made to the file after mmap() may or may not be visible through a MAP_PRIVATE mapping, depending on whether the process has already taken its own copy of the page. Portable code must not rely on either outcome.

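
Demand paging can be observed indirectly by counting page faults around the mmap() call and around the first access to each page. The following is a rough sketch, assuming the kernel reports fault counters through getrusage(); `total_faults()` and `faults_on_first_touch()` are hypothetical helper names introduced for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Total page faults (minor + major) taken by this process so far. */
static long total_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Hypothetical helper: map `path` read-only, then read one byte per page.
 * Stores the faults taken by the mmap() call itself in *during_mmap and
 * returns the faults taken while touching the pages, or -1 on error.
 */
long faults_on_first_touch(const char *path, long *during_mmap)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat sb;

    assert(path != NULL && during_mmap != NULL && page > 0);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) { close(fd); return -1; }

    long before_map = total_faults();
    char *addr = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    long after_map = total_faults();
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    volatile char sink = 0;
    for (off_t i = 0; i < sb.st_size; i += page)
        sink ^= addr[i];          /* first access to a page may fault */
    (void)sink;
    long after_touch = total_faults();

    munmap(addr, (size_t)sb.st_size);
    *during_mmap = after_map - before_map;
    return after_touch - after_map;
}
```

On a typical Linux kernel the faults should show up in the second number rather than the first. The count is usually lower than the number of pages, because the kernel may map several neighbouring pages while handling a single fault.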
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * madvise() can hint to the kernel about access patterns,
 * allowing it to prefetch pages (overriding default lazy loading).
 * This is the manual counterpart to lazy loading.
 */
int main(void)
{
    int fd;
    struct stat sb;
    char *addr;

    fd = open("/etc/passwd", O_RDONLY);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (fstat(fd, &sb) == -1) { perror("fstat"); exit(EXIT_FAILURE); }

    addr = mmap(NULL, (size_t)sb.st_size,
                PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /*
     * MADV_SEQUENTIAL: Tell kernel we'll read pages in order.
     * Kernel will read-ahead aggressively (prefetch next pages).
     * Good for sequential scans of large files.
     *
     * Other useful hints:
     * MADV_RANDOM     - disable read-ahead (random access pattern)
     * MADV_WILLNEED   - prefetch these pages NOW
     * MADV_DONTNEED   - kernel can discard these pages (like a free)
     * MADV_HUGEPAGE   - use huge pages if possible
     */
    if (madvise(addr, (size_t)sb.st_size, MADV_SEQUENTIAL) == -1)
        perror("madvise (non-fatal)");

    /* Sequential scan - kernel will have prefetched ahead */
    size_t lines = 0;
    for (size_t i = 0; i < (size_t)sb.st_size; i++)
        if (addr[i] == '\n') lines++;

    printf("Lines in /etc/passwd: %zu\n", lines);

    /* Drop this process's pages now; the kernel page cache may still keep them */
    madvise(addr, (size_t)sb.st_size, MADV_DONTNEED);

    munmap(addr, (size_t)sb.st_size);
    return 0;
}

SIGBUS: Accessing Beyond the End of File

When mapping a file, the kernel rounds the mapping length up to a page boundary. If the file size is not a multiple of the page size, the bytes between the end of the file and the end of that page are mapped and read as zeros (anything written there is not carried to the file). Pages of the mapping that lie wholly beyond the end of the file have no backing store, and accessing them delivers SIGBUS. Accessing an address beyond the end of the mapping itself delivers SIGSEGV instead.

Example: 4500-byte file, page size 4096

addr[0] - addr[4499]       File content: readable, contains actual file data
addr[4500] - addr[8191]    Zero fill: padding up to the page boundary, readable without error
addr[8192]+                Beyond the file's last page: SIGBUS if still inside the mapping, SIGSEGV if outside it

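
The boundary arithmetic can be written down as a small function. This is an illustrative sketch only: `sigbus_safe_len()` is a hypothetical name, and it assumes the mapping starts at file offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: given the current file size, the length passed to
 * mmap() (file offset 0 assumed), and the page size, return how many bytes
 * from the start of the mapping can be touched without raising SIGBUS.
 *
 * Pages that contain at least one file byte are accessible in full (the
 * tail of the last such page reads as zeros); pages that start at or
 * beyond end-of-file are not.
 */
size_t sigbus_safe_len(off_t file_size, size_t map_len, long page)
{
    assert(file_size >= 0 && page > 0);

    size_t pg = (size_t)page;

    /* Round the file size up to the next page boundary. */
    size_t backed = ((size_t)file_size + pg - 1) / pg * pg;

    /* The kernel also rounds the mapping length up to a whole page. */
    size_t mapped = (map_len + pg - 1) / pg * pg;

    return backed < mapped ? backed : mapped;
}
```

For a 4500-byte file mapped with a length of 12288 and a 4096-byte page, the function returns 8192: the first two pages are safe and the third would raise SIGBUS.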
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

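void sigbus_handler(int sig)
{
    /*
     * printf() and exit() are not async-signal-safe in general. They are
     * tolerable in this demo only because the SIGBUS is raised synchronously
     * by a plain memory access in main(), not in the middle of a library call.
     */
    (void)sig;
    printf("Caught SIGBUS: accessed a mapped page with no file backing!\n");
    exit(EXIT_FAILURE);
}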

int main(void)
{
    int fd;
    struct stat sb;
    char *addr;
    long page = sysconf(_SC_PAGESIZE);

    signal(SIGBUS, sigbus_handler);

    /* Create a file smaller than one page (e.g. 100 bytes) */
    fd = open("/tmp/small.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    ftruncate(fd, 100); /* 100 bytes */
    fstat(fd, &sb);

    /* Map one full page, but file is only 100 bytes */
    addr = mmap(NULL, (size_t)page,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /* Safe: within file content (0..99) */
    printf("addr[0]  = 0x%02x (within file)\n", (unsigned char)addr[0]);
    printf("addr[99] = 0x%02x (last byte of file)\n", (unsigned char)addr[99]);

    /* Safe: padding zeros between file end and page end (100..4095) */
    printf("addr[100] = 0x%02x (zero padding, no SIGBUS)\n",
           (unsigned char)addr[100]);

    /*
     * DANGER: if the file shrinks after it has been mapped, pages of the
     * mapping that now lie wholly beyond end-of-file lose their backing
     * store, and touching them raises SIGBUS.
     */
    printf("Truncating file to 0 bytes (simulates file shrink)...\n");
    if (truncate("/tmp/small.txt", 0) == -1) { perror("truncate"); exit(EXIT_FAILURE); }

    printf("Accessing addr[0] after truncation (MAP_SHARED)...\n");
    /* This will SIGBUS because the file has no backing pages now */
    volatile char c = addr[0]; /* triggers SIGBUS */
    (void)c;

    munmap(addr, (size_t)page);
    unlink("/tmp/small.txt");
    return 0;
}

Interview Questions: File Mappings

Q1: What are the two main uses of private file mappings in Linux?

First, loading code segments: when multiple processes run the same executable or use the same shared library, the text (code) segment is loaded via a private read-only mapping. All processes share the same physical pages for code. Second, process initialization: the initialized data segment (.data) of an executable is loaded using a private writable mapping. CoW ensures each process gets its own copy of variables only when they are written.

Q2: Why must a file be opened O_RDWR for a MAP_SHARED + PROT_WRITE mapping?

A MAP_SHARED mapping with write permission causes writes to be reflected in the underlying file, so the file descriptor must permit writing. Because mmap() also requires the descriptor to be open for reading, O_WRONLY is not enough: the file must be opened O_RDWR. Using O_RDONLY with MAP_SHARED | PROT_WRITE fails with EACCES because the kernel checks the descriptor’s access mode before allowing a shared writable mapping.

Q3: What is the difference between SIGSEGV and SIGBUS in the context of mmap()?

SIGSEGV (Segmentation Fault) is delivered when a process violates memory protection, for example by writing to a read-only page or accessing an unmapped region. SIGBUS (Bus Error) is delivered in the context of memory mappings when a process accesses a mapped page that has no backing storage, most commonly when accessing a file-backed mapping beyond the current end of the file (for example, if the file was truncated after the mapping was created).

Q4: What happens if you mmap() a file and another process truncates the file after the mapping is created?

Accessing a page that lies wholly beyond the new end of file delivers SIGBUS, because the backing storage for that page no longer exists. On Linux this is expected for MAP_SHARED and MAP_PRIVATE mappings alike, since truncation also discards any private copies of the pages that were cut off. This is a classic race condition that must be handled with appropriate locking (e.g., flock() or another form of advisory locking) in real applications.

Q5: What is demand paging and how does it relate to mmap()?

Demand paging means physical memory pages are allocated and loaded only when first accessed, not when the mapping is created. When mmap() returns, the kernel has only created the virtual mapping (VMA entry); no disk I/O has occurred. When the process first accesses a mapped virtual address, a page fault is triggered. The kernel handles this fault by loading the required file page from disk into physical RAM and updating the process’s page table. This makes mmap() extremely fast to call even for very large files.

Q6: Compare memory-mapped I/O vs read()/write() for large files. When is each preferred?

mmap() advantages: avoids user-kernel buffer copies (zero-copy for reads), allows random access without lseek, enables direct pointer manipulation, and naturally supports sharing between processes. Ideal for large files with random access patterns, memory-mapped databases, and IPC.

read()/write() advantages: simpler error handling, better for streaming sequential I/O, no VMA overhead, works with non-seekable file descriptors (pipes, sockets, devices). Preferred for small files, network I/O, and sequential log writing.

Key trade-off: mmap() increases the process’s virtual address space and VMA count; read()/write() use a fixed-size buffer. On 32-bit systems, mmap() of files larger than ~2 GB is impractical.

Q7: How does mmap() enable zero-copy data transfer?

Normally, read() copies data from the kernel page cache into the user buffer (a kernel space to user space copy). With mmap(), the file’s page cache pages are mapped directly into the process’s virtual address space. The process reads data by accessing virtual addresses that directly reference the page cache, so no copy is needed. For sending file data over a network socket, sendfile() or splice() go a step further: they move the data between descriptors inside the kernel, so it never passes through user space at all.
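
As a rough illustration of the in-kernel path, the sketch below copies one descriptor to another with sendfile(). It is Linux-specific, and `copy_with_sendfile()` is a hypothetical helper name; in a real server the output descriptor would typically be a connected socket.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the whole of in_fd to out_fd with sendfile(), which moves the data
 * inside the kernel without staging it in a user-space buffer.
 * Returns the number of bytes copied, or -1 on error.
 * out_fd may be a socket or, since Linux 2.6.33, any file descriptor.
 */
ssize_t copy_with_sendfile(int out_fd, int in_fd)
{
    assert(out_fd >= 0 && in_fd >= 0);

    struct stat sb;
    if (fstat(in_fd, &sb) == -1)
        return -1;

    off_t offset = 0;                 /* sendfile() advances this for us */
    while (offset < sb.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(sb.st_size - offset));
        if (n == -1)
            return -1;
        if (n == 0)                   /* unexpected EOF: the file shrank */
            break;
    }
    return (ssize_t)offset;
}
```

The loop is there because sendfile() may transfer fewer bytes than requested, in the same way that write() can.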

Chapter 49: Memory Mappings Series Complete!

You have covered mmap() flags, memory protection, munmap(), and file mappings.
