Nonlinear Mappings
remap_file_pages(): Reordering file pages in memory without multiple VMAs

What is a Nonlinear Mapping?

A normal mmap() creates a linear mapping: memory page 0 maps to file page 0, memory page 1 maps to file page 1, and so on, in strict sequential correspondence. A nonlinear mapping breaks this rule: you can put file page 2 at memory page 0, file page 0 at memory page 2, and so on, arranging the file pages in any order you want (the same file page can even appear at more than one memory page).

This is useful for applications that need multiple different views of the same file simultaneously — large databases, video decoders, garbage collectors, and virtual machine memory managers.

Linear vs Nonlinear — Side by Side

Linear Mapping (normal mmap)
Memory Page 0 → File Page 0
Memory Page 1 → File Page 1
Memory Page 2 → File Page 2

Nonlinear Mapping (remap_file_pages)
Memory Page 0 → File Page 2
Memory Page 1 → File Page 1
Memory Page 2 → File Page 0

Why Not Use MAP_FIXED for Nonlinear Mappings?

You can achieve a nonlinear layout using multiple mmap() calls with MAP_FIXED. But each such call creates a separate VMA (Virtual Memory Area) kernel data structure. Problems with many VMAs:

Problem                                       | Impact
----------------------------------------------+------------------------------------------------------
Each VMA takes kernel memory to store         | Nonswappable kernel memory is consumed
Each page fault must find its VMA             | With tens of thousands of VMAs, VM performance degrades
Each VMA appears as a line in /proc/PID/maps  | The file grows huge and tools that read it slow down
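For comparison, the MAP_FIXED technique described above might look like the following minimal sketch (one possible shape for Exercise 49-4, not TLPI's code). The helper names make_test_file(), map_file_page_fixed(), and build_reversed_fixed() are hypothetical. The region is first reserved with an anonymous PROT_NONE mapping so that the MAP_FIXED calls cannot overwrite unrelated mappings; each per-page mmap() then leaves its own VMA because neighboring pages no longer have consecutive file offsets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with `npages` pages, filling
   page i with the character 'A' + i. Returns an open fd, or -1. */
static int make_test_file(const char *path, int npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    char *buf = malloc(ps);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (buf == NULL || fd == -1)
        goto fail;
    for (int i = 0; i < npages; i++) {
        memset(buf, 'A' + i, ps);
        if (pwrite(fd, buf, ps, (off_t) i * ps) != ps)
            goto fail;
    }
    free(buf);
    return fd;

fail:
    free(buf);
    if (fd != -1)
        close(fd);
    return -1;
}

/* Map file page `pgoff` at memory page `mem_page` of `base`, replacing
   whatever was mapped there. Returns 0 on success, -1 on error. */
static int map_file_page_fixed(char *base, int fd, size_t mem_page, off_t pgoff)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p = mmap(base + mem_page * ps, ps, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, pgoff * ps);
    return p == MAP_FAILED ? -1 : 0;
}

/* Build the Figure 49-5 layout (memory pages 0, 1, 2 show file pages
   2, 1, 0) with one MAP_FIXED mmap() per page. Returns the region or NULL. */
static char *build_reversed_fixed(int fd)
{
    long ps = sysconf(_SC_PAGESIZE);

    /* Reserve 3 pages of address space that the MAP_FIXED calls may overwrite */
    char *base = mmap(NULL, 3 * ps, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < 3; i++) {
        if (map_file_page_fixed(base, fd, i, (off_t) (2 - i)) == -1) {
            munmap(base, 3 * ps);
            return NULL;
        }
    }
    return base;
}
```

Reading /proc/self/maps after build_reversed_fixed() should show the region as three separate lines, one per page, which is the VMA overhead the table above describes.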

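remap_file_pages() was designed to solve this by rearranging the page table entries within a single existing VMA, so no new VMAs are created. Note that this holds only for kernels before Linux 4.0: the call was deprecated in Linux 3.16, and since Linux 4.0 it has been replaced by a slower in-kernel emulation that performs MAP_FIXED-style mappings internally and therefore creates VMAs much as the MAP_FIXED approach does.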

remap_file_pages() Signature
#define _GNU_SOURCE
#include <sys/mman.h>

int remap_file_pages(void *addr,    /* Start address within an existing MAP_SHARED mapping */
                     size_t size,   /* Size of the sub-region to remap, in bytes            */
                     int prot,      /* Must be 0                                            */
                     size_t pgoff,  /* File offset in units of the system page size        */
                     int flags);    /* Pass 0 (flags other than MAP_NONBLOCK are ignored)   */

/* Returns 0 on success, or -1 on error (with errno set) */
Key points:

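  • addr must lie within an existing mapping created by mmap() with MAP_SHARED; on a MAP_PRIVATE mapping, remap_file_pages() fails with EINVAL.
  • addr and size are in bytes and should be multiples of the page size (the kernel rounds other values down). pgoff, by contrast, is in units of pages, not bytes: to refer to the 3rd page of the file, pass pgoff = 2.
  • This system call is Linux-specific; it is not in POSIX/SUSv3.
  • The same file page can be mapped into multiple locations in the same region.
  • The call was deprecated in Linux 3.16, and since Linux 4.0 it is implemented by a slower in-kernel emulation; the man page advises the few applications that use it to consider migrating to alternatives.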
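Because addr and size are byte values while pgoff counts pages, a thin wrapper can let callers work purely in page indices. This is a minimal sketch; remap_pages() is a hypothetical helper, not a library function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper: make `npages` memory pages, starting at memory page
   `mem_page` of the MAP_SHARED region `base`, show file pages `file_page`,
   `file_page` + 1, ...  Returns 0 on success, -1 with errno set on error. */
static int remap_pages(char *base, size_t mem_page, size_t file_page, size_t npages)
{
    size_t ps = (size_t) sysconf(_SC_PAGESIZE);

    return remap_file_pages(base + mem_page * ps,  /* addr: bytes          */
                            npages * ps,           /* size: bytes          */
                            0,                     /* prot: must be 0      */
                            file_page,             /* pgoff: in pages      */
                            0);                    /* flags: none          */
}
```

With this wrapper, the two calls in Example 1 would read remap_pages(addr, 0, 2, 1) and remap_pages(addr, 2, 0, 1).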

Example 1: Creating the Nonlinear Mapping from TLPI Figure 49-5

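This program builds the nonlinear mapping shown in TLPI Figure 49-5: a 3-page file is mapped, and then memory pages 0 and 2 are remapped so that they show file pages 2 and 0, respectively, while memory page 1 still shows file page 1.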

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    long ps = sysconf(_SC_PAGESIZE);
    int fd;
    char *addr;

    /* Step 1: Create a 3-page file, write identifiable content to each page */
    fd = open("nonlinear.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(1); }
    if (ftruncate(fd, 3 * ps) == -1) { perror("ftruncate"); exit(1); }

    /* Map the file first so we can write to it */
    char *init = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (init == MAP_FAILED) { perror("mmap init"); exit(1); }

    /* Fill each file page with a distinct marker character */
    memset(init,          'A', ps);   /* file page 0: all 'A' */
    memset(init + ps,     'B', ps);   /* file page 1: all 'B' */
    memset(init + 2 * ps, 'C', ps);   /* file page 2: all 'C' */
    munmap(init, 3 * ps);

    /* Step 2: Create the main MAP_SHARED mapping for remap_file_pages() */
    addr = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap main"); exit(1); }
    close(fd);

    printf("Before remap:\n");
    printf("  Memory page 0: '%c'\n", addr[0]);        /* A */
    printf("  Memory page 1: '%c'\n", addr[ps]);       /* B */
    printf("  Memory page 2: '%c'\n", addr[2 * ps]);   /* C */

    /*
     * Step 3: Rearrange — swap file pages 0 and 2 in memory view
     *
     * remap_file_pages(addr,          ps, 0, 2, 0)
     *   → memory page 0 now maps file page 2 (pgoff=2)
     *
     * remap_file_pages(addr + 2*ps,   ps, 0, 0, 0)
     *   → memory page 2 now maps file page 0 (pgoff=0)
     */
    if (remap_file_pages(addr, ps, 0, 2, 0) == -1) {
        perror("remap_file_pages page0→file2");
        exit(1);
    }
    if (remap_file_pages(addr + 2 * ps, ps, 0, 0, 0) == -1) {
        perror("remap_file_pages page2→file0");
        exit(1);
    }

    printf("\nAfter remap:\n");
    printf("  Memory page 0: '%c'  (was A, now C)\n", addr[0]);
    printf("  Memory page 1: '%c'  (unchanged B)\n", addr[ps]);
    printf("  Memory page 2: '%c'  (was C, now A)\n", addr[2 * ps]);

    munmap(addr, 3 * ps);
    return 0;
}

/*
 * Expected output:
 * Before remap:
 *   Memory page 0: 'A'
 *   Memory page 1: 'B'
 *   Memory page 2: 'C'
 *
 * After remap:
 *   Memory page 0: 'C'  (was A, now C)
 *   Memory page 1: 'B'  (unchanged B)
 *   Memory page 2: 'A'  (was C, now A)
 */

Example 2: Mapping the Same File Page to Multiple Locations

One powerful feature of remap_file_pages(): the same file page can appear at multiple addresses in the mapping simultaneously.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    long ps = sysconf(_SC_PAGESIZE);
    int fd;
    char *addr;

    fd = open("repeat.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(1); }
    if (ftruncate(fd, 4 * ps) == -1) { perror("ftruncate"); exit(1); }

    /* Temporary linear mapping, used only to fill each file page with a marker */
    char *init = mmap(NULL, 4 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (init == MAP_FAILED) { perror("mmap init"); exit(1); }
    memset(init,          'X', ps);   /* file page 0: 'X' */
    memset(init + ps,     'Y', ps);   /* file page 1: 'Y' */
    memset(init + 2 * ps, 'Z', ps);   /* file page 2: 'Z' */
    memset(init + 3 * ps, 'W', ps);   /* file page 3: 'W' */
    munmap(init, 4 * ps);

    addr = mmap(NULL, 4 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap main"); exit(1); }
    close(fd);

    /* Memory page 0 already shows file page 0 (a fresh mapping is linear),
       so only memory page 3 must be remapped to show file page 0 as well */
    if (remap_file_pages(addr + 3 * ps, ps, 0, 0, 0) == -1) {
        perror("remap_file_pages");
        exit(1);
    }

    printf("Memory page 0: '%c'\n", addr[0]);           /* X */
    printf("Memory page 3: '%c'\n", addr[3 * ps]);      /* X (same file page) */

    /* Write via memory page 0 — visible through memory page 3 */
    addr[0] = '!';
    printf("After write via page 0:\n");
    printf("  page 0: '%c'\n", addr[0]);         /* ! */
    printf("  page 3: '%c'\n", addr[3 * ps]);    /* ! (same physical page) */

    munmap(addr, 4 * ps);
    return 0;
}

Example 3: Checking /proc/self/maps VMA Count

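This example demonstrates the VMA savings: it maps a 100-page file with a single mmap() call, reverses the order of its pages with 100 remap_file_pages() calls, and counts the lines of /proc/self/maps before and after. (The alternative, one mmap() + MAP_FIXED call per page, is the subject of Exercise 49-4.)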

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

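int count_vmas(void) {
    /* Each line of /proc/self/maps describes one VMA, so count newlines
       (this stays correct even for lines longer than any fixed buffer) */
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) { perror("fopen /proc/self/maps"); exit(1); }
    int count = 0, c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            count++;
    fclose(f);
    return count;
}

int main(void) {
    long ps = sysconf(_SC_PAGESIZE);
    int fd, n = 100;  /* number of pages, and of remap_file_pages() calls */
    char *region;

    fd = open("large.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(1); }
    if (ftruncate(fd, n * ps) == -1) { perror("ftruncate"); exit(1); }

    /* One mmap() call: one VMA covering all n pages */
    region = mmap(NULL, n * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); exit(1); }

    printf("VMAs before remap_file_pages: %d\n", count_vmas());

    /* Rearrange: reverse the order of all pages */
    for (int i = 0; i < n; i++) {
        if (remap_file_pages(region + i * ps, ps, 0, n - 1 - i, 0) == -1) {
            perror("remap_file_pages");
            exit(1);
        }
    }

    printf("VMAs after %d remap_file_pages calls: %d\n", n, count_vmas());

    munmap(region, n * ps);
    close(fd);
    return 0;
}
/* Typical output on kernels before Linux 4.0 (the absolute count varies):
   VMAs before remap_file_pages: 14
   VMAs after 100 remap_file_pages calls: 14  ← still ONE VMA for the region
   On Linux 4.0 and later, remap_file_pages() is emulated with MAP_FIXED-style
   mappings, so with every page reversed the region is expected to split into
   n VMAs and the "after" count to be about n - 1 higher. */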

Chapter 49 System Calls – Summary Table
System Call          | Purpose                                       | POSIX?
---------------------+-----------------------------------------------+--------------------
mmap()               | Create a new memory mapping                   | Yes (SUSv3)
munmap()             | Remove a mapping                              | Yes
msync()              | Sync mapping to file                          | Yes
mprotect()           | Change page protections                       | Yes
mremap()             | Resize or move a mapping                      | No (Linux-specific)
remap_file_pages()   | Create nonlinear mappings without extra VMAs  | No (Linux-specific)

Interview Questions & Answers
Q1. What is the difference between a linear and a nonlinear file mapping?

A: In a linear mapping (created by normal mmap()), memory page N maps to file page N — a sequential one-to-one correspondence. In a nonlinear mapping (created by remap_file_pages()), memory page N can map to any file page M. The pages of the file can appear in any order within the memory region, including duplicates (the same file page at multiple memory locations).

Q2. Why does using many mmap()+MAP_FIXED calls to create nonlinear mappings hurt performance?

A: Each mmap() call creates a separate VMA (Virtual Memory Area) in the kernel. VMAs occupy nonswappable kernel memory, and large numbers of them degrade the performance of the virtual memory manager, which must find the right VMA on every page fault. Historically, remap_file_pages() avoided this by rewriting page table entries within one existing VMA, so the VMA count stayed at 1 regardless of how many remaps were done. Since Linux 4.0, however, the call is emulated with ordinary MAP_FIXED-style mappings and no longer saves VMAs.

Q3. What does the pgoff argument in remap_file_pages() represent?

A: pgoff is the file offset in units of page size (not bytes). So pgoff=0 means the first page of the file, pgoff=1 means the second page, etc. To convert from a byte offset: pgoff = byte_offset / sysconf(_SC_PAGESIZE).
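A minimal sketch of that conversion, which also rejects offsets that are negative or not page aligned (remap_file_pages() works only in whole pages). The helper name byte_offset_to_pgoff() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: convert a byte offset into a pgoff value for
   remap_file_pages(). Returns false if the offset is negative or is not
   a multiple of the system page size. */
static bool byte_offset_to_pgoff(off_t byte_offset, size_t *pgoff)
{
    long ps = sysconf(_SC_PAGESIZE);

    if (byte_offset < 0 || byte_offset % ps != 0)
        return false;
    *pgoff = (size_t) (byte_offset / ps);
    return true;
}
```

For example, with 4096-byte pages, byte_offset_to_pgoff(8192, &pg) stores 2 in pg (the third page of the file), while an offset of 5000 is rejected.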

Q4. Can remap_file_pages() be applied to a MAP_PRIVATE mapping?

A: No. remap_file_pages() can only be applied to MAP_SHARED mappings; on a MAP_PRIVATE mapping it fails with the error EINVAL.

Q5. Name three real-world applications that benefit from nonlinear mappings.

A: (1) Large database management systems that maintain multiple different page-level views of a database file simultaneously; they need to rearrange file pages in memory without creating thousands of VMAs. (2) Virtual machines and garbage collectors, which employ many VMAs, for example because they need to write-protect individual pages scattered throughout a large mapped region. (3) Video/image processing engines that need to reorder frames or tiles in memory without copying data.

Q6. How does the prot argument of remap_file_pages() work?

A: The prot argument is unused and must be specified as 0; a nonzero value makes the call fail with EINVAL. The remapped pages keep the protection of the VMA as a whole (as set when the original mmap() was called). TLPI notes that a future kernel might use this argument to change protections page by page, but that was never implemented, and with the call deprecated since Linux 3.16 it is no longer expected.
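The two error cases from Q4 and Q6 can be checked with a small probe. This is a minimal sketch: try_remap() is a hypothetical helper, and it assumes fd refers to a file at least one page long that was opened with O_RDWR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe: map the first page of `fd` with `map_kind`
   (MAP_SHARED or MAP_PRIVATE), then call remap_file_pages() on it with the
   given `prot`. Returns 0 if the call succeeds, otherwise the errno value. */
static int try_remap(int fd, int map_kind, int prot)
{
    long ps = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, ps, PROT_READ, map_kind, fd, 0);
    if (p == MAP_FAILED)
        return errno;

    int err = 0;
    if (remap_file_pages(p, ps, prot, 0, 0) == -1)
        err = errno;          /* save errno before munmap() can change it */
    munmap(p, ps);
    return err;
}
```

Expected results: try_remap(fd, MAP_SHARED, 0) returns 0, while both try_remap(fd, MAP_PRIVATE, 0) and try_remap(fd, MAP_SHARED, PROT_READ) return EINVAL.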

Q7. What information does each line in /proc/PID/maps represent?

A: Each line represents one VMA (Virtual Memory Area). It shows the virtual address range; the permission flags (r, w, and x, followed by p for a private or s for a shared mapping); the file offset; the device and inode numbers; and the pathname or a special region name (e.g., [heap], [stack], or the path of a .so file). The number of lines therefore equals the number of VMAs in the process, which is why remap_file_pages() was attractive: on kernels before Linux 4.0 it did not increase this count.
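Building on that line format, a helper can count only the VMAs that map one particular file, which gives a more focused number than counting every line. This is a minimal sketch; count_vmas_for_path() is a hypothetical name, and it matches the pathname field exactly, so an unlinked file (shown with a " (deleted)" suffix) would not be counted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count the VMAs (lines of /proc/self/maps) whose
   pathname field is exactly `path`. Returns -1 if the file can't be opened. */
static int count_vmas_for_path(const char *path)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[8192];   /* room for a PATH_MAX-length pathname plus the fixed fields */
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';

        /* Skip the five fixed fields: address range, perms, offset, dev, inode */
        const char *p = line;
        for (int field = 0; field < 5; field++) {
            p += strspn(p, " ");
            p += strcspn(p, " ");
        }
        p += strspn(p, " ");   /* p now points at the pathname (possibly empty) */

        if (strcmp(p, path) == 0)
            count++;
    }
    fclose(f);
    return count;
}
```

A freshly created linear mapping of a file should give a count of 1; replacing its middle page with a different file page via mmap() and MAP_FIXED should raise the count to 3.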

TLPI Exercises — Chapter 49
Exercise 49-1: cp using mmap() and memcpy()

Write a program analogous to cp(1) that uses mmap() and memcpy() (not read()/write()) to copy a source file to a destination. Use fstat() to get the source file size for the mapping, and ftruncate() to set the destination file size before mapping. See File 1 Example 4 for the full solution.
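The core of that approach can be sketched as follows. This is a minimal sketch rather than the referenced solution: mmap_copy() is a hypothetical name, and it does not guard against the source file being truncated by another process during the copy (which could raise SIGBUS).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy `src` to `dst` with mmap() + memcpy() instead of read()/write().
   Returns 0 on success, -1 on error. */
static int mmap_copy(const char *src, const char *dst)
{
    int in = -1, out = -1, ret = -1;
    void *s = MAP_FAILED, *d = MAP_FAILED;
    size_t len = 0;
    struct stat sb;

    in = open(src, O_RDONLY);
    if (in == -1 || fstat(in, &sb) == -1)      /* fstat() gives the size to map */
        goto done;
    len = (size_t) sb.st_size;

    /* O_RDWR, not O_WRONLY: a shared writable mapping needs read access too */
    out = open(dst, O_RDWR | O_CREAT | O_TRUNC, sb.st_mode & 0777);
    if (out == -1 || ftruncate(out, sb.st_size) == -1)   /* size dst before mapping */
        goto done;

    if (len == 0) {            /* mmap() rejects a length of 0; nothing to copy */
        ret = 0;
        goto done;
    }

    s = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
    d = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
    if (s == MAP_FAILED || d == MAP_FAILED)
        goto done;

    memcpy(d, s, len);         /* MAP_SHARED: the bytes land in dst's page cache */
    ret = 0;

done:
    if (s != MAP_FAILED) munmap(s, len);
    if (d != MAP_FAILED) munmap(d, len);
    if (in != -1) close(in);
    if (out != -1) close(out);
    return ret;
}
```

A cp-like program would call mmap_copy(argv[1], argv[2]) from main() and exit with a failure status if it returns -1.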

Exercise 49-2: mmap-based shared memory transfer

Rewrite the System V shared memory transfer programs (svshm_xfr_writer and svshm_xfr_reader from Chapter 48) to use a shared file mapping (MAP_SHARED) instead of System V shared memory. The file replaces the shared memory segment; synchronization still uses semaphores.

Exercise 49-3: Verify SIGBUS and SIGSEGV conditions

Write programs to demonstrate: (a) SIGSEGV from writing to a PROT_READ mapping, (b) SIGSEGV from accessing an unmapped address, (c) SIGBUS from accessing a page beyond the end of a file-backed mapping. See File 3 Examples 1–3 for reference implementations.

Exercise 49-4: Nonlinear mapping using MAP_FIXED

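Write a program using the MAP_FIXED technique (multiple mmap() calls with MAP_FIXED) to create a nonlinear mapping similar to Figure 49-5. Then compare its VMA count (via /proc/self/maps) with that of the remap_file_pages() approach from Example 3 above. On Linux 4.0 and later, where remap_file_pages() is emulated, expect the two approaches to produce similar counts.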
