mmap() Boundary Cases
Chapter 49 – Topic 1 | What happens at page boundaries and beyond EOF

Why Boundary Cases Matter

When you call mmap(), the kernel maps your file into the process address space in units of pages (usually 4096 bytes on x86). File sizes and requested mapping lengths are rarely exact multiples of the page size, so what happens in the gaps? This topic explains two important cases:

  • Mapping length is not a multiple of page size, but mapping stays within the file.
  • Mapping extends beyond the end of the file (EOF).

Getting these wrong causes SIGSEGV or SIGBUS crashes at runtime.

mmap() Quick Recall

Before diving into boundary cases, here is the mmap() signature as a reminder:

#include <sys/mman.h>

void *mmap(void   *addr,    /* Hint for kernel where to place mapping (usually NULL) */
           size_t  length,  /* How many bytes to map                                */
           int     prot,    /* PROT_READ | PROT_WRITE | PROT_EXEC | PROT_NONE       */
           int     flags,   /* MAP_SHARED or MAP_PRIVATE (+ others)                 */
           int     fd,      /* File descriptor of the file to map                   */
           off_t   offset); /* Offset inside the file to start mapping from         */

/* Returns: start address of mapping on success, MAP_FAILED on error */

The offset must be a multiple of the page size. The length does not have to be, but the kernel rounds it up internally to the next page boundary.

Case 1 – Mapping Within File, Non-Page-Size Length

Suppose the file is 9500 bytes and you map 6000 bytes from offset 0. The system page size is 4096 bytes. Here is what the kernel does:

Call: mmap(NULL, 6000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
File on disk: 9500 bytes. mmap requested bytes 0–5999; the kernel rounded the mapping up to the page boundary at byte 8191.

Byte range | State | Notes
0 – 5999 | mapped to process | accessible
6000 – 8191 | remainder of page | accessible, mapped to file bytes 6000–8191
8192 – 9499 | unmapped | in the file, but SIGSEGV if accessed
beyond file | unmapped | SIGSEGV if accessed

Key rules for Case 1:

  • You asked for 6000 bytes. The kernel rounds up to the next page: 8192 bytes (2 full pages).
  • Bytes 0–5999 are accessible and map to the file directly.
  • Bytes 6000–8191 are the “remainder of the page”. They are accessible, and because the file is larger than 8192 bytes they map to the real file bytes 6000–8191.
  • Bytes 8192 and beyond are not part of this mapping. Accessing them normally raises SIGSEGV (unless some unrelated mapping happens to occupy that address).

Code Example 1 – Basic mmap() of a file (within-file mapping)
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    int fd;
    void *addr;
    size_t map_len = 6000;  /* Not a multiple of page size (4096) */
    struct stat sb;

    /* Open the file for reading and writing */
    fd = open("testfile.dat", O_RDWR);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Get the file size for information */
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        exit(EXIT_FAILURE);
    }
    printf("File size: %lld bytes\n", (long long)sb.st_size);

    /* Map 6000 bytes from offset 0.
     * Kernel internally rounds up to 8192 bytes (2 pages).
     * File is larger (9500 bytes), so bytes 6000–8191 also map to file.
     */
    addr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    /* fd is no longer needed after mmap() succeeds */
    close(fd);

    /* Access bytes within the requested range – safe */
    printf("First byte: %d\n", ((unsigned char *)addr)[0]);
    printf("Last byte of requested range: %d\n", ((unsigned char *)addr)[5999]);

    /* NOTE: accessing addr[8192] or beyond would raise SIGSEGV
     * because that address is outside the mapped region.
     */

    /* Unmap when done */
    if (munmap(addr, map_len) == -1) {
        perror("munmap");
        exit(EXIT_FAILURE);
    }

    return 0;
}

Case 2 – Mapping Extends Beyond End of File

Now suppose the file is only 2200 bytes but you map 8192 bytes (2 pages). This is a much trickier case. The kernel creates the mapping, but different parts behave differently:

Call: mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
File on disk: only 2200 bytes.

Byte range | Region | Behavior
0 – 2199 | Page 0: fully in file | Accessible; mapped to file
2200 – 4095 | Page 0 remainder: zeroed | Accessible; NOT mapped to file; initialized to 0 (SUSv3 requirement)
4096 – 8191 | Page 1: SIGBUS zone | Accessing raises SIGBUS (no file pages here)
8192+ | Beyond mapping | Accessing raises SIGSEGV (outside mapping)

Breaking this down:

  • Bytes 0–2199: These exist in the file. They are accessible and mapped to file data normally.
  • Bytes 2200–4095: These are in the first page (page 0), but beyond EOF. The kernel fills them with zeros, as required by SUSv3. They are accessible, but writes to this region are never carried through to the file.
  • Bytes 4096–8191: This is the second page (page 1). There is no file backing here at all. Accessing these addresses raises SIGBUS, the kernel's way of saying “bus error, no file here”.
  • Bytes 8192+: Outside the mapping entirely. Accessing raises SIGSEGV.

SIGSEGV vs SIGBUS — Quick Reference
Signal | When Raised | Default Action | Meaning
SIGSEGV | Access outside any mapping, or wrong permission | Kill process + core dump | Segmentation fault: invalid address
SIGBUS | Access within a valid mapping, on a page that lies entirely beyond EOF | Kill process + core dump | Bus error: no file backing for this page

Code Example 2 – Creating a small file and mapping more than its size
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

int main(void)
{
    int fd;
    char *addr;
    long page_size;

    page_size = sysconf(_SC_PAGESIZE);  /* Typically 4096 */
    printf("System page size: %ld\n", page_size);

    /* Create a small file: only 2200 bytes */
    fd = open("small.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    /* Write 2200 bytes of 'A' into the file */
    char buf[2200];
    memset(buf, 'A', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        perror("write"); exit(EXIT_FAILURE);
    }

    /* Map 2 full pages (8192 bytes), even though file is only 2200 bytes */
    addr = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    close(fd);

    /* Safe: bytes 0–2199 are mapped to the file */
    printf("addr[0]    = '%c' (file data)\n", addr[0]);
    printf("addr[2199] = '%c' (file data)\n", addr[2199]);

    /* Safe: bytes 2200–4095 exist in the first page but are zeroed (not file) */
    printf("addr[2200] = %d  (zero-padded, NOT in file)\n", (int)(unsigned char)addr[2200]);
    printf("addr[4095] = %d  (zero-padded, NOT in file)\n", (int)(unsigned char)addr[4095]);

    /* DANGER: addr[4096] is on the SECOND page which has no file backing.
     * Accessing it would raise SIGBUS.
     * Uncomment the next line to see SIGBUS in action (process will crash):
     */
    /* printf("addr[4096] = %d\n", (int)(unsigned char)addr[4096]); */

    printf("If you access addr[4096], you get SIGBUS.\n");
    printf("If you access addr[%ld], you get SIGSEGV.\n", 2 * page_size);

    munmap(addr, 2 * page_size);
    return 0;
}

Extending the File to Make SIGBUS Go Away

A common pattern is to create an empty (or small) file, map a large region, and then grow the file using ftruncate() or write(). Once the file is extended, the previously SIGBUS-producing pages become safe to use. This is how file-backed shared memory is often implemented.

Code Example 3 – Using ftruncate() to extend file and avoid SIGBUS
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

#define MAP_SIZE   8192  /* 2 pages */
#define INIT_SIZE  2200  /* Small initial file */

int main(void)
{
    int fd;
    char *addr;

    /* Step 1: Create a small file */
    fd = open("grow.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    /* Write INIT_SIZE zero bytes so the file has some initial content */
    char buf[INIT_SIZE];
    memset(buf, 0, sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        perror("write"); exit(EXIT_FAILURE);
    }

    /* Step 2: Map MAP_SIZE (8192) bytes – more than the file */
    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    printf("File size before ftruncate: %d bytes\n", INIT_SIZE);
    printf("addr[4096] would raise SIGBUS right now.\n");

    /* Step 3: Extend the file to MAP_SIZE using ftruncate()
     * Now the file is 8192 bytes. Pages 0 and 1 both have file backing.
     * addr[4096] is now SAFE to access.
     */
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }
    printf("File extended to %d bytes via ftruncate()\n", MAP_SIZE);

    /* Step 4: Now addr[4096] is safe */
    addr[4096] = 'X';
    printf("addr[4096] = '%c'  (now safe after ftruncate)\n", addr[4096]);

    close(fd);
    munmap(addr, MAP_SIZE);
    return 0;
}

Key Rules for mmap() Boundary Cases
  • The kernel rounds up the mapping length to the next page boundary.
  • Bytes beyond EOF but within the page that contains EOF are accessible and zeroed (SUSv3).
  • Pages completely beyond EOF in the mapping → SIGBUS on access.
  • Addresses beyond the entire mapping → SIGSEGV on access.
  • You can cure SIGBUS by extending the file with ftruncate() or write().
  • Always check the page size at runtime using sysconf(_SC_PAGESIZE).

Code Example 4 – Always detect page size at runtime
#include <stdio.h>
#include <unistd.h>

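/* Round size up to the next multiple of page_size.
 * The bit mask only works when page_size is a power of two,
 * which is the case for the page sizes of mainstream systems. */
static size_t round_up_to_page(size_t size, long page_size)
{
    size_t ps = (size_t)page_size;
    return (size + ps - 1) & ~(ps - 1);
}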

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    printf("Page size: %ld bytes\n", page_size);

    /* Example: file is 5000 bytes, you want to align mapping to page boundary */
    size_t file_size = 5000;
    size_t aligned   = round_up_to_page(file_size, page_size);
    printf("5000 bytes rounded up to page boundary: %zu bytes\n", aligned);
    /* With 4096-byte pages this prints 8192 (the next page boundary) */

    size_t file_size2 = 8192;
    size_t aligned2   = round_up_to_page(file_size2, page_size);
    printf("8192 bytes rounded up to page boundary: %zu bytes\n", aligned2);
    /* With 4096-byte pages this prints 8192 (already aligned) */

    return 0;
}

Interview Questions – mmap() Boundary Cases
Q1. What signal is raised when you access memory beyond the end of an mmap() mapping?
SIGSEGV. Accessing an address outside the mapped region (i.e., beyond the rounded-up mapping size) causes a segmentation fault, provided no other mapping happens to occupy that address. SIGBUS is different: it is raised when the address is within the mapping but there is no file backing for that page (e.g., the page is beyond EOF).
Q2. What is the difference between SIGSEGV and SIGBUS in the context of mmap()?
SIGSEGV: you accessed an address outside any valid mapped region (or violated read/write/exec permissions).
SIGBUS: you accessed an address inside a valid mapping, but the corresponding page of the underlying file does not exist (the file is too short). The kernel has no file data with which to back that address.
Q3. If a file is 2200 bytes and you map 8192 bytes, what happens when you access byte offset 2500?
Byte 2500 is within the first page (0–4095). Since the file only provides data up to byte 2199, bytes 2200–4095 are zero-filled by the kernel (required by SUSv3), so reading byte 2500 returns 0. Writing to byte 2500 is allowed in memory, but the change is NOT written back to the file (no corresponding file byte exists). With MAP_SHARED, these bytes are still shared with other processes that map the file with a large enough length, so those processes see the same contents.
Q4. Why does the kernel round up the mmap() length to a page boundary?
The MMU (Memory Management Unit) works in page-sized chunks. There is no hardware mechanism to make part of a page accessible and part inaccessible. The kernel therefore allocates whole pages. The “extra” bytes in the last page are zeroed and not backed by the file if the file ends earlier.
Q5. You mapped 8192 bytes of a 2200-byte file. How do you fix the SIGBUS that occurs on the second page?
Extend the underlying file so that it reaches into the second page (at least 4097 bytes), for example with ftruncate(fd, 8192) or by appending data with write(). Once the file provides backing for that page, the SIGBUS goes away. The existing mapping reflects the new file content without remapping.
Q6. What does mmap() with offset=4000 require, and why?
The offset argument must be a multiple of the system page size (4096 on most systems). If you pass offset=4000, mmap() fails: it returns MAP_FAILED and sets errno to EINVAL. The restriction exists because the kernel maps at page granularity, so a mapping can only start at a page boundary. If you want to map starting from byte 4000, you typically map from offset 0 (or the nearest lower page boundary) and then add the remainder to the returned pointer.
Q7. Can you use mmap() to create a file larger than its current size?
Not directly: mmap() alone does not grow the file. However, you can map a large region (beyond EOF), then call ftruncate(fd, new_size) to extend the file. The previously SIGBUS-raising pages in the mapping become valid once the file is extended. This pattern is used to build memory-mapped databases and shared-memory file regions efficiently.
