When you call mmap(), the kernel maps your file into the process address space in units of pages (usually 4096 bytes on x86). But your file size and your requested mapping length are rarely exact multiples of the page size. What happens in the gaps? This topic explains two important cases:
- Mapping length is not a multiple of page size, but mapping stays within the file.
- Mapping extends beyond the end of the file (EOF).
Getting these wrong causes SIGSEGV or SIGBUS crashes at runtime.
Before diving into boundary cases, here is the mmap() signature as a reminder:
#include <sys/mman.h>
void *mmap(void *addr, /* Hint for kernel where to place mapping (usually NULL) */
size_t length, /* How many bytes to map */
int prot, /* PROT_READ | PROT_WRITE | PROT_EXEC | PROT_NONE */
int flags, /* MAP_SHARED or MAP_PRIVATE (+ others) */
int fd, /* File descriptor of the file to map */
off_t offset); /* Offset inside the file to start mapping from */
/* Returns: start address of mapping on success, MAP_FAILED on error */
The offset must be a multiple of the page size. The length does not have to be, but the kernel rounds it up internally to the next page boundary.
Suppose the file is 9500 bytes and you map 6000 bytes from offset 0. The system page size is 4096 bytes. Here is what the kernel does:
Call: mmap(NULL, 6000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0) on a 9500-byte file
| Byte range | Status | Notes |
|---|---|---|
| 0 – 5999 | Accessible | Requested range, mapped to the file |
| 6000 – 8191 | Accessible | Remainder of the page (kernel rounded up to the page boundary), mapped to file bytes 6000–8191 |
| 8192 – 9499 | Unmapped | File data exists here, but it is outside the mapping; SIGSEGV if accessed |
| 9500+ | Unmapped | Beyond the file and beyond the mapping |
Key rules for Case 1:
- You asked for 6000 bytes. The kernel rounds up to the next page: 8192 bytes (2 full pages).
- Bytes 0–5999 are accessible and map to the file directly.
- Bytes 6000–8191 are the “remainder of the page”. They are accessible but they also map to the actual file bytes 6000–8191 (since the file is bigger).
- Bytes 8192 and beyond are not mapped. Accessing them raises SIGSEGV.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
int main(void)
{
int fd;
void *addr;
size_t map_len = 6000; /* Not a multiple of page size (4096) */
struct stat sb;
/* Open the file for reading and writing */
fd = open("testfile.dat", O_RDWR);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
/* Get file size for information */
if (fstat(fd, &sb) == -1) {
perror("fstat");
exit(EXIT_FAILURE);
}
printf("File size: %ld bytes\n", (long)sb.st_size);
/* Map 6000 bytes from offset 0.
* Kernel internally rounds up to 8192 bytes (2 pages).
* File is larger (9500 bytes), so bytes 6000–8191 also map to file.
*/
addr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
/* fd is no longer needed after mmap() succeeds */
close(fd);
/* Access bytes within the requested range – safe */
printf("First byte: %d\n", ((unsigned char *)addr)[0]);
printf("Last byte of requested range: %d\n", ((unsigned char *)addr)[5999]);
/* NOTE: accessing addr[8192] or beyond would raise SIGSEGV
* because that address is outside the mapped region.
*/
/* Unmap when done */
if (munmap(addr, map_len) == -1) {
perror("munmap");
exit(EXIT_FAILURE);
}
return 0;
}
Now suppose the file is only 2200 bytes but you map 8192 bytes (2 pages). This is a much trickier case. The kernel creates the mapping, but different parts behave differently:
Call: mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0) on a file of only 2200 bytes
| Byte range | Region | Behavior |
|---|---|---|
| 0 – 2199 | Page 0, within the file | Accessible, mapped to the file |
| 2200 – 4095 | Page 0, remainder past EOF | Accessible, NOT mapped to the file, initialized to 0 (SUSv3 requirement) |
| 4096 – 8191 | Page 1 (SIGBUS zone) | Accessing raises SIGBUS (no file pages here) |
| 8192+ | Beyond the mapping | Accessing raises SIGSEGV (outside mapping) |
Breaking this down:
- Bytes 0–2199: These exist in the file. They are accessible and mapped to file data normally.
- Bytes 2200–4095: These are in the first page (page 0), but beyond EOF. The kernel fills them with zeros as required by SUSv3. They are accessible. However, writes to this region are not written back to the file.
- Bytes 4096–8191: This is the second page (page 1). There is no file backing here at all. Accessing these addresses raises SIGBUS — the kernel signals “bus error, no file here”.
- Bytes 8192+: Outside the mapping entirely. Accessing raises SIGSEGV.
| Signal | When Raised | Default Action | Meaning |
|---|---|---|---|
| SIGSEGV | Access outside any mapping, or wrong permission | Kill process + core dump | Segmentation fault — invalid address |
| SIGBUS | Access within a valid mapping but beyond EOF (on a full page) | Kill process + core dump | Bus error — no file backing for this page |
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
int main(void)
{
int fd;
char *addr;
long page_size;
page_size = sysconf(_SC_PAGESIZE); /* Typically 4096 */
printf("System page size: %ld\n", page_size);
/* Create a small file: only 2200 bytes */
fd = open("small.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
/* Write 2200 bytes of 'A' into the file */
char buf[2200];
memset(buf, 'A', sizeof(buf));
if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
perror("write"); exit(EXIT_FAILURE);
}
/* Map 2 full pages (8192 bytes), even though file is only 2200 bytes */
addr = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
close(fd);
/* Safe: bytes 0–2199 are mapped to the file */
printf("addr[0] = '%c' (file data)\n", addr[0]);
printf("addr[2199] = '%c' (file data)\n", addr[2199]);
/* Safe: bytes 2200–4095 exist in the first page but are zeroed (not file) */
printf("addr[2200] = %d (zero-padded, NOT in file)\n", (int)(unsigned char)addr[2200]);
printf("addr[4095] = %d (zero-padded, NOT in file)\n", (int)(unsigned char)addr[4095]);
/* DANGER: addr[4096] is on the SECOND page which has no file backing.
* Accessing it would raise SIGBUS.
* Uncomment the next line to see SIGBUS in action (process will crash):
*/
/* printf("addr[4096] = %d\n", (int)(unsigned char)addr[4096]); */
printf("If you access addr[4096], you get SIGBUS.\n");
printf("If you access addr[%ld], you get SIGSEGV.\n", 2 * page_size);
munmap(addr, 2 * page_size);
return 0;
}
A common pattern is to create an empty (or small) file, map a large region, and then grow the file using ftruncate() or write(). Once the file is extended, the previously SIGBUS-producing pages become safe to use. This is how file-backed shared memory is often implemented.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#define MAP_SIZE 8192 /* 2 pages */
#define INIT_SIZE 2200 /* Small initial file */
int main(void)
{
int fd;
char *addr;
/* Step 1: Create a small file */
fd = open("grow.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
/* Write INIT_SIZE bytes so the file has some initial content */
char buf[INIT_SIZE];
memset(buf, 0, sizeof(buf));
if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
perror("write"); exit(EXIT_FAILURE);
}
/* Step 2: Map MAP_SIZE (8192) bytes – more than the file */
addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
printf("File size before ftruncate: %d bytes\n", INIT_SIZE);
printf("addr[4096] would raise SIGBUS right now.\n");
/* Step 3: Extend the file to MAP_SIZE using ftruncate()
* Now the file is 8192 bytes. Pages 0 and 1 both have file backing.
* addr[4096] is now SAFE to access.
*/
if (ftruncate(fd, MAP_SIZE) == -1) {
perror("ftruncate"); exit(EXIT_FAILURE);
}
printf("File extended to %d bytes via ftruncate()\n", MAP_SIZE);
/* Step 4: Now addr[4096] is safe */
addr[4096] = 'X';
printf("addr[4096] = '%c' (now safe after ftruncate)\n", addr[4096]);
close(fd);
munmap(addr, MAP_SIZE);
return 0;
}
- The kernel rounds up the mapping length to the next page boundary.
- Bytes beyond EOF but within the page that contains EOF are accessible and zeroed (SUSv3). Writes to them are not carried through to the file.
- Pages completely beyond EOF in the mapping → SIGBUS on access.
- Addresses beyond the entire mapping → SIGSEGV on access.
- You can cure SIGBUS by extending the file with ftruncate() or write().
- Always check the page size at runtime using sysconf(_SC_PAGESIZE).
#include <stdio.h>
#include <unistd.h>
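/* Round size up to the next multiple of page_size.
 * Assumes page_size is a positive power of two, which is what
 * sysconf(_SC_PAGESIZE) returns on common systems. */
static size_t round_up_to_page(size_t size, long page_size)
{
size_t ps = (size_t)page_size; /* keep the arithmetic in one unsigned type */
return (size + ps - 1) & ~(ps - 1);
}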
int main(void)
{
long page_size = sysconf(_SC_PAGESIZE);
printf("Page size: %ld bytes\n", page_size);
/* Example: file is 5000 bytes, you want to align mapping to page boundary */
size_t file_size = 5000;
size_t aligned = round_up_to_page(file_size, page_size);
printf("5000 bytes rounded up to page boundary: %zu bytes\n", aligned);
/* Output: 8192 bytes (next 4096-byte boundary) */
size_t file_size2 = 8192;
size_t aligned2 = round_up_to_page(file_size2, page_size);
printf("8192 bytes rounded up to page boundary: %zu bytes\n", aligned2);
/* Output: 8192 bytes (already aligned) */
return 0;
}
SIGBUS: you accessed an address inside a valid mapping, but the corresponding page of the underlying file does not exist (the file is too short). The kernel has no file page to back that address.
To fix it, extend the file with ftruncate(fd, 8192) or by writing data with write(). Once the file covers that page, the SIGBUS goes away. The already-mapped region reflects the new file content without remapping.
The offset argument must be a multiple of the system page size (4096 on most systems). If you pass offset=4000, mmap() fails: it returns MAP_FAILED and sets errno to EINVAL. The restriction exists because the kernel maps at page granularity, so a mapping can only start at a page boundary. If you want to map starting from byte 4000, you typically map from offset 0 (or the nearest lower page boundary) and then add the remainder to the returned pointer.
mmap() alone does not grow the file. However, you can map a large region (beyond EOF), then call ftruncate(fd, new_size) to extend the file. The previously SIGBUS-raising pages in the mapping become valid once the file is extended. This pattern is used to build memory-mapped databases and shared-memory file regions efficiently.