Shared File Mappings & Memory-Mapped I/O
Chapter 49 · TLPI · Linux Memory Mappings Series
Section 49.4.2
MAP_SHARED
mmap I/O vs read/write
Interview Ready

A shared file mapping maps a file region into a process’s virtual memory using MAP_SHARED. Unlike private mappings, writes to a shared mapping are carried through to the underlying file, and are also immediately visible to any other process that has mapped the same file region. Shared file mappings are the foundation for memory-mapped I/O and fast IPC using files.

Key Terms
MAP_SHARED, Memory-mapped I/O, msync(), Page Cache, Dirty Pages, IPC via mmap, Paging Store, TLB, Page Fault, sync_file_range(), Zero-copy I/O, Buffer Cache

1. What Is a Shared File Mapping?

When you call mmap() with MAP_SHARED and a file descriptor, the kernel maps the file region directly into your virtual address space. The file’s content in the kernel’s page cache becomes accessible as plain memory.

The key behaviors of MAP_SHARED:

  • Multiple processes mapping the same file region share the same physical pages in the kernel’s page cache, so the data is not duplicated in RAM.
  • Writes are visible to all: if Process A writes to the mapped region, Process B (which has the same file region mapped) immediately sees the change.
  • Writes reach the file: the kernel’s virtual memory manager eventually writes dirty pages back to the file on disk (write-back). You can force immediate write-back with msync().
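The behaviors listed above can be tried out with a small sketch. It stores a string through a MAP_SHARED mapping, forces write-back with msync(), and then reads the same bytes back with an ordinary pread() on the file descriptor. The helper name write_via_mapping() and its parameters are made up for this illustration; they are not part of TLPI or of any library.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store `msg` at the start of `path` through a
   MAP_SHARED mapping, then read it back with pread() into `out`.
   Returns 0 on success, -1 on any failure. */
int write_via_mapping(const char *path, const char *msg,
                      char *out, size_t outsz)
{
    size_t len = strlen(msg) + 1;          /* include the terminating NUL */
    if (len > outsz)
        return -1;

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* The file must be at least as long as the bytes we touch. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(addr, msg, len);                /* a plain memory write */

    int rc = -1;
    if (msync(addr, len, MS_SYNC) == 0 &&  /* force write-back now */
        pread(fd, out, len, 0) == (ssize_t)len)
        rc = 0;

    munmap(addr, len);
    close(fd);
    return rc;
}
```

Calling write_via_mapping("/tmp/demo.dat", "hello", buf, sizeof buf) should leave "hello" in buf, even though the program never calls write() on the file.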

[Figure: Two Processes Sharing the Same File Region via MAP_SHARED. Process A and Process B each translate their own virtual addresses through their own page-table entries to the same physical pages in the kernel page cache. Those mapped pages correspond to the mapped region of the file on disk, and the kernel manages the I/O between the two.]
Key insight: The file acts as the paging store for the mapped region. The kernel uses the file pages directly as backing store, exactly like swap, but backed by the file.

2. Creating a Shared File Mapping

#include <sys/mman.h>
#include <fcntl.h>

/* A writable shared file mapping requires the file to be opened with O_RDWR */
int fd = open("datafile.bin", O_RDWR);

/* Map the entire file, shared, readable and writable */
void *addr = mmap(NULL,           /* kernel picks address */
                  file_size,      /* bytes to map         */
                  PROT_READ | PROT_WRITE,
                  MAP_SHARED,     /* <-- KEY: shared mapping */
                  fd,
                  0);             /* from beginning of file */

/* Now addr points to the file's content in memory.
   Read it like memory. Write to it, and the changes go to the file. */
Important rule: For MAP_SHARED with PROT_WRITE, the file must be opened with O_RDWR. Opening with O_RDONLY and using MAP_SHARED|PROT_WRITE will fail with EACCES.
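The rule can be observed directly. The sketch below tries to create a writable MAP_SHARED mapping on a descriptor opened with whatever flags the caller passes and reports the errno value that mmap() sets. The helper name try_shared_writable() is hypothetical, and mapping a single page is just a convenient choice for the demonstration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to create a writable MAP_SHARED mapping of
   `path` using a descriptor opened with `open_flags`.
   Returns 0 if mmap() succeeded, otherwise the errno value it set
   (or the errno from open() if the file could not be opened). */
int try_shared_writable(const char *path, int open_flags)
{
    int fd = open(path, open_flags);
    if (fd == -1)
        return errno;

    long page = sysconf(_SC_PAGESIZE);
    void *addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    int err = (addr == MAP_FAILED) ? errno : 0;

    if (addr != MAP_FAILED)
        munmap(addr, (size_t)page);
    close(fd);
    return err;
}
```

For an existing regular file, try_shared_writable(path, O_RDONLY) is expected to return EACCES, while try_shared_writable(path, O_RDWR) returns 0.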

3. Memory-Mapped I/O: How It Works

Traditional file I/O using read() and write() involves two data transfers every time:

  1. Between the file on disk and the kernel’s buffer cache.
  2. Between the kernel’s buffer cache and a user-space buffer (your char buf[]).

With mmap(), transfer 2 is eliminated. The kernel maps the file’s pages from the buffer cache directly into your virtual address space. You read and write through those pages with no extra copy.

[Figure: Data Path: read()/write() vs mmap(). Traditional read()/write(): Disk ↔ (Transfer 1, disk I/O) ↔ Kernel Buffer Cache ↔ (Transfer 2, extra copy) ↔ user-space buffer (char buf[4096]); 2 data copies, 2 memory buffers. mmap() with MAP_SHARED: Disk ↔ (Transfer 1, disk I/O) ↔ Kernel Buffer Cache ↔ (page table mapping) ↔ user virtual address, which refers to the same physical pages; 1 data copy, 1 shared buffer.]

The performance advantages of memory-mapped I/O:

  • Eliminates the user-space copy: no extra memcpy between kernel and user buffers.
  • Reduces memory usage: one buffer is shared between kernel space and user space. If multiple processes map the same file, they all share the same kernel buffer.
  • Simpler code: no need to manage read/write positions, buffer sizes, or loop logic. Just access memory.
When mmap() is NOT faster: For sequential access of a file, mmap() offers little or no benefit over read()/write(). The entire file must transfer from disk to memory either way โ€” the overhead of mapping, page faults, and TLB management can make mmap() slower for simple sequential reads of small files. The win comes with large files + random access.
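To make the two data paths concrete, here is a minimal sketch that computes the same byte sum twice: once with a read() loop into a user-space buffer, and once by walking a mapping like an array. The function names sum_with_read() and sum_with_mmap() are invented for this comparison, and the sketch says nothing about which one is faster on a given file.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of the file using read() into a user-space buffer.
   Returns -1 on error. */
long sum_with_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* kernel -> buf copy */
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];

    close(fd);
    return (n == -1) ? -1 : sum;
}

/* Sum every byte by mapping the file and walking it like an array.
   Returns -1 on error. */
long sum_with_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++)  /* pages fault in on first touch */
        sum += p[i];

    munmap((void *)p, len);
    return sum;
}
```

The mmap() version has no buffer size or loop-over-chunks logic, which is the "simpler code" point above. Its cost is hidden in the page faults taken inside the summing loop.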

4. msync(): Flushing Changes to Disk

When you write to a MAP_SHARED mapping, the kernel marks those pages as dirty and will eventually write them back to the file. But “eventually” might be seconds or minutes later. If you need the data on disk immediately (e.g., before another process reads the file, or before a crash), call msync().

#include <sys/mman.h>

int msync(void *addr,    /* Start of mapped region (must be page-aligned) */
          size_t length, /* Number of bytes to sync                        */
          int flags);    /* MS_SYNC or MS_ASYNC or MS_INVALIDATE           */

/* Returns 0 on success, -1 on error */
Flag | Behavior | When to Use
MS_SYNC | Blocks until all dirty pages are written to disk. Returns only after the write completes. | Critical data that must be on disk before continuing
MS_ASYNC | Schedules write-back but returns immediately. The write happens in the background. | Performance-sensitive paths, checkpoint data
MS_INVALIDATE | Invalidates other mappings of the same file so they re-read from disk. | After writing, to ensure all processes see fresh data
/* Sync a specific range of a shared mapping to disk */
char *addr = mmap(NULL, file_size, PROT_READ|PROT_WRITE,
                  MAP_SHARED, fd, 0);

/* ... modify addr[0..511] ... */

/* Force just the first 512 bytes to disk synchronously */
/* addr must be page-aligned; round down if needed      */
if (msync(addr, 512, MS_SYNC) == -1) {
    perror("msync");
}
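The "round down if needed" remark in the comment above matters as soon as the bytes you want to flush do not start at the beginning of the mapping. The following sketch shows one way to do the rounding. Both msync_range() and flush_middle_byte() are hypothetical helpers written for this page; the page size is taken from sysconf(_SC_PAGESIZE).

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: synchronously flush the bytes [p, p + len) of a
   MAP_SHARED mapping, where p may point anywhere inside the mapping.
   msync() needs a page-aligned start, so the start is rounded down and
   the length grown by the same amount. Returns msync()'s result. */
int msync_range(void *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    uintptr_t start   = (uintptr_t)p;
    uintptr_t aligned = start - (start % (uintptr_t)page);
    size_t    extra   = (size_t)(start - aligned);

    return msync((void *)aligned, len + extra, MS_SYNC);
}

/* Hypothetical demo: create `path` with three pages, dirty one byte in
   the middle of the second page, and flush only that byte's page.
   Returns 0 on success, -1 on failure. */
int flush_middle_byte(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = 3 * (size_t)page;

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    char *target = addr + page + 100;      /* not page-aligned */
    *target = 'x';
    int rc = msync_range(target, 1);

    munmap(addr, len);
    return rc;
}
```

Passing the unaligned pointer straight to msync() would be rejected, which is why the helper hands it the start of the containing page instead.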

5. Complete Example: Shared File Mapping (t_mmap.c)

This is a simplified version of the classic t_mmap.c example from TLPI. It maps a file, reads a string from the start, optionally writes a new string, then syncs to disk.

/* t_mmap.c: Demonstrate shared file mapping with mmap() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define MEM_SIZE 10   /* We only map/use the first 10 bytes */

int main(int argc, char *argv[])
{
    char *addr;
    int   fd;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s file [new-string]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Open the file read-write (required for MAP_SHARED + PROT_WRITE) */
    fd = open(argv[1], O_RDWR);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Map the first MEM_SIZE bytes as a SHARED mapping */
    addr = mmap(NULL,
                MEM_SIZE,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,   /* <-- writes go back to the file */
                fd,
                0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    /* File descriptor is no longer needed after mmap() succeeds */
    if (close(fd) == -1) {
        perror("close");
        exit(EXIT_FAILURE);
    }

    /* Read: print whatever string is at the start of the mapping */
    printf("Current string=%.10s\n", addr);

    /* Write: if a new string was given, copy it into the mapped region */
    if (argc > 2) {
        if (strlen(argv[2]) >= MEM_SIZE) {
            fprintf(stderr, "String too long (max %d chars)\n", MEM_SIZE - 1);
            exit(EXIT_FAILURE);
        }

        strncpy(addr, argv[2], MEM_SIZE - 1);
        addr[MEM_SIZE - 1] = '\0';  /* ensure null termination */

        /* Sync the dirty page back to the file synchronously */
        /* Without msync(), data may not reach disk before process exits */
        if (msync(addr, MEM_SIZE, MS_SYNC) == -1) {
            perror("msync");
            exit(EXIT_FAILURE);
        }

        printf("Copied \"%s\" to shared memory\n", argv[2]);
    }

    /* Unmap the region */
    if (munmap(addr, MEM_SIZE) == -1) {
        perror("munmap");
        exit(EXIT_FAILURE);
    }

    return 0;
}
/* Build and run sequence:
   gcc -o t_mmap t_mmap.c

   Step 1: Create a 1024-byte file filled with zeros
   $ dd if=/dev/zero of=s.txt bs=1 count=1024
   1024+0 records in
   1024+0 records out

   Step 2: Write "hello" into the mapped region
   $ ./t_mmap s.txt hello
   Current string=
   Copied "hello" to shared memory

   (Current string was empty because file started with null bytes)

   Step 3: Read and overwrite with "goodbye"
   $ ./t_mmap s.txt goodbye
   Current string=hello
   Copied "goodbye" to shared memory

   Step 4: Verify the file on disk was actually modified
   $ od -c -w8 s.txt
   0000000   g   o   o   d   b   y   e  \0
   0000010  \0  \0  \0  \0  \0  \0  \0  \0
   *
   0002000

   The file on disk now contains "goodbye": MAP_SHARED wrote it back!
*/

6. IPC Using Shared File Mappings

Since all processes with a shared mapping of the same file see the same physical pages, writing to the mapping in one process is immediately visible in another process. This makes shared file mappings a method of fast Inter-Process Communication (IPC).

The key advantage over System V shared memory (Chapter 48): with a shared file mapping, the data persists โ€” it is stored in the file. Even after all processes exit and the machine reboots, the data is still in the file. System V shared memory, by contrast, is lost when the system restarts.
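A compact way to see this visibility across two processes is to let a child process create its own mapping of the file, store a value, and exit, while the parent reads through a mapping it created earlier. The sketch below does that with fork() and waitpid(). The names map_counter() and child_store_parent_load() are hypothetical, and a single int at offset 0 is an arbitrary layout chosen for the demonstration.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one int at the start of `path` as a shared, writable mapping.
   Returns MAP_FAILED on any error. */
static int *map_counter(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return MAP_FAILED;
    if (ftruncate(fd, sizeof(int)) == -1) {
        close(fd);
        return MAP_FAILED;
    }
    int *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
    close(fd);
    return p;
}

/* Hypothetical demo: a child process maps the file itself and stores
   `value`; the parent then reads it through its own mapping.
   Returns the value the parent observes, or -1 on failure. */
int child_store_parent_load(const char *path, int value)
{
    int *mine = map_counter(path);
    if (mine == MAP_FAILED)
        return -1;
    *mine = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(mine, sizeof(int));
        return -1;
    }
    if (pid == 0) {                      /* child: independent mapping */
        int *theirs = map_counter(path);
        if (theirs == MAP_FAILED)
            _exit(1);
        *theirs = value;                 /* no msync(), no write() */
        _exit(0);
    }

    int status;
    int seen = -1;
    /* The parent loads only after waitpid() reports the child's exit. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = *mine;

    munmap(mine, sizeof(int));
    return seen;
}
```

Here waitpid() is the only synchronization, which is enough because the child has finished before the parent looks. Processes that run concurrently need one of the techniques listed later in this section.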

[Figure: IPC via Shared File Mapping. The producer process writes (addr[0] = data; msync(addr, n, MS_SYNC);) to the shared file shared.dat, which lives on disk and in the page cache; the consumer process reads (data = addr[0];) and sees the latest value.]

Important: When multiple processes write to a shared mapping concurrently, you need to synchronize access to avoid race conditions. Common techniques:

  • POSIX semaphores (Chapter 53)
  • File locking with flock() / fcntl() (Chapter 55)
  • Mutexes placed inside the shared memory itself (with PTHREAD_PROCESS_SHARED attribute)
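As one illustration of the second technique in the list, the sketch below guards a read-modify-write on a mapped counter with an fcntl() advisory record lock on the same bytes of the file. The helpers lock_counter() and locked_increment() are hypothetical. Note one difference from the other examples on this page: the file descriptor has to stay open for as long as the lock is needed, so it is not closed right after mmap().

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Apply (F_WRLCK) or release (F_UNLCK) an advisory lock on the first
   sizeof(int) bytes of the file; F_SETLKW blocks until it is granted. */
static int lock_counter(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = sizeof(int);
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical demo: add 1 to the int at the start of `path`, `times`
   times, holding the lock around each read-modify-write.
   Returns the final counter value, or -1 on failure. */
int locked_increment(const char *path, int times)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, sizeof(int)) == -1) {
        close(fd);
        return -1;
    }
    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int result = -1;
    for (int i = 0; i < times; i++) {
        if (lock_counter(fd, F_WRLCK) == -1)
            goto out;
        (*counter)++;                    /* critical section */
        if (lock_counter(fd, F_UNLCK) == -1)
            goto out;
    }
    result = *counter;
out:
    munmap(counter, sizeof(int));
    close(fd);                           /* also drops any lock still held */
    return result;
}
```

These locks are advisory: they only protect the counter if every process that touches it takes the same lock. Because the file is opened without O_TRUNC, a second run continues from the value the first run left behind.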

7. Code Example: Writer and Reader via Shared File Mapping

/* shm_writer.c: Write a counter into a shared file mapping */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define SHM_FILE  "/tmp/shared_counter.dat"
#define SHM_SIZE  sizeof(int)

int main(void)
{
    int fd;
    int *counter;

    /* Create or open the shared file */
    fd = open(SHM_FILE, O_CREAT | O_RDWR, 0644);
    if (fd == -1) { perror("open"); exit(1); }

    /* Make the file exactly SHM_SIZE bytes
       (required before mapping: the file must be large enough) */
    if (ftruncate(fd, SHM_SIZE) == -1) { perror("ftruncate"); exit(1); }

    /* Map it shared and writable */
    counter = mmap(NULL, SHM_SIZE,
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) { perror("mmap"); exit(1); }
    close(fd);   /* fd no longer needed */

    /* Increment the counter 5 times */
    for (int i = 0; i < 5; i++) {
        (*counter)++;
        printf("Writer: counter = %d\n", *counter);
        msync(counter, SHM_SIZE, MS_SYNC);   /* flush to file */
        sleep(1);
    }

    munmap(counter, SHM_SIZE);
    return 0;
}
/* shm_reader.c: Read the counter from the same shared file */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define SHM_FILE  "/tmp/shared_counter.dat"
#define SHM_SIZE  sizeof(int)

int main(void)
{
    int fd;
    int *counter;

    fd = open(SHM_FILE, O_RDWR);
    if (fd == -1) { perror("open"); exit(1); }

    counter = mmap(NULL, SHM_SIZE,
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) { perror("mmap"); exit(1); }
    close(fd);

    /* Poll the counter every second */
    for (int i = 0; i < 5; i++) {
        printf("Reader: counter = %d\n", *counter);
        /* Because MAP_SHARED shares physical pages, no re-read needed.
           The writer's changes are instantly visible in this process. */
        sleep(1);
    }

    munmap(counter, SHM_SIZE);
    return 0;
}

/* Run in two terminals:
   Terminal 1: ./shm_writer
   Terminal 2: ./shm_reader
   Reader will see counter values set by writer in real time.
   After both exit, the counter value persists in /tmp/shared_counter.dat */

8. When to Use mmap() vs read()/write()

Scenario | Prefer mmap() | Prefer read()/write()
Access pattern | ✓ Random access to large files | ✗ Sequential read of entire file
File size | ✓ Large files (MBs to GBs) | ✗ Small files (<page size)
Multiple processes | ✓ Multiple processes reading same file (shared pages) | ✗ Single process, one-time read
Code complexity | ✓ Simplifies logic (no offset tracking) | ✗ Overhead of mapping, page faults, TLB
IPC / persistence | ✓ Shared file mapping for persistent IPC | ✗ Complex multiprocess write synchronization
Small I/Os | ✗ Overhead can exceed simple syscall | ✓ Simpler for short, one-shot reads

9. Code Example: Structured Data over Shared Mapping

A common pattern in memory-mapped I/O is to define a C struct that matches the layout of data in the file, then cast the mapping pointer to that struct type. No read()/fseek()/write() loops needed.

/* struct_mmap.c: Map a file as a structured record */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

/* Define a struct matching the file layout */
typedef struct {
    int   sensor_id;
    float temperature;
    float humidity;
    char  location[32];
} SensorRecord;

int main(void)
{
    const char *path = "/tmp/sensor.dat";
    int fd;
    SensorRecord *rec;

    /* Create the file with exactly sizeof(SensorRecord) bytes */
    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); exit(1); }

    /* Extend file to hold one record (ftruncate zeroes the space) */
    if (ftruncate(fd, sizeof(SensorRecord)) == -1) {
        perror("ftruncate"); exit(1);
    }

    /* Map the file, cast directly to struct pointer */
    rec = mmap(NULL, sizeof(SensorRecord),
               PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    if (rec == MAP_FAILED) { perror("mmap"); exit(1); }
    close(fd);

    /* Write fields: no fwrite(), no fseek(), just assign */
    rec->sensor_id   = 42;
    rec->temperature = 36.6f;
    rec->humidity    = 65.0f;
    strncpy(rec->location, "ServerRoom-A", sizeof(rec->location) - 1);

    /* Flush to file */
    msync(rec, sizeof(SensorRecord), MS_SYNC);
    printf("Written: sensor %d, temp=%.1f, hum=%.1f, loc=%s\n",
           rec->sensor_id, rec->temperature,
           rec->humidity,  rec->location);

    munmap(rec, sizeof(SensorRecord));

    /* Now read it back by opening and mapping again */
    fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open read"); exit(1); }
    rec = mmap(NULL, sizeof(SensorRecord),
               PROT_READ, MAP_SHARED, fd, 0);
    if (rec == MAP_FAILED) { perror("mmap read"); exit(1); }
    close(fd);

    printf("Read back: id=%d temp=%.1f hum=%.1f loc=%s\n",
           rec->sensor_id, rec->temperature,
           rec->humidity,  rec->location);

    munmap(rec, sizeof(SensorRecord));
    return 0;
}

10. Disadvantages of Memory-Mapped I/O

  • Overhead for small I/Os: Setting up and tearing down a mapping (mapping the region, handling page faults, flushing the TLB, calling munmap()) can cost more than a simple read()/write() for small data.
  • Write-back complexity: The kernel handles write-back automatically, but on its own schedule, which may not match what the application needs. Using msync() or the Linux-specific sync_file_range() gives you manual control.
  • Page-size constraints: The offset into the file must be page-aligned. If you need to map an arbitrary offset, you may need to round down and adjust the pointer.
  • File size must be set in advance: For writable mappings on new files, you must pre-allocate space with ftruncate() before calling mmap(). You cannot extend a file by writing past its end through a mapping.
  • Address space consumption: On 32-bit systems, large mappings can exhaust the virtual address space. Less of a concern on 64-bit systems.
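The page-alignment constraint in the list above is usually handled with a small wrapper. The sketch below rounds the requested file offset down to a page boundary, maps from there, and hands back a pointer adjusted to the byte the caller actually asked for. The struct view type and the map_view() function are hypothetical names introduced for this example.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Describes a mapping whose useful bytes start at `data`; `base` and
   `maplen` are what munmap() must later be given. */
struct view {
    void  *base;     /* page-aligned address returned by mmap()       */
    size_t maplen;   /* length passed to mmap()                       */
    char  *data;     /* base + slack: first byte the caller asked for */
};

/* Hypothetical helper: map `len` bytes of `fd` starting at an arbitrary
   byte `offset`. Returns 0 on success, -1 on failure. */
int map_view(int fd, off_t offset, size_t len, int prot, struct view *v)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    off_t  aligned = offset - (offset % page);   /* round down */
    size_t slack   = (size_t)(offset - aligned);

    void *base = mmap(NULL, len + slack, prot, MAP_SHARED, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    v->base   = base;
    v->maplen = len + slack;
    v->data   = (char *)base + slack;
    return 0;
}
```

The caller reads or writes through v.data and later calls munmap(v.base, v.maplen). Unmapping with v.data would be wrong, because that address is generally not page-aligned.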

11. Interview Questions & Answers

Q1: What is memory-mapped I/O and how is it different from read()/write()?
Answer: Memory-mapped I/O uses mmap(MAP_SHARED) to map a file’s content directly into a process’s virtual address space, so file I/O becomes simple memory access (no explicit read()/write() calls). The key difference: read()/write() require two data copies, first from disk to the kernel buffer cache, then from the kernel buffer to a user-space buffer. mmap() eliminates the second copy by mapping the kernel’s page cache pages directly into the process’s virtual address space, saving both memory and CPU time.
Q2: Why must you call ftruncate() before using mmap() to create/extend a file?
Answer: mmap() maps existing file pages into memory. If the file is too short, accessing the mapped region beyond the file’s end results in a SIGBUS signal (bus error). ftruncate() extends the file to the required size (zeroing the new space), ensuring that valid pages exist to back the entire mapped region.
Q3: What is the purpose of msync()? When should you call it?
Answer: msync() forces dirty (modified) pages of a MAP_SHARED mapping to be written back to the underlying file. Without it, the kernel may delay the write-back for performance. You should call msync() when: (a) data must be on disk before another process reads the file, (b) before signaling another process to consume the data, (c) as a checkpoint for crash-safety, or (d) before cleanly unmapping. Use MS_SYNC for blocking write-back, MS_ASYNC for non-blocking scheduled write-back.
Q4: How does a shared file mapping enable IPC between processes?
Answer: All processes that map the same file region with MAP_SHARED share the same physical pages in the kernel’s page cache. Writing to the mapped memory in one process is immediately reflected in another process’s view of the same region (no explicit send/receive needed). This gives shared-memory IPC speeds while also persisting data to a file. Unlike System V shared memory, the IPC state survives process restarts because the data lives in a real file.
Q5: What happens if the process writes to a MAP_SHARED mapping but exits without calling msync()?
Answer: The kernel will eventually write the dirty pages to the file through its normal write-back mechanism (typically within a few seconds or minutes, or when the page is evicted from cache). The data is not lost when the process exits: the kernel guarantees it will flush dirty pages to disk. However, if there is a system crash before the write-back occurs, the data can be lost. For crash safety, always call msync(MS_SYNC) before considering critical data committed to disk.
Q6: In what situations is mmap() slower than read()/write()?
Answer: For small files or small I/Os, the overhead of setting up the mapping (creating page table entries), handling page faults (each page must be faulted in on first access), and tearing down the mapping (TLB flush on munmap) can exceed the cost of a simple read()/write() system call. Also, for sequential access of files that are read only once, both methods transfer the same amount of data from disk, and the user-space/kernel-space copy saved by mmap is negligible compared to disk I/O time.
Q7: What is a SIGBUS error in the context of mmap()?
Answer: A SIGBUS (bus error) signal is sent to a process when it accesses a mapped region that has no corresponding file backing. The most common cause: the process maps a file of size N bytes, then the file is truncated to less than N bytes by another process, and the mapping process tries to access the pages that are now beyond the file’s end. Unlike SIGSEGV (which is accessing outside any mapped region), SIGBUS means the address is mapped but the underlying file page does not exist.
Q8: How is a shared file mapping different from System V shared memory?
Answer:

  • Persistence: Shared file mapping data lives in a regular file, so it persists across process restarts and system reboots. System V shared memory (shmget()) data is lost when the system reboots (though it persists between process restarts unless explicitly removed with ipcrm).
  • Backing: File mapping is backed by a real file path; System V uses an IPC key and kernel-managed object.
  • Creation: File mapping uses open() + mmap(); System V uses shmget() + shmat().
  • Visibility: Both share physical pages between processes with equal speed.
Q9: What is the TLB and why does it matter for mmap() performance?
Answer: The TLB (Translation Lookaside Buffer) is a CPU cache that stores recent virtual-to-physical address translations. When you call munmap(), the kernel must invalidate the TLB entries for the unmapped region on every CPU that may have cached them. On multi-core systems this requires a TLB shootdown, which is an expensive operation. This is part of the overhead that makes mmap() less efficient for small, frequently created-and-destroyed mappings compared to reusing a single buffer with read()/write().

Chapter 49: Memory Mappings Series

You have covered private and shared file mappings. Next: anonymous mappings, mmap() for IPC without files, and msync() deep dive.
