When you write to a MAP_SHARED mapping, the kernel eventually writes those changes to the underlying file on disk. But “eventually” can mean seconds or even minutes later — the kernel batches writes for efficiency. If the system crashes before flushing, your data is lost.
msync() gives you explicit control over when the flush happens. Databases, journaling systems, and any application requiring crash safety use it to ensure data is durable on disk before proceeding.
#include <sys/mman.h>

int msync(void *addr,     /* Start of the mapped region to sync */
          size_t length,  /* Number of bytes to sync */
          int flags);     /* MS_SYNC or MS_ASYNC (not both), optionally OR'd with MS_INVALIDATE */
/* Returns: 0 on success, -1 on error (errno set) */

The addr argument must be page-aligned (SUSv3). The length is internally rounded up to the next page boundary. The simplest way to satisfy the alignment rule is to pass the address returned by mmap(), or an address rounded down to a page boundary.

| Flag | Behaviour | When to Use | Blocks? |
|---|---|---|---|
| MS_SYNC | Waits until all modified pages have been physically written to the storage device. | Database commits, safety-critical writes, before shutdown. | Yes – blocks until done |
| MS_ASYNC | Schedules a write of modified pages to disk. Returns immediately. Pages become visible to other processes reading the file via read() right away. | Best-effort flushing without stalling the process. | No – returns immediately |
| MS_INVALIDATE | After flushing, marks cached pages as stale. On next access, pages are re-read from the file. Makes changes by other processes (that wrote via write()) visible in the mapping. | When another process has updated the file via write() and you need to see the new data. | No additional blocking |

| Flag | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| MS_SYNC | Your code calls msync() | Kernel writes dirty pages to disk | msync() returns (guaranteed on disk) | Your code continues |
| MS_ASYNC | Your code calls msync() | msync() returns immediately | Your code continues | Kernel writes later (pdflush/writeback) |
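
The flag and alignment rules can be probed directly. The following is a minimal sketch, assuming Linux behaviour as documented in the msync(2) manual page; `msync_errno()` is a hypothetical helper written for this illustration, and the `/tmp` scratch path is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): maps one page of a scratch
 * file, dirties it, and calls msync() with the given flags on an address
 * that lies `skew` bytes past the start of the mapping.
 * Returns 0 if msync() succeeded, otherwise the errno value it reported. */
int msync_errno(int flags, size_t skew)
{
    char path[] = "/tmp/msync_probe_XXXXXX";  /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    int fd, result;
    char *addr;

    assert(page > 0 && skew < (size_t)page);

    fd = mkstemp(path);
    if (fd == -1)
        return errno;
    unlink(path);                             /* file vanishes once closed */

    if (ftruncate(fd, page) == -1) {
        result = errno;
        close(fd);
        return result;
    }

    addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    result = errno;                           /* only meaningful if mmap failed */
    close(fd);                                /* the mapping outlives the fd */
    if (addr == MAP_FAILED)
        return result;

    addr[skew] = 'x';                         /* make the page dirty */
    result = (msync(addr + skew, (size_t)page - skew, flags) == -1) ? errno : 0;

    munmap(addr, (size_t)page);
    return result;
}
```

On Linux, `msync_errno(MS_SYNC, 0)` and `msync_errno(MS_ASYNC, 0)` should return 0, while `msync_errno(MS_SYNC | MS_ASYNC, 0)` and `msync_errno(MS_SYNC, 1)` should both return EINVAL: the first because the two flags are mutually exclusive, the second because the address is not page-aligned. Other systems may behave differently.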

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <unistd.h>

#define MAP_SIZE 4096

int main(void)
{
    int fd;
    char *addr;

    fd = open("journal.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

    /* Ensure file is at least MAP_SIZE bytes */
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);  /* The mapping stays valid after the descriptor is closed */

    /* Write data to the mapping (this modifies the kernel page cache) */
    strcpy(addr, "TRANSACTION: debit account=1234 amount=500");
    printf("Written to mapping: %s\n", addr);

    /* MS_SYNC: blocks until the kernel has written the pages to the
     * storage device.
     * Use this for database commits, journaling, critical checkpoints.
     */
    if (msync(addr, MAP_SIZE, MS_SYNC) == -1) {
        perror("msync MS_SYNC");
        exit(EXIT_FAILURE);
    }
    printf("MS_SYNC done: data is guaranteed on disk.\n");

    munmap(addr, MAP_SIZE);
    return 0;
}

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <unistd.h>

#define MAP_SIZE 4096

int main(void)
{
    int fd;
    char *addr;

    fd = open("log.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /* Write a log entry */
    snprintf(addr, MAP_SIZE, "LOG: process started pid=%d", (int)getpid());
    printf("Written: %s\n", addr);

    /* MS_ASYNC: returns immediately.
     * The kernel writes the pages back later from its background
     * writeback threads. The changes are visible to other processes
     * doing read() on the file right away, but may not yet be on
     * physical disk.
     */
    if (msync(addr, MAP_SIZE, MS_ASYNC) == -1) {
        perror("msync MS_ASYNC");
        exit(EXIT_FAILURE);
    }
    printf("MS_ASYNC: write scheduled, we did not block.\n");

    /* If you also want a hard disk flush later, follow up with fsync()
     * on a file descriptor pointing to the same file.
     */
    int fd2 = open("log.dat", O_RDONLY);
    if (fd2 == -1) { perror("open (for fsync)"); exit(EXIT_FAILURE); }
    if (fsync(fd2) == 0)
        printf("fsync() after MS_ASYNC: now on disk.\n");
    else
        perror("fsync");
    close(fd2);

    munmap(addr, MAP_SIZE);
    return 0;
}

/*
 * Scenario: Process A has a MAP_SHARED mapping of shared.dat.
 * Process B uses write() to update the same file.
 * Without MS_INVALIDATE, Process A might still see the old cached data.
 * MS_INVALIDATE tells the kernel: "discard my cached pages, re-read from file".
 *
 * On Linux with a unified VM system this is usually not needed because
 * mmap() and read()/write() share the same page cache. But for
 * portability to non-unified systems (older UNIX), you should use it.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

#define MAP_SIZE 256

/* Simulates Process A (reader via mmap) */
int main(void)
{
    int fd;
    char *addr;

    fd = open("shared.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    /* Initial content (12 characters plus the terminating NUL) */
    memcpy(addr, "INITIAL DATA", 13);
    printf("Before external write: '%s'\n", addr);

    /* Simulate another process updating the file using write().
     * In real code, this would be a separate process.
     */
    if (lseek(fd, 0, SEEK_SET) == -1 ||
        write(fd, "UPDATED BY WRITER", 18) != 18) {
        perror("write"); exit(EXIT_FAILURE);
    }

    /* Without MS_INVALIDATE, we might still see "INITIAL DATA"
     * in the mapping on a non-unified VM system.
     * MS_INVALIDATE discards stale cached pages and re-reads from file.
     */
    if (msync(addr, MAP_SIZE, MS_INVALIDATE) == -1) {
        perror("msync MS_INVALIDATE");
        exit(EXIT_FAILURE);
    }
    printf("After MS_INVALIDATE: '%s'\n", addr);
    /* Should now show "UPDATED BY WRITER" */

    close(fd);
    munmap(addr, MAP_SIZE);
    return 0;
}

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

#define MAP_SIZE 4096

/*
 * Full bidirectional sync pattern:
 * 1. Flush our dirty pages to disk (MS_SYNC).
 * 2. Invalidate our cached pages so we pick up changes made
 *    by other writers (MS_INVALIDATE).
 * Both flags can be OR'd together in one call.
 */
int main(void)
{
    int fd;
    char *addr;

    fd = open("db_page.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /* Write a database record into the page */
    snprintf(addr, MAP_SIZE,
             "DB_RECORD: id=42 value=99 checksum=0xDEAD");

    /* Sync: flush our writes AND invalidate stale cache in one call.
     * After this:
     * - Our data is on disk (MS_SYNC).
     * - Any external updates to the file are visible to us (MS_INVALIDATE).
     */
    if (msync(addr, MAP_SIZE, MS_SYNC | MS_INVALIDATE) == -1) {
        perror("msync");
        exit(EXIT_FAILURE);
    }
    printf("Record synced and cache refreshed: %s\n", addr);

    munmap(addr, MAP_SIZE);
    return 0;
}

On Linux, mmap() and read()/write() share the same page cache. There is only one copy of a file’s data in memory, whether you accessed it via a mapping or via system calls. This is called a unified virtual memory (VM) system.

Linux Unified VM: one page cache, two access paths

| Process A | Kernel Page Cache | Process B |
|---|---|---|
| Uses mmap() (direct page access) | Single copy of file data; mmap and read/write see the same pages | Uses read()/write() (via system call) |

On Linux: consistent views guaranteed. msync() is only needed to push data from the kernel cache to disk.

The practical consequence for Linux programmers:
- You do not need MS_INVALIDATE to see writes made by other processes via write(): Linux already keeps the mapping and page cache in sync.
- You do not need msync() at all for visibility between processes, only for disk durability.
- However, for portability to non-Linux UNIX systems (which may not have a unified VM), you should still call msync() and use MS_INVALIDATE appropriately.
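
The single shared page cache can be observed from inside one process. The sketch below assumes Linux and a writable `/tmp`; `mapping_and_io_agree()` is a hypothetical name chosen for this illustration. It stores through a MAP_SHARED mapping and reads back with pread(), then writes with pwrite() and loads through the mapping, never calling msync().

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a MAP_SHARED mapping and pread()/pwrite()
 * on the same file see each other's data without any msync() call,
 * 0 if they do not, and -1 if the setup fails. */
int mapping_and_io_agree(void)
{
    char path[] = "/tmp/unified_vm_XXXXXX";  /* assumed scratch location */
    char buf[16] = {0};
    const size_t len = 4096;
    int fd, agree = 1;
    char *map;

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                            /* file vanishes once closed */

    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Direction 1: store through the mapping, read back with pread(). */
    memcpy(map, "via-mmap", 9);
    if (pread(fd, buf, 9, 0) != 9 || strcmp(buf, "via-mmap") != 0)
        agree = 0;

    /* Direction 2: pwrite() to the file, load through the mapping. */
    if (pwrite(fd, "via-write", 10, 0) != 10 || strcmp(map, "via-write") != 0)
        agree = 0;

    munmap(map, len);
    close(fd);
    return agree;
}
```

On Linux this is expected to return 1. On a system without a unified VM it could return 0, which is the case MS_INVALIDATE exists for.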

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

#define MAP_SIZE 4096

int main(void)
{
    int fd;
    char *addr;

    fd = open("sync_test.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

    strcpy(addr, "Test data for sync comparison");

    /* Method 1: msync(MS_SYNC)
     * Synchronous flush. Blocks until disk write is complete.
     * Works directly on the memory address.
     */
    if (msync(addr, MAP_SIZE, MS_SYNC) == 0)
        printf("Method 1: msync(MS_SYNC) – blocking flush – done.\n");

    /* Method 2: msync(MS_ASYNC) followed by fsync()
     * First schedules the write (non-blocking),
     * then fsync() blocks until the file descriptor's data hits disk.
     * (Relying on fsync() to flush mapped pages is Linux behaviour,
     * not something the standard guarantees.)
     */
    msync(addr, MAP_SIZE, MS_ASYNC);
    if (fsync(fd) == 0)
        printf("Method 2: msync(MS_ASYNC) + fsync() – done.\n");

    /* Method 3: msync(MS_ASYNC) followed by fdatasync()
     * fdatasync() is like fsync() but skips metadata that is not needed
     * to read the data back (timestamps, etc.).
     * Faster for pure data durability.
     */
    msync(addr, MAP_SIZE, MS_ASYNC);
    if (fdatasync(fd) == 0)
        printf("Method 3: msync(MS_ASYNC) + fdatasync() – done (faster).\n");

    close(fd);
    munmap(addr, MAP_SIZE);
    return 0;
}

/*
 * Producer process: writes records to a shared file via mmap().
 * Calls msync(MS_SYNC) after each record to ensure durability.
 * Consumer process (not shown) maps the same file and reads records.
 *
 * This pattern is used in file-backed IPC, embedded databases,
 * and any shared-memory-like design that survives process restart.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

#define MAX_RECORDS 10
#define RECORD_SIZE 64
#define MAP_SIZE (MAX_RECORDS * RECORD_SIZE)

/* Simple record structure (must not be larger than RECORD_SIZE bytes) */
typedef struct {
    int id;
    char message[56];
} Record;

int main(void)
{
    int fd;
    Record *records;
    int i;
    long page_size = sysconf(_SC_PAGESIZE);

    fd = open("ipc_file.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, MAP_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    records = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (records == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /* Write records one by one, syncing each to disk */
    for (i = 0; i < MAX_RECORDS; i++) {
        records[i].id = i + 1;
        snprintf(records[i].message, sizeof(records[i].message),
                 "Record %d from pid %d", i + 1, (int)getpid());

        /* msync() needs a page-aligned address, and &records[i] is aligned
         * only for i == 0. Round down to the start of the containing page
         * and extend the length so it still covers the record.
         */
        uintptr_t rec = (uintptr_t)&records[i];
        uintptr_t start = rec & ~((uintptr_t)page_size - 1);
        if (msync((void *)start, (rec - start) + sizeof(Record), MS_SYNC) == -1) {
            perror("msync");
            exit(EXIT_FAILURE);
        }
        printf("Synced record %d: '%s'\n", records[i].id, records[i].message);
        usleep(100000); /* 100ms delay to simulate real work */
    }

    printf("All records written and synced.\n");
    printf("Consumer can now read ipc_file.dat via mmap().\n");

    munmap(records, MAP_SIZE);
    return 0;
}
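
The producer's header comment mentions a consumer that is not shown. A minimal sketch of one follows, under stated assumptions: `read_records()` is a hypothetical helper, it repeats the producer's Record layout, and it treats a slot with `id == 0` as unwritten (ftruncate() fills a new file with zero bytes). It does no locking, so it could observe a record while the producer is still writing it; a real design would add a commit marker or other synchronization.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same layout as the producer's Record. */
typedef struct {
    int id;
    char message[56];
} Record;

/* Hypothetical consumer helper: maps `path` read-only and copies up to `max`
 * records whose id is non-zero into `out`.
 * Returns the number of records copied, or -1 on error. */
int read_records(const char *path, Record *out, int max)
{
    struct stat st;
    const Record *records;
    size_t slots, i;
    int fd, count = 0;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size < (off_t)sizeof(Record)) {
        close(fd);
        return -1;
    }

    records = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping outlives the fd */
    if (records == MAP_FAILED)
        return -1;

    slots = (size_t)st.st_size / sizeof(Record);
    for (i = 0; i < slots && count < max; i++) {
        if (records[i].id != 0)              /* skip slots never written */
            out[count++] = records[i];
    }

    munmap((void *)records, (size_t)st.st_size);
    return count;
}
```

A caller would pass the producer's file name, for example `read_records("ipc_file.dat", buf, MAX_RECORDS)`, and then walk the returned count of entries in `buf`.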

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

/*
 * msync() does not have to cover the entire mapping.
 * You can sync just the pages that were modified.
 * This is important for performance on large mappings (e.g., 1GB database file).
 */
#define TOTAL_SIZE (1024 * 1024) /* 1 MB mapping */

int main(void)
{
    int fd;
    char *base;
    long page_size;

    page_size = sysconf(_SC_PAGESIZE);
    if (page_size == -1) { perror("sysconf"); exit(EXIT_FAILURE); }

    fd = open("large_file.dat", O_RDWR | O_CREAT, 0600);
    if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
    if (ftruncate(fd, TOTAL_SIZE) == -1) {
        perror("ftruncate"); exit(EXIT_FAILURE);
    }

    base = mmap(NULL, TOTAL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }
    close(fd);

    /* Modify only one page at offset 512KB */
    char *dirty_page = base + 512 * 1024;
    memcpy(dirty_page, "Modified page at 512KB", 23);

    /* Sync ONLY the dirty page, not the entire 1MB.
     * addr must be page-aligned, so round down to page boundary.
     */
    void *aligned_addr =
        (void *)((uintptr_t)dirty_page & ~((uintptr_t)page_size - 1));
    if (msync(aligned_addr, (size_t)page_size, MS_SYNC) == -1) {
        perror("partial msync");
        exit(EXIT_FAILURE);
    }
    printf("Synced only 1 page (%ld bytes) instead of the full 1MB mapping.\n",
           page_size);
    printf("This is much faster for large files with sparse writes.\n");

    munmap(base, TOTAL_SIZE);
    return 0;
}
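
The round-down step generalizes to any byte range inside a mapping. Here is a minimal sketch of such a wrapper; `msync_range()` is a hypothetical name, not a system call. It rounds the start down to a page boundary and grows the length by the same amount, leaving the rounding of the end to msync() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: flush the bytes [offset, offset + len) of a mapping
 * that starts at `base`. Returns whatever msync() returns
 * (0 on success, -1 with errno set). */
int msync_range(void *base, size_t offset, size_t len, int flags)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t first, start;

    assert(page > 0 && len > 0);

    first = (uintptr_t)base + offset;          /* first byte to flush */
    start = first & ~((uintptr_t)page - 1);    /* start of its page */

    /* The length grows by the distance we moved the start backwards. */
    return msync((void *)start, len + (size_t)(first - start), flags);
}
```

With this wrapper, a call such as `msync_range(base, 512 * 1024 + 100, 23, MS_SYNC)` flushes only the page or pages that contain those 23 bytes, even though the starting address itself is not aligned.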

- Use MS_SYNC when you need a guarantee that data has reached the disk (databases, journals, checkpoints).
- Use MS_ASYNC when you want to hint the kernel to flush without stalling your thread.
- Use MS_INVALIDATE when you want to pick up writes made to the file by other processes via write(); this matters most for portability to non-Linux systems.
- On Linux you can pair MS_ASYNC with fsync() or fdatasync() for a controlled flush.
- The addr argument must be page-aligned. Pass the original mmap() return value, or round down to a page boundary yourself.
- For large mappings with sparse writes, sync only the dirty pages, not the entire mapping.
- Even though Linux’s unified VM makes msync() unnecessary for inter-process visibility, always use it in portable code.

msync(MS_SYNC) gives you an explicit, synchronous guarantee: when it returns, the data is on the physical storage device. This is essential for any application requiring crash safety.

MS_ASYNC is non-blocking. It schedules the write and returns immediately. The kernel’s writeback mechanism will flush the pages “soon.” After MS_ASYNC, the memory and the kernel buffer cache are in sync (other processes doing read() will see updated data), but the disk may not yet reflect the changes.

MS_INVALIDATE marks the pages in the mapped region as invalid (stale). On the next access, the kernel re-fetches those pages from the underlying file. This makes changes written to the file by another process (via write()) visible inside the current process’s mapping. You use it when two or more processes interact with the same file, one via a mapping (mmap) and another via I/O calls (write), and you need the mapping to reflect the latest file content.

On Linux, mmap() and read()/write() share the same page cache. If Process B does write(fd, ...), Process A’s mapping of the same file will see the change immediately without needing MS_INVALIDATE. However, SUSv3 does not require this, and older or non-Linux UNIX systems may not have a unified VM. For portable code, use MS_INVALIDATE.

addr must be page-aligned. Passing a non-aligned address fails with EINVAL on Linux and on many other implementations. SUSv4 leaves it to the implementation whether alignment is required, but you should always pass a page-aligned address for portability. In practice, pass the original mmap() return value (which is always page-aligned) or round down with ((uintptr_t)addr & ~(pagesize - 1)).

After msync(MS_ASYNC), there are two ways to force the data to disk:
1. Call fsync(fd) on the file descriptor. This blocks until the kernel buffer cache is written to disk (data + metadata).
2. Call fdatasync(fd). This is like fsync() but skips metadata that is not needed to read the data back, such as timestamps, making it faster when only data durability matters.

Both functions are standard, but relying on them to flush mapped pages after MS_ASYNC goes beyond what SUSv3 guarantees.

A write-ahead logging sequence over a mapping looks like this:
1. Write the redo log page using mmap().
2. Call msync(log_page, page_size, MS_SYNC) to ensure the log is durable before modifying data pages.
3. Write data pages to the mapping.
4. Call msync(data_page, len, MS_SYNC) to commit data to disk.

Only MS_SYNC (not MS_ASYNC) provides the write-ahead logging guarantee that prevents data corruption on crash.