Linux File I/O – Atomicity & Race Conditions

Why doing two steps is never the same as doing one – and how the kernel protects you

⚛️ Atomicity: what it means
🏁 Race condition: how it happens
🔒 O_EXCL: safe file creation
📝 O_APPEND: safe shared writes

What You Will Learn in This Post

Before diving into advanced file I/O calls, you need to understand one foundational idea: atomicity. It is the reason certain open() flags exist, and ignoring it leads to bugs that are nearly impossible to reproduce. In this post we will start from a very simple story, build up the concept step by step, and then look at two classic problems in Linux file I/O: creating a file exclusively and appending to a shared file. For each one we will see how the kernel solves it inside a single system call, selected by flags passed to open().

Topics Covered

What is an atomic operation
What is a race condition
O_CREAT and O_EXCL flags
O_APPEND flag
Process scheduling and preemption
Safe log file writing

⚛️ Section 1 — What Does “Atomic” Mean?

The word atomic literally means “cannot be cut”. In computer science an atomic operation is one that runs to completion without any other process being able to observe or interrupt it midway. It either finishes completely, or it never started. There is no visible in-between state.

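Many individual Linux system calls carry atomicity guarantees from the kernel. The two this post relies on are open() with O_CREAT | O_EXCL and write() with O_APPEND. The guarantee is specific to each call, so check the man page rather than assuming it: write(), for example, can return after writing only part of the buffer. But two consecutive system calls are NOT atomic together. Between any two system calls, the Linux scheduler is perfectly allowed to pause your process and run something else. That gap is where trouble lives.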

A Simple Everyday Analogy

Imagine a shared whiteboard in an office. Two people want to write their name on it only if no one else has. The steps are: (1) look at the board, (2) write your name if it is blank.

If Person A looks, sees it is blank, then Person B also looks, sees it is blank, and both write simultaneously — both believe they are the only writer. That is a race condition born from a non-atomic check-then-act.

The solution is a lock on the whiteboard that lets only one person see-and-write as a single indivisible step. That is exactly what O_CREAT | O_EXCL does for files.

Non-Atomic vs Atomic — Side by Side
❌ Non-Atomic (two steps)
Step 1: Does file exist?
        ← OS can pause here!
Step 2: If no → create file
Problem: Another process can create the file between Step 1 and Step 2.

✅ Atomic (one step)
Step 1+2 combined: check if absent AND create, together.
No pause is possible between the check and the create.
Safe: exactly what O_CREAT | O_EXCL does.

🏁 Section 2 — What is a Race Condition?

A race condition happens when two or more processes (or threads) access shared data at the same time, and the final outcome depends on which one gets there first. The behaviour becomes unpredictable — it may work correctly most of the time but fail occasionally, making it one of the hardest bugs to find and fix.

Why Processes Get Paused Between Calls

Linux uses preemptive scheduling. Every process is given a small time slice (a few milliseconds). When the slice runs out, the OS saves the process state and runs something else. Your process has no control over when this happens — it can be paused between any two instructions, including between two consecutive system calls in your code.

Race Condition Timeline — Two Processes Trying to Create the Same Log File
Time  Process A                              Process B
T1    Checks: does app.log exist? → No       waiting…
T2    ⚡ OS pauses Process A here             ⚡ OS runs Process B
T3    waiting…                               Checks: does app.log exist? → No
T4    waiting…                               Creates app.log → success ✅
T5    Creates app.log → also “succeeds” ✅    waiting…

❌ Both A and B believe they exclusively created the file. The lock is broken.

Notice that this bug is nearly impossible to reproduce reliably. It only happens if the OS pauses Process A at exactly that moment, or if Process B happens to run on another CPU core in the same instant. In testing it may never appear. In production under load it may cause corruption. This is the defining trait of a race condition.

🔒 Section 3 — Solving It: O_CREAT | O_EXCL for Exclusive File Creation

The open() system call accepts a combination of flags. Two of them, when used together, solve the exclusive creation problem atomically:

Flag               What it does
O_CREAT            Create the file if it does not exist. If it already exists, open it normally (no error).
O_EXCL             Combined with O_CREAT: if the file already exists, fail with EEXIST. Without O_CREAT, the behaviour of O_EXCL is generally undefined (Linux makes one exception, for block devices), so always pair the two.
O_CREAT | O_EXCL   Atomic check-and-create: the kernel checks existence AND creates in one unbreakable step. Guaranteed to work correctly even with 1000 processes racing simultaneously.

Example 1 — Bad Way (Two Separate Steps)

First, here is the broken approach so you understand exactly why it is wrong:

/*
 * bad_create.c  —  DO NOT USE this pattern
 * This has a race condition between the stat() and the open().
 * Build: gcc bad_create.c -o bad_create
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat info;

    /* Step 1: check if file already exists */
    if (stat("mylock.txt", &info) == 0) {
        printf("File exists, someone else is using it.\n");
        return 1;
    }

    /*
     * ===================================================
     * THE GAP: OS can switch to another process right here.
     * The other process runs the same code, also passes
     * the stat() check above, and creates the file first.
     * When our process resumes, it creates the file too.
     * Both processes now believe they own the lock.
     * ===================================================
     */

    /* Step 2: create the file */
    int fd = open("mylock.txt", O_WRONLY | O_CREAT, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    printf("PID %d: I think I created the lock file.\n", getpid());
    /* But another process may have the same belief! */

    close(fd);
    return 0;
}

Example 2 — Correct Way (Atomic O_CREAT | O_EXCL)

Now the fixed version. The entire check-and-create is inside a single system call:

/*
 * good_create.c  —  Correct exclusive file creation
 * Uses O_CREAT | O_EXCL for atomic check-and-create.
 * Build: gcc good_create.c -o good_create
 * Test : run two terminals simultaneously to see only one wins.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

int main(void)
{
    /*
     * O_CREAT | O_EXCL together:
     *   - If the file does not exist  → create it, return a valid fd.
     *   - If the file already exists  → return -1, errno = EEXIST.
     *
     * The kernel does the check and the create in one atomic step.
     * No other process can sneak in between them.
     */
    int fd = open("mylock.txt",
                  O_WRONLY | O_CREAT | O_EXCL,
                  0644);

    if (fd == -1) {
        if (errno == EEXIST) {
            /* Someone else created it first — that is expected */
            printf("PID %d: File already exists. Another process owns it.\n",
                   getpid());
        } else {
            /* Some unexpected error */
            perror("open");
        }
        return 1;
    }

    /* Only the process whose open() created the file reaches here */
    printf("PID %d: I created the lock file exclusively.\n", getpid());

    /* Write our PID so others know who holds it */
    char buf[32];
    int n = snprintf(buf, sizeof(buf), "owner_pid=%d\n", getpid());
    if (write(fd, buf, n) != n) {
        perror("write");   /* the lock is still held; only the PID note is missing */
    }

    /* Simulate doing critical work */
    sleep(3);

    /* Cleanup: close and remove the lock */
    close(fd);
    unlink("mylock.txt");
    printf("PID %d: Done. Lock removed.\n", getpid());

    return 0;
}
💡 Try it yourself: Open two terminal windows and run ./good_create in both at the same time. Exactly one process wins; the other immediately prints “File already exists.” However many times you repeat this, only one process creates the file while the winner is still holding it. The winner removes the lock after about 3 seconds, so a run started after that succeeds again.

Example 3 — Lock File with Retry Loop

In real programs you usually want to wait if the lock is held rather than give up immediately. Here is a practical pattern with a timeout:

/*
 * lock_retry.c  —  Lock file with timeout and retry
 * Practical pattern used by build systems, backup tools, etc.
 * Build: gcc lock_retry.c -o lock_retry
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define LOCK_FILE   "/tmp/my_app.lock"
#define MAX_RETRIES 10           /* try up to 10 times          */
#define RETRY_WAIT  500000       /* wait 500 ms between retries  */

/* Returns 0 on success (lock acquired), -1 on failure */
int acquire_lock(void)
{
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {

        /* Atomic check-and-create — the only safe way */
        int fd = open(LOCK_FILE,
                      O_WRONLY | O_CREAT | O_EXCL,
                      0600);

        if (fd != -1) {
            /* We got the lock */
            char info[64];
            int n = snprintf(info, sizeof(info), "pid=%d\n", getpid());
            if (write(fd, info, n) != n) {
                perror("write");  /* lock is held, but the PID note is incomplete */
            }
            close(fd);
            printf("Lock acquired on attempt %d.\n", attempt);
            return 0;
        }

        if (errno == EEXIST) {
            /* Lock is held by someone else — wait and retry */
            printf("Attempt %d: lock busy, waiting 500ms...\n", attempt);
            usleep(RETRY_WAIT);
        } else {
            /* Unexpected error */
            perror("open");
            return -1;
        }
    }

    printf("Could not acquire lock after %d attempts.\n", MAX_RETRIES);
    return -1;
}

void release_lock(void)
{
    unlink(LOCK_FILE);
    printf("Lock released.\n");
}

int main(void)
{
    if (acquire_lock() != 0) {
        fprintf(stderr, "Cannot proceed without lock.\n");
        return 1;
    }

    printf("PID %d doing exclusive work...\n", getpid());
    sleep(2);

    release_lock();
    return 0;
}

📝 Section 4 — The Second Race: Shared Log Files and O_APPEND

There is a second classic race condition that catches developers off guard: multiple processes writing to the same file. The most common scenario is a shared log file where each process appends lines.

Why lseek + write is Broken for Shared Files

Appending means: move to the end of the file, then write. If you do this with two separate calls, lseek(fd, 0, SEEK_END) followed by write(), another process can append to the file between your seek and your write. Your write then lands at the now-stale offset you saved, on top of the data the other process just wrote.

lseek + write Race — Both Processes Write at the Same Offset
Time  Logger A                                            Logger B                        File state
T1    lseek(SEEK_END) → offset 200                        waiting…                        file size = 200 bytes
T2    ⚡ A paused                                          ⚡ B runs
T3    waiting…                                            lseek(SEEK_END) → offset 200    still 200 bytes
T4    waiting…                                            write(“B data”) at offset 200   bytes 200-215 = B’s data
T5    write(“A data”) at offset 200 ← A’s saved offset!   waiting…                        bytes 200-215 = A’s data; B’s data is OVERWRITTEN!

The Fix: O_APPEND Makes Seek + Write Atomic

When you open a file with O_APPEND, every single write() call automatically seeks to the current end of file and writes there as one unbreakable operation. You never need to call lseek() manually. Even with a thousand processes writing simultaneously, every write lands after the previous one with no overlap.

❌ Broken (lseek + write)
lseek(fd, 0, SEEK_END);
/* ← gap here! */
write(fd, line, len);
✅ Correct (O_APPEND)
/* seek+write is atomic */
write(fd, line, len);
/* no lseek needed */

Example 1 — Single Process Writing a Log (Understanding O_APPEND)

/*
 * append_single.c  —  Understand O_APPEND with one process first
 * Build: gcc append_single.c -o append_single
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    /*
     * O_APPEND: every write() atomically moves to end-of-file
     * and then writes. Even if the file was grown by another
     * process since our last write, we always append correctly.
     */
    int fd = open("app.log",
                  O_WRONLY | O_CREAT | O_APPEND,
                  0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    /* Notice: we never call lseek() — O_APPEND handles positioning */
    const char *lines[] = {
        "INFO  : Server started\n",
        "DEBUG : Listening on port 8080\n",
        "INFO  : Connection accepted\n",
        NULL
    };

    for (int i = 0; lines[i] != NULL; i++) {
        ssize_t len    = strlen(lines[i]);
        ssize_t written = write(fd, lines[i], len);

        if (written != len) {
            fprintf(stderr, "Warning: partial write at line %d\n", i);
        }
    }

    /* Check where the file offset ended up */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    printf("Current file offset after writes: %ld bytes\n", (long)pos);

    close(fd);
    printf("Done. Check app.log\n");
    return 0;
}

Example 2 — Multiple Processes Appending Safely

This example forks several child processes that all write to the same log file. With O_APPEND, all entries are preserved. Run it and count the lines — they will always match.

/*
 * multiproc_logger.c  —  5 child processes all log to the same file safely
 * Build: gcc multiproc_logger.c -o multiproc_logger
 * Run:   ./multiproc_logger
 *        wc -l shared.log    (should always print 50 = 5 procs × 10 lines)
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

#define NUM_WORKERS    5
#define LINES_PER_PROC 10
#define LOG_FILE       "shared.log"

/* Remove old log so we start fresh */
void reset_log(void)
{
    unlink(LOG_FILE);
}

/* Each worker writes LINES_PER_PROC entries then exits */
void worker(int worker_id)
{
    /*
     * Open with O_APPEND. Every write() is atomic w.r.t. end-of-file.
     * Safe even when 5 workers share this same file path.
     */
    int fd = open(LOG_FILE, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) {
        perror("worker open");
        exit(1);
    }

    for (int line = 1; line <= LINES_PER_PROC; line++) {
        char entry[80];
        int n = snprintf(entry, sizeof(entry),
                         "worker=%d pid=%d line=%02d\n",
                         worker_id, getpid(), line);

        /* Single write() = atomic append. Safe. */
        if (write(fd, entry, n) != n) {
            fprintf(stderr, "worker %d: write error\n", worker_id);
        }

        usleep(5000);  /* 5 ms — interleave workers intentionally */
    }

    close(fd);
    exit(0);
}

int main(void)
{
    reset_log();

    printf("Launching %d workers, each writing %d lines...\n",
           NUM_WORKERS, LINES_PER_PROC);

    /* Fork all workers */
    for (int i = 1; i <= NUM_WORKERS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            worker(i);   /* child never returns from worker() */
        }
        if (pid == -1) {
            perror("fork");
        }
    }

    /* Parent waits for all children */
    for (int i = 0; i < NUM_WORKERS; i++) {
        wait(NULL);
    }

    /* Count lines — must equal NUM_WORKERS * LINES_PER_PROC */
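    FILE *f = fopen(LOG_FILE, "r");
    if (f == NULL) {           /* e.g. every worker failed before creating the log */
        perror("fopen");
        return 1;
    }
    int count = 0;
    char line[80];
    while (fgets(line, sizeof(line), f)) count++;
    fclose(f);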

    int expected = NUM_WORKERS * LINES_PER_PROC;
    printf("Lines written: %d   Expected: %d   %s\n",
           count, expected,
           (count == expected) ? "✅ CORRECT" : "❌ DATA LOST");

    return 0;
}
📌 Important Limit to Know: O_APPEND makes the seek-to-end and the write of a single write() call atomic. If your log entry requires two or more write() calls, they are not atomic as a group, and another process's output can land between them. For multi-call log entries, build the whole entry in one buffer and issue a single write(), or add file locking.
⚠️ NFS Warning: The O_APPEND guarantee does not carry over to NFS. According to the open(2) man page, NFS does not support appending to a file, so the client kernel has to simulate it with a non-atomic sequence, and files can be corrupted if more than one process appends at once. For files on NFS shares accessed from multiple servers, always use advisory file locking in addition to O_APPEND.

Example 3 — Proving the Race Without O_APPEND

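Run this example in both modes and compare the line counts. Without O_APPEND, workers overwrite each other's lines, so the count usually drops below the expected value. Because the race depends on timing, the number of lost lines varies from run to run.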

/*
 * race_vs_safe.c  —  Side-by-side: broken vs safe appender
 * Build: gcc race_vs_safe.c -o race_vs_safe
 * Usage: ./race_vs_safe broken    (produces a file with missing lines)
 *        ./race_vs_safe safe      (produces a file with all lines)
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

#define WORKERS 4
#define WRITES  100

void run_broken(void)
{
    unlink("broken_out.txt");
    for (int w = 0; w < WORKERS; w++) {
        if (fork() == 0) {
            int fd = open("broken_out.txt",
                          O_WRONLY | O_CREAT, 0644); /* no O_APPEND */
            for (int i = 0; i < WRITES; i++) {
                lseek(fd, 0, SEEK_END); /* race lives here */
                char line[32];
                int n = snprintf(line, sizeof(line), "w%d-i%d\n", w, i);
                write(fd, line, n);
            }
            close(fd);
            exit(0);
        }
    }
    for (int i = 0; i < WORKERS; i++) wait(NULL);
}

void run_safe(void)
{
    unlink("safe_out.txt");
    for (int w = 0; w < WORKERS; w++) {
        if (fork() == 0) {
            int fd = open("safe_out.txt",
                          O_WRONLY | O_CREAT | O_APPEND, 0644);
            for (int i = 0; i < WRITES; i++) {
                char line[32];
                int n = snprintf(line, sizeof(line), "w%d-i%d\n", w, i);
                write(fd, line, n); /* atomic append, no lseek needed */
            }
            close(fd);
            exit(0);
        }
    }
    for (int i = 0; i < WORKERS; i++) wait(NULL);
}

/* Count the lines in a file; returns -1 if it cannot be opened */
static int count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    int n = 0;
    char buf[64];
    while (fgets(buf, sizeof(buf), f)) n++;
    fclose(f);
    return n;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: %s [broken|safe]\n", argv[0]);
        return 1;
    }
    int expected = WORKERS * WRITES;

    if (strcmp(argv[1], "broken") == 0) {
        run_broken();
        int n = count_lines("broken_out.txt");
        if (n < 0) return 1;
        printf("BROKEN: got %d lines, expected %d (lost %d)\n",
               n, expected, expected - n);
    } else if (strcmp(argv[1], "safe") == 0) {
        run_safe();
        int n = count_lines("safe_out.txt");
        if (n < 0) return 1;
        printf("SAFE:   got %d lines, expected %d  %s\n",
               n, expected,
               n == expected ? "✅ ALL LINES PRESENT" : "❌ LINES MISSING");
    } else {
        /* Any other argument used to fall through to the safe run */
        printf("Usage: %s [broken|safe]\n", argv[0]);
        return 1;
    }
    return 0;
}

📋 Quick Summary — Atomicity Rules
Operation                    Atomic?  When to use
open(O_CREAT | O_EXCL)       ✅ Yes   Lock files, any “create if absent” pattern
stat() then open(O_CREAT)    ❌ No    Never use this two-step pattern for exclusive creation
write() with O_APPEND        ✅ Yes   Shared log files, multiple writers to one file
lseek(SEEK_END) + write()    ❌ No    Never use this two-step append with shared files
Single write()               ✅ Yes   Each individual write() call is applied as a unit; still check its return value for short writes
Two separate write() calls   ❌ No    Can be interleaved. Use a single write() or locking

Next in This Series

Post 2 — fcntl() in depth: reading and changing file flags at runtime, and the three-layer kernel model every Linux developer must know.
