Linux File I/O – fcntl() and Kernel File Internals

How to control open files at runtime, and how the kernel really stores them

  • 🔧 fcntl(): file control
  • 🚩 Status flags: get and set
  • 🗃️ fd table: per process
  • 🔗 OFD + inode: kernel layers

What You Will Learn

Two topics in one post. First: fcntl(), a system call that lets you inspect and change the behaviour of a file descriptor after it has already been opened, with no need to close and reopen. Second: the kernel’s three-layer internal model for representing files: file descriptors, open file descriptions, and i-nodes. Understanding this model explains many surprising behaviours you will see in multi-process programs.

Topics Covered

  • fcntl() system call
  • F_GETFL and F_SETFL
  • F_GETFD and F_SETFD
  • FD_CLOEXEC flag
  • O_NONBLOCK at runtime
  • File descriptor table
  • Open file description (OFD)
  • i-node structure
  • Shared vs independent offsets

🔧 Section 1: What is fcntl() and Why Do We Need It?

When you call open(), you pass flags like O_RDONLY or O_NONBLOCK. But sometimes you cannot control those flags at open time because:

  • You received the file descriptor from a parent process via fork()
  • The fd was created by a system call that does not accept open flags, like pipe() or socket()
  • A library gave you an fd and you need to change one flag without knowing what others are set
  • You want to switch a socket between blocking and non-blocking dynamically at runtime

fcntl() (“file control”) is the solution. It works on an already-open file descriptor and lets you read or change its properties without touching the file itself.

#include <fcntl.h>

int fcntl(int fd, int cmd, ... /* arg */);

/* fd = the open file descriptor to work on */
/* cmd = what to do (F_GETFL, F_SETFL, etc.) */
/* … = optional extra argument (depends on cmd) */
/* Returns: result depends on cmd. -1 on error. */

Common fcntl() Commands
| Command | 3rd arg | What it does |
| --- | --- | --- |
| F_GETFL | none | Read the current open file status flags (O_APPEND, O_NONBLOCK, access mode…) |
| F_SETFL | int flags | Set the open file status flags. Only some flags can actually be changed (see below). |
| F_GETFD | none | Read the file descriptor flags (currently only FD_CLOEXEC exists) |
| F_SETFD | int flags | Set the file descriptor flags (set or clear FD_CLOEXEC) |
| F_DUPFD | int startfd | Duplicate the fd using the lowest available number ≥ startfd |
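
F_DUPFD is the one command in the table that the examples in this post do not exercise. Here is a minimal sketch, assuming a Linux/POSIX system. The helper name dup_at_least() is made up for illustration and is not a library function:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * dup_at_least() is a hypothetical helper, not part of libc.
 * It duplicates fd onto the lowest free descriptor number that is
 * greater than or equal to min_fd. Returns the new fd, or -1 on
 * error with errno set by fcntl().
 */
int dup_at_least(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD, min_fd);
}
```

Unlike dup2(), F_DUPFD never closes a descriptor that is already in use: if min_fd is taken, the next free number above it is returned. The new descriptor starts with FD_CLOEXEC cleared.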

🚩 Section 2: Reading and Changing File Status Flags at Runtime

What Are File Status Flags?

When you call open() you give it a set of flags. The kernel stores these inside the open file description (more on that in Section 3). F_GETFL lets you read those stored flags as a bitmask integer at any time. F_SETFL lets you modify some of them.

Which Flags Can You Change After Opening?

✅ Can Be Changed with F_SETFL (on Linux)
  • O_APPEND: seek to end before every write
  • O_NONBLOCK: return EAGAIN instead of blocking
  • O_ASYNC: send SIGIO when I/O is possible
  • O_DIRECT: bypass kernel page cache
  • O_NOATIME: do not update last-access time

❌ Fixed at open() Time
  • O_RDONLY: read-only access mode
  • O_WRONLY: write-only access mode
  • O_RDWR: read and write access
  • O_CREAT: create if not present
  • O_EXCL: fail if already exists
  • O_TRUNC: truncate on open

⚠️ Access mode quirk: The access mode flags (O_RDONLY, O_WRONLY, O_RDWR) are not independent single-bit flags. Together they occupy a 2-bit field, and on Linux O_RDONLY is 0, so (flags & O_RDONLY) is always false. Mask with O_ACCMODE first and compare the result with ==, rather than testing an individual mode with &.

The Read-Modify-Write Pattern

Never pass a bare constant to F_SETFL. Doing so clears every other changeable flag that was set. Always read the current flags first, change only the bit you care about, then write them back.

❌ WRONG: wipes other flags
fcntl(fd, F_SETFL, O_NONBLOCK);
/* Clears O_APPEND etc.! */
✅ CORRECT: read-modify-write
int f = fcntl(fd, F_GETFL);
if (f != -1) {
    f |= O_NONBLOCK;
    fcntl(fd, F_SETFL, f);
}

Example 1: Inspecting All Flags on an Open File

/*
 * inspect_flags.c: read and print every meaningful flag on an fd
 * Build: gcc inspect_flags.c -o inspect_flags
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

void print_flags(int fd, const char *label)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) { perror("fcntl F_GETFL"); return; }

    printf("\n--- %s (fd=%d) ---\n", label, fd);

    /*
     * Access mode uses 2 bits, so we cannot use simple &.
     * Mask with O_ACCMODE first, then compare.
     */
    int mode = flags & O_ACCMODE;
    if      (mode == O_RDONLY) printf("  Access mode : READ-ONLY\n");
    else if (mode == O_WRONLY) printf("  Access mode : WRITE-ONLY\n");
    else if (mode == O_RDWR)   printf("  Access mode : READ-WRITE\n");

    /* Single-bit flags can be tested directly with bitwise AND */
    printf("  O_APPEND   : %s\n", (flags & O_APPEND)   ? "ON" : "off");
    printf("  O_NONBLOCK : %s\n", (flags & O_NONBLOCK) ? "ON" : "off");
    printf("  O_SYNC     : %s\n", (flags & O_SYNC)     ? "ON" : "off");

    /* FD_CLOEXEC lives in the fd table, NOT in file status flags */
    int fd_flags = fcntl(fd, F_GETFD);
    if (fd_flags == -1) { perror("fcntl F_GETFD"); return; }
    printf("  FD_CLOEXEC : %s  (close-on-exec)\n",
           (fd_flags & FD_CLOEXEC) ? "ON" : "off");
}

int main(void)
{
    /* Three files opened with different flags */
    int fd1 = open("file1.txt", O_RDONLY | O_CREAT, 0644);
    int fd2 = open("file2.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    int fd3 = open("file3.txt", O_RDWR   | O_CREAT | O_NONBLOCK, 0644);
    if (fd1 == -1 || fd2 == -1 || fd3 == -1) { perror("open"); return 1; }

    print_flags(fd1, "Read-only");
    print_flags(fd2, "Write-only + APPEND");
    print_flags(fd3, "Read-write + NONBLOCK");

    /* Check the standard streams */
    print_flags(STDIN_FILENO,  "stdin");
    print_flags(STDOUT_FILENO, "stdout");

    close(fd1); close(fd2); close(fd3);
    return 0;
}

Example 2: Enable O_NONBLOCK on a Pipe After Creation

pipe() does not accept open flags. With plain pipe(), the only way to make the pipe non-blocking is to call fcntl() after creating it (Linux also provides pipe2(), which takes O_NONBLOCK and O_CLOEXEC directly). This is one of the most common practical uses of fcntl() in real programs:

/*
 * pipe_nonblock.c: enable O_NONBLOCK on a pipe using fcntl()
 * Pipes, sockets and FIFOs are the primary use case for fcntl(F_SETFL).
 * Build: gcc pipe_nonblock.c -o pipe_nonblock
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Utility: add a flag without changing others */
int add_flag(int fd, int new_flag)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | new_flag);
}

/* Utility: remove a flag without changing others */
int remove_flag(int fd, int flag)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags & ~flag);
}

int main(void)
{
    int pipefd[2];
    /* pipefd[0] = read end, pipefd[1] = write end */
    if (pipe(pipefd) == -1) { perror("pipe"); return 1; }

    printf("Before fcntl:\n");
    printf("  read end O_NONBLOCK = %s\n",
           (fcntl(pipefd[0], F_GETFL) & O_NONBLOCK) ? "ON" : "off");

    /* pipe() itself takes no flags argument,
     * so we use fcntl() to add O_NONBLOCK afterwards. */
    if (add_flag(pipefd[0], O_NONBLOCK) == -1) { perror("fcntl"); return 1; }

    printf("\nAfter adding O_NONBLOCK:\n");
    printf("  read end O_NONBLOCK = %s\n",
           (fcntl(pipefd[0], F_GETFL) & O_NONBLOCK) ? "ON" : "off");

    /* Demonstrate: reading from an empty pipe now returns EAGAIN */
    char buf[64];
    ssize_t n = read(pipefd[0], buf, sizeof(buf));
    if (n == -1 && errno == EAGAIN) {
        printf("\nread() on empty pipe returned EAGAIN: non-blocking works!\n");
        printf("The process did NOT block. It can continue doing other work.\n");
    }

    /* Write something then read again */
    write(pipefd[1], "hello", 5);
    n = read(pipefd[0], buf, sizeof(buf));
    printf("\nAfter writing 'hello': read() returned %zd bytes\n", n);

    /* Remove O_NONBLOCK to go back to blocking mode */
    remove_flag(pipefd[0], O_NONBLOCK);
    printf("O_NONBLOCK removed. Pipe is blocking again.\n");

    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}

Example 3: Set close-on-exec on a File Descriptor Safely

FD_CLOEXEC is stored separately from the status flags: it lives in the fd table entry. Use F_GETFD / F_SETFD to manage it. Without it, any program you start with exec() inherits all your open sockets and file descriptors, which is a security risk.

/*
 * cloexec_example.c  β€”  Setting FD_CLOEXEC using F_GETFD / F_SETFD
 * When you exec() a new program, fds with FD_CLOEXEC are automatically
 * closed. Without it, the new program inherits all your open files.
 * Build: gcc cloexec_example.c -o cloexec_example
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

void set_cloexec(int fd)
{
    /*
     * F_GETFD reads fd-level flags (only FD_CLOEXEC exists today).
     * These are DIFFERENT from file status flags (F_GETFL).
     * FD_CLOEXEC is private per process and per fd: it is not
     * shared between dup()'d fds.
     */
    int fd_flags = fcntl(fd, F_GETFD);
    if (fd_flags == -1) { perror("F_GETFD"); return; }

    if (fcntl(fd, F_SETFD, fd_flags | FD_CLOEXEC) == -1) {
        perror("F_SETFD");
    }
}

int main(void)
{
    int fd = open("secret_data.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) { perror("open"); return 1; }

    printf("fd = %d opened\n", fd);

    /* Without FD_CLOEXEC: if this process (or a forked child) calls exec(),
     * the new program INHERITS fd and can read or write secret_data.txt */

    /* With FD_CLOEXEC: fd is automatically closed when exec() runs */
    set_cloexec(fd);

    int fdflags = fcntl(fd, F_GETFD);
    printf("FD_CLOEXEC is now: %s\n",
           (fdflags & FD_CLOEXEC) ? "SET" : "not set");

    printf("If we fork+exec here, the child will NOT inherit fd %d\n", fd);

    close(fd);
    unlink("secret_data.txt");
    return 0;
}

🗃️ Section 3: The Kernel’s Three-Layer File Model

This is the single most important concept for understanding how Linux file I/O truly works. Most beginners think “file descriptor = file”. It is not that simple. The kernel uses three completely separate data structures between your integer fd and the actual file on disk. Understanding the layers explains fork(), dup(), and thread I/O behaviour in one shot.

The Three Layers Explained

Layer 1: File Descriptor Table
Per process, in kernel memory
  • One row per open fd (0, 1, 2 …)
  • Stores: FD_CLOEXEC flag
  • Stores: pointer to Layer 2 entry
Think of it as your process’s phone book of open files

Layer 2: Open File Description (OFD)
System-wide, shared across processes
  • Stores: current file offset
  • Stores: access mode (r/w/rw)
  • Stores: status flags (O_APPEND…)
  • Stores: pointer to Layer 3 inode
Created each time open() is called

Layer 3: i-node
System-wide, backed by disk
  • Stores: file type, permissions
  • Stores: file size, timestamps
  • Stores: disk block locations
  • Stores: active locks
One per unique file on disk
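
The three layers can be sketched as a toy model in C. This is only an illustration: every name with a toy_ prefix is invented here, and the structures are heavily simplified. In the Linux source the rough counterparts are struct files_struct, struct file and struct inode.

```c
#include <stddef.h>

/* Toy model only: these are NOT the kernel's real structures. */
struct toy_inode {              /* Layer 3: one per file */
    long size;
    unsigned mode;              /* type and permission bits */
};

struct toy_ofd {                /* Layer 2: one per open() call */
    long offset;
    int status_flags;           /* O_APPEND, O_NONBLOCK, access mode ... */
    struct toy_inode *inode;
};

struct toy_fd_entry {           /* Layer 1: one slot per descriptor */
    int cloexec;                /* FD_CLOEXEC, private to this slot */
    struct toy_ofd *ofd;        /* NULL means the slot is free */
};

#define TOY_MAX_FDS 16

/* Model of dup(): copy the OFD pointer into the lowest free slot. */
int toy_dup(struct toy_fd_entry table[TOY_MAX_FDS], int oldfd)
{
    if (oldfd < 0 || oldfd >= TOY_MAX_FDS || table[oldfd].ofd == NULL)
        return -1;
    for (int newfd = 0; newfd < TOY_MAX_FDS; newfd++) {
        if (table[newfd].ofd == NULL) {
            table[newfd].ofd = table[oldfd].ofd;  /* share the OFD */
            table[newfd].cloexec = 0;             /* dup() clears FD_CLOEXEC */
            return newfd;
        }
    }
    return -1;
}

/* Model of read(): advance the offset stored in the (possibly shared) OFD. */
long toy_read(struct toy_fd_entry table[TOY_MAX_FDS], int fd, long count)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || table[fd].ofd == NULL)
        return -1;
    struct toy_ofd *ofd = table[fd].ofd;
    long left = ofd->inode->size - ofd->offset;
    if (left < 0) left = 0;
    if (count > left) count = left;
    ofd->offset += count;
    return count;
}
```

Because toy_dup() copies a pointer and not the toy_ofd itself, a toy_read() through either slot moves the offset seen by both. The cloexec field sits in the slot and is never shared.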

How the Three Layers Connect: Visual Map

Two Processes, Multiple Sharing Scenarios

Process A fd table        OFD table (system-wide)             inode table
fd 0 ───────────────────► OFD #10 [offset=0,   RDONLY] ─────► inode #5  (stdin)
fd 3 ──┬────────────────► OFD #20 [offset=512, RDWR]   ─────► inode #42 (data.bin)
fd 4 ──┘ (dup’d)            ← same OFD shared by fd 3 and fd 4!
fd 5 ───────────────────► OFD #30 [offset=0,   RDONLY] ─────► inode #42 (same data.bin)
                            ← separate OFD, independent offset! Same inode, different OFD

Process B fd table
fd 3 ───────────────────► OFD #20   ← same as Process A fd 3! (shared after fork())

Key Rules That Come From This Model

| Property | Lives in | Shared between dup/fork fds? |
| --- | --- | --- |
| File offset (read/write position) | OFD | ✅ YES: reading via fd3 also advances fd4’s position |
| O_APPEND, O_NONBLOCK flags | OFD | ✅ YES: changing on one fd changes both |
| FD_CLOEXEC (close-on-exec) | fd table | ❌ NO: private to each individual fd |
| File size, permissions, timestamps | inode | ✅ YES: all fds to the same file see same metadata |
| Two separate open() calls, same file | OFD (separate) | ❌ NO: completely independent offsets and flags |
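
The second and third rows of the table can be checked directly. The sketch below sets O_NONBLOCK and FD_CLOEXEC on one fd and reports what a dup()'d sibling sees. The function name probe_dup_sharing() is hypothetical, and the helper leaves both flags set on the fd you pass in:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical name).
 * Duplicates fd, sets O_NONBLOCK and FD_CLOEXEC on the ORIGINAL fd,
 * then inspects the DUPLICATE. On success returns 0 and fills:
 *   *sibling_nonblock : 1 if the duplicate also reports O_NONBLOCK
 *   *sibling_cloexec  : 1 if the duplicate also reports FD_CLOEXEC
 * Returns -1 if any system call fails.
 */
int probe_dup_sharing(int fd, int *sibling_nonblock, int *sibling_cloexec)
{
    int dup_fd = dup(fd);
    if (dup_fd == -1) return -1;

    int rc = -1;
    int fl = fcntl(fd, F_GETFL);
    int fdfl = fcntl(fd, F_GETFD);
    if (fl != -1 && fdfl != -1 &&
        fcntl(fd, F_SETFL, fl | O_NONBLOCK) != -1 &&    /* stored in the OFD */
        fcntl(fd, F_SETFD, fdfl | FD_CLOEXEC) != -1) {  /* stored in the fd slot */
        int dfl = fcntl(dup_fd, F_GETFL);
        int dfd = fcntl(dup_fd, F_GETFD);
        if (dfl != -1 && dfd != -1) {
            *sibling_nonblock = (dfl & O_NONBLOCK) ? 1 : 0;
            *sibling_cloexec = (dfd & FD_CLOEXEC) ? 1 : 0;
            rc = 0;
        }
    }
    close(dup_fd);
    return rc;
}
```

Called on the read end of a fresh pipe, this should report that the duplicate picked up O_NONBLOCK (shared through the OFD) but not FD_CLOEXEC (private to the original fd table entry).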

Example 1: Proving the Shared Offset After dup()

/*
 * shared_offset.c: show that dup()'d fds share the file offset
 * This demonstrates the OFD shared offset behaviour.
 * Build: gcc shared_offset.c -o shared_offset
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Create a test file with known content */
    int wfd = open("offset_test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd == -1) { perror("open"); return 1; }
    write(wfd, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26);
    close(wfd);

    /* Open for reading: this creates ONE Open File Description */
    int fd1 = open("offset_test.txt", O_RDONLY);
    if (fd1 == -1) { perror("open"); return 1; }

    /* dup() creates fd2 pointing to the SAME OFD */
    int fd2 = dup(fd1);
    if (fd2 == -1) { perror("dup"); return 1; }

    printf("fd1 = %d, fd2 = %d (both point to same OFD)\n\n", fd1, fd2);

    char buf[5] = {0};

    /* Read 4 bytes via fd1 → OFD offset moves from 0 to 4 */
    read(fd1, buf, 4);
    printf("Via fd1: read '%s'  (offset now at 4)\n", buf);

    /* Read 4 bytes via fd2 → OFD offset was already at 4, not 0! */
    buf[0] = 0;
    read(fd2, buf, 4);
    printf("Via fd2: read '%s'  (continued from offset 4, not 0!)\n\n", buf);

    printf("Key takeaway: fd1 and fd2 share the same offset.\n");
    printf("Because dup() shares the OFD, not just the inode.\n\n");

    /* Now contrast: open the same file AGAIN = NEW OFD */
    int fd3 = open("offset_test.txt", O_RDONLY);

    buf[0] = 0;
    read(fd3, buf, 4);
    printf("Via fd3 (separate open): read '%s'  (starts fresh at 0)\n", buf);
    printf("Because open() creates a new OFD with its own offset.\n");

    close(fd1); close(fd2); close(fd3);
    return 0;
}

Example 2: Parent and Child Share OFD After fork()

When a process calls fork(), the child gets a copy of the parent’s fd table. But the child’s fds point to the same OFDs as the parent’s. So if the parent reads 10 bytes before fork, the child picks up from byte 10.

/*
 * fork_shared_ofd.c: parent and child share OFD after fork()
 * Build: gcc fork_shared_ofd.c -o fork_shared_ofd
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* Create test file */
    int wfd = open("fork_test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    write(wfd, "ONE TWO THREE FOUR FIVE", 23);
    close(wfd);

    int fd = open("fork_test.txt", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    /* Parent reads first 4 bytes: OFD offset is now 4 */
    char buf[8] = {0};
    read(fd, buf, 4);
    printf("Parent read: '%s'  (offset now %ld)\n",
           buf, (long)lseek(fd, 0, SEEK_CUR));

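    /* Flush first so buffered output is not duplicated in the child */
    fflush(stdout);

    /* fork(): child inherits the same fd pointing to the same OFD */
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }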

    if (pid == 0) {
        /* CHILD: the OFD offset is inherited as 4, not 0 */
        buf[0] = 0;
        read(fd, buf, 4);
        printf("Child  read: '%s'  (continued from parent's offset 4)\n", buf);
        close(fd);
        exit(0);
    }

    wait(NULL);

    /* After child ran, parent's offset also advanced
     * because both touched the same OFD */
    printf("Parent's offset after child read: %ld\n",
           (long)lseek(fd, 0, SEEK_CUR));

    close(fd);
    return 0;
}

💡 Thread Implication: All threads in a process share the same fd table, so they share every OFD, including its offset. If thread A reads 100 bytes through an fd, thread B’s next read() on that fd starts 100 bytes further in. This is why multi-threaded file I/O needs either pread()/pwrite() (covered in Post 3) or explicit synchronisation.

Example 3: Inspect Your Process’s Open Files via /proc

/*
 * list_open_fds.c: print all open fds and what they point to
 * Uses /proc/self/fd which exposes the fd table as a directory.
 * Build: gcc list_open_fds.c -o list_open_fds
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <string.h>

int main(void)
{
    /* Open a few files to have something to show */
    int fd_a = open("/etc/hostname", O_RDONLY);
    int fd_b = open("/tmp/demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    int fd_c = dup(fd_a);  /* duplicate fd_a */

    printf("Opened fds: %d, %d, %d (dup of %d)\n\n",
           fd_a, fd_b, fd_c, fd_a);

    /* /proc/self/fd contains one symlink per open fd in this process */
    DIR *dir = opendir("/proc/self/fd");
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *entry;
    char link_target[256];
    char full_path[64];

    printf("%-6s  %s\n", "fd", "Points to");
    printf("%-6s  %s\n", "------", "-----------------------------");

    while ((entry = readdir(dir)) != NULL) {
        /* Skip . and .. */
        if (entry->d_name[0] == '.') continue;

        snprintf(full_path, sizeof(full_path),
                 "/proc/self/fd/%s", entry->d_name);

        ssize_t n = readlink(full_path, link_target, sizeof(link_target) - 1);
        if (n > 0) {
            link_target[n] = '\0';
            printf("%-6s  %s\n", entry->d_name, link_target);
        }
    }
    closedir(dir);

    close(fd_a); close(fd_b); close(fd_c);
    unlink("/tmp/demo.txt");
    return 0;
}
/* Run: ./list_open_fds
   You will see fds 0,1,2 (stdin/out/err), your newly opened ones,
   and one extra fd that opendir() holds on /proc/self/fd itself */

📋 Section 4: Quick Summary

| Concept | One-Line Summary |
| --- | --- |
| fcntl(F_GETFL) | Read the current status flags of any open fd as a bitmask |
| fcntl(F_SETFL) | Change some status flags: always read first, modify only the bit you want, write back |
| F_GETFD / F_SETFD | Manage FD_CLOEXEC: private per-fd flag, not shared, use F_GETFD not F_GETFL |
| fd table | Per-process, stores small integers mapping to OFD pointers + FD_CLOEXEC |
| OFD | System-wide, created each open() call, stores offset + flags + inode pointer |
| inode | Disk-backed, one per unique file, stores size/permissions/timestamps |
| dup / fork | Both create new fd table entries pointing to the SAME OFD, so the offset is shared |
| Two open() calls | Creates two separate OFDs, with independent offsets even for the same file |

Next in This Series

Post 3: dup(), dup2(), dup3() in depth, covering how shell redirection works internally, and pread/pwrite for thread-safe I/O.
