Filesystem Statistics: statvfs() in Linux Explained

 

📊 Filesystem Statistics: statvfs()
Query disk space, inode counts, mount flags, and build your own df utility
Syscalls: statvfs, fstatvfs | Use cases: df, quota, monitoring

Key Terms:

statvfs() fstatvfs() struct statvfs f_bsize f_frsize f_blocks f_bfree f_bavail f_files f_ffree Reserved Blocks ST_RDONLY ST_NOSUID df utility

Why Query Filesystem Statistics?

Before writing data to disk, a well-written application checks whether the target filesystem has sufficient space. System monitoring tools need to report disk usage per mount point. The df(1) command you run every day is built on these APIs. Linux exposes filesystem statistics through two POSIX-standard calls: statvfs() and fstatvfs().

Both functions populate the same struct statvfs, but differ in how you identify the filesystem: one takes a path, the other takes an open file descriptor. Understanding each field of this structure — especially the subtle difference between free blocks and available blocks — is essential for writing correct disk-space logic.

📌 Function Signatures

Both functions are declared in <sys/statvfs.h>:

#include <sys/statvfs.h>

/* Identify filesystem by path — any file/dir on that FS */
int statvfs(const char *pathname, struct statvfs *statvfsbuf);

/* Identify filesystem by open file descriptor */
int fstatvfs(int fd, struct statvfs *statvfsbuf);

/* Both return 0 on success, -1 on error (errno set) */

statvfs() follows symbolic links (like stat()). fstatvfs() is useful when you already have a file open and want to avoid a second path lookup, or when you need statistics for the exact filesystem holding the file you are writing to.
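To see the two entry points side by side, here is a minimal sketch that queries one path both ways and checks that the results agree. The helper name same_fs_by_path_and_fd() is hypothetical, and comparing f_fsid (plus f_frsize) is just one reasonable way to confirm that both calls resolved to the same filesystem.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if statvfs(path) and fstatvfs() on an fd
   opened from the same path report the same filesystem, 0 if they differ,
   -1 on error. */
int same_fs_by_path_and_fd(const char *path) {
    struct statvfs by_path, by_fd;

    if (statvfs(path, &by_path) == -1) {   /* identify the FS by pathname */
        perror("statvfs");
        return -1;
    }

    int fd = open(path, O_RDONLY);         /* works for directories too */
    if (fd == -1) {
        perror("open");
        return -1;
    }
    int rc = fstatvfs(fd, &by_fd);         /* identify the FS by descriptor */
    close(fd);
    if (rc == -1) {
        perror("fstatvfs");
        return -1;
    }

    /* Same filesystem => same filesystem ID and fundamental block size */
    return by_path.f_fsid == by_fd.f_fsid &&
           by_path.f_frsize == by_fd.f_frsize;
}
```

For example, same_fs_by_path_and_fd("/") is expected to return 1, while a nonexistent path returns -1 after statvfs() fails with ENOENT.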

Note: Linux also has the older statfs(2) / fstatfs(2) syscalls (declared in <sys/vfs.h>) which return a struct statfs with a f_type field (filesystem magic number). These are Linux-specific and non-portable. statvfs() is the POSIX-portable interface — prefer it in new code.

🗂️ struct statvfs — Field-by-Field

The kernel fills in the following structure:

struct statvfs {
    unsigned long  f_bsize;    /* Preferred I/O block size (for transfers)  */
    unsigned long  f_frsize;   /* Fundamental filesystem block size         */
    fsblkcnt_t     f_blocks;   /* Total blocks on FS (in f_frsize units)    */
    fsblkcnt_t     f_bfree;    /* Free blocks (privileged: root can use all)*/
    fsblkcnt_t     f_bavail;   /* Free blocks available to unprivileged user*/
    fsfilcnt_t     f_files;    /* Total inodes on filesystem                */
    fsfilcnt_t     f_ffree;    /* Free inodes (all)                         */
    fsfilcnt_t     f_favail;   /* Free inodes available to unprivileged user*/
    unsigned long  f_fsid;     /* Filesystem ID                             */
    unsigned long  f_flag;     /* Mount flags (ST_* constants)              */
    unsigned long  f_namemax;  /* Maximum filename length                   */
};

The most important fields and their relationships:

Field | Units | Meaning
f_frsize | bytes | Fundamental block size; all block counts below are in these units
f_bsize | bytes | Preferred I/O transfer size (often equal to f_frsize, but can differ on network filesystems)
f_blocks | f_frsize units | Total capacity of the filesystem in blocks
f_bfree | f_frsize units | All free blocks, including the reserved pool that only privileged processes can allocate
f_bavail | f_frsize units | Free blocks available to non-root users (f_bfree minus reserved)
f_files | count | Total inode slots on the filesystem
f_ffree | count | All free inodes
f_favail | count | Free inodes available to unprivileged users (glibc on Linux reports the same value as f_ffree)
f_namemax | bytes | Maximum length of a filename component (255 on both ext4 and XFS)

Common Bug: Using f_bfree instead of f_bavail when checking whether a non-root process can write data. f_bfree includes reserved blocks that only root can use. A regular user will get ENOSPC even when f_bfree > 0 if the only free blocks are reserved. Always use f_bavail for user-space disk space checks.
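The unit rules in the table and the f_bfree/f_bavail pitfall boil down to a little arithmetic. A minimal sketch with hypothetical helper names: block counts are always multiplied by f_frsize (never f_bsize), and the is_root flag is a simplification, since on ext4 the reserved pool is really available to the configured reserved UID/GID or to processes with CAP_SYS_RESOURCE.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: convert a statvfs block count to bytes.
   Counts are in f_frsize units; f_bsize is only an I/O size hint. */
unsigned long long sv_bytes(const struct statvfs *sv, fsblkcnt_t blocks) {
    /* Widen before multiplying so 32-bit builds cannot overflow */
    return (unsigned long long)blocks * sv->f_frsize;
}

/* Hypothetical helper: bytes a process can allocate. Privileged callers
   may use the reserved pool (f_bfree); everyone else gets f_bavail. */
unsigned long long usable_bytes(const struct statvfs *sv, int is_root) {
    return sv_bytes(sv, is_root ? sv->f_bfree : sv->f_bavail);
}
```

Using f_bsize here instead would silently scale every result by the wrong factor on filesystems where the two sizes differ.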

🔒 Reserved Blocks: f_bfree vs f_bavail

ext2/ext3/ext4 reserve a configurable percentage of blocks (default: 5%) for the root user. This reservation serves two purposes:

🛡️ System Stability
System daemons (syslog, cron) can still write critical logs even when a filesystem is “full” from the user’s perspective.
⚡ Fragmentation Prevention
Keeping some free space ensures the block allocator can find contiguous runs for new data, reducing fragmentation.

Block space breakdown (example): Used 70% | f_bavail 25% | Reserved 5%. Here f_bfree = f_bavail + reserved (30%), while f_bavail (25%) is what unprivileged users can actually use.

You can view and change the reserved block percentage with tune2fs:

# View block size and reservation on /dev/sda1 (usually needs root):
tune2fs -l /dev/sda1 | grep -E "^(Block count|Reserved block count|Block size):"

# Change reservation to 1% (good for large data drives):
tune2fs -m 1 /dev/sda1

# Example output from the first command:
# Block count:              2048000
# Reserved block count:     102400     (102400 / 2048000 = 5%)
# Block size:               4096
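tune2fs needs access to the block device, which usually means root. As a rough alternative, an unprivileged program can estimate the reservation from statvfs() alone, because f_bfree - f_bavail is the pool only privileged processes may use. This is a minimal sketch with a hypothetical helper name, reserved_pct(), and it gives an estimate rather than the tune2fs figure: ext4 also holds back a small internal reservation, and once free space falls below the reserved amount f_bavail drops to 0, so the difference shrinks.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: estimate the privileged-only share of a filesystem
   from statvfs(), without root or tune2fs.
   Returns a percentage in [0, 100], or -1.0 on error. */
double reserved_pct(const char *path) {
    struct statvfs sv;
    if (statvfs(path, &sv) == -1) {
        perror("statvfs");
        return -1.0;
    }
    if (sv.f_blocks == 0 || sv.f_bavail > sv.f_bfree)
        return 0.0;  /* no storage (pseudo-FS) or no reservation reported */

    /* Blocks that only privileged processes may still allocate */
    unsigned long long reserved = (unsigned long long)sv.f_bfree - sv.f_bavail;
    return 100.0 * (double)reserved / (double)sv.f_blocks;
}
```

With the tune2fs numbers above (102400 reserved out of 2048000 blocks) the configured reservation is 5%; the statvfs-based estimate should land close to that while the filesystem still has plenty of free space.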

💻 Code Example 1: Basic statvfs() Usage

Print key statistics for a given path — similar to what df -h shows for one mount:

#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>

/* Convert bytes to a human-readable string (1024-based units, like df -h) */
static void human_size(unsigned long long bytes, char *buf, size_t len) {
    const char *units[] = {"B", "KB", "MB", "GB", "TB"};
    int u = 0;
    double val = (double)bytes;
    while (val >= 1024.0 && u < 4) { val /= 1024.0; u++; }
    snprintf(buf, len, "%.1f %s", val, units[u]);
}

int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : "/";
    struct statvfs sv;

    if (statvfs(path, &sv) == -1) {
        perror("statvfs");
        return EXIT_FAILURE;
    }

    unsigned long long block    = sv.f_frsize;
    unsigned long long total    = (unsigned long long)sv.f_blocks * block;
    unsigned long long free_all = (unsigned long long)sv.f_bfree  * block;  /* includes reserved */
    unsigned long long avail    = (unsigned long long)sv.f_bavail * block;  /* non-root usable   */
    unsigned long long used     = total - free_all;
    /* Share of total capacity; df's Use% leaves reserved blocks out of the
       denominator, so df reports a somewhat higher figure */
    double use_pct = (total > 0) ? (100.0 * used / total) : 0.0;

    char tbuf[32], ubuf[32], abuf[32];
    human_size(total, tbuf, sizeof tbuf);
    human_size(used,  ubuf, sizeof ubuf);
    human_size(avail, abuf, sizeof abuf);

    /* fsblkcnt_t/fsfilcnt_t may be 32 or 64 bits: cast before printing */
    printf("Filesystem statistics for: %s\n", path);
    printf("  Block size (frsize):  %lu bytes\n", sv.f_frsize);
    printf("  Total size:           %s  (%llu blocks)\n", tbuf, (unsigned long long)sv.f_blocks);
    printf("  Used:                 %s  (%.1f%%)\n", ubuf, use_pct);
    printf("  Available (non-root): %s\n", abuf);
    printf("  Total inodes:         %llu\n", (unsigned long long)sv.f_files);
    printf("  Free inodes:          %llu\n", (unsigned long long)sv.f_favail);
    printf("  Max filename length:  %lu\n", sv.f_namemax);

    return EXIT_SUCCESS;
}
$ gcc -o fsstat fsstat.c && ./fsstat /home
Filesystem statistics for: /home
  Block size (frsize):  4096 bytes
  Total size:           97.7 GB  (25600000 blocks)
  Used:                 42.3 GB  (43.3%)
  Available (non-root): 50.5 GB
  Total inodes:         6553600
  Free inodes:          6201044
  Max filename length:  255

💻 Code Example 2: fstatvfs() — Pre-write Space Check

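Check available space on a filesystem via an already-open file descriptor before writing a large buffer. This catches an obviously full disk early, but it is only advisory: another process can consume space between the check and the write, so the write path must still handle ENOSPC. To actually reserve space in advance, use posix_fallocate().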

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/statvfs.h>

/* Returns 1 if 'needed_bytes' space is available on the FS
   holding 'fd', 0 otherwise. */
int has_space(int fd, unsigned long long needed_bytes) {
    struct statvfs sv;
    if (fstatvfs(fd, &sv) == -1) {
        perror("fstatvfs");
        return 0;
    }
    /* Use f_bavail — non-root usable blocks */
    unsigned long long available = (unsigned long long)sv.f_bavail * sv.f_frsize;
    printf("  Available: %llu bytes, Needed: %llu bytes\n",
           available, needed_bytes);
    return available >= needed_bytes;
}

int main(void) {
    /* Open/create target file */
    int fd = open("/tmp/testfile.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return EXIT_FAILURE; }

    unsigned long long write_size = 500ULL * 1024 * 1024; /* 500 MB */

    printf("Checking space before writing 500 MB to /tmp ...\n");
    if (!has_space(fd, write_size)) {
        fprintf(stderr, "Not enough space — aborting write.\n");
        close(fd);
        return EXIT_FAILURE;
    }

    printf("Space OK — proceeding with write.\n");
    /* ... actual write logic here ... */

    close(fd);
    return EXIT_SUCCESS;
}
Why fstatvfs() here? We pass the already-open fd rather than a path. This ensures we query the exact filesystem the file lives on — no risk of path resolution landing on a different mount point between the check and the write.
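A statvfs()/fstatvfs() check only reports how much space was free at that moment. If an application genuinely must not run out of space mid-write, one option is to reserve the blocks up front with posix_fallocate(), which allocates storage for a byte range and fails with ENOSPC immediately if it cannot. A minimal sketch with a hypothetical helper name, open_reserved(); on copy-on-write filesystems such as Btrfs, later overwrites may still need fresh space, so treat the guarantee as filesystem-dependent.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create/truncate 'path' and reserve 'len' bytes
   for it. Returns an open fd on success, -1 on failure. */
int open_reserved(const char *path, off_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    /* posix_fallocate() returns an error number instead of setting errno */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));  /* e.g. ENOSPC */
        close(fd);
        return -1;
    }
    return fd;  /* file size is now at least 'len' and the blocks are allocated */
}
```

Unlike the fstatvfs() check, a failure here happens before any data is written, so the caller can abort cleanly.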

💻 Code Example 3: df-Like Program (All Mount Points)

Read /proc/mounts to enumerate all mount points, then call statvfs() on each. This is essentially what df(1) does internally, including df's Use% formula:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/statvfs.h>
#include <mntent.h>

static void print_row(const char *device, const char *mountp,
                      const char *fstype, const struct statvfs *sv) {
    unsigned long long blk   = sv->f_frsize;
    /* Work in 1K blocks, like df (this also keeps the percentage math small) */
    unsigned long long total = (unsigned long long)sv->f_blocks * blk / 1024;
    unsigned long long free_ = (unsigned long long)sv->f_bfree  * blk / 1024;  /* includes reserved */
    unsigned long long avail = (unsigned long long)sv->f_bavail * blk / 1024;  /* non-root usable   */
    unsigned long long used  = total - free_;

    /* GNU df computes Use% as used / (used + avail), rounded up,
       so the root-reserved blocks are left out of the denominator */
    unsigned long long denom = used + avail;
    unsigned long long pct   = denom ? (used * 100 + denom - 1) / denom : 0;

    printf("%-20s %-8s %12llu %12llu %12llu %4llu%%  %s\n",
           device, fstype, total, used, avail, pct, mountp);
}

int main(void) {
    FILE *fp = setmntent("/proc/mounts", "r");
    if (!fp) { perror("setmntent"); return EXIT_FAILURE; }

    printf("%-20s %-8s %12s %12s %12s %5s  %s\n",
           "Filesystem", "Type", "1K-blocks", "Used", "Available",
           "Use%", "Mounted on");
    for (int i = 0; i < 86; i++)   /* 86 = total width of the header row */
        putchar('-');
    putchar('\n');

    struct mntent *me;
    while ((me = getmntent(fp)) != NULL) {
        /* Skip common pseudo-filesystems with no real storage */
        if (strcmp(me->mnt_type, "proc")   == 0 ||
            strcmp(me->mnt_type, "sysfs")  == 0 ||
            strcmp(me->mnt_type, "devpts") == 0 ||
            strcmp(me->mnt_type, "cgroup") == 0)
            continue;

        struct statvfs sv;
        if (statvfs(me->mnt_dir, &sv) == -1)
            continue;  /* skip inaccessible mounts */
        if (sv.f_blocks == 0)
            continue;  /* other storage-less mounts report zero blocks */

        print_row(me->mnt_fsname, me->mnt_dir, me->mnt_type, &sv);
    }

    endmntent(fp);
    return EXIT_SUCCESS;
}
$ gcc -o mydf mydf.c && ./mydf
Filesystem           Type        1K-blocks         Used    Available  Use%  Mounted on
--------------------------------------------------------------------------------------
/dev/sda1            ext4         51200000     21504000     27136000   45%  /
/dev/sda2            ext4        102400000     61440000     35840000   64%  /home
tmpfs                tmpfs         8192000            0      8192000    0%  /dev/shm
/dev/sdb1            xfs         204800000      4096000    200704000    2%  /data

💻 Code Example 4: Decoding f_flag — Mount Option Detection

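The f_flag field encodes mount options as bitmask flags using ST_* constants (defined in <sys/statvfs.h>). These mirror the MS_* constants used by mount(2), but with a different prefix to distinguish the POSIX interface from the Linux syscall interface. Only ST_RDONLY and ST_NOSUID are defined by POSIX; with glibc, the remaining constants are visible only if _GNU_SOURCE is defined before any header is included: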

ST_ flag | MS_ equivalent | Meaning
ST_RDONLY | MS_RDONLY | Mounted read-only
ST_NOSUID | MS_NOSUID | setuid/setgid bits ignored
ST_NODEV | MS_NODEV | Device files cannot be used
ST_NOEXEC | MS_NOEXEC | Executables cannot be run
ST_SYNCHRONOUS | MS_SYNCHRONOUS | All writes are synchronous
ST_MANDLOCK | MS_MANDLOCK | Mandatory locking allowed
ST_NOATIME | MS_NOATIME | Access times not updated
ST_NODIRATIME | MS_NODIRATIME | Directory access times not updated
ST_RELATIME | MS_RELATIME | Access times updated relative to mtime/ctime (the default on most modern mounts)
#define _GNU_SOURCE   /* glibc exposes ST_NODEV, ST_NOEXEC, ... only with this */
#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>

void decode_flags(unsigned long f_flag) {
    /* rw/ro is always meaningful, so print it first. Testing f_flag == 0 is
       not enough: a default rw mount usually still has ST_RELATIME set. */
    printf("  Mount flags: %s", (f_flag & ST_RDONLY) ? "ro" : "rw");

    static const struct { unsigned long bit; const char *name; } flags[] = {
        { ST_NOSUID,      "ST_NOSUID"      },
        { ST_NODEV,       "ST_NODEV"       },
        { ST_NOEXEC,      "ST_NOEXEC"      },
        { ST_SYNCHRONOUS, "ST_SYNCHRONOUS" },
        { ST_MANDLOCK,    "ST_MANDLOCK"    },
        { ST_NOATIME,     "ST_NOATIME"     },
        { ST_NODIRATIME,  "ST_NODIRATIME"  },
        { ST_RELATIME,    "ST_RELATIME"    },
    };

    /* Append every other flag that is set */
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (f_flag & flags[i].bit)
            printf(" | %s", flags[i].name);
    }
    printf("\n");
}

int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : "/";
    struct statvfs sv;

    if (statvfs(path, &sv) == -1) { perror("statvfs"); return EXIT_FAILURE; }

    printf("Path: %s\n", path);
    decode_flags(sv.f_flag);
    printf("  Max name length: %lu\n", sv.f_namemax);
    return EXIT_SUCCESS;
}
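$ ./flags /       # typical rw,relatime root filesystem
Path: /
  Mount flags: rw | ST_RELATIME
  Max name length: 255

$ ./flags /proc
Path: /proc
  Mount flags: rw | ST_NOSUID | ST_NODEV | ST_NOEXEC | ST_RELATIME
  Max name length: 255

$ ./flags /boot   # ext4, mounted noatime
Path: /boot
  Mount flags: rw | ST_NOATIME
  Max name length: 255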

🔬 Linux-Specific: statfs() and Filesystem Type Detection

The Linux-specific statfs(2) syscall returns a struct statfs with an additional f_type field — a magic number identifying the filesystem type. This is useful when you need to branch on filesystem capabilities at runtime:

#include <stdio.h>
#include <sys/vfs.h>      /* Linux-only: statfs() */

/* Common filesystem magic numbers */
#define EXT4_SUPER_MAGIC   0xEF53
#define XFS_SUPER_MAGIC    0x58465342
#define TMPFS_MAGIC        0x01021994
#define BTRFS_SUPER_MAGIC  0x9123683E
#define NFS_SUPER_MAGIC    0x6969
#define FUSE_SUPER_MAGIC   0x65735546

const char *fstype_name(long type) {
    switch ((unsigned long)type) {
        case EXT4_SUPER_MAGIC:  return "ext2/3/4";
        case XFS_SUPER_MAGIC:   return "XFS";
        case TMPFS_MAGIC:       return "tmpfs";
        case BTRFS_SUPER_MAGIC: return "Btrfs";
        case NFS_SUPER_MAGIC:   return "NFS";
        case FUSE_SUPER_MAGIC:  return "FUSE";
        default:                return "unknown";
    }
}

int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : "/";
    struct statfs sf;

    if (statfs(path, &sf) == -1) { perror("statfs"); return 1; }

    printf("Path: %s\n", path);
    printf("  f_type:   0x%lX  (%s)\n",
           (unsigned long)sf.f_type, fstype_name(sf.f_type));
    printf("  f_bsize:  %ld\n", (long)sf.f_bsize);                   /* __fsword_t: int on some 32-bit ABIs */
    printf("  f_blocks: %llu\n", (unsigned long long)sf.f_blocks);   /* may be 64-bit */
    return 0;
}
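Portability: statfs() / struct statfs are Linux-specific (not POSIX), and POSIX offers no portable way to obtain a filesystem's type. On Linux, if a type name is enough, you can instead find the path's mount entry in /proc/mounts (the mnt_type field returned by getmntent()). Note that /proc/filesystems only lists the filesystem types the kernel supports, not the type of a given path.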

⚠️ Inode Exhaustion — A Hidden “Disk Full” Scenario

A filesystem can run out of inodes while still having plenty of free block space. This causes ENOSPC on file creation even when df shows gigabytes available — a notoriously confusing situation.

❌ Disk full error despite free space
$ touch /var/spool/mail/newfile
touch: cannot touch '/var/spool/mail/newfile': No space left on device
df -h /var shows 40% used, but df -i /var shows 100% inode usage!

✅ Check inodes with df -i or statvfs
$ df -i /var
Use f_favail from statvfs() for programmatic checks. Mail spools, cache directories, and /tmp are common culprits.
#include <stdio.h>
#include <sys/statvfs.h>

/* Check BOTH block space AND inode availability */
int has_resources(const char *path,
                  unsigned long long need_bytes,
                  unsigned long need_inodes) {
    struct statvfs sv;
    if (statvfs(path, &sv) == -1) { perror("statvfs"); return 0; }

    unsigned long long avail_bytes  = (unsigned long long)sv.f_bavail * sv.f_frsize;
    unsigned long long avail_inodes = (unsigned long long)sv.f_favail;

    if (avail_bytes < need_bytes) {
        fprintf(stderr, "Insufficient block space on %s\n", path);
        return 0;
    }
    /* f_files == 0: the FS reports no fixed inode table (e.g. Btrfs),
       so there is no inode limit to check */
    if (sv.f_files != 0 && avail_inodes < need_inodes) {
        fprintf(stderr, "Insufficient inodes on %s "
                "(need %lu, have %llu)\n",
                path, need_inodes, avail_inodes);
        return 0;
    }
    return 1;
}

🎯 Interview Questions: statvfs() and Filesystem Statistics

Q1. What is the difference between f_bfree and f_bavail in struct statvfs?
f_bfree is the total number of free blocks on the filesystem, including blocks reserved for privileged (root) use. f_bavail is the subset of free blocks available to non-privileged processes. The difference equals the reserved block count (typically 5% on ext4). User-space applications should always check f_bavail, not f_bfree, to determine whether they can write data — otherwise a write may succeed in the check but fail with ENOSPC when the only remaining free blocks are in the reserved pool.
Q2. What does f_frsize represent, and how does it differ from f_bsize?
f_frsize is the fundamental filesystem block size — the unit in which all block counts (f_blocks, f_bfree, f_bavail) are expressed. To convert block counts to bytes, multiply by f_frsize. f_bsize is the preferred I/O transfer size, which may be a multiple of f_frsize for performance reasons (common on network filesystems like NFS). On local filesystems like ext4, both are typically equal to the filesystem block size (e.g., 4096 bytes). Always use f_frsize for capacity calculations.
Q3. How does df(1) implement its output? What kernel interface does it use?
df reads the list of mounted filesystems (for example /proc/mounts or /etc/mtab via getmntent()), then calls statvfs() on each mount directory. For each filesystem it computes: total = f_blocks × f_frsize, used = (f_blocks − f_bfree) × f_frsize, and available = f_bavail × f_frsize. GNU df reports Use% as used / (used + available) × 100, rounded up, so reserved blocks are excluded from the denominator and Use% can reach 100% while f_bfree is still nonzero. The -i flag switches to inode reporting using f_files, f_ffree, and f_favail. On Linux there is no separate statvfs system call: glibc implements statvfs() on top of statfs(2).
Q4. How would a program check if a filesystem is mounted read-only before attempting a write?
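Call statvfs(path, &sv) and test the f_flag field for the ST_RDONLY bit: if (sv.f_flag & ST_RDONLY) { /* read-only */ }. This is more direct than parsing /proc/mounts: the kernel reports the flags of the mount that actually contains the path (per-mount flags, such as a read-only bind mount, combined with the superblock's flags), and there is no text to parse. It is still only a snapshot, though: the filesystem can be remounted between the check and the write, so the write itself must still handle EROFS.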
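The read-only test from Q4, wrapped in a small helper. A minimal sketch with a hypothetical name, is_read_only(); it uses only ST_RDONLY, which POSIX defines, so no feature-test macro is needed.

```c
#include <sys/statvfs.h>

/* Hypothetical helper: 1 if the filesystem holding 'path' is mounted
   read-only, 0 if it is writable, -1 on error (errno set by statvfs()). */
int is_read_only(const char *path) {
    struct statvfs sv;
    if (statvfs(path, &sv) == -1)
        return -1;
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

Treat the answer as a hint for early, friendly error messages; the write path must still handle EROFS.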
Q5. A process gets ENOSPC even though df shows 30% disk space free. What is the likely cause?
Inode exhaustion. The filesystem has free data blocks but has run out of inode slots. This happens in directories with millions of small files (mail spools, cache directories, session stores). Verify with df -i, or check whether f_favail returned by statvfs() is 0. The fix is either to delete files to free inodes, or to recreate the filesystem with a lower bytes-per-inode ratio (mkfs.ext4 -i <bytes>). On XFS and Btrfs, inodes are allocated dynamically so this is far less common.
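For the inode side of Q5, a df -i style usage figure can be computed from f_files and f_ffree. A minimal sketch with a hypothetical helper name, inode_use_pct(); it rounds up, like the block Use% in GNU df, and returns -1 when the filesystem reports no fixed inode count (Btrfs, for example, reports 0 in f_files).

```c
#include <sys/statvfs.h>

/* Hypothetical helper: inode usage in percent (0..100, rounded up),
   or -1 when f_files is 0 and there is no fixed inode table. */
int inode_use_pct(const struct statvfs *sv) {
    if (sv->f_files == 0)
        return -1;
    unsigned long long total = sv->f_files;
    unsigned long long used  = total - sv->f_ffree;
    /* Ceiling division, so any nonzero usage shows as at least 1% */
    return (int)((used * 100 + total - 1) / total);
}
```

With the inode figures from Example 1's sample output (6553600 total, 6201044 free), it returns 6.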
Q6. What is the difference between statvfs() and the Linux-specific statfs()?
statvfs() is the POSIX-portable API (declared in <sys/statvfs.h>), returning a struct statvfs with standardized fields. It does not expose the filesystem type. statfs() is Linux-specific (declared in <sys/vfs.h>), returning a struct statfs that includes the f_type magic number identifying the filesystem type (e.g., 0xEF53 for ext4, 0x58465342 for XFS). Use statvfs() for portable code; use statfs() when you specifically need to branch on filesystem type at runtime on Linux.
Q7. What is the advantage of fstatvfs() over statvfs() when checking space before a write?
fstatvfs() takes an open file descriptor rather than a path. This guarantees the statistics come from the exact filesystem on which the file resides, with no risk of path resolution ambiguity (e.g., a symlink redirecting to a different mount). It also avoids a second kernel path traversal. The idiomatic pattern is: open the destination file, call fstatvfs(fd, &sv) to check f_bavail, and proceed with writes only if sufficient space exists, all using the same fd. The check is still advisory: other processes may consume space afterwards, so the writes must handle ENOSPC.
