close() and lseek() — Closing Files and Navigating Within Them

Why closing a file properly matters, and how to jump to any position inside a file using lseek()

Topic: close() & lseek()
Level: Intermediate
Part: 4 of 7
Keywords: close(), lseek(), file offset, SEEK_SET, SEEK_CUR, SEEK_END, file hole, sparse file, random access

Closing Files and Moving Around in Them

When you finish working with a file, you should close it using close(). This sounds trivial, but improper closing is a common source of subtle bugs. After that, we explore lseek() — the system call that lets you jump to any position in a file, enabling random access rather than just reading from beginning to end.

The close() System Call

Function Signature

#include <unistd.h>

int close(int fd);

/* Returns:
    0   on success
   -1   on error (errno is set) */

Just one parameter — the file descriptor to close. Simple to call, but important to understand properly.

What Does close() Actually Do?

When you call close(fd), the kernel does several things:

Marks the file descriptor number as free — it can be reused for a future open()
Decrements the internal reference count on the open file description. If the count reaches zero, the kernel frees all the internal memory it was using to track the file
If the process holds POSIX record locks (set with fcntl()) on the file, releases them, even if the process still has other descriptors open for the same file
For some network filesystems, flushes pending writes to the server

Think of it like returning a library book — you hand it back, the library marks the book as available again, and your name is removed from the borrower record.

Why You Must Close Files Explicitly

Yes, when your program exits, the OS automatically closes all open file descriptors. But you should not rely on this. Here is why:

Reason 1: File Descriptor Leaks

Every process has a limit on how many file descriptors it can have open at once (the default soft limit on Linux is commonly 1024). If a long-running program (like a web server) opens files but never closes them, it will eventually run out of file descriptors and fail to open any more files. This is called a file descriptor leak.

Reason 2: NFS and Network Errors

For Network File System (NFS) mounts, a failed write might not be detected until close() is called. If you never check the return value of close(), you could miss the fact that your data was never actually saved on the server.

Reason 3: Code Readability and Safety

Closing files when you are done with them makes your code more readable and predictable. It is also safer when other programmers modify the code later — they know exactly when resources are freed.

Always Check the Return Value of close()

Many programmers write close(fd); and ignore the return value. This is a mistake. close() can fail, and when it reports an I/O error, data you wrote earlier may never have reached the disk or server.

/* WRONG — ignoring the return value */
close(fd);

/* CORRECT — always check for errors */
if (close(fd) == -1) {
    perror("close");
    /* Handle the error — possibly log it or exit */
}

Common errors from close():

EBADF — The file descriptor is not valid (already closed, or was never opened). This usually indicates a bug in your code — double-closing a file descriptor.
EIO — An I/O error occurred. For NFS, this can mean the data was never actually written to disk.

The Double Close Bug — A Sneaky Race Condition

Closing a file descriptor twice is one of the most dangerous bugs in multithreaded programs. Here is why:

Double Close Race Condition

Thread 1                                 Thread 2
──────────────────────────────────────────────────────────────────────────
close(fd=5) ───────────► fd 5 is now FREE
                                         open("config.txt") ──► gets fd=5
close(fd=5) AGAIN ─────► CLOSES Thread 2's config.txt!!
                                         Thread 2 tries to read fd=5 ──► EBADF or WRONG FILE

After the first close, fd 5 is recycled. Another thread opens a new file and gets fd 5. Then the first thread’s second close() closes the wrong file! Use proper synchronization and careful ownership of file descriptors to avoid this.

The lseek() System Call — Random File Access

Understanding the File Offset

Every open file has a file offset — a number that says “the next read or write starts at this byte position”. Think of it like a bookmark in a book — it marks where you currently are.

When you first open a file, the offset is set to 0 (the beginning). Every time you call read() or write(), the offset automatically moves forward by the number of bytes you read or wrote.

File Offset — Moving Through a File

File content: "Hello, World! How are you?"   (26 bytes)

Just opened:
  "Hello, World! How are you?"
   ↑ Offset = 0

After read(fd, buf, 7): read "Hello, "
  "Hello, World! How are you?"
          ↑ Offset = 7

After read(fd, buf, 6): read "World!"
  "Hello, World! How are you?"
                ↑ Offset = 13

After read(fd, buf, 100): read " How are you?" (only 13 bytes remain)
  Returns 13; the offset is now 26, and the next call returns 0 (EOF)

lseek() Function Signature

#include <unistd.h>

off_t lseek(int fd, off_t offset, int whence);

/* Returns:
    new file offset (a non-negative number)  on success
   -1   on error (errno is set to ESPIPE for pipes/sockets) */

The three parameters:

fd — file descriptor
offset — how many bytes to move (can be negative)
whence — the reference point for the offset

The l in lseek stands for long: historically the offset was typed as a long integer. Today it is an off_t, which is typically a 64-bit integer on modern systems.

The whence Argument — Three Reference Points

The whence argument tells lseek where to measure the offset from:

SEEK_SET, SEEK_CUR, SEEK_END — Visual Guide

File: [Byte 0][Byte 1][Byte 2] … [Byte N-2][Byte N-1][past end]

  SEEK_SET,  0  → Byte 0    (start)
  SEEK_END, -2  → Byte N-2  (2nd last)
  SEEK_END, -1  → Byte N-1  (last byte)
  SEEK_END,  0  → past end

SEEK_SET measures from the BEGINNING of the file (offset 0 = start)
  lseek(fd, 0, SEEK_SET)     → jump to start of file
  lseek(fd, 50, SEEK_SET)    → jump to byte 50

SEEK_CUR measures from the CURRENT offset position
  lseek(fd, 0, SEEK_CUR)     → don't move, just get current position
  lseek(fd, 10, SEEK_CUR)    → move 10 bytes forward
  lseek(fd, -5, SEEK_CUR)    → move 5 bytes backward

SEEK_END measures from PAST the last byte of the file
  lseek(fd, 0, SEEK_END)     → just past the last byte (for appending)
  lseek(fd, -1, SEEK_END)    → the last byte of the file
  lseek(fd, -10, SEEK_END)   → 10 bytes before end of file
  lseek(fd, 100, SEEK_END)   → 100 bytes past end (a write here creates a file hole!)

Practical lseek() Examples

/* Jump to the very beginning of the file */
lseek(fd, 0, SEEK_SET);

/* Get the current file position without moving */
off_t current_pos = lseek(fd, 0, SEEK_CUR);

/* Jump to the end of the file */
lseek(fd, 0, SEEK_END);

/* Get the size of the file */
off_t size = lseek(fd, 0, SEEK_END);  /* jump to end */
lseek(fd, 0, SEEK_SET);               /* jump back to start */
printf("File size: %lld bytes\n", (long long)size);  /* off_t may be wider than long */

/* Go 10 bytes before the end */
lseek(fd, -10, SEEK_END);

/* Skip forward 100 bytes from current position */
lseek(fd, 100, SEEK_CUR);

/* Always check the return value! */
if (lseek(fd, 50, SEEK_SET) == -1) {
    perror("lseek");
}

lseek() Does Not Touch Disk

An important performance detail: calling lseek() does not cause any disk access. It only adjusts a number in the kernel’s in-memory file table. The actual disk read or write happens only when you call read() or write() afterward.

This means lseek() is cheap. Each call still costs a system call (a switch into the kernel and back), but no file data is read or written.

Where lseek() Cannot Be Used

lseek() only works on seekable file types — that is, files where you can jump around. It does NOT work on:

Pipes — data flows in one direction; you cannot go back
FIFOs (named pipes) — same reason
Sockets — network data is not random-access
Some terminal devices — they are sequential by nature

If you call lseek() on one of these, it returns -1 and sets errno to ESPIPE (illegal seek).

File Holes — Seeking Past the End of a File

What Happens When You Seek Past EOF and Write?

Here is something fascinating: you can use lseek() to move the file offset past the end of the file. If you then write data there, the gap between the old end and the new write is called a file hole.

File Hole — What It Looks Like

Sequence of operations:
  1. File starts empty: size = 0
  2. write(fd, "ABC", 3)        → file = [A][B][C], size = 3
  3. lseek(fd, 100, SEEK_SET)   → offset jumps to position 100
  4. write(fd, "XYZ", 3)        → writes at position 100

Resulting file layout (size = 103 bytes):

  Position:  0   1   2   3   4  …  99  100 101 102
  Content:  [A] [B] [C] [0] [0] … [0]  [X] [Y] [Z]
                         ↑_____________↑
                  These are NULL bytes (zeros)
                  This is the FILE HOLE: positions 3 to 99

If you read() from the hole, you get null bytes (bytes with value 0).
But a hole is not physically stored, so it takes no disk space. Filesystems allocate space in whole blocks, so the saving only appears for blocks that lie entirely inside a hole; a 97-byte hole like this one is too small to save anything.

Why File Holes Are Useful

File holes make it possible to have very large sparse files — files that appear huge but take up very little disk space because most of the content is zero-filled holes.

Common real-world uses of sparse files include:

Virtual machine disk images — a VM disk file might be “100 GB” but only use 10 GB of actual disk space because most of the virtual disk is empty (holes)
Database files — databases pre-allocate large files but only write to the parts they actually use
Core dump files — when a program crashes, the dump contains large areas of zero-filled memory that become holes
Torrent files being downloaded — some download clients create a file of the full size upfront, then fill in pieces as they arrive

Important Detail:

Most native Linux filesystems (ext4, xfs, btrfs) support sparse files. But some filesystems — notably Microsoft’s FAT/VFAT — do not. On those, when you create a sparse file, the filesystem fills all the holes with actual null bytes on disk, consuming the full space.

Practical Example — Creating a Sparse File

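#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Report which call failed, then exit */
static void die(const char *what) {
    perror(what);
    exit(EXIT_FAILURE);
}

int main(void) {
    int fd = open("sparse.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        die("open");

    /* Write 3 bytes at position 0 */
    if (write(fd, "ABC", 3) != 3)
        die("write");

    /* Jump to position 100,000 */
    if (lseek(fd, 100000, SEEK_SET) == (off_t)-1)
        die("lseek");

    /* Write 3 bytes at position 100,000 */
    if (write(fd, "XYZ", 3) != 3)
        die("write");

    if (close(fd) == -1)
        die("close");
    /* sparse.dat is 100,003 bytes long
       but uses almost no disk space; positions 3-99999 are a hole */

    return 0;
}

/* Verify with:
   ls -lh sparse.dat    → shows 98K apparent size
   du -h sparse.dat     → typically shows about 8K of actual disk usage */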

Putting It All Together — An Interactive seek Demo

What the seek_io Demo Program Does

A classic teaching program demonstrates read(), write(), and lseek() together. You pass commands on the command line:

/* Usage: seek_io <filename> [commands...]
   Commands:
   s<N>  = seek to byte N from start
   r<N>  = read N bytes (print as text)
   R<N>  = read N bytes (print as hex)
   w<str> = write string str at current position

Example session:
$ touch testfile
$ ./seek_io testfile s0 wabc s0 r3
   → seeks to 0, writes "abc", seeks back to 0, reads 3 bytes → prints "abc"

$ ./seek_io testfile s100000 wabc
$ ./seek_io testfile s10000 R5
   → seeks to 10000 (inside the hole), reads 5 bytes
   → prints: 00 00 00 00 00   (all zeros — the hole!)
*/

The hex output of all zeros perfectly demonstrates that reading from a file hole returns null bytes, even though no disk space was used to store those zeros.

Key Takeaways from This Post

close() frees the file descriptor number for reuse and releases kernel resources
Always check close() return value — errors like EIO can mean data was lost
Never close the same fd twice in multithreaded programs — it is a dangerous race condition
Every open file has a file offset (like a bookmark) that advances with each read/write
lseek() lets you jump to any byte position in a file: the beginning, the middle, or even past the end
SEEK_SET = from beginning, SEEK_CUR = from current, SEEK_END = from past-end
lseek() does not cause disk access — it only moves a number in the kernel
Seeking past the end and writing creates a file hole; reads from holes return null bytes (zeros)
Holes save disk space — they are not physically stored on most native Linux filesystems

Up Next: ioctl() and the Universal I/O Summary

The final topic covers ioctl() — the “escape hatch” for device operations that do not fit the standard model — plus a full summary of everything covered in this series.


embeddedpathashala.com