close() & lseek()
Intermediate
4 of 7
Closing Files and Moving Around in Them
When you finish working with a file, you should close it using close(). This sounds trivial, but improper closing is a common source of subtle bugs. After that, we explore lseek() — the system call that lets you jump to any position in a file, enabling random access rather than just reading from beginning to end.
The close() System Call
#include <unistd.h>
int close(int fd);
/* Returns:
0 on success
-1 on error (errno is set) */
Just one parameter — the file descriptor to close. Simple to call, but important to understand properly.
When you call close(fd), the kernel does several things:
- Marks the file descriptor number as free, so it can be reused by a future open()
- Decrements the internal reference count on the open file description. If the count reaches zero, the kernel frees all the internal memory it was using to track the file
- If file locking was used, releases any locks held on the file
- For some network filesystems, flushes pending writes to the server
Think of it like returning a library book — you hand it back, the library marks the book as available again, and your name is removed from the borrower record.
It is true that when your program exits, the OS automatically closes all open file descriptors. But you should not rely on this, for three reasons:
Every process has a limit on how many file descriptors it can have open at once (typically 1024 by default). If a long-running program (like a web server) opens files but never closes them, it will eventually run out of file descriptors and fail to open any more files. This is called a file descriptor leak.
For Network File System (NFS) mounts, a failed write might not be detected until close() is called. If you never check the return value of close(), you could miss the fact that your data was never actually saved on the server.
Closing files when you are done with them makes your code more readable and predictable. It is also safer when other programmers modify the code later — they know exactly when resources are freed.
Many programmers write close(fd); and ignore the return value. This is a mistake: close() can fail, and when it does, data you wrote earlier may not have been saved.
/* WRONG — ignoring the return value */
close(fd);
/* CORRECT — always check for errors */
if (close(fd) == -1) {
perror("close");
/* Handle the error — possibly log it or exit */
}
Common errors from close():
- EBADF — The file descriptor is not valid (already closed, or was never opened). This usually indicates a bug in your code — double-closing a file descriptor.
- EIO — An I/O error occurred. For NFS, this can mean the data was never actually written to disk.
Closing a file descriptor twice is one of the most dangerous bugs in multithreaded programs. Here is why:
Suppose one thread closes fd 5 and, because of a bug, will later close it again. After the first close, the number 5 is free to be recycled. Another thread opens a new file and gets fd 5. Then the first thread's second close() closes the wrong file! Use proper synchronization and careful ownership of file descriptors to avoid this.
The lseek() System Call — Random File Access
Every open file has a file offset — a number that says “the next read or write starts at this byte position”. Think of it like a bookmark in a book — it marks where you currently are.
When you first open a file, the offset is set to 0 (the beginning). Every time you call read() or write(), the offset automatically moves forward by the number of bytes you read or wrote.
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
/* Returns:
new file offset (a non-negative number) on success
-1 on error (errno is set, e.g. to ESPIPE for pipes/sockets) */
The three parameters:
- fd — file descriptor
- offset — how many bytes to move (can be negative)
- whence — the reference point for the offset
The l in lseek stands for long — historically the offset was typed as a long integer. Today it uses off_t which is typically a 64-bit integer on modern systems.
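The whence argument tells lseek where to measure the offset from: SEEK_SET (the beginning of the file), SEEK_CUR (the current offset), or SEEK_END (the end of the file). Some common patterns: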
/* Jump to the very beginning of the file */
lseek(fd, 0, SEEK_SET);
/* Get the current file position without moving */
off_t current_pos = lseek(fd, 0, SEEK_CUR);
/* Jump to the end of the file */
lseek(fd, 0, SEEK_END);
/* Get the size of the file */
off_t size = lseek(fd, 0, SEEK_END); /* jump to end */
lseek(fd, 0, SEEK_SET); /* jump back to start */
printf("File size: %ld bytes\n", (long)size);
/* Go 10 bytes before the end */
lseek(fd, -10, SEEK_END);
/* Skip forward 100 bytes from current position */
lseek(fd, 100, SEEK_CUR);
/* Always check the return value! */
if (lseek(fd, 50, SEEK_SET) == -1) {
perror("lseek");
}
An important performance detail: calling lseek() does not cause any disk access. It only adjusts a number in the kernel’s in-memory file table. The actual disk read or write happens only when you call read() or write() afterward.
This means repeated lseek() calls are cheap: each one costs only the overhead of a system call, with no I/O behind it.
lseek() only works on seekable file types — that is, files where you can jump around. It does NOT work on:
- Pipes — data flows in one direction; you cannot go back
- FIFOs (named pipes) — same reason
- Sockets — network data is not random-access
- Some terminal devices — they are sequential by nature
If you call lseek() on one of these, it returns -1 and sets errno to ESPIPE (illegal seek).
File Holes — Seeking Past the End of a File
Here is something fascinating: you can use lseek() to move the file offset past the end of the file. If you then write data there, the gap between the old end and the new write is called a file hole.
File holes make it possible to have very large sparse files — files that appear huge but take up very little disk space because most of the content is zero-filled holes.
Common real-world uses of sparse files include:
- Virtual machine disk images — a VM disk file might be “100 GB” but only use 10 GB of actual disk space because most of the virtual disk is empty (holes)
- Database files — databases pre-allocate large files but only write to the parts they actually use
- Core dump files — when a program crashes, the dump contains large areas of zero-filled memory that become holes
- Torrent files being downloaded — some download clients create a file of the full size upfront, then fill in pieces as they arrive
Most native Linux filesystems (ext4, xfs, btrfs) support sparse files. But some filesystems — notably Microsoft’s FAT/VFAT — do not. On those, when you create a sparse file, the filesystem fills all the holes with actual null bytes on disk, consuming the full space.
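#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
    int fd = open("sparse.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }
    /* Write 3 bytes at position 0 */
    if (write(fd, "ABC", 3) != 3) { perror("write"); return 1; }
    /* Jump to position 100,000 */
    if (lseek(fd, 100000, SEEK_SET) == -1) { perror("lseek"); return 1; }
    /* Write 3 bytes at position 100,000 */
    if (write(fd, "XYZ", 3) != 3) { perror("write"); return 1; }
    if (close(fd) == -1) { perror("close"); return 1; }
    /* sparse.dat is 100,003 bytes long
       but uses almost no disk space: positions 3-99999 are a hole */
    return 0;
}
/* Verify with:
   ls -lh sparse.dat → shows 98K apparent size
   du -h sparse.dat → shows only about 8K actual disk usage
                      (the exact figure depends on the filesystem) */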
Putting It All Together — An Interactive seek Demo
A classic teaching program demonstrates read(), write(), and lseek() together. You pass commands on the command line:
/* Usage: seek_io <filename> [commands...]
Commands:
s<N> = seek to byte N from start
r<N> = read N bytes (print as text)
R<N> = read N bytes (print as hex)
w<str> = write string str at current position
Example session:
$ touch testfile
$ ./seek_io testfile s0 wabc s0 r3
→ seeks to 0, writes "abc", seeks back to 0, reads 3 bytes → prints "abc"
$ ./seek_io testfile s100000 wabc
$ ./seek_io testfile s10000 R5
→ seeks to 10000 (inside the hole), reads 5 bytes
→ prints: 00 00 00 00 00 (all zeros — the hole!)
*/
The hex output of all zeros perfectly demonstrates that reading from a file hole returns null bytes, even though no disk space was used to store those zeros.
Key Takeaways from This Post
- close() frees the file descriptor number for reuse and releases kernel resources
- Always check the close() return value: errors like EIO can mean data was lost
- Never close the same fd twice in multithreaded programs; it is a dangerous race condition
- Every open file has a file offset (like a bookmark) that advances with each read/write
- lseek() lets you jump to any byte position in a file: the beginning, the middle, or even past the end
- SEEK_SET = from the beginning, SEEK_CUR = from the current offset, SEEK_END = from the end of the file
- lseek() does not cause disk access; it only moves a number in the kernel
- Seeking past the end and writing creates a file hole; reads from holes return zero bytes
- Holes save disk space: they are not physically stored on most native Linux filesystems
Up Next: ioctl() and the Universal I/O Summary
The final topic covers ioctl() — the “escape hatch” for device operations that do not fit the standard model — plus a full summary of everything covered in this series.
