Linux Filesystem & I/O Model – linux system programming course

 

Chapter 03 — Linux Filesystem & I/O Model
The Linux directory hierarchy, all seven file types, hard vs symbolic links, pathnames, permissions, file descriptors, and why “everything is a file” matters.

Keywords

Directory Hierarchy, Root Directory, Inode, Hard Link, Symbolic Link, Absolute Path, Relative Path, rwx Permissions, File Descriptor, stdin stdout stderr, open() read() write(), stdio

The Single Directory Hierarchy

Linux maintains a single hierarchical directory structure for all files. Unlike Windows, where each drive has its own tree (C:\, D:\), everything on Linux lives under one root. USB drives, network shares, and additional disks are all mounted at directories (mount points) within this single tree.

The base is the root directory named /. Every file and directory on the entire system is a child or further descendant of root.

Typical Directory Structure
/                        ← root directory
├── bin/                 ← essential programs (ls, cp, bash)
├── boot/                ← kernel and bootloader (vmlinuz)
├── etc/                 ← system configuration files
│   ├── passwd           ← user accounts
│   └── group            ← group definitions
├── home/                ← user home directories
│   └── alice/
│       └── .bashrc
├── usr/                 ← user programs and libraries
│   └── include/
│       └── stdio.h
└── var/                 ← logs, mail, databases

The Seven File Types

Every file in the filesystem has a type. The first character in ls -l output shows the type:

1. Regular Files (-)

Ordinary data files — text, binaries, images, documents. The most common type.

2. Directories (d)

Special files whose contents are a table of filename-to-inode mappings. Every directory contains at least two entries: . (link to itself) and .. (link to its parent).

3. Symbolic Links (l)

A file containing the pathname of another file. The kernel transparently follows (dereferences) it when it is encountered in a path, except in calls that operate on the link itself, such as lstat() and unlink(). Shown as linkname -> target in ls -l output.

4. Character Devices (c)

Devices providing a byte-stream interface — terminals, serial ports. Example: /dev/tty.

5. Block Devices (b)

Devices providing random-access block I/O — hard disks, SSDs, USB drives. Example: /dev/sda.

6. FIFOs / Named Pipes (p)

A pipe with a filesystem name. Used for IPC between unrelated processes. Created with mkfifo.

7. Sockets (s)

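Endpoints for IPC over network or local connections. Sockets are created by the socket() system call; the s entry seen in ls -l output is a UNIX domain socket, which appears in the filesystem when the socket is bound to a pathname with bind().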
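The type characters above can be recovered programmatically from a file's mode. The following is a minimal sketch in Python, whose os and stat modules wrap the underlying lstat() call and the S_IS* type tests; the helper name file_type_char is hypothetical and chosen only for illustration.

```python
import os
import stat

# Pair each stat type test with the character that ls -l prints for it.
_TYPE_TESTS = [
    (stat.S_ISREG, "-"),   # regular file
    (stat.S_ISDIR, "d"),   # directory
    (stat.S_ISLNK, "l"),   # symbolic link
    (stat.S_ISCHR, "c"),   # character device
    (stat.S_ISBLK, "b"),   # block device
    (stat.S_ISFIFO, "p"),  # FIFO / named pipe
    (stat.S_ISSOCK, "s"),  # socket
]


def file_type_char(path):
    """Return the ls -l type character for path (hypothetical helper)."""
    # lstat() reports on a symbolic link itself instead of following it,
    # which is what lets the "l" case be detected at all.
    mode = os.lstat(path).st_mode
    for is_type, char in _TYPE_TESTS:
        if is_type(mode):
            return char
    return "?"
```

For example, file_type_char("/") should return "d", and on a typical Linux system file_type_char("/dev/null") should return "c".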

Hard Links vs Symbolic Links

Hard Links
  • A directory entry pointing directly to an inode (the kernel data structure holding file metadata and data pointers)
  • Every file starts with link count = 1; creating a hard link raises it to 2
  • Data is freed only when the link count reaches zero and no process still has the file open
  • Cannot span filesystems (inodes are filesystem-local)
  • Cannot point to directories
  • All hard links to the same file are equal — no “original” vs “copy”
ln original.txt hardlink.txt      # create hard link
ls -li original.txt hardlink.txt  # same inode number, link count = 2
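The same behaviour can be observed from a program. This is a small sketch, assuming Python's os.link(), os.stat(), and os.unlink() wrappers over the corresponding system calls; the function name hard_link_demo is hypothetical, and the file names simply mirror the shell example.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file plus one hard link and report what stat() shows."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hardlink = os.path.join(d, "hardlink.txt")
        with open(original, "w") as f:
            f.write("hello\n")

        os.link(original, hardlink)  # same effect as: ln original.txt hardlink.txt
        a, b = os.stat(original), os.stat(hardlink)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        count_with_two_names = a.st_nlink

        os.unlink(original)          # removes one name; the inode survives
        with open(hardlink) as f:
            data = f.read()          # data is still reachable via the other name
        count_after_unlink = os.stat(hardlink).st_nlink
        return same_inode, count_with_two_names, count_after_unlink, data
```

Neither name is special: removing original.txt only drops the link count, and the contents stay readable through hardlink.txt.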
Symbolic Links (Symlinks)
  • A separate file containing a pathname string
  • Kernel dereferences it transparently when processing a path
  • If the target is deleted, the symlink becomes dangling
  • Can span filesystems
  • Can point to directories
ln -s original.txt symlink.txt    # create symbolic link
ls -l symlink.txt
# lrwxrwxrwx 1 alice ... symlink.txt -> original.txt
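A short sketch of the same properties from a program, assuming Python's os.symlink(), os.readlink(), and os.lstat() wrappers; symlink_demo is a hypothetical name used only for this illustration.

```python
import os
import tempfile


def symlink_demo():
    """Create a symlink, inspect it, then delete its target."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "original.txt")
        link = os.path.join(d, "symlink.txt")
        with open(target, "w") as f:
            f.write("hello\n")

        # Same effect as: ln -s original.txt symlink.txt
        # The relative target is resolved against the link's own directory.
        os.symlink("original.txt", link)

        stored = os.readlink(link)  # the pathname string stored in the link
        # lstat() describes the link itself, stat() follows it to the target,
        # so the two report different inodes.
        separate_file = os.lstat(link).st_ino != os.stat(link).st_ino

        os.unlink(target)           # the link now points at nothing
        dangling = os.path.islink(link) and not os.path.exists(link)
        return stored, separate_file, dangling
```

The result shows that the link stores only a string, that it has its own inode, and that deleting the target leaves it dangling.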

Pathnames: Absolute vs Relative

Absolute Pathnames

Begin with / and specify the full location from the root directory. Always unambiguous regardless of where the process currently is.

/home/alice/.bashrc
/usr/include/stdio.h
/etc/passwd
Relative Pathnames

Do not begin with /. Interpreted relative to the process’s current working directory (CWD). Each process inherits its CWD from its parent; the shell changes it with cd.

# Assuming CWD is /usr
include/stdio.h        → resolves to /usr/include/stdio.h
../home/alice          → resolves to /home/alice  (.. = parent)
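The resolution rule can be sketched in a few lines. This is a purely lexical approximation using Python's posixpath module, and resolve is a hypothetical helper name. The kernel resolves each component against the real filesystem, so results can differ when a component is a symbolic link.

```python
import posixpath


def resolve(cwd, path):
    """Lexically resolve path against cwd (ignores symbolic links)."""
    # join() discards cwd when path is already absolute;
    # normpath() then collapses "." and ".." components.
    return posixpath.normpath(posixpath.join(cwd, path))


if __name__ == "__main__":
    for p in ("include/stdio.h", "../home/alice", "/etc/passwd"):
        print(p, "->", resolve("/usr", p))
```

With a CWD of /usr, the two relative examples above resolve to /usr/include/stdio.h and /home/alice, while an absolute path such as /etc/passwd is returned unchanged.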

File Permissions

Every file has an owner (UID) and a group (GID). The system divides users into three categories for access: owner, group, other. Three permission bits apply to each category:

The rwx Bits Explained
ls -l myfile.sh
-rwxr-x---  1  alice  devs  2048  May 1  myfile.sh
│└┬┘└┬┘└┬┘
│ │  │  └── other: no permissions
│ │  └───── group: read + execute
│ └──────── owner: read + write + execute
└────────── file type: regular file
  • r (read): on files, read the content; on directories, list the filenames inside.
  • w (write): on files, modify the content; on directories, add, remove, or rename entries (this also requires x on the directory).
  • x (execute): on files, run as a program; on directories, access files inside (also called search permission). Without x on a directory you cannot access any file inside it, even if you know the name.
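The nine permission bits are stored in the file's mode as three groups of three. The sketch below decodes them in Python, assuming the standard stat module; describe_mode and rwx are hypothetical helper names, and the set-user-ID, set-group-ID, and sticky bits are deliberately ignored.

```python
import stat


def rwx(bits):
    """Render one 3-bit permission group, e.g. 5 -> 'r-x'."""
    return (("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-"))


def describe_mode(mode):
    """Split a file mode into owner, group, and other permission strings."""
    perm = stat.S_IMODE(mode)  # drop the file-type bits
    return {
        "owner": rwx((perm >> 6) & 7),
        "group": rwx((perm >> 3) & 7),
        "other": rwx(perm & 7),
    }


if __name__ == "__main__":
    # 0o750 is the mode of myfile.sh in the listing above.
    print(describe_mode(0o750))
    print(stat.filemode(stat.S_IFREG | 0o750))
```

In practice the mode would come from os.stat(path).st_mode; stat.filemode() produces the full ten-character string that ls -l prints.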

Universality of I/O

One of Linux’s most important design principles: the same four system calls (open(), read(), write(), close()) work on regular files, terminals, pipes, sockets, and device files. Pipes and sockets are created with their own calls (pipe(), socket()) rather than open(), but once a descriptor exists, read(), write(), and close() apply to it in the same way. A program reading from stdin does not know or care whether stdin is a keyboard, a file, or a pipe. The kernel translates the request into the appropriate device-specific operation.
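To illustrate, here is a minimal sketch of a copy loop written only in terms of file descriptors, using Python's os.read() and os.write() wrappers. The function name copy_fd and the 4096-byte buffer size are assumptions made for the example. Nothing in the loop depends on what kind of object each descriptor refers to.

```python
import os


def copy_fd(src_fd, dst_fd, bufsize=4096):
    """Copy bytes from src_fd to dst_fd until end of file; return the count."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)
        if not chunk:          # an empty read signals end of file
            break
        total += len(chunk)
        while chunk:           # write() may accept fewer bytes than offered
            written = os.write(dst_fd, chunk)
            chunk = chunk[written:]
    return total
```

The same function works whether the descriptors come from os.open() on a regular file, from os.pipe(), or from a terminal. Calling copy_fd(0, 1), for instance, copies standard input to standard output, much as cat does with no arguments.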

File Descriptors

I/O system calls refer to open files via a file descriptor, a small non-negative integer. By convention, three are already open when a program starts, inherited from the shell or other parent process:

Standard File Descriptors
0   stdin   (standard input)
1   stdout  (standard output)
2   stderr  (standard error)

Shell I/O redirection works by rearranging these descriptors in the child process after fork() and before exec(), either by closing and reopening them or by opening the target file and duplicating it onto the standard descriptor with dup2(); the new program sees nothing unusual. In the C stdio library these correspond to the FILE* streams stdin, stdout, and stderr. Functions like printf(), fopen(), and fgets() are layered on top of the raw system call interface.
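The redirection mechanism can be sketched as follows. This is only an approximation of what a shell does for command > file, written with Python's os.fork(), os.dup2(), and os.execvp() wrappers; run_redirected is a hypothetical name, and the exit code 127 for a failed exec follows common shell convention rather than anything stated above.

```python
import os


def run_redirected(argv, out_path):
    """Run argv with descriptor 1 pointing at out_path; return its exit code."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)  # descriptor 1 now refers to the file
                os.close(fd)    # the extra descriptor is no longer needed
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)       # reached only if open or exec failed
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

A call such as run_redirected(["echo", "hello"], "out.txt") should leave the text in out.txt. The echo program simply writes to descriptor 1 and never learns that it is a file.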

Interview Questions

Q1: What is the difference between a hard link and a symbolic link?

Answer: A hard link is a directory entry pointing directly to an inode. All hard links share the same inode number; the file data is freed only when all links are deleted (link count reaches zero) and no process still has the file open. Hard links cannot cross filesystems and cannot point to directories. A symbolic link is a separate file containing a pathname string; the kernel follows it transparently. If the target is deleted, the symlink becomes dangling. Symlinks can cross filesystems and can point to directories. Use ls -li to distinguish them: hard links share the same inode number; symlinks show an arrow to their target.

Q2: What does the execute bit on a directory actually mean?

Answer: On a directory, execute means search permission — the ability to access files inside it. Without execute on a directory, you cannot access any file inside even if you know its full name and have permission on the file itself. Read permission on a directory only allows listing filenames. The kernel checks execute permission on every directory component in a pathname: to access /home/alice/file.txt, it checks x on /, then on /home, then on /home/alice, then read permission on file.txt itself.
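The per-component check can be approximated from user space. The sketch below walks the directory components of a path and asks os.access() about search permission on each. The function name search_permission_chain is hypothetical. os.access() tests the real rather than the effective user ID, so this only approximates the check the kernel performs during path resolution.

```python
import os


def search_permission_chain(path):
    """Return [(directory, has_search_permission)] for each directory in path."""
    path = os.path.abspath(path)
    # Every component except the last one is a directory to be searched.
    parts = [p for p in path.split(os.sep) if p][:-1]
    current = os.sep
    results = [(current, os.access(current, os.X_OK))]
    for part in parts:
        current = os.path.join(current, part)
        results.append((current, os.access(current, os.X_OK)))
    return results
```

For /home/alice/file.txt this reports on /, /home, and /home/alice in that order; a single False entry is enough to make the file unreachable through that path.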

Q3: What is universality of I/O and why does it matter?

Answer: Universality of I/O means the same four system calls — open(), read(), write(), close() — work on all file types: regular files, terminals, pipes, sockets, and device files. A program does not need to know what type of resource it reads from or writes to. This enables powerful compositions: a program reading from stdin works identically whether stdin is a keyboard, a file, or a pipe. It also means one consistent API covers everything, simplifying both programming and system design.

Q4: What are the three standard file descriptors and what do they represent?

Answer: 0 is stdin — default input source, normally the keyboard. 1 is stdout — default output destination. 2 is stderr — error and diagnostic messages, separate from stdout so they can be redirected independently. In an interactive shell all three are connected to the terminal. In the C library they correspond to FILE* streams stdin, stdout, and stderr. Shell redirection (> file, 2> err.log) works by closing and reopening these descriptors before exec(), transparently to the program.

Q5: What is the difference between an absolute and a relative pathname?

Answer: An absolute pathname starts with / and specifies the complete location from the root directory — it is unambiguous regardless of the process’s current directory. A relative pathname does not start with / and is interpreted relative to the process’s current working directory. The component .. refers to the parent directory and . to the current directory. Each process inherits its CWD from its parent; the shell changes it with cd. The same file can be referred to by many different relative paths depending on where the process currently is.

Continue to Chapter 04

Next: Processes, memory layout, fork and exec, credentials, capabilities, daemons, and shared libraries.
