Why blocking I/O breaks at scale, and how Linux solves it

Alternative I/O Models — Part 1

The Real-World Problem

Imagine you are writing a chat server. One hundred users are connected, and each user might send a message at any time. With normal blocking read() in a single thread, your program would freeze waiting for one user and miss everyone else. This is not a bug; it is how blocking I/O works by design.

Alternative I/O models exist to answer one question: how can a single process efficiently monitor many file descriptors at the same time?

The Blocking I/O Problem

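With standard blocking I/O, a server using one thread per client looks like this:

Client 1 → Thread 1, blocked on read()
Client 2 → Thread 2, blocked on read()
Client N → Thread N, blocked on read()

Problem: With 10,000 clients you need 10,000 threads. At a common 8 MB default stack size, that is 10,000 × 8 MB = 80 GB of address space reserved for stacks alone. Linux commits physical memory only as each stack is actually touched, so resident usage is lower, but the memory, scheduling, and context-switching overhead of 10,000 threads still makes this impractical.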

The Alternative — Monitor All at Once

Alternative I/O models let one thread ask the kernel: “tell me which of these 10,000 fds are ready”. The program then performs I/O only on the fds that are ready.

Single process / single thread, watching:
   fd 3
   fd 4
   fd 5
   fd 6   ← READY
   fd 7
   … fd N
→ select/poll/epoll tells us fd 6 has data → we call read() on fd 6 → it does not block

Important Point — These APIs Don’t Do I/O

select(), poll(), and epoll do NOT read or write data. They only tell you which file descriptors are ready. After they return, you still call read() or write() yourself.

Think of them as a waiting room notification system — they say “your fd is ready”, and then you go do the actual work.

The Four Alternative I/O Models — Quick Comparison

Model              How It Works                                     Performance at Scale                                      POSIX Standard?
select()           Pass fd set on every call; kernel scans it       Poor: O(N) scan; fds must be below FD_SETSIZE (1024)      Yes
poll()             Pass fd array on every call; no fd-number limit  Medium: still an O(N) scan                                Yes
Signal-Driven I/O  Kernel sends SIGIO when an fd becomes ready      Good: kernel remembers the fds                            Partly (F_SETOWN is; O_ASYNC is not)
epoll              Register fds once; kernel tracks readiness       Best: wait cost scales with ready fds, not watched fds    No (Linux only)

What Does “Ready” Mean?

Ready for Reading
  • Data available in socket receive buffer
  • EOF reached (read returns 0)
  • A new connection waiting on listening socket
  • Error condition pending
Ready for Writing
  • Space available in socket send buffer
  • Pipe write end has room
  • Nonblocking write would not block
  • Error condition pending

The C10K Problem

In the late 1990s, Dan Kegel wrote a famous article asking: can a single web server handle 10,000 simultaneous clients? That is the “C10K problem” (C = connections, 10K = 10,000).

Handling that many clients requires one of the alternative I/O models in this chapter, because thread-per-client simply does not scale. Today the targets are C100K and C1M (a million connections), and on Linux epoll is the foundation that makes them possible.

1999: C10K problem described by Dan Kegel
2002: epoll added in Linux kernel 2.5.44
Today: nginx and Node.js (via libuv) use epoll on Linux to handle very large numbers of connections

Basic Flow — How Any of These Models Works

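/* General pattern used by all alternative I/O models */

1. Setup phase:
   - Open/create your file descriptors (sockets, pipes, terminals)
   - Put them in nonblocking mode (fcntl() with F_GETFL, then F_SETFL adding O_NONBLOCK)
   - Tell the kernel which fds to watch: build the fd set or array for
     select/poll (again on every call), register them once with
     epoll_ctl(), or enable signal-driven I/O

2. Event loop:
   while (1) {
       /* Block here until at least one fd is ready */
       ready_count = select() or poll() or epoll_wait();

       /* Check which fds are ready */
       for each ready fd {
           if (fd is readable)
               read(fd, buf, sizeof(buf));  /* should not block */
           if (fd is writable)
               write(fd, data, len);        /* should not block; may be partial */
       }
   }

3. Important rule:
   Once the monitoring API says an fd is ready, one I/O call on that fd
   should not block at that moment. Readiness is a hint, not a promise:
   another thread or process may consume it first, Linux select() can
   report spurious read readiness (see the BUGS section of select(2)),
   and a large write() can still fill the buffer. That is why monitored
   fds should be nonblocking, with EAGAIN treated as "not ready after all".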

Interview Questions

Q1: Why can’t we just create one thread per client?
Each thread needs its own stack (the default is commonly 2 MB to 8 MB on Linux). With 10,000 clients that reserves 20 GB to 80 GB of address space for stacks alone; physical memory is committed only as stacks are touched, but the memory, scheduling, and context-switching overhead of 10,000 threads still adds up. Alternative I/O models let one thread handle thousands of connections by monitoring all fds together and only doing work when an fd is actually ready.
Q2: Do select/poll/epoll read or write data?
No. They only tell you which file descriptors are ready for I/O. After they return you still need to call read() or write() yourself. They are monitoring tools, not I/O tools.
Q3: What is meant by “fd is ready”?
A file descriptor is ready for reading when an I/O call on it (like read()) would not block — meaning data is available, EOF is reached, or an error is pending. Ready for writing means the kernel has buffer space so write() would not block.
Q4: Which is fastest — select, poll, or epoll? Why?
epoll is the most efficient when monitoring a large number of file descriptors. select() and poll() require you to pass the full set of fds on every call, and the kernel rescans all of them each time, so each call costs O(N) in the number of fds watched. With epoll you register fds once with epoll_ctl(); the kernel keeps an interest list and a ready list, and epoll_wait() returns only the fds that are ready, so its cost scales with the number of ready fds rather than the number being watched.
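Q5: Can these techniques be used on regular files?
Not usefully. select() and poll() always report a regular file as ready for reading and writing, and epoll refuses to monitor one at all: epoll_ctl() fails with EPERM. This is not because disk reads never block (a read that misses the page cache still waits for the disk) but because readiness notification does not apply to regular files. These mechanisms are most useful with network sockets, pipes, FIFOs, terminals, and other character devices, where a read can block indefinitely waiting for input.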
