ET Programming Framework & FD Starvation Prevention

 

Chapter 63 · Advanced epoll Patterns · EmbeddedPathashala
3 steps in the ET framework · EAGAIN: the drain signal · Round-robin: the starvation fix

Knowing that edge-triggered epoll requires you to drain the buffer until EAGAIN is one thing. But what happens when one of your file descriptors has a massive or never-ending stream of data? If you drain it greedily, all your other connected clients starve — they never get a chance to be processed.

This tutorial covers the correct general framework for writing edge-triggered epoll servers, and the specific technique for preventing file descriptor starvation under heavy single-fd load.

Key Concepts
ET Framework · FD Starvation · Nonblocking I/O · Round-Robin · Ready List · EAGAIN · EWOULDBLOCK · sigwaitinfo() · Timer Handling

The General ET Framework — 3 Steps

There is a well-defined pattern for building any server using edge-triggered epoll. Follow these three steps and your server will be correct:

ET epoll General Framework
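1. Make all monitored file descriptors nonblocking.
Set O_NONBLOCK on every fd before adding it to epoll: read the current flags with fcntl(fd, F_GETFL) and write them back with fcntl(fd, F_SETFL, flags | O_NONBLOCK), so that existing flags are preserved. This is mandatory. Without it, the final read() of your drain loop would block until more data arrives, stalling every other fd.
2. Build the interest list using epoll_ctl() with EPOLLET.
Register each fd with EPOLLIN | EPOLLET (or EPOLLOUT | EPOLLET for write readiness). This only needs to happen once per fd; call epoll_ctl() again with EPOLL_CTL_MOD only if the set of events you want for that fd changes.
3. Handle I/O in a loop: wait → drain until EAGAIN.
Call epoll_wait() to get ready fds. For each ready fd, call read()/write()/accept() in a loop until the call fails with EAGAIN or EWOULDBLOCK. That error means no further progress is possible right now (nothing left to read, no room left to write, or no pending connections), so go back to epoll_wait().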
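Steps 1 and 3 can be sketched as two small helpers. This is a minimal illustration, not a definitive implementation: the names make_nonblocking() and drain_fd() are hypothetical, and the 512-byte buffer is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Step 1: add O_NONBLOCK while preserving the fd's existing status flags.
 * Returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Step 3: read from fd until EAGAIN/EWOULDBLOCK.
 * Returns the total number of bytes consumed, or -1 on a real error.
 * *eof is set to 1 if the peer closed the stream, 0 otherwise. */
long drain_fd(int fd, int *eof)
{
    char buf[512];
    long total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;        /* a real server would process buf here */
        } else if (n == 0) {
            *eof = 1;          /* end of file: peer closed */
            return total;
        } else if (errno == EINTR) {
            continue;          /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;      /* drained: safe to return to epoll_wait() */
        } else {
            return -1;         /* genuine I/O error */
        }
    }
}
```

A caller would run drain_fd() once for each fd that epoll_wait() reports. Note that this is the greedy drain, which is exactly what the starvation discussion below warns about when one fd never runs dry.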

The FD Starvation Problem

Here is a scenario that breaks naive ET implementations. Imagine your server has 3 clients connected:

Starvation Scenario
  • fd=5 (Client A): streaming 1GB of data (effectively an endless stream)
  • fd=6 (Client B): small request waiting
  • fd=7 (Client C): small request waiting

What happens with naive ET handling:
epoll_wait() returns fd=5 as ready → your code starts draining fd=5 → reads 512 bytes, 512 more, 512 more… → fd=5 never returns EAGAIN because new data keeps arriving → fd=6 and fd=7 never get serviced → Clients B and C are starved

This problem is specific to edge-triggered notification. With LT mode, you can safely read just a bit from each fd and go back to epoll_wait(), because the kernel keeps reporting any fd that still has unread data. ET mode removes that safety net: a partially read fd is not reported again until new data arrives.
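The difference can be observed with a small experiment on a pipe. The sketch below is only an illustration: the function name reported_after_partial_read() is hypothetical, and a pipe stands in for a client socket.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Write 8 bytes into a pipe, read only 4 of them, then ask epoll (with a
 * zero timeout) whether the read end is still reported as ready.
 * extra_flags is 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the second epoll_wait() result (1 = reported again, 0 = silent),
 * or -1 if any setup step fails. */
int reported_after_partial_read(uint32_t extra_flags)
{
    int p[2], epfd, result = -1;
    char buf[4];
    struct epoll_event ev = {0}, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "12345678", 8) == 8 &&
        epoll_wait(epfd, &out, 1, 0) == 1 &&     /* first notification */
        read(p[0], buf, sizeof(buf)) == 4)       /* leave 4 bytes unread */
        result = epoll_wait(epfd, &out, 1, 0);   /* is the fd reported again? */

    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return result;
}
```

Called with 0 the function should return 1 (LT reports the leftover bytes again), and called with EPOLLET it should return 0 (ET stays silent until new data arrives).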

Note: starvation can also occur with signal-driven I/O for the same reason — it is also an edge-triggered mechanism.

Solution: Application-Managed Ready List with Round-Robin

The fix is to take control of scheduling yourself. Instead of fully draining each fd as soon as epoll_wait() reports it, you maintain your own application-level ready list and service each fd for a limited amount of work per round.

Anti-Starvation Loop Design
Step A (each loop iteration): Call epoll_wait(). If any fds are already in your application ready list, use a small or zero timeout so the call does not block; if the list is empty, it can block until something becomes ready. Add newly ready fds to your application ready list.
Step B (each fd's turn): For each fd in your ready list, perform a limited amount of I/O (for example, one read of a fixed buffer size). Use round-robin order rather than always starting from the same fd, and move to the next fd after each limited I/O operation.
Step C (fd done check): Remove an fd from your application ready list only when its read()/write() fails with EAGAIN or EWOULDBLOCK, meaning no more I/O is possible on it right now. From then on it waits for epoll to report it again.

Round-Robin Service — No Starvation
fd=5 (read 512B) → fd=6 (read 512B) → fd=7 (read 512B) → fd=5 (read 512B) → … continues
All clients get serviced. fd=5 gets EAGAIN once its buffered data has been consumed and leaves the ready list; epoll_wait() reports it again when more data arrives.

Bonus: What Else You Can Do In This Loop

The anti-starvation loop structure gives you a natural place to do other things besides I/O. This is one of the reasons experienced developers prefer this pattern even when starvation is not an immediate concern:

Timer Handling

Check and fire expired timers on each loop iteration, without needing a separate thread or alarm signal.

Signal Acceptance

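Read from a signalfd in the loop, or poll with sigtimedwait() and a zero timeout, to process pending signals safely between I/O operations. A plain sigwaitinfo() call blocks until a signal arrives, so use it only where blocking is acceptable. In every case the signals must first be blocked with sigprocmask().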

Connection Cleanup

Check for idle or timed-out connections in each pass and close them before they accumulate.

Code Example — Anti-Starvation ET Server

This complete example demonstrates the anti-starvation pattern. It maintains an application-level ready list and services each fd in round-robin fashion.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MAX_EVENTS   64
#define MAX_CLIENTS  256
#define PORT         8080
#define READ_CHUNK   512   /* Read this many bytes per round-robin turn */

/*
 * Application-level ready list.
 * We track which fds have been reported ready by epoll
 * but may still have data to read.
 */
int ready_list[MAX_CLIENTS];
int ready_count = 0;

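/*
 * Add fd to ready list if not already present.
 * Caveat: if the list is full the fd is silently dropped, and under ET it
 * would never be serviced again. A real server must cap the number of
 * clients at MAX_CLIENTS or grow this list.
 */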
void ready_list_add(int fd)
{
    for (int i = 0; i < ready_count; i++)
        if (ready_list[i] == fd) return;
    if (ready_count < MAX_CLIENTS)
        ready_list[ready_count++] = fd;
}

/* Remove fd from ready list */
void ready_list_remove(int fd)
{
    for (int i = 0; i < ready_count; i++) {
        if (ready_list[i] == fd) {
            ready_list[i] = ready_list[--ready_count];
            return;
        }
    }
}

void set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void)
{
    int server_fd, epfd, nfds;
    struct epoll_event ev, events[MAX_EVENTS];

    epfd = epoll_create1(0);

    /* Create and bind server socket */
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    set_nonblocking(server_fd);

    struct sockaddr_in addr = {
        .sin_family      = AF_INET,
        .sin_addr.s_addr = INADDR_ANY,
        .sin_port        = htons(PORT)
    };
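    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");
        return EXIT_FAILURE;
    }
    if (listen(server_fd, 128) == -1) {
        perror("listen");
        return EXIT_FAILURE;
    }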

    /* Register server fd with EPOLLET (edge-triggered) */
    ev.events  = EPOLLIN | EPOLLET;
    ev.data.fd = server_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, server_fd, &ev);

    printf("ET server listening on port %d\n", PORT);

    for (;;) {
        /*
         * If our application ready list already has fds waiting,
         * use timeout=0 so we quickly check for new fds and then
         * continue servicing the existing list.
         * If the list is empty, block until something becomes ready.
         */
        int timeout = (ready_count > 0) ? 0 : -1;

        nfds = epoll_wait(epfd, events, MAX_EVENTS, timeout);
        if (nfds == -1) {
            if (errno == EINTR)
                continue;   /* Interrupted by a signal: just retry */
            perror("epoll_wait");
            break;
        }

        /* Add newly ready fds to application ready list */
        for (int i = 0; i < nfds; i++) {
            int fd = events[i].data.fd;
            ready_list_add(fd);
        }

        /*
         * Service ready list in round-robin.
         * Do ONE limited read per fd per loop iteration.
         * This prevents any single fd from starving others.
         */
        int i = 0;
        while (i < ready_count) {
            int fd = ready_list[i];

            if (fd == server_fd) {
                /* Accept all pending connections */
                while (1) {
                    int client_fd = accept(server_fd, NULL, NULL);
                    if (client_fd == -1) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                            break;  /* No more pending connections */
                        perror("accept");
                        break;
                    }
                    set_nonblocking(client_fd);
                    ev.events  = EPOLLIN | EPOLLET;
                    ev.data.fd = client_fd;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
                    printf("New client: fd=%d\n", client_fd);
                }
                /* server_fd is fully drained — remove from ready list */
                ready_list_remove(fd);
                /* Don't advance i — next fd is now at position i */

            } else {
                /* Client fd — do ONE limited read */
                char buf[READ_CHUNK];
                ssize_t n = read(fd, buf, sizeof(buf));

                if (n > 0) {
                    /* Process data */
                    printf("fd=%d: read %zd bytes\n", fd, n);
                    /*
                     * Do NOT remove from ready list yet.
                     * There may be more data — we'll try again next round.
                     * Move to next fd (round-robin).
                     */
                    i++;

                } else if (n == 0) {
                    /* Client disconnected */
                    printf("fd=%d: disconnected\n", fd);
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                    ready_list_remove(fd);
                    /* Don't advance i */

                } else {
                    if (errno == EAGAIN || errno == EWOULDBLOCK) {
                        /* Buffer fully drained — remove from ready list */
                        printf("fd=%d: fully drained (EAGAIN)\n", fd);
                        ready_list_remove(fd);
                        /* Don't advance i */
                    } else {
                        perror("read");
                        epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                        close(fd);
                        ready_list_remove(fd);
                    }
                }
            }
        }

        /* Place to add: timer checks, signal processing, etc. */
    }

    close(epfd);
    close(server_fd);
    return 0;
}
Key design points in this code:

  • epoll_wait() uses timeout=0 when ready_count > 0, so it doesn’t block — it just picks up any new events and immediately continues servicing the existing list.
  • Each client fd gets at most one read() of up to READ_CHUNK bytes per pass, not a full drain.
  • An fd is removed from the ready list only when EAGAIN is received (buffer truly empty) or on disconnect/error.
  • Round-robin is achieved by i++ after a successful read — moving to the next fd rather than looping back to the same one.

Why Level-Triggered Does Not Have This Starvation Problem

With level-triggered mode you do not need the anti-starvation loop. Here is why:

LT Mode — No Starvation Risk

You read some data from fd=5. Go back to epoll_wait(). epoll will report fd=5, fd=6, and fd=7 all as ready (if they all have data). You naturally cycle through them. Partial reads on fd=5 are safe — the kernel keeps reporting it.

ET Mode — Starvation Risk

If you naively drain fd=5 until EAGAIN on every notification, other fds never get a turn. You must implement round-robin yourself. The kernel does not help you here — that’s the trade-off for ET’s efficiency.

This is why many event loops (Redis and libuv, for example) use LT mode by default, and why code that does use ET needs careful drain loops and starvation handling in place.

Interview Questions
Q1: What are the three mandatory steps when using edge-triggered epoll?

Answer: First, make all file descriptors nonblocking by adding O_NONBLOCK to their flags with fcntl() (F_GETFL, then F_SETFL). Second, register fds with epoll_ctl() using the EPOLLET flag. Third, in the event loop, for each fd returned by epoll_wait(), call read()/write()/accept() in a loop until you receive EAGAIN or EWOULDBLOCK before returning to epoll_wait().

Q2: What is file descriptor starvation in the context of edge-triggered epoll?

Answer: FD starvation occurs when one file descriptor (e.g., a client sending a large stream) keeps the server busy draining its buffer, while other file descriptors that also have data waiting never get serviced. Because ET mode requires full draining until EAGAIN, if one fd never returns EAGAIN, the server loops on it indefinitely.

Q3: How do you prevent fd starvation when using edge-triggered epoll?

Answer: Maintain an application-level ready list. When epoll_wait() reports ready fds, add them to your list. Then service each fd in the list for a limited amount of I/O per turn (e.g., one fixed-size read), cycling through all fds in round-robin order. Only remove an fd from the ready list when it returns EAGAIN. Also, use timeout=0 in epoll_wait() when the ready list is non-empty, so new events are picked up without blocking.

Q4: Why should the epoll_wait() timeout be 0 when fds are already in the ready list?

Answer: In ET mode, epoll does not report an fd again just because it still has unread data; a new event arrives only when new data does. If we set the timeout to -1 (block indefinitely) while fds in our ready list have not been fully drained, the loop could sleep with buffered data still waiting, possibly forever if no other fd becomes ready. With timeout=0, epoll_wait() returns immediately with any newly ready fds, and we then continue servicing our existing ready list. This keeps the loop fair without blocking it unnecessarily.

Q5: What other operations can be included in the anti-starvation event loop besides I/O?

Answer: The loop structure is a natural place to integrate timer expiry checks (fire callbacks for timers that have elapsed), signal handling via sigwaitinfo() or reading from a signalfd, connection timeout detection for idle clients, and other periodic maintenance tasks. This avoids the need for separate threads or signal handlers for these operations.

Q6: Does level-triggered epoll have a file descriptor starvation problem?

Answer: Not in the same way. With LT mode, you can read a limited amount from fd=5, return to epoll_wait(), and the kernel will report fd=5 again alongside fd=6 and fd=7 if they all have data. This natural reset at each epoll_wait() call gives other fds a fair chance. Partial reads are safe, although nonblocking descriptors are still advisable because a readiness notification can occasionally be spurious. Starvation from a single fd is not a concern because you are free to interleave calls to epoll_wait() with partial reads.

Q7: If a read() on an fd fails with EAGAIN in ET mode, what does that mean and what should you do?

Answer: EAGAIN (same as EWOULDBLOCK on Linux) means the kernel receive buffer for that fd is currently empty — there is no more data to read right now. This is the correct exit condition for your drain loop. You should remove the fd from your application ready list and return to epoll_wait(). The next notification for that fd will come when the remote side sends more data, at which point the kernel generates a new edge event.

Next: Waiting for Signals and File Descriptors Together

Learn about the classic race condition when using select() with signals, and how Linux solves it with signalfd.
