epoll — Practical Usage & Complete Example

 

Chapter 63 · Alternative I/O Models · Linux Programming Interface
📡 Topic: epoll I/O
🛠 Level: Intermediate
💻 Code: Full working demo

What is epoll?

epoll is a Linux system call mechanism that lets your program watch many file descriptors at the same time and get notified only when one of them is ready for I/O. It is much more efficient than older methods like select() or poll() when you have hundreds or thousands of connections.

Think of it like a security guard at an apartment building. Instead of knocking on every door every few seconds to ask “are you ready?”, the guard just waits at the front desk and each flat notifies the guard when something happens. That is exactly what epoll does for file descriptors.

Key Terms in This Tutorial
epoll_create(), epoll_ctl(), epoll_wait(), EPOLLIN, EPOLLHUP, EPOLLERR, interest list, ready list, epoll_event, EPOLL_CTL_ADD, FIFO

How epoll Works — The Big Picture

epoll uses three system calls:

epoll Three-Step Flow
Step 1: epoll_create() creates an epoll instance and returns its file descriptor, epfd.
Step 2: epoll_ctl() adds fds to, or removes them from, the interest list.
Step 3: epoll_wait() blocks until an fd becomes ready.

Interest List vs Ready List
Interest list (all fds you registered): fd 4 (FIFO p), fd 5 (FIFO q), fd 6 (…)
Ready list (only fds with pending events): fd 4 → EPOLLIN, fd 5 → EPOLLHUP
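To make the two lists concrete, here is a minimal sketch that uses anonymous pipes instead of FIFOs. The helper name count_ready_of_two() is made up for this illustration. Both read ends go into the interest list, but only one pipe receives data, so only that one should appear in the ready list.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read ends of two pipes, writes to only
 * the first one, and returns how many events epoll_wait() reports.
 * Returns -1 on any failure or if the wrong fd is reported. */
int count_ready_of_two(void)
{
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int result = -1;
    int epfd = epoll_create1(0);                  /* step 1: create the instance */
    if (epfd != -1) {
        struct epoll_event ev = { .events = EPOLLIN };

        /* step 2: both read ends join the interest list */
        ev.data.fd = a[0];
        int ok = epoll_ctl(epfd, EPOLL_CTL_ADD, a[0], &ev) == 0;
        ev.data.fd = b[0];
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, b[0], &ev) == 0;

        if (ok && write(a[1], "x", 1) == 1) {     /* only pipe a has data */
            struct epoll_event evlist[2];
            /* step 3: timeout 0 returns at once with the current ready list */
            result = epoll_wait(epfd, evlist, 2, 0);
            if (result == 1 && evlist[0].data.fd != a[0])
                result = -1;
        }
        close(epfd);
    }
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return result;
}
```

Called from a main() of your own, count_ready_of_two() should return 1: two fds are being watched, but only one is ready.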

The epoll_input Demo Program — Line by Line

The demo program opens multiple files (in this case FIFOs) and monitors all of them using epoll. Let us walk through exactly what it does.

Step 1 — Create the epoll instance

The program starts by calling epoll_create(). You pass a hint for the number of fds you plan to monitor (the kernel ignores the exact value since Linux 2.6.8, but it must be greater than 0).

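#include <sys/epoll.h>
#include <fcntl.h>
#include <errno.h>     /* errno, EINTR (used in the event loop) */
#include <stdio.h>     /* printf() */
#include <unistd.h>    /* read(), close() */
/* errExit() is the error-reporting helper from the book's tlpi_hdr.h */

#define MAX_BUF    1000   /* max bytes in one read() */
#define MAX_EVENTS 5      /* max events per epoll_wait() call */

/* argc - 1 is only a size hint, but it must be > 0,
   so at least one file name has to be given */
int epfd = epoll_create(argc - 1);
if (epfd == -1)
    errExit("epoll_create");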

epfd is now a file descriptor that represents the epoll instance. You use it in all future calls.
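The "must be greater than 0" rule is easy to check for yourself. The sketch below uses a made-up helper, epoll_create_errno(), that reports the errno from epoll_create() for a given size hint. It is an illustration only, not part of the demo program.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno produced by epoll_create(size),
 * or 0 if the call succeeded (the new epoll fd is closed again). */
int epoll_create_errno(int size)
{
    int epfd = epoll_create(size);
    if (epfd == -1)
        return errno;
    close(epfd);
    return 0;
}
```

With this helper, a size of 0 or a negative size should give EINVAL, while any positive value (1 is enough) should succeed.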

Step 2 — Register file descriptors (interest list)

For each file you want to monitor, you open it and add it to the epoll interest list using epoll_ctl() with EPOLL_CTL_ADD.

struct epoll_event ev;
int fd, numOpenFds;

for (int j = 1; j < argc; j++) {
    fd = open(argv[j], O_RDONLY);
    if (fd == -1)
        errExit("open");

    printf("Opened \"%s\" on fd %d\n", argv[j], fd);

    ev.events  = EPOLLIN;   /* monitor for input data */
    ev.data.fd = fd;        /* store fd so we know which one fired */

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        errExit("epoll_ctl");
}

numOpenFds = argc - 1;

The ev.data.fd field is your “tag” — when epoll reports an event, you read this field to find out which fd is ready. The kernel stores this value and gives it back to you unchanged.

Step 3 — The main event loop

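Now we loop, calling epoll_wait() repeatedly. Each call blocks until at least one fd in the interest list has an event pending (readable data, a hangup, or an error). The ready events are placed in the evlist array, and the return value tells you how many entries were filled in.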

struct epoll_event evlist[MAX_EVENTS];
char buf[MAX_BUF];
int ready, s;

while (numOpenFds > 0) {
    printf("About to epoll_wait()\n");

    ready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
    /* -1 timeout means block forever until an event occurs */

    if (ready == -1) {
        if (errno == EINTR)
            continue;   /* interrupted by signal — just retry */
        else
            errExit("epoll_wait");
    }

    printf("Ready: %d\n", ready);

    for (int j = 0; j < ready; j++) {
        printf("  fd=%d; events: %s%s%s\n",
               evlist[j].data.fd,
               (evlist[j].events & EPOLLIN)  ? "EPOLLIN "  : "",
               (evlist[j].events & EPOLLHUP)  ? "EPOLLHUP " : "",
               (evlist[j].events & EPOLLERR)  ? "EPOLLERR " : "");

        if (evlist[j].events & EPOLLIN) {
            /* data available — read it */
            s = read(evlist[j].data.fd, buf, MAX_BUF);
            if (s == -1)
                errExit("read");
            printf("  read %d bytes: %.*s\n", s, s, buf);

        } else if (evlist[j].events & (EPOLLHUP | EPOLLERR)) {
            /* writer closed the pipe / error occurred */
            /* only close if EPOLLIN is NOT also set;
               if both are set there may be more data to read first */
            printf("  closing fd %d\n", evlist[j].data.fd);
            if (close(evlist[j].data.fd) == -1)
                errExit("close");
            numOpenFds--;
        }
    }
}

printf("All file descriptors closed; bye\n");

Event Handling Decision Tree
epoll_wait() returns an event on fd X
EPOLLIN set
Call read() to consume data
EPOLLHUP or EPOLLERR set
(and EPOLLIN NOT set)
Close the fd, decrement counter

Important detail: If both EPOLLIN and EPOLLHUP are set at the same time, we read first. There might still be data in the buffer even though the writer has closed its end. We will see EPOLLHUP again on the next epoll_wait() call and close then.
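The read-first rule can be observed with an anonymous pipe, which behaves like a FIFO here. The following is a minimal sketch under that assumption, and hup_sequence() is a made-up name. It writes four bytes, closes the only write end, and records the event mask before and after the data is read.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: writes 4 bytes into a pipe, closes the write end, and
 * records the event masks from two successive epoll_wait() calls:
 * *before_read while the data is still buffered, *after_read once drained.
 * Returns 0 on success, -1 on failure. */
int hup_sequence(uint32_t *before_read, uint32_t *after_read)
{
    int p[2];
    char buf[16];
    struct epoll_event ev = { .events = EPOLLIN }, out;
    int rc = -1;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.data.fd = p[0];
    int ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0
          && write(p[1], "qqq\n", 4) == 4;
    close(p[1]);                        /* the only writer goes away */

    if (ok && epoll_wait(epfd, &out, 1, 0) == 1) {
        *before_read = out.events;      /* data buffered, writer gone */
        if (read(p[0], buf, sizeof buf) == 4 &&
            epoll_wait(epfd, &out, 1, 0) == 1) {
            *after_read = out.events;   /* buffer drained, hangup remains */
            rc = 0;
        }
    }
    close(epfd);
    close(p[0]);
    return rc;
}
```

On Linux the first mask should contain both EPOLLIN and EPOLLHUP, and the second only EPOLLHUP. That is why the demo closes on a later epoll_wait() call rather than in the same iteration as the read.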

Running the Demo — Step by Step Output Explained

The demo uses two FIFOs (named pipes). Here is what happens when you run it:

# Create two FIFOs first
$ mkfifo p q
# Start the monitoring program
$ ./epoll_input p q
# In another terminal: cat > p
Opened "p" on fd 4
# In another terminal: cat > q
Opened "q" on fd 5
About to epoll_wait()
# Suspend epoll_input, type ppp into p and qqq into q, close "cat > q" with Ctrl-D, then resume
About to epoll_wait()
Ready: 2
  fd=4; events: EPOLLIN
  read 4 bytes: ppp
  fd=5; events: EPOLLIN EPOLLHUP
  read 4 bytes: qqq
About to epoll_wait()
Ready: 1
  fd=5; events: EPOLLHUP
  closing fd 5
About to epoll_wait()
# Close "cat > p" with Ctrl-D
Ready: 1
  fd=4; events: EPOLLHUP
  closing fd 4
All file descriptors closed; bye

Notice that when both ppp and qqq were typed (while epoll_input was suspended), epoll_wait() returned both events at once (Ready: 2). A single epoll_wait() call can report several ready descriptors, so the program never needs one blocking call per fd.

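Event Timeline
Action                                  fd 4 (p)    fd 5 (q)             epoll_wait() returns
Type "ppp" and "qqq", close q           EPOLLIN     EPOLLIN + EPOLLHUP   2 events (both fds read)
Next epoll_wait(), q already drained    no event    EPOLLHUP             1 event (fd 5 closed)
Close p (Ctrl-D)                        EPOLLHUP    already closed       1 event (fd 4 closed)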

epoll Event Flags Explained
Flag           Meaning                              When you see it
EPOLLIN        Data is available to read            Writer sent data into pipe/FIFO/socket
EPOLLOUT       Space available to write             Socket send buffer has room
EPOLLHUP       Hangup: writer closed their end      All writers closed the pipe/FIFO
EPOLLERR       Error on the fd                      Something went wrong with the fd
EPOLLET        Edge-triggered mode                  Only notifies once per state change
EPOLLONESHOT   Notify once then disable             Must re-arm with EPOLL_CTL_MOD
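The last two flags change when an fd is reported, not what is reported. A small sketch can show the difference. The helper name second_wait_result() is invented for this example. It writes one byte to a pipe and calls epoll_wait() twice without reading in between.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers a pipe's read end with EPOLLIN plus
 * extra_flags, writes one byte, then calls epoll_wait() twice WITHOUT
 * reading in between. Returns the second call's result, or -1 on failure. */
int second_wait_result(uint32_t extra_flags)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN | extra_flags }, out;
    int result = -1;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 0) == 1)       /* first report */
            result = epoll_wait(epfd, &out, 1, 0);   /* the byte is still unread */
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

With extra_flags set to 0 (the default, level-triggered mode) the second call should still return 1, because the data is still there. With EPOLLET or EPOLLONESHOT it should return 0: edge-triggered mode stays quiet until new data arrives, and a one-shot registration stays disabled until it is re-armed with EPOLL_CTL_MOD.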

Complete Working Example — epoll on Multiple FIFOs

Below is a complete, self-contained program you can compile and run. It monitors multiple FIFOs and handles hangup events properly.

/*
 * epoll_monitor.c
 * Monitor multiple FIFOs using epoll.
 *
 * Compile:  gcc -o epoll_monitor epoll_monitor.c
 * Usage:    mkfifo fifo1 fifo2
 *           ./epoll_monitor fifo1 fifo2
 *           (in other terminals: echo "hello" > fifo1)
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>    /* uint32_t */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>

#define MAX_BUF    1000
#define MAX_EVENTS 10

static void die(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s fifo1 [fifo2 ...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* 1. Create epoll instance */
    int epfd = epoll_create1(0);   /* epoll_create1 is the modern version */
    if (epfd == -1)
        die("epoll_create1");

    /* 2. Open each FIFO and add to interest list */
    int numOpenFds = 0;

    for (int j = 1; j < argc; j++) {
        /* Open in non-blocking mode so open() doesn't block
           waiting for a writer (optional but good practice) */
        int fd = open(argv[j], O_RDONLY | O_NONBLOCK);
        if (fd == -1)
            die("open");

        printf("Opened '%s' on fd %d\n", argv[j], fd);

        struct epoll_event ev;
        ev.events  = EPOLLIN;   /* watch for readable data */
        ev.data.fd = fd;

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
            die("epoll_ctl: EPOLL_CTL_ADD");

        numOpenFds++;
    }

    /* 3. Event loop */
    struct epoll_event evlist[MAX_EVENTS];
    char buf[MAX_BUF];

    while (numOpenFds > 0) {
        printf("Waiting for events...\n");

        int ready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
        if (ready == -1) {
            if (errno == EINTR)
                continue;    /* retry on signal */
            die("epoll_wait");
        }

        printf("Got %d event(s)\n", ready);

        for (int j = 0; j < ready; j++) {
            int cur_fd = evlist[j].data.fd;
            uint32_t ev = evlist[j].events;

            printf("  fd=%d events:%s%s%s\n",
                   cur_fd,
                   (ev & EPOLLIN)  ? " EPOLLIN"  : "",
                   (ev & EPOLLHUP) ? " EPOLLHUP" : "",
                   (ev & EPOLLERR) ? " EPOLLERR" : "");

            if (ev & EPOLLIN) {
                /* Read up to MAX_BUF - 1 bytes; in level-triggered mode
                   epoll reports the fd again if more data remains */
                ssize_t n = read(cur_fd, buf, MAX_BUF - 1);
                if (n > 0) {
                    buf[n] = '\0';
                    printf("  read %zd bytes: %s", n, buf);
                } else if (n == 0) {
                    /* EOF — no writer holds the FIFO open */
                    printf("  EOF on fd %d\n", cur_fd);
                } else if (errno != EAGAIN) {
                    /* Real error (EAGAIN is normal in non-blocking) */
                    perror("  read");
                }
            }

            if ((ev & (EPOLLHUP | EPOLLERR)) && !(ev & EPOLLIN)) {
                /* Writer closed and no data left — remove and close */
                if (epoll_ctl(epfd, EPOLL_CTL_DEL, cur_fd, NULL) == -1)
                    perror("epoll_ctl: EPOLL_CTL_DEL");
                close(cur_fd);
                printf("  closed fd %d\n", cur_fd);
                numOpenFds--;
            }
        }
    }

    close(epfd);
    printf("Done — all file descriptors closed.\n");
    return 0;
}

Difference between epoll_create() and epoll_create1()

/* Old way — hint parameter is ignored since kernel 2.6.8 but must be > 0 */
int epfd = epoll_create(1);

/* New way — cleaner, supports EPOLL_CLOEXEC flag */
int epfd = epoll_create1(0);               /* no flags */
int epfd = epoll_create1(EPOLL_CLOEXEC);   /* auto-close on exec() */

Prefer epoll_create1() in new code. With the EPOLL_CLOEXEC flag, the kernel sets the close-on-exec flag on the epoll fd, so the fd is closed automatically when the process calls one of the exec() functions and does not leak into the new program.
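You can confirm what the flag does by reading the descriptor flags back with fcntl(). The sketch below is only an illustration, and epoll_fd_is_cloexec() is a made-up helper name.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: creates an epoll instance with the given flags and
 * reports whether FD_CLOEXEC is set on it (1 = set, 0 = clear, -1 = error). */
int epoll_fd_is_cloexec(int create_flags)
{
    int epfd = epoll_create1(create_flags);
    if (epfd == -1)
        return -1;

    int fdflags = fcntl(epfd, F_GETFD);   /* read the descriptor flags */
    close(epfd);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) ? 1 : 0;
}
```

epoll_fd_is_cloexec(EPOLL_CLOEXEC) should return 1 and epoll_fd_is_cloexec(0) should return 0.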

All epoll_ctl() Operations
#include <sys/epoll.h>

/*
 * int epoll_ctl(int epfd, int op, int fd, struct epoll_event *ev);
 *
 * epfd = your epoll instance fd (from epoll_create)
 * op   = operation (see below)
 * fd   = the fd you want to add/modify/remove
 * ev   = event settings (NULL for DEL)
 */

struct epoll_event ev;
ev.events  = EPOLLIN | EPOLLOUT;
ev.data.fd = target_fd;

/* ADD — register a new fd */
epoll_ctl(epfd, EPOLL_CTL_ADD, target_fd, &ev);

/* MOD — change which events to monitor */
ev.events = EPOLLOUT;   /* now only watch for writability */
epoll_ctl(epfd, EPOLL_CTL_MOD, target_fd, &ev);

/* DEL — stop monitoring this fd */
epoll_ctl(epfd, EPOLL_CTL_DEL, target_fd, NULL);
/* Note: ev is ignored for DEL and may be NULL
   (kernels before 2.6.9 required a non-NULL pointer) */
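Each operation also has a typical failure mode, which is useful to know when debugging. The sketch below wraps epoll_ctl() in a made-up helper, ctl_errno(), that returns the errno value so the cases are easy to compare.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical helper: runs epoll_ctl() and returns 0 on success or the
 * errno value on failure. */
int ctl_errno(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events };
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}
```

For a pipe read end that is not yet registered, EPOLL_CTL_MOD and EPOLL_CTL_DEL should return ENOENT. After a successful EPOLL_CTL_ADD, a second ADD of the same fd should return EEXIST. Regular files cannot be monitored at all: adding one fails with EPERM.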

Interview Questions — epoll Practical Usage
Q1: What are the three system calls used in epoll and what does each do?

epoll_create() creates an epoll instance and returns a file descriptor for it. epoll_ctl() adds, modifies, or removes file descriptors from the interest list. epoll_wait() blocks until one or more monitored fds become ready, then returns the list of ready events.

Q2: What is the difference between the interest list and the ready list in epoll?

The interest list is the set of all fds you have registered with epoll_ctl() — all the fds you want to watch. The ready list is a subset of the interest list containing only those fds that currently have I/O events pending. epoll_wait() returns only the entries in the ready list.

Q3: Why does the program read before closing when EPOLLIN and EPOLLHUP arrive together?

EPOLLHUP means the last writer has closed its end, but data written before that may still be sitting in the buffer. So if EPOLLIN is set, the program reads first. It closes the fd only when EPOLLHUP or EPOLLERR is reported without EPOLLIN, which happens on a later epoll_wait() call once the buffer is drained. Closing an fd also removes it from the interest list automatically, provided no other descriptor (for example one created by dup() or fork()) still refers to the same open file description.

Q4: What happens if epoll_wait() is interrupted by a signal?

epoll_wait() returns -1 and sets errno to EINTR. You must check for this and simply call epoll_wait() again. This is why the code has if (errno == EINTR) continue; inside the error check.
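If you need the retry in several places, it can be wrapped in a small function. The following is one possible sketch. epoll_wait_retry() is a name chosen for this example, not a library call.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical wrapper: retries epoll_wait() for as long as it fails with
 * EINTR. Each retry restarts the full timeout, which is harmless for a
 * timeout of -1 or 0 but can stretch a positive timeout. */
int epoll_wait_retry(int epfd, struct epoll_event *evlist,
                     int maxevents, int timeout)
{
    int ready;
    do {
        ready = epoll_wait(epfd, evlist, maxevents, timeout);
    } while (ready == -1 && errno == EINTR);
    return ready;
}
```

Any other error is still passed through to the caller: the wrapper returns -1 with errno left as epoll_wait() set it.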

Q5: What does the ev.data field store and how is it used?

ev.data is a union that you can use to store any value you want — typically the fd itself (ev.data.fd), a pointer (ev.data.ptr), or a 64-bit integer (ev.data.u64). When an event fires, the kernel gives this value back to you in evlist[j].data. Storing the fd lets you know which fd triggered the event.
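Storing a pointer is handy when each fd has its own state. Here is a minimal sketch of that pattern. The struct watched type and both function names are invented for this illustration.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor record; the field names are illustrative. */
struct watched {
    int fd;
    const char *label;
};

/* Registers w->fd for EPOLLIN and stores the record's address in data.ptr.
 * data is a union, so set exactly one member: writing data.fd afterwards
 * would overwrite part of the pointer. Returns 0 or -1 like epoll_ctl(). */
int watch_with_record(int epfd, struct watched *w)
{
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.ptr = w;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, w->fd, &ev);
}

/* Waits for one event (timeout in milliseconds) and returns the record the
 * kernel hands back, or NULL on timeout or error. */
struct watched *next_ready_record(int epfd, int timeout_ms)
{
    struct epoll_event out;
    if (epoll_wait(epfd, &out, 1, timeout_ms) != 1)
        return NULL;
    return out.data.ptr;
}
```

Because the fd is kept inside the record, the event loop can still find it through the pointer. The record must stay alive for as long as the fd is registered.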

Q6: What is the advantage of epoll over select() or poll() for large numbers of fds?

With select() and poll(), you pass the entire list of fds on every call and the kernel scans all of them. Time complexity is O(n) per call. With epoll, you register fds once; epoll_wait() only returns fds that are actually ready. Time complexity is O(ready_events), which is much better when you have thousands of fds but only a few active at a time.

Q7: What is the timeout parameter in epoll_wait() and what does -1 mean?

The timeout is in milliseconds. -1 means block indefinitely until an event occurs. 0 means return immediately (non-blocking poll). Any positive value is the maximum time to wait in milliseconds.
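The effect of the timeout can be measured on an epoll instance that has nothing registered, so no event can ever arrive. This is a rough sketch. timed_idle_wait() is a made-up helper, and it must not be called with -1 because that call would never return.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict -std modes */
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: waits on an empty epoll instance and stores in
 * *elapsed_ms roughly how long epoll_wait() blocked. Returns the
 * epoll_wait() result (0 means the timeout expired with no events),
 * or -1 if the instance could not be created. */
int timed_idle_wait(int timeout_ms, long *elapsed_ms)
{
    struct timespec t0, t1;
    struct epoll_event out;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int ready = epoll_wait(epfd, &out, 1, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(epfd);

    *elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000L
                + (t1.tv_nsec - t0.tv_nsec) / 1000000L;
    return ready;
}
```

A timeout of 0 should return 0 almost immediately, while a timeout of 100 should return 0 after about 100 milliseconds.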

Q8: Write code to add a socket fd to epoll and monitor it for both read and write events.

struct epoll_event ev;
ev.events  = EPOLLIN | EPOLLOUT;   /* EPOLLERR and EPOLLHUP are always reported; no need to request them */
ev.data.fd = sockfd;

if (epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev) == -1) {
    perror("epoll_ctl");
    exit(EXIT_FAILURE);
}

Continue Learning

Next: epoll Semantics — open file descriptions, dup(), and fork()
