What is epoll?
epoll is a Linux-specific I/O event notification API that lets your program monitor many file descriptors at once and be notified only when one of them is ready for I/O. It scales much better than older interfaces such as select() or poll() when you are watching hundreds or thousands of connections.
Think of it like a security guard at an apartment building. Instead of knocking on every door every few seconds to ask “are you ready?”, the guard just waits at the front desk and each flat notifies the guard when something happens. That is exactly what epoll does for file descriptors.
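epoll uses three system calls:
epoll_create() creates the epoll instance and gives you back epfd.
epoll_ctl() adds fds to, or removes them from, the interest list.
epoll_wait() blocks until a monitored fd becomes ready.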
The demo program opens multiple files (in this case FIFOs) and monitors all of them using epoll. Let us walk through exactly what it does.
Step 1 — Create the epoll instance
The program starts by calling epoll_create(). You pass a hint for the number of fds you plan to monitor (the kernel ignores the exact value since Linux 2.6.8, but it must be greater than 0).
#include <sys/epoll.h>
#include <fcntl.h>
#define MAX_BUF 1000 /* max bytes in one read() */
#define MAX_EVENTS 5 /* max events per epoll_wait() call */
int epfd = epoll_create(argc - 1);
if (epfd == -1)
errExit("epoll_create");
epfd is now a file descriptor that represents the epoll instance. You use it in all future calls.
Step 2 — Register file descriptors (interest list)
For each file you want to monitor, you open it and add it to the epoll interest list using epoll_ctl() with EPOLL_CTL_ADD.
struct epoll_event ev;
int fd, numOpenFds;
for (int j = 1; j < argc; j++) {
fd = open(argv[j], O_RDONLY);
if (fd == -1)
errExit("open");
printf("Opened \"%s\" on fd %d\n", argv[j], fd);
ev.events = EPOLLIN; /* monitor for input data */
ev.data.fd = fd; /* store fd so we know which one fired */
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
errExit("epoll_ctl");
}
numOpenFds = argc - 1;
The ev.data.fd field is your “tag” — when epoll reports an event, you read this field to find out which fd is ready. The kernel stores this value and gives it back to you unchanged.
Step 3 — The main event loop
Now we loop, calling epoll_wait() repeatedly. Each call blocks until at least one fd has data ready. The ready events are placed in the evlist array.
struct epoll_event evlist[MAX_EVENTS];
char buf[MAX_BUF];
int ready, s;
while (numOpenFds > 0) {
printf("About to epoll_wait()\n");
ready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
/* -1 timeout means block forever until an event occurs */
if (ready == -1) {
if (errno == EINTR)
continue; /* interrupted by signal — just retry */
else
errExit("epoll_wait");
}
printf("Ready: %d\n", ready);
for (int j = 0; j < ready; j++) {
printf(" fd=%d; events: %s%s%s\n",
evlist[j].data.fd,
(evlist[j].events & EPOLLIN) ? "EPOLLIN " : "",
(evlist[j].events & EPOLLHUP) ? "EPOLLHUP " : "",
(evlist[j].events & EPOLLERR) ? "EPOLLERR " : "");
if (evlist[j].events & EPOLLIN) {
/* data available — read it */
s = read(evlist[j].data.fd, buf, MAX_BUF);
if (s == -1)
errExit("read");
printf(" read %d bytes: %.*s\n", s, s, buf);
} else if (evlist[j].events & (EPOLLHUP | EPOLLERR)) {
/* writer closed the pipe / error occurred */
/* only close if EPOLLIN is NOT also set;
if both are set there may be more data to read first */
printf(" closing fd %d\n", evlist[j].data.fd);
if (close(evlist[j].data.fd) == -1)
errExit("close");
numOpenFds--;
}
}
}
printf("All file descriptors closed; bye\n");
Important detail: If both EPOLLIN and EPOLLHUP are set at the same time, we read first. There might still be data in the buffer even though the writer has closed its end. We will see EPOLLHUP again on the next epoll_wait() call and close then.
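The read-first, close-later rule can be observed in isolation. The following is a minimal sketch, not part of the demo program: it assumes an ordinary pipe() instead of a FIFO, and the helper name observe_hangup is hypothetical. It records the event mask reported while data is still buffered and again after the data has been drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not part of the demo program): writes 3 bytes into a
 * pipe, closes the write end, and records the event mask reported before and
 * after the data is drained. Returns 0 on success, -1 on failure. */
int observe_hangup(uint32_t *before_read, uint32_t *after_read)
{
    int pfd[2], epfd, rc = -1;
    struct epoll_event ev, out;
    char buf[16];

    assert(before_read != NULL && after_read != NULL);
    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    if (write(pfd[1], "abc", 3) != 3)
        goto close_all;
    close(pfd[1]);              /* last writer gone: hangup condition */
    pfd[1] = -1;

    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto close_all;
    *before_read = out.events;  /* data is still buffered here */

    if (read(pfd[0], buf, sizeof(buf)) != 3)
        goto close_all;
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto close_all;
    *after_read = out.events;   /* buffer has been drained */
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    if (pfd[1] != -1)
        close(pfd[1]);
    return rc;
}
```

On Linux, before_read is expected to carry both EPOLLIN and EPOLLHUP, and after_read only EPOLLHUP, which is the point at which the demo closes the fd.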
The demo uses two FIFOs (named pipes). Here is what happens when you run it:
Notice that when both ppp and qqq were typed (while epoll_input was suspended), epoll_wait() returned both events at once (Ready: 2). This is the efficiency advantage of epoll — one blocking call, multiple events.
| Event | fd 4 (p) | fd 5 (q) | epoll_wait returns |
|---|---|---|---|
| Type “ppp” + “qqq”, close q | EPOLLIN | EPOLLIN + EPOLLHUP | 2 events |
| Close p (Ctrl-D) | EPOLLHUP | Already closed | 1 event |
| Flag | Meaning | When you see it |
|---|---|---|
| EPOLLIN | Data is available to read | Writer sent data into pipe/FIFO/socket |
| EPOLLOUT | Space available to write | Socket send buffer has room |
| EPOLLHUP | Hangup: writer closed their end | All writers closed the pipe/FIFO |
| EPOLLERR | Error on the fd | Something went wrong with the fd |
| EPOLLET | Edge-triggered mode | Only notifies once per state change |
| EPOLLONESHOT | Notify once then disable | Must re-arm with EPOLL_CTL_MOD |
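What "only notifies once per state change" means for EPOLLET is easiest to see by waiting twice without reading. The sketch below is an illustration under stated assumptions: it uses a pipe rather than a FIFO, and the helper name wait_twice_without_reading is hypothetical. It registers the read end with EPOLLIN plus whatever extra flags you pass, writes one byte, and then calls epoll_wait() twice with a zero timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read end of a pipe with
 * EPOLLIN | extra_flags, writes one byte, then calls epoll_wait() twice
 * WITHOUT reading the byte. Stores both return values.
 * Returns 0 on success, -1 on failure. */
int wait_twice_without_reading(uint32_t extra_flags, int *first, int *second)
{
    int pfd[2], epfd, rc = -1;
    struct epoll_event ev, out;

    assert(first != NULL && second != NULL);
    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;
    if (write(pfd[1], "x", 1) != 1)
        goto close_all;

    /* Timeout 0: report what is ready right now and return immediately. */
    *first = epoll_wait(epfd, &out, 1, 0);
    *second = epoll_wait(epfd, &out, 1, 0);
    rc = (*first == -1 || *second == -1) ? -1 : 0;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

Called with extra_flags set to 0 (level-triggered, the default), both waits report the fd because the unread byte is still there. Called with EPOLLET, only the first wait reports it; the second returns 0 because nothing has changed since the last notification.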
Below is a complete, self-contained program you can compile and run. It monitors multiple FIFOs and handles hangup events properly.
/*
* epoll_monitor.c
* Monitor multiple FIFOs using epoll.
*
* Compile: gcc -o epoll_monitor epoll_monitor.c
* Usage: mkfifo fifo1 fifo2
* ./epoll_monitor fifo1 fifo2
* (in other terminals: echo "hello" > fifo1)
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <stdint.h> /* uint32_t, used for the event mask in the loop */
#define MAX_BUF 1000
#define MAX_EVENTS 10
static void die(const char *msg) {
perror(msg);
exit(EXIT_FAILURE);
}
int main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "Usage: %s fifo1 [fifo2 ...]\n", argv[0]);
exit(EXIT_FAILURE);
}
/* 1. Create epoll instance */
int epfd = epoll_create1(0); /* epoll_create1 is the modern version */
if (epfd == -1)
die("epoll_create1");
/* 2. Open each FIFO and add to interest list */
int numOpenFds = 0;
for (int j = 1; j < argc; j++) {
/* Open in non-blocking mode so open() doesn't block
waiting for a writer (optional but good practice) */
int fd = open(argv[j], O_RDONLY | O_NONBLOCK);
if (fd == -1)
die("open");
printf("Opened '%s' on fd %d\n", argv[j], fd);
struct epoll_event ev;
ev.events = EPOLLIN; /* watch for readable data */
ev.data.fd = fd;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
die("epoll_ctl: EPOLL_CTL_ADD");
numOpenFds++;
}
/* 3. Event loop */
struct epoll_event evlist[MAX_EVENTS];
char buf[MAX_BUF];
while (numOpenFds > 0) {
printf("Waiting for events...\n");
int ready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
if (ready == -1) {
if (errno == EINTR)
continue; /* retry on signal */
die("epoll_wait");
}
printf("Got %d event(s)\n", ready);
for (int j = 0; j < ready; j++) {
int cur_fd = evlist[j].data.fd;
uint32_t ev = evlist[j].events;
printf(" fd=%d events:%s%s%s\n",
cur_fd,
(ev & EPOLLIN) ? " EPOLLIN" : "",
(ev & EPOLLHUP) ? " EPOLLHUP" : "",
(ev & EPOLLERR) ? " EPOLLERR" : "");
if (ev & EPOLLIN) {
/* Read up to MAX_BUF - 1 bytes; anything left over is reported again by the next epoll_wait() */
ssize_t n = read(cur_fd, buf, MAX_BUF - 1);
if (n > 0) {
buf[n] = '\0';
printf(" read %zd bytes: %s", n, buf);
} else if (n == 0) {
/* EOF — no writer holds the FIFO open */
printf(" EOF on fd %d\n", cur_fd);
} else if (errno != EAGAIN) {
/* Real error (EAGAIN is normal in non-blocking) */
perror(" read");
}
}
if ((ev & (EPOLLHUP | EPOLLERR)) && !(ev & EPOLLIN)) {
/* Writer closed and no data left — remove and close */
if (epoll_ctl(epfd, EPOLL_CTL_DEL, cur_fd, NULL) == -1)
perror("epoll_ctl: EPOLL_CTL_DEL");
close(cur_fd);
printf(" closed fd %d\n", cur_fd);
numOpenFds--;
}
}
}
close(epfd);
printf("Done — all file descriptors closed.\n");
return 0;
}
Difference between epoll_create() and epoll_create1()
/* Old way: the size hint is ignored since kernel 2.6.8 but must be > 0 */
int epfd_old = epoll_create(1);
/* New way: takes flags instead of a size hint; pick one of the two forms */
int epfd_plain = epoll_create1(0); /* no flags */
int epfd_cloexec = epoll_create1(EPOLL_CLOEXEC); /* auto-close on exec() */
Prefer epoll_create1() in new code. EPOLL_CLOEXEC sets the close-on-exec flag on the epoll fd, so the kernel closes it automatically when the process calls one of the exec() functions and the fd does not leak into the new program.
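One way to confirm what EPOLL_CLOEXEC does is to inspect the descriptor flags with fcntl(). The following is a small sketch for illustration; the helper names has_cloexec and epoll_cloexec_state are hypothetical and do not appear in the programs above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if fd has the close-on-exec flag set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: creates an epoll instance with the given
 * epoll_create1() flags, reports its close-on-exec state, and closes it. */
int epoll_cloexec_state(int create_flags)
{
    int epfd = epoll_create1(create_flags);
    int state;

    if (epfd == -1)
        return -1;
    state = has_cloexec(epfd);
    close(epfd);
    return state;
}
```

epoll_cloexec_state(0) should report 0 and epoll_cloexec_state(EPOLL_CLOEXEC) should report 1. Setting the flag at creation time also avoids the window between a separate epoll_create() and fcntl(F_SETFD) call in which another thread could fork and exec.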
#include <sys/epoll.h>
/*
* int epoll_ctl(int epfd, int op, int fd, struct epoll_event *ev);
*
* epfd = your epoll instance fd (from epoll_create)
* op = operation (see below)
* fd = the fd you want to add/modify/remove
* ev = event settings (NULL for DEL)
*/
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLOUT;
ev.data.fd = target_fd;
/* ADD — register a new fd */
epoll_ctl(epfd, EPOLL_CTL_ADD, target_fd, &ev);
/* MOD — change which events to monitor */
ev.events = EPOLLOUT; /* now only watch for writability */
epoll_ctl(epfd, EPOLL_CTL_MOD, target_fd, &ev);
/* DEL — stop monitoring this fd */
epoll_ctl(epfd, EPOLL_CTL_DEL, target_fd, NULL);
/* Note: ev parameter is ignored for DEL (can pass NULL) */
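The three operations above also fail in predictable ways, which is worth checking in real code. As a hedged sketch (the helper name epoll_ctl_error_demo is hypothetical, and a pipe stands in for whatever fd you monitor), the function below adds an fd twice and deletes it twice, recording the errno of the two calls that are expected to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: performs ADD, a duplicate ADD, DEL, and a second DEL
 * on the read end of a pipe. Stores the errno of the duplicate ADD and of
 * the second DEL (0 if the call unexpectedly succeeded).
 * Returns 0 on success, -1 if a call that should succeed failed. */
int epoll_ctl_error_demo(int *dup_add_errno, int *second_del_errno)
{
    int pfd[2], epfd, rc = -1;
    struct epoll_event ev;

    assert(dup_add_errno != NULL && second_del_errno != NULL);
    *dup_add_errno = 0;
    *second_del_errno = 0;
    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* The fd is already in the interest list, so a second ADD fails. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        *dup_add_errno = errno;

    if (epoll_ctl(epfd, EPOLL_CTL_DEL, pfd[0], NULL) == -1)
        goto close_all;

    /* The fd is no longer in the interest list, so a second DEL fails. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, pfd[0], NULL) == -1)
        *second_del_errno = errno;
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

The duplicate ADD fails with EEXIST and the second DEL fails with ENOENT. EPOLL_CTL_MOD on an unregistered fd fails with ENOENT as well.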
epoll_create() creates an epoll instance and returns a file descriptor for it. epoll_ctl() adds, modifies, or removes file descriptors from the interest list. epoll_wait() blocks until one or more monitored fds become ready, then returns the list of ready events.
The interest list is the set of all fds you have registered with epoll_ctl() — all the fds you want to watch. The ready list is a subset of the interest list containing only those fds that currently have I/O events pending. epoll_wait() returns only the entries in the ready list.
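The distinction between the two lists can be shown with a few pipes. The sketch below is illustrative only: the helper name ready_subset_demo and the constant NPIPES are hypothetical. It puts three read ends on the interest list, makes exactly one of them readable, and checks what epoll_wait() hands back.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 3

/* Hypothetical helper: registers the read ends of NPIPES pipes, writes one
 * byte into pipe number `active` only, and polls once with a zero timeout.
 * Returns the number of ready events (or -1 on failure) and stores the
 * index of the first ready pipe in *ready_index. */
int ready_subset_demo(int active, int *ready_index)
{
    int rfd[NPIPES], wfd[NPIPES], epfd, i, n = -1, made = 0;
    struct epoll_event ev, evlist[NPIPES];

    assert(active >= 0 && active < NPIPES && ready_index != NULL);
    *ready_index = -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    for (i = 0; i < NPIPES; i++) {
        int p[2];

        if (pipe(p) == -1)
            goto cleanup;
        rfd[i] = p[0];
        wfd[i] = p[1];
        made++;
        ev.events = EPOLLIN;
        ev.data.fd = rfd[i];            /* interest list now holds i + 1 fds */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd[i], &ev) == -1)
            goto cleanup;
    }

    if (write(wfd[active], "x", 1) != 1)
        goto cleanup;

    /* Only fds on the ready list are copied into evlist. */
    n = epoll_wait(epfd, evlist, NPIPES, 0);
    if (n >= 1) {
        for (i = 0; i < NPIPES; i++)
            if (rfd[i] == evlist[0].data.fd)
                *ready_index = i;
    }

cleanup:
    for (i = 0; i < made; i++) {
        close(rfd[i]);
        close(wfd[i]);
    }
    close(epfd);
    return n;
}
```

With three fds on the interest list and one byte written, epoll_wait() returns 1 and the single entry identifies the pipe that was written to; the two idle pipes never appear in evlist.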
When you close an fd that was added to epoll, the kernel removes it from the interest list automatically, provided no other descriptor (for example one created by dup() or inherited across fork()) still refers to the same open file description. But if the fd receives EPOLLHUP (writer closed) at the same time as EPOLLIN (data available), you must read the data first before closing. The program checks: if EPOLLIN is set, read; if only EPOLLHUP/EPOLLERR is set (without EPOLLIN), then close.
If a signal handler interrupts a blocked epoll_wait(), the call returns -1 and sets errno to EINTR. Check for this case and simply call epoll_wait() again. This is why the code has if (errno == EINTR) continue; inside the error check.
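The retry can also be packaged as a small wrapper. The following sketch is one possible shape, not the demo program's approach: epoll_wait_retry, on_alarm, got_signal, and interrupted_wait_demo are hypothetical names, and the demo function uses a 20 ms SIGALRM timer purely to provoke an interruption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes sigaction() and setitimer() in strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void on_alarm(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Hypothetical wrapper: behaves like epoll_wait(), but calls it again when
 * a signal interrupts it. Note that each retry restarts the full timeout. */
int epoll_wait_retry(int epfd, struct epoll_event *evlist, int maxevents,
                     int timeout_ms)
{
    int ready;

    assert(evlist != NULL && maxevents > 0);
    do {
        ready = epoll_wait(epfd, evlist, maxevents, timeout_ms);
    } while (ready == -1 && errno == EINTR);
    return ready;
}

/* Demo: arms a 20 ms timer that raises SIGALRM while the wrapper waits up
 * to 100 ms on an empty epoll instance. Returns the wrapper's result. */
int interrupted_wait_demo(void)
{
    struct sigaction sa;
    struct itimerval timer;
    struct epoll_event out;
    int epfd, ready;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 20000;     /* fire once after 20 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1) {
        close(epfd);
        return -1;
    }

    ready = epoll_wait_retry(epfd, &out, 1, 100);
    close(epfd);
    return ready;
}
```

Without the loop, the first epoll_wait() would fail with EINTR when the timer fires. With it, the caller only sees the eventual timeout (a return value of 0). If the total wait must be bounded, the remaining time would have to be recomputed before each retry; this sketch does not do that.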
ev.data is a union that you can use to store any value you want — typically the fd itself (ev.data.fd), a pointer (ev.data.ptr), or a 64-bit integer (ev.data.u64). When an event fires, the kernel gives this value back to you in evlist[j].data. Storing the fd lets you know which fd triggered the event.
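Storing a pointer instead of the fd is common once each descriptor has its own state. The sketch below shows the idea under stated assumptions: struct conn, register_conn, and next_ready_conn are hypothetical names, not part of the programs above. Because ev.data is a union, the fd cannot be stored alongside the pointer, so it lives inside the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor state. The fd is kept here because the
 * ev.data union can hold either a pointer or an fd, not both. */
struct conn {
    int fd;
    const char *name;
};

/* Adds c->fd to the interest list with c itself as the tag.
 * Returns 0 on success, -1 on failure (as epoll_ctl() does). */
int register_conn(int epfd, struct conn *c)
{
    struct epoll_event ev;

    assert(c != NULL);
    ev.events = EPOLLIN;
    ev.data.ptr = c;    /* the kernel hands this pointer back unchanged */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Waits for one event and returns the struct conn registered for it,
 * or NULL on timeout or error. */
struct conn *next_ready_conn(int epfd, int timeout_ms)
{
    struct epoll_event out;

    if (epoll_wait(epfd, &out, 1, timeout_ms) != 1)
        return NULL;
    return out.data.ptr;
}
```

The object behind the pointer must stay alive for as long as the fd is registered, since the kernel only stores the pointer value and knows nothing about what it points to.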
With select() and poll(), you pass the entire list of fds on every call and the kernel scans all of them. Time complexity is O(n) per call. With epoll, you register fds once; epoll_wait() only returns fds that are actually ready. Time complexity is O(ready_events), which is much better when you have thousands of fds but only a few active at a time.
The timeout argument of epoll_wait() is in milliseconds. -1 means block indefinitely until an event occurs. 0 means return immediately (non-blocking poll). Any positive value is the maximum time to wait in milliseconds.
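The effect of the timeout value can be measured directly. The sketch below is an illustration with a hypothetical helper name, timed_empty_wait: it waits on an epoll instance with an empty interest list, so no event can ever arrive, and reports how long the call took.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes clock_gettime() in strict -std modes */
#endif
#include <assert.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: calls epoll_wait() with timeout_ms on an epoll
 * instance that monitors nothing, and returns the elapsed wall time in
 * milliseconds, or -1 on failure. */
long timed_empty_wait(int timeout_ms)
{
    struct timespec t0, t1;
    struct epoll_event out;
    long long elapsed_ns;
    int epfd, ready;

    assert(timeout_ms >= 0);    /* -1 would block forever on an empty set */
    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ready = epoll_wait(epfd, &out, 1, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(epfd);

    if (ready != 0)             /* expected: timeout, no events */
        return -1;
    elapsed_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (t1.tv_nsec - t0.tv_nsec);
    return (long)(elapsed_ns / 1000000LL);
}
```

A timeout of 0 returns almost at once, while a timeout of 50 returns after roughly 50 ms. The kernel may overrun the requested interval slightly because of clock granularity and scheduling delays.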
struct epoll_event ev;
/* EPOLLERR is always reported by epoll_wait(); listing it here is harmless but not required */
ev.events = EPOLLIN | EPOLLOUT | EPOLLERR;
ev.data.fd = sockfd;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev) == -1) {
perror("epoll_ctl");
exit(EXIT_FAILURE);
}
