epoll_wait()
The Heart of epoll — Waiting for Events
After adding file descriptors to the interest list using epoll_ctl(), your program calls epoll_wait() to block and wait until one or more of those descriptors become ready for I/O.
Unlike select() and poll(), which make you scan the entire descriptor set after every call to find the ready ones, epoll_wait() returns only the descriptors that are ready, and it can report many of them in a single call. This makes it efficient for servers handling thousands of connections.
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *evlist, int maxevents, int timeout);
/* Returns:
- number of ready fds (>0) on success
- 0 if timeout expired before any event
- -1 on error (check errno)
*/
| Parameter | Type | Purpose |
|---|---|---|
| epfd | int | The epoll instance to monitor |
| evlist | epoll_event* | Caller-allocated array to receive ready events |
| maxevents | int | Maximum number of events to return (size of evlist) |
| timeout | int | How long to wait, in milliseconds: -1 blocks indefinitely, 0 returns immediately, a positive value is the maximum wait |
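To make the timeout values and the zero return concrete, here is a minimal sketch. The helper name `wait_on_empty_instance` is hypothetical (it is not part of the epoll API); it waits on an epoll instance whose interest list is empty, so the only way the call can finish is by timing out.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper for illustration: waits on an epoll instance with an
 * EMPTY interest list, so no event can ever arrive.
 * Returns whatever epoll_wait() returned, or -1 if setup failed. */
int wait_on_empty_instance(int timeout_ms)
{
    struct epoll_event evlist[4];
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    /* timeout_ms == 0  : poll and return immediately
     * timeout_ms  > 0  : wait at most that many milliseconds
     * timeout_ms == -1 : would block forever here, so do not pass it */
    int n = epoll_wait(epfd, evlist, 4, timeout_ms);

    close(epfd);
    return n;
}
```

Calling `wait_on_empty_instance(0)` or `wait_on_empty_instance(50)` should return 0, which is how a timeout with no ready descriptors is reported.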
You allocate the evlist array yourself. The kernel fills it with ready events. Each entry in evlist is an epoll_event struct:
- evlist[i].events: which events occurred (EPOLLIN, EPOLLHUP, etc.)
- evlist[i].data: the user data you set when calling epoll_ctl()
Important: The data field in each returned event is the same value you stored when calling epoll_ctl(). This is how you know which fd fired — epoll does not give you the fd directly, only your stored data.
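Because the data field is a union, you can store a pointer instead of a bare fd. The sketch below assumes a hypothetical per-connection struct `conn` and a hypothetical helper `recover_fd_via_ptr`; it uses a pipe as a stand-in for a real connection and shows the stored pointer coming back unchanged from epoll_wait().

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state; any struct of your own would do. */
struct conn {
    int fd;
    const char *label;
};

/* Returns 1 if the fd recovered through data.ptr matches the registered fd,
 * 0 if it does not, -1 if setup failed. */
int recover_fd_via_ptr(void)
{
    int p[2];
    int result = -1;
    struct conn c;
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    c.fd = p[0];
    c.label = "pipe read end";
    ev.events = EPOLLIN;
    ev.data.ptr = &c;   /* data is a union: set ptr OR fd, never both */

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 1000) == 1) {
        struct conn *ready = out.data.ptr;   /* same pointer we stored */
        result = (ready->fd == p[0]) ? 1 : 0;
    }

    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

Storing a pointer is a common choice when each connection needs more state than its fd, such as buffers or a protocol phase.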
These are the bit values used in ev.events when calling epoll_ctl(), and returned in evlist[i].events by epoll_wait(). They mirror the poll() event bits with an “E” prefix added.
| Flag | Input to epoll_ctl()? | Returned by epoll_wait()? | Meaning |
|---|---|---|---|
| EPOLLIN | ✅ | ✅ | Normal data available to read |
| EPOLLPRI | ✅ | ✅ | High-priority / out-of-band data ready |
| EPOLLRDHUP | ✅ | ✅ | Peer shut down writing (half-close). Since Linux 2.6.17 |
| EPOLLOUT | ✅ | ✅ | Socket/pipe ready to accept write data |
| EPOLLET | ✅ | ❌ | Use edge-triggered notification (default is level-triggered) |
| EPOLLONESHOT | ✅ | ❌ | Notify only once, then disable until re-armed |
| EPOLLERR | ❌ | ✅ | An error occurred on the fd (kernel reports automatically) |
| EPOLLHUP | ❌ | ✅ | Hangup on the fd — always reported, even if not requested |
Note: EPOLLERR and EPOLLHUP are always monitored by the kernel even if you don’t set them. You do not need to add them when calling epoll_ctl(). You should still check for them in your event loop.
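A small experiment can illustrate this. The sketch below registers the read end of a pipe for EPOLLIN only, closes the write end, and returns the event mask that epoll_wait() reports. The helper name `events_after_writer_closes` is made up for this example.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the event mask reported for a pipe's read end
 * after its write end is closed, or 0 if setup failed or nothing was reported. */
uint32_t events_after_writer_closes(void)
{
    int p[2];
    uint32_t seen = 0;
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return 0;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return 0;
    }

    ev.events = EPOLLIN;    /* EPOLLHUP is deliberately NOT requested */
    ev.data.fd = p[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        close(p[1]);        /* last writer goes away: hangup on the read end */
        p[1] = -1;
        if (epoll_wait(epfd, &out, 1, 1000) == 1)
            seen = out.events;
    }

    if (p[1] != -1)
        close(p[1]);
    close(p[0]);
    close(epfd);
    return seen;
}
```

The returned mask should have EPOLLHUP set even though only EPOLLIN was registered, which is why an event loop needs a branch for it.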
By default, once you register an fd with EPOLL_CTL_ADD, epoll keeps notifying you every time it becomes ready. Sometimes you want to be notified only once — process the event, then decide if you want to watch again. That is what EPOLLONESHOT does.
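[Diagram: an fd registered with EPOLLONESHOT becomes INACTIVE once epoll_wait() returns an event for it, and stays inactive until it is re-armed with EPOLL_CTL_MOD.]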
/* Register fd with EPOLLONESHOT */
ev.events = EPOLLIN | EPOLLONESHOT;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);
/* After handling the event, re-arm it */
ev.events = EPOLLIN | EPOLLONESHOT;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);
/* Use MOD, not ADD — the fd is still in the list */
EPOLLONESHOT is very useful in multi-threaded servers: once a thread picks up an event, you disable that fd so no other thread accidentally processes the same connection simultaneously.
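The disable and re-arm cycle can be observed directly. This is a sketch using a pipe rather than a socket, with a hypothetical helper `oneshot_sequence` that records the return values of three epoll_wait() calls: one after data arrives, one while the fd is disabled, and one after re-arming with EPOLL_CTL_MOD.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: fills results[] with the return values of three
 * epoll_wait() calls. Returns 0 on success, -1 if setup failed. */
int oneshot_sequence(int results[3])
{
    int p[2];
    int rc = -1;
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        /* 1st wait: the event is delivered, then the fd is disabled */
        results[0] = epoll_wait(epfd, &out, 1, 1000);

        /* 2nd wait: the byte is still unread, but the fd is disabled,
         * so nothing is reported (timeout 0 = do not block) */
        results[1] = epoll_wait(epfd, &out, 1, 0);

        /* Re-arm with MOD; the fd is still in the interest list */
        ev.events = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0) {
            /* 3rd wait: the unread byte is reported again */
            results[2] = epoll_wait(epfd, &out, 1, 1000);
            rc = 0;
        }
    }

    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```

In a worker-thread design, the re-arm call would typically be made by the thread that finished handling the connection.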
One key strength of epoll over select/poll is thread safety. In a multithreaded server:
- One thread can call epoll_wait() while another thread calls epoll_ctl() to add new fds
- Changes to the interest list take effect immediately: epoll_wait() in the other thread sees them right away
- This is the foundation for the “one epoll, many worker threads” server design pattern
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/epoll.h>
#define MAX_EVENTS 10
#define BUF_SIZE 256
int main(void)
{
    int epfd, nready, i, fd;
    int done = 0;   /* set once stdin hits EOF, an error, or a hangup */
    struct epoll_event ev, evlist[MAX_EVENTS];
    char buf[BUF_SIZE];
    /* Create epoll instance */
    epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); exit(1); }
    /* Watch stdin (fd=0) for incoming data */
    ev.events = EPOLLIN;
    ev.data.fd = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &ev) == -1) {
        perror("epoll_ctl"); exit(1);
    }
    printf("Watching stdin. Type something and press Enter.\n");
    printf("Press Ctrl+D to quit.\n\n");
    while (!done) {
        /* Block until at least one event (timeout=-1 means forever) */
        nready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
        if (nready == -1) {
            if (errno == EINTR)
                continue; /* interrupted by signal, retry */
            perror("epoll_wait");
            exit(1);
        }
        /* Process each ready fd */
        for (i = 0; i < nready; i++) {
            int drop = 0;   /* stop watching this fd? */
            fd = evlist[i].data.fd;
            if (evlist[i].events & EPOLLIN) {
                /* Read first, so pending data is drained before a hangup is acted on */
                ssize_t n = read(fd, buf, BUF_SIZE - 1);
                if (n > 0) {
                    buf[n] = '\0';
                    printf("Read from fd %d: %s", fd, buf);
                } else {
                    if (n == -1)
                        perror("read");
                    else
                        printf("EOF on fd %d, removing\n", fd);
                    drop = 1;
                }
            } else if (evlist[i].events & (EPOLLHUP | EPOLLERR)) {
                printf("HUP/ERR on fd %d\n", fd);
                drop = 1;
            }
            if (drop) {
                /* Remove the fd exactly once; stdin is the only fd we watch, so quit */
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                done = 1;
            }
        }
    }
    close(epfd);
    return 0;
}
Compile: gcc epoll_wait_demo.c -o epoll_wait_demo && ./epoll_wait_demo
EPOLLRDHUP is more specific than EPOLLHUP: it means the peer has shut down its write side (TCP half-close). This lets you detect when the remote side is done sending, even if you can still write to it. Unlike EPOLLHUP, you must set EPOLLRDHUP in epoll_ctl() to receive it.
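The half-close case can be sketched as follows. As an assumption made for brevity, the example uses an AF_UNIX stream socketpair in place of a real TCP connection, and the helper name `events_after_peer_half_close` is hypothetical. One end calls shutdown() with SHUT_WR, and the function returns the event mask reported for the other end.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the event mask seen on sv[0] after the peer
 * (sv[1]) shuts down its write side, or 0 if setup failed. */
uint32_t events_after_peer_half_close(void)
{
    int sv[2];
    uint32_t seen = 0;
    struct epoll_event ev, out;

    /* Assumption: a local socketpair stands in for a TCP connection */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    ev.events = EPOLLIN | EPOLLRDHUP;   /* EPOLLRDHUP must be requested */
    ev.data.fd = sv[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        shutdown(sv[1], SHUT_WR) == 0 &&        /* peer is done sending */
        epoll_wait(epfd, &out, 1, 1000) == 1)
        seen = out.events;

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return seen;
}
```

The returned mask should include EPOLLRDHUP. Since only one direction was shut down, the local end can still write to the peer.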
The kernel reports EPOLLERR and EPOLLHUP regardless of your ev.events settings. You still need to check for them in your event-handling code and handle them properly (close the fd, clean up state).

You identify the ready fd from the evlist[i].data field, which holds whatever you stored in ev.data when you called epoll_ctl(). If you stored ev.data.fd = fd, then evlist[i].data.fd tells you the fd. If you stored a pointer, evlist[i].data.ptr gives you the pointer. epoll does not directly tell you the fd; only your stored data does.

epoll_wait() returns at most maxevents events per call. If more fds are ready than your array can hold, the remaining ready fds will be reported on the next call to epoll_wait(). This is why you often call epoll_wait() in a loop or use a large maxevents value.

If a signal interrupts epoll_wait(), the call fails with errno set to EINTR. The usual response is to retry: if (errno == EINTR) continue;
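The maxevents limit can be seen with two ready descriptors and an array of one. The sketch below makes two pipes readable, calls epoll_wait() twice with maxevents set to 1, and checks that each call reports a different fd. It relies on the round-robin behaviour the epoll_wait(2) man page describes for successive calls; the helper name `distinct_fds_with_maxevents_one` is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if two epoll_wait() calls with maxevents = 1
 * each report one event and the two events are for different fds,
 * 0 otherwise, -1 if setup failed. */
int distinct_fds_with_maxevents_one(void)
{
    int a[2], b[2];
    int result = -1;
    struct epoll_event ev, out;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        int ok = 1;

        /* Register both read ends for EPOLLIN (level-triggered) */
        ev.events = EPOLLIN;
        ev.data.fd = a[0];
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, a[0], &ev) == 0;
        ev.data.fd = b[0];
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, b[0], &ev) == 0;

        /* Make both readable, and leave the data unread */
        ok = ok && write(a[1], "x", 1) == 1;
        ok = ok && write(b[1], "y", 1) == 1;

        if (ok) {
            /* Two fds are ready, but each call may return only one */
            int n1 = epoll_wait(epfd, &out, 1, 1000);
            int fd1 = out.data.fd;
            int n2 = epoll_wait(epfd, &out, 1, 1000);
            int fd2 = out.data.fd;
            result = (n1 == 1 && n2 == 1 && fd1 != fd2) ? 1 : 0;
        }
        close(epfd);
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return result;
}
```

A small maxevents does not lose events. It only spreads them over more calls, at the cost of extra system calls.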