epoll_wait() — Waiting for I/O Events

Linux Chapter 63 · Alternative I/O Models · EmbeddedPathashala

Topic: epoll_wait() · Level: Intermediate · Part: 2 of 3

The Heart of epoll — Waiting for Events

After adding file descriptors to the interest list using epoll_ctl(), your program calls epoll_wait() to block and wait until one or more of those descriptors become ready for I/O.

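Like select() and poll(), epoll_wait() can report many ready file descriptors in a single call. The difference is cost: with select() and poll() the kernel must check every monitored descriptor on each call, and your program must then scan the whole set to find the ready ones. epoll_wait() returns only the descriptors that are actually ready, which is why it scales well for servers handling thousands of connections.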

Key Terms in This Tutorial

epoll_wait(), evlist, timeout, EPOLLIN, EPOLLOUT, EPOLLHUP, EPOLLERR, EPOLLET, EPOLLONESHOT, EPOLLRDHUP

Function Signature

#include <sys/epoll.h>

int epoll_wait(int epfd, struct epoll_event *evlist, int maxevents, int timeout);

/* Returns:
   - number of ready fds (>0) on success
   - 0 if timeout expired before any event
   - -1 on error (check errno)
*/

Parameter	Type	Purpose
epfd	int	The epoll instance to monitor
evlist	epoll_event*	Caller-allocated array to receive ready events
maxevents	int	Maximum number of events to return (size of evlist)
timeout	int	How long to wait (milliseconds, 0, or -1)

How the evlist Array Works

You allocate the evlist array yourself. The kernel fills it with ready events. Each entry in evlist is an epoll_event struct:

evlist[i].events — which events occurred (EPOLLIN, EPOLLHUP, etc.)
evlist[i].data — the user data you set when calling epoll_ctl()

evlist[] returned by epoll_wait()

Entry	data.fd	events
evlist[0]	5	EPOLLIN
evlist[1]	8	EPOLLOUT
evlist[2]	12	EPOLLHUP
evlist[3]	(empty)	(not used)

epoll_wait() returned 3, so only evlist[0], [1], and [2] are valid. Always loop up to the return value.

Important: The data field in each returned event is the same value you stored when calling epoll_ctl(). This is how you know which fd fired — epoll does not give you the fd directly, only your stored data.
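As a rough illustration of the point above, the sketch below stores a pointer in ev.data.ptr instead of a plain fd and reads it back from the returned event. The struct conn type, its label field, and the function name event_ptr_roundtrip() are hypothetical names invented for this example; a pipe stands in for a real connection.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor context; not part of the epoll API. */
struct conn {
    int fd;
    const char *label;
};

/* Registers a pipe's read end with a pointer in data.ptr, makes it readable,
   and checks that the same pointer comes back from epoll_wait().
   Returns 1 on a match, 0 on a mismatch, -1 on any failure. */
int event_ptr_roundtrip(void)
{
    int pfd[2], result = -1;
    struct epoll_event ev, evlist[4];
    struct conn c;

    if (pipe(pfd) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    c.fd = pfd[0];
    c.label = "pipe-reader";

    ev.events   = EPOLLIN;
    ev.data.ptr = &c;          /* data is a union: set ptr OR fd, not both */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        int n = epoll_wait(epfd, evlist, 4, 1000);
        if (n == 1 && (evlist[0].events & EPOLLIN)) {
            struct conn *got = evlist[0].data.ptr;
            printf("event on %s (fd %d)\n", got->label, got->fd);
            result = (got == &c && got->fd == pfd[0]) ? 1 : 0;
        }
    }

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

Because data is a union, a program that needs both the fd and extra state usually keeps the fd inside the structure the pointer refers to, as this sketch does.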

Timeout Behavior — Three Modes

timeout = -1

Block forever
Waits indefinitely until at least one fd in the interest list is ready, or until a signal is caught. Use this when you have nothing else to do.

timeout = 0

Non-blocking poll
Returns immediately with whatever events are currently available. Returns 0 if nothing is ready. Use for polling loops.

timeout > 0

Timed wait (milliseconds)
Blocks up to timeout ms. Returns early if an event fires or a signal interrupts. Returns 0 if time expires with no events.
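The three modes can be tried out with a small helper like the one below. This is only a sketch: poll_pipe() is a hypothetical function written for this tutorial, and it uses a pipe so that readiness is fully under the program's control.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Creates a pipe, optionally writes one byte to it, then calls epoll_wait()
   on the read end with the given timeout. Returns whatever epoll_wait()
   returned, or -1 if the setup failed. */
int poll_pipe(int timeout_ms, int write_first)
{
    int pfd[2], n = -1;
    struct epoll_event ev, evlist[1];

    if (pipe(pfd) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    ev.events  = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        (!write_first || write(pfd[1], "x", 1) == 1))
        n = epoll_wait(epfd, evlist, 1, timeout_ms);

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

With an idle pipe, poll_pipe(0, 0) returns 0 immediately and poll_pipe(50, 0) returns 0 after about 50 ms. With data already in the pipe, poll_pipe(0, 1) and poll_pipe(-1, 1) both return 1 without blocking. Calling poll_pipe(-1, 0) would block forever, since nothing ever makes the pipe readable.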

epoll Event Bit Flags

These are the bit values used in ev.events when calling epoll_ctl(), and returned in evlist[i].events by epoll_wait(). They mirror the poll() event bits with an “E” prefix added.

Flag	Input to epoll_ctl()?	Returned by epoll_wait()?	Meaning
EPOLLIN	✅	✅	Normal data available to read
EPOLLPRI	✅	✅	High-priority / out-of-band data ready
EPOLLRDHUP	✅	✅	Peer shut down writing (half-close). Since Linux 2.6.17
EPOLLOUT	✅	✅	Socket/pipe ready to accept write data
EPOLLET	✅	❌	Use edge-triggered notification (default is level-triggered)
EPOLLONESHOT	✅	❌	Notify only once, then disable until re-armed
EPOLLERR	❌	✅	An error occurred on the fd (kernel reports automatically)
EPOLLHUP	❌	✅	Hangup on the fd — always reported, even if not requested

Note: EPOLLERR and EPOLLHUP are always monitored by the kernel even if you don’t set them. You do not need to add them when calling epoll_ctl(). You should still check for them in your event loop.
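A quick way to see this is to register a pipe's read end for EPOLLIN only and then close the write end. The sketch below does that; events_after_writer_close() is a hypothetical helper name, and the behavior shown is what Linux reports for pipes.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a pipe's read end for EPOLLIN only, closes the write end,
   and returns the event mask epoll_wait() reports (0 on any failure). */
unsigned int events_after_writer_close(void)
{
    int pfd[2];
    unsigned int mask = 0;
    struct epoll_event ev, evlist[1];

    if (pipe(pfd) == -1)
        return 0;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return 0;
    }

    ev.events  = EPOLLIN;        /* EPOLLHUP is deliberately not requested */
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0) {
        close(pfd[1]);           /* the only writer goes away */
        pfd[1] = -1;
        if (epoll_wait(epfd, evlist, 1, 1000) == 1)
            mask = evlist[0].events;
    }

    close(epfd);
    close(pfd[0]);
    if (pfd[1] != -1)
        close(pfd[1]);
    return mask;
}
```

The returned mask has EPOLLHUP set even though only EPOLLIN was requested, which is why an event loop should test for EPOLLHUP and EPOLLERR on every event.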

EPOLLONESHOT — One-Shot Monitoring

By default, once you register an fd with EPOLL_CTL_ADD, epoll keeps notifying you every time it becomes ready. Sometimes you want to be notified only once — process the event, then decide if you want to watch again. That is what EPOLLONESHOT does.

EPOLLONESHOT Lifecycle

ADD fd with EPOLLONESHOT → epoll_wait() returns event → fd marked INACTIVE → re-arm with EPOLL_CTL_MOD

After the one-shot event fires, the fd stays in the interest list but is marked inactive. You cannot use EPOLL_CTL_ADD again (it’s still there). Use EPOLL_CTL_MOD to re-enable it.

/* Register fd with EPOLLONESHOT */
ev.events  = EPOLLIN | EPOLLONESHOT;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

/* After handling the event, re-arm it */
ev.events  = EPOLLIN | EPOLLONESHOT;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);
/* Use MOD, not ADD — the fd is still in the list */

EPOLLONESHOT is very useful in multi-threaded servers: once a thread picks up an event, you disable that fd so no other thread accidentally processes the same connection simultaneously.
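The lifecycle can be observed directly with a pipe that stays readable the whole time. In the sketch below, oneshot_demo() is a hypothetical helper that records the return value of three non-blocking epoll_wait() calls: one after the event first becomes ready, one while the fd is inactive, and one after re-arming.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Fills results[0..2] with three epoll_wait() return values:
   [0] first wait, [1] wait while disabled, [2] wait after EPOLL_CTL_MOD.
   Returns 0 on success, -1 if any setup step failed. */
int oneshot_demo(int results[3])
{
    int pfd[2], rc = -1;
    struct epoll_event ev, evlist[1];

    if (pipe(pfd) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    ev.events  = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        /* The byte is never read, so the pipe stays readable throughout. */
        results[0] = epoll_wait(epfd, evlist, 1, 0);   /* reported once */
        results[1] = epoll_wait(epfd, evlist, 1, 0);   /* fd is now inactive */

        ev.events  = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = pfd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, pfd[0], &ev) == 0) {
            results[2] = epoll_wait(epfd, evlist, 1, 0);   /* reported again */
            rc = 0;
        }
    }

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

The second wait returns 0 even though data is still sitting in the pipe; only the EPOLL_CTL_MOD call makes the fd eligible for reporting again.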

epoll in Multithreaded Programs

One key strength of epoll over select/poll is that the interest list can be modified safely while another thread is blocked waiting on it. In a multithreaded server:

One thread can call epoll_wait() while another thread calls epoll_ctl() to add new fds
Changes to the interest list take effect immediately — epoll_wait() in the other thread sees them right away
This is the foundation for the “one epoll, many worker threads” server design pattern

Code Example: epoll_wait() Basic Event Loop

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS  10
#define BUF_SIZE   256

int main(void)
{
    int epfd, nready, i, fd;
    struct epoll_event ev, evlist[MAX_EVENTS];
    char buf[BUF_SIZE];

    /* Create epoll instance */
    epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); exit(1); }

    /* Watch stdin (fd=0) for incoming data */
    ev.events  = EPOLLIN;
    ev.data.fd = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &ev) == -1) {
        perror("epoll_ctl"); exit(1);
    }

    printf("Watching stdin. Type something and press Enter.\n");
    printf("Press Ctrl+D to quit.\n\n");

    int done = 0;

    while (!done) {
        /* Block until at least one event (timeout=-1 means forever) */
        nready = epoll_wait(epfd, evlist, MAX_EVENTS, -1);
        if (nready == -1) {
            if (errno == EINTR)
                continue;    /* interrupted by signal, retry */
            perror("epoll_wait");
            exit(1);
        }

        /* Process each ready fd */
        for (i = 0; i < nready; i++) {

            fd = evlist[i].data.fd;

            if (evlist[i].events & EPOLLIN) {
                ssize_t n = read(fd, buf, BUF_SIZE - 1);
                if (n > 0) {
                    buf[n] = '\0';
                    printf("Read from fd %d: %s", fd, buf);
                    continue;    /* drain data first; a pending HUP shows up on a later call */
                }
                if (n == -1)
                    perror("read");
                else
                    printf("EOF on fd %d, removing\n", fd);
            } else if (evlist[i].events & (EPOLLHUP | EPOLLERR)) {
                printf("HUP/ERR on fd %d\n", fd);
            } else {
                continue;
            }

            /* EOF, read error, or hangup: stop watching this fd.
               stdin is the only fd being watched, so the loop ends too. */
            epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
            done = 1;
        }
    }

    close(epfd);
    return 0;
}

Compile: gcc epoll_wait_demo.c -o epoll_wait_demo && ./epoll_wait_demo

Interview Questions — epoll_wait() & Events

Q1. What does epoll_wait() return?

On success it returns the number of ready file descriptors, which is also the number of entries filled in evlist (at least 1 and at most maxevents). It returns 0 if the timeout expired with no events. It returns -1 on error, with errno set. A common case is errno=EINTR when a signal interrupts the wait; your code should restart the call in that case.

Q2. What is the difference between EPOLLHUP and EPOLLRDHUP?

EPOLLHUP means a full hangup — both directions of the connection are closed. The kernel always reports this even if you did not set it.
EPOLLRDHUP is more specific: the peer has shut down the write side (TCP half-close). This lets you detect when the remote side is done sending, even if you can still write. You must set EPOLLRDHUP in epoll_ctl() to receive it.
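The half-close case can be reproduced locally with a UNIX domain socket pair standing in for a TCP connection. This is a sketch under that assumption; events_after_peer_shutdown() is a hypothetical helper name.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Watches one end of a socket pair for EPOLLIN | EPOLLRDHUP, has the peer
   shut down its write side, and returns the reported event mask
   (0 on any failure). */
unsigned int events_after_peer_shutdown(void)
{
    int sv[2];
    unsigned int mask = 0;
    struct epoll_event ev, evlist[1];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    ev.events  = EPOLLIN | EPOLLRDHUP;   /* EPOLLRDHUP must be requested */
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        shutdown(sv[1], SHUT_WR) == 0 &&     /* peer: "I am done sending" */
        epoll_wait(epfd, evlist, 1, 1000) == 1)
        mask = evlist[0].events;

    close(epfd);
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

The mask contains EPOLLRDHUP but not EPOLLHUP: the peer has only stopped sending, so sv[0] can still be written to.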

Q3. Why don’t you need to add EPOLLERR and EPOLLHUP explicitly?

The kernel always monitors for errors and hangups regardless of your ev.events settings. These conditions are always reported when they occur. However, you still need to check for them in your event-handling code and handle them properly (close the fd, clean up state).

Q4. What is EPOLLONESHOT and when would you use it?

EPOLLONESHOT disables monitoring of an fd after the first event fires. After that you must re-arm it with EPOLL_CTL_MOD. The main use case is in multi-threaded servers: when a worker thread picks up a connection event, you one-shot that fd so no other thread tries to handle the same connection simultaneously. Without this, two threads could both see the same fd as ready.

Q5. What does timeout=-1 mean in epoll_wait()?

It means block indefinitely until at least one fd in the interest list becomes ready, or until a signal is delivered. There is no timeout. This is the mode to use when your program has nothing else to do between events.

Q6. How do you know which fd caused an event when epoll_wait() returns?

Through the evlist[i].data field — specifically whatever you stored in ev.data when you called epoll_ctl(). If you stored ev.data.fd = fd, then evlist[i].data.fd tells you the fd. If you stored a pointer, evlist[i].data.ptr gives you the pointer. epoll does not directly tell you the fd — only your stored data does.

Q7. Can epoll_wait() return fewer events than are actually ready?

Yes. It returns at most maxevents events per call. If more fds are ready than your array can hold, the remaining ready fds will be reported on the next call to epoll_wait(). This is why you often call epoll_wait() in a loop or use a large maxevents value.
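A small experiment makes this concrete: three pipes are made readable, but the evlist array only has room for two events. The sketch below uses a hypothetical helper, maxevents_demo(), which drains the descriptors reported by the first call so that the second call reports only what was left over.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 3

/* Makes NPIPES pipes readable, then calls epoll_wait() twice with
   maxevents = 2. *first and *second receive the two return values.
   Returns 0 on success, -1 if any step failed. */
int maxevents_demo(int *first, int *second)
{
    int pfd[NPIPES][2], i, opened = 0, rc = -1;
    struct epoll_event ev, evlist[2];
    char c;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    for (i = 0; i < NPIPES; i++) {
        if (pipe(pfd[i]) == -1)
            goto out;
        opened++;
        ev.events  = EPOLLIN;
        ev.data.fd = pfd[i][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[i][0], &ev) == -1 ||
            write(pfd[i][1], "x", 1) != 1)
            goto out;
    }

    /* Three fds are ready, but the array holds only two events. */
    *first = epoll_wait(epfd, evlist, 2, 0);

    /* Consume the data on the fds that were reported. */
    for (i = 0; i < *first; i++) {
        if (read(evlist[i].data.fd, &c, 1) != 1)
            goto out;
    }

    /* The fd that did not fit the first time is reported now. */
    *second = epoll_wait(epfd, evlist, 2, 0);
    rc = 0;

out:
    for (i = 0; i < opened; i++) {
        close(pfd[i][0]);
        close(pfd[i][1]);
    }
    close(epfd);
    return rc;
}
```

The first call returns 2 and the second returns 1. Nothing is lost when maxevents is small; the cost is extra system calls.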

Q8. What happens if epoll_wait() is interrupted by a signal?

It returns -1 with errno set to EINTR. This is not a real failure; it just means a signal handler ran while the call was blocked. epoll_wait() is never restarted automatically, even for handlers installed with SA_RESTART, so your code must check for EINTR and call epoll_wait() again. This is why the event loop always has: if (errno == EINTR) continue;

Next: Complete epoll Example Program

See a full working program that monitors multiple FIFOs using epoll
