epoll API Deep Dive: epoll_create & epoll_ctl

Chapter 63: Alternative I/O Models | Part 5 of 5
🔬 Topic: epoll System Calls
🔑 Key: epoll_create1, epoll_ctl
🎯 Level: Intermediate

From Theory to Working Code

In Part 4 we understood what epoll does and why. Now we go deeper into the actual system calls: their exact signatures, the structures involved, the flags, and how to use them correctly to build a working event loop. By the end of this tutorial you will have a complete echo server built on epoll.

Keywords in this tutorial:

epoll_create(), epoll_create1(), EPOLL_CLOEXEC, epoll_ctl(), EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL, epoll_event, EPOLLIN, EPOLLOUT, EPOLLET, EPOLLONESHOT, epoll_wait(), FD_CLOEXEC

🏗️ epoll_create(): Creating the epoll Instance
#include <sys/epoll.h>

int epoll_create(int size);
/* Returns: file descriptor on success, -1 on error */

epoll_create() creates a new epoll instance and returns a file descriptor that is your handle to it. This fd is just like any other: you can pass it around, dup() it, and you must close() it when done.

⚠️ The size argument used to be a hint about how many FDs you planned to monitor. Since Linux 2.6.8 its value is ignored, because the kernel sizes its data structures dynamically, but it must still be greater than zero or the call fails with EINVAL. Passing 1 is fine.

The instance is destroyed and its resources are freed once every file descriptor referring to it has been closed. If you fork() or dup() the epoll fd, all copies refer to the same epoll instance, so the instance lives until the last copy is closed.

int epfd = epoll_create(1);     /* size ignored since 2.6.8, pass any positive value */
if (epfd == -1) {
    perror("epoll_create");
    exit(EXIT_FAILURE);
}
/* ... use epfd ... */
close(epfd);   /* destroy the instance when done */
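
The sharing behaviour described above can be checked with a small experiment. This is a minimal sketch, not part of the original tutorial: the helper name dup_shares_instance() is hypothetical, and a pipe stands in for a real socket. It registers the pipe through one epoll descriptor, closes that descriptor, and then waits through a dup()'d copy.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not from the tutorial): returns 1 if an fd registered
 * through one epoll descriptor is still reported through a dup()'d copy
 * after the original descriptor is closed, 0 if not, -1 on setup error. */
int dup_shares_instance(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int result = -1;
    int epfd = epoll_create(1);                 /* size only has to be > 0 */
    int epfd_copy = (epfd == -1) ? -1 : dup(epfd);

    if (epfd_copy != -1) {
        struct epoll_event ev = {0}, out;
        ev.events  = EPOLLIN;
        ev.data.fd = pipefd[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0) {
            close(epfd);                        /* one reference is gone ... */
            epfd = -1;
            if (write(pipefd[1], "x", 1) == 1) {
                /* ... but the copy still sees the same interest list */
                int n = epoll_wait(epfd_copy, &out, 1, 0);
                if (n >= 0)
                    result = (n == 1 && out.data.fd == pipefd[0]);
            }
        }
        close(epfd_copy);                       /* last reference: instance freed */
    }
    if (epfd != -1)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

If the helper returns 1, the registration made through the original descriptor survived its close(), which is what you would expect when both descriptors refer to one instance.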

✨ epoll_create1(): The Modern Version (Linux 2.6.27+)
#include <sys/epoll.h>

int epoll_create1(int flags);
/* Returns: file descriptor on success, -1 on error */

epoll_create1() is the preferred modern replacement. It drops the useless size argument and adds a flags argument. Currently only one flag exists:

EPOLL_CLOEXEC
Sets the close-on-exec flag (FD_CLOEXEC) on the epoll file descriptor, so the fd is closed automatically when the process calls exec() to run another program. Without this flag, the new program would silently inherit the epoll fd, which is a resource leak and a potential security concern.
/* Preferred: automatically close epoll fd on exec() */
int epfd = epoll_create1(EPOLL_CLOEXEC);
if (epfd == -1) {
    perror("epoll_create1");
    exit(EXIT_FAILURE);
}

/* Pass 0 for flags if you don't want CLOEXEC */
int epfd2 = epoll_create1(0);
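
To see what the flag actually changes, you can read the descriptor flags back with fcntl(F_GETFD). The following is a minimal sketch; the helper name epoll_fd_has_cloexec() is made up for this illustration.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance with the given flags and
 * report whether FD_CLOEXEC ended up set on the new descriptor.
 * Returns 1 if set, 0 if clear, -1 on error. */
int epoll_fd_has_cloexec(int create_flags)
{
    int epfd = epoll_create1(create_flags);
    if (epfd == -1)
        return -1;

    int fd_flags = fcntl(epfd, F_GETFD);   /* descriptor flags, not file status flags */
    close(epfd);
    if (fd_flags == -1)
        return -1;

    return (fd_flags & FD_CLOEXEC) ? 1 : 0;
}
```

Calling it with EPOLL_CLOEXEC should report the flag as set, and calling it with 0 should report it as clear.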

epoll_create(size)
❌ size arg is ignored (confusing)
❌ No flags, so CLOEXEC cannot be set atomically
✅ Available since Linux 2.6

epoll_create1(flags)
✅ Clean API with no useless size
✅ EPOLL_CLOEXEC flag available
✅ Available since Linux 2.6.27

Use epoll_create1() in all new code.

🔧 epoll_ctl(): Managing the Interest List
#include <sys/epoll.h>

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *ev);
/* Returns: 0 on success, -1 on error */

Arguments:

Argument  Type                  Meaning
epfd      int                   epoll instance fd (from epoll_create1)
op        int                   Operation: EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL
fd        int                   The file descriptor to add/modify/remove
ev        struct epoll_event *  Events to watch and user data (NULL for DEL)

EPOLL_CTL_ADD
Add fd to the interest list. Specify which events to watch in ev. Fails with EEXIST if the fd is already in the list.
EPOLL_CTL_MOD
Change which events to watch for an fd already in the list. For example, switch from watching EPOLLIN to watching EPOLLOUT after you’ve received data and now need to send.
EPOLL_CTL_DEL
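Remove fd from the interest list. The ev argument is ignored and may be NULL (kernels before 2.6.9 required a non-NULL pointer even though it was ignored). Closing an fd removes it from all epoll instances automatically, provided no duplicate of it is still open, so DEL is needed only when you want to stop monitoring without closing.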
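
The error behaviour of the three operations is easy to probe. The sketch below is an illustration only: try_ctl() and ctl_error_demo() are hypothetical names, and a pipe is used in place of a socket. Besides the EEXIST case mentioned above, it also shows that EPOLL_CTL_MOD on an fd that was never added fails with ENOENT.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: run one epoll_ctl() call and return 0 on success
 * or the errno value on failure. */
static int try_ctl(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events  = events;
    ev.data.fd = fd;
    return (epoll_ctl(epfd, op, fd, &ev) == 0) ? 0 : errno;
}

/* Fills results[] with the outcome of four calls on the same fd.
 * Returns 0 if the demo could run, -1 on setup error. */
int ctl_error_demo(int results[4])
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    results[0] = try_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], EPOLLIN); /* ENOENT: not added yet */
    results[1] = try_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], EPOLLIN); /* 0: first ADD works   */
    results[2] = try_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], EPOLLIN); /* EEXIST: already there */
    results[3] = try_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], 0);       /* 0: removed again     */

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}
```

A practical consequence: code that is unsure whether an fd is already registered can try ADD first and fall back to MOD when errno is EEXIST.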

📦 The epoll_event Structure
struct epoll_event {
    uint32_t     events;    /* bitmask: which events to watch */
    epoll_data_t data;      /* user data passed back when event fires */
};

typedef union epoll_data {
    void        *ptr;       /* pointer to anything you want */
    int          fd;        /* most common: store the fd itself */
    uint32_t     u32;
    uint64_t     u64;
} epoll_data_t;

The data union is extremely useful: when epoll_wait() returns an event, this data comes back with it. Storing the fd in data.fd means you always know which connection triggered the event without searching a lookup table.
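
Because data is a union, you can store a pointer instead of the fd. Here is a minimal sketch of that pattern. The struct conn_ctx type and both function names are hypothetical, chosen only for this example; since ptr and fd share storage, the fd is kept inside the context struct.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection context; the fields are illustrative. */
struct conn_ctx {
    int  fd;           /* kept here because data.ptr and data.fd overlap */
    long bytes_seen;   /* example of extra per-connection state */
};

/* Register ctx->fd for EPOLLIN and store the context pointer in data.ptr. */
int watch_with_ctx(int epfd, struct conn_ctx *ctx)
{
    struct epoll_event ev = {0};
    ev.events   = EPOLLIN;
    ev.data.ptr = ctx;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, ctx->fd, &ev);
}

/* Wait for one event and return the context that was registered with it,
 * or NULL on timeout or error. */
struct conn_ctx *wait_for_ctx(int epfd, int timeout_ms)
{
    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, timeout_ms);
    return (n == 1) ? out.data.ptr : NULL;
}
```

The caller must keep the context alive for as long as the fd stays registered, because the kernel only stores the pointer value and hands it back unchanged.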

Common epoll Event Flags
Flag          Meaning
EPOLLIN       Data available to read
EPOLLOUT      Write buffer has space: can write without blocking
EPOLLERR      Error condition on the fd (always monitored by default)
EPOLLHUP      Hangup: peer closed the connection (always monitored)
EPOLLPRI      High-priority data (e.g. TCP out-of-band data)
EPOLLET       Switch to edge-triggered mode (default is level-triggered)
EPOLLONESHOT  After one event fires, disable this fd. Re-arm with EPOLL_CTL_MOD.
EPOLLRDHUP    Peer half-closed the connection (sent FIN)
/* Example: add a socket fd to epoll and watch for read events */
struct epoll_event ev;
ev.events  = EPOLLIN;         /* notify when data arrives */
ev.data.fd = client_fd;       /* pass fd back when event fires */

if (epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev) == -1) {
    perror("epoll_ctl ADD");
}

/* Modify: now watch for write events too */
ev.events  = EPOLLIN | EPOLLOUT;
ev.data.fd = client_fd;
epoll_ctl(epfd, EPOLL_CTL_MOD, client_fd, &ev);

/* Remove: stop monitoring (without closing) */
epoll_ctl(epfd, EPOLL_CTL_DEL, client_fd, NULL);  /* ev is ignored for DEL */
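
The difference that EPOLLET makes can be observed directly. In this sketch (the helper name second_wait_result() is hypothetical, and a pipe replaces a socket), one byte is written and deliberately left unread, then epoll_wait() is called twice. In level-triggered mode the second call should report the fd again; in edge-triggered mode it should not, because nothing new has arrived.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte into a pipe, never reads it, and
 * returns what a SECOND epoll_wait() reports for the read end.
 * Pass 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the second epoll_wait() result, or -1 on setup error. */
int second_wait_result(uint32_t mode_flag)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int result = -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events  = EPOLLIN | mode_flag;
        ev.data.fd = pipefd[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 0) == 1) {      /* first report */
            result = epoll_wait(epfd, &out, 1, 0);    /* byte is still unread */
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

This is why edge-triggered handlers normally use non-blocking fds and keep reading until read() fails with EAGAIN: an unread remainder will not be announced a second time.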

⏳ epoll_wait(): Waiting for Events
#include <sys/epoll.h>

int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
/* Returns: number of ready fds, 0 on timeout, -1 on error */
Argument   Meaning
epfd       epoll instance fd
events     Array of epoll_event structs that the kernel fills with ready events
maxevents  Size of your events array: max events returned per call
timeout    -1 = block forever, 0 = non-blocking, positive = milliseconds
#define MAX_EVENTS 64
struct epoll_event events[MAX_EVENTS];

for (;;) {   /* event loop: continue and break below refer to this loop */
    int nready = epoll_wait(epfd, events, MAX_EVENTS, -1); /* block until events */
    if (nready == -1) {
        if (errno == EINTR)
            continue;   /* interrupted by signal: retry */
        perror("epoll_wait");
        break;
    }

    /* handle_read(), handle_write() and handle_error() are placeholders */
    for (int i = 0; i < nready; i++) {
        int fd = events[i].data.fd;
        uint32_t evmask = events[i].events;

        if (evmask & EPOLLIN)
            handle_read(fd);
        if (evmask & EPOLLOUT)
            handle_write(fd);
        if (evmask & (EPOLLERR | EPOLLHUP))
            handle_error(fd);
    }
}
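
The timeout values are easiest to understand on an fd that never becomes ready. The following sketch uses a hypothetical helper, wait_on_idle_pipe(), that registers the read end of an empty pipe and returns whatever epoll_wait() returns for the given timeout.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait on a pipe that never receives data.
 * Returns the epoll_wait() result (0 means the timeout expired),
 * or -1 on error. Do not pass -1 here: with nothing ever written,
 * the call would block forever. */
int wait_on_idle_pipe(int timeout_ms)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int result = -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events  = EPOLLIN;
        ev.data.fd = pipefd[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0) {
            do {
                /* timeout 0 returns at once; a positive value waits that
                 * many milliseconds (restarted in full after EINTR here) */
                result = epoll_wait(epfd, &out, 1, timeout_ms);
            } while (result == -1 && errno == EINTR);
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

With a timeout of 0 the call acts as a pure poll, and with a small positive value it returns 0 once the interval has elapsed without any event.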

💻 Complete Example: epoll Echo Server (Level-Triggered)

This is a complete TCP echo server using epoll. It accepts multiple clients and echoes back everything they send, all in a single thread that blocks only in epoll_wait() and never on an individual client.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>

#define PORT       8080
#define BACKLOG    50
#define MAX_EVENTS 64
#define BUF_SIZE   1024

/* Make a socket non-blocking */
static void set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Add fd to epoll interest list, watching for EPOLLIN */
static void epoll_add(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events  = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        perror("epoll_ctl ADD");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    /* --- Create listening socket --- */
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd == -1) { perror("socket"); exit(1); }

    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port        = htons(PORT);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind"); exit(1);
    }
    if (listen(listen_fd, BACKLOG) == -1) {
        perror("listen"); exit(1);
    }

    set_nonblocking(listen_fd);
    printf("Echo server listening on port %d\n", PORT);

    /* --- Create epoll instance --- */
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) { perror("epoll_create1"); exit(1); }

    /* Add listening socket to epoll */
    epoll_add(epfd, listen_fd);

    struct epoll_event events[MAX_EVENTS];

    /* --- Main event loop --- */
    for (;;) {
        int nready = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (nready == -1) {
            if (errno == EINTR) continue;
            perror("epoll_wait");
            break;
        }

        for (int i = 0; i < nready; i++) {
            int fd = events[i].data.fd;

            if (fd == listen_fd) {
                /* --- New connection arrived --- */
                struct sockaddr_in client_addr;
                socklen_t addr_len = sizeof(client_addr);

                int client_fd = accept(listen_fd,
                                       (struct sockaddr *)&client_addr,
                                       &addr_len);
                if (client_fd == -1) {
                    perror("accept");
                    continue;
                }

                set_nonblocking(client_fd);
                epoll_add(epfd, client_fd);  /* watch new client */
                printf("New client connected: fd=%d\n", client_fd);

            } else {
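                /* --- Data from existing client --- */
                char buf[BUF_SIZE];
                ssize_t n = read(fd, buf, sizeof(buf));

                if (n == -1 && (errno == EAGAIN || errno == EINTR))
                    continue;   /* nothing to read right now: wait for the next event */

                if (n <= 0) {
                    /* Client closed or error */
                    if (n == 0)
                        printf("Client fd=%d disconnected\n", fd);
                    else
                        perror("read");

                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);  /* optional: close() removes it too */
                    close(fd);
                } else {
                    /* Echo back to client; this demo does not retry short writes */
                    ssize_t sent = write(fd, buf, (size_t)n);
                    if (sent == -1)
                        perror("write");
                    else
                        printf("Echoed %zd of %zd bytes to fd=%d\n", sent, n, fd);
                }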
            }
        }
    }

    close(epfd);
    close(listen_fd);
    return 0;
}

Compile and test:

# Compile
gcc -o echo_server echo_server.c

# Run the server
./echo_server

# In another terminal, connect with netcat
nc localhost 8080
# Type anything and press Enter; it echoes back

# Test multiple clients simultaneously
nc localhost 8080 &
nc localhost 8080 &
nc localhost 8080 &
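
One simplification in the echo server is that it calls write() once per read and does not retry a short write. On a non-blocking socket, write() may accept only part of the buffer. The helper below is a sketch of one way to handle that; the name write_some() is hypothetical and the function is not part of the server above. It keeps writing until everything is sent or the fd would block, and tells the caller how far it got.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write up to len bytes to fd, retrying after EINTR
 * and after short writes. Returns the number of bytes actually written,
 * which is less than len if the fd would block, or -1 on a real error. */
ssize_t write_some(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;          /* partial progress: keep going */
        } else if (n == -1 && errno == EINTR) {
            continue;                   /* interrupted before writing: retry */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                      /* kernel buffer full: stop for now */
        } else if (n == 0) {
            break;                      /* no progress possible right now */
        } else {
            return -1;                  /* real error, errno is set */
        }
    }
    return (ssize_t)done;
}
```

When the return value is smaller than len, a typical design saves the unsent remainder, uses EPOLL_CTL_MOD to add EPOLLOUT to the fd's event mask, and finishes the write when epoll_wait() reports the socket as writable.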

📊 How the Echo Server Event Loop Works
epoll_wait() blocks: no CPU used while idle
⬇
Event arrives: kernel wakes us up, fills events[]
⬇
If fd == listen_fd: accept() the new client and add it to epoll
If fd is a client: read() data and write() it back
⬆ loop back to epoll_wait()

🎯 EPOLLONESHOT: Fire Once Then Disarm

EPOLLONESHOT is useful in multithreaded servers. Once an event fires, the fd is automatically disarmed: no more events are reported until you re-arm it with EPOLL_CTL_MOD. This prevents two threads from handling the same fd simultaneously.

/* Add fd with EPOLLONESHOT: fires once, then must be re-armed */
struct epoll_event ev;
ev.events  = EPOLLIN | EPOLLONESHOT;
ev.data.fd = client_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);

/* After handling the event in your thread, re-arm for next event */
ev.events  = EPOLLIN | EPOLLONESHOT;
ev.data.fd = client_fd;
epoll_ctl(epfd, EPOLL_CTL_MOD, client_fd, &ev);

Without EPOLLONESHOT in a thread pool, two threads could both be woken for the same fd and start reading from it simultaneously, so each would see only an arbitrary part of the data. EPOLLONESHOT guarantees that each fd is handled by one thread at a time.
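
The disarm and re-arm cycle can be observed in a single thread. This is a minimal sketch with a hypothetical helper, oneshot_demo(), and a pipe in place of a client socket. The byte written to the pipe is never read, so the fd stays readable throughout; only the one-shot state decides whether it is reported.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: records three epoll_wait() results for one fd that
 * stays readable the whole time. Returns 0 if the demo ran, -1 on error. */
int oneshot_demo(int results[3])
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int rc = -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events  = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = pipefd[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1) {
            results[0] = epoll_wait(epfd, &out, 1, 0);  /* 1: event fires once   */
            results[1] = epoll_wait(epfd, &out, 1, 0);  /* 0: fd is now disarmed */

            /* Re-arm: same event mask, EPOLL_CTL_MOD instead of ADD */
            if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) == 0) {
                results[2] = epoll_wait(epfd, &out, 1, 0);  /* 1: reported again */
                rc = 0;
            }
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

In a thread pool, the re-arm call is the last thing a worker does for an fd, after it has finished reading, so no other worker can pick up the same connection in between.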

📝 Important: Closing FDs and epoll

When you close(fd), Linux automatically removes it from all epoll interest lists. You do NOT need to call epoll_ctl(EPOLL_CTL_DEL, ...) before closing.

⚠️ However, if you dup() a fd and then close one of the duplicates, the fd is NOT removed from epoll, because the underlying file description still has references (via the dup’d copy). A fork() creates such duplicates in the child as well. Always close ALL duplicates, or call EPOLL_CTL_DEL explicitly before closing, to ensure removal.

/* Safe pattern when no duplicates exist: just close, no DEL needed */
close(client_fd);    /* automatically removed from epoll */

/* BUT if you dup'd it (a separate scenario, not a continuation of the above): */
int dup_fd = dup(client_fd);
close(client_fd);     /* NOT removed yet: dup_fd still references it */
close(dup_fd);        /* NOW it is removed from epoll */
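
The same rule can be demonstrated end to end. In this sketch (hypothetical helper name dup_close_demo(), with a pipe standing in for the client socket), the registered descriptor is closed while a duplicate is still open, and epoll keeps reporting events until the duplicate is closed too.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: results[0] is the epoll_wait() result after closing
 * only the registered fd, results[1] after closing its duplicate as well.
 * Returns 0 if the demo ran, -1 on setup error. */
int dup_close_demo(int results[2])
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int rc = -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    int dup_fd = dup(pipefd[0]);
    int read_fd = pipefd[0];

    if (epfd != -1 && dup_fd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events  = EPOLLIN;
        ev.data.fd = read_fd;

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, read_fd, &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1) {
            close(read_fd);                 /* registered fd closed ...          */
            read_fd = -1;
            results[0] = epoll_wait(epfd, &out, 1, 0);  /* 1: still reported     */

            close(dup_fd);                  /* ... last reference closed         */
            dup_fd = -1;
            results[1] = epoll_wait(epfd, &out, 1, 0);  /* 0: entry is gone      */
            rc = 0;
        }
    }
    if (read_fd != -1) close(read_fd);
    if (dup_fd != -1)  close(dup_fd);
    if (epfd != -1)    close(epfd);
    close(pipefd[1]);
    return rc;
}
```

Note that in the first case the event carries the old fd number in data.fd even though that number is already closed, which is why an explicit EPOLL_CTL_DEL before close() is the safer habit whenever duplicates may exist.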

🎯 Interview Questions
Q1. What does the size argument to epoll_create() do?
Since Linux 2.6.8 its value is ignored. It was originally a hint to the kernel about how many FDs to expect; the kernel now allocates structures dynamically. It must still be greater than zero, otherwise the call fails with EINVAL. Use epoll_create1() in new code.
Q2. What does EPOLL_CLOEXEC do and why is it important?
EPOLL_CLOEXEC sets the FD_CLOEXEC flag on the epoll file descriptor, causing it to be closed automatically when exec() is called. Without it, the program started by exec() would inherit the epoll fd, which is a resource leak. Using EPOLL_CLOEXEC with epoll_create1() sets the flag atomically, avoiding the race window (for example, another thread calling fork() and exec()) that exists when it is set separately with fcntl().
Q3. What are the three op values for epoll_ctl() and when is each used?
EPOLL_CTL_ADD: add a new fd to the interest list (fails with EEXIST if already there). EPOLL_CTL_MOD: change which events are watched for an existing fd (e.g., switch from EPOLLIN to EPOLLOUT). EPOLL_CTL_DEL: remove an fd from the interest list; ev is ignored (pass NULL). Needed when you want to stop monitoring without closing the fd.
Q4. What is the epoll_data_t union in struct epoll_event and why is it useful?
epoll_data_t is a union (fd, ptr, u32, u64) that you fill when registering an fd. When epoll_wait() returns an event, this data comes back with it. Storing the fd in data.fd means you immediately know which client triggered the event. Using data.ptr lets you store a pointer to a connection context struct, which is more powerful than storing just the fd.
Q5. What happens to an epoll interest list entry when you close() the fd?
Closing a fd automatically removes it from all epoll instances, so there is no need to call EPOLL_CTL_DEL first. However, if the fd has been dup()’d, closing one copy does NOT remove it from epoll because the underlying file description still has references. All duplicate fds must be closed before epoll removes the entry.
Q6. What is EPOLLONESHOT and when should you use it?
EPOLLONESHOT disarms a file descriptor after its first event fires: subsequent events are not reported until you re-arm with EPOLL_CTL_MOD. It is essential in multithreaded server designs where multiple threads share an epoll instance. Without it, two threads could both receive events for the same client fd simultaneously and corrupt each other’s reads. EPOLLONESHOT guarantees one fd is handled by one thread at a time.
Q7. What is the timeout argument to epoll_wait() and what are the three possible behaviours?
timeout=-1: block indefinitely until at least one event occurs (most common in event loops). timeout=0: return immediately even if no events are ready (a non-blocking poll). timeout=N (positive): wait up to N milliseconds, then return 0 if no events occurred. The return value is the number of ready fds, 0 on timeout, -1 on error.

Chapter 63 Complete! 🎉

You now understand signal-driven I/O with F_SETSIG, realtime signals, overflow handling, multithreaded signal routing, and the complete epoll API.
